Relativistic Programming in Haskell Using Types to Enforce a Critical Section Discipline Ted Cooper Jonathan Walpole Portland State University Portland State University theod@pdx.edu walpole@pdx.edu Abstract absolute, but instead depends on the rate, path, and direction of Relativistic programming (RP) is a concurrent programming model each reader’s traversal through the data structure relative to the in which readers can safely access shared data concurrently with writer’s updates1 . In general, each of these traversals is a read-side writers that are modifying it. RP has been used extensively in critical section, a sequence of related reads from shared data with operating system kernels, most notably in Linux, and is widely an explicitly defined beginning and end. At the beginning of each credited with improving the performance and scalability of Linux read-side critical section, readers can access a shared data structure on highly concurrent architectures. However, using RP primitives only through some reference all threads share, such as the head of a safely is tricky, and existing implementations of RP do not identify list or the root of a tree. most unsafe uses statically. This work introduces Monadic RP, a The core of a relativistic programming implementation is a set GHC Haskell library for RP, and presents an example of its use. of ordering mechanisms, that is, ways for writers to guarantee that To our knowledge, Monadic RP is the first RP implementation in readers agree on the order of writes. One such primitive, unique to Haskell. It provides a novel mechanism to statically rule out a subset RP [22], we will call synchronizeRP2 . A writer can guarantee that of unsafe relativistic programs by using types to separate read- the states two writes create are observed by all readers in order by side and write-side critical sections, and to restrict the operations invoking synchronizeRP between them. This primitive introduces available in each. 
This work describes the current status of our a delay that will not terminate until all readers that are active at implementation, presents a brief experimental evaluation of it, and the start of the delay have finished their current read-side critical discusses directions for ongoing and future work. sections. The RP literature calls this delay a grace period. Note that synchronizeRP does not prevent readers from starting new queries Categories and Subject Descriptors D.1.3 [Concurrent Program- or traversals during the delay, or wait for these readers to complete ming] these new critical sections. Each of these readers began after the first write was committed, and so cannot observe the value it replaced. Keywords relativistic programming, read-copy update, monads, Now the writer can issue the second write, certain that any reader type safety who observes it had an opportunity to see the first. Another way of saying this is that, by waiting for a grace period to complete, the 1. Introduction writer obtains a guarantee that the first write happens before [16] the second. If readers and writers follow a discipline of rules including Relativistic Programming (RP) [29] is a concurrent programming those we introduce in Section 2.2, then readers can safely traverse model where a writer thread can modify a linked shared data or query a data structure updated by writers at any time, so they structure (such as a linked list or a tree) safely while other threads avoid the overhead of synchronizing (for instance, acquiring a lock) (readers) concurrently traverse or query it. Unlike conventional before reading shared data. 
lock-based models, in which the writer acquires a lock to exclude Researchers have developed relativistic implementations of com- concurrent readers (and vice versa), or transactional memory models, mon data structures — including linked lists [4], hash tables [30], where concurrent reads and writes to the same memory location and red-black trees [11] — bringing concurrency between readers conflict and cause aborts and retries [8], RP allows both the writer and writers, and extremely low-cost read-side primitives, to pro- and concurrent readers who access the same data to proceed and grams that use these data structures. RP is used widely in the Linux complete successfully. So RP offers greater concurrency between kernel, and was introduced in the form of the Linux Read-Copy writers and readers, but because of this concurrency, readers may Update (RCU) API [23], whose read-side critical sections execute not observe concurrent writes in the same order writers issue them. deterministically (without blocking or retrying) and run almost as Also, readers may not agree on the order of concurrent writes. That fast as unsynchronized sequential code. For highly concurrent, read- is, when there is concurrent reading, the order of writes is not mostly workloads that allow relaxed data structure semantics, such as software routing tables [23] and SELinux policy rules [25], these read-side characteristics improve performance, reliability, and scala- bility. For these reasons, RCU is now used widely in all subsystems of the Linux kernel [24], often to replace reader-writer locking. De- 1 Triplett et al. [29] chose the name “relativistic programming” to emphasize this partial ordering of writes, which is akin to relativity of simultaneity in Copyright © 2015 Ted Cooper and Jonathan Walpole. This work is licensed under the physics, described by Einstein [6]. Creative Commons Attribution 4.0 International license.
To view a copy of the license, 2 We color write operations red, and will introduce additional colors for other visit http://creativecommons.org/licenses/by/4.0/legalcode. kinds of operations. Portland State University Technical Report 15-01, March 2016 1 2016/3/28 spite RP’s benefits and deployment footprint however, there is little presents implementations of the write-side critical sections existing language- or tool-based support for it. Instead, program- for the examples from Section 2.1, and mers are responsible for knowing and manually following the RP presents a brief experimental evaluation based on these discipline. The work presented here takes some first steps towards examples that tests whether the causal ordering mechanisms addressing this gap. Monadic RP provides are necessary and sufficient. Specifically, we introduce Monadic RP, which, to our knowledge, is the first implementation of RP in Haskell. This work is inspired • Section 7 summarizes this work and directions for future work. by the GHC Software Transactional Memory (STM) implementation of Harris et al. [8], and by the user-level RCU C library (urcu) of This paper describes future work inline, rather than in a separate Desnoyers et al. [4]. Like GHC STM, Monadic RP uses features section, in an effort to show directly how it connects to the work we unique to the Glasgow Haskell Compiler (GHC) and uses a monadic present. abstraction to constrain shared data and the concurrent computations that operate on it. Unlike GHC STM, Monadic RP does not enforce 2. Background an atomicity property among sequences of reads and writes, nor does it enforce a total ordering of writes to the same data. 
Monadic RP 2.1 Causal Ordering allows concurrent reading and writing of the same data (something 2.1.1 Grace periods enforce causal ordering that is not possible if strongly atomic transactions are used for all reads and writes), and it provides mechanisms that writers can use Most implementations of RP are in systems that rely on the program- as needed to guarantee that other threads see concurrent writes in mer to include code that explicitly allocates and reclaims memory, order. Hence, Monadic RP presents a programming model based on so the programmer must write that code to use grace periods to causal ordering. There is good reason (and evidence) to believe that guarantee that writers do not reclaim memory that readers may yet these two programming models are complementary. For example, visit. In this case, a writer begins a grace period after replacing one Howard and Walpole [10] modified an STM implementation to or more nodes in a data structure accessible through a shared refer- support relativistic reading while retaining transactional concurrency ence by updating the references in existing nodes to point to new control among writers. copies. At this point, one or more readers may have already loaded The main advantage Haskell brings to RP is that its type system the addresses of old copies, whose fields must remain available for captures the structure of imperative computations in enough detail the readers’ inspection. Once the grace period ends, each reader to enforce on programmers a unique discipline tailored to each type has finished its current read-side critical section, and old copies of of computation. Such a discipline can, for example, reject programs updated nodes and unlinked nodes are no longer reachable from that call functions inside computations where they don’t belong. the shared reference, so they can be safely deallocated. 
That is, the Monadic RP uses types to mark explicitly and separate read-side writer initiates a grace period and waits until the readers announce and write-side critical sections, and to guarantee that readers do they have each finished a read-side critical section, and so cannot not block or write to shared memory. In ongoing work, we are access unlinked nodes. Through this channel of communication, the extending Monadic RP to statically check reader traversal order writer learns from the readers when it is safe to reclaim memory. and writer update order to guarantee that readers observe causally This memory reclamation guarantee is an instance of a more related updates in order. general property: Unlinking a node and reclaiming its memory are This work presents a new implementation of RP, situates it in the causally ordered operations. That is, reclaiming is a consequence context of existing implementations, and explains and demonstrates of unlinking, so when a writer unlinks the node, readers should some of its properties: only have an opportunity to observe the data structure in one of three ordered consistent states: without the node unlinked, with the • Section 2.1 explains the causal ordering guarantees RP relies node unlinked but not yet reclaimed, and finally without the node on using two examples, and outlines ongoing work to statically entirely. If a reader can observe a state with the node reclaimed but check that relativistic programs obey a discipline that provides not unlinked, then causal ordering is violated. To prevent readers these guarantees. In turn, Section 5 demonstrates that when these from observing this inconsistent state, a writer can wait for a grace examples are constructed using Monadic RP, they execute cor- period to complete after unlinking the node (and before reclaiming rectly because the implementation provides the causal ordering it). When a writer waits for a grace period to complete, it guarantees guarantees they need. 
that readers will not observe any states introduced by writes after the grace period before they observe any states introduced by writes • Section 2.2 elaborates rules from the discipline for correct RP. before the grace period. That is, the pre-grace-period states happen • Section 3: before [16] the post-grace-period states. Unlinking a node and then reclaiming its memory is a common situates Monadic RP in the context of existing work on RP example of a sequence of causally related writes, but it is by no correctness (detailed in Section 6), means the only example. Relativistic red-black trees [11] and hash motivates and describes the abstractions Monadic RP pro- tables [30] both rely on grace periods for causal ordering beyond vides and how they enforce rules from Section 2.2, memory reclamation. One simple example of a sequence of causally presents an implementation of a read-side critical section for ordered updates is moving a node to a different position in a singly the examples from Section 2.1 that uses these abstractions, linked list by inserting a copy at the new position, then unlinking the node at its old position. If we choose to causally relate these describes a case where another RP implementation allows operations, we buy a guarantee that readers will see the list in one of violations of rules Monadic RP enforces, and three ordered consistent states: with the node at its original position, outlines ongoing work on Monadic RP to enforce more rules. with the new copy at the new position and the original node at the old position, or with the new copy at the new position and the original • Section 4 describes the implementation of Monadic RP, filling node unlinked. By preventing readers from observing the list with in details about the mechanisms that enforce rules from Section no new copy at the new position but the original node unlinked, we 2.2.
gain the guarantee that no reader will observe the node temporarily • Section 5: disappearing during a move operation. In exchange, we lose the guarantee that no reader will observe two copies of the same node or have already passed the node we changed, and will see State α1 in a traversal, because we allow a state where the node is visible in ([A,B,C,D,E]). both its new and old positions. Figure 4. State α2 Example 1 Consider a singly linked list [A,B,C,D,E] that reader threads are constantly traversing only from left to right. The readers hold a shared reference to the first node (head), and must start each D traversal there. We (the writer) can use a grace period to enforce W1 causal ordering when we move node D from a later position (after C) to an earlier position (after A). Once the move is complete, the head A B C D E readers will observe the modified list [A,D,B,C,E]. Figure 1 shows the partial orders on list states readers can observe during this move operation. If we do nothing to enforce ordering, the Now we’ve arrived at an impasse. We need to remove the left column applies. If we enforce causal ordering, one of the orders old copy of D by pointing the reference in C to E (to obtain in the middle column applies (in this example, we choose the order State α3), but a concurrent reader may have passed A before we in the solid box instead of the order in the dashed box, to guarantee pointed its reference to the new copy of D, and then immediately that readers don’t miss D during the move). If we enforce atomic been preempted, performed a long-running computation, or been ordering, for instance using STM, the order on the right applies. otherwise delayed in flight. If we update the reference in C to point to E before the leisurely reader visits C, that reader will also miss the old copy of D, and see the list in an inconsistent state ([A,B,C,E]).
Free through atomic orderings for Example 1 – arrows But we need to unlink the old D to finish our move operation! are transitions between observable states We need a guarantee that there are no readers in flight who missed the new D. Another way of saying that is that we need a unordered no ordering causal ordering atomic ordering ordered guarantee that there are no readers who have missed the write that creates State α2, but may see the write that creates State α3. Yet another way of saying that is that we need to guarantee that State [A,B,C,D,E] [A,B,C,D,E] [A,B,C,D,E] α2 happens before State α3. So we must wait for a grace period to complete, and then make a second write W2 to unlink the old D. This [A,D,B,C,D,E] [A,B,C,E] [A,D,B,C,D,E] [A,D,B,C,E] satisfies the guarantee, since every reader that missed the new D has [A,D,B,C,E] [A,D,B,C,E] finished its traversal and seen State α1, every reader that saw the new D has finished its traversal and seen State α2, and any reader OR who starts a new traversal must do so at head, and will see the new D. [A,B,C,D,E] [A,B,C,E] Figure 5. State α3 W1 W2 [A,D,B,C,E] head A D B C E The initial list is shown in Figure 2 (State α1). D Figure 2. State α1 Now the move operation is complete. Readers may still have seen head A B C D E the reference in C before we updated it to point to E, so they may still visit the old D, so we can’t reclaim the memory it occupies yet. But conveniently, the GHC runtime garbage collector will automatically First, we create a new copy of D in freshly allocated memory. We reclaim that memory after readers are finished with it. write to memory to do so, but no nodes in the shared list point to it yet, so we know readers have no way of reaching it. Figure 6. State α4 W1 W2 Figure 3. State α1 cont’d head A D B C E D At a high level, we made two causally ordered writes (W1 and W2) going from left to right in the list. 
Readers are traversing the list left to right, and because our writes also progressed from left head A B C D E to right, we had to wait for a grace period to complete between them to guarantee that no reader could observe W2 (which created State α3) without observing W1 (which created State α2, and is Next, we link the new D into the list by updating the reference in also part of State α3). The current version of Monadic RP does not A to point to the new D. This reference is reachable from head, so statically reject programs that omit a grace period in this situation, we have marked its update as W1 since it is the first of two causally as demonstrated in Section 5.3. This paper describes ongoing work ordered writes that are visible to readers. Now we have put the list in to address this at the end of Section 2.1.2. Although we need a grace a new state (State α2) where the new copy of D is linked in at its new period to guarantee causal ordering when writes are committed in position, and the old copy remains at its original position. Readers the same order as reads, when the order of writes is opposite the concurrently traversing will either see State α2 ([A,D,B,C,D,E]), order of reads we do not; Section 2.1.2 explains why. Portland State University Technical Report 15-01, March 2016 3 2016/3/28 2.1.2 Reader traversal order can also enforce causal ordering Figure 10. State β3 Grace periods enforce causal ordering because a writer waits for W2 W1 readers to complete their critical sections before it makes its next head A C D B E causally ordered write. This slows writers down, reducing concur- rency between readers and writers, and could create a bottleneck in write-heavy workloads. So it makes sense to avoid grace periods B when we have an opportunity to do so. 
When we can create the states we need using a sequence of writes to nodes in the opposite order of reads, and have guarantees that writes are committed in As before, the garbage collector reclaims the old D at some point program order and dependent loads are not reordered, we have such after no thread holds a reference to it. an opportunity. Example 2 Consider the same singly linked list as before Figure 11. State β4 [A,B,C,D,E], with readers traversing again left to right. This W2 W1 time, we will move B from an earlier position (after A) to a later head A C D B E position (after D). After the move operation is complete, readers will observe the modified list [A,C,D,B,E]. This time, because our writes progressed in the opposite direction from reads, we did not have to wait for a grace period to complete Figure 7. State β1 between writes to guarantee that readers saw State β2 happen before State β3. This guarantee rests on two conditions: head A B C D E • Most RP implementations (including Monadic RP) guarantee that writes are committed in program order, so we know that, if W2 is available to readers, then W1 is as well. • Readers visit the site of W2 before they visit the site of W1. So First, we create a new copy of B to link into the list. if a reader sees W2, that means that W1 is available, so when the reader subsequently visits W1’s site, it is guaranteed to see W1. Figure 8. State β1 cont’d Note that this does not mean that readers will see the nodes we changed in the order we changed them. Readers only observe nodes B in traversal order. In this case, that means that they will observe the nodes we changed in the opposite of the order we changed them. Not exactly intuitively, this is the property that guarantees our most recent write happens before our previous ones. 
head A B C D E In ongoing work, we are developing a domain-specific embed- ded language (DSEL) [2] that tracks relativistic reader traversal direction and causally ordered write sequence direction using type- This time, W1 updates the reference in D to link the new B level state in parameterised monads [1] implemented using GHC into the list. Readers concurrently traversing will either observe type families [26], as demonstrated by Kiselyov et al. [14]. This W1 and see State β2 ([A,B,C,D,B,E]), or miss W1 and see State β1 DSEL prevents construction of type-correct programs that make ([A,B,C,D,E]). The key difference from the previous example is sequences of causally ordered writes in traversal order without in- that the first write is at the later position in reader traversal order, voking synchronizeRP between each pair. However, it does allow instead of at the earlier position. programs that make sequences of causally ordered writes against traversal order that do not invoke synchronizeRP between each pair. Figure 9. State β2 2.2 RP Discipline Researchers have identified a set of rules for correct relativistic B programming [21, 29], which we have adapted to match the terms W1 and abstractions this work describes: head A B C D E 1. A writer must only make changes that leave shared data in a consistent state, and guarantee that readers always observe the consistent states that causally related changes create in order. 2. A thread can read from shared data only: Last time, we had to wait for a grace period to complete after W1, because it was possible for a reader to see W2 but miss W1 in a (a) inside a write-side critical section, or inside a read-side traversal, since readers visit the site of W2 after visiting the site of W1, critical section that announces its end to the world, so grace and both writes could occur between those visits. 
This time, W2 is at periods track all readers; and an earlier position than W1, so if a reader sees W2, it will definitely see (b) using RP read primitives, to prevent memory access reorder- W1 (and observe State β3). This is because we committed W1 before ings that may lead to inconsistent observations. W2, the reader has seen W2, and has yet to visit the site of W1, and so could not have missed W1 already. If the reader misses W2, it may 3. A thread can write to shared data only: see W1 (and observe State β2), or it may miss W1 (and observe State (a) in a context where writer concurrency can be controlled if β1). necessary, such as a write-side critical section; and Portland State University Technical Report 15-01, March 2016 4 2016/3/28 (b) using RP write primitives, to prevent memory access reorder- The RP module exports four monads, each corresponding to ings that may lead to inconsistent state publications. a stage in the life cycle of an RP computation. Computations in 4. A thread must not wait for a grace period to complete inside a RPR are read-side critical sections, and computations in RPW are read-side critical section. Such a thread would deadlock (waiting write-side critical sections. The constructor names for the monads for its own critical section to end) and cause any threads that are not exported and labeled “Unsafe”, since they can be used to run subsequently wait for a grace period to complete to also block arbitrary effectful IO computations. forever [19]. The readSRef and writeSRef functions are the only way to access SRefs. Because RPR and RPW are in the RPRead typeclass, 5. A thread must not use references obtained during a critical readSRef can be called in each. Because writeSRef has a return section outside that critical section. type in the RPW monad (RPW ()), it can be called only in RPW. Critical sections must run in a thread. 
That thread may run This is not an exhaustive list of rules in the discipline for correct multiple critical sections in some order and perform computations RP. This work provides and proposes static checks for Rules 1–4, outside of them using the values they return; the programmer defines but only proposes an incomplete solution for 5. these thread computations in the RPE5 monad, read-side sections are wrapped in readRP and write-side sections in writeRP. Within a 3. Contribution write-side critical section, a writer may need to guarantee that some Most existing work on RP correctness, discussed in Section 6.1, writes are causally ordered with respect to others. The writer invokes focuses on proving that an RP implementation is correct, is a synchronizeRP to wait for readers, which is available only in the sound foundation. Checking that relativistic programs obey the RPW monad. RP discipline is another matter entirely. While researchers have Finally, a relativistic program must initialize the references its proposed candidate rules for static checking (see Section 6.3) and threads share with newSRef6 , spawn those threads with forkRP, implemented runtime checks (see Section 6.2), this problem remains and wait for them to complete and return values using joinRP. The unsolved. programmer defines these computations in the RP monad. There is room in this design space for an implementation that These abstractions are sufficient to implement simple RP algo- rithms with writers that proceed one at a time while readers proceed • can be used to write general-purpose relativistic programs, and concurrently. Other implementations allow concurrency between • automatically statically checks that these programs follow the writers to disjoint data [10]. This code skeleton illustrates the struc- RP discipline. ture described above. 
With these goals in mind, we introduce Monadic RP, a Haskell implementation of relativistic programming that uses types to mark do state ← buildInitialState and separate read-side and write-side critical sections and to restrict reader ← forkRP $ do the programmer to the minimal set of effectful operations required x ← readRP $ readSideCriticalSection state in each. Programmers can write new relativistic programs in Haskell return $ pureComputation x writer ← forkRP $ do using the abstractions Monadic RP provides, which ensure that these writeRP $ writeSideCriticalSection state programs meet the safety guarantees in Section 3.3. x ← joinRP reader joinRP writer 3.1 Explicit Critical Sections Many existing RP implementations [4, 7, 27] track grace periods using explicitly marked read-side critical sections, and control writer The tests in Section 5 are implemented as variants on this concurrency using explicitly marked write-side critical sections. structure that spawn multiple reader threads. Monadic RP adopts this approach, and uses these demarcations to We have not yet attempted to determine whether the various solve three problems: monads introduced in this work satisfy the standard monad laws. We leave this as a topic for future work, observing that a formal proof 1. If read-side critical sections write to shared data (a programmer would likely require a semantics that can account for concurrency error), they cannot safely proceed concurrently with writers or and the inherent nondeterminism of relativistic reads. one-another. Rule 3a prohibits this. Also, we have not yet attempted to add support for a mechanism 2. If a reader waits for a grace period to complete inside a read-side relativistic threads could use to communicate with threads comput- critical section, it will deadlock. Rule 4 prohibits this. ing in IO, at least outside critical sections. This or some other limited IO support is necessary to realize many use cases well-suited to RP, 3. 
If writers update the same data concurrently, in many cases the such as implementing a DNS server [33]. data may end up in an inconsistent state. Monadic RP controls Appendix A.1 provides the RP source code in full. writer concurrency by preventing it entirely, which certainly satisfies Rule 3a, but alternative methods (such as STM) would 3.3 Rules enforced also satisfy this rule, and could increase writer concurrency. Monadic RP exports types and functions that provide the following 3.2 Expressing RP computations guarantees. Monadic RP represents read-side and write-side critical sections • The only way to create or access SRefs is through the RP with distinct monad types (RPR3 and RPW4 ), and represents shared ref- module’s interface. Also, SRefs cannot be accessed outside the erences with type SRef, which can be read only using readSRef and relativistic computation that creates them. This is necessary to writeSRef. It also provides the synchronizeRP function, which enforce Rule 1, since threads that aren’t part of a relativistic writers can use to wait for a grace period to complete. These types computation could corrupt shared data or observe it in an and functions are defined in the RP module. inconsistent state otherwise. 3 We color operations in the RPR monad blue. 5 We color operations in the RPE monad plum. 4 We color operations in the RPW monad red. 6 We color operations in the RP monad burnt orange. Portland State University Technical Report 15-01, March 2016 5 2016/3/28 • readSRef can be run only in RPR or RPW, and is the only way 3.4 Comparisons to other implementations to read from shared data. These properties enforce Rule 2. Monadic RP statically prevents rule violations that other implemen- • writeSRef can be run only in RPW, and is the only way to write tations do not. For instance, the urcu user-level RCU library in C to shared data. These properties enforce Rule 3. of Desnoyers et al. 
[4] allows any function to be called in any block • synchronizeRP can be run only in RPW. This enforces Rule 4. of code. We illustrate this fact in a urcu program. The urcu library includes a test suite with a test for the version of • Arbitrary effectful computations in IO cannot be run in any of urcu most similar to Monadic RP, called test_qsbr.c. This test the monads that the RP module exports. This is necessary to spawns a configurable number of readers and writers who share a enforce any of the rules, since a computation in IO can break single reference. Each writer repeatedly allocates a new block of them all. memory, stores a sentinel value in it, swaps the address of the new For instance, the snapshot function in the RPR monad imple- block with the shared reference, waits for a grace period to complete ments a read-side critical section that traverses a relativistic linked so no reader still holds a reference to the old block, and then stores a list, returning a snapshot of the current list state. We used this func- second sentinel value in the old block. Each reader polls the shared tion to implement the tests in Section 5. reference, failing if it ever observes a block with the second sentinel. The test runs for a configurable length of time. If any reader fails, data RPList s a = Nil it is evidence that the grace period mechanism or memory access | Cons a (SRef s (RPList s a)) primitives do not preserve causal ordering and consistent states. We snapshot :: RPList s a → RPR s [a] have modified this test program to show that urcu does not statically snapshot Nil = return [] enforce RP rules Monadic RP does: snapshot (Cons x rn) = do • The modified program reads from a shared reference outside l ← readSRef rn rest ← snapshot l a critical section using rcu_dereference(), the analogue of return (x : rest)
This violates Rule 2a, and Monadic RP prevents this error because readSRef can run only in RPWor RPR, both of The snapshot function type-checks, since it only calls effectful which represent critical sections. functions in RPR. • The modified program writes to a shared reference inside a If we add a writeSRef call to this read-side critical section, we cause a type error, since writeSRef is only in RPW, but snapshot is read-side critical section using rcu assign pointer(), the in RPR. analogue of writeSRef. This violates Rule 3a, and Monadic RP prevents this error because writeSRef cannot run in RPR. snapshot :: RPList s a → RPR s [a] snapshot Nil = return [] • The modified program waits for a grace period to complete snapshot (Cons x rn) = do inside a read-side critical section using synchronize rcu(), l ← readSRef rn the analogue of synchronizeRP. This violates Rule 4, and writeSRef rn (Cons undefined rn) Monadic RP prevents this error because synchronizeRP cannot rest ← snapshot l run in RPR. return (x : rest) This modified program compiles without warnings or errors, and Error: fails at runtime. C does not include language-level mechanisms to Couldn’t match expected type RPR s a0 restrict the statements that appear in blocks of code beyond ensuring with actual type RPW s () that they are syntactically well-formed and checking that parameter and return types match argument and result types, respectively, so The same is true for synchronizeRP, and all other effectful this result is unsurprising, and does not represent an oversight on functions not in RPR. the part of Desnoyers et al.. Appendix A.3 contains the text of this The prependA function inserts an ‘A’ at the beginning of a modified version of test qsbr.c. relativistic string. Howard and Walpole [10] used STM to manage concurrent prependA :: SRef (RPList Char) → RPW () relativistic writers, so we are investigating using GHC STM [8] prependA head = do head’ ← copySRef head for this purpose. 
Although GHC STM is strongly atomic, and so in writeSRef head (Cons ’A’ head’) normal use prevents concurrent reads and writes to the same data, GHC provides the readTVarIO# primitive, which allows threads to Because prependA has a return type in the RPW monad, it can read transactional variables outside of transactions. only be run inside a write-side critical section. As Figure 12 shows, the operations available in monads besides 3.5 Tracking traversal and update direction RPW are severely restricted. The reader (a person, not a thread) may realize that the abstractions Monadic RP provides for reading and updating shared data do Figure 12. A summary of the operations allowed in each monad. not capture any information about traversal or update direction. Because of this, there is no way to track whether synchronizeRP or RP RPE RPR RPW updates against traversal order are correctly used to guarantee causal newSRef • • ordering, as illustrated in Section 2.1, and enforce Rule 1. In ongoing readSRef • • work to address this shortcoming, we are developing a cursor-based writeSRef • interface for critical sections, where the cursor starts at a shared copySRef • reference at the beginning of each critical section, and threads must synchronizeRP • explicitly move the cursor to a node to access it. Inside write-side forkRP • critical sections, this interface forces the programmer to explicitly joinRP • mark causally ordered sequences of writes. This interface, combined readRP • with the type-level representation of traversal and update direction writeRP • described at the end of Section 2.1.2, will statically enforce more of Portland State University Technical Report 15-01, March 2016 6 2016/3/28 the rules from the relativistic programming discipline on programs critical section. Additionally, a thread can opt out of participation in built using Monadic RP. 
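As a rough illustration of the enforcement mechanism described above, the following toy sketch (our own simplification, not the RP module's actual definitions, which also carry a phantom state parameter s, grace-period machinery, and memory barriers) shows how newtypes over IO can make writeSRef a type error inside a read-side critical section:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- Toy sketch of the read-side/write-side separation; hypothetical
-- simplified definitions, NOT the RP module's real ones.
import Data.IORef

newtype SRef a = SRef (IORef a)

-- A read-side critical section: only reads are available.
newtype RPR a = RPR (IO a) deriving (Functor, Applicative, Monad)

-- A write-side critical section: reads and writes are available.
newtype RPW a = RPW (IO a) deriving (Functor, Applicative, Monad)

class Monad m => Reading m where
  readSRef :: SRef a -> m a

instance Reading RPR where
  readSRef (SRef r) = RPR (readIORef r)

instance Reading RPW where
  readSRef (SRef r) = RPW (readIORef r)

-- writeSRef exists only in RPW, so using it inside an RPR
-- computation fails to type-check, as in the snapshot example.
writeSRef :: SRef a -> a -> RPW ()
writeSRef (SRef r) x = RPW (writeIORef r x)

-- Runners; the real library brackets these with counter updates
-- and barriers.
runRPW :: RPW a -> IO a
runRPW (RPW io) = io

runRPR :: RPR a -> IO a
runRPR (RPR io) = io

main :: IO ()
main = do
  r <- SRef <$> newIORef "BCDE"
  runRPW (writeSRef r "ABCDE")  -- a write-side critical section
  s <- runRPR (readSRef r)      -- a read-side critical section
  putStrLn s
  -- runRPR (writeSRef r "X")   -- rejected: RPW () is not RPR ()
```

Uncommenting the last line reproduces the kind of type error shown for snapshot, without any runtime checking.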
3.5 Tracking traversal and update direction

The reader (a person, not a thread) may realize that the abstractions Monadic RP provides for reading and updating shared data do not capture any information about traversal or update direction. Because of this, there is no way to track whether synchronizeRP or updates against traversal order are correctly used to guarantee causal ordering, as illustrated in Section 2.1, and enforce Rule 1. In ongoing work to address this shortcoming, we are developing a cursor-based interface for critical sections, where the cursor starts at a shared reference at the beginning of each critical section, and threads must explicitly move the cursor to a node to access it. Inside write-side critical sections, this interface forces the programmer to explicitly mark causally ordered sequences of writes. This interface, combined with the type-level representation of traversal and update direction described at the end of Section 2.1.2, will statically enforce more of the rules from the relativistic programming discipline on programs built using Monadic RP. Specifically, it:

• rules out causally ordered updates in traversal order that aren't separated by synchronizeRP,

• allows causally ordered updates against traversal order that aren't separated by synchronizeRP,

• prevents synchronizeRP from being invoked outside sequences of causally related writes, and

• rules out other programmer errors, such as invoking synchronizeRP twice with no intervening writes or invoking synchronizeRP before making the first in a sequence of causally ordered writes.

We believe that this approach is applicable to relativistic trees because we can generalize the notion of traversal direction to distinguish between expanding the frontier of explored nodes in keeping with a partial order where parents precede children, and backtracking to nodes already within this frontier. This distinction fits naturally with the zipper abstraction.

4. Implementation

The RP module exports functions built on top of IO, but does not allow arbitrary IO computations inside relativistic computations. The types we define in the RP module, and the choice of which names to export from it, enforce these controlled abstractions. Additionally, the primitives we use to implement Monadic RP meet the memory ordering requirements of Rules 2b and 3b.

4.1 Safe effectful abstractions in Haskell

The IO monad provides all general effectful operations, including file I/O, direct manipulation of memory, and thread management. Because relativistic programs directly mutate shared memory and spawn threads, Monadic RP must be implemented using functions in IO. However, if all computations in IO were allowed inside RPR or RPW, programmers could violate the RP discipline. For instance, readers could write to arbitrary memory and wait for grace periods to complete.

Terei et al. [28] introduce Safe Haskell, and demonstrate a flexible technique for defining a restricted abstraction RIO on top of IO where clients of RIO can access IO operations only through a fixed set of predefined functions exported by RIO. The RP module is implemented using this abstraction-building technique.

4.2 Algorithm

Monadic RP implements grace periods using quiescent states [23], in an approach modeled after the Quiescent-State-Based Reclamation (QSBR) RCU implementation of Desnoyers et al. [4]. Their work outlines the implementation in detail, so we present a summary here.

In this formulation of Quiescent-State-Based RP, grace periods are announced and completed using a set of counters. There is a global grace period counter, and every thread has its own local counter. We can think of the local counters as a variant of a logical clock [16] where there are only global events, represented in shared memory. The only event that increments any counter is when the thread currently inside a write-side critical section (the writer) increments the global counter before waiting for a grace period to complete. When the writer increments the global counter, it sends a message to readers that a grace period has begun. In response, each reader sends a message to the writer that it has finished a critical section (entering a quiescent state) by updating its local counter to match the global counter. Once the writer has received this message from each active reader by blocking on the reader's local counter until the reader updates it, the writer moves on with its write-side critical section. Additionally, a thread can opt out of participation in the clock by setting its local counter to zero, allowing grace periods to complete without any message from the thread. Desnoyers et al. [4] say a thread which has opted out is in an extended quiescent state, or offline. A thread always opts out at the beginning of a write-side critical section. This is because it is now the writer, and would not be able to make progress while deadlocked waiting for itself to complete a read-side critical section. At the end of a write-side critical section, the thread (having laid down its writer's mantle) returns its local counter to its previous state.

4.3 Primitives

The RP module depends on two memory barrier primitives built into the GHC runtime and exposed to Haskell programs by the atomic-primops package. Both act as full compiler-level memory barriers, preventing both GHC and the C compiler that generates the GHC runtime code from reordering reads or writes across either barrier, but the two barriers have different semantics at the hardware level.

One barrier primitive is called writeBarrier, and prevents hardware reordering of writes. On x86, AMD64, ARM (pre v7), and SPARC, writes from a particular core are not reordered, so GHC [17] correctly implements writeBarrier using no instruction at all on these architectures. On PowerPC, and ARMv7 and later, GHC implements writeBarrier using the appropriate instructions along with a C compiler reordering barrier. The writeSRef function calls writeBarrier before it writes to memory. This guarantees that writes are committed in program order, which in turn means that writes against traversal order guarantee causal ordering, as illustrated in Section 2.1.2. The synchronizeRP function uses writeBarrier to ensure that a writer commits the write that increments the global counter before the writer begins waiting for readers to observe this write.

The other barrier primitive is called storeLoadBarrier, and prevents hardware reordering of reads and writes. GHC implements it using appropriate instructions on all supported architectures. On x86 and AMD64, GHC uses a lock-prefixed add instruction that adds zero to the word at the top of the thread's stack (a no-op). On Intel architectures, the lock prefix prevents hardware reordering of reads and writes across the instruction it modifies [12]. Although legacy Intel architectures lock the entire bus when executing any lock-prefixed instruction, modern Intel architectures do not when the instruction accesses a memory location cached in exclusive mode on the core executing the instruction (such as the top of the thread's stack), so on modern Intel architectures this primitive does not incur the cost of bus locking.

The synchronizeRP function uses storeLoadBarrier to insulate the code that waits for a grace period to complete from write-side critical sections. The readRP function (which encapsulates a read-side critical section) uses storeLoadBarrier to insulate a reader thread's updates to its local counter from the memory accesses inside the current or subsequent critical sections.

4.4 Memory Reclamation

Kung and Lehman [15] demonstrate that RP implementations can rely on a general-purpose garbage collector to guarantee that node unlinking and node memory reclamation are causally ordered. GHC has a tracing garbage collector [18] that will not reclaim memory some thread could still reach, so Monadic RP does not need to use any additional causal ordering mechanism to prevent premature reclamation.
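The counter bookkeeping described in Section 4.2 can be sketched in miniature as follows. This is a deterministic, single-threaded toy of our own, not the library's implementation: the real writer blocks on each reader's counter from a separate thread and issues the barriers described in Section 4.3.

```haskell
-- Toy model of QSBR grace-period counters: one global counter,
-- one local counter per reader; a local counter of 0 means the
-- thread is offline (in an extended quiescent state).
import Data.IORef
import Control.Monad (forM_)

-- Has every online reader caught up with grace period g?
graceComplete :: Int -> [Int] -> Bool
graceComplete g = all (\c -> c == 0 || c >= g)

main :: IO ()
main = do
  globalC <- newIORef (1 :: Int)
  readers <- mapM newIORef [1, 1, 0]  -- two online readers, one offline
  -- writer: begin a grace period by incrementing the global counter
  modifyIORef globalC (+ 1)
  g  <- readIORef globalC
  c0 <- mapM readIORef readers
  print (graceComplete g c0)          -- online readers haven't responded yet
  -- each online reader reaches a quiescent state:
  -- it copies the global counter into its local counter
  forM_ (take 2 readers) (\r -> writeIORef r g)
  c1 <- mapM readIORef readers
  print (graceComplete g c1)          -- now the grace period is complete
```

Note that the offline reader (local counter 0) never has to respond, which is exactly why a writer sets its own counter to zero before waiting: otherwise it would deadlock waiting for itself.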
5. Results

We have implemented the examples from Section 2.1 using Monadic RP, and used these implementations to test our causal ordering mechanisms. We tested three hypotheses:

1. our implementation of synchronizeRP ensures causal ordering between writes in traversal order,

2. our implementations of writeSRef and readSRef together ensure causal ordering when writes are against traversal order,

3. in a program that does not use either causal ordering mechanism, readers may observe shared data in an inconsistent state.

As in the examples, all tests start with the shared linked list [A,B,C,D,E]. To test hypothesis 1, we implemented Example 1 from Section 2.1.1. To test hypothesis 2, we implemented Example 2 from Section 2.1.2. To test hypothesis 3, we implemented a variant of Example 1 that does not invoke synchronizeRP between W1 and W2.

Each test spawns 29 readers and a writer. Each reader traverses the list 400,000 times in a tight loop, then returns aggregate counts of the consistent and inconsistent states it observed during these traversals. The writer simply moves a list element and returns. We ran these tests on a 30-way QEMU guest on top of a 40-way host with Intel Xeon E5-2670 v2 @ 2.50GHz CPUs. We ran each test 10,000 times.

5.1 Example 1

The first test, moveDback, implements Example 1 from Section 2.1.1, moving D from a later position to an earlier position. It invokes synchronizeRP in between the writeSRef call that links the new copy in at the earlier position and the writeSRef call that unlinks the old copy at the later position, since these writes are in reader traversal order.

    moveDback :: SRef s (RPList s a) → RPW s ()
    moveDback head = do
      (Cons a ra) ← readSRef head   -- [A,B,C,D,E]
      -- duplicate reference to B
      ra' ← copySRef ra
      (Cons b rb) ← readSRef ra
      (Cons c rc) ← readSRef rb
      (Cons d rd) ← readSRef rc
      ne ← readSRef rd
      -- link in a new D after A
      writeSRef ra (Cons d ra')     -- [A,D,B,C,D,E]
      -- wait for readers
      synchronizeRP
      -- unlink the old D
      writeSRef rc ne               -- [A,D,B,C,E]

5.2 Example 2

The second test, moveBforward, implements Example 2 in Section 2.1.2. It moves B from an earlier position to a later position by inserting a new copy at the later position, then unlinking the old copy at the original position. Because these writes are against traversal order, the writer does not need to invoke synchronizeRP to enforce causal ordering.

    moveBforward :: SRef s (RPList s a) → RPW s ()
    moveBforward head = do
      (Cons a ra) ← readSRef head   -- [A,B,C,D,E]
      (Cons b rb) ← readSRef ra
      -- keep a reference to C, the new successor of A
      cn@(Cons c rc) ← readSRef rb
      (Cons d rd) ← readSRef rc
      -- duplicate the reference to E
      rd' ← copySRef rd
      -- link in a new B after D
      writeSRef rd $ Cons b rd'     -- [A,B,C,D,B,E]
      -- don't need to synchronizeRP,
      -- write order against traversal order
      -- unlink the old B
      writeSRef ra cn               -- [A,C,D,B,E]

5.3 Example 1, without synchronizeRP

The third test, moveDbackNoSync, is the same as the first, except that it omits synchronizeRP.

5.4 Test Results

Figure 13. Aggregate results from all test runs, including counts of inconsistent states readers observed.

                    Example 1 with     Example 1 w/o      Example 2
                    synchronizeRP      synchronizeRP
    time (hours)    13.93              13.97              13.95
    snapshots       116,000,000,000    116,000,000,000    116,000,000,000
    consistent      116,000,000,000    115,999,999,547    116,000,000,000
    inconsistent    0                  453                0

Figure 13 shows that readers observed no inconsistent states during test runs for Example 1 with synchronizeRP or Example 2, giving us confidence that both synchronizeRP and writes against traversal order ensure causal ordering in this implementation. When we omitted synchronizeRP from Example 1, readers occasionally observed inconsistent states, confirming that we need synchronizeRP to guarantee causal ordering in Example 1. To create a meaningful corresponding test for Example 2 without causal ordering, we would need to omit the write barrier in writeSRef and run tests on a highly concurrent architecture that reorders stores from a given core.

The first and second tests demonstrate that in Monadic RP, synchronizeRP and writes against traversal order provide the causal ordering guarantees we describe in Section 2.1. That is, they are sufficient to enforce Rule 1 if used correctly. The third test demonstrates that without using one of these mechanisms, we cannot guarantee causal ordering in the presence of concurrent reads and writes. That is, they are necessary to enforce Rule 1.

6. Related work

6.1 Verified implementations

Hart et al. [9] and Desnoyers et al. [4] have written in C and benchmarked RP implementations with a variety of grace-period-based causal ordering mechanisms. Two of these implementations and example algorithms were modeled on a virtual architecture with a weakly-ordered memory model in Promela. Assertions about the model, expressed in Linear Temporal Logic, were verified using the Spin model checker [5]. The Promela model was constructed manually and the verification took hours to run; in exchange, Desnoyers et al. were able to detect errors in the implementation at the level of missing memory barriers and algorithm-level race conditions. This verification gives users high confidence that these implementations are correct, but says nothing about new programs built on top of them.

Gotsman et al. [7] implemented RP in a C-like formal language and verified this implementation in a sequentially consistent memory model using an extension of the logic RGSep [32]. This implementation is not intended to be used to build new general-purpose programs.

Tassarotti et al. [27] implemented RP in Coq, and verified this implementation in a release-acquire memory model [13] using the logic GPS (a logic with ghost state, separation, and per-location protocols introduced by Turon et al. [31]). In this case, the implementation and the proof are in one system, but each function is an annotated Hoare triple with GPS predicates and permissions. A programmer can build new general-purpose programs on top of this implementation, but she would need skill in GPS and interactive theorem proving to prove that these programs are correct as Tassarotti et al. have for the underlying implementation.
6.2 Lockdep

McKenney [20] describes Lockdep-RCU, an extension of the Linux kernel lock validator [3] that does flexible runtime assertion checking in and around Linux RCU API calls. For instance, it is configured to emit a warning when rcu_dereference(), the analogue of readSRef, is used outside a critical section. A Linux developer can build the kernel with Lockdep enabled to turn on this runtime assertion checking, and then run tests to exercise the code paths with assertions. Lockdep does not provide any guarantee that assertions will be checked unless they are encountered at runtime, so it gives less assurance than a static analysis.

6.3 Proposed Coccinelle tests

McKenney [21] proposes using Coccinelle scripts to automatically detect a series of "RCU abuses" in Linux, which include:

• waiting for readers in a read-side critical section (a violation of Rule 4);

• accessing an RCU-protected shared data structure through a private reference obtained during a read-side critical section, rather than a shared reference (a violation of Rule 5).

7. Conclusion

Monadic RP is a new GHC Haskell library for relativistic programming that uses types and abstractions to guarantee basic safety properties. We have used it to implement working relativistic programs that demonstrate that it enforces causal ordering in two ways: using grace periods, and using writes against reader traversal order. Existing work on RP focuses on developing performant implementations that do not model or enforce safety properties, on runtime checks, or on formal models that verify the correctness of the implementation rather than the correctness of programs that use it. Monadic RP stakes out a novel middle ground: it provides an implementation suitable for general-purpose RP that statically enforces rules from the RP programming discipline. We accomplish this by exporting types and functions from the RP module that statically enforce a separation between read-side and write-side critical sections and restrict the operations available in each. Minimizing the set of operations available in each kind of critical section keeps writes and other effects out of read-side critical sections, and keeps effects besides reads, writes, and waiting for grace periods out of relativistic programs, making them safer. In ongoing work, we are extending Monadic RP to expose a cursor-based abstraction for critical sections that uses parameterised monads to track reader traversal direction and writer causal update direction, and statically reject relativistic programs that fail to follow rules that guarantee causal ordering.

Acknowledgments

We would like to thank Dr. Mark Jones, Dr. Andrew Black, Dr. David Maier, Dr. James Hook, Larry Diehl, and Kendall Stewart of Portland State University, for teaching the classes and engaging in the long series of conversations that led to this work. This work is funded by NSF Award No. CNS-1422979.

References

[1] R. Atkey. Parameterised notions of computation. Journal of Functional Programming, 19(3-4):335–376, 2009.
[2] E. Brady and K. Hammond. Correct-by-construction concurrency: Using dependent types to verify implementations of effectful resource usage protocols. Fundamenta Informaticae, 102(2):145–176, 2010.
[3] J. Corbet. The kernel lock validator. Available: http://lwn.net/Articles/185666/ [Viewed May 20, 2015], May 2006.
[4] M. Desnoyers, P. McKenney, A. Stern, M. Dagenais, and J. Walpole. User-level implementations of read-copy update. IEEE Transactions on Parallel and Distributed Systems, 23(2):375–382, 2012.
[5] M. Desnoyers, P. E. McKenney, and M. R. Dagenais. Multi-core systems modeling for formal verification of parallel algorithms. ACM SIGOPS Operating Systems Review, 47(2):51–65, 2013. URL http://dl.acm.org/citation.cfm?id=2506174.
[6] A. Einstein. Relativity: The special and general theory. Penguin, 1920.
[7] A. Gotsman, N. Rinetzky, and H. Yang. Verifying concurrent memory reclamation algorithms with grace. In Proceedings of the 22nd European Conference on Programming Languages and Systems, pages 249–269. Springer, 2013.
[8] T. Harris, S. Marlow, S. Peyton Jones, and M. Herlihy. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 48–60. ACM, 2005.
[9] T. E. Hart, P. E. McKenney, and A. D. Brown. Making lockless synchronization fast: Performance implications of memory reclamation. In Parallel and Distributed Processing Symposium, 2006 (IPDPS 2006), 10 pp. IEEE, 2006.
[10] P. W. Howard and J. Walpole. A relativistic enhancement to software transactional memory. In Proceedings of the Third USENIX Conference on Hot Topics in Parallelism (HotPar '11), pages 1–6, 2011.
[11] P. W. Howard and J. Walpole. Relativistic red-black trees. Concurrency and Computation: Practice and Experience, 26(16):2684–2712, 2014.
[12] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual. January 2015. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf.
[13] ISO JTC1/SC22/WG14. ISO/IEC 9899:2011, Information technology – Programming languages – C, 2011. URL http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm.
[14] O. Kiselyov, S. P. Jones, and C.-c. Shan. Fun with type functions. In Reflections on the Work of C.A.R. Hoare, pages 301–331. Springer, 2010.
[15] H. T. Kung and P. L. Lehman. Concurrent manipulation of binary search trees. ACM Transactions on Database Systems, 5(3):354–382, Sept. 1980.
[16] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.
[17] S. Marlow. SMP.h (GHC source), December 2014. URL https://github.com/ghc/ghc/blob/master/includes/stg/SMP.h.
[18] S. Marlow, T. Harris, R. P. James, and S. Peyton Jones. Parallel generational-copying garbage collection with a block-structured heap. In Proceedings of the 7th International Symposium on Memory Management, pages 11–20. ACM, 2008.
[19] P. McKenney. Read-copy update (RCU) validation and verification for Linux (slides), November 2014. URL http://www.rdrop.com/~paulmck/RCU/RCUVal.2014.11.11a.Galois.pdf.
[20] P. E. McKenney. Lockdep-RCU. Available: https://lwn.net/Articles/371986/ [Viewed May 20, 2015], February 2010.
[21] P. E. McKenney. OPWintro-RCU. Available: http://kernelnewbies.org/OPWIntro-RCU [Viewed May 20, 2015], October 2014.
[22] P. E. McKenney and J. D. Slingwine. Read-copy update: Using execution history to solve concurrency problems. In Parallel and Distributed Computing and Systems, pages 509–518, Las Vegas, NV, October 1998.
[23] P. E. McKenney, J. Appavoo, A. Kleen, O. Krieger, R. Russell, D. Sarma, and M. Soni. Read-copy update. In Ottawa Linux Symposium, July 2001. URL http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf.
[24] P. E. McKenney, S. Boyd-Wickizer, and J. Walpole. RCU usage in the Linux kernel: One decade later. Technical report paulmck.2013.02.24, September 2013. URL http://www2.rdrop.com/~paulmck/techreports/RCUUsage.2013.02.24a.pdf.
[25] J. Morris. Recent developments in SELinux kernel performance. Available: http://www.livejournal.com/users/james_morris/2153.html [Viewed May 20, 2015], December 2004.
[26] T. Schrijvers, S. Peyton Jones, M. Chakravarty, and M. Sulzmann. Type checking with open type functions. ACM SIGPLAN Notices, 43(9):51–62, 2008.
[27] J. Tassarotti, D. Dreyer, and V. Vafeiadis. Verifying read-copy-update in a logic for weak memory. In ACM Conference on Programming Language Design and Implementation (PLDI '15). ACM, 2015.
[28] D. Terei, S. Marlow, S. Peyton Jones, and D. Mazières. Safe Haskell. In Proceedings of the 2012 Haskell Symposium, pages 137–148. ACM, 2012.
[29] J. Triplett, P. W. Howard, P. E. McKenney, and J. Walpole. Scalable correct memory ordering via relativistic programming. 2011. URL http://pdxscholar.library.pdx.edu/compsci_fac/12/.
[30] J. Triplett, P. E. McKenney, and J. Walpole. Resizable, scalable, concurrent hash tables via relativistic programming. In USENIX Annual Technical Conference, page 11, 2011.
[31] A. Turon, V. Vafeiadis, and D. Dreyer. GPS: Navigating weak memory with ghosts, protocols, and separation. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA), pages 691–707. ACM, 2014.
[32] V. Vafeiadis and M. Parkinson. A marriage of rely/guarantee and separation logic. In CONCUR 2007 – Concurrency Theory, pages 256–271. Springer, 2007.
[33] Y. A. Wagh, P. A. Dhangar, T. C. Powar, A. Magar, and P. Mali. Lightweight DNS for multipurpose and multifunctional devices. IJCSNS, 13(12):71, 2013.

A. Appendix

A.1 RP module

Available at https://github.com/anthezium/MonadicRP/blob/master/src/RP.hs.

A.2 RPManyListMoveTest

Available at https://github.com/anthezium/MonadicRP/blob/master/src/RPManyListMoveTest.hs.

A.3 test_qsbr.c

Available at https://github.com/anthezium/MonadicRP/blob/master/comparison/test_qsbr.c.
Authors: Ted Cooper, Jonathan Walpole
License CC-BY-4.0