DOKK Library

Relativistic Programming in Haskell

Authors Jonathan Walpole Ted Cooper

License CC-BY-4.0

                                       Relativistic Programming in Haskell
                                     Using Types to Enforce a Critical Section Discipline

                                     Ted Cooper                                                                    Jonathan Walpole
                               Portland State University                                                         Portland State University

Abstract                                                                                     absolute, but instead depends on the rate, path, and direction of
Relativistic programming (RP) is a concurrent programming model                              each reader’s traversal through the data structure relative to the
in which readers can safely access shared data concurrently with                             writer’s updates1 . In general, each of these traversals is a read-side
writers that are modifying it. RP has been used extensively in                               critical section, a sequence of related reads from shared data with
operating system kernels, most notably in Linux, and is widely                               an explicitly defined beginning and end. At the beginning of each
credited with improving the performance and scalability of Linux                             read-side critical section, readers can access a shared data structure
on highly concurrent architectures. However, using RP primitives                             only through some reference all threads share, such as the head of a
safely is tricky, and existing implementations of RP do not identify                         list or the root of a tree.
most unsafe uses statically. This work introduces Monadic RP, a                                  The core of a relativistic programming implementation is a set
GHC Haskell library for RP, and presents an example of its use.                              of ordering mechanisms, that is, ways for writers to guarantee that
To our knowledge, Monadic RP is the first RP implementation in                               readers agree on the order of writes. One such primitive, unique to
Haskell. It provides a novel mechanism to statically rule out a subset                       RP [22], we will call synchronizeRP2 . A writer can guarantee that
of unsafe relativistic programs by using types to separate read-                             the states two writes create are observed by all readers in order by
side and write-side critical sections, and to restrict the operations                        invoking synchronizeRP between them. This primitive introduces
available in each. This work describes the current status of our                             a delay that will not terminate until all readers that are active at
implementation, presents a brief experimental evaluation of it, and                          the start of the delay have finished their current read-side critical
discusses directions for ongoing and future work.                                            sections. The RP literature calls this delay a grace period. Note that
                                                                                             synchronizeRP does not prevent readers from starting new queries
Categories and Subject Descriptors D.1.3 [Concurrent Program-                                or traversals during the delay, or wait for these readers to complete
ming]                                                                                        these new critical sections. Each of these readers began after the first
                                                                                             write was committed, and so cannot observe the value it replaced.
Keywords relativistic programming, read-copy update, monads,                                 Now the writer can issue the second write, certain that any reader
type safety                                                                                  who observes it had an opportunity to see the first. Another way of
                                                                                             saying this is that, by waiting for a grace period to complete, the
1.    Introduction                                                                           writer obtains a guarantee that the first write happens before [16] the
                                                                                             second. If readers and writers follow a discipline of rules including
Relativistic Programming (RP) [29] is a concurrent programming
                                                                                             those we introduce in Section 2.2, then readers can safely traverse
model where a writer thread can modify a linked shared data
                                                                                             or query a data structure updated by writers at any time, so they
structure (such as a linked list or a tree) safely while other threads
                                                                                             avoid the overhead of synchronizing (for instance, acquiring a lock)
(readers) concurrently traverse or query it. Unlike conventional
                                                                                             before reading shared data.
lock-based models, in which the writer acquires a lock to exclude
                                                                                                 Researchers have developed relativistic implementations of com-
concurrent readers (and vice versa), or transactional memory models,
                                                                                             mon data structures — including linked lists [4], hash tables [30],
where concurrent reads and writes to the same memory location
                                                                                             and red-black trees [11] — bringing concurrency between readers
conflict and cause aborts and retries [8], RP allows both the writer
                                                                                             and writers, and extremely low-cost read-side primitives, to pro-
and concurrent readers who access the same data to proceed and
                                                                                             grams that use these data structures. RP is used widely in the Linux
complete successfully. So RP offers greater concurrency between
                                                                                             kernel, and was introduced in the form of the Linux Read-Copy
writers and readers, but because of this concurrency, readers may
                                                                                             Update (RCU) API [23], whose read-side critical sections execute
not observe concurrent writes in the same order writers issue them.
                                                                                             deterministically (without blocking or retrying) and run almost as
Also, readers may not agree on the order of concurrent writes. That
                                                                                             fast as unsynchronized sequential code. For highly concurrent, read-
is, when there is concurrent reading, the order of writes is not
                                                                                             mostly workloads that allow relaxed data structure semantics, such
                                                                                             as software routing tables [23] and SELinux policy rules [25], these
                                                                                             read-side characteristics improve performance, reliability, and scala-
                                                                                             bility. For these reasons, RCU is now used widely in all subsystems
                                                                                             of the Linux kernel [24], often to replace reader-writer locking. De-

                                                                                             1 Triplettet al. [29] chose the name “relativistic programming” to emphasize
                                                                                             this partial ordering of writes, which is akin to relativity of simultaneity in
Copyright c 2015 Ted Cooper and Jonathan Walpole. This work is licensed under the            physics, described by Einstein [6].
Creative Commons Attribution 4.0 International license. To view a copy of the license,       2 We color write operations red, and will introduce additional colors for other
visit                                  kinds of operations.

Portland State University Technical Report 15-01, March 2016                             1                                                                        2016/3/28
spite RP’s benefits and deployment footprint however, there is little                 presents implementations of the write-side critical sections
existing language- or tool-based support for it. Instead, program-                    for the examples from Section 2.1, and
mers are responsible for knowing and manually following the RP                        presents a brief experimental evaluation based on these
discipline. The work presented here takes some first steps towards                    examples that tests whether the causal ordering mechanisms
addressing this gap.                                                                  Monadic RP provides are necessary and sufficient.
    Specifically, we introduce Monadic RP, which, to our knowledge,
is the first implementation of RP in Haskell. This work is inspired           • Section 7 summarizes this work and directions for future work.
by the GHC Software Transactional Memory (STM) implementation
of Harris et al. [8], and by the user-level RCU C library (urcu) of             This paper describes future work inline, rather than in a separate
Desnoyers et al. [4]. Like GHC STM, Monadic RP uses features                 section, in an effort to show directly how it connects to the work we
unique to the Glasgow Haskell Compiler (GHC) and uses a monadic              present.
abstraction to constrain shared data and the concurrent computations
that operate on it. Unlike GHC STM, Monadic RP does not enforce              2.      Background
an atomicity property among sequences of reads and writes, nor does
it enforce a total ordering of writes to the same data. Monadic RP           2.1     Causal Ordering
allows concurrent reading and writing of the same data (something            2.1.1    Grace periods enforce causal ordering
that is not possible if strongly atomic transactions are used for all
reads and writes), and it provides mechanisms that writers can use           Most implementations of RP are in systems that rely on the program-
as needed to guarantee that other threads see concurrent writes in           mer to include code that explicitly allocates and reclaims memory,
order. Hence, Monadic RP presents a programming model based on               so the programmer must write that code to use grace periods to
causal ordering. There is good reason (and evidence) to believe that         guarantee that writers do not reclaim memory that readers may yet
these two programming models are complementary. For example,                 visit. In this case, a writer begins a grace period after replacing one
Howard and Walpole [10] modified an STM implementation to                    or more nodes in a data structure accessible through a shared refer-
support relativistic reading while retaining transactional concurrency       ence by updating the references in existing nodes to point to new
control among writers.                                                       copies. At this point, one or more readers may have already loaded
    The main advantage Haskell brings to RP is that its type system          the addresses of old copies, whose fields must remain available for
captures the structure of imperative computations in enough detail           the readers’ inspection. Once the grace period ends, each reader
to enforce on programmers a unique discipline tailored to each type          has finished its current read-side critical section, and old copies of
of computation. Such a discipline can, for example, reject programs          updated nodes and unlinked nodes are no longer reachable from
that call functions inside computations where they don’t belong.             the shared reference, so they can be safely deallocated. That is, the
Monadic RP uses types to mark explicitly and separate read-side              writer initiates a grace period and waits until the readers announce
and write-side critical sections, and to guarantee that readers do           they have each finished a read-side critical section, and so cannot
not block or write to shared memory. In ongoing work, we are                 access unlinked nodes. Through this channel of communication, the
extending Monadic RP to statically check reader traversal order              writer learns from the readers when it is safe to reclaim memory.
and writer update order to guarantee that readers observe causally                This memory reclamation guarantee is an instance of a more
related updates in order.                                                    general property: Unlinking a node and reclaiming its memory are
    This work presents a new implementation of RP, situates it in the        causally ordered operations. That is, reclaiming is a consequence
context of existing implementations, and explains and demonstrates           of unlinking, so when a writer unlinks the node, readers should
some of its properties:                                                      only have an opportunity to observe the data structure in one of
                                                                             three ordered consistent states: without the node unlinked, with the
 • Section 2.1 explains the causal ordering guarantees RP relies             node unlinked but not yet reclaimed, and finally without the node
    on using two examples, and outlines ongoing work to statically           entirely. If a reader can observe a state with the node reclaimed but
    check that relativistic programs obey a discipline that provides         not unlinked, then causal ordering is violated. To prevent readers
    these guarantees. In turn, Section 5 demonstrates that when these        from observing this inconsistent state, a writer can wait for a grace
    examples are constructed using Monadic RP, they execute cor-             period to complete after unlinking the node (and before reclaiming
    rectly because the implementation provides the causal ordering           it). When a writer waits for a grace period to complete, it guarantees
    guarantees they need.                                                    that readers will not observe any states introduced by writes after
                                                                             the grace period before they observe any states introduced by writes
 • Section 2.2 elaborates rules from the discipline for correct RP.          before the grace period. That is, the pre-grace-period states happen
 • Section 3:                                                                before [16] the post-grace-period states.
                                                                                  Unlinking a node and then reclaiming its memory is a common
        situates Monadic RP in the context of existing work on RP            example of a sequence of causally related writes, but it is by no
        correctness (detailed in Section 6),                                 means the only example. Relativistic red-black trees [11] and hash
        motivates and describes the abstractions Monadic RP pro-             tables [30] both rely on grace periods for causal ordering beyond
        vides and how they enforce rules from Section 2.2,                   memory reclamation. One simple example of a sequence of causally
        presents an implementation of a read-side critical section for       ordered updates is moving a node to a different position in a singly
        the examples from Section 2.1 that uses these abstractions,          linked list by inserting a copy at the new position, then unlinking
                                                                             the node at its old position. If we choose to causally relate these
        describes a case where another RP implementation allows              operations, we buy a guarantee that readers will see the list in one of
        violations of rules Monadic RP enforces, and                         three ordered consistent states: with the node at its original position,
        outlines ongoing work on Monadic RP to enforce more rules.           with the new copy at the new position and the original node at the
                                                                             old position, or with the new copy at the new postion and the original
 • Section 4 describes the implementation of Monadic RP, filling
                                                                             node unlinked. By preventing readers from observing the list with
    in details about the mechanisms that enforce rules from Section          no new copy at the new position but the original node unlinked, we
    2.2.                                                                     gain the guarantee that no reader will observe the node temporarily
 • Section 5:                                                                disappearing during a move operation. In exchange, we lose the

Portland State University Technical Report 15-01, March 2016             2                                                                  2016/3/28
guarantee that no reader will observe two copies of the same node                                or have already passed the node we changed, and will see State α1
in a traversal, because we allow a state where the node is visible in                            ([A,B,C,D,E]).
both its new and old positions.
                                                                                                                           Figure 4. State α2
Example 1 Consider a singly linked list [A,B,C,D,E] that reader
threads are constantly traversing only from left to right. The readers
hold a shared reference to the first node (head), and must start each                                                D
traversal there. We (the writer) can use a grace period to enforce                                           W1
causal ordering when we move node D from a later position (after
C) to an earlier position (after A). Once the move is complete, the                               head        A             B            C             D              E
readers will observe the modified list [A,D,B,C,E].
    Figure 1 shows the partial orders on list states readers can observe
during this move operation. If we do nothing to enforce ordering, the                                Now we’ve arrived at an impasse. We need to remove the
left column applies. If we enforce causal ordering, one of the orders                            old copy of D by pointing the reference in C to E (to obtain
in the middle column applies (in this example, we choose the order                               State α3), but a concurrent reader may have passed A before we
in the solid box instead of the order in the dashed box, to guarantee                            pointed its reference to the new copy of D, and then immediately
that readers don’t miss D during the move). If we enforce atomic                                 been preempted, performed a long-running computation, or been
ordering, for instance using STM, the order on the right applies.                                otherwised delayed in flight. If we update the reference in C to point
                                                                                                 to E before the leisurely reader visits C, that reader will also miss the
                                                                                                 old copy of D, and see the list in an inconsistent state ([A,B,C,E]).
Figure 1. Free through atomic orderings for Example 1 – arrows                                   But we need to unlink the old D to finish our move operation!
are transitions between observable states                                                            We need a guarantee that there are no readers in flight who
                                                                                                 missed the new D. Another way of saying that is that we need a

                  no ordering            causal ordering     atomic ordering       ordered       guarantee that there are no readers who have missed the write that
                                                                                                 creates State α2, but may see the write that creates State α3. Yet
                                                                                                 another way of saying that is that we need to guarantee that State
                   [A,B,C,D,E]             [A,B,C,D,E]           [A,B,C,D,E]                     α2 happens before State α3. So we must wait for a grace period to
                                                                                                 complete, and then make a second write W2 to unlink the old D. This
       [A,D,B,C,D,E] [A,B,C,E]            [A,D,B,C,D,E]          [A,D,B,C,E]
                                                                                                 satisfies the guarantee, since every reader that missed the new D has
                   [A,D,B,C,E]             [A,D,B,C,E]                                           finished its traversal and seen State α1, every reader that saw the
                                                                                                 new D has finished its traversal and seen State α2, and any reader
                                               OR                                                who starts a new traversal must do so at head, and will see the new

                                                                                                                           Figure 5. State α3
                                                                                                                    W1                                       W2
                                           [A,D,B,C,E]                                            head        A             D            B             C              E

             The initial list is shown in Figure 2 (State α1).
                                    Figure 2. State α1
                                                                                                     Now the move operation is complete. Readers may still have seen
 head                 A              B              C            D             E                 the reference in C before we updated it to point to E, so they may still
                                                                                                 visit the old D, so we can’t reclaim the memory it occupies yet. But
                                                                                                 conveniently, the GHC runtime garbage collector will automatically
    First, we create a new copy of D in freshly allocated memory. We                             reclaim that memory after readers are finished with it.
write to memory to do so, but no nodes in the shared list point to it
yet, so we know readers have no way of reaching it.                                                                        Figure 6. State α4
                                                                                                                    W1                                       W2
                                 Figure 3. State α1 cont’d                                        head        A             D            B             C              E

                             D                                                                       At a high level, we made two causally ordered writes (W1 and
                                                                                                 W2) going from left to right in the list. Readers are traversing the
                                                                                                 list left to right, and because our writes also progressed from left
 head                 A              B              C            D             E                 to right, we had to wait for a grace period to complete between
                                                                                                 them to guarantee that no reader could observe W2 (which created
                                                                                                 State α3) without observing W1 (which created State α2, and is
    Next, we link the new D into the list by updating the reference in                           also part of State α3). The current version of Monadic RP does not
A to point to the new D. This reference is reachable from head, so                               statically reject programs that omit a grace period in this situation,
we have marked its update as W1 since it is the first of two causally                            as demonstrated in Section 5.3. This paper describes ongoing work
ordered writes that are visible to readers. Now we have put the list in                          to address this at the end of Section 2.1.2. Although we need a grace
a new state (State α2) where the new copy of D is linked in at its new                           period to guarantee causal ordering when writes are committed in
position, and the old copy remains at its original position. Readers                             the same order as reads, when the order of writes is opposite the
concurrently traversing will either see State α2 ([A,D,B,C,D,E]),                                order of reads we do not; Section 2.1.2 explains why.

Portland State University Technical Report 15-01, March 2016                                 3                                                                    2016/3/28
2.1.2    Reader traversal order can also enforce causal ordering                                             Figure 10. State β3
Grace periods enforce causal ordering because a writer waits for
                                                                                                       W2                         W1
readers to complete their critical sections before it makes its next               head           A            C            D            B             E
causally ordered write. This slows writers down, reducing concur-
rency between readers and writers, and could create a bottleneck in
write-heavy workloads. So it makes sense to avoid grace periods
when we have an opportunity to do so. When we can create the
states we need using a sequence of writes to nodes in the opposite
order of reads, and have guarantees that writes are committed in
                                                                                       As before, the garbage collector reclaims the old D at some point
program order and dependent loads are not reordered, we have such
                                                                                   after no thread holds a reference to it.
an opportunity.

Example 2 Consider the same singly linked list as before                                                     Figure 11. State β4
[A,B,C,D,E], with readers traversing again left to right. This                                         W2                         W1
time, we will move B from an earlier position (after A) to a later                 head           A            C            D            B             E
position (after D). After the move operation is complete, readers will
observe the modified list [A,C,D,B,E].
                                                                                      This time, because our writes progressed in the opposite direction
                                                                                   from reads, we did not have to wait for a grace period to complete
                             Figure 7. State β1                                    between writes to guarantee that readers saw State β2 happen before
                                                                                   State β3. This guarantee rests on two conditions:
 head         A               B               C                D        E           • Most RP implementations (including Monadic RP) guarantee
                                                                                         that writes are committed in program order, so we know that, if
                                                                                         W2 is available to readers, then W1 is as well.
                                                                                    • Readers visit the site of W2 before they visit the site of W1. So
    First, we create a new copy of B to link into the list.
                                                                                         if a reader sees W2, that means that W1 is available, so when the
                                                                                         reader subsequently visits W1’s site, it is guaranteed to see W1.
                         Figure 8. State β1 cont’d                                     Note that this does not mean that readers will see the nodes we
                                                                                   changed in the order we changed them. Readers only observe nodes
                                                                    B              in traversal order. In this case, that means that they will observe the
                                                                                   nodes we changed in the opposite of the order we changed them.
                                                                                   Not exactly intuitively, this is the property that guarantees our most
                                                                                   recent write happens before our previous ones.
 head         A               B               C                D        E              In ongoing work, we are developing a domain-specific embed-
                                                                                   ded language (DSEL) [2] that tracks relativistic reader traversal
                                                                                   direction and causally ordered write sequence direction using type-
    This time, W1 updates the reference in D to link the new B                     level state in parameterised monads [1] implemented using GHC
into the list. Readers concurrently traversing will either observe                 type families [26], as demonstrated by Kiselyov et al. [14]. This
W1 and see State β2 ([A,B,C,D,B,E]), or miss W1 and see State β1                   DSEL prevents construction of type-correct programs that make
([A,B,C,D,E]). The key difference from the previous example is                     sequences of causally ordered writes in traversal order without in-
that the first write is at the later position in reader traversal order,           voking synchronizeRP between each pair. However, it does allow
instead of at the earlier position.                                                programs that make sequences of causally ordered writes against
                                                                                   traversal order that do not invoke synchronizeRP between each
                             Figure 9. State β2                                    2.2     RP Discipline
                                                                                   Researchers have identified a set of rules for correct relativistic
                                                                    B              programming [21, 29], which we have adapted to match the terms
                                                               W1                  and abstractions this work describes:

 head         A               B               C                D        E          1. A writer must only make changes that leave shared data in a
                                                                                      consistent state, and guarantee that readers always observe the
                                                                                      consistent states that causally related changes create in order.
                                                                                   2. A thread can read from shared data only:
    Last time, we had to wait for a grace period to complete after
W1, because it was possible for a reader to see W2 but miss W1 in a                      (a) inside a write-side critical section, or inside a read-side
traversal, since readers visit the site of W2 after visiting the site of W1,                 critical section that announces its end to the world, so grace
and both writes could occur between those visits. This time, W2 is at                        periods track all readers; and
an earlier position than W1, so if a reader sees W2, it will definitely see              (b) using RP read primitives, to prevent memory access reorder-
W1 (and observe State β3). This is because we committed W1 before                            ings that may lead to inconsistent observations.
W2, the reader has seen W2, and has yet to visit the site of W1, and so
could not have missed W1 already. If the reader misses W2, it may                  3. A thread can write to shared data only:
see W1 (and observe State β2), or it may miss W1 (and observe State                      (a) in a context where writer concurrency can be controlled if
β1).                                                                                         necessary, such as a write-side critical section; and

Portland State University Technical Report 15-01, March 2016                   4                                                                  2016/3/28
      (b) using RP write primitives, to prevent memory access reorder-             The RP module exports four monads, each corresponding to
          ings that may lead to inconsistent state publications.               a stage in the life cycle of an RP computation. Computations in
4. A thread must not wait for a grace period to complete inside a              RPR are read-side critical sections, and computations in RPW are
   read-side critical section. Such a thread would deadlock (waiting           write-side critical sections. The constructor names for the monads
   for its own critical section to end) and cause any threads that             are not exported and labeled “Unsafe”, since they can be used to run
   subsequently wait for a grace period to complete to also block              arbitrary effectful IO computations.
   forever [19].                                                                   The readSRef and writeSRef functions are the only way to
                                                                               access SRefs. Because RPR and RPW are in the RPRead typeclass,
5. A thread must not use references obtained during a critical                 readSRef can be called in each. Because writeSRef has a return
   section outside that critical section.                                      type in the RPW monad (RPW ()), it can be called only in RPW.
                                                                                   Critical sections must run in a thread. That thread may run
   This is not an exhaustive list of rules in the discipline for correct
                                                                               multiple critical sections in some order and perform computations
RP. This work provides and proposes static checks for Rules 1–4,
                                                                               outside of them using the values they return; the programmer defines
but only proposes an incomplete solution for 5.
                                                                               these thread computations in the RPE5 monad, read-side sections are
                                                                               wrapped in readRP and write-side sections in writeRP. Within a
3.      Contribution                                                           write-side critical section, a writer may need to guarantee that some
Most existing work on RP correctness, discussed in Section 6.1,                writes are causally ordered with respect to others. The writer invokes
focuses on proving that an RP implementation is correct, is a                  synchronizeRP to wait for readers, which is available only in the
sound foundation. Checking that relativistic programs obey the                 RPW monad.
RP discipline is another matter entirely. While researchers have                   Finally, a relativistic program must initialize the references its
proposed candidate rules for static checking (see Section 6.3) and             threads share with newSRef6 , spawn those threads with forkRP,
implemented runtime checks (see Section 6.2), this problem remains             and wait for them to complete and return values using joinRP. The
unsolved.                                                                      programmer defines these computations in the RP monad.
   There is room in this design space for an implementation that                   These abstractions are sufficient to implement simple RP algo-
                                                                               rithms with writers that proceed one at a time while readers proceed
 • can be used to write general-purpose relativistic programs, and
                                                                               concurrently. Other implementations allow concurrency between
 • automatically statically checks that these programs follow the              writers to disjoint data [10]. This code skeleton illustrates the struc-
      RP discipline.                                                           ture described above.
    With these goals in mind, we introduce Monadic RP, a Haskell
implementation of relativistic programming that uses types to mark             do state ← buildInitialState
and separate read-side and write-side critical sections and to restrict           reader ← forkRP $ do
the programmer to the minimal set of effectful operations required                  x ← readRP $ readSideCriticalSection state
in each. Programmers can write new relativistic programs in Haskell                 return      $ pureComputation x
                                                                                  writer ← forkRP $ do
using the abstractions Monadic RP provides, which ensure that these                 writeRP     $ writeSideCriticalSection state
programs meet the safety guarantees in Section 3.3.                               x ← joinRP reader
                                                                                  joinRP writer
3.1     Explicit Critical Sections
Many existing RP implementations [4, 7, 27] track grace periods
using explicitly marked read-side critical sections, and control writer            The tests in Section 5 are implemented as variants on this
concurrency using explicitly marked write-side critical sections.              structure that spawn multiple reader threads.
Monadic RP adopts this approach, and uses these demarcations to                    We have not yet attempted to determine whether the various
solve three problems:                                                          monads introduced in this work satisfy the standard monad laws.
                                                                               We leave this as a topic for future work, observing that a formal proof
1. If read-side critical sections write to shared data (a programmer           would likely require a semantics that can account for concurrency
   error), they cannot safely proceed concurrently with writers or             and the inherent nondeterminism of relativistic reads.
   one-another. Rule 3a prohibits this.                                            Also, we have not yet attempted to add support for a mechanism
2. If a reader waits for a grace period to complete inside a read-side         relativistic threads could use to communicate with threads comput-
   critical section, it will deadlock. Rule 4 prohibits this.                  ing in IO, at least outside critical sections. This or some other limited
                                                                               IO support is necessary to realize many use cases well-suited to RP,
3. If writers update the same data concurrently, in many cases the             such as implementing a DNS server [33].
   data may end up in an inconsistent state. Monadic RP controls                   Appendix A.1 provides the RP source code in full.
   writer concurrency by preventing it entirely, which certainly
   satisfies Rule 3a, but alternative methods (such as STM) would              3.3     Rules enforced
   also satisfy this rule, and could increase writer concurrency.
                                                                               Monadic RP exports types and functions that provide the following
3.2     Expressing RP computations                                             guarantees.
Monadic RP represents read-side and write-side critical sections                • The only way to create or access SRefs is through the RP
with distinct monad types (RPR3 and RPW4 ), and represents shared ref-               module’s interface. Also, SRefs cannot be accessed outside the
erences with type SRef, which can be read only using readSRef and                    relativistic computation that creates them. This is necessary to
writeSRef. It also provides the synchronizeRP function, which                        enforce Rule 1, since threads that aren’t part of a relativistic
writers can use to wait for a grace period to complete. These types                  computation could corrupt shared data or observe it in an
and functions are defined in the RP module.                                          inconsistent state otherwise.
3 We   color operations in the RPR monad blue.                                 5 We   color operations in the RPE monad plum.
4 We   color operations in the RPW monad red.                                  6 We   color operations in the RP monad burnt orange.

Portland State University Technical Report 15-01, March 2016               5                                                                  2016/3/28
 • readSRef can be run only in RPR or RPW, and is the only way                 3.4     Comparisons to other implementations
    to read from shared data. These properties enforce Rule 2.                 Monadic RP statically prevents rule violations that other implemen-
 • writeSRef can be run only in RPW, and is the only way to write              tations do not. For instance, the urcu user-level RCU library in C
    to shared data. These properties enforce Rule 3.                           of Desnoyers et al. [4] allows any function to be called in any block
 • synchronizeRP can be run only in RPW. This enforces rule 4.                 of code. We illustrate this fact in a urcu program.
                                                                                    The urcu library includes a test suite with a test for the verion of
 • Arbitrary effectful computations in IO cannot be run in any of              urcu most similar to Monadic RP, called test qsbr.c. This test
    the monads that the RP module exports. This is necessary to                spawns a configurable number of readers and writers who share a
    enforce any of the rules, since a computation in IO can break              single reference. Each writer repeatedly allocates a new block of
    them all.                                                                  memory, stores a sentinel value in it, swaps the address of the new
    For instance, the snapshot function in the RPR monad imple-                block with the shared reference, waits for a grace period to complete
ments a read-side critical section that traverses a relativistic linked        so no reader still holds a reference to the old block, and then stores a
list, returning a snapshot of the current list state. We used this func-       second sentinel value in the old block. Each reader polls the shared
tion to implement the tests in Section 5.                                      reference, failing if it ever observes a block with the second sentinel.
                                                                               The test runs for a configurable length of time. If any reader fails,
data RPList s a = Nil                                                          it is evidence that the grace period mechanism or memory access
                | Cons a (SRef s (RPList s a))
                                                                               primitives do not preserve causal ordering and consistent states. We
snapshot :: RPList s a → RPR s [a]                                             have modified this test program to show that urcu does not statically
snapshot Nil         = return []                                               enforce RP rules Monadic RP does:
snapshot (Cons x rn) = do                                                       • The modified program reads from a shared reference outside
  l    ← readSRef rn
  rest ← snapshot l
                                                                                     a critical section using rcu dereference(), the analogue of
  return (x : rest)                                                                  readSRef. This violates Rule 2a, and Monadic RP prevents this
                                                                                     error because readSRef can run only in RPWor RPR, both of
    The snapshot function type-checks, since it only calls effectful                 which represent critical sections.
functions in RPR.
                                                                                • The modified program writes to a shared reference inside a
    If we add a writeSRef call to this read-side critical section, we
cause a type error, since writeSRef is only in RPW, but snapshot is                  read-side critical section using rcu assign pointer(), the
in RPR.                                                                              analogue of writeSRef. This violates Rule 3a, and Monadic
                                                                                     RP prevents this error because writeSRef cannot run in RPR.
snapshot :: RPList s a → RPR s [a]
snapshot Nil         = return []                                                • The modified program waits for a grace period to complete
snapshot (Cons x rn) = do                                                            inside a read-side critical section using synchronize rcu(),
  l    ← readSRef rn                                                                 the analogue of synchronizeRP. This violates Rule 4, and
  writeSRef rn (Cons undefined rn)                                                   Monadic RP prevents this error because synchronizeRP cannot
  rest ← snapshot l                                                                  run in RPR.
  return (x : rest)
                                                                               This modified program compiles without warnings or errors, and
Error:                                                                         fails at runtime. C does not include language-level mechanisms to
Couldn’t match expected type                   RPR s a0                        restrict the statements that appear in blocks of code beyond ensuring
            with actual type                   RPW s ()                        that they are syntactically well-formed and checking that parameter
                                                                               and return types match argument and result types, respectively, so
    The same is true for synchronizeRP, and all other effectful
                                                                               this result is unsurprising, and does not represent an oversight on
functions not in RPR.
                                                                               the part of Desnoyers et al.. Appendix A.3 contains the text of this
    The prependA function inserts an ‘A’ at the beginning of a
                                                                               modified version of test qsbr.c.
relativistic string.
                                                                                   Howard and Walpole [10] used STM to manage concurrent
prependA :: SRef (RPList Char) → RPW ()                                        relativistic writers, so we are investigating using GHC STM [8]
prependA head = do head’ ← copySRef head                                       for this purpose. Although GHC STM is strongly atomic, and so in
                   writeSRef head (Cons ’A’ head’)                             normal use prevents concurrent reads and writes to the same data,
                                                                               GHC provides the readTVarIO# primitive, which allows threads to
   Because prependA has a return type in the RPW monad, it can                 read transactional variables outside of transactions.
only be run inside a write-side critical section.
   As Figure 12 shows, the operations available in monads besides              3.5     Tracking traversal and update direction
RPW are severely restricted.                                                   The reader (a person, not a thread) may realize that the abstractions
                                                                               Monadic RP provides for reading and updating shared data do
Figure 12. A summary of the operations allowed in each monad.                  not capture any information about traversal or update direction.
                                                                               Because of this, there is no way to track whether synchronizeRP or
                         RP      RPE      RPR      RPW                         updates against traversal order are correctly used to guarantee causal
  newSRef                 •                         •                          ordering, as illustrated in Section 2.1, and enforce Rule 1. In ongoing
  readSRef                                 •        •                          work to address this shortcoming, we are developing a cursor-based
  writeSRef                                         •                          interface for critical sections, where the cursor starts at a shared
  copySRef                                          •                          reference at the beginning of each critical section, and threads must
  synchronizeRP                                     •                          explicitly move the cursor to a node to access it. Inside write-side
  forkRP                  •                                                    critical sections, this interface forces the programmer to explicitly
  joinRP                  •                                                    mark causally ordered sequences of writes. This interface, combined
  readRP                          •                                            with the type-level representation of traversal and update direction
  writeRP                         •                                            described at the end of Section 2.1.2, will statically enforce more of

Portland State University Technical Report 15-01, March 2016               6                                                                  2016/3/28
the rules from the relativistic programming discipline on programs            critical section. Additionally, a thread can opt out of participation in
built using Monadic RP. Specifically, it:                                     the clock by setting its local counter to zero, allowing grace periods
 • rules out causally ordered updates in traversal order that aren’t
                                                                              to complete without any message from the thread. Desnoyers et al.
                                                                              [4] say a thread which has opted out is in an extended quiescent
      separated by synchronizeRP,
                                                                              state, or offline. A thread always opts out at the beginning of a write-
 • allows causally ordered updates against traversal order that aren’t        side critical section. This is because it is now the writer, and would
      separated by synchronizeRP,                                             not be able to make progress while deadlocked waiting for itself
 • prevents synchronizeRP from being invoked outside sequences                to complete a read-side critical section. At the end of a write-side
      of causally related writes, and                                         critical section, the thread (having laid down its writer’s mantle)
                                                                              returns its local counter to its previous state.
 • rules out other programmer errors, such as invoking synchronizeRP
      twice with no intervening writes or invoking synchronizeRP be-
      fore making the first in a sequence of causally ordered writes.
We believe that this approach is applicable to relativistic trees             4.3   Primitives
because we can generalize the notion of traversal direction to                The RP module depends on two memory barrier primitives built
distinguish between expanding the frontier of explored nodes in               into the GHC runtime and exposed to Haskell programs by the
keeping with a partial order where parents precede children, and              atomic-primops package. Both act as full compiler-level memory
backtracking to nodes already within this frontier. This distinction          barriers, preventing both GHC and the C compiler that generates
fits naturally with the zipper abstraction.                                   the GHC runtime code from reordering reads or writes across either
                                                                              barrier, but these barriers have different semantics at the hardware
4.     Implementation                                                         level.
                                                                                  One barrier primitive is called writeBarrier, and prevents
The RP module exports functions built on top of IO, but does not              hardware reordering of writes. On x86, AMD64, ARM (pre v7), and
allow arbitrary IO computations inside relativistic computations.             SPARC, writes from a particular core are not reordered, so GHC
The types we define in the RP module, and the choice of which names           [17] correctly implements writeBarrier using no instruction at
to export from it, enforce these controlled abstractions. Additionally,       all on these architectures. On PowerPC, and ARMv7 and later, GHC
the primitives we use to implement Monadic RP meet the memory                 implements writeBarrier using the appropriate instructions along
ordering requirements of Rules 2b and 3b.                                     with a C compiler reordering barrier.
                                                                                  The writeSRef function calls writeBarrier before it writes
4.1     Safe effectful abstractions in Haskell                                to memory. This guarantees that writes are committed in program
The IO monad provides all general effectful operations, including             order, which in turn means that writes against traversal order
file I/O, direct manipulation of memory, and thread management.               guarantee causal ordering, as illustrated in Section 2.1.2. The
Because relativistic programs directly mutate shared memory and               synchronizeRP function uses writeBarrier to ensure that a
spawn threads, Monadic RP must be implemented using functions                 writer commits the write that increments the global counter before
in IO. However, if all computations in IO are allowed inside RPR or           the writer begins waiting for readers to observe this write.
RPW, programmers could violate the RP discipline. For instance,                   The other barrier primitive is called storeLoadBarrier, and
readers could write to arbitrary memory and wait for grace periods            prevents hardware reordering of reads and writes. GHC implements
to complete.                                                                  it using appropriate instructions on all supported architectures. On
    Terei et al. [28] introduce Safe Haskell, and demonstrate a               x86 and AMD64, GHC uses a lock-prefixed add instruction that
flexible technique for defining a restricted abstraction RIO on top           adds zero to the word at the top of the thread’s stack (a no-op). On
of IO where clients of RIO can access IO operations only through a            Intel architectures, the lock prefix prevents hardware reordering of
fixed set of predefined functions exported by RIO. The RP module is           reads and writes across the instruction it modifies [12]. Although
implemented using this abstraction-building technique.                        legacy Intel architectures lock the entire bus when executing any
                                                                              lock-prefixed instruction, modern Intel architectures do not when the
4.2     Algorithm                                                             instruction accesses a memory location cached in exclusive mode
Monadic RP implements grace periods using quiescent states [23],              on the core executing the instruction (such as the top of the thread’s
in an approach modeled after the Quiescent-State-Based Reclama-               stack), so on modern Intel architectures, this primitive does not incur
tion (QSBR) RCU implementation of Desnoyers et al. [4]. Their                 the cost of bus locking.
work outlines the implementation in detail, so we present a sum-                  The synchronizeRP function uses storeLoadBarrier to in-
mary here.                                                                    sulate the code that waits for a grace period to complete from write-
    In this formulation of Quiescent-State-Based RP, grace periods            side critical sections. The readRP function (which encapsulates a
are announced and completed using a set of counters. There is a               read-side critical section) uses storeLoadBarrier to insulate a
global grace period counter, and every thread has its own local               reader thread’s updates to its local counter from the memory ac-
counter. We can think of the local counters as a variant of a logical         cesses inside the current or subsequent critical sections.
clock [16] where there are only global events, represented in shared
memory. The only event that increments any counter is when the
thread currently inside a write-side critical section (the writer)
increments the global counter before waiting for a grace period               4.4   Memory Reclamation
to complete. When the writer increments the global counter, it sends          Kung and Lehman [15] demonstrate that RP implementations can
a message to readers that a grace period is begun. In response, each          rely on a general-purpose garbage collector to guarantee that node
reader sends a message to the writer that it has finished a critical          unlinking and node memory reclamation are causally ordered. GHC
section (entering a quiescent state) by updating its local counter to         has a tracing garbage collector [18] that will not reclaim memory
match the global counter. Once the writer has received this message           some thread could still reach, so Monadic RP does not need to use
from each active reader by blocking on the reader’s local counter             any additional causal ordering mechanism to prevent premature
until the reader updates it, the writer moves on with its write-side          reclamation.

Portland State University Technical Report 15-01, March 2016              7                                                                  2016/3/28
5.    Results                                                                       writeSRef rd $ Cons b rd’       -- [A,B,C,D,B,E]
                                                                                    -- don’t need to synchronizeRP,
We have implemented the examples from Section 2.1 using Monadic                     -- write order against traversal order
RP, and used these implementations to test our causal ordering                      -- unlink the old B
mechanisms. We tested three hypotheses:                                             writeSRef ra bn                 -- [A,C,D,B,E]
1. our implementation of synchronizeRP ensures causal ordering
   between writes in traversal order,
                                                                               5.3    Example 1, without synchronizeRP
2. our implementations of writeSRef and readSRef together
   ensure causal ordering when writes are against traversal order,             The third test, moveDbackNoSync, is the same as the first, except
                                                                               that it omits synchronizeRP.
3. in a program that does not use either causal ordering mechanism,
   readers may observe shared data in an inconsistent state.                   5.4    Test Results
    As in the examples, all tests start with the shared linked list
[A,B,C,D,E]. To test hypothesis 1, we implemented Example 1                    Figure 13. Aggregate results from all test runs, including counts of
from Section 2.1.1. To test hypothesis 2, we implemented Example               inconsistent states readers observed.
2 from Section 2.1.2. To test hypothesis 3, we implemented a variant
of Example 1 that does not invoke synchronizeRP between W1 and                                 Example 1 with     Example 1 w/o           Example 2
W2.                                                                                            synchronizeRP      synchronizeRP
    Each test spawns 29 readers and a writer. Each reader traverses             time (hours)              13.93              13.97             13.95
                                                                                snapshots       116,000,000,000    116,000,000,000   116,000,000,000
the list 400,000 times in a tight loop, then returns aggregate counts           consistent      116,000,000,000    115,999,999,547   116,000,000,000
of the consistent and inconsistent states it observed during these              inconsistent                  0                453                 0
traversals. The writer simply moves a list element and returns. We
ran these tests on a 30-way QEMU guest on top of 40-way host                   Figure 13 shows that readers observed no inconsistent states dur-
with IntelrXeonr E5-2670 v2 @ 2.50GHz CPUs. We ran each                        ing test runs for Example 1 with synchronizeRP or Example
test 10,000 times.                                                             2, giving us confidence that both synchronizeRP and writes
5.1   Example 1                                                                against traversal order ensure causal ordering in this implemen-
                                                                               tation. When we omitted synchronizeRP from Example 1, readers
The first test, moveDback, implements Example 1 from Section                   occasionally observed inconsistent states, confirming that we need
2.1.1, moving D from a later position to an earlier position. It invokes       synchronizeRP to guarantee causal ordering in Example 1. To cre-
synchronizeRP in between the writeSRef call that links the new                 ate a meaningful corresponding test for Example 2 without causal
copy in at the earlier position and the writeSRef call that unlinks            ordering, we would need to omit the write barrier in writeSRef and
the old copy at the later position, since these writes are in reader           run tests on a highly concurrent architecture that reorders stores from
traversal order.                                                               a given core.
moveDback :: SRef s (RPList s a) → RPW s ()                                        The first and third tests demonstrate that in Monadic RP,
moveDback head = do                                                            synchronizeRP and writes against traversal order provide the
  (Cons a ra) ← readSRef head -- [A,B,C,D,E]                                   causal ordering guarantees we describe in Section 2.1. That is, they
  -- duplicate reference to B                                                  are sufficient to enforce Rule 1 if used correctly. The second test
  ra’         ← copySRef ra                                                    demonstrates that without using one of these mechanisms, we can-
  (Cons b rb) ← readSRef ra                                                    not guarantee causal ordering in the presence of concurrent reads
  (Cons c rc) ← readSRef rb                                                    and writes. That is, they are necessary to enforce Rule 1.
  (Cons d rd) ← readSRef rc
  ne          ← readSRef rd
  -- link in a new D after A                                                   6.     Related work
  writeSRef ra (Cons d ra’)     -- [A,D,B,C,D,E]
                                                                               6.1    Verified implementations
  -- wait for readers
  synchronizeRP                                                                Hart et al. [9] and Desnoyers et al. [4] have written in C and
  -- unlink the old D                                                          benchmarked RP implementations with a variety of grace-period-
  writeSRef rc ne               -- [A,D,B,C,E]                                 based causal ordering mechanisms. Two of these implementations
                                                                               and example algorithms were modeled on a virtual architecture with
                                                                               a weakly-ordered memory model in Promela. Assertions about the
5.2   Example 2                                                                model, expressed in Linear Temporal Logic, were verified using
The second test, moveBforward, implements Example 2 in Section                 the Spin model checker [5]. The Promela model was constructed
2.1.2. It moves B from an earlier position to a later position by              manually and the verification took hours to run; in exchange,
inserting a new copy at the later position, then unlinking the old copy        Desnoyers et al. were able to detect errors in the implementation
at the original position. Because these writes are against traversal           at the level of missing memory barriers and algorithm-level race
order, the writer does not need to invoke synchronizeRP to enforce             conditions. This verification gives users high confidence that these
causal ordering.                                                               implementations are correct, but says nothing about new programs
                                                                               built on top of them.
moveBforward :: SRef s (RPList s a) → RPW s ()                                     Gotsman et al. [7] implemented RP in a C-like formal language
moveBforward head = do                                                         and verified this implementation in a sequentially consistent mem-
  (Cons a ra)    ← readSRef head -- [A,B,C,D,E]                                ory model using an extension of the logic RGSep [32]. This imple-
  bn@(Cons b rb) ← readSRef ra
  (Cons c rc)    ← readSRef rb                                                 mentation is not intended to be used to build new general-purpose
  (Cons d rd)    ← readSRef rc                                                 programs.
  -- duplicate the reference to E                                                  Tassarotti et al. [27] implemented RP in Coq, and verified this
  rd’            ← copySRef rd                                                 implementation in a release-acquire memory model [13] using the
  -- link in a new B after D                                                   logic GPS (a logic with ghost state, separation, and per-location

Portland State University Technical Report 15-01, March 2016               8                                                                   2016/3/28
protocols introduced by Turon et al. [31]). In this case, the imple-         References
mentation and the proof are in one system, but each function is               [1] R. Atkey. Parameterised notions of computation. Journal of Functional
an annotated Hoare triple with GPS predicates and permissions. A                  Programming, 19(3-4):335–376, 2009.
programmer can build new general-purpose programs on top of this
                                                                              [2] E. Brady and K. Hammond. Correct-by-construction concurrency:
implementation, but she would need skill in GPS and interactive the-              Using dependent types to verify implementations of effectful resource
orem proving to prove that these programs are correct as Tassarotti               usage protocols. Fundamenta Informaticae, 102(2):145–176, 2010.
et al. have for the underlying implementation.
                                                                              [3] J. Corbet. The kernel lock validator. Available:
                                                                                  Articles/185666/ [Viewed May 20, 2015], May 2006.
6.2     Lockdep                                                               [4] M. Desnoyers, P. McKenney, A. Stern, M. Dagenais, and J. Walpole.
McKenney [20] describes Lockdep-RCU, an extension of the Linux                    User-level implementations of read-copy update. IEEE Transactions
kernel lock validator [3] that does flexible runtime assertion check-             on Parallel Distributed Systems (USA), 23(2):375–82, 2012.
ing in and around Linux RCU API calls. For instance, it is config-            [5] M. Desnoyers, P. E. McKenney, and M. R. Dagenais. Multi-core
ured to emit a warning when rcu dereference(), the analogue                       systems modeling for formal verification of parallel algorithms. ACM
of readSRef, is used outside a critical section. A Linux developer                SIGOPS Operating Systems Review, 47(2):51–65, 2013. URL http:
can build the kernel with Lockdep enabled to turn on this runtime                 //
assertion checking, and then run tests to exercise the code paths with        [6] A. Einstein. Relativity: The special and general theory. Penguin, 1920.
assertions. Lockdep does not provide any guarantee that assertions            [7] A. Gotsman, N. Rinetzky, and H. Yang. Verifying concurrent memory
will be checked unless they are encountered at runtime, so it gives               reclamation algorithms with grace. In Proceedings of the 22nd
less assurance than a static analysis.                                            European conference on Programming Languages and Systems, pages
                                                                                  249–269. Springer-Verlag, Springer, 2013.
6.3     Proposed Coccinelle tests                                             [8] T. Harris, S. Marlow, S. Peyton-Jones, and M. Herlihy. Composable
                                                                                  memory transactions. In Proceedings of the tenth ACM SIGPLAN
McKenney [21] proposes using Coccinelle scripts to automatically                  Symposium on Principles and Practice of Parallel Programming, pages
detect a series of “RCU abuses” in Linux, which include:                          48–60. ACM, 2005.
 • waiting for readers in a read-side critical section (a violation of        [9] T. E. Hart, P. E. McKenney, and A. D. Brown. Making lockless
      Rule 4);                                                                    synchronization fast: Performance implications of memory reclamation.
                                                                                  In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006.
 • accessing an RCU-protected shared data structure through a                     20th International, pages 10–pp. IEEE, 2006.
      private reference obtained during a read-side critical section,        [10] P. W. Howard and J. Walpole. A relativistic enhancement to software
      rather than a shared reference (a violation of Rule 5).                     transactional memory. In Proceedings of the third USENIX conference
                                                                                  on Hot Topics in Parallelism (Berkeley, CA, USA, 2011), HotPar,
                                                                                  volume 11, pages 1–6, 2011.
7.     Conclusion                                                            [11] P. W. Howard and J. Walpole. Relativistic red-black trees. Concurrency
Monadic RP is a new GHC Haskell library for relativistic program-                 and Computation: Practice and Experience, 26(16):2684–2712, 2014.
ming that uses types and abstractions to guarantee basic safety              [12] Intel Corporation. Intel R 64 and IA-32 Architectures Software
properties. We have used it to implement working relativistic pro-                Developer’s Manual. January 2015. URL http : / / www . intel .
grams that demonstrate that it enforces causal ordering in two ways:              com/content/dam/www/public/us/en/documents/manuals/
using grace periods, and using writes against reader traversal order.             64-ia-32-architectures-software-developer-vol-3a-part-1-manual.
Existing work on RP focuses on developing performant implementa-                  pdf.
tions that do not model or enforce safety properties, runtime checks,        [13] ISO. Jtc1/sc22/wg14. iso/iec 9899: 2011. Information technology –
or on formal models for that verify correctness of the implementa-                Programming languages – C, 2011. URL
tion rather than the correctness of programs that use it. Monadic RP              iso/iso\_catalogue/catalogue\_tc/catalogue\_detail.
stakes out a novel middle ground: it provides an implementation suit-
able for general-purpose RP that statically enforces rules from the          [14] O. Kiselyov, S. P. Jones, and C.-c. Shan. Fun with type functions. In
RP programming discipline. We accomplish this by exporting types                  Reflections on the Work of CAR Hoare, pages 301–331. Springer, 2010.
and functions from the RP module that statically enforce a separation        [15] H. T. Kung and P. L. Lehman. Concurrent manipulation of binary
between read-side and write-side critical sections and restrict the               search trees. ACM Transactions in Database Systems, 5(3):354–382,
operations available in each. Minimizing the set of operations avail-             Sept. 1980.
able in each kind of critical section keeps writes and other effects         [16] L. Lamport. Time, clocks, and the ordering of events in a distributed
out of read-side critical sections and keeps effects besides reads,               system. Communications of the ACM, 21(7):558–65, July 1978. ISSN
writes, and waiting for grace periods out of relativistic programs,               0001-0782. .
making them safer. In ongoing work, we are extending Monadic RP              [17] S. Marlow. Smp.h (ghc source), December 2014. URL https :
to expose a cursor-based abstraction for critical sections that uses              //
parameterised monads to track reader traversal direction and writer          [18] S. Marlow, T. Harris, R. P. James, and S. Peyton Jones. Parallel
causal update direction, and statically reject relativistic programs              generational-copying garbage collection with a block-structured heap.
that fail to follow rules that guarantee causal ordering.                         In Proceedings of the 7th International Symposium on Memory Man-
                                                                                  agement, pages 11–20. ACM, 2008.
                                                                             [19] P. McKenney. Read-copy update (rcu) validation and verification for
Acknowledgments                                                                   linux (slides), November 2014. URL http : / / www . rdrop . com /
We would like to thank Dr. Mark Jones, Dr. Andrew Black, Dr.                      ~paulmck/RCU/RCUVal.2014.11.11a.Galois.pdf.
David Maier, Dr. James Hook, Larry Diehl, and Kendall Stewart of             [20] P. E. McKenney. Lockdep-RCU. Available:
Portland State University, for teaching the classes and engaging in               Articles/371986/ [Viewed May 20, 2015], February 2010.
the long series of conversations that led to this work. This work is         [21] P. E. McKenney. Opwintro-rcu. Available: http://kernelnewbies.
funded by NSF Award No. CNS-1422979.                                              org/OPWIntro-RCU [Viewed May 20, 2015], October 2014.

Portland State University Technical Report 15-01, March 2016             9                                                                     2016/3/28
[22] P. E. McKenney and J. D. Slingwine. Read-copy update: Using
     execution history to solve concurrency problems. In Parallel and
     Distributed Computing and Systems, pages 509–518, Las Vegas, NV,
     October 1998.
[23] P. E. McKenney, J. Appavoo, A. Kleen, O. Krieger, R. Russell,
     D. Sarma, and M. Soni. Read-copy update. In Ottawa Linux Sympo-
     sium, July 2001. URL
[24] P. E. McKenney, S. Boyd-Wickizer, and J. Walpole. RCU usage in the
     linux kernel: One decade later. Technical report paulmck.2013.02.24,
     September 2013. URL http : / / www2 . rdrop . com / ~paulmck /
[25] J. Morris. Recent developments in SELinux kernel performance.
     Available: http : / / www . livejournal . com / users / james \
     _morris/2153.html [Viewed May 20, 2015], December 2004.
[26] T. Schrijvers, S. Peyton Jones, M. Chakravarty, and M. Sulzmann. Type
     checking with open type functions. ACM Sigplan Notices, 43(9):51–62,
[27] J. Tassarotti, D. Dreyer, and V. Vafeiadis. Verifying read-copy-update
     in a logic for weak memory. In ACM Conference on Programming
     Language Design and Implementation (PLDI ’15). ACM, 2015. To
[28] D. Terei, S. Marlow, S. Peyton Jones, and D. Mazières. Safe haskell.
     In Proceedings of the 2012 Haskell Symposium, Haskell ’12, pages
     137–148, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1574-6.
[29] J. Triplett, P. W. Howard, P. E. McKenney, and J. Walpole. Scalable
     correct memory ordering via relativistic programming. 2011. URL\_fac/12/.
[30] J. Triplett, P. E. McKenney, and J. Walpole. Resizable, scalable,
     concurrent hash tables via relativistic programming. In USENIX Annual
     Technical Conference, page 11, 2011.
[31] A. Turon, V. Vafeiadis, and D. Dreyer. GPS: Navigating weak memory
     with ghosts, protocols, and separation. In Proceedings of the 2014 ACM
     International Conference on Object Oriented Programming Systems
     Languages & Applications, pages 691–707. ACM, 2014.
[32] V. Vafeiadis and M. Parkinson. A marriage of rely/guarantee and
     separation logic. In CONCUR 2007–Concurrency Theory, pages 256–
     271. Springer, 2007.
[33] Y. A. Wagh, P. A. Dhangar, T. C. Powar, A. Magar, and P. Mali.
     Lightweight dns for multipurpose and multifunctional devices. IJCSNS,
     13(12):71, 2013.

A.     Appendix
A.1    RP module
Available at

A.2    RPManyListMoveTest
Available at

A.3    test qsbr.c
Available at

Portland State University Technical Report 15-01, March 2016                  10   2016/3/28