Explanation of the Linux-Kernel Memory Consistency Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Alan Stern <stern@rowland.harvard.edu>
:Created: October 2017

.. Contents

  1. INTRODUCTION
  2. BACKGROUND
  3. A SIMPLE EXAMPLE
  4. A SELECTION OF MEMORY MODELS
  5. ORDERING AND CYCLES
  6. EVENTS
  7. THE PROGRAM ORDER RELATION: po AND po-loc
  8. A WARNING
  9. DEPENDENCY RELATIONS: data, addr, and ctrl
  10. THE READS-FROM RELATION: rf, rfi, and rfe
  11. CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
  12. THE FROM-READS RELATION: fr, fri, and fre
  13. AN OPERATIONAL MODEL
  14. PROPAGATION ORDER RELATION: cumul-fence
  15. DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL
  16. SEQUENTIAL CONSISTENCY PER VARIABLE
  17. ATOMIC UPDATES: rmw
  18. THE PRESERVED PROGRAM ORDER RELATION: ppo
  19. AND THEN THERE WAS ALPHA
  20. THE HAPPENS-BEFORE RELATION: hb
  21. THE PROPAGATES-BEFORE RELATION: pb
  22. RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb
  23. LOCKING
  24. PLAIN ACCESSES AND DATA RACES
  25. ODDS AND ENDS



INTRODUCTION
------------

The Linux-kernel memory consistency model (LKMM) is rather complex and
obscure. This is particularly evident if you read through the
linux-kernel.bell and linux-kernel.cat files that make up the formal
version of the model; they are extremely terse and their meanings are
far from clear.

This document describes the ideas underlying the LKMM. It is meant
for people who want to understand how the model was designed. It does
not go into the details of the code in the .bell and .cat files;
rather, it explains in English what the code expresses symbolically.

Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
toward beginners; they explain what memory consistency models are and
the basic notions shared by all such models. People already familiar
with these concepts can skim or skip over them. Sections 6 (EVENTS)
through 12 (THE FROM-READS RELATION) describe the fundamental
relations used in many models. Starting in Section 13 (AN OPERATIONAL
MODEL), the workings of the LKMM itself are covered.

Warning: The code examples in this document are not written in the
proper format for litmus tests. They don't include a header line, the
initializations are not enclosed in braces, the global variables are
not passed by pointers, and they don't have an "exists" clause at the
end. Converting them to the right format is left as an exercise for
the reader.


BACKGROUND
----------

A memory consistency model (or just memory model, for short) is
something which predicts, given a piece of computer code running on a
particular kind of system, what values may be obtained by the code's
load instructions. The LKMM makes these predictions for code running
as part of the Linux kernel.

In practice, people tend to use memory models the other way around.
That is, given a piece of code and a collection of values specified
for the loads, the model will predict whether it is possible for the
code to run in such a way that the loads will indeed obtain the
specified values. Of course, this is just another way of expressing
the same idea.

For code running on a uniprocessor system, the predictions are easy:
Each load instruction must obtain the value written by the most recent
store instruction accessing the same location (we ignore complicating
factors such as DMA and mixed-size accesses.) But on multiprocessor
systems, with multiple CPUs making concurrent accesses to shared
memory locations, things aren't so simple.

Different architectures have differing memory models, and the Linux
kernel supports a variety of architectures. The LKMM has to be fairly
permissive, in the sense that any behavior allowed by one of these
architectures also has to be allowed by the LKMM.


A SIMPLE EXAMPLE
----------------

Here is a simple example to illustrate the basic concepts. Consider
some code running as part of a device driver for an input device. The
driver might contain an interrupt handler which collects data from the
device, stores it in a buffer, and sets a flag to indicate the buffer
is full. Running concurrently on a different CPU might be a part of
the driver code being executed by a process in the midst of a read(2)
system call. This code tests the flag to see whether the buffer is
ready, and if it is, copies the data back to userspace. The buffer
and the flag are memory locations shared between the two CPUs.

We can abstract out the important pieces of the driver code as follows
(the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
assignment statements is discussed later):

	int buf = 0, flag = 0;

	P0()
	{
		WRITE_ONCE(buf, 1);
		WRITE_ONCE(flag, 1);
	}

	P1()
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(flag);
		if (r1)
			r2 = READ_ONCE(buf);
	}

Here the P0() function represents the interrupt handler running on one
CPU and P1() represents the read() routine running on another. The
value 1 stored in buf represents input data collected from the device.
Thus, P0 stores the data in buf and then sets flag. Meanwhile, P1
reads flag into the private variable r1, and if it is set, reads the
data from buf into a second private variable r2 for copying to
userspace. (Presumably if flag is not set then the driver will wait a
while and try again.)

This pattern of memory accesses, where one CPU stores values to two
shared memory locations and another CPU loads from those locations in
the opposite order, is widely known as the "Message Passing" or MP
pattern. It is typical of memory access patterns in the kernel.

Please note that this example code is a simplified abstraction. Real
buffers are usually larger than a single integer, real device drivers
usually use sleep and wakeup mechanisms rather than polling for I/O
completion, and real code generally doesn't bother to copy values into
private variables before using them. All that is beside the point;
the idea here is simply to illustrate the overall pattern of memory
accesses by the CPUs.

A memory model will predict what values P1 might obtain for its loads
from flag and buf, or equivalently, what values r1 and r2 might end up
with after the code has finished running.

Some predictions are trivial. For instance, no sane memory model would
predict that r1 = 42 or r2 = -7, because neither of those values ever
gets stored in flag or buf.

Some nontrivial predictions are nonetheless quite simple. For
instance, P1 might run entirely before P0 begins, in which case r1 and
r2 will both be 0 at the end. Or P0 might run entirely before P1
begins, in which case r1 and r2 will both be 1.

The interesting predictions concern what might happen when the two
routines run concurrently. One possibility is that P1 runs after P0's
store to buf but before the store to flag. In this case, r1 and r2
will again both be 0. (If P1 had been designed to read buf
unconditionally then we would instead have r1 = 0 and r2 = 1.)

However, the most interesting possibility is where r1 = 1 and r2 = 0.
If this were to occur it would mean the driver contains a bug, because
incorrect data would get sent to the user: 0 instead of 1. As it
happens, the LKMM does predict this outcome can occur, and the example
driver code shown above is indeed buggy.
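
Incidentally, the Introduction warned that these examples are not in
the proper litmus-test format. As a rough, unofficial sketch of what
such a conversion might look like for this MP example (the test name
is made up, and details of the herd7 syntax may vary):

	C MP

	{}

	P0(int *buf, int *flag)
	{
		WRITE_ONCE(*buf, 1);
		WRITE_ONCE(*flag, 1);
	}

	P1(int *buf, int *flag)
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(*flag);
		if (r1)
			r2 = READ_ONCE(*buf);
	}

	exists (1:r1=1 /\ 1:r2=0)

The "exists" clause asks whether the buggy outcome r1 = 1, r2 = 0 can
occur; the herd7 tool can then check the test against the LKMM.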


A SELECTION OF MEMORY MODELS
----------------------------

The first widely cited memory model, and the simplest to understand,
is Sequential Consistency. According to this model, systems behave as
if each CPU executed its instructions in order but with unspecified
timing. In other words, the instructions from the various CPUs get
interleaved in a nondeterministic way, always according to some single
global order that agrees with the order of the instructions in the
program source for each CPU. The model says that the value obtained
by each load is simply the value written by the most recently executed
store to the same memory location, from any CPU.

For the MP example code shown above, Sequential Consistency predicts
that the undesired result r1 = 1, r2 = 0 cannot occur. The reasoning
goes like this:

	Since r1 = 1, P0 must store 1 to flag before P1 loads 1 from
	it, as loads can obtain values only from earlier stores.

	P1 loads from flag before loading from buf, since CPUs execute
	their instructions in order.

	P1 must load 0 from buf before P0 stores 1 to it; otherwise r2
	would be 1 since a load obtains its value from the most recent
	store to the same address.

	P0 stores 1 to buf before storing 1 to flag, since it executes
	its instructions in order.

	Since an instruction (in this case, P0's store to flag) cannot
	execute before itself, the specified outcome is impossible.

However, real computer hardware almost never follows the Sequential
Consistency memory model; doing so would rule out too many valuable
performance optimizations. On ARM and PowerPC architectures, for
instance, the MP example code really does sometimes yield r1 = 1 and
r2 = 0.

x86 and SPARC follow yet a different memory model: TSO (Total Store
Ordering). This model predicts that the undesired outcome for the MP
pattern cannot occur, but in other respects it differs from Sequential
Consistency. One example is the Store Buffer (SB) pattern, in which
each CPU stores to its own shared location and then loads from the
other CPU's location:

	int x = 0, y = 0;

	P0()
	{
		int r0;

		WRITE_ONCE(x, 1);
		r0 = READ_ONCE(y);
	}

	P1()
	{
		int r1;

		WRITE_ONCE(y, 1);
		r1 = READ_ONCE(x);
	}

Sequential Consistency predicts that the outcome r0 = 0, r1 = 0 is
impossible. (Exercise: Figure out the reasoning.) But TSO allows
this outcome to occur, and in fact it does sometimes occur on x86 and
SPARC systems.
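
Looking ahead a little: the standard remedy is to place a full memory
barrier between each CPU's store and load. As a sketch, using the
smp_mb() fence discussed later in this document:

	int x = 0, y = 0;

	P0()
	{
		int r0;

		WRITE_ONCE(x, 1);
		smp_mb();	/* orders the store before the load */
		r0 = READ_ONCE(y);
	}

	P1()
	{
		int r1;

		WRITE_ONCE(y, 1);
		smp_mb();
		r1 = READ_ONCE(x);
	}

With both barriers present, the r0 = 0, r1 = 0 outcome is forbidden,
under TSO and under the LKMM alike.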

The LKMM was inspired by the memory models followed by PowerPC, ARM,
x86, Alpha, and other architectures. However, it is different in
detail from each of them.


ORDERING AND CYCLES
-------------------

Memory models are all about ordering. Often this is temporal ordering
(i.e., the order in which certain events occur) but it doesn't have to
be; consider for example the order of instructions in a program's
source code. We saw above that Sequential Consistency makes an
important assumption that CPUs execute instructions in the same order
as those instructions occur in the code, and there are many other
instances of ordering playing central roles in memory models.

The counterpart to ordering is a cycle. Ordering rules out cycles:
It's not possible to have X ordered before Y, Y ordered before Z, and
Z ordered before X, because this would mean that X is ordered before
itself. The analysis of the MP example under Sequential Consistency
involved just such an impossible cycle:

	W: P0 stores 1 to flag   executes before
	X: P1 loads 1 from flag  executes before
	Y: P1 loads 0 from buf   executes before
	Z: P0 stores 1 to buf    executes before
	W: P0 stores 1 to flag.

In short, if a memory model requires certain accesses to be ordered,
and a certain outcome for the loads in a piece of code can happen only
if those accesses would form a cycle, then the memory model predicts
that outcome cannot occur.

The LKMM is defined largely in terms of cycles, as we will see.


EVENTS
------

The LKMM does not work directly with the C statements that make up
kernel source code. Instead it considers the effects of those
statements in a more abstract form, namely, events. The model
includes three types of events:

	Read events correspond to loads from shared memory, such as
	calls to READ_ONCE(), smp_load_acquire(), or
	rcu_dereference().

	Write events correspond to stores to shared memory, such as
	calls to WRITE_ONCE(), smp_store_release(), or atomic_set().

	Fence events correspond to memory barriers (also known as
	fences), such as calls to smp_rmb() or rcu_read_lock().

These categories are not exclusive; a read or write event can also be
a fence. This happens with functions like smp_load_acquire() or
spin_lock(). However, no single event can be both a read and a write.
Atomic read-modify-write accesses, such as atomic_inc() or xchg(),
correspond to a pair of events: a read followed by a write. (The
write event is omitted for executions where it doesn't occur, such as
a cmpxchg() where the comparison fails.)
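
As a sketch of this last point, the cmpxchg() call below always gives
rise to a read event, but gives rise to a write event only in
executions where the value read was 0, so that the comparison
succeeds:

	int x = 0;

	P0()
	{
		int r1;

		r1 = cmpxchg(&x, 0, 1);	/* read event always; write
					   event only if x was 0 */
	}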

Other parts of the code, those which do not involve interaction with
shared memory, do not give rise to events. Thus, arithmetic and
logical computations, control-flow instructions, or accesses to
private memory or CPU registers are not of central interest to the
memory model. They only affect the model's predictions indirectly.
For example, an arithmetic computation might determine the value that
gets stored to a shared memory location (or in the case of an array
index, the address where the value gets stored), but the memory model
is concerned only with the store itself -- its value and its address
-- not the computation leading up to it.

Events in the LKMM can be linked by various relations, which we will
describe in the following sections. The memory model requires certain
of these relations to be orderings, that is, it requires them not to
have any cycles.


THE PROGRAM ORDER RELATION: po AND po-loc
-----------------------------------------

The most important relation between events is program order (po). You
can think of it as the order in which statements occur in the source
code after branches are taken into account and loops have been
unrolled. A better description might be the order in which
instructions are presented to a CPU's execution unit. Thus, we say
that X is po-before Y (written as "X ->po Y" in formulas) if X occurs
before Y in the instruction stream.

This is inherently a single-CPU relation; two instructions executing
on different CPUs are never linked by po. Also, it is by definition
an ordering so it cannot have any cycles.

po-loc is a sub-relation of po. It links two memory accesses when the
first comes before the second in program order and they access the
same memory location (the "-loc" suffix).
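
For example, in the following sketch:

	int x, y;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 1);	/* A */
		r1 = READ_ONCE(y);	/* B */
		WRITE_ONCE(x, 2);	/* C */
	}

we have A ->po B ->po C, and also A ->po-loc C because A and C both
access x. A and B are linked only by po, not po-loc, since they
access different locations.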

Although this may seem straightforward, there is one subtle aspect to
program order we need to explain. The LKMM was inspired by low-level
architectural memory models which describe the behavior of machine
code, and it retains their outlook to a considerable extent. The
read, write, and fence events used by the model are close in spirit to
individual machine instructions. Nevertheless, the LKMM describes
kernel code written in C, and the mapping from C to machine code can
be extremely complex.

Optimizing compilers have great freedom in the way they translate
source code to object code. They are allowed to apply transformations
that add memory accesses, eliminate accesses, combine them, split them
into pieces, or move them around. The use of READ_ONCE(), WRITE_ONCE(),
or one of the other atomic or synchronization primitives prevents a
large number of compiler optimizations. In particular, it is guaranteed
that the compiler will not remove such accesses from the generated code
(unless it can prove the accesses will never be executed), it will not
change the order in which they occur in the code (within limits imposed
by the C standard), and it will not introduce extraneous accesses.

The MP and SB examples above used READ_ONCE() and WRITE_ONCE() rather
than ordinary memory accesses. Thanks to this usage, we can be certain
that in the MP example, the compiler won't reorder P0's write event to
buf and P0's write event to flag, and similarly for the other shared
memory accesses in the examples.

Since private variables are not shared between CPUs, they can be
accessed normally without READ_ONCE() or WRITE_ONCE(). In fact, they
need not even be stored in normal memory at all -- in principle a
private variable could be stored in a CPU register (hence the convention
that these variables have names starting with the letter 'r').


A WARNING
---------

The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
not perfect; and under some circumstances it is possible for the
compiler to undermine the memory model. Here is an example. Suppose
both branches of an "if" statement store the same value to the same
location:

	r1 = READ_ONCE(x);
	if (r1) {
		WRITE_ONCE(y, 2);
		... /* do something */
	} else {
		WRITE_ONCE(y, 2);
		... /* do something else */
	}

For this code, the LKMM predicts that the load from x will always be
executed before either of the stores to y. However, a compiler could
lift the stores out of the conditional, transforming the code into
something resembling:

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, 2);
	if (r1) {
		... /* do something */
	} else {
		... /* do something else */
	}

Given this version of the code, the LKMM would predict that the load
from x could be executed after the store to y. Thus, the memory
model's original prediction could be invalidated by the compiler.

Another issue arises from the fact that in C, arguments to many
operators and function calls can be evaluated in any order. For
example:

	r1 = f(5) + g(6);

The object code might call f(5) either before or after g(6); the
memory model cannot assume there is a fixed program order relation
between them. (In fact, if the function calls are inlined then the
compiler might even interleave their object code.)
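
If a definite order between the two calls is needed, one sketch of a
remedy is to make them in separate statements, since the C standard
does sequence one full statement before the next:

	r1 = f(5);	/* f(5) is now called first, */
	r2 = g(6);	/* then g(6) */
	r3 = r1 + r2;

(Even so, as noted above, the compiler retains some freedom to
rearrange the generated code unless the shared-memory accesses inside
f() and g() use the _ONCE() primitives or similar.)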


DEPENDENCY RELATIONS: data, addr, and ctrl
------------------------------------------

We say that two events are linked by a dependency relation when the
execution of the second event depends in some way on a value obtained
from memory by the first. The first event must be a read, and the
value it obtains must somehow affect what the second event does.
There are three kinds of dependencies: data, address (addr), and
control (ctrl).

A read and a write event are linked by a data dependency if the value
obtained by the read affects the value stored by the write. As a very
simple example:

	int x, y;

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, r1 + 5);

The value stored by the WRITE_ONCE obviously depends on the value
loaded by the READ_ONCE. Such dependencies can wind through
arbitrarily complicated computations, and a write can depend on the
values of multiple reads.
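
For instance, in this sketch the store to z is linked by a data
dependency to each of the two loads:

	int x, y, z;

	r1 = READ_ONCE(x);
	r2 = READ_ONCE(y);
	WRITE_ONCE(z, r1 * r2);	/* depends on both loaded values */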

A read event and another memory access event are linked by an address
dependency if the value obtained by the read affects the location
accessed by the other event. The second event can be either a read or
a write. Here's another simple example:

	int a[20];
	int i;

	r1 = READ_ONCE(i);
	r2 = READ_ONCE(a[r1]);

Here the location accessed by the second READ_ONCE() depends on the
index value loaded by the first. Pointer indirection also gives rise
to address dependencies, since the address of a location accessed
through a pointer will depend on the value read earlier from that
pointer.
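
As a sketch of the pointer case:

	int y;
	int *p = &y;
	int *r1;
	int r2;

	r1 = READ_ONCE(p);
	r2 = READ_ONCE(*r1);	/* addr dependency from the load of p */

The location accessed by the second load is whatever address the
first load obtained from p.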

Finally, a read event and another memory access event are linked by a
control dependency if the value obtained by the read affects whether
the second event is executed at all. Simple example:

	int x, y;

	r1 = READ_ONCE(x);
	if (r1)
		WRITE_ONCE(y, 1984);

Execution of the WRITE_ONCE() is controlled by a conditional expression
which depends on the value obtained by the READ_ONCE(); hence there is
a control dependency from the load to the store.

It should be pretty obvious that events can only depend on reads that
come earlier in program order. Symbolically, if we have R ->data X,
R ->addr X, or R ->ctrl X (where R is a read event), then we must also
have R ->po X. It wouldn't make sense for a computation to depend
somehow on a value that doesn't get loaded from shared memory until
later in the code!


THE READS-FROM RELATION: rf, rfi, and rfe
-----------------------------------------

The reads-from relation (rf) links a write event to a read event when
the value loaded by the read is the value that was stored by the
write. In colloquial terms, the load "reads from" the store. We
write W ->rf R to indicate that the load R reads from the store W. We
further distinguish the cases where the load and the store occur on
the same CPU (internal reads-from, or rfi) and where they occur on
different CPUs (external reads-from, or rfe).

For our purposes, a memory location's initial value is treated as
though it had been written there by an imaginary initial store that
executes on a separate CPU before the main program runs.
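
For example, consider this sketch:

	int x = 0;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 1);	/* W */
		r1 = READ_ONCE(x);	/* R */
	}

	P1()
	{
		int r2;

		r2 = READ_ONCE(x);	/* R' */
	}

In executions where r1 = 1 we have W ->rfi R, since the store and the
load occur on the same CPU; in executions where r2 = 1 we have
W ->rfe R', since P0 and P1 run on different CPUs.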

Usage of the rf relation implicitly assumes that loads will always
read from a single store. It doesn't apply properly in the presence
of load-tearing, where a load obtains some of its bits from one store
and some of them from another store. Fortunately, use of READ_ONCE()
and WRITE_ONCE() will prevent load-tearing; it's not possible to have:

	int x = 0;

	P0()
	{
		WRITE_ONCE(x, 0x1234);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x);
	}

and end up with r1 = 0x1200 (partly from x's initial value and partly
from the value stored by P0).

On the other hand, load-tearing is unavoidable when mixed-size
accesses are used. Consider this example:

	union {
		u32 w;
		u16 h[2];
	} x;

	P0()
	{
		WRITE_ONCE(x.h[0], 0x1234);
		WRITE_ONCE(x.h[1], 0x5678);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x.w);
	}

If r1 = 0x56781234 (little-endian!) at the end, then P1 must have read
from both of P0's stores. It is possible to handle mixed-size and
unaligned accesses in a memory model, but the LKMM currently does not
attempt to do so. It requires all accesses to be properly aligned and
of the location's actual size.


CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
------------------------------------------------------------------

Cache coherence is a general principle requiring that in a
multi-processor system, the CPUs must share a consistent view of the
memory contents. Specifically, it requires that for each location in
shared memory, the stores to that location must form a single global
ordering which all the CPUs agree on (the coherence order), and this
ordering must be consistent with the program order for accesses to
that location.

To put it another way, for any variable x, the coherence order (co) of
the stores to x is simply the order in which the stores overwrite one
another. The imaginary store which establishes x's initial value
comes first in the coherence order; the store which directly
overwrites the initial value comes second; the store which overwrites
that value comes third, and so on.

You can think of the coherence order as being the order in which the
stores reach x's location in memory (or if you prefer a more
hardware-centric view, the order in which the stores get written to
x's cache line). We write W ->co W' if W comes before W' in the
coherence order, that is, if the value stored by W gets overwritten,
directly or indirectly, by the value stored by W'.

Coherence order is required to be consistent with program order. This
requirement takes the form of four coherency rules:

	Write-write coherence: If W ->po-loc W' (i.e., W comes before
	W' in program order and they access the same location), where W
	and W' are two stores, then W ->co W'.

	Write-read coherence: If W ->po-loc R, where W is a store and R
	is a load, then R must read from W or from some other store
	which comes after W in the coherence order.

	Read-write coherence: If R ->po-loc W, where R is a load and W
	is a store, then the store which R reads from must come before
	W in the coherence order.

	Read-read coherence: If R ->po-loc R', where R and R' are two
	loads, then either they read from the same store or else the
	store read by R comes before the store read by R' in the
	coherence order.

This is sometimes referred to as sequential consistency per variable,
because it means that the accesses to any single memory location obey
the rules of the Sequential Consistency memory model. (According to
Wikipedia, sequential consistency per variable and cache coherence
mean the same thing except that cache coherence includes an extra
requirement that every store eventually becomes visible to every CPU.)

Any reasonable memory model will include cache coherence. Indeed, our
expectation of cache coherence is so deeply ingrained that violations
of its requirements look more like hardware bugs than programming
errors:

	int x;

	P0()
	{
		WRITE_ONCE(x, 17);
		WRITE_ONCE(x, 23);
	}

If the final value stored in x after this code ran was 17, you would
think your computer was broken. It would be a violation of the
write-write coherence rule: Since the store of 23 comes later in
program order, it must also come later in x's coherence order and
thus must overwrite the store of 17.
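
The write-read coherence rule can be violated in a similarly
unthinkable way; here is a sketch along the same lines:

	int x = 0;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 88);
		r1 = READ_ONCE(x);
	}

If r1 = 0 at the end, this would violate the write-read coherence
rule: The load comes after the store in program order, so it must
read either from that store or from one coming later in the coherence
order, not from x's initial value.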
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) WRITE_ONCE(x, 666);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) If r1 = 666 at the end, this would violate the read-write coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) rule: The READ_ONCE() load comes before the WRITE_ONCE() store in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) program order, so it must not read from that store but rather from one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) coming earlier in the coherence order (in this case, x's initial
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) value).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) WRITE_ONCE(x, 5);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) r2 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) If r1 = 5 (reading from P0's store) and r2 = 0 (reading from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) imaginary store which establishes x's initial value) at the end, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) would violate the read-read coherence rule: The r1 load comes before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) the r2 load in program order, so it must not read from a store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) comes later in the coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) (As a minor curiosity, if this code had used normal loads instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) READ_ONCE() in P1, on Itanium it sometimes could end up with r1 = 5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) and r2 = 0! This results from parallel execution of the operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) encoded in Itanium's Very-Long-Instruction-Word format, and it is yet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) another motivation for using READ_ONCE() when accessing shared memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) locations.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) Just like the po relation, co is inherently an ordering -- it is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) possible for a store to directly or indirectly overwrite itself! And
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) just like with the rf relation, we distinguish between stores that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) occur on the same CPU (internal coherence order, or coi) and stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) that occur on different CPUs (external coherence order, or coe).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) On the other hand, stores to different memory locations are never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) related by co, just as instructions on different CPUs are never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) related by po. Coherence order is strictly per-location, or if you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) prefer, each location has its own independent coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) THE FROM-READS RELATION: fr, fri, and fre
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) -----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) The from-reads relation (fr) can be a little difficult for people to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) grok. It describes the situation where a load reads a value that gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) overwritten by a store. In other words, we have R ->fr W when the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) value that R reads is overwritten (directly or indirectly) by W, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) equivalently, when R reads from a store which comes earlier than W in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) the coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) WRITE_ONCE(x, 2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) The value loaded from x will be 0 (assuming cache coherence!), and it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) gets overwritten by the value 2. Thus there is an fr link from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) READ_ONCE() to the WRITE_ONCE(). If the code contained any later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) stores to x, there would also be fr links from the READ_ONCE() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) As with rf, rfi, and rfe, we subdivide the fr relation into fri (when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) the load and the store are on the same CPU) and fre (when they are on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) different CPUs).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) Note that the fr relation is determined entirely by the rf and co
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) relations; it is not independent. Given a read event R and a write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) event W for the same location, we will have R ->fr W if and only if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) the write which R reads from is co-before W. In symbols,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) (R ->fr W) := (there exists W' with W' ->rf R and W' ->co W).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) AN OPERATIONAL MODEL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) The LKMM is based on various operational memory models, meaning that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) the models arise from an abstract view of how a computer system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) operates. Here are the main ideas, as incorporated into the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) The system as a whole is divided into the CPUs and a memory subsystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) The CPUs are responsible for executing instructions (not necessarily
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) in program order), and they communicate with the memory subsystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) For the most part, executing an instruction requires a CPU to perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) only internal operations. However, loads, stores, and fences involve
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) more.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) When CPU C executes a store instruction, it tells the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) to store a certain value at a certain location. The memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) propagates the store to all the other CPUs as well as to RAM. (As a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) special case, we say that the store propagates to its own CPU at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) time it is executed.) The memory subsystem also determines where the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) store falls in the location's coherence order. In particular, it must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) arrange for the store to be co-later than (i.e., to overwrite) any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) other store to the same location which has already propagated to CPU C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) When a CPU executes a load instruction R, it first checks to see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) whether there are any as-yet unexecuted store instructions, for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) same location, that come before R in program order. If there are, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) uses the value of the po-latest such store as the value obtained by R,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) and we say that the store's value is forwarded to R. Otherwise, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) CPU asks the memory subsystem for the value to load and we say that R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) is satisfied from memory. The memory subsystem hands back the value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) of the co-latest store to the location in question which has already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) propagated to that CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) (In fact, the picture needs to be a little more complicated than this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) CPUs have local caches, and propagating a store to a CPU really means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) propagating it to the CPU's local cache. A local cache can take some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) time to process the stores that it receives, and a store can't be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) to satisfy one of the CPU's loads until it has been processed. On
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) most architectures, the local caches process stores in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) First-In-First-Out order, and consequently the processing delay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) doesn't matter for the memory model. But on Alpha, the local caches
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) have a partitioned design that results in non-FIFO behavior. We will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) discuss this in more detail later.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) Note that load instructions may be executed speculatively and may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) restarted under certain circumstances. The memory model ignores these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) premature executions; we simply say that the load executes at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) final time it is forwarded or satisfied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) Executing a fence (or memory barrier) instruction doesn't require a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) CPU to do anything special other than informing the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) about the fence. However, fences do constrain the way CPUs and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) memory subsystem handle other instructions, in two respects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) First, a fence forces the CPU to execute various instructions in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) program order. Exactly which instructions are ordered depends on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) type of fence:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) Strong fences, including smp_mb() and synchronize_rcu(), force
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) the CPU to execute all po-earlier instructions before any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) po-later instructions;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784) smp_rmb() forces the CPU to execute all po-earlier loads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) before any po-later loads;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) smp_wmb() forces the CPU to execute all po-earlier stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) before any po-later stores;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) Acquire fences, such as smp_load_acquire(), force the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) execute the load associated with the fence (e.g., the load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) part of an smp_load_acquire()) before any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) instructions;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) Release fences, such as smp_store_release(), force the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) execute all po-earlier instructions before the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) associated with the fence (e.g., the store part of an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) smp_store_release()).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) Second, some types of fence affect the way the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) propagates stores. When a fence instruction is executed on CPU C:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) For each other CPU C', smp_wmb() forces all po-earlier stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) on C to propagate to C' before any po-later stores do.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) For each other CPU C', any store which propagates to C before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) a release fence is executed (including all po-earlier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808) stores executed on C) is forced to propagate to C' before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) store associated with the release fence does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) Any store which propagates to C before a strong fence is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) executed (including all po-earlier stores on C) is forced to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) propagate to all other CPUs before any instructions po-after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) the strong fence are executed on C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) The propagation ordering enforced by release fences and strong fences
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) affects stores from other CPUs that propagate to CPU C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818) fence is executed, as well as stores that are executed on C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) fence. We describe this property by saying that release fences and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) strong fences are A-cumulative. By contrast, smp_wmb() fences are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) A-cumulative; they only affect the propagation of stores that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) executed on C before the fence (i.e., those which precede the fence in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823) program order).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) other properties which we discuss later.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) PROPAGATION ORDER RELATION: cumul-fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) The fences which affect propagation order (i.e., strong, release, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) smp_wmb() fences) are collectively referred to as cumul-fences, even
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) though smp_wmb() isn't A-cumulative. The cumul-fence relation is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) defined to link memory access events E and F whenever:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) E and F are both stores on the same CPU and an smp_wmb() fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) event occurs between them in program order; or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) F is a release fence and some X comes before F in program order,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) where either X = E or else E ->rf X; or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) A strong fence event occurs between some X and F in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) order, where either X = E or else E ->rf X.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) The operational model requires that whenever W and W' are both stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) and W ->cumul-fence W', then W must propagate to any given CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) before W' does. However, for different CPUs C and C', it does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) require W to propagate to C before W' propagates to C'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) -------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) The LKMM is derived from the restrictions imposed by the design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) outlined above. These restrictions involve the necessity of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) maintaining cache coherence and the fact that a CPU can't operate on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) value before it knows what that value is, among other things.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) The formal version of the LKMM is defined by six requirements, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) axioms:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) Sequential consistency per variable: This requires that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) system obey the four coherency rules.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) Atomicity: This requires that atomic read-modify-write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) operations really are atomic, that is, no other stores can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) sneak into the middle of such an update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) Happens-before: This requires that certain instructions are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) executed in a specific order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) Propagation: This requires that certain stores propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) CPUs and to RAM in a specific order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) Rcu: This requires that RCU read-side critical sections and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) grace periods obey the rules of RCU, in particular, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) Grace-Period Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) Plain-coherence: This requires that plain memory accesses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) (those not using READ_ONCE(), WRITE_ONCE(), etc.) must obey
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) the operational model's rules regarding cache coherence.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) The first and second are quite common; they can be found in many
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885) memory models (such as those for C11/C++11). The "happens-before" and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) "propagation" axioms have analogs in other memory models as well. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) "rcu" and "plain-coherence" axioms are specific to the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) Each of these axioms is discussed below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892) SEQUENTIAL CONSISTENCY PER VARIABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) According to the principle of cache coherence, the stores to any fixed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) shared location in memory form a global ordering. We can imagine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) inserting the loads from that location into this ordering, by placing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) each load between the store that it reads from and the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) store. This leaves the relative positions of loads that read from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) same store unspecified; let's say they are inserted in program order,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) first for CPU 0, then CPU 1, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903) You can check that the four coherency rules imply that the rf, co, fr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) and po-loc relations agree with this global ordering; in other words,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) whenever we have X ->rf Y or X ->co Y or X ->fr Y or X ->po-loc Y, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) X event comes before the Y event in the global ordering. The LKMM's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) "coherence" axiom expresses this by requiring the union of these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) relations not to have any cycles. This means it must not be possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) to find events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) X0 -> X1 -> X2 -> ... -> Xn -> X0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913) where each of the links is either rf, co, fr, or po-loc. This has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) hold if the accesses to the fixed memory location can be ordered as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) cache coherence demands.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) Although it is not obvious, it can be shown that the converse is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) true: This LKMM axiom implies that the four coherency rules are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) obeyed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) ATOMIC UPDATES: rmw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) What does it mean to say that a read-modify-write (rmw) update, such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) as atomic_inc(&x), is atomic? It means that the memory location (x in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) this case) does not get altered between the read and the write events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) making up the atomic operation. In particular, if two CPUs perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) atomic_inc(&x) concurrently, it must be guaranteed that the final
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) value of x will be the initial value plus two. We should never have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) the following sequence of events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) CPU 0 loads x obtaining 13;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) CPU 1 loads x obtaining 13;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935) CPU 0 stores 14 to x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) CPU 1 stores 14 to x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) where the final value of x is wrong (14 rather than 15).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) In this example, CPU 0's increment effectively gets lost because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) occurs in between CPU 1's load and store. To put it another way, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) problem is that the position of CPU 0's store in x's coherence order
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) is between the store that CPU 1 reads from and the store that CPU 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944) performs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946) The same analysis applies to all atomic update operations. Therefore,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) to enforce atomicity the LKMM requires that atomic updates follow this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) rule: Whenever R and W are the read and write events composing an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) atomic read-modify-write and W' is the write event which R reads from,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) there must not be any stores coming between W' and W in the coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951) order. Equivalently,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) (R ->rmw W) implies (there is no X with R ->fr X and X ->co W),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955) where the rmw relation links the read and write events making up each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) atomic update. This is what the LKMM's "atomic" axiom says.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959) THE PRESERVED PROGRAM ORDER RELATION: ppo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) -----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) There are many situations where a CPU is obliged to execute two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) instructions in program order. We amalgamate them into the ppo (for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) "preserved program order") relation, which links the po-earlier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) instruction to the po-later instruction and is thus a sub-relation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966) po.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) The operational model already includes a description of one such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969) situation: Fences are a source of ppo links. Suppose X and Y are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) memory accesses with X ->po Y; then the CPU must execute X before Y if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) any of the following hold:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973) A strong (smp_mb() or synchronize_rcu()) fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) X and Y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) X and Y are both stores and an smp_wmb() fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) them;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) X and Y are both loads and an smp_rmb() fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) them;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) X is also an acquire fence, such as smp_load_acquire();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984) Y is also a release fence, such as smp_store_release().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) Another possibility, not mentioned earlier but discussed in the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987) section, is:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) X and Y are both loads, X ->addr Y (i.e., there is an address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990) dependency from X to Y), and X is a READ_ONCE() or an atomic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) Dependencies can also cause instructions to be executed in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994) order. This is uncontroversial when the second instruction is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) store; either a data, address, or control dependency from a load R to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996) a store W will force the CPU to execute R before W. This is very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997) simply because the CPU cannot tell the memory subsystem about W's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998) store before it knows what value should be stored (in the case of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999) data dependency), what location it should be stored into (in the case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) of an address dependency), or whether the store should actually take
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) place (in the case of a control dependency).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) Dependencies to load instructions are more problematic. To begin with,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) there is no such thing as a data dependency to a load. Next, a CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) has no reason to respect a control dependency to a load, because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) can always satisfy the second load speculatively before the first, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) then ignore the result if it turns out that the second load shouldn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) be executed after all. And lastly, the real difficulties begin when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) we consider address dependencies to loads.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) To be fair about it, all Linux-supported architectures do execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) loads in program order if there is an address dependency between them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) After all, a CPU cannot ask the memory subsystem to load a value from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) a particular location before it knows what that location is. However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) the split-cache design used by Alpha can cause it to behave in a way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) that looks as if the loads were executed out of order (see the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) section for more details). The kernel includes a workaround for this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) problem when the loads come from READ_ONCE(), and therefore the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) includes address dependencies to loads in the ppo relation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) On the other hand, dependencies can indirectly affect the ordering of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) two loads. This happens when there is a dependency from a load to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) store and a second, po-later load reads from that store:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) R ->dep W ->rfi R',
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) where the dep link can be either an address or a data dependency. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) this situation we know it is possible for the CPU to execute R' before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) W, because it can forward the value that W will store to R'. But it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) cannot execute R' before R, because it cannot forward the value before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) it knows what that value is, or that W and R' do access the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) location. However, if there is merely a control dependency between R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) and W then the CPU can speculatively forward W to R' before executing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) R; if the speculation turns out to be wrong then the CPU merely has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) restart or abandon R'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) (In theory, a CPU might forward a store to a load when it runs across
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) an address dependency like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) r1 = READ_ONCE(ptr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) WRITE_ONCE(*r1, 17);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) because it could tell that the store and the second load access the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) same location even before it knows what the location's address is.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) However, none of the architectures supported by the Linux kernel do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) this.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) Two memory accesses of the same location must always be executed in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) program order if the second access is a store. Thus, if we have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) R ->po-loc W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) (the po-loc link says that R comes before W in program order and they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) access the same location), the CPU is obliged to execute W after R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) If it executed W first then the memory subsystem would respond to R's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) read request with the value stored by W (or an even later store), in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) violation of the read-write coherence rule. Similarly, if we had
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) W ->po-loc W'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) and the CPU executed W' before W, then the memory subsystem would put
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) W' before W in the coherence order. It would effectively cause W to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) overwrite W', in violation of the write-write coherence rule.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) (Interestingly, an early ARMv8 memory model, now obsolete, proposed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) allowing out-of-order writes like this to occur. The model avoided
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) violating the write-write coherence rule by requiring the CPU not to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) send the W write to the memory subsystem at all!)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) AND THEN THERE WAS ALPHA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) As mentioned above, the Alpha architecture is unique in that it does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) not appear to respect address dependencies to loads. This means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) code such as the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) int y = -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) int *ptr = &y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) WRITE_ONCE(ptr, &x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) r1 = ptr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) can malfunction on Alpha systems (notice that P1 uses an ordinary load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) to read ptr instead of READ_ONCE()). It is quite possible that r1 = &x
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) and r2 = 0 at the end, in spite of the address dependency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) At first glance this doesn't seem to make sense. We know that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) smp_wmb() forces P0's store to x to propagate to P1 before the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) to ptr does. And since P1 can't execute its second load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) until it knows what location to load from, i.e., after executing its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) first load, the value x = 1 must have propagated to P1 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) second load executed. So why doesn't r2 end up equal to 1?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) The answer lies in the Alpha's split local caches. Although the two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) stores do reach P1's local cache in the proper order, it can happen
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) that the first store is processed by a busy part of the cache while
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) the second store is processed by an idle part. As a result, the x = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) value may not become available for P1's CPU to read until after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) ptr = &x value does, leading to the undesirable result above. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) final effect is that even though the two loads really are executed in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) program order, it appears that they aren't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) This could not have happened if the local cache had processed the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) incoming stores in FIFO order. By contrast, other architectures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) maintain at least the appearance of FIFO order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) In practice, this difficulty is solved by inserting a special fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) between P1's two loads when the kernel is compiled for the Alpha
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) architecture. In fact, as of version 4.15, the kernel automatically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) adds this fence after every READ_ONCE() and atomic load on Alpha. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) effect of the fence is to cause the CPU not to execute any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) instructions until after the local cache has finished processing all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) the stores it has already received. Thus, if the code was changed to:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) r1 = READ_ONCE(ptr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) then we would never get r1 = &x and r2 = 0. By the time P1 executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) its second load, the x = 1 store would already be fully processed by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) the local cache and available for satisfying the read request. Thus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) we have yet another reason why shared data should always be read with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) READ_ONCE() or another synchronization primitive rather than accessed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) The LKMM requires that smp_rmb(), acquire fences, and strong fences
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) share this property: They do not allow the CPU to execute any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) instructions (or po-later loads in the case of smp_rmb()) until all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) outstanding stores have been processed by the local cache. In the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) case of a strong fence, the CPU first has to wait for all of its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) po-earlier stores to propagate to every other CPU in the system; then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) it has to wait for the local cache to process all the stores received
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) as of that time -- not just the stores received when the strong fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) began.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156) And of course, none of this matters for any architecture other than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) Alpha.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160) THE HAPPENS-BEFORE RELATION: hb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) -------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) The happens-before relation (hb) links memory accesses that have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) execute in a certain order. hb includes the ppo relation and two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) others, one of which is rfe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) W ->rfe R implies that W and R are on different CPUs. It also means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) that W's store must have propagated to R's CPU before R executed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) otherwise R could not have read the value stored by W. Therefore W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170) must have executed before R, and so we have W ->hb R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172) The equivalent fact need not hold if W ->rfi R (i.e., W and R are on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) the same CPU). As we have already seen, the operational model allows
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) W's value to be forwarded to R in such cases, meaning that R may well
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175) execute before W does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) It's important to understand that neither coe nor fre is included in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) hb, despite their similarities to rfe. For example, suppose we have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179) W ->coe W'. This means that W and W' are stores to the same location,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) they execute on different CPUs, and W comes before W' in the coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) order (i.e., W' overwrites W). Nevertheless, it is possible for W' to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) execute before W, because the decision as to which store overwrites
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) the other is made later by the memory subsystem. When the stores are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) nearly simultaneous, either one can come out on top. Similarly,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) R ->fre W means that W overwrites the value which R reads, but it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186) doesn't mean that W has to execute after R. All that's necessary is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) for the memory subsystem not to propagate W to R's CPU until after R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) has executed, which is possible if W executes shortly before R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) The third relation included in hb is like ppo, in that it only links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) events that are on the same CPU. However it is more difficult to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) explain, because it arises only indirectly from the requirement of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) cache coherence. The relation is called prop, and it links two events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) on CPU C in situations where a store from some other CPU comes after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) the first event in the coherence order and propagates to C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) second event executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) This is best explained with some examples. The simplest case looks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) int x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213) WRITE_ONCE(x, 8);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216) If r1 = 8 at the end then P0's accesses must have executed in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) order. We can deduce this from the operational model; if P0's load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218) had executed before its store then the value of the store would have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) been forwarded to the load, so r1 would have ended up equal to 1, not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) 8. In this case there is a prop link from P0's write event to its read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221) event, because P1's store came after P0's store in x's coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222) order, and P1's store propagated to P0 before P0's load executed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) An equally simple case involves two loads of the same location that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) read from different stores:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) r2 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) WRITE_ONCE(x, 9);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242) If r1 = 0 and r2 = 9 at the end then P0's accesses must have executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243) in program order. If the second load had executed before the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244) then the x = 9 store must have been propagated to P0 before the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) load executed, and so r1 would have been 9 rather than 0. In this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246) case there is a prop link from P0's first read event to its second,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) because P1's store overwrote the value read by P0's first load, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) P1's store propagated to P0 before P0's second load executed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) Less trivial examples of prop all involve fences. Unlike the simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) examples above, they can require that some instructions are executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252) out of program order. This next one should look familiar:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) int buf = 0, flag = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) WRITE_ONCE(buf, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) WRITE_ONCE(flag, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) r1 = READ_ONCE(flag);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269) r2 = READ_ONCE(buf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) This is the MP pattern again, with an smp_wmb() fence between the two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) stores. If r1 = 1 and r2 = 0 at the end then there is a prop link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274) from P1's second load to its first (backwards!). The reason is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) similar to the previous examples: The value P1 loads from buf gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) overwritten by P0's store to buf, the fence guarantees that the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277) to buf will propagate to P1 before the store to flag does, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278) store to flag propagates to P1 before P1 reads flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280) The prop link says that in order to obtain the r1 = 1, r2 = 0 result,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) P1 must execute its second load before the first. Indeed, if the load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) from flag were executed first, then the buf = 1 store would already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283) have propagated to P1 by the time P1's load from buf executed, so r2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) would have been 1 at the end, not 0. (The reasoning holds even for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) Alpha, although the details are more complicated and we will not go
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286) into them.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) But what if we put an smp_rmb() fence between P1's loads? The fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289) would force the two loads to be executed in program order, and it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) would generate a cycle in the hb relation: The fence would create a ppo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) link (hence an hb link) from the first load to the second, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) prop relation would give an hb link from the second load to the first.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) Since an instruction can't execute before itself, we are forced to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294) conclude that if an smp_rmb() fence is added, the r1 = 1, r2 = 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295) outcome is impossible -- as it should be.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297) The formal definition of the prop relation involves a coe or fre link,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) followed by an arbitrary number of cumul-fence links, ending with an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299) rfe link. You can concoct more exotic examples, containing more than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300) one fence, although this quickly leads to diminishing returns in terms
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) of complexity. For instance, here's an example containing a coe link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) followed by two cumul-fences and an rfe link, utilizing the fact that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) release fences are A-cumulative:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) int x, y, z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) r0 = READ_ONCE(z);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317) WRITE_ONCE(x, 2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) smp_store_release(&z, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) If x = 2, r0 = 1, and r2 = 1 after this code runs then there is a prop
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331) link from P0's store to its load. This is because P0's store gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) overwritten by P1's store since x = 2 at the end (a coe link), the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) smp_wmb() ensures that P1's store to x propagates to P2 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) store to y does (the first cumul-fence), the store to y propagates to P2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335) before P2's load and store execute, P2's smp_store_release()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336) guarantees that the stores to x and y both propagate to P0 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) store to z does (the second cumul-fence), and P0's load executes after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) store to z has propagated to P0 (an rfe link).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) In summary, the fact that the hb relation links memory access events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341) in the order they execute means that it must not have cycles. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342) requirement is the content of the LKMM's "happens-before" axiom.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) The LKMM defines yet another relation connected to times of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) instruction execution, but it is not included in hb. It relies on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346) particular properties of strong fences, which we cover in the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347) section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) THE PROPAGATES-BEFORE RELATION: pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351) ----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) The propagates-before (pb) relation capitalizes on the special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354) features of strong fences. It links two events E and F whenever some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) store is coherence-later than E and propagates to every CPU and to RAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) before F executes. The formal definition requires that E be linked to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) F via a coe or fre link, an arbitrary number of cumul-fences, an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358) optional rfe link, a strong fence, and an arbitrary number of hb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359) links. Let's see how this definition works out.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) Consider first the case where E is a store (implying that the sequence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362) of links begins with coe). Then there are events W, X, Y, and Z such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365) E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) where the * suffix indicates an arbitrary number of links of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) specified type, and the ? suffix indicates the link is optional (Y may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369) be equal to X). Because of the cumul-fence links, we know that W will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) propagate to Y's CPU before X does, hence before Y executes and hence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371) before the strong fence executes. Because this fence is strong, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) know that W will propagate to every CPU and to RAM before Z executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) And because of the hb links, we know that Z will execute before F.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) Thus W, which comes later than E in the coherence order, will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) propagate to every CPU and to RAM before F executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377) The case where E is a load is exactly the same, except that the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) link in the sequence is fre instead of coe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380) The existence of a pb link from E to F implies that E must execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) before F. To see why, suppose that F executed first. Then W would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) have propagated to E's CPU before E executed. If E was a store, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) memory subsystem would then be forced to make E come after W in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384) coherence order, contradicting the fact that E ->coe W. If E was a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385) load, the memory subsystem would then be forced to satisfy E's read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) request with the value stored by W or an even later store,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) contradicting the fact that E ->fre W.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389) A good example illustrating how pb works is the SB pattern with strong
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) fences:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392) int x = 0, y = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399) smp_mb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) r0 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1401) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1402)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1403) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1404) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1405) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1406)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1407) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1408) smp_mb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1409) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1410) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1411)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1412) If r0 = 0 at the end then there is a pb link from P0's load to P1's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1413) load: an fre link from P0's load to P1's store (which overwrites the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1414) value read by P0), and a strong fence between P1's store and its load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1415) In this example, the sequences of cumul-fence and hb links are empty.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1416) Note that this pb link is not included in hb as an instance of prop,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1417) because it does not start and end on the same CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1418)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1419) Similarly, if r1 = 0 at the end then there is a pb link from P1's load
to P0's. This means that if both r0 and r1 were 0 there would be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1421) cycle in pb, which is not possible since an instruction cannot execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1422) before itself. Thus, adding smp_mb() fences to the SB pattern
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1423) prevents the r0 = 0, r1 = 0 outcome.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1424)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1425) In summary, the fact that the pb relation links events in the order
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1426) they execute means that it cannot have cycles. This requirement is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1427) the content of the LKMM's "propagation" axiom.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1428)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1429)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1430) RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1431) ------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1432)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1433) RCU (Read-Copy-Update) is a powerful synchronization mechanism. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1434) rests on two concepts: grace periods and read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1435)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1436) A grace period is the span of time occupied by a call to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1437) synchronize_rcu(). A read-side critical section (or just critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1438) section, for short) is a region of code delimited by rcu_read_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1439) at the start and rcu_read_unlock() at the end. Critical sections can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1440) be nested, although we won't make use of this fact.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1442) As far as memory models are concerned, RCU's main feature is its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1443) Grace-Period Guarantee, which states that a critical section can never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1444) span a full grace period. In more detail, the Guarantee says:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1445)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1446) For any critical section C and any grace period G, at least
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1447) one of the following statements must hold:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1448)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1449) (1) C ends before G does, and in addition, every store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1450) propagates to C's CPU before the end of C must propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1451) every CPU before G ends.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1452)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1453) (2) G starts before C does, and in addition, every store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1454) propagates to G's CPU before the start of G must propagate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1455) to every CPU before C starts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1456)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1457) In particular, it is not possible for a critical section to both start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1458) before and end after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1459)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1460) Here is a simple example of RCU in action:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1461)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1462) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1463)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1464) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1465) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1466) rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1467) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1468) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1469) rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1470) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1471)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1472) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1473) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1474) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1475)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1476) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1477) synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1478) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1479) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1480)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1481) The Grace Period Guarantee tells us that when this code runs, it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1482) never end with r1 = 1 and r2 = 0. The reasoning is as follows. r1 = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1483) means that P0's store to x propagated to P1 before P1 called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1484) synchronize_rcu(), so P0's critical section must have started before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1485) P1's grace period, contrary to part (2) of the Guarantee. On the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1486) other hand, r2 = 0 means that P0's store to y, which occurs before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1487) end of the critical section, did not propagate to P1 before the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1488) the grace period, contrary to part (1). Together the results violate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1489) the Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1490)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1491) In the kernel's implementations of RCU, the requirements for stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1492) to propagate to every CPU are fulfilled by placing strong fences at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1493) suitable places in the RCU-related code. Thus, if a critical section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1494) starts before a grace period does then the critical section's CPU will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1495) execute an smp_mb() fence after the end of the critical section and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1496) some time before the grace period's synchronize_rcu() call returns.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1497) And if a critical section ends after a grace period does then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1498) synchronize_rcu() routine will execute an smp_mb() fence at its start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1499) and some time before the critical section's opening rcu_read_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1500) executes.
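
As a purely conceptual sketch (nothing like the kernel's actual, far
more elaborate implementation), one can picture the fence placement
like this, with wait_for_readers() as a hypothetical placeholder for
the grace-period machinery:

	void toy_synchronize_rcu(void)
	{
		smp_mb();		/* strong fence starting the GP */
		wait_for_readers();	/* hypothetical placeholder */
		smp_mb();		/* strong fence ending the GP */
	}

The read-side fences mentioned above are likewise supplied by the
surrounding RCU machinery (at a context switch, for example) rather
than by rcu_read_lock() or rcu_read_unlock() themselves.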
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1501)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1502) What exactly do we mean by saying that a critical section "starts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1503) before" or "ends after" a grace period? Some aspects of the meaning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1504) are pretty obvious, as in the example above, but the details aren't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1505) entirely clear. The LKMM formalizes this notion by means of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1506) rcu-link relation. rcu-link encompasses a very general notion of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1507) "before": If E and F are RCU fence events (i.e., rcu_read_lock(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1508) rcu_read_unlock(), or synchronize_rcu()) then among other things,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1509) E ->rcu-link F includes cases where E is po-before some memory-access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1510) event X, F is po-after some memory-access event Y, and we have any of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1511) X ->rfe Y, X ->co Y, or X ->fr Y.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1512)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1513) The formal definition of the rcu-link relation is more than a little
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1514) obscure, and we won't give it here. It is closely related to the pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1515) relation, and the details don't matter unless you want to comb through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1516) a somewhat lengthy formal proof. Pretty much all you need to know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1517) about rcu-link is the information in the preceding paragraph.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1518)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1519) The LKMM also defines the rcu-gp and rcu-rscsi relations. They bring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1520) grace periods and read-side critical sections into the picture, in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1521) following way:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1522)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1523) E ->rcu-gp F means that E and F are in fact the same event,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1524) and that event is a synchronize_rcu() fence (i.e., a grace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1525) period).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1526)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1527) E ->rcu-rscsi F means that E and F are the rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1528) and rcu_read_lock() fence events delimiting some read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1529) critical section. (The 'i' at the end of the name emphasizes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1530) that this relation is "inverted": It links the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1531) critical section to the start.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1532)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1533) If we think of the rcu-link relation as standing for an extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1534) "before", then X ->rcu-gp Y ->rcu-link Z roughly says that X is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1535) grace period which ends before Z begins. (In fact it covers more than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1536) this, because it also includes cases where some store propagates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1537) Z's CPU before Z begins but doesn't propagate to some other CPU until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1538) after X ends.) Similarly, X ->rcu-rscsi Y ->rcu-link Z says that X is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1539) the end of a critical section which starts before Z begins.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1540)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1541) The LKMM goes on to define the rcu-order relation as a sequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1542) rcu-gp and rcu-rscsi links separated by rcu-link links, in which the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1543) number of rcu-gp links is >= the number of rcu-rscsi links. For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1544) example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1545)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1546) X ->rcu-gp Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1547)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1548) would imply that X ->rcu-order V, because this sequence contains two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1549) rcu-gp links and one rcu-rscsi link. (It also implies that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1550) X ->rcu-order T and Z ->rcu-order V.) On the other hand:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1551)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1552) X ->rcu-rscsi Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1553)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1554) does not imply X ->rcu-order V, because the sequence contains only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1555) one rcu-gp link but two rcu-rscsi links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1556)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1557) The rcu-order relation is important because the Grace Period Guarantee
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1558) means that rcu-order links act kind of like strong fences. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1559) particular, E ->rcu-order F implies not only that E begins before F
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1560) ends, but also that any write po-before E will propagate to every CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1561) before any instruction po-after F can execute. (However, it does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1562) imply that E must execute before F; in fact, each synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1563) fence event is linked to itself by rcu-order as a degenerate case.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1564)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1565) To prove this in full generality requires some intellectual effort.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1566) We'll consider just a very simple case:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1567)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1568) G ->rcu-gp W ->rcu-link Z ->rcu-rscsi F.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1569)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1570) This formula means that G and W are the same event (a grace period),
and there are events X, Y, and a read-side critical section C such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1572)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1573) 1. G = W is po-before or equal to X;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1574)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1575) 2. X comes "before" Y in some sense (including rfe, co and fr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1576)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1577) 3. Y is po-before Z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1578)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1579) 4. Z is the rcu_read_unlock() event marking the end of C;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1580)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1581) 5. F is the rcu_read_lock() event marking the start of C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1582)
From 1 - 4 we deduce that the grace period G ends before the critical
section C does. Then part (2) of the Grace Period Guarantee says not only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1585) that G starts before C does, but also that any write which executes on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1586) G's CPU before G starts must propagate to every CPU before C starts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1587) In particular, the write propagates to every CPU before F finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1588) executing and hence before any instruction po-after F can execute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1589) This sort of reasoning can be extended to handle all the situations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1590) covered by rcu-order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1591)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1592) The rcu-fence relation is a simple extension of rcu-order. While
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1593) rcu-order only links certain fence events (calls to synchronize_rcu(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1594) rcu_read_lock(), or rcu_read_unlock()), rcu-fence links any events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1595) that are separated by an rcu-order link. This is analogous to the way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1596) the strong-fence relation links events that are separated by an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1597) smp_mb() fence event (as mentioned above, rcu-order links act kind of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1598) like strong fences). Written symbolically, X ->rcu-fence Y means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1599) there are fence events E and F such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1600)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1601) X ->po E ->rcu-order F ->po Y.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1602)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1603) From the discussion above, we see this implies not only that X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1604) executes before Y, but also (if X is a store) that X propagates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1605) every CPU before Y executes. Thus rcu-fence is sort of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1606) "super-strong" fence: Unlike the original strong fences (smp_mb() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1607) synchronize_rcu()), rcu-fence is able to link events on different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1608) CPUs. (Perhaps this fact should lead us to say that rcu-fence isn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1609) really a fence at all!)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1610)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1611) Finally, the LKMM defines the RCU-before (rb) relation in terms of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1612) rcu-fence. This is done in essentially the same way as the pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1613) relation was defined in terms of strong-fence. We will omit the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1614) details; the end result is that E ->rb F implies E must execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1615) before F, just as E ->pb F does (and for much the same reasons).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1616)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1617) Putting this all together, the LKMM expresses the Grace Period
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1618) Guarantee by requiring that the rb relation does not contain a cycle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1619) Equivalently, this "rcu" axiom requires that there are no events E
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1620) and F with E ->rcu-link F ->rcu-order E. Or to put it a third way,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1621) the axiom requires that there are no cycles consisting of rcu-gp and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1622) rcu-rscsi alternating with rcu-link, where the number of rcu-gp links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1623) is >= the number of rcu-rscsi links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1624)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1625) Justifying the axiom isn't easy, but it is in fact a valid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1626) formalization of the Grace Period Guarantee. We won't attempt to go
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1627) through the detailed argument, but the following analysis gives a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1628) taste of what is involved. Suppose both parts of the Guarantee are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1629) violated: A critical section starts before a grace period, and some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1630) store propagates to the critical section's CPU before the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1631) critical section but doesn't propagate to some other CPU until after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1632) the end of the grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1633)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1634) Putting symbols to these ideas, let L and U be the rcu_read_lock() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1635) rcu_read_unlock() fence events delimiting the critical section in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1636) question, and let S be the synchronize_rcu() fence event for the grace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1637) period. Saying that the critical section starts before S means there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1638) are events Q and R where Q is po-after L (which marks the start of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1639) critical section), Q is "before" R in the sense used by the rcu-link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1640) relation, and R is po-before the grace period S. Thus we have:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1641)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1642) L ->rcu-link S.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1643)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1644) Let W be the store mentioned above, let Y come before the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1645) critical section and witness that W propagates to the critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1646) section's CPU by reading from W, and let Z on some arbitrary CPU be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1647) witness that W has not propagated to that CPU, where Z happens after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1648) some event X which is po-after S. Symbolically, this amounts to:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1649)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1650) S ->po X ->hb* Z ->fr W ->rf Y ->po U.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1651)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1652) The fr link from Z to W indicates that W has not propagated to Z's CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1653) at the time that Z executes. From this, it can be shown (see the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1654) discussion of the rcu-link relation earlier) that S and U are related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1655) by rcu-link:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1657) S ->rcu-link U.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1658)
Since S is a grace period we have S ->rcu-gp S, and since L and U are
the start and end of the critical section in question, we have
U ->rcu-rscsi L.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1661) From this we obtain:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1663) S ->rcu-gp S ->rcu-link U ->rcu-rscsi L ->rcu-link S,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1665) a forbidden cycle. Thus the "rcu" axiom rules out this violation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1666) the Grace Period Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1667)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1668) For something a little more down-to-earth, let's see how the axiom
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1669) works out in practice. Consider the RCU code example from above, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1670) time with statement labels added:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1671)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1672) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1673)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1674) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1675) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1676) L: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1677) X: WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1678) Y: WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1679) U: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1680) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1681)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1682) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1683) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1684) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1685)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1686) Z: r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1687) S: synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1688) W: r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1689) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1690)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1692) If r2 = 0 at the end then P0's store at Y overwrites the value that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1693) P1's load at W reads from, so we have W ->fre Y. Since S ->po W and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1694) also Y ->po U, we get S ->rcu-link U. In addition, S ->rcu-gp S
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1695) because S is a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1696)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1697) If r1 = 1 at the end then P1's load at Z reads from P0's store at X,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1698) so we have X ->rfe Z. Together with L ->po X and Z ->po S, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1699) yields L ->rcu-link S. And since L and U are the start and end of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1700) critical section, we have U ->rcu-rscsi L.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1701)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1702) Then U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1703) forbidden cycle, violating the "rcu" axiom. Hence the outcome is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1704) allowed by the LKMM, as we would expect.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1705)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1706) For contrast, let's see what can happen in a more complicated example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1707)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1708) int x, y, z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1710) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1711) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1712) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1714) L0: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1715) r0 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1716) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1717) U0: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1718) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1719)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1720) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1721) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1722) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1723)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1724) r1 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1725) S1: synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1726) WRITE_ONCE(z, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1727) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1728)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1729) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1730) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1731) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1732)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1733) L2: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1734) r2 = READ_ONCE(z);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1735) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1736) U2: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1737) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1738)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1739) If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1740) that U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
L2 ->rcu-link U0. However, this cycle is not forbidden, because the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1742) sequence of relations contains fewer instances of rcu-gp (one) than of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1743) rcu-rscsi (two). Consequently the outcome is allowed by the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1744) The following instruction timing diagram shows how it might actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1745) occur:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1746)
	P0                      P1                      P2
	--------------------    --------------------    --------------------
	rcu_read_lock()
	WRITE_ONCE(y, 1)
	                        r1 = READ_ONCE(y)
	                        synchronize_rcu() starts
	                        .                       rcu_read_lock()
	                        .                       WRITE_ONCE(x, 1)
	r0 = READ_ONCE(x)       .
	rcu_read_unlock()       .
	                        synchronize_rcu() ends
	                        WRITE_ONCE(z, 1)
	                                                r2 = READ_ONCE(z)
	                                                rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1761)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1762) This requires P0 and P2 to execute their loads and stores out of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1763) program order, but of course they are allowed to do so. And as you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1764) can see, the Grace Period Guarantee is not violated: The critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1765) section in P0 both starts before P1's grace period does and ends
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1766) before it does, and the critical section in P2 both starts after P1's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1767) grace period does and ends after it does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1768)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1769) Addendum: The LKMM now supports SRCU (Sleepable Read-Copy-Update) in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1770) addition to normal RCU. The ideas involved are much the same as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1771) above, with new relations srcu-gp and srcu-rscsi added to represent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1772) SRCU grace periods and read-side critical sections. There is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1773) restriction on the srcu-gp and srcu-rscsi links that can appear in an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1774) rcu-order sequence (the srcu-rscsi links must be paired with srcu-gp
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1775) links having the same SRCU domain with proper nesting); the details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1776) are relatively unimportant.
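
Purely for illustration (this sketch is not taken from the model's
litmus-test suite), here is an SRCU analog of the earlier RCU example;
note that the index returned by srcu_read_lock() must be passed to the
matching srcu_read_unlock():

	int x, y;
	struct srcu_struct s;

	P0()
	{
		int r0, idx;

		idx = srcu_read_lock(&s);
		r0 = READ_ONCE(x);
		WRITE_ONCE(y, 1);
		srcu_read_unlock(&s, idx);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(y);
		synchronize_srcu(&s);
		WRITE_ONCE(x, 1);
	}

Because the critical section and the grace period use the same SRCU
domain s, the outcome r0 = 1 and r1 = 1 is forbidden: it would require
P0's critical section to both start before and end after P1's grace
period.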
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1777)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1778)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1779) LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1780) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1781)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1782) The LKMM includes locking. In fact, there is special code for locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1783) in the formal model, added in order to make tools run faster.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1784) However, this special code is intended to be more or less equivalent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1785) to concepts we have already covered. A spinlock_t variable is treated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1786) the same as an int, and spin_lock(&s) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1787)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1788) while (cmpxchg_acquire(&s, 0, 1) != 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1789) cpu_relax();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1790)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1791) This waits until s is equal to 0 and then atomically sets it to 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1792) and the read part of the cmpxchg operation acts as an acquire fence.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1793) An alternate way to express the same thing would be:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1794)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1795) r = xchg_acquire(&s, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1796)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1797) along with a requirement that at the end, r = 0. Similarly,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1798) spin_trylock(&s) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1799)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1800) return !cmpxchg_acquire(&s, 0, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1801)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1802) which atomically sets s to 1 if it is currently equal to 0 and returns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1803) true if it succeeds (the read part of the cmpxchg operation acts as an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1804) acquire fence only if the operation is successful). spin_unlock(&s)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1805) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1806)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1807) smp_store_release(&s, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1808)
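Putting these pieces together, a simple critical section such as
spin_lock(&s); WRITE_ONCE(x, 1); spin_unlock(&s); amounts to more or
less the following sequence:

	while (cmpxchg_acquire(&s, 0, 1) != 0)	/* lock-acquire */
		cpu_relax();
	WRITE_ONCE(x, 1);			/* the critical section */
	smp_store_release(&s, 0);		/* lock-release */
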
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1809) The "almost" qualifiers above need some explanation. In the LKMM, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1810) store-release in a spin_unlock() and the load-acquire which forms the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1811) first half of the atomic rmw update in a spin_lock() or a successful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1812) spin_trylock() -- we can call these things lock-releases and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1813) lock-acquires -- have two properties beyond those of ordinary releases
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1814) and acquires.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1816) First, when a lock-acquire reads from a lock-release, the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1817) requires that every instruction po-before the lock-release must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1818) execute before any instruction po-after the lock-acquire. This would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1819) naturally hold if the release and acquire operations were on different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1820) CPUs, but the LKMM says it holds even when they are on the same CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1821) For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1822)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1823) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1824) spinlock_t s;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1825)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1826) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1827) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1828) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1829)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1830) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1831) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1832) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1833) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1834) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1835) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1836) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1837)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1838) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1839) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1840) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1841) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1842) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1843) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1844)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1845) Here the second spin_lock() reads from the first spin_unlock(), and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1846) therefore the load of x must execute before the load of y. Thus we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1847) cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1848) MP pattern).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1849)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1850) This requirement does not apply to ordinary release and acquire
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1851) fences, only to lock-related operations. For instance, suppose P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1852) in the example had been written as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1853)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1854) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1855) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1856) int r1, r2, r3;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1857)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1858) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1859) smp_store_release(&s, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1860) r3 = smp_load_acquire(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1861) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1862) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1863)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1864) Then the CPU would be allowed to forward the s = 1 value from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1865) smp_store_release() to the smp_load_acquire(), executing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1866) instructions in the following order:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1867)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1868) r3 = smp_load_acquire(&s); // Obtains r3 = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1869) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1870) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1871) smp_store_release(&s, 1); // Value is forwarded
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1873) and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1874)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1875) Second, when a lock-acquire reads from a lock-release, and some other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1876) stores W and W' occur po-before the lock-release and po-after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1877) lock-acquire respectively, the LKMM requires that W must propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1878) each CPU before W' does. For example, consider:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1879)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1880) int x, y;
	spinlock_t s;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1882)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1883) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1884) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1885) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1886) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1887) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1888) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1889)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1890) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1891) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1892) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1893)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1894) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1895) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1896) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1897) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1898) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1899)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1900) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1901) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1902) int r2, r3;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1903)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1904) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1905) smp_rmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1906) r3 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1907) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1908)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1909) If r1 = 1 at the end then the spin_lock() in P1 must have read from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1910) the spin_unlock() in P0. Hence the store to x must propagate to P2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1911) before the store to y does, so we cannot have r2 = 1 and r3 = 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1912)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1913) These two special requirements for lock-release and lock-acquire do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1914) not arise from the operational model. Nevertheless, kernel developers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1915) have come to expect and rely on them because they do hold on all
architectures supported by the Linux kernel, albeit for various
reasons.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1918)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1919)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1920) PLAIN ACCESSES AND DATA RACES
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1921) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1922)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1923) In the LKMM, memory accesses such as READ_ONCE(x), atomic_inc(&y),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1924) smp_load_acquire(&z), and so on are collectively referred to as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1925) "marked" accesses, because they are all annotated with special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1926) operations of one kind or another. Ordinary C-language memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1927) accesses such as x or y = 0 are simply called "plain" accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1928)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1929) Early versions of the LKMM had nothing to say about plain accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1930) The C standard allows compilers to assume that the variables affected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1931) by plain accesses are not concurrently read or written by any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1932) threads or CPUs. This leaves compilers free to implement all manner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1933) of transformations or optimizations of code containing plain accesses,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1934) making such code very difficult for a memory model to handle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1935)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1936) Here is just one example of a possible pitfall:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1937)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1938) int a = 6;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1939) int *x = &a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1940)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1941) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1942) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1943) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1944) int r2 = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1945)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1946) r1 = x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1947) if (r1 != NULL)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1948) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1949) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1950)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1951) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1952) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1953) WRITE_ONCE(x, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1954) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1956) On the face of it, one would expect that when this code runs, the only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1957) possible final values for r2 are 6 and 0, depending on whether or not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1958) P1's store to x propagates to P0 before P0's load from x executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1959) But since P0's load from x is a plain access, the compiler may decide
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1960) to carry out the load twice (for the comparison against NULL, then again
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1961) for the READ_ONCE()) and eliminate the temporary variable r1. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1962) object code generated for P0 could therefore end up looking rather
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1963) like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1964)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1965) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1966) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1967) int r2 = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1968)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1969) if (x != NULL)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1970) r2 = READ_ONCE(*x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1971) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1972)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1973) And now it is obvious that this code runs the risk of dereferencing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1974) NULL pointer, because P1's store to x might propagate to P0 after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1975) test against NULL has been made but before the READ_ONCE() executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1976) If the original code had said "r1 = READ_ONCE(x)" instead of "r1 = x",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1977) the compiler would not have performed this optimization and there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1978) would be no possibility of a NULL-pointer dereference.
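
For reference, here is how the repaired version of P0 would look:

	P0()
	{
		int *r1;
		int r2 = 0;

		r1 = READ_ONCE(x);	/* marked load: carried out
					   exactly once */
		if (r1 != NULL)
			r2 = READ_ONCE(*r1);
	}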
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1979)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1980) Given the possibility of transformations like this one, the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1981) doesn't try to predict all possible outcomes of code containing plain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1982) accesses. It is instead content to determine whether the code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1983) violates the compiler's assumptions, which would render the ultimate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1984) outcome undefined.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1985)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1986) In technical terms, the compiler is allowed to assume that when the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1987) program executes, there will not be any data races. A "data race"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1988) occurs when there are two memory accesses such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1989)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1990) 1. they access the same location,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1991)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1992) 2. at least one of them is a store,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1993)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1994) 3. at least one of them is plain,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1995)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1996) 4. they occur on different CPUs (or in different threads on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1997) same CPU), and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1998)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1999) 5. they execute concurrently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2000)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2001) In the literature, two accesses are said to "conflict" if they satisfy
1 and 2 above. We'll go a little further and say that two accesses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2003) are "race candidates" if they satisfy 1 - 4. Thus, whether or not two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2004) race candidates actually do race in a given execution depends on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2005) whether they are concurrent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2006)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2007) The LKMM tries to determine whether a program contains race candidates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2008) which may execute concurrently; if it does then the LKMM says there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2009) a potential data race and makes no predictions about the program's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2010) outcome.
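
As a minimal illustration (a made-up fragment, not one of the model's
litmus tests), the two accesses below satisfy conditions 1 - 4 above,
and nothing in the code prevents them from executing concurrently, so
the LKMM reports a potential data race:

	int x;

	P0()
	{
		int r0;

		r0 = x;			/* plain load */
	}

	P1()
	{
		WRITE_ONCE(x, 1);	/* marked store */
	}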
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2011)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2012) Determining whether two accesses are race candidates is easy; you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2013) see that all the concepts involved in the definition above are already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2014) part of the memory model. The hard part is telling whether they may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2015) execute concurrently. The LKMM takes a conservative attitude,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2016) assuming that accesses may be concurrent unless it can prove they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2017) are not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2018)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2019) If two memory accesses aren't concurrent then one must execute before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2020) the other. Therefore the LKMM decides two accesses aren't concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2021) if they can be connected by a sequence of hb, pb, and rb links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2022) (together referred to as xb, for "executes before"). However, there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2023) are two complicating factors.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2024)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2025) If X is a load and X executes before a store Y, then indeed there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2026) no danger of X and Y being concurrent. After all, Y can't have any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2027) effect on the value obtained by X until the memory subsystem has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2028) propagated Y from its own CPU to X's CPU, which won't happen until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2029) some time after Y executes and thus after X executes. But if X is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2030) store, then even if X executes before Y it is still possible that X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2031) will propagate to Y's CPU just as Y is executing. In such a case X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2032) could very well interfere somehow with Y, and we would have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2033) consider X and Y to be concurrent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2034)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2035) Therefore when X is a store, for X and Y to be non-concurrent the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2036) requires not only that X must execute before Y but also that X must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2037) propagate to Y's CPU before Y executes. (Or vice versa, of course, if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2038) Y executes before X -- then Y must propagate to X's CPU before X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2039) executes if Y is a store.) This is expressed by the visibility
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2040) relation (vis), where X ->vis Y is defined to hold if there is an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2041) intermediate event Z such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2042)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2043) X is connected to Z by a possibly empty sequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2044) cumul-fence links followed by an optional rfe link (if none of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2045) these links are present, X and Z are the same event),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2046)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2047) and either:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2048)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2049) Z is connected to Y by a strong-fence link followed by a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2050) possibly empty sequence of xb links,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2051)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2052) or:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2053)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2054) Z is on the same CPU as Y and is connected to Y by a possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2055) empty sequence of xb links (again, if the sequence is empty it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2056) means Z and Y are the same event).
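
In the shorthand used later in this document for sequences of links
(our own summary, not the literal text of linux-kernel.cat), this
definition amounts to:

        vis = cumul-fence* ; rfe? ; (strong-fence ; xb*  |  same-CPU xb*)

where "same-CPU xb*" stands for an xb* sequence whose endpoints lie on
the same CPU.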

The motivations behind this definition are straightforward:

        cumul-fence memory barriers force stores that are po-before
        the barrier to propagate to other CPUs before stores that are
        po-after the barrier.

        An rfe link from an event W to an event R says that R reads
        from W, which certainly means that W must have propagated to
        R's CPU before R executed.

        strong-fence memory barriers force stores that are po-before
        the barrier, or that propagate to the barrier's CPU before the
        barrier executes, to propagate to all CPUs before any events
        po-after the barrier can execute.

To see how this works out in practice, consider our old friend, the MP
pattern (with fences and statement labels, but without the conditional
test):

        int buf = 0, flag = 0;

        P0()
        {
                X: WRITE_ONCE(buf, 1);
                smp_wmb();
                W: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r1;
                int r2 = 0;

                Z: r1 = READ_ONCE(flag);
                smp_rmb();
                Y: r2 = READ_ONCE(buf);
        }

The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
assuming r1 = 1 at the end, there is an rfe link from W to Z. This
means that the store to buf must propagate from P0 to P1 before Z
executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
provides an xb link from Z to Y (i.e., it forces Z to execute before
Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y
executes.

The second complicating factor mentioned above arises from the fact
that when we are considering data races, some of the memory accesses
are plain. Now, although we have not said so explicitly, up to this
point most of the relations defined by the LKMM (ppo, hb, prop,
cumul-fence, pb, and so on -- including vis) apply only to marked
accesses.

There are good reasons for this restriction. The compiler is not
allowed to apply fancy transformations to marked accesses, and
consequently each such access in the source code corresponds more or
less directly to a single machine instruction in the object code. But
plain accesses are a different story; the compiler may combine them,
split them up, duplicate them, eliminate them, invent new ones, and
who knows what else. Seeing a plain access in the source code tells
you almost nothing about what machine instructions will end up in the
object code.
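
To make this concrete, here is a hypothetical illustration (ours, not
drawn from any particular compiler). Given the source code:

        x = 1;
        x = 2;

the compiler may simply delete the first plain store; and given:

        r1 = y;
        r2 = y;

it may load y only once and reuse the value for both r1 and r2, or
reload y at any other point where it wants the value again. No such
transformations would be allowed if the accesses were made with
WRITE_ONCE() and READ_ONCE().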

Fortunately, the compiler isn't completely free; it is subject to some
limitations. For one, it is not allowed to introduce a data race into
the object code if the source code does not already contain a data
race (if it could, memory models would be useless and no multithreaded
code would be safe!). For another, it cannot move a plain access past
a compiler barrier.

A compiler barrier is a kind of fence, but as the name implies, it
only affects the compiler; it does not necessarily have any effect on
how instructions are executed by the CPU. In Linux kernel source
code, the barrier() function is a compiler barrier. It doesn't give
rise directly to any machine instructions in the object code; rather,
it affects how the compiler generates the rest of the object code.
Given source code like this:

        ... some memory accesses ...
        barrier();
        ... some other memory accesses ...

the barrier() function ensures that the machine instructions
corresponding to the first group of accesses will all end po-before
any machine instructions corresponding to the second group of accesses
-- even if some of the accesses are plain. (Of course, the CPU may
then execute some of those accesses out of program order, but we
already know how to deal with such issues.) Without the barrier()
there would be no such guarantee; the two groups of accesses could be
intermingled or even reversed in the object code.

The LKMM doesn't say much about the barrier() function, but it does
require that all fences are also compiler barriers. In addition, it
requires that the ordering properties of memory barriers such as
smp_rmb() or smp_store_release() apply to plain accesses as well as to
marked accesses.

This is the key to analyzing data races. Consider the MP pattern
again, now using plain accesses for buf:

        int buf = 0, flag = 0;

        P0()
        {
                U: buf = 1;
                smp_wmb();
                X: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r1;
                int r2 = 0;

                Y: r1 = READ_ONCE(flag);
                if (r1) {
                        smp_rmb();
                        V: r2 = buf;
                }
        }

This program does not contain a data race. Although the U and V
accesses are race candidates, the LKMM can prove they are not
concurrent as follows:

        The smp_wmb() fence in P0 is both a compiler barrier and a
        cumul-fence. It guarantees that no matter what hash of
        machine instructions the compiler generates for the plain
        access U, all those instructions will be po-before the fence.
        Consequently U's store to buf, no matter how it is carried out
        at the machine level, must propagate to P1 before X's store to
        flag does.

        X and Y are both marked accesses. Hence an rfe link from X to
        Y is a valid indicator that X propagated to P1 before Y
        executed, i.e., X ->vis Y. (And if there is no rfe link then
        r1 will be 0, so V will not be executed and ipso facto won't
        race with U.)

        The smp_rmb() fence in P1 is a compiler barrier as well as a
        fence. It guarantees that all the machine-level instructions
        corresponding to the access V will be po-after the fence, and
        therefore any loads among those instructions will execute
        after the fence does and hence after Y does.

Thus U's store to buf is forced to propagate to P1 before V's load
executes (assuming V does execute), ruling out the possibility of a
data race between them.

This analysis illustrates how the LKMM deals with plain accesses in
general. Suppose R is a plain load and we want to show that R
executes before some marked access E. We can do this by finding a
marked access X such that R and X are ordered by a suitable fence and
X ->xb* E. If E were also a plain access, we would additionally look
for a marked access Y such that X ->xb* Y, and Y and E are ordered by
a fence. We describe this arrangement by saying that R is
"post-bounded" by X and E is "pre-bounded" by Y.
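
As a sketch of this arrangement (our own example with hypothetical
variables), suppose R is a plain load and E is a marked access on
another CPU:

        R: r1 = u;              /* plain load */
        smp_mb();               /* suitable fence */
        X: WRITE_ONCE(a, 1);    /* marked access with X ->xb* E */

The smp_mb(), being a fence and hence a compiler barrier, guarantees
that every machine instruction implementing R is po-before X; the xb*
sequence then guarantees that X executes before E, and therefore so
does R.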

In fact, we go one step further: Since R is a read, we say that R is
"r-post-bounded" by X. Similarly, E would be "r-pre-bounded" or
"w-pre-bounded" by Y, depending on whether E was a load or a store.
This distinction is needed because some fences affect only loads
(i.e., smp_rmb()) and some affect only stores (smp_wmb()); otherwise
the two types of bounds are the same. And as a degenerate case, we
say that a marked access pre-bounds and post-bounds itself (e.g., if R
above were a marked load then X could simply be taken to be R itself).

The need to distinguish between r- and w-bounding raises yet another
issue. When the source code contains a plain store, the compiler is
allowed to put plain loads of the same location into the object code.
For example, given the source code:

        x = 1;

the compiler is theoretically allowed to generate object code that
looks like:

        if (x != 1)
                x = 1;

thereby adding a load (and possibly replacing the store entirely).
For this reason, whenever the LKMM requires a plain store to be
w-pre-bounded or w-post-bounded by a marked access, it also requires
the store to be r-pre-bounded or r-post-bounded, so as to handle cases
where the compiler adds a load.

(This may be overly cautious. We don't know of any examples where a
compiler has augmented a store with a load in this fashion, and the
Linux kernel developers would probably fight pretty hard to change a
compiler if it ever did this. Still, better safe than sorry.)

Incidentally, the other transformation -- augmenting a plain load by
adding in a store to the same location -- is not allowed. This is
because the compiler cannot know whether any other CPUs might perform
a concurrent load from that location. Two concurrent loads don't
constitute a race (they can't interfere with each other), but a store
does race with a concurrent load. Thus adding a store might create a
data race where one was not already present in the source code,
something the compiler is forbidden to do. Augmenting a store with a
load, on the other hand, is acceptable because doing so won't create a
data race unless one already existed.
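
For example, a compiler that transformed the plain load:

        r = x;

into something like:

        r = x;
        x = r;          /* invented store -- forbidden */

(a hypothetical transformation, shown only to make the point) would
thereby create a data race with any concurrent load of x performed by
another CPU, even though the original source code was race-free.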

The LKMM includes a second way to pre-bound plain accesses, in
addition to fences: an address dependency from a marked load. That
is, in the sequence:

        p = READ_ONCE(ptr);
        r = *p;

the LKMM says that the marked load of ptr pre-bounds the plain load of
*p; the marked load must execute before any of the machine
instructions corresponding to the plain load. This is a reasonable
stipulation, since after all, the CPU can't perform the load of *p
until it knows what value p will hold. Furthermore, without some
assumption like this one, some usages typical of RCU would count as
data races. For example:

        int a = 1, b;
        int *ptr = &a;

        P0()
        {
                b = 2;
                rcu_assign_pointer(ptr, &b);
        }

        P1()
        {
                int *p;
                int r;

                rcu_read_lock();
                p = rcu_dereference(ptr);
                r = *p;
                rcu_read_unlock();
        }

(In this example the rcu_read_lock() and rcu_read_unlock() calls don't
really do anything, because there aren't any grace periods. They are
included merely for the sake of good form; typically P0 would call
synchronize_rcu() somewhere after the rcu_assign_pointer().)

rcu_assign_pointer() performs a store-release, so the plain store to b
is definitely w-post-bounded before the store to ptr, and the two
stores will propagate to P1 in that order. However, rcu_dereference()
is only equivalent to READ_ONCE(). While it is a marked access, it is
not a fence or compiler barrier. Hence the only guarantee we have
that the load of ptr in P1 is r-pre-bounded before the load of *p
(thus avoiding a race) is the assumption about address dependencies.

This is a situation where the compiler can undermine the memory model,
and a certain amount of care is required when programming constructs
like this one. In particular, comparisons between the pointer and
other known addresses can cause trouble. If you have something like:

        p = rcu_dereference(ptr);
        if (p == &x)
                r = *p;

then the compiler just might generate object code resembling:

        p = rcu_dereference(ptr);
        if (p == &x)
                r = x;

or even:

        rtemp = x;
        p = rcu_dereference(ptr);
        if (p == &x)
                r = rtemp;

which would invalidate the memory model's assumption, since the CPU
could now perform the load of x before the load of ptr (there might be
a control dependency but no address dependency at the machine level).

Finally, it turns out there is a situation in which a plain write does
not need to be w-post-bounded: when it is separated from the other
race-candidate access by a fence. At first glance this may seem
impossible. After all, to be race candidates the two accesses must
be on different CPUs, and fences don't link events on different CPUs.
Well, normal fences don't -- but rcu-fence can! Here's an example:

        int x, y;

        P0()
        {
                WRITE_ONCE(x, 1);
                synchronize_rcu();
                y = 3;
        }

        P1()
        {
                rcu_read_lock();
                if (READ_ONCE(x) == 0)
                        y = 2;
                rcu_read_unlock();
        }

Do the plain stores to y race? Clearly not if P1 reads a non-zero
value for x, so let's assume the READ_ONCE(x) does obtain 0. This
means that the read-side critical section in P1 must finish executing
before the grace period in P0 does, because RCU's Grace-Period
Guarantee says that otherwise P0's store to x would have propagated to
P1 before the critical section started and so would have been visible
to the READ_ONCE(). (Another way of putting it is that the fre link
from the READ_ONCE() to the WRITE_ONCE() gives rise to an rcu-link
between those two events.)

This means there is an rcu-fence link from P1's "y = 2" store to P0's
"y = 3" store, and consequently the first must propagate from P1 to P0
before the second can execute. Therefore the two stores cannot be
concurrent and there is no race, even though P1's plain store to y
isn't w-post-bounded by any marked accesses.

Putting all this material together yields the following picture. For
race-candidate stores W and W', where W ->co W', the LKMM says the
stores don't race if W can be linked to W' by a

        w-post-bounded ; vis ; w-pre-bounded

sequence. If W is plain then they also have to be linked by an

        r-post-bounded ; xb* ; w-pre-bounded

sequence, and if W' is plain then they also have to be linked by a

        w-post-bounded ; vis ; r-pre-bounded

sequence. For race-candidate load R and store W, the LKMM says the
two accesses don't race if R can be linked to W by an

        r-post-bounded ; xb* ; w-pre-bounded

sequence or if W can be linked to R by a

        w-post-bounded ; vis ; r-pre-bounded

sequence. For the cases involving a vis link, the LKMM also accepts
sequences in which W is linked to W' or R by a

        strong-fence ; xb* ; {w and/or r}-pre-bounded

sequence with no post-bounding, and in every case the LKMM also allows
the link simply to be a fence with no bounding at all. If no sequence
of the appropriate sort exists, the LKMM says that the accesses race.

There is one more part of the LKMM related to plain accesses (although
not to data races) that we should discuss. Recall that many relations
such as hb are limited to marked accesses only. As a result, the
happens-before, propagates-before, and rcu axioms (which state that
various relations must not contain cycles) do not apply to plain
accesses. Nevertheless, we do want to rule out such cycles, because
they don't make sense even for plain accesses.

To this end, the LKMM imposes three extra restrictions, together
called the "plain-coherence" axiom because of their resemblance to the
rules used by the operational model to ensure cache coherence (that
is, the rules governing the memory subsystem's choice of a store to
satisfy a load request and its determination of where a store will
fall in the coherence order):

        If R and W are race candidates and it is possible to link R to
        W by one of the xb* sequences listed above, then W ->rfe R is
        not allowed (i.e., a load cannot read from a store that it
        executes before, even if one or both is plain).

        If W and R are race candidates and it is possible to link W to
        R by one of the vis sequences listed above, then R ->fre W is
        not allowed (i.e., if a store is visible to a load then the
        load must read from that store or one coherence-after it).

        If W and W' are race candidates and it is possible to link W
        to W' by one of the vis sequences listed above, then W' ->co W
        is not allowed (i.e., if one store is visible to a second then
        the second must come after the first in the coherence order).
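
As an illustration of the first restriction, consider the following
sketch (our own example, not part of the LKMM's litmus-test suite):

        int x = 0, flag = 0;

        P0()
        {
                int r1;

                R: r1 = x;              /* plain load */
                smp_mb();
                X: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r2;

                Y: r2 = READ_ONCE(flag);
                if (r2) {
                        smp_mb();
                        W: x = 1;       /* plain store */
                }
        }

When r2 = 1, R is r-post-bounded by X, the rfe link from X to Y lies
in xb, and Y w-pre-bounds W; thus R is linked to W by one of the xb*
sequences above. The first restriction then forbids W ->rfe R, so the
outcome r1 = 1 && r2 = 1 is ruled out.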

This is the extent to which the LKMM deals with plain accesses.
Perhaps it could say more (for example, plain accesses might
contribute to the ppo relation), but at the moment it seems that this
minimal, conservative approach is good enough.


ODDS AND ENDS
-------------

This section covers material that didn't quite fit anywhere in the
earlier sections.

The descriptions in this document don't always match the formal
version of the LKMM exactly. For example, the actual formal
definition of the prop relation makes the initial coe or fre part
optional, and it doesn't require the events linked by the relation to
be on the same CPU. These differences are very unimportant; indeed,
instances where the coe/fre part of prop is missing are of no interest
because all the other parts (fences and rfe) are already included in
hb anyway, and where the formal model adds prop into hb, it includes
an explicit requirement that the events being linked are on the same
CPU.

Another minor difference has to do with events that are both memory
accesses and fences, such as those corresponding to smp_load_acquire()
calls. In the formal model, these events aren't actually both reads
and fences; rather, they are read events with an annotation marking
them as acquires. (Or write events annotated as releases, in the case
of smp_store_release().) The final effect is the same.

Although we didn't mention it above, the instruction execution
ordering provided by the smp_rmb() fence doesn't apply to read events
that are part of a non-value-returning atomic update. For instance,
given:

        atomic_inc(&x);
        smp_rmb();
        r1 = READ_ONCE(y);

it is not guaranteed that the load from y will execute after the
update to x. This is because the ARMv8 architecture allows
non-value-returning atomic operations effectively to be executed off
the CPU. Basically, the CPU tells the memory subsystem to increment
x, and then the increment is carried out by the memory hardware with
no further involvement from the CPU. Since the CPU doesn't ever read
the value of x, there is nothing for the smp_rmb() fence to act on.

The LKMM defines a few extra synchronization operations in terms of
things we have already covered. In particular, rcu_dereference() is
treated as READ_ONCE() and rcu_assign_pointer() is treated as
smp_store_release() -- which is basically how the Linux kernel treats
them.
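
Schematically, merely restating those equivalences:

        p = rcu_dereference(ptr);       /* modeled as p = READ_ONCE(ptr) */
        rcu_assign_pointer(ptr, q);     /* modeled as smp_store_release(&ptr, q) */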

Although we said that plain accesses are not linked by the ppo
relation, they do contribute to it indirectly. Namely, when there is
an address dependency from a marked load R to a plain store W,
followed by smp_wmb() and then a marked store W', the LKMM creates a
ppo link from R to W'. The reasoning behind this is perhaps a little
shaky, but essentially it says there is no way to generate object code
for this source code in which W' could execute before R. Just as with
pre-bounding by address dependencies, it is possible for the compiler
to undermine this relation if sufficient care is not taken.
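
The pattern in question looks like this (a sketch with hypothetical
variables):

        p = READ_ONCE(ptr);     /* marked load R */
        *p = 27;                /* plain store W, address-dependent on R */
        smp_wmb();
        WRITE_ONCE(y, 1);       /* marked store W'; the LKMM gives R ->ppo W' */

The address dependency keeps the machine instructions for the plain
store from executing before R, and the smp_wmb() keeps W' from being
reordered before them; combining the two, W' cannot execute before R.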

There are a few oddball fences which need special treatment:
smp_mb__before_atomic(), smp_mb__after_atomic(), and
smp_mb__after_spinlock(). The LKMM uses fence events with special
annotations for them; they act as strong fences just like smp_mb()
except for the sets of events that they order. Instead of ordering
all po-earlier events against all po-later events, as smp_mb() does,
they behave as follows:

        smp_mb__before_atomic() orders all po-earlier events against
        po-later atomic updates and the events following them;

        smp_mb__after_atomic() orders po-earlier atomic updates and
        the events preceding them against all po-later events;
        smp_mb__after_spinlock() orders po-earlier lock acquisition
        events and the events preceding them against all po-later
        events.
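
For instance (a sketch of typical usage, not a quotation from actual
kernel code):

        WRITE_ONCE(a, 1);
        smp_mb__before_atomic();        /* orders the store to a against */
        atomic_inc(&x);                 /* this update and everything */
        r1 = READ_ONCE(b);              /* po-after it */

Here the fence orders the store to a against the atomic update of x
and against the load from b, which is po-after the update, just as a
strong fence would.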

Interestingly, RCU and locking each introduce the possibility of
deadlock. When faced with code sequences such as:

        spin_lock(&s);
        spin_lock(&s);
        spin_unlock(&s);
        spin_unlock(&s);

or:

        rcu_read_lock();
        synchronize_rcu();
        rcu_read_unlock();

what does the LKMM have to say? Answer: It says there are no allowed
executions at all, which makes sense. But this can also lead to
misleading results, because if a piece of code has multiple possible
executions, some of which deadlock, the model will report only on the
non-deadlocking executions. For example:

        int x, y;

        P0()
        {
                int r0;

                WRITE_ONCE(x, 1);
                r0 = READ_ONCE(y);
        }

        P1()
        {
                rcu_read_lock();
                if (READ_ONCE(x) > 0) {
                        WRITE_ONCE(y, 36);
                        synchronize_rcu();
                }
                rcu_read_unlock();
        }

Is it possible to end up with r0 = 36 at the end? The LKMM will tell
you it is not, but the model won't mention that this is because P1
will self-deadlock in the executions where it stores 36 in y.