Explanation of the Linux-Kernel Memory Consistency Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Alan Stern <stern@rowland.harvard.edu>
:Created: October 2017

.. Contents

  1. INTRODUCTION
  2. BACKGROUND
  3. A SIMPLE EXAMPLE
  4. A SELECTION OF MEMORY MODELS
  5. ORDERING AND CYCLES
  6. EVENTS
  7. THE PROGRAM ORDER RELATION: po AND po-loc
  8. A WARNING
  9. DEPENDENCY RELATIONS: data, addr, and ctrl
  10. THE READS-FROM RELATION: rf, rfi, and rfe
  11. CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
  12. THE FROM-READS RELATION: fr, fri, and fre
  13. AN OPERATIONAL MODEL
  14. PROPAGATION ORDER RELATION: cumul-fence
  15. DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL
  16. SEQUENTIAL CONSISTENCY PER VARIABLE
  17. ATOMIC UPDATES: rmw
  18. THE PRESERVED PROGRAM ORDER RELATION: ppo
  19. AND THEN THERE WAS ALPHA
  20. THE HAPPENS-BEFORE RELATION: hb
  21. THE PROPAGATES-BEFORE RELATION: pb
  22. RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb
  23. LOCKING
  24. PLAIN ACCESSES AND DATA RACES
  25. ODDS AND ENDS



INTRODUCTION
------------

The Linux-kernel memory consistency model (LKMM) is rather complex and
obscure. This is particularly evident if you read through the
linux-kernel.bell and linux-kernel.cat files that make up the formal
version of the model; they are extremely terse and their meanings are
far from clear.

This document describes the ideas underlying the LKMM. It is meant
for people who want to understand how the model was designed. It does
not go into the details of the code in the .bell and .cat files;
rather, it explains in English what the code expresses symbolically.

Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
toward beginners; they explain what memory consistency models are and
the basic notions shared by all such models. People already familiar
with these concepts can skim or skip over them. Sections 6 (EVENTS)
through 12 (THE FROM-READS RELATION) describe the fundamental
relations used in many models. Starting in Section 13 (AN OPERATIONAL
MODEL), the workings of the LKMM itself are covered.

Warning: The code examples in this document are not written in the
proper format for litmus tests. They don't include a header line, the
initializations are not enclosed in braces, the global variables are
not passed by pointers, and they don't have an "exists" clause at the
end. Converting them to the right format is left as an exercise for
the reader.


BACKGROUND
----------

A memory consistency model (or just memory model, for short) is
something which predicts, given a piece of computer code running on a
particular kind of system, what values may be obtained by the code's
load instructions. The LKMM makes these predictions for code running
as part of the Linux kernel.

In practice, people tend to use memory models the other way around.
That is, given a piece of code and a collection of values specified
for the loads, the model will predict whether it is possible for the
code to run in such a way that the loads will indeed obtain the
specified values. Of course, this is just another way of expressing
the same idea.

For code running on a uniprocessor system, the predictions are easy:
Each load instruction must obtain the value written by the most recent
store instruction accessing the same location (we ignore complicating
factors such as DMA and mixed-size accesses.) But on multiprocessor
systems, with multiple CPUs making concurrent accesses to shared
memory locations, things aren't so simple.

Different architectures have differing memory models, and the Linux
kernel supports a variety of architectures. The LKMM has to be fairly
permissive, in the sense that any behavior allowed by one of these
architectures also has to be allowed by the LKMM.


A SIMPLE EXAMPLE
----------------

Here is a simple example to illustrate the basic concepts. Consider
some code running as part of a device driver for an input device. The
driver might contain an interrupt handler which collects data from the
device, stores it in a buffer, and sets a flag to indicate the buffer
is full. Running concurrently on a different CPU might be a part of
the driver code being executed by a process in the midst of a read(2)
system call. This code tests the flag to see whether the buffer is
ready, and if it is, copies the data back to userspace. The buffer
and the flag are memory locations shared between the two CPUs.

We can abstract out the important pieces of the driver code as follows
(the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
assignment statements is discussed later):

	int buf = 0, flag = 0;

	P0()
	{
		WRITE_ONCE(buf, 1);
		WRITE_ONCE(flag, 1);
	}

	P1()
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(flag);
		if (r1)
			r2 = READ_ONCE(buf);
	}

Here the P0() function represents the interrupt handler running on one
CPU and P1() represents the read() routine running on another. The
value 1 stored in buf represents input data collected from the device.
Thus, P0 stores the data in buf and then sets flag. Meanwhile, P1
reads flag into the private variable r1, and if it is set, reads the
data from buf into a second private variable r2 for copying to
userspace. (Presumably if flag is not set then the driver will wait a
while and try again.)

This pattern of memory accesses, where one CPU stores values to two
shared memory locations and another CPU loads from those locations in
the opposite order, is widely known as the "Message Passing" or MP
pattern. It is typical of memory access patterns in the kernel.

Please note that this example code is a simplified abstraction. Real
buffers are usually larger than a single integer, real device drivers
usually use sleep and wakeup mechanisms rather than polling for I/O
completion, and real code generally doesn't bother to copy values into
private variables before using them. All that is beside the point;
the idea here is simply to illustrate the overall pattern of memory
accesses by the CPUs.

A memory model will predict what values P1 might obtain for its loads
from flag and buf, or equivalently, what values r1 and r2 might end up
with after the code has finished running.

Some predictions are trivial. For instance, no sane memory model would
predict that r1 = 42 or r2 = -7, because neither of those values ever
gets stored in flag or buf.

Some nontrivial predictions are nonetheless quite simple. For
instance, P1 might run entirely before P0 begins, in which case r1 and
r2 will both be 0 at the end. Or P0 might run entirely before P1
begins, in which case r1 and r2 will both be 1.

The interesting predictions concern what might happen when the two
routines run concurrently. One possibility is that P1 runs after P0's
store to buf but before the store to flag. In this case, r1 and r2
will again both be 0. (If P1 had been designed to read buf
unconditionally then we would instead have r1 = 0 and r2 = 1.)

However, the most interesting possibility is where r1 = 1 and r2 = 0.
If this were to occur it would mean the driver contains a bug, because
incorrect data would get sent to the user: 0 instead of 1. As it
happens, the LKMM does predict this outcome can occur, and the example
driver code shown above is indeed buggy.
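
Incidentally, the Introduction warned that these examples are not in
the proper litmus-test format. As a rough, unofficial sketch of what
such a conversion might look like for this MP example (the test name
is made up, and details of the herd7 syntax may vary):

	C MP

	{}

	P0(int *buf, int *flag)
	{
		WRITE_ONCE(*buf, 1);
		WRITE_ONCE(*flag, 1);
	}

	P1(int *buf, int *flag)
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(*flag);
		if (r1)
			r2 = READ_ONCE(*buf);
	}

	exists (1:r1=1 /\ 1:r2=0)

The "exists" clause asks whether the buggy outcome r1 = 1, r2 = 0 can
occur; the herd7 tool can then check the test against the LKMM.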


A SELECTION OF MEMORY MODELS
----------------------------

The first widely cited memory model, and the simplest to understand,
is Sequential Consistency. According to this model, systems behave as
if each CPU executed its instructions in order but with unspecified
timing. In other words, the instructions from the various CPUs get
interleaved in a nondeterministic way, always according to some single
global order that agrees with the order of the instructions in the
program source for each CPU. The model says that the value obtained
by each load is simply the value written by the most recently executed
store to the same memory location, from any CPU.

For the MP example code shown above, Sequential Consistency predicts
that the undesired result r1 = 1, r2 = 0 cannot occur. The reasoning
goes like this:

	Since r1 = 1, P0 must store 1 to flag before P1 loads 1 from
	it, as loads can obtain values only from earlier stores.

	P1 loads from flag before loading from buf, since CPUs execute
	their instructions in order.

	P1 must load 0 from buf before P0 stores 1 to it; otherwise r2
	would be 1 since a load obtains its value from the most recent
	store to the same address.

	P0 stores 1 to buf before storing 1 to flag, since it executes
	its instructions in order.

	Since an instruction (in this case, P0's store to flag) cannot
	execute before itself, the specified outcome is impossible.

However, real computer hardware almost never follows the Sequential
Consistency memory model; doing so would rule out too many valuable
performance optimizations. On ARM and PowerPC architectures, for
instance, the MP example code really does sometimes yield r1 = 1 and
r2 = 0.

x86 and SPARC follow yet a different memory model: TSO (Total Store
Ordering). This model predicts that the undesired outcome for the MP
pattern cannot occur, but in other respects it differs from Sequential
Consistency. One example is the Store Buffer (SB) pattern, in which
each CPU stores to its own shared location and then loads from the
other CPU's location:

	int x = 0, y = 0;

	P0()
	{
		int r0;

		WRITE_ONCE(x, 1);
		r0 = READ_ONCE(y);
	}

	P1()
	{
		int r1;

		WRITE_ONCE(y, 1);
		r1 = READ_ONCE(x);
	}

Sequential Consistency predicts that the outcome r0 = 0, r1 = 0 is
impossible. (Exercise: Figure out the reasoning.) But TSO allows
this outcome to occur, and in fact it does sometimes occur on x86 and
SPARC systems.
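
Looking ahead a little: the standard remedy is to place a full memory
barrier between each CPU's store and load. As a sketch, using the
smp_mb() fence discussed later in this document:

	int x = 0, y = 0;

	P0()
	{
		int r0;

		WRITE_ONCE(x, 1);
		smp_mb();	/* orders the store before the load */
		r0 = READ_ONCE(y);
	}

	P1()
	{
		int r1;

		WRITE_ONCE(y, 1);
		smp_mb();
		r1 = READ_ONCE(x);
	}

With both barriers present, the r0 = 0, r1 = 0 outcome is forbidden,
under TSO and under the LKMM alike.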

The LKMM was inspired by the memory models followed by PowerPC, ARM,
x86, Alpha, and other architectures. However, it is different in
detail from each of them.


ORDERING AND CYCLES
-------------------

Memory models are all about ordering. Often this is temporal ordering
(i.e., the order in which certain events occur) but it doesn't have to
be; consider for example the order of instructions in a program's
source code. We saw above that Sequential Consistency makes an
important assumption that CPUs execute instructions in the same order
as those instructions occur in the code, and there are many other
instances of ordering playing central roles in memory models.

The counterpart to ordering is a cycle. Ordering rules out cycles:
It's not possible to have X ordered before Y, Y ordered before Z, and
Z ordered before X, because this would mean that X is ordered before
itself. The analysis of the MP example under Sequential Consistency
involved just such an impossible cycle:

	W: P0 stores 1 to flag   executes before
	X: P1 loads 1 from flag  executes before
	Y: P1 loads 0 from buf   executes before
	Z: P0 stores 1 to buf    executes before
	W: P0 stores 1 to flag.

In short, if a memory model requires certain accesses to be ordered,
and a certain outcome for the loads in a piece of code can happen only
if those accesses would form a cycle, then the memory model predicts
that outcome cannot occur.

The LKMM is defined largely in terms of cycles, as we will see.


EVENTS
------

The LKMM does not work directly with the C statements that make up
kernel source code. Instead it considers the effects of those
statements in a more abstract form, namely, events. The model
includes three types of events:

	Read events correspond to loads from shared memory, such as
	calls to READ_ONCE(), smp_load_acquire(), or
	rcu_dereference().

	Write events correspond to stores to shared memory, such as
	calls to WRITE_ONCE(), smp_store_release(), or atomic_set().

	Fence events correspond to memory barriers (also known as
	fences), such as calls to smp_rmb() or rcu_read_lock().

These categories are not exclusive; a read or write event can also be
a fence. This happens with functions like smp_load_acquire() or
spin_lock(). However, no single event can be both a read and a write.
Atomic read-modify-write accesses, such as atomic_inc() or xchg(),
correspond to a pair of events: a read followed by a write. (The
write event is omitted for executions where it doesn't occur, such as
a cmpxchg() where the comparison fails.)
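
As a sketch of this last point, the cmpxchg() call below always gives
rise to a read event, but gives rise to a write event only in
executions where the value read was 0, so that the comparison
succeeds:

	int x = 0;

	P0()
	{
		int r1;

		r1 = cmpxchg(&x, 0, 1);	/* read event always; write
					   event only if x was 0 */
	}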

Other parts of the code, those which do not involve interaction with
shared memory, do not give rise to events. Thus, arithmetic and
logical computations, control-flow instructions, or accesses to
private memory or CPU registers are not of central interest to the
memory model. They only affect the model's predictions indirectly.
For example, an arithmetic computation might determine the value that
gets stored to a shared memory location (or in the case of an array
index, the address where the value gets stored), but the memory model
is concerned only with the store itself -- its value and its address
-- not the computation leading up to it.

Events in the LKMM can be linked by various relations, which we will
describe in the following sections. The memory model requires certain
of these relations to be orderings, that is, it requires them not to
have any cycles.


THE PROGRAM ORDER RELATION: po AND po-loc
-----------------------------------------

The most important relation between events is program order (po). You
can think of it as the order in which statements occur in the source
code after branches are taken into account and loops have been
unrolled. A better description might be the order in which
instructions are presented to a CPU's execution unit. Thus, we say
that X is po-before Y (written as "X ->po Y" in formulas) if X occurs
before Y in the instruction stream.

This is inherently a single-CPU relation; two instructions executing
on different CPUs are never linked by po. Also, it is by definition
an ordering so it cannot have any cycles.

po-loc is a sub-relation of po. It links two memory accesses when the
first comes before the second in program order and they access the
same memory location (the "-loc" suffix).
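
For example, in the following sketch:

	int x, y;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 1);	/* A */
		r1 = READ_ONCE(y);	/* B */
		WRITE_ONCE(x, 2);	/* C */
	}

we have A ->po B ->po C, and also A ->po-loc C because A and C both
access x. A and B are linked only by po, not po-loc, since they
access different locations.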

Although this may seem straightforward, there is one subtle aspect to
program order we need to explain. The LKMM was inspired by low-level
architectural memory models which describe the behavior of machine
code, and it retains their outlook to a considerable extent. The
read, write, and fence events used by the model are close in spirit to
individual machine instructions. Nevertheless, the LKMM describes
kernel code written in C, and the mapping from C to machine code can
be extremely complex.

Optimizing compilers have great freedom in the way they translate
source code to object code. They are allowed to apply transformations
that add memory accesses, eliminate accesses, combine them, split them
into pieces, or move them around. The use of READ_ONCE(), WRITE_ONCE(),
or one of the other atomic or synchronization primitives prevents a
large number of compiler optimizations. In particular, it is guaranteed
that the compiler will not remove such accesses from the generated code
(unless it can prove the accesses will never be executed), it will not
change the order in which they occur in the code (within limits imposed
by the C standard), and it will not introduce extraneous accesses.

The MP and SB examples above used READ_ONCE() and WRITE_ONCE() rather
than ordinary memory accesses. Thanks to this usage, we can be certain
that in the MP example, the compiler won't reorder P0's write event to
buf and P0's write event to flag, and similarly for the other shared
memory accesses in the examples.

Since private variables are not shared between CPUs, they can be
accessed normally without READ_ONCE() or WRITE_ONCE(). In fact, they
need not even be stored in normal memory at all -- in principle a
private variable could be stored in a CPU register (hence the convention
that these variables have names starting with the letter 'r').


A WARNING
---------

The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
not perfect; and under some circumstances it is possible for the
compiler to undermine the memory model. Here is an example. Suppose
both branches of an "if" statement store the same value to the same
location:

	r1 = READ_ONCE(x);
	if (r1) {
		WRITE_ONCE(y, 2);
		... /* do something */
	} else {
		WRITE_ONCE(y, 2);
		... /* do something else */
	}

For this code, the LKMM predicts that the load from x will always be
executed before either of the stores to y. However, a compiler could
lift the stores out of the conditional, transforming the code into
something resembling:

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, 2);
	if (r1) {
		... /* do something */
	} else {
		... /* do something else */
	}

Given this version of the code, the LKMM would predict that the load
from x could be executed after the store to y. Thus, the memory
model's original prediction could be invalidated by the compiler.

Another issue arises from the fact that in C, arguments to many
operators and function calls can be evaluated in any order. For
example:

	r1 = f(5) + g(6);

The object code might call f(5) either before or after g(6); the
memory model cannot assume there is a fixed program order relation
between them. (In fact, if the function calls are inlined then the
compiler might even interleave their object code.)
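
If a definite order between the two calls is needed, one sketch of a
remedy is to make them in separate statements, since the C standard
does sequence one full statement before the next:

	r1 = f(5);	/* f(5) is now called first, */
	r2 = g(6);	/* then g(6) */
	r3 = r1 + r2;

(Even so, as noted above, the compiler retains some freedom to
rearrange the generated code unless the shared-memory accesses inside
f() and g() use the _ONCE() primitives or similar.)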


DEPENDENCY RELATIONS: data, addr, and ctrl
------------------------------------------

We say that two events are linked by a dependency relation when the
execution of the second event depends in some way on a value obtained
from memory by the first. The first event must be a read, and the
value it obtains must somehow affect what the second event does.
There are three kinds of dependencies: data, address (addr), and
control (ctrl).

A read and a write event are linked by a data dependency if the value
obtained by the read affects the value stored by the write. As a very
simple example:

	int x, y;

	r1 = READ_ONCE(x);
	WRITE_ONCE(y, r1 + 5);

The value stored by the WRITE_ONCE obviously depends on the value
loaded by the READ_ONCE. Such dependencies can wind through
arbitrarily complicated computations, and a write can depend on the
values of multiple reads.
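
For instance, in this sketch the store to z is linked by a data
dependency to each of the two loads:

	int x, y, z;

	r1 = READ_ONCE(x);
	r2 = READ_ONCE(y);
	WRITE_ONCE(z, r1 * r2);	/* depends on both loaded values */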

A read event and another memory access event are linked by an address
dependency if the value obtained by the read affects the location
accessed by the other event. The second event can be either a read or
a write. Here's another simple example:

	int a[20];
	int i;

	r1 = READ_ONCE(i);
	r2 = READ_ONCE(a[r1]);

Here the location accessed by the second READ_ONCE() depends on the
index value loaded by the first. Pointer indirection also gives rise
to address dependencies, since the address of a location accessed
through a pointer will depend on the value read earlier from that
pointer.
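
As a sketch of the pointer case:

	int y;
	int *p = &y;
	int *r1;
	int r2;

	r1 = READ_ONCE(p);
	r2 = READ_ONCE(*r1);	/* addr dependency from the load of p */

The location accessed by the second load is whatever address the
first load obtained from p.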

Finally, a read event and another memory access event are linked by a
control dependency if the value obtained by the read affects whether
the second event is executed at all. Simple example:

	int x, y;

	r1 = READ_ONCE(x);
	if (r1)
		WRITE_ONCE(y, 1984);

Execution of the WRITE_ONCE() is controlled by a conditional expression
which depends on the value obtained by the READ_ONCE(); hence there is
a control dependency from the load to the store.

It should be pretty obvious that events can only depend on reads that
come earlier in program order. Symbolically, if we have R ->data X,
R ->addr X, or R ->ctrl X (where R is a read event), then we must also
have R ->po X. It wouldn't make sense for a computation to depend
somehow on a value that doesn't get loaded from shared memory until
later in the code!


THE READS-FROM RELATION: rf, rfi, and rfe
-----------------------------------------

The reads-from relation (rf) links a write event to a read event when
the value loaded by the read is the value that was stored by the
write. In colloquial terms, the load "reads from" the store. We
write W ->rf R to indicate that the load R reads from the store W. We
further distinguish the cases where the load and the store occur on
the same CPU (internal reads-from, or rfi) and where they occur on
different CPUs (external reads-from, or rfe).

For our purposes, a memory location's initial value is treated as
though it had been written there by an imaginary initial store that
executes on a separate CPU before the main program runs.
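
For example, consider this sketch:

	int x = 0;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 1);	/* W */
		r1 = READ_ONCE(x);	/* R */
	}

	P1()
	{
		int r2;

		r2 = READ_ONCE(x);	/* R' */
	}

In executions where r1 = 1 we have W ->rfi R, since the store and the
load occur on the same CPU; in executions where r2 = 1 we have
W ->rfe R', since P0 and P1 run on different CPUs.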

Usage of the rf relation implicitly assumes that loads will always
read from a single store. It doesn't apply properly in the presence
of load-tearing, where a load obtains some of its bits from one store
and some of them from another store. Fortunately, use of READ_ONCE()
and WRITE_ONCE() will prevent load-tearing; it's not possible to have:

	int x = 0;

	P0()
	{
		WRITE_ONCE(x, 0x1234);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x);
	}

and end up with r1 = 0x1200 (partly from x's initial value and partly
from the value stored by P0).

On the other hand, load-tearing is unavoidable when mixed-size
accesses are used. Consider this example:

	union {
		u32 w;
		u16 h[2];
	} x;

	P0()
	{
		WRITE_ONCE(x.h[0], 0x1234);
		WRITE_ONCE(x.h[1], 0x5678);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x.w);
	}

If r1 = 0x56781234 (little-endian!) at the end, then P1 must have read
from both of P0's stores. It is possible to handle mixed-size and
unaligned accesses in a memory model, but the LKMM currently does not
attempt to do so. It requires all accesses to be properly aligned and
of the location's actual size.


CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
------------------------------------------------------------------

Cache coherence is a general principle requiring that in a
multi-processor system, the CPUs must share a consistent view of the
memory contents. Specifically, it requires that for each location in
shared memory, the stores to that location must form a single global
ordering which all the CPUs agree on (the coherence order), and this
ordering must be consistent with the program order for accesses to
that location.

To put it another way, for any variable x, the coherence order (co) of
the stores to x is simply the order in which the stores overwrite one
another. The imaginary store which establishes x's initial value
comes first in the coherence order; the store which directly
overwrites the initial value comes second; the store which overwrites
that value comes third, and so on.

You can think of the coherence order as being the order in which the
stores reach x's location in memory (or if you prefer a more
hardware-centric view, the order in which the stores get written to
x's cache line). We write W ->co W' if W comes before W' in the
coherence order, that is, if the value stored by W gets overwritten,
directly or indirectly, by the value stored by W'.

Coherence order is required to be consistent with program order. This
requirement takes the form of four coherency rules:

	Write-write coherence: If W ->po-loc W' (i.e., W comes before
	W' in program order and they access the same location), where W
	and W' are two stores, then W ->co W'.

	Write-read coherence: If W ->po-loc R, where W is a store and R
	is a load, then R must read from W or from some other store
	which comes after W in the coherence order.

	Read-write coherence: If R ->po-loc W, where R is a load and W
	is a store, then the store which R reads from must come before
	W in the coherence order.

	Read-read coherence: If R ->po-loc R', where R and R' are two
	loads, then either they read from the same store or else the
	store read by R comes before the store read by R' in the
	coherence order.

This is sometimes referred to as sequential consistency per variable,
because it means that the accesses to any single memory location obey
the rules of the Sequential Consistency memory model. (According to
Wikipedia, sequential consistency per variable and cache coherence
mean the same thing except that cache coherence includes an extra
requirement that every store eventually becomes visible to every CPU.)

Any reasonable memory model will include cache coherence. Indeed, our
expectation of cache coherence is so deeply ingrained that violations
of its requirements look more like hardware bugs than programming
errors:

	int x;

	P0()
	{
		WRITE_ONCE(x, 17);
		WRITE_ONCE(x, 23);
	}

If the final value stored in x after this code ran was 17, you would
think your computer was broken. It would be a violation of the
write-write coherence rule: Since the store of 23 comes later in
program order, it must also come later in x's coherence order and
thus must overwrite the store of 17.
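
The write-read coherence rule can be violated in a similarly
unthinkable way; here is a sketch along the same lines:

	int x = 0;

	P0()
	{
		int r1;

		WRITE_ONCE(x, 88);
		r1 = READ_ONCE(x);
	}

If r1 = 0 at the end, this would violate the write-read coherence
rule: The load comes after the store in program order, so it must
read either from that store or from one coming later in the coherence
order, not from x's initial value.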
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) WRITE_ONCE(x, 666);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) If r1 = 666 at the end, this would violate the read-write coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) rule: The READ_ONCE() load comes before the WRITE_ONCE() store in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) program order, so it must not read from that store but rather from one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) coming earlier in the coherence order (in this case, x's initial
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) value).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) WRITE_ONCE(x, 5);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) r2 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) If r1 = 5 (reading from P0's store) and r2 = 0 (reading from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) imaginary store which establishes x's initial value) at the end, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) would violate the read-read coherence rule: The r1 load comes before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) the r2 load in program order, so it must not read from a store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) comes later in the coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) (As a minor curiosity, if this code had used normal loads instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) READ_ONCE() in P1, on Itanium it sometimes could end up with r1 = 5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) and r2 = 0! This results from parallel execution of the operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) encoded in Itanium's Very-Long-Instruction-Word format, and it is yet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) another motivation for using READ_ONCE() when accessing shared memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) locations.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) Just like the po relation, co is inherently an ordering -- it is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) possible for a store to directly or indirectly overwrite itself! And
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) just like with the rf relation, we distinguish between stores that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) occur on the same CPU (internal coherence order, or coi) and stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) that occur on different CPUs (external coherence order, or coe).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) On the other hand, stores to different memory locations are never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) related by co, just as instructions on different CPUs are never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) related by po. Coherence order is strictly per-location, or if you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) prefer, each location has its own independent coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) THE FROM-READS RELATION: fr, fri, and fre
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) -----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) The from-reads relation (fr) can be a little difficult for people to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) grok. It describes the situation where a load reads a value that gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) overwritten by a store. In other words, we have R ->fr W when the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) value that R reads is overwritten (directly or indirectly) by W, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) equivalently, when R reads from a store which comes earlier than W in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) the coherence order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) WRITE_ONCE(x, 2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) The value loaded from x will be 0 (assuming cache coherence!), and it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) gets overwritten by the value 2. Thus there is an fr link from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) READ_ONCE() to the WRITE_ONCE(). If the code contained any later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) stores to x, there would also be fr links from the READ_ONCE() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) As with rf, rfi, and rfe, we subdivide the fr relation into fri (when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) the load and the store are on the same CPU) and fre (when they are on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) different CPUs).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) Note that the fr relation is determined entirely by the rf and co
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) relations; it is not independent. Given a read event R and a write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) event W for the same location, we will have R ->fr W if and only if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) the write which R reads from is co-before W. In symbols,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) (R ->fr W) := (there exists W' with W' ->rf R and W' ->co W).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) AN OPERATIONAL MODEL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) The LKMM is based on various operational memory models, meaning that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) the models arise from an abstract view of how a computer system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) operates. Here are the main ideas, as incorporated into the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) The system as a whole is divided into the CPUs and a memory subsystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) The CPUs are responsible for executing instructions (not necessarily
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) in program order), and they communicate with the memory subsystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) For the most part, executing an instruction requires a CPU to perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) only internal operations. However, loads, stores, and fences involve
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) more.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) When CPU C executes a store instruction, it tells the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) to store a certain value at a certain location. The memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) propagates the store to all the other CPUs as well as to RAM. (As a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) special case, we say that the store propagates to its own CPU at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) time it is executed.) The memory subsystem also determines where the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) store falls in the location's coherence order. In particular, it must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) arrange for the store to be co-later than (i.e., to overwrite) any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) other store to the same location which has already propagated to CPU C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) When a CPU executes a load instruction R, it first checks to see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) whether there are any as-yet unexecuted store instructions, for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) same location, that come before R in program order. If there are, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) uses the value of the po-latest such store as the value obtained by R,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) and we say that the store's value is forwarded to R. Otherwise, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) CPU asks the memory subsystem for the value to load and we say that R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) is satisfied from memory. The memory subsystem hands back the value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) of the co-latest store to the location in question which has already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) propagated to that CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) (In fact, the picture needs to be a little more complicated than this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) CPUs have local caches, and propagating a store to a CPU really means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) propagating it to the CPU's local cache. A local cache can take some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) time to process the stores that it receives, and a store can't be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) to satisfy one of the CPU's loads until it has been processed. On
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) most architectures, the local caches process stores in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) First-In-First-Out order, and consequently the processing delay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) doesn't matter for the memory model. But on Alpha, the local caches
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) have a partitioned design that results in non-FIFO behavior. We will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) discuss this in more detail later.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) Note that load instructions may be executed speculatively and may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) restarted under certain circumstances. The memory model ignores these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) premature executions; we simply say that the load executes at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) final time it is forwarded or satisfied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) Executing a fence (or memory barrier) instruction doesn't require a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) CPU to do anything special other than informing the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) about the fence. However, fences do constrain the way CPUs and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) memory subsystem handle other instructions, in two respects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) First, a fence forces the CPU to execute various instructions in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) program order. Exactly which instructions are ordered depends on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) type of fence:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) Strong fences, including smp_mb() and synchronize_rcu(), force
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) the CPU to execute all po-earlier instructions before any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) po-later instructions;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784) smp_rmb() forces the CPU to execute all po-earlier loads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) before any po-later loads;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) smp_wmb() forces the CPU to execute all po-earlier stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) before any po-later stores;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) Acquire fences, such as smp_load_acquire(), force the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) execute the load associated with the fence (e.g., the load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) part of an smp_load_acquire()) before any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) instructions;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) Release fences, such as smp_store_release(), force the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) execute all po-earlier instructions before the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) associated with the fence (e.g., the store part of an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) smp_store_release()).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) Second, some types of fence affect the way the memory subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) propagates stores. When a fence instruction is executed on CPU C:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) For each other CPU C', smp_wmb() forces all po-earlier stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) on C to propagate to C' before any po-later stores do.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) For each other CPU C', any store which propagates to C before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) a release fence is executed (including all po-earlier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808) stores executed on C) is forced to propagate to C' before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) store associated with the release fence does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) Any store which propagates to C before a strong fence is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) executed (including all po-earlier stores on C) is forced to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) propagate to all other CPUs before any instructions po-after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) the strong fence are executed on C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) The propagation ordering enforced by release fences and strong fences
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) affects stores from other CPUs that propagate to CPU C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818) fence is executed, as well as stores that are executed on C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) fence. We describe this property by saying that release fences and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) strong fences are A-cumulative. By contrast, smp_wmb() fences are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) A-cumulative; they only affect the propagation of stores that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) executed on C before the fence (i.e., those which precede the fence in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823) program order).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) other properties which we discuss later.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) PROPAGATION ORDER RELATION: cumul-fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) The fences which affect propagation order (i.e., strong, release, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) smp_wmb() fences) are collectively referred to as cumul-fences, even
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) though smp_wmb() isn't A-cumulative. The cumul-fence relation is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) defined to link memory access events E and F whenever:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) E and F are both stores on the same CPU and an smp_wmb() fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) event occurs between them in program order; or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) F is a release fence and some X comes before F in program order,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) where either X = E or else E ->rf X; or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) A strong fence event occurs between some X and F in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) order, where either X = E or else E ->rf X.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) The operational model requires that whenever W and W' are both stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) and W ->cumul-fence W', then W must propagate to any given CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) before W' does. However, for different CPUs C and C', it does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) require W to propagate to C before W' propagates to C'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) -------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) The LKMM is derived from the restrictions imposed by the design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) outlined above. These restrictions involve the necessity of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) maintaining cache coherence and the fact that a CPU can't operate on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) value before it knows what that value is, among other things.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) The formal version of the LKMM is defined by six requirements, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) axioms:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) Sequential consistency per variable: This requires that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) system obey the four coherency rules.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) Atomicity: This requires that atomic read-modify-write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) operations really are atomic, that is, no other stores can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) sneak into the middle of such an update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) Happens-before: This requires that certain instructions are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) executed in a specific order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) Propagation: This requires that certain stores propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) CPUs and to RAM in a specific order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) Rcu: This requires that RCU read-side critical sections and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) grace periods obey the rules of RCU, in particular, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) Grace-Period Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) Plain-coherence: This requires that plain memory accesses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) (those not using READ_ONCE(), WRITE_ONCE(), etc.) must obey
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) the operational model's rules regarding cache coherence.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) The first and second are quite common; they can be found in many
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885) memory models (such as those for C11/C++11). The "happens-before" and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) "propagation" axioms have analogs in other memory models as well. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) "rcu" and "plain-coherence" axioms are specific to the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) Each of these axioms is discussed below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892) SEQUENTIAL CONSISTENCY PER VARIABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) According to the principle of cache coherence, the stores to any fixed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) shared location in memory form a global ordering. We can imagine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) inserting the loads from that location into this ordering, by placing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) each load between the store that it reads from and the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) store. This leaves the relative positions of loads that read from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) same store unspecified; let's say they are inserted in program order,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) first for CPU 0, then CPU 1, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903) You can check that the four coherency rules imply that the rf, co, fr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) and po-loc relations agree with this global ordering; in other words,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) whenever we have X ->rf Y or X ->co Y or X ->fr Y or X ->po-loc Y, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) X event comes before the Y event in the global ordering. The LKMM's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) "coherence" axiom expresses this by requiring the union of these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) relations not to have any cycles. This means it must not be possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) to find events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) X0 -> X1 -> X2 -> ... -> Xn -> X0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913) where each of the links is either rf, co, fr, or po-loc. This has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) hold if the accesses to the fixed memory location can be ordered as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) cache coherence demands.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) Although it is not obvious, it can be shown that the converse is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) true: This LKMM axiom implies that the four coherency rules are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) obeyed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) ATOMIC UPDATES: rmw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) What does it mean to say that a read-modify-write (rmw) update, such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) as atomic_inc(&x), is atomic? It means that the memory location (x in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) this case) does not get altered between the read and the write events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) making up the atomic operation. In particular, if two CPUs perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) atomic_inc(&x) concurrently, it must be guaranteed that the final
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) value of x will be the initial value plus two. We should never have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) the following sequence of events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) CPU 0 loads x obtaining 13;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) CPU 1 loads x obtaining 13;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935) CPU 0 stores 14 to x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) CPU 1 stores 14 to x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) where the final value of x is wrong (14 rather than 15).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) In this example, CPU 0's increment effectively gets lost because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) occurs in between CPU 1's load and store. To put it another way, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) problem is that the position of CPU 0's store in x's coherence order
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) is between the store that CPU 1 reads from and the store that CPU 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944) performs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946) The same analysis applies to all atomic update operations. Therefore,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) to enforce atomicity the LKMM requires that atomic updates follow this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) rule: Whenever R and W are the read and write events composing an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) atomic read-modify-write and W' is the write event which R reads from,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) there must not be any stores coming between W' and W in the coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951) order. Equivalently,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) (R ->rmw W) implies (there is no X with R ->fr X and X ->co W),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955) where the rmw relation links the read and write events making up each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) atomic update. This is what the LKMM's "atomic" axiom says.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959) THE PRESERVED PROGRAM ORDER RELATION: ppo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) -----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) There are many situations where a CPU is obliged to execute two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) instructions in program order. We amalgamate them into the ppo (for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) "preserved program order") relation, which links the po-earlier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) instruction to the po-later instruction and is thus a sub-relation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966) po.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) The operational model already includes a description of one such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969) situation: Fences are a source of ppo links. Suppose X and Y are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) memory accesses with X ->po Y; then the CPU must execute X before Y if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) any of the following hold:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973) A strong (smp_mb() or synchronize_rcu()) fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) X and Y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) X and Y are both stores and an smp_wmb() fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) them;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) X and Y are both loads and an smp_rmb() fence occurs between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) them;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) X is also an acquire fence, such as smp_load_acquire();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984) Y is also a release fence, such as smp_store_release().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) Another possibility, not mentioned earlier but discussed in the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987) section, is:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) X and Y are both loads, X ->addr Y (i.e., there is an address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990) dependency from X to Y), and X is a READ_ONCE() or an atomic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) Dependencies can also cause instructions to be executed in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994) order. This is uncontroversial when the second instruction is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) store; either a data, address, or control dependency from a load R to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996) a store W will force the CPU to execute R before W. This is very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997) simply because the CPU cannot tell the memory subsystem about W's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998) store before it knows what value should be stored (in the case of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999) data dependency), what location it should be stored into (in the case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) of an address dependency), or whether the store should actually take
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) place (in the case of a control dependency).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) Dependencies to load instructions are more problematic. To begin with,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) there is no such thing as a data dependency to a load. Next, a CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) has no reason to respect a control dependency to a load, because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) can always satisfy the second load speculatively before the first, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) then ignore the result if it turns out that the second load shouldn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) be executed after all. And lastly, the real difficulties begin when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) we consider address dependencies to loads.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) To be fair about it, all Linux-supported architectures do execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) loads in program order if there is an address dependency between them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) After all, a CPU cannot ask the memory subsystem to load a value from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) a particular location before it knows what that location is. However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) the split-cache design used by Alpha can cause it to behave in a way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) that looks as if the loads were executed out of order (see the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) section for more details). The kernel includes a workaround for this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) problem when the loads come from READ_ONCE(), and therefore the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) includes address dependencies to loads in the ppo relation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) On the other hand, dependencies can indirectly affect the ordering of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) two loads. This happens when there is a dependency from a load to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) store and a second, po-later load reads from that store:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) R ->dep W ->rfi R',
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) where the dep link can be either an address or a data dependency. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) this situation we know it is possible for the CPU to execute R' before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) W, because it can forward the value that W will store to R'. But it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) cannot execute R' before R, because it cannot forward the value before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) it knows what that value is, or that W and R' do access the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) location. However, if there is merely a control dependency between R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) and W then the CPU can speculatively forward W to R' before executing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) R; if the speculation turns out to be wrong then the CPU merely has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) restart or abandon R'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) (In theory, a CPU might forward a store to a load when it runs across
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) an address dependency like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) r1 = READ_ONCE(ptr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) WRITE_ONCE(*r1, 17);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) because it could tell that the store and the second load access the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) same location even before it knows what the location's address is.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) However, none of the architectures supported by the Linux kernel do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) this.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) Two memory accesses of the same location must always be executed in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) program order if the second access is a store. Thus, if we have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) R ->po-loc W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) (the po-loc link says that R comes before W in program order and they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) access the same location), the CPU is obliged to execute W after R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) If it executed W first then the memory subsystem would respond to R's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) read request with the value stored by W (or an even later store), in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) violation of the read-write coherence rule. Similarly, if we had
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) W ->po-loc W'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) and the CPU executed W' before W, then the memory subsystem would put
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) W' before W in the coherence order. It would effectively cause W to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) overwrite W', in violation of the write-write coherence rule.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) (Interestingly, an early ARMv8 memory model, now obsolete, proposed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) allowing out-of-order writes like this to occur. The model avoided
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) violating the write-write coherence rule by requiring the CPU not to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) send the W write to the memory subsystem at all!)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) AND THEN THERE WAS ALPHA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) As mentioned above, the Alpha architecture is unique in that it does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) not appear to respect address dependencies to loads. This means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) code such as the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) int y = -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) int *ptr = &y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) WRITE_ONCE(ptr, &x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) r1 = ptr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) can malfunction on Alpha systems (notice that P1 uses an ordinary load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) to read ptr instead of READ_ONCE()). It is quite possible that r1 = &x
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) and r2 = 0 at the end, in spite of the address dependency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) At first glance this doesn't seem to make sense. We know that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) smp_wmb() forces P0's store to x to propagate to P1 before the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) to ptr does. And since P1 can't execute its second load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) until it knows what location to load from, i.e., after executing its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) first load, the value x = 1 must have propagated to P1 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) second load executed. So why doesn't r2 end up equal to 1?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) The answer lies in the Alpha's split local caches. Although the two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) stores do reach P1's local cache in the proper order, it can happen
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) that the first store is processed by a busy part of the cache while
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) the second store is processed by an idle part. As a result, the x = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) value may not become available for P1's CPU to read until after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) ptr = &x value does, leading to the undesirable result above. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) final effect is that even though the two loads really are executed in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) program order, it appears that they aren't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) This could not have happened if the local cache had processed the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) incoming stores in FIFO order. By contrast, other architectures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) maintain at least the appearance of FIFO order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) In practice, this difficulty is solved by inserting a special fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) between P1's two loads when the kernel is compiled for the Alpha
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) architecture. In fact, as of version 4.15, the kernel automatically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) adds this fence after every READ_ONCE() and atomic load on Alpha. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) effect of the fence is to cause the CPU not to execute any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) instructions until after the local cache has finished processing all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) the stores it has already received. Thus, if the code was changed to:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) r1 = READ_ONCE(ptr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) then we would never get r1 = &x and r2 = 0. By the time P1 executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) its second load, the x = 1 store would already be fully processed by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) the local cache and available for satisfying the read request. Thus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) we have yet another reason why shared data should always be read with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) READ_ONCE() or another synchronization primitive rather than accessed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) The LKMM requires that smp_rmb(), acquire fences, and strong fences
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) share this property: They do not allow the CPU to execute any po-later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) instructions (or po-later loads in the case of smp_rmb()) until all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) outstanding stores have been processed by the local cache. In the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) case of a strong fence, the CPU first has to wait for all of its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) po-earlier stores to propagate to every other CPU in the system; then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) it has to wait for the local cache to process all the stores received
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) as of that time -- not just the stores received when the strong fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) began.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156) And of course, none of this matters for any architecture other than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) Alpha.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160) THE HAPPENS-BEFORE RELATION: hb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) -------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) The happens-before relation (hb) links memory accesses that have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) execute in a certain order. hb includes the ppo relation and two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) others, one of which is rfe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) W ->rfe R implies that W and R are on different CPUs. It also means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) that W's store must have propagated to R's CPU before R executed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) otherwise R could not have read the value stored by W. Therefore W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170) must have executed before R, and so we have W ->hb R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172) The equivalent fact need not hold if W ->rfi R (i.e., W and R are on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) the same CPU). As we have already seen, the operational model allows
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) W's value to be forwarded to R in such cases, meaning that R may well
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175) execute before W does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) It's important to understand that neither coe nor fre is included in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) hb, despite their similarities to rfe. For example, suppose we have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179) W ->coe W'. This means that W and W' are stores to the same location,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) they execute on different CPUs, and W comes before W' in the coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) order (i.e., W' overwrites W). Nevertheless, it is possible for W' to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) execute before W, because the decision as to which store overwrites
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) the other is made later by the memory subsystem. When the stores are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) nearly simultaneous, either one can come out on top. Similarly,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) R ->fre W means that W overwrites the value which R reads, but it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186) doesn't mean that W has to execute after R. All that's necessary is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) for the memory subsystem not to propagate W to R's CPU until after R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) has executed, which is possible if W executes shortly before R.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) The third relation included in hb is like ppo, in that it only links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) events that are on the same CPU. However it is more difficult to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) explain, because it arises only indirectly from the requirement of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) cache coherence. The relation is called prop, and it links two events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) on CPU C in situations where a store from some other CPU comes after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) the first event in the coherence order and propagates to C before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) second event executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) This is best explained with some examples. The simplest case looks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) int x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213) WRITE_ONCE(x, 8);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216) If r1 = 8 at the end then P0's accesses must have executed in program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) order. We can deduce this from the operational model; if P0's load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218) had executed before its store then the value of the store would have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) been forwarded to the load, so r1 would have ended up equal to 1, not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) 8. In this case there is a prop link from P0's write event to its read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221) event, because P1's store came after P0's store in x's coherence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222) order, and P1's store propagated to P0 before P0's load executed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) An equally simple case involves two loads of the same location that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) read from different stores:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) int x = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) r2 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) WRITE_ONCE(x, 9);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242) If r1 = 0 and r2 = 9 at the end then P0's accesses must have executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243) in program order. If the second load had executed before the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244) then the x = 9 store must have been propagated to P0 before the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) load executed, and so r1 would have been 9 rather than 0. In this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246) case there is a prop link from P0's first read event to its second,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) because P1's store overwrote the value read by P0's first load, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) P1's store propagated to P0 before P0's second load executed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) Less trivial examples of prop all involve fences. Unlike the simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) examples above, they can require that some instructions are executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252) out of program order. This next one should look familiar:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) int buf = 0, flag = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) WRITE_ONCE(buf, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) WRITE_ONCE(flag, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) r1 = READ_ONCE(flag);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269) r2 = READ_ONCE(buf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) This is the MP pattern again, with an smp_wmb() fence between the two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) stores. If r1 = 1 and r2 = 0 at the end then there is a prop link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274) from P1's second load to its first (backwards!). The reason is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) similar to the previous examples: The value P1 loads from buf gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) overwritten by P0's store to buf, the fence guarantees that the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277) to buf will propagate to P1 before the store to flag does, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278) store to flag propagates to P1 before P1 reads flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280) The prop link says that in order to obtain the r1 = 1, r2 = 0 result,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) P1 must execute its second load before the first. Indeed, if the load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) from flag were executed first, then the buf = 1 store would already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283) have propagated to P1 by the time P1's load from buf executed, so r2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) would have been 1 at the end, not 0. (The reasoning holds even for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) Alpha, although the details are more complicated and we will not go
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286) into them.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) But what if we put an smp_rmb() fence between P1's loads? The fence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289) would force the two loads to be executed in program order, and it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) would generate a cycle in the hb relation: The fence would create a ppo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) link (hence an hb link) from the first load to the second, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) prop relation would give an hb link from the second load to the first.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) Since an instruction can't execute before itself, we are forced to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294) conclude that if an smp_rmb() fence is added, the r1 = 1, r2 = 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295) outcome is impossible -- as it should be.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297) The formal definition of the prop relation involves a coe or fre link,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) followed by an arbitrary number of cumul-fence links, ending with an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299) rfe link. You can concoct more exotic examples, containing more than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300) one fence, although this quickly leads to diminishing returns in terms
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) of complexity. For instance, here's an example containing a coe link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) followed by two cumul-fences and an rfe link, utilizing the fact that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) release fences are A-cumulative:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) int x, y, z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) r0 = READ_ONCE(z);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317) WRITE_ONCE(x, 2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) smp_store_release(&z, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) If x = 2, r0 = 1, and r2 = 1 after this code runs then there is a prop
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331) link from P0's store to its load. This is because P0's store gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) overwritten by P1's store since x = 2 at the end (a coe link), the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) smp_wmb() ensures that P1's store to x propagates to P2 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) store to y does (the first cumul-fence), the store to y propagates to P2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335) before P2's load and store execute, P2's smp_store_release()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336) guarantees that the stores to x and y both propagate to P0 before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) store to z does (the second cumul-fence), and P0's load executes after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) store to z has propagated to P0 (an rfe link).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) In summary, the fact that the hb relation links memory access events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341) in the order they execute means that it must not have cycles. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342) requirement is the content of the LKMM's "happens-before" axiom.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) The LKMM defines yet another relation connected to times of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) instruction execution, but it is not included in hb. It relies on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346) particular properties of strong fences, which we cover in the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347) section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) THE PROPAGATES-BEFORE RELATION: pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351) ----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) The propagates-before (pb) relation capitalizes on the special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354) features of strong fences. It links two events E and F whenever some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) store is coherence-later than E and propagates to every CPU and to RAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) before F executes. The formal definition requires that E be linked to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) F via a coe or fre link, an arbitrary number of cumul-fences, an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358) optional rfe link, a strong fence, and an arbitrary number of hb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359) links. Let's see how this definition works out.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) Consider first the case where E is a store (implying that the sequence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362) of links begins with coe). Then there are events W, X, Y, and Z such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365) E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) where the * suffix indicates an arbitrary number of links of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) specified type, and the ? suffix indicates the link is optional (Y may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369) be equal to X). Because of the cumul-fence links, we know that W will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) propagate to Y's CPU before X does, hence before Y executes and hence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371) before the strong fence executes. Because this fence is strong, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) know that W will propagate to every CPU and to RAM before Z executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) And because of the hb links, we know that Z will execute before F.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) Thus W, which comes later than E in the coherence order, will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) propagate to every CPU and to RAM before F executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377) The case where E is a load is exactly the same, except that the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) link in the sequence is fre instead of coe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380) The existence of a pb link from E to F implies that E must execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) before F. To see why, suppose that F executed first. Then W would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) have propagated to E's CPU before E executed. If E was a store, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) memory subsystem would then be forced to make E come after W in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384) coherence order, contradicting the fact that E ->coe W. If E was a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385) load, the memory subsystem would then be forced to satisfy E's read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) request with the value stored by W or an even later store,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) contradicting the fact that E ->fre W.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389) A good example illustrating how pb works is the SB pattern with strong
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) fences:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392) int x = 0, y = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399) smp_mb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) r0 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1401) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1402)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1403) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1404) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1405) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1406)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1407) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1408) smp_mb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1409) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1410) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1411)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1412) If r0 = 0 at the end then there is a pb link from P0's load to P1's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1413) load: an fre link from P0's load to P1's store (which overwrites the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1414) value read by P0), and a strong fence between P1's store and its load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1415) In this example, the sequences of cumul-fence and hb links are empty.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1416) Note that this pb link is not included in hb as an instance of prop,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1417) because it does not start and end on the same CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1418)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1419) Similarly, if r1 = 0 at the end then there is a pb link from P1's load
to P0's. This means that if both r0 and r1 were 0 there would be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1421) cycle in pb, which is not possible since an instruction cannot execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1422) before itself. Thus, adding smp_mb() fences to the SB pattern
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1423) prevents the r0 = 0, r1 = 0 outcome.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1424)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1425) In summary, the fact that the pb relation links events in the order
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1426) they execute means that it cannot have cycles. This requirement is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1427) the content of the LKMM's "propagation" axiom.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1428)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1429)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1430) RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1431) ------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1432)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1433) RCU (Read-Copy-Update) is a powerful synchronization mechanism. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1434) rests on two concepts: grace periods and read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1435)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1436) A grace period is the span of time occupied by a call to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1437) synchronize_rcu(). A read-side critical section (or just critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1438) section, for short) is a region of code delimited by rcu_read_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1439) at the start and rcu_read_unlock() at the end. Critical sections can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1440) be nested, although we won't make use of this fact.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1442) As far as memory models are concerned, RCU's main feature is its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1443) Grace-Period Guarantee, which states that a critical section can never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1444) span a full grace period. In more detail, the Guarantee says:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1445)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1446) For any critical section C and any grace period G, at least
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1447) one of the following statements must hold:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1448)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1449) (1) C ends before G does, and in addition, every store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1450) propagates to C's CPU before the end of C must propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1451) every CPU before G ends.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1452)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1453) (2) G starts before C does, and in addition, every store that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1454) propagates to G's CPU before the start of G must propagate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1455) to every CPU before C starts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1456)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1457) In particular, it is not possible for a critical section to both start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1458) before and end after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1459)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1460) Here is a simple example of RCU in action:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1461)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1462) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1463)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1464) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1465) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1466) rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1467) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1468) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1469) rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1470) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1471)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1472) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1473) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1474) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1475)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1476) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1477) synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1478) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1479) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1480)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1481) The Grace Period Guarantee tells us that when this code runs, it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1482) never end with r1 = 1 and r2 = 0. The reasoning is as follows. r1 = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1483) means that P0's store to x propagated to P1 before P1 called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1484) synchronize_rcu(), so P0's critical section must have started before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1485) P1's grace period, contrary to part (2) of the Guarantee. On the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1486) other hand, r2 = 0 means that P0's store to y, which occurs before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1487) end of the critical section, did not propagate to P1 before the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1488) the grace period, contrary to part (1). Together the results violate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1489) the Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1490)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1491) In the kernel's implementations of RCU, the requirements for stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1492) to propagate to every CPU are fulfilled by placing strong fences at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1493) suitable places in the RCU-related code. Thus, if a critical section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1494) starts before a grace period does then the critical section's CPU will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1495) execute an smp_mb() fence after the end of the critical section and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1496) some time before the grace period's synchronize_rcu() call returns.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1497) And if a critical section ends after a grace period does then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1498) synchronize_rcu() routine will execute an smp_mb() fence at its start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1499) and some time before the critical section's opening rcu_read_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1500) executes.
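
As a purely conceptual sketch (nothing like the kernel's actual, far
more elaborate implementation), one can picture the fence placement
like this, with wait_for_readers() as a hypothetical placeholder for
the grace-period machinery:

	void toy_synchronize_rcu(void)
	{
		smp_mb();		/* strong fence starting the GP */
		wait_for_readers();	/* hypothetical placeholder */
		smp_mb();		/* strong fence ending the GP */
	}

The read-side fences mentioned above are likewise supplied by the
surrounding RCU machinery (at a context switch, for example) rather
than by rcu_read_lock() or rcu_read_unlock() themselves.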
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1501)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1502) What exactly do we mean by saying that a critical section "starts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1503) before" or "ends after" a grace period? Some aspects of the meaning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1504) are pretty obvious, as in the example above, but the details aren't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1505) entirely clear. The LKMM formalizes this notion by means of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1506) rcu-link relation. rcu-link encompasses a very general notion of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1507) "before": If E and F are RCU fence events (i.e., rcu_read_lock(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1508) rcu_read_unlock(), or synchronize_rcu()) then among other things,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1509) E ->rcu-link F includes cases where E is po-before some memory-access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1510) event X, F is po-after some memory-access event Y, and we have any of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1511) X ->rfe Y, X ->co Y, or X ->fr Y.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1512)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1513) The formal definition of the rcu-link relation is more than a little
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1514) obscure, and we won't give it here. It is closely related to the pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1515) relation, and the details don't matter unless you want to comb through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1516) a somewhat lengthy formal proof. Pretty much all you need to know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1517) about rcu-link is the information in the preceding paragraph.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1518)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1519) The LKMM also defines the rcu-gp and rcu-rscsi relations. They bring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1520) grace periods and read-side critical sections into the picture, in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1521) following way:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1522)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1523) E ->rcu-gp F means that E and F are in fact the same event,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1524) and that event is a synchronize_rcu() fence (i.e., a grace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1525) period).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1526)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1527) E ->rcu-rscsi F means that E and F are the rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1528) and rcu_read_lock() fence events delimiting some read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1529) critical section. (The 'i' at the end of the name emphasizes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1530) that this relation is "inverted": It links the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1531) critical section to the start.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1532)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1533) If we think of the rcu-link relation as standing for an extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1534) "before", then X ->rcu-gp Y ->rcu-link Z roughly says that X is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1535) grace period which ends before Z begins. (In fact it covers more than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1536) this, because it also includes cases where some store propagates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1537) Z's CPU before Z begins but doesn't propagate to some other CPU until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1538) after X ends.) Similarly, X ->rcu-rscsi Y ->rcu-link Z says that X is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1539) the end of a critical section which starts before Z begins.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1540)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1541) The LKMM goes on to define the rcu-order relation as a sequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1542) rcu-gp and rcu-rscsi links separated by rcu-link links, in which the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1543) number of rcu-gp links is >= the number of rcu-rscsi links. For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1544) example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1545)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1546) X ->rcu-gp Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1547)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1548) would imply that X ->rcu-order V, because this sequence contains two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1549) rcu-gp links and one rcu-rscsi link. (It also implies that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1550) X ->rcu-order T and Z ->rcu-order V.) On the other hand:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1551)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1552) X ->rcu-rscsi Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1553)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1554) does not imply X ->rcu-order V, because the sequence contains only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1555) one rcu-gp link but two rcu-rscsi links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1556)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1557) The rcu-order relation is important because the Grace Period Guarantee
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1558) means that rcu-order links act kind of like strong fences. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1559) particular, E ->rcu-order F implies not only that E begins before F
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1560) ends, but also that any write po-before E will propagate to every CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1561) before any instruction po-after F can execute. (However, it does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1562) imply that E must execute before F; in fact, each synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1563) fence event is linked to itself by rcu-order as a degenerate case.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1564)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1565) To prove this in full generality requires some intellectual effort.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1566) We'll consider just a very simple case:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1567)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1568) G ->rcu-gp W ->rcu-link Z ->rcu-rscsi F.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1569)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1570) This formula means that G and W are the same event (a grace period),
and there are events X, Y, and a read-side critical section C such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1572)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1573) 1. G = W is po-before or equal to X;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1574)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1575) 2. X comes "before" Y in some sense (including rfe, co and fr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1576)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1577) 3. Y is po-before Z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1578)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1579) 4. Z is the rcu_read_unlock() event marking the end of C;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1580)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1581) 5. F is the rcu_read_lock() event marking the start of C.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1582)
From 1 - 4 we deduce that the grace period G ends before the critical
section C does. Then part (2) of the Grace Period Guarantee says not only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1585) that G starts before C does, but also that any write which executes on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1586) G's CPU before G starts must propagate to every CPU before C starts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1587) In particular, the write propagates to every CPU before F finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1588) executing and hence before any instruction po-after F can execute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1589) This sort of reasoning can be extended to handle all the situations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1590) covered by rcu-order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1591)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1592) The rcu-fence relation is a simple extension of rcu-order. While
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1593) rcu-order only links certain fence events (calls to synchronize_rcu(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1594) rcu_read_lock(), or rcu_read_unlock()), rcu-fence links any events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1595) that are separated by an rcu-order link. This is analogous to the way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1596) the strong-fence relation links events that are separated by an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1597) smp_mb() fence event (as mentioned above, rcu-order links act kind of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1598) like strong fences). Written symbolically, X ->rcu-fence Y means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1599) there are fence events E and F such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1600)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1601) X ->po E ->rcu-order F ->po Y.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1602)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1603) From the discussion above, we see this implies not only that X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1604) executes before Y, but also (if X is a store) that X propagates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1605) every CPU before Y executes. Thus rcu-fence is sort of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1606) "super-strong" fence: Unlike the original strong fences (smp_mb() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1607) synchronize_rcu()), rcu-fence is able to link events on different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1608) CPUs. (Perhaps this fact should lead us to say that rcu-fence isn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1609) really a fence at all!)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1610)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1611) Finally, the LKMM defines the RCU-before (rb) relation in terms of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1612) rcu-fence. This is done in essentially the same way as the pb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1613) relation was defined in terms of strong-fence. We will omit the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1614) details; the end result is that E ->rb F implies E must execute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1615) before F, just as E ->pb F does (and for much the same reasons).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1616)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1617) Putting this all together, the LKMM expresses the Grace Period
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1618) Guarantee by requiring that the rb relation does not contain a cycle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1619) Equivalently, this "rcu" axiom requires that there are no events E
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1620) and F with E ->rcu-link F ->rcu-order E. Or to put it a third way,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1621) the axiom requires that there are no cycles consisting of rcu-gp and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1622) rcu-rscsi alternating with rcu-link, where the number of rcu-gp links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1623) is >= the number of rcu-rscsi links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1624)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1625) Justifying the axiom isn't easy, but it is in fact a valid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1626) formalization of the Grace Period Guarantee. We won't attempt to go
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1627) through the detailed argument, but the following analysis gives a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1628) taste of what is involved. Suppose both parts of the Guarantee are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1629) violated: A critical section starts before a grace period, and some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1630) store propagates to the critical section's CPU before the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1631) critical section but doesn't propagate to some other CPU until after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1632) the end of the grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1633)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1634) Putting symbols to these ideas, let L and U be the rcu_read_lock() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1635) rcu_read_unlock() fence events delimiting the critical section in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1636) question, and let S be the synchronize_rcu() fence event for the grace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1637) period. Saying that the critical section starts before S means there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1638) are events Q and R where Q is po-after L (which marks the start of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1639) critical section), Q is "before" R in the sense used by the rcu-link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1640) relation, and R is po-before the grace period S. Thus we have:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1641)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1642) L ->rcu-link S.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1643)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1644) Let W be the store mentioned above, let Y come before the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1645) critical section and witness that W propagates to the critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1646) section's CPU by reading from W, and let Z on some arbitrary CPU be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1647) witness that W has not propagated to that CPU, where Z happens after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1648) some event X which is po-after S. Symbolically, this amounts to:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1649)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1650) S ->po X ->hb* Z ->fr W ->rf Y ->po U.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1651)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1652) The fr link from Z to W indicates that W has not propagated to Z's CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1653) at the time that Z executes. From this, it can be shown (see the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1654) discussion of the rcu-link relation earlier) that S and U are related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1655) by rcu-link:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1657) S ->rcu-link U.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1658)
Since S is a grace period we have S ->rcu-gp S, and since L and U are
the start and end of the critical section in question, we have
U ->rcu-rscsi L.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1661) From this we obtain:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1663) S ->rcu-gp S ->rcu-link U ->rcu-rscsi L ->rcu-link S,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1665) a forbidden cycle. Thus the "rcu" axiom rules out this violation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1666) the Grace Period Guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1667)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1668) For something a little more down-to-earth, let's see how the axiom
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1669) works out in practice. Consider the RCU code example from above, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1670) time with statement labels added:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1671)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1672) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1673)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1674) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1675) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1676) L: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1677) X: WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1678) Y: WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1679) U: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1680) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1681)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1682) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1683) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1684) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1685)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1686) Z: r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1687) S: synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1688) W: r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1689) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1690)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1692) If r2 = 0 at the end then P0's store at Y overwrites the value that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1693) P1's load at W reads from, so we have W ->fre Y. Since S ->po W and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1694) also Y ->po U, we get S ->rcu-link U. In addition, S ->rcu-gp S
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1695) because S is a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1696)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1697) If r1 = 1 at the end then P1's load at Z reads from P0's store at X,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1698) so we have X ->rfe Z. Together with L ->po X and Z ->po S, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1699) yields L ->rcu-link S. And since L and U are the start and end of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1700) critical section, we have U ->rcu-rscsi L.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1701)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1702) Then U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1703) forbidden cycle, violating the "rcu" axiom. Hence the outcome is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1704) allowed by the LKMM, as we would expect.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1705)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1706) For contrast, let's see what can happen in a more complicated example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1707)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1708) int x, y, z;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1710) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1711) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1712) int r0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1714) L0: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1715) r0 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1716) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1717) U0: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1718) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1719)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1720) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1721) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1722) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1723)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1724) r1 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1725) S1: synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1726) WRITE_ONCE(z, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1727) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1728)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1729) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1730) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1731) int r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1732)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1733) L2: rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1734) r2 = READ_ONCE(z);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1735) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1736) U2: rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1737) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1738)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1739) If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1740) that U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
L2 ->rcu-link U0. However, this cycle is not forbidden, because the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1742) sequence of relations contains fewer instances of rcu-gp (one) than of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1743) rcu-rscsi (two). Consequently the outcome is allowed by the LKMM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1744) The following instruction timing diagram shows how it might actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1745) occur:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1746)
	P0                      P1                      P2
	--------------------    --------------------    --------------------
	rcu_read_lock()
	WRITE_ONCE(y, 1)
	                        r1 = READ_ONCE(y)
	                        synchronize_rcu() starts
	                        .                       rcu_read_lock()
	                        .                       WRITE_ONCE(x, 1)
	r0 = READ_ONCE(x)       .
	rcu_read_unlock()       .
	                        synchronize_rcu() ends
	                        WRITE_ONCE(z, 1)
	                                                r2 = READ_ONCE(z)
	                                                rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1761)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1762) This requires P0 and P2 to execute their loads and stores out of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1763) program order, but of course they are allowed to do so. And as you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1764) can see, the Grace Period Guarantee is not violated: The critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1765) section in P0 both starts before P1's grace period does and ends
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1766) before it does, and the critical section in P2 both starts after P1's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1767) grace period does and ends after it does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1768)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1769) Addendum: The LKMM now supports SRCU (Sleepable Read-Copy-Update) in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1770) addition to normal RCU. The ideas involved are much the same as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1771) above, with new relations srcu-gp and srcu-rscsi added to represent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1772) SRCU grace periods and read-side critical sections. There is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1773) restriction on the srcu-gp and srcu-rscsi links that can appear in an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1774) rcu-order sequence (the srcu-rscsi links must be paired with srcu-gp
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1775) links having the same SRCU domain with proper nesting); the details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1776) are relatively unimportant.
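
Purely for illustration (this sketch is not taken from the model's
litmus-test suite), here is an SRCU analog of the earlier RCU example;
note that the index returned by srcu_read_lock() must be passed to the
matching srcu_read_unlock():

	int x, y;
	struct srcu_struct s;

	P0()
	{
		int r0, idx;

		idx = srcu_read_lock(&s);
		r0 = READ_ONCE(x);
		WRITE_ONCE(y, 1);
		srcu_read_unlock(&s, idx);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(y);
		synchronize_srcu(&s);
		WRITE_ONCE(x, 1);
	}

Because the critical section and the grace period use the same SRCU
domain s, the outcome r0 = 1 and r1 = 1 is forbidden: it would require
P0's critical section to both start before and end after P1's grace
period.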
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1777)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1778)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1779) LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1780) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1781)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1782) The LKMM includes locking. In fact, there is special code for locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1783) in the formal model, added in order to make tools run faster.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1784) However, this special code is intended to be more or less equivalent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1785) to concepts we have already covered. A spinlock_t variable is treated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1786) the same as an int, and spin_lock(&s) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1787)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1788) while (cmpxchg_acquire(&s, 0, 1) != 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1789) cpu_relax();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1790)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1791) This waits until s is equal to 0 and then atomically sets it to 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1792) and the read part of the cmpxchg operation acts as an acquire fence.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1793) An alternate way to express the same thing would be:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1794)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1795) r = xchg_acquire(&s, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1796)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1797) along with a requirement that at the end, r = 0. Similarly,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1798) spin_trylock(&s) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1799)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1800) return !cmpxchg_acquire(&s, 0, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1801)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1802) which atomically sets s to 1 if it is currently equal to 0 and returns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1803) true if it succeeds (the read part of the cmpxchg operation acts as an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1804) acquire fence only if the operation is successful). spin_unlock(&s)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1805) is treated almost the same as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1806)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1807) smp_store_release(&s, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1808)
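Putting these pieces together, a simple critical section such as
spin_lock(&s); WRITE_ONCE(x, 1); spin_unlock(&s); amounts to more or
less the following sequence:

	while (cmpxchg_acquire(&s, 0, 1) != 0)	/* lock-acquire */
		cpu_relax();
	WRITE_ONCE(x, 1);			/* the critical section */
	smp_store_release(&s, 0);		/* lock-release */
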
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1809) The "almost" qualifiers above need some explanation. In the LKMM, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1810) store-release in a spin_unlock() and the load-acquire which forms the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1811) first half of the atomic rmw update in a spin_lock() or a successful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1812) spin_trylock() -- we can call these things lock-releases and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1813) lock-acquires -- have two properties beyond those of ordinary releases
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1814) and acquires.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1816) First, when a lock-acquire reads from a lock-release, the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1817) requires that every instruction po-before the lock-release must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1818) execute before any instruction po-after the lock-acquire. This would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1819) naturally hold if the release and acquire operations were on different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1820) CPUs, but the LKMM says it holds even when they are on the same CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1821) For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1822)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1823) int x, y;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1824) spinlock_t s;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1825)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1826) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1827) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1828) int r1, r2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1829)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1830) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1831) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1832) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1833) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1834) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1835) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1836) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1837)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1838) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1839) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1840) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1841) smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1842) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1843) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1844)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1845) Here the second spin_lock() reads from the first spin_unlock(), and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1846) therefore the load of x must execute before the load of y. Thus we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1847) cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1848) MP pattern).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1849)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1850) This requirement does not apply to ordinary release and acquire
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1851) fences, only to lock-related operations. For instance, suppose P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1852) in the example had been written as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1853)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1854) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1855) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1856) int r1, r2, r3;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1857)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1858) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1859) smp_store_release(&s, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1860) r3 = smp_load_acquire(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1861) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1862) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1863)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1864) Then the CPU would be allowed to forward the s = 1 value from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1865) smp_store_release() to the smp_load_acquire(), executing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1866) instructions in the following order:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1867)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1868) r3 = smp_load_acquire(&s); // Obtains r3 = 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1869) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1870) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1871) smp_store_release(&s, 1); // Value is forwarded
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1873) and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1874)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1875) Second, when a lock-acquire reads from a lock-release, and some other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1876) stores W and W' occur po-before the lock-release and po-after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1877) lock-acquire respectively, the LKMM requires that W must propagate to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1878) each CPU before W' does. For example, consider:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1879)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1880) int x, y;
	spinlock_t s;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1882)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1883) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1884) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1885) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1886) WRITE_ONCE(x, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1887) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1888) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1889)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1890) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1891) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1892) int r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1893)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1894) spin_lock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1895) r1 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1896) WRITE_ONCE(y, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1897) spin_unlock(&s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1898) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1899)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1900) P2()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1901) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1902) int r2, r3;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1903)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1904) r2 = READ_ONCE(y);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1905) smp_rmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1906) r3 = READ_ONCE(x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1907) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1908)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1909) If r1 = 1 at the end then the spin_lock() in P1 must have read from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1910) the spin_unlock() in P0. Hence the store to x must propagate to P2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1911) before the store to y does, so we cannot have r2 = 1 and r3 = 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1912)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1913) These two special requirements for lock-release and lock-acquire do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1914) not arise from the operational model. Nevertheless, kernel developers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1915) have come to expect and rely on them because they do hold on all
architectures supported by the Linux kernel, albeit for various
reasons.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1918)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1919)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1920) PLAIN ACCESSES AND DATA RACES
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1921) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1922)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1923) In the LKMM, memory accesses such as READ_ONCE(x), atomic_inc(&y),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1924) smp_load_acquire(&z), and so on are collectively referred to as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1925) "marked" accesses, because they are all annotated with special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1926) operations of one kind or another. Ordinary C-language memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1927) accesses such as x or y = 0 are simply called "plain" accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1928)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1929) Early versions of the LKMM had nothing to say about plain accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1930) The C standard allows compilers to assume that the variables affected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1931) by plain accesses are not concurrently read or written by any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1932) threads or CPUs. This leaves compilers free to implement all manner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1933) of transformations or optimizations of code containing plain accesses,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1934) making such code very difficult for a memory model to handle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1935)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1936) Here is just one example of a possible pitfall:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1937)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1938) int a = 6;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1939) int *x = &a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1940)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1941) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1942) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1943) int *r1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1944) int r2 = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1945)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1946) r1 = x;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1947) if (r1 != NULL)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1948) r2 = READ_ONCE(*r1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1949) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1950)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1951) P1()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1952) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1953) WRITE_ONCE(x, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1954) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1956) On the face of it, one would expect that when this code runs, the only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1957) possible final values for r2 are 6 and 0, depending on whether or not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1958) P1's store to x propagates to P0 before P0's load from x executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1959) But since P0's load from x is a plain access, the compiler may decide
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1960) to carry out the load twice (for the comparison against NULL, then again
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1961) for the READ_ONCE()) and eliminate the temporary variable r1. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1962) object code generated for P0 could therefore end up looking rather
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1963) like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1964)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1965) P0()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1966) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1967) int r2 = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1968)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1969) if (x != NULL)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1970) r2 = READ_ONCE(*x);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1971) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1972)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1973) And now it is obvious that this code runs the risk of dereferencing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1974) NULL pointer, because P1's store to x might propagate to P0 after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1975) test against NULL has been made but before the READ_ONCE() executes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1976) If the original code had said "r1 = READ_ONCE(x)" instead of "r1 = x",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1977) the compiler would not have performed this optimization and there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1978) would be no possibility of a NULL-pointer dereference.
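
For reference, here is how the repaired version of P0 would look:

	P0()
	{
		int *r1;
		int r2 = 0;

		r1 = READ_ONCE(x);	/* marked load: carried out
					   exactly once */
		if (r1 != NULL)
			r2 = READ_ONCE(*r1);
	}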
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1979)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1980) Given the possibility of transformations like this one, the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1981) doesn't try to predict all possible outcomes of code containing plain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1982) accesses. It is instead content to determine whether the code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1983) violates the compiler's assumptions, which would render the ultimate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1984) outcome undefined.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1985)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1986) In technical terms, the compiler is allowed to assume that when the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1987) program executes, there will not be any data races. A "data race"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1988) occurs when there are two memory accesses such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1989)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1990) 1. they access the same location,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1991)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1992) 2. at least one of them is a store,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1993)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1994) 3. at least one of them is plain,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1995)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1996) 4. they occur on different CPUs (or in different threads on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1997) same CPU), and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1998)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1999) 5. they execute concurrently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2000)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2001) In the literature, two accesses are said to "conflict" if they satisfy
1 and 2 above. We'll go a little further and say that two accesses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2003) are "race candidates" if they satisfy 1 - 4. Thus, whether or not two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2004) race candidates actually do race in a given execution depends on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2005) whether they are concurrent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2006)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2007) The LKMM tries to determine whether a program contains race candidates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2008) which may execute concurrently; if it does then the LKMM says there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2009) a potential data race and makes no predictions about the program's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2010) outcome.
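
As a minimal illustration (a made-up fragment, not one of the model's
litmus tests), the two accesses below satisfy conditions 1 - 4 above,
and nothing in the code prevents them from executing concurrently, so
the LKMM reports a potential data race:

	int x;

	P0()
	{
		int r0;

		r0 = x;			/* plain load */
	}

	P1()
	{
		WRITE_ONCE(x, 1);	/* marked store */
	}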
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2011)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2012) Determining whether two accesses are race candidates is easy; you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2013) see that all the concepts involved in the definition above are already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2014) part of the memory model. The hard part is telling whether they may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2015) execute concurrently. The LKMM takes a conservative attitude,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2016) assuming that accesses may be concurrent unless it can prove they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2017) are not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2018)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2019) If two memory accesses aren't concurrent then one must execute before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2020) the other. Therefore the LKMM decides two accesses aren't concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2021) if they can be connected by a sequence of hb, pb, and rb links
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2022) (together referred to as xb, for "executes before"). However, there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2023) are two complicating factors.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2024)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2025) If X is a load and X executes before a store Y, then indeed there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2026) no danger of X and Y being concurrent. After all, Y can't have any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2027) effect on the value obtained by X until the memory subsystem has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2028) propagated Y from its own CPU to X's CPU, which won't happen until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2029) some time after Y executes and thus after X executes. But if X is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2030) store, then even if X executes before Y it is still possible that X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2031) will propagate to Y's CPU just as Y is executing. In such a case X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2032) could very well interfere somehow with Y, and we would have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2033) consider X and Y to be concurrent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2034)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2035) Therefore when X is a store, for X and Y to be non-concurrent the LKMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2036) requires not only that X must execute before Y but also that X must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2037) propagate to Y's CPU before Y executes. (Or vice versa, of course, if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2038) Y executes before X -- then Y must propagate to X's CPU before X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2039) executes if Y is a store.) This is expressed by the visibility
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2040) relation (vis), where X ->vis Y is defined to hold if there is an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2041) intermediate event Z such that:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2042)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2043) X is connected to Z by a possibly empty sequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2044) cumul-fence links followed by an optional rfe link (if none of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2045) these links are present, X and Z are the same event),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2046)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2047) and either:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2048)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2049) Z is connected to Y by a strong-fence link followed by a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2050) possibly empty sequence of xb links,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2051)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2052) or:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2053)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2054) Z is on the same CPU as Y and is connected to Y by a possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2055) empty sequence of xb links (again, if the sequence is empty it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2056) means Z and Y are the same event).
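
In the shorthand used later in this document for sequences of links
(our own summary, not the literal text of linux-kernel.cat), this
definition amounts to:

        vis = cumul-fence* ; rfe? ; (strong-fence ; xb*  |  same-CPU xb*)

where "same-CPU xb*" stands for an xb* sequence whose endpoints lie on
the same CPU.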

The motivations behind this definition are straightforward:

        cumul-fence memory barriers force stores that are po-before
        the barrier to propagate to other CPUs before stores that are
        po-after the barrier.

        An rfe link from an event W to an event R says that R reads
        from W, which certainly means that W must have propagated to
        R's CPU before R executed.

        strong-fence memory barriers force stores that are po-before
        the barrier, or that propagate to the barrier's CPU before the
        barrier executes, to propagate to all CPUs before any events
        po-after the barrier can execute.

To see how this works out in practice, consider our old friend, the MP
pattern (with fences and statement labels, but without the conditional
test):

        int buf = 0, flag = 0;

        P0()
        {
                X: WRITE_ONCE(buf, 1);
                smp_wmb();
                W: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r1;
                int r2 = 0;

                Z: r1 = READ_ONCE(flag);
                smp_rmb();
                Y: r2 = READ_ONCE(buf);
        }

The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
assuming r1 = 1 at the end, there is an rfe link from W to Z. This
means that the store to buf must propagate from P0 to P1 before Z
executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
provides an xb link from Z to Y (i.e., it forces Z to execute before
Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y
executes.

The second complicating factor mentioned above arises from the fact
that when we are considering data races, some of the memory accesses
are plain. Now, although we have not said so explicitly, up to this
point most of the relations defined by the LKMM (ppo, hb, prop,
cumul-fence, pb, and so on -- including vis) apply only to marked
accesses.

There are good reasons for this restriction. The compiler is not
allowed to apply fancy transformations to marked accesses, and
consequently each such access in the source code corresponds more or
less directly to a single machine instruction in the object code. But
plain accesses are a different story; the compiler may combine them,
split them up, duplicate them, eliminate them, invent new ones, and
who knows what else. Seeing a plain access in the source code tells
you almost nothing about what machine instructions will end up in the
object code.
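
To make this concrete, here is a hypothetical illustration (ours, not
drawn from any particular compiler). Given the source code:

        x = 1;
        x = 2;

the compiler may simply delete the first plain store; and given:

        r1 = y;
        r2 = y;

it may load y only once and reuse the value for both r1 and r2, or
reload y at any other point where it wants the value again. No such
transformations would be allowed if the accesses were made with
WRITE_ONCE() and READ_ONCE().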

Fortunately, the compiler isn't completely free; it is subject to some
limitations. For one, it is not allowed to introduce a data race into
the object code if the source code does not already contain a data
race (if it could, memory models would be useless and no multithreaded
code would be safe!). For another, it cannot move a plain access past
a compiler barrier.

A compiler barrier is a kind of fence, but as the name implies, it
only affects the compiler; it does not necessarily have any effect on
how instructions are executed by the CPU. In Linux kernel source
code, the barrier() function is a compiler barrier. It doesn't give
rise directly to any machine instructions in the object code; rather,
it affects how the compiler generates the rest of the object code.
Given source code like this:

        ... some memory accesses ...
        barrier();
        ... some other memory accesses ...

the barrier() function ensures that the machine instructions
corresponding to the first group of accesses will all end po-before
any machine instructions corresponding to the second group of accesses
-- even if some of the accesses are plain. (Of course, the CPU may
then execute some of those accesses out of program order, but we
already know how to deal with such issues.) Without the barrier()
there would be no such guarantee; the two groups of accesses could be
intermingled or even reversed in the object code.

The LKMM doesn't say much about the barrier() function, but it does
require that all fences are also compiler barriers. In addition, it
requires that the ordering properties of memory barriers such as
smp_rmb() or smp_store_release() apply to plain accesses as well as to
marked accesses.

This is the key to analyzing data races. Consider the MP pattern
again, now using plain accesses for buf:

        int buf = 0, flag = 0;

        P0()
        {
                U: buf = 1;
                smp_wmb();
                X: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r1;
                int r2 = 0;

                Y: r1 = READ_ONCE(flag);
                if (r1) {
                        smp_rmb();
                        V: r2 = buf;
                }
        }

This program does not contain a data race. Although the U and V
accesses are race candidates, the LKMM can prove they are not
concurrent as follows:

        The smp_wmb() fence in P0 is both a compiler barrier and a
        cumul-fence. It guarantees that no matter what hash of
        machine instructions the compiler generates for the plain
        access U, all those instructions will be po-before the fence.
        Consequently U's store to buf, no matter how it is carried out
        at the machine level, must propagate to P1 before X's store to
        flag does.

        X and Y are both marked accesses. Hence an rfe link from X to
        Y is a valid indicator that X propagated to P1 before Y
        executed, i.e., X ->vis Y. (And if there is no rfe link then
        r1 will be 0, so V will not be executed and ipso facto won't
        race with U.)

        The smp_rmb() fence in P1 is a compiler barrier as well as a
        fence. It guarantees that all the machine-level instructions
        corresponding to the access V will be po-after the fence, and
        therefore any loads among those instructions will execute
        after the fence does and hence after Y does.

Thus U's store to buf is forced to propagate to P1 before V's load
executes (assuming V does execute), ruling out the possibility of a
data race between them.

This analysis illustrates how the LKMM deals with plain accesses in
general. Suppose R is a plain load and we want to show that R
executes before some marked access E. We can do this by finding a
marked access X such that R and X are ordered by a suitable fence and
X ->xb* E. If E were also a plain access, we would additionally look
for a marked access Y such that X ->xb* Y, and Y and E are ordered by
a fence. We describe this arrangement by saying that R is
"post-bounded" by X and E is "pre-bounded" by Y.
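
As a sketch of this arrangement (our own example with hypothetical
variables), suppose R is a plain load and E is a marked access on
another CPU:

        R: r1 = u;              /* plain load */
        smp_mb();               /* suitable fence */
        X: WRITE_ONCE(a, 1);    /* marked access with X ->xb* E */

The smp_mb(), being a fence and hence a compiler barrier, guarantees
that every machine instruction implementing R is po-before X; the xb*
sequence then guarantees that X executes before E, and therefore so
does R.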

In fact, we go one step further: Since R is a read, we say that R is
"r-post-bounded" by X. Similarly, E would be "r-pre-bounded" or
"w-pre-bounded" by Y, depending on whether E was a load or a store.
This distinction is needed because some fences affect only loads
(i.e., smp_rmb()) and some affect only stores (smp_wmb()); otherwise
the two types of bounds are the same. And as a degenerate case, we
say that a marked access pre-bounds and post-bounds itself (e.g., if R
above were a marked load then X could simply be taken to be R itself).

The need to distinguish between r- and w-bounding raises yet another
issue. When the source code contains a plain store, the compiler is
allowed to put plain loads of the same location into the object code.
For example, given the source code:

        x = 1;

the compiler is theoretically allowed to generate object code that
looks like:

        if (x != 1)
                x = 1;

thereby adding a load (and possibly replacing the store entirely).
For this reason, whenever the LKMM requires a plain store to be
w-pre-bounded or w-post-bounded by a marked access, it also requires
the store to be r-pre-bounded or r-post-bounded, so as to handle cases
where the compiler adds a load.

(This may be overly cautious. We don't know of any examples where a
compiler has augmented a store with a load in this fashion, and the
Linux kernel developers would probably fight pretty hard to change a
compiler if it ever did this. Still, better safe than sorry.)

Incidentally, the other transformation -- augmenting a plain load by
adding in a store to the same location -- is not allowed. This is
because the compiler cannot know whether any other CPUs might perform
a concurrent load from that location. Two concurrent loads don't
constitute a race (they can't interfere with each other), but a store
does race with a concurrent load. Thus adding a store might create a
data race where one was not already present in the source code,
something the compiler is forbidden to do. Augmenting a store with a
load, on the other hand, is acceptable because doing so won't create a
data race unless one already existed.
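
For example, a compiler that transformed the plain load:

        r = x;

into something like:

        r = x;
        x = r;          /* invented store -- forbidden */

(a hypothetical transformation, shown only to make the point) would
thereby create a data race with any concurrent load of x performed by
another CPU, even though the original source code was race-free.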

The LKMM includes a second way to pre-bound plain accesses, in
addition to fences: an address dependency from a marked load. That
is, in the sequence:

        p = READ_ONCE(ptr);
        r = *p;

the LKMM says that the marked load of ptr pre-bounds the plain load of
*p; the marked load must execute before any of the machine
instructions corresponding to the plain load. This is a reasonable
stipulation, since after all, the CPU can't perform the load of *p
until it knows what value p will hold. Furthermore, without some
assumption like this one, some usages typical of RCU would count as
data races. For example:

        int a = 1, b;
        int *ptr = &a;

        P0()
        {
                b = 2;
                rcu_assign_pointer(ptr, &b);
        }

        P1()
        {
                int *p;
                int r;

                rcu_read_lock();
                p = rcu_dereference(ptr);
                r = *p;
                rcu_read_unlock();
        }

(In this example the rcu_read_lock() and rcu_read_unlock() calls don't
really do anything, because there aren't any grace periods. They are
included merely for the sake of good form; typically P0 would call
synchronize_rcu() somewhere after the rcu_assign_pointer().)

rcu_assign_pointer() performs a store-release, so the plain store to b
is definitely w-post-bounded before the store to ptr, and the two
stores will propagate to P1 in that order. However, rcu_dereference()
is only equivalent to READ_ONCE(). While it is a marked access, it is
not a fence or compiler barrier. Hence the only guarantee we have
that the load of ptr in P1 is r-pre-bounded before the load of *p
(thus avoiding a race) is the assumption about address dependencies.

This is a situation where the compiler can undermine the memory model,
and a certain amount of care is required when programming constructs
like this one. In particular, comparisons between the pointer and
other known addresses can cause trouble. If you have something like:

        p = rcu_dereference(ptr);
        if (p == &x)
                r = *p;

then the compiler just might generate object code resembling:

        p = rcu_dereference(ptr);
        if (p == &x)
                r = x;

or even:

        rtemp = x;
        p = rcu_dereference(ptr);
        if (p == &x)
                r = rtemp;

which would invalidate the memory model's assumption, since the CPU
could now perform the load of x before the load of ptr (there might be
a control dependency but no address dependency at the machine level).

Finally, it turns out there is a situation in which a plain write does
not need to be w-post-bounded: when it is separated from the other
race-candidate access by a fence. At first glance this may seem
impossible. After all, to be race candidates the two accesses must
be on different CPUs, and fences don't link events on different CPUs.
Well, normal fences don't -- but rcu-fence can! Here's an example:

        int x, y;

        P0()
        {
                WRITE_ONCE(x, 1);
                synchronize_rcu();
                y = 3;
        }

        P1()
        {
                rcu_read_lock();
                if (READ_ONCE(x) == 0)
                        y = 2;
                rcu_read_unlock();
        }

Do the plain stores to y race? Clearly not if P1 reads a non-zero
value for x, so let's assume the READ_ONCE(x) does obtain 0. This
means that the read-side critical section in P1 must finish executing
before the grace period in P0 does, because RCU's Grace-Period
Guarantee says that otherwise P0's store to x would have propagated to
P1 before the critical section started and so would have been visible
to the READ_ONCE(). (Another way of putting it is that the fre link
from the READ_ONCE() to the WRITE_ONCE() gives rise to an rcu-link
between those two events.)

This means there is an rcu-fence link from P1's "y = 2" store to P0's
"y = 3" store, and consequently the first must propagate from P1 to P0
before the second can execute. Therefore the two stores cannot be
concurrent and there is no race, even though P1's plain store to y
isn't w-post-bounded by any marked accesses.

Putting all this material together yields the following picture. For
race-candidate stores W and W', where W ->co W', the LKMM says the
stores don't race if W can be linked to W' by a

        w-post-bounded ; vis ; w-pre-bounded

sequence. If W is plain then they also have to be linked by an

        r-post-bounded ; xb* ; w-pre-bounded

sequence, and if W' is plain then they also have to be linked by a

        w-post-bounded ; vis ; r-pre-bounded

sequence. For race-candidate load R and store W, the LKMM says the
two accesses don't race if R can be linked to W by an

        r-post-bounded ; xb* ; w-pre-bounded

sequence or if W can be linked to R by a

        w-post-bounded ; vis ; r-pre-bounded

sequence. For the cases involving a vis link, the LKMM also accepts
sequences in which W is linked to W' or R by a

        strong-fence ; xb* ; {w and/or r}-pre-bounded

sequence with no post-bounding, and in every case the LKMM also allows
the link simply to be a fence with no bounding at all. If no sequence
of the appropriate sort exists, the LKMM says that the accesses race.

There is one more part of the LKMM related to plain accesses (although
not to data races) that we should discuss. Recall that many relations
such as hb are limited to marked accesses only. As a result, the
happens-before, propagates-before, and rcu axioms (which state that
various relations must not contain cycles) do not apply to plain
accesses. Nevertheless, we do want to rule out such cycles, because
they don't make sense even for plain accesses.

To this end, the LKMM imposes three extra restrictions, together
called the "plain-coherence" axiom because of their resemblance to the
rules used by the operational model to ensure cache coherence (that
is, the rules governing the memory subsystem's choice of a store to
satisfy a load request and its determination of where a store will
fall in the coherence order):

        If R and W are race candidates and it is possible to link R to
        W by one of the xb* sequences listed above, then W ->rfe R is
        not allowed (i.e., a load cannot read from a store that it
        executes before, even if one or both is plain).

        If W and R are race candidates and it is possible to link W to
        R by one of the vis sequences listed above, then R ->fre W is
        not allowed (i.e., if a store is visible to a load then the
        load must read from that store or one coherence-after it).

        If W and W' are race candidates and it is possible to link W
        to W' by one of the vis sequences listed above, then W' ->co W
        is not allowed (i.e., if one store is visible to a second then
        the second must come after the first in the coherence order).
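
As an illustration of the first restriction, consider the following
sketch (our own example, not part of the LKMM's litmus-test suite):

        int x = 0, flag = 0;

        P0()
        {
                int r1;

                R: r1 = x;              /* plain load */
                smp_mb();
                X: WRITE_ONCE(flag, 1);
        }

        P1()
        {
                int r2;

                Y: r2 = READ_ONCE(flag);
                if (r2) {
                        smp_mb();
                        W: x = 1;       /* plain store */
                }
        }

When r2 = 1, R is r-post-bounded by X, the rfe link from X to Y lies
in xb, and Y w-pre-bounds W; thus R is linked to W by one of the xb*
sequences above. The first restriction then forbids W ->rfe R, so the
outcome r1 = 1 && r2 = 1 is ruled out.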

This is the extent to which the LKMM deals with plain accesses.
Perhaps it could say more (for example, plain accesses might
contribute to the ppo relation), but at the moment it seems that this
minimal, conservative approach is good enough.


ODDS AND ENDS
-------------

This section covers material that didn't quite fit anywhere in the
earlier sections.

The descriptions in this document don't always match the formal
version of the LKMM exactly. For example, the actual formal
definition of the prop relation makes the initial coe or fre part
optional, and it doesn't require the events linked by the relation to
be on the same CPU. These differences are very unimportant; indeed,
instances where the coe/fre part of prop is missing are of no interest
because all the other parts (fences and rfe) are already included in
hb anyway, and where the formal model adds prop into hb, it includes
an explicit requirement that the events being linked are on the same
CPU.

Another minor difference has to do with events that are both memory
accesses and fences, such as those corresponding to smp_load_acquire()
calls. In the formal model, these events aren't actually both reads
and fences; rather, they are read events with an annotation marking
them as acquires. (Or write events annotated as releases, in the case
of smp_store_release().) The final effect is the same.

Although we didn't mention it above, the instruction execution
ordering provided by the smp_rmb() fence doesn't apply to read events
that are part of a non-value-returning atomic update. For instance,
given:

        atomic_inc(&x);
        smp_rmb();
        r1 = READ_ONCE(y);

it is not guaranteed that the load from y will execute after the
update to x. This is because the ARMv8 architecture allows
non-value-returning atomic operations effectively to be executed off
the CPU. Basically, the CPU tells the memory subsystem to increment
x, and then the increment is carried out by the memory hardware with
no further involvement from the CPU. Since the CPU doesn't ever read
the value of x, there is nothing for the smp_rmb() fence to act on.

The LKMM defines a few extra synchronization operations in terms of
things we have already covered. In particular, rcu_dereference() is
treated as READ_ONCE() and rcu_assign_pointer() is treated as
smp_store_release() -- which is basically how the Linux kernel treats
them.
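
Schematically, merely restating those equivalences:

        p = rcu_dereference(ptr);       /* modeled as p = READ_ONCE(ptr) */
        rcu_assign_pointer(ptr, q);     /* modeled as smp_store_release(&ptr, q) */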

Although we said that plain accesses are not linked by the ppo
relation, they do contribute to it indirectly. Namely, when there is
an address dependency from a marked load R to a plain store W,
followed by smp_wmb() and then a marked store W', the LKMM creates a
ppo link from R to W'. The reasoning behind this is perhaps a little
shaky, but essentially it says there is no way to generate object code
for this source code in which W' could execute before R. Just as with
pre-bounding by address dependencies, it is possible for the compiler
to undermine this relation if sufficient care is not taken.
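
The pattern in question looks like this (a sketch with hypothetical
variables):

        p = READ_ONCE(ptr);     /* marked load R */
        *p = 27;                /* plain store W, address-dependent on R */
        smp_wmb();
        WRITE_ONCE(y, 1);       /* marked store W'; the LKMM gives R ->ppo W' */

The address dependency keeps the machine instructions for the plain
store from executing before R, and the smp_wmb() keeps W' from being
reordered before them; combining the two, W' cannot execute before R.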

There are a few oddball fences which need special treatment:
smp_mb__before_atomic(), smp_mb__after_atomic(), and
smp_mb__after_spinlock(). The LKMM uses fence events with special
annotations for them; they act as strong fences just like smp_mb()
except for the sets of events that they order. Instead of ordering
all po-earlier events against all po-later events, as smp_mb() does,
they behave as follows:

        smp_mb__before_atomic() orders all po-earlier events against
        po-later atomic updates and the events following them;

        smp_mb__after_atomic() orders po-earlier atomic updates and
        the events preceding them against all po-later events;
        smp_mb__after_spinlock() orders po-earlier lock acquisition
        events and the events preceding them against all po-later
        events.
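
For instance (a sketch of typical usage, not a quotation from actual
kernel code):

        WRITE_ONCE(a, 1);
        smp_mb__before_atomic();        /* orders the store to a against */
        atomic_inc(&x);                 /* this update and everything */
        r1 = READ_ONCE(b);              /* po-after it */

Here the fence orders the store to a against the atomic update of x
and against the load from b, which is po-after the update, just as a
strong fence would.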

Interestingly, RCU and locking each introduce the possibility of
deadlock. When faced with code sequences such as:

        spin_lock(&s);
        spin_lock(&s);
        spin_unlock(&s);
        spin_unlock(&s);

or:

        rcu_read_lock();
        synchronize_rcu();
        rcu_read_unlock();

what does the LKMM have to say? Answer: It says there are no allowed
executions at all, which makes sense. But this can also lead to
misleading results, because if a piece of code has multiple possible
executions, some of which deadlock, the model will report only on the
non-deadlocking executions. For example:

        int x, y;

        P0()
        {
                int r0;

                WRITE_ONCE(x, 1);
                r0 = READ_ONCE(y);
        }

        P1()
        {
                rcu_read_lock();
                if (READ_ONCE(x) > 0) {
                        WRITE_ONCE(y, 36);
                        synchronize_rcu();
                }
                rcu_read_unlock();
        }

Is it possible to end up with r0 = 36 at the end? The LKMM will tell
you it is not, but the model won't mention that this is because P1
will self-deadlock in the executions where it stores 36 in y.