Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
| 	2010 Big API Table           http://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
|	2014 Big API Table           http://lwn.net/Articles/609973/


What is RCU?

RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations.  Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge.  Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU.  Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU.  This document provides several
different paths, as follows:

:ref:`1.	RCU OVERVIEW <1_whatisRCU>`

:ref:`2.	WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.	WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.	WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.	WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.	ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.	FULL LIST OF RCU APIs <7_whatisRCU>`

:ref:`8.	ANSWERS TO QUICK QUIZZES <8_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Section 6.  Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers
will see either the old or the new version of the data structure rather
than a partially updated reference.  The reclamation phase does the work
of reclaiming (e.g., freeing) the data items removed from the data
structure during the removal phase.  Because reclaiming data items can
disrupt any readers concurrently referencing those data items, the
reclamation phase must not start until readers no longer hold references
to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.	Remove pointers to a data structure, so that subsequent
	readers cannot gain a reference to it.

b.	Wait for all previous readers to complete their RCU read-side
	critical sections.

c.	At this point, there cannot be any readers who hold references
	to the data structure, so it now may safely be reclaimed
	(e.g., kfree()d).
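
As a toy illustration, the three steps can be sketched in userspace C.
Everything here is hypothetical scaffolding: ``struct foo``, ``gbl_foo``,
and ``foo_update()`` are invented names, and ``synchronize_rcu()`` is a
no-op stand-in for the real primitive, which blocks until all pre-existing
readers have finished.

```c
#include <stdlib.h>

/* Toy stand-in: the real synchronize_rcu() blocks until all pre-existing
 * RCU read-side critical sections have completed.  It is a no-op here so
 * that this single-threaded userspace sketch can run at all. */
static void synchronize_rcu(void) { }

struct foo { int data; };	/* hypothetical RCU-protected item */

static struct foo *gbl_foo;	/* pointer that readers traverse */

static void foo_update(int new_data)
{
	struct foo *new_fp = malloc(sizeof(*new_fp));
	struct foo *old_fp = gbl_foo;

	new_fp->data = new_data;
	gbl_foo = new_fp;   /* (a) unpublish: later readers see only new_fp */
	synchronize_rcu();  /* (b) wait for pre-existing readers to finish */
	free(old_fp);       /* (c) reclaim the old version */
}
```

In the kernel, the plain pointer assignment in step (a) would be performed
with rcu_assign_pointer() and the reclamation in step (c) with kfree(),
as described in the sections that follow.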

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how can a reclaimer tell when a reader is done, given that readers
are not doing any sort of synchronization operations?  Read on to learn
about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.	rcu_read_lock()
b.	rcu_read_unlock()
c.	synchronize_rcu() / call_rcu()
d.	rcu_assign_pointer()
e.	rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the other 18 will be
enumerated later.  See the kernel docbook documentation for more info,
or look directly at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
	void rcu_read_lock(void);

	Used by a reader to inform the reclaimer that the reader is
	entering an RCU read-side critical section.  It is illegal
	to block while in an RCU read-side critical section, though
	kernels built with CONFIG_PREEMPT_RCU can preempt RCU
	read-side critical sections.  Any RCU-protected data structure
	accessed during an RCU read-side critical section is guaranteed to
	remain unreclaimed for the full duration of that critical section.
	Reference counts may be used in conjunction with RCU to maintain
	longer-term references to data structures.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
	void rcu_read_unlock(void);

	Used by a reader to inform the reclaimer that the reader is
	exiting an RCU read-side critical section.  Note that RCU
	read-side critical sections may be nested and/or overlapping.
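
	As a quick sketch of how a reader uses this pair, the following
	compilable userspace toy brackets its access with the two calls.
	The no-op stand-ins, struct foo, gbl_foo, and foo_get_data() are
	all hypothetical scaffolding rather than the kernel
	implementation; rcu_dereference() is covered in more detail below.

```c
#include <stddef.h>

/* Toy no-op stand-ins for the kernel primitives, so that the read-side
 * pattern can be shown as compilable userspace code.  The real versions
 * live in <linux/rcupdate.h>. */
static void rcu_read_lock(void) { }
static void rcu_read_unlock(void) { }
#define rcu_dereference(p) (p)	/* real macro also orders dependent loads */

struct foo { int data; };	/* hypothetical RCU-protected item */

static struct foo default_foo = { 42 };
static struct foo *gbl_foo = &default_foo;

/* Reader: every access through the RCU-protected pointer happens between
 * rcu_read_lock() and rcu_read_unlock(), and the pointer itself is
 * fetched exactly once via rcu_dereference(). */
static int foo_get_data(void)
{
	struct foo *p;
	int ret = -1;

	rcu_read_lock();
	p = rcu_dereference(gbl_foo);
	if (p)
		ret = p->data;	/* safe: p cannot be reclaimed before unlock */
	rcu_read_unlock();
	return ret;
}
```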

synchronize_rcu()
^^^^^^^^^^^^^^^^^
	void synchronize_rcu(void);

	Marks the end of updater code and the beginning of reclaimer
	code.  It does this by blocking until all pre-existing RCU
	read-side critical sections on all CPUs have completed.
	Note that synchronize_rcu() will **not** necessarily wait for
	any subsequent RCU read-side critical sections to complete.
	For example, consider the following sequence of events::

	         CPU 0                  CPU 1                 CPU 2
	     ----------------- ------------------------- ---------------
	 1.  rcu_read_lock()
	 2.                    enters synchronize_rcu()
	 3.                                               rcu_read_lock()
	 4.  rcu_read_unlock()
	 5.                     exits synchronize_rcu()
	 6.                                              rcu_read_unlock()

	To reiterate, synchronize_rcu() waits only for ongoing RCU
	read-side critical sections to complete, not necessarily for
	any that begin after synchronize_rcu() is invoked.

	Of course, synchronize_rcu() does not necessarily return
	**immediately** after the last pre-existing RCU read-side critical
	section completes.  For one thing, there might well be scheduling
	delays.  For another thing, many RCU implementations process
	requests in batches in order to improve efficiency, which can
	further delay synchronize_rcu().

	Since synchronize_rcu() is the API that must figure out when
	readers are done, its implementation is key to RCU.  For RCU
	to be useful in all but the most read-intensive situations,
	synchronize_rcu()'s overhead must also be quite small.

	The call_rcu() API is a callback form of synchronize_rcu(),
	and is described in more detail in a later section.  Instead of
	blocking, it registers a function and argument which are invoked
	after all ongoing RCU read-side critical sections have completed.
	This callback variant is particularly useful in situations where
	it is illegal to block or where update-side performance is
	critically important.

	However, the call_rcu() API should not be used lightly, as use
	of the synchronize_rcu() API generally results in simpler code.
	In addition, the synchronize_rcu() API has the nice property
	of automatically limiting the update rate should grace periods
	be delayed.  This property results in system resilience in the
	face of denial-of-service attacks.  Code using call_rcu() should
	limit its update rate in order to gain this same sort of
	resilience.  See checklist.txt for some approaches to limiting
	the update rate.
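
	The call_rcu() pattern can be sketched as follows.  This is a
	userspace toy: struct foo, foo_reclaim(), and foo_retire() are
	invented names, and the rcu_head, container_of(), and call_rcu()
	definitions below are simplified stand-ins (the toy call_rcu()
	runs the callback immediately, whereas the real one defers it
	until after a grace period).

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-ins; the real definitions come from
 * <linux/rcupdate.h> and <linux/kernel.h>. */
struct rcu_head { int unused; };
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy call_rcu(): invokes the callback immediately.  The kernel version
 * instead defers it until all pre-existing RCU read-side critical
 * sections have completed. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	func(head);
}

struct foo {			/* hypothetical RCU-protected element */
	int data;
	struct rcu_head rcu;	/* embedded for deferred reclamation */
};

static int reclaimed_count;	/* lets the sketch observe reclamation */

static void foo_reclaim(struct rcu_head *rp)
{
	struct foo *fp = container_of(rp, struct foo, rcu);

	free(fp);		/* kfree() in the kernel */
	reclaimed_count++;
}

/* Updater: after unpublishing old_fp, hand it to call_rcu() instead of
 * blocking in synchronize_rcu(). */
static void foo_retire(struct foo *old_fp)
{
	call_rcu(&old_fp->rcu, foo_reclaim);
}
```

	Embedding the rcu_head inside the protected structure is the
	standard kernel idiom: the callback recovers the enclosing
	structure with container_of() and frees it.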

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
	void rcu_assign_pointer(p, typeof(p) v);

	Yes, rcu_assign_pointer() **is** implemented as a macro, though it
	would be cool to be able to declare a function in this manner.
	(Compiler experts will no doubt disagree.)

	The updater uses this macro to assign a new value to an
	RCU-protected pointer, in order to safely communicate the change
	in value from the updater to the reader.  This macro does not
	evaluate to an rvalue, but it does execute any memory-barrier
	instructions required for a given CPU architecture.

	Perhaps just as important, it serves to document (1) which
	pointers are protected by RCU and (2) the point at which a
	given structure becomes accessible to other CPUs.  That said,
	rcu_assign_pointer() is most frequently used indirectly, via
	the _rcu list-manipulation primitives such as list_add_rcu().
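
	As a sketch of the publish pattern, assuming a hypothetical
	RCU-protected pointer gp and an invented helper foo_publish(),
	with rcu_assign_pointer() reduced to a plain assignment so the
	toy compiles in userspace:

```c
#include <stdlib.h>

/* Toy stand-in: the real rcu_assign_pointer() also issues any memory
 * barrier needed so that the initialization of *v is visible to readers
 * before the pointer itself is. */
#define rcu_assign_pointer(p, v) ((p) = (v))

struct foo { int a; };		/* hypothetical RCU-protected structure */

static struct foo *gp;		/* hypothetical RCU-protected pointer */

static void foo_publish(int a)
{
	struct foo *p = malloc(sizeof(*p));

	p->a = a;			/* fully initialize first... */
	rcu_assign_pointer(gp, p);	/* ...then publish to readers */
}
```

	The ordering is the point: the structure is completely
	initialized before the assignment makes it reachable.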

rcu_dereference()
^^^^^^^^^^^^^^^^^
	typeof(p) rcu_dereference(p);

	Like rcu_assign_pointer(), rcu_dereference() must be implemented
	as a macro.

	The reader uses rcu_dereference() to fetch an RCU-protected
	pointer, which returns a value that may then be safely
	dereferenced.  Note that rcu_dereference() does not actually
	dereference the pointer; instead, it protects the pointer for
	later dereferencing.  It also executes any needed memory-barrier
	instructions for a given CPU architecture.  Currently, only Alpha
	needs memory barriers within rcu_dereference() -- on other CPUs,
	it compiles to nothing, not even a compiler directive.

	Common coding practice uses rcu_dereference() to copy an
	RCU-protected pointer to a local variable, then dereferences
	this local variable, for example as follows::

		p = rcu_dereference(head.next);
		return p->data;

	However, in this case, one could just as easily combine these
	into one statement::

		return rcu_dereference(head.next)->data;

	If you are going to be fetching multiple fields from the
	RCU-protected structure, using the local variable is of
	course preferred.  Repeated rcu_dereference() calls look
	ugly, do not guarantee that the same pointer will be returned
	if an update happened while in the critical section, and incur
	unnecessary overhead on Alpha CPUs.

	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section [1]_.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

	Holding a reference from one RCU read-side critical section
	to another is just as illegal as holding a reference from
	one lock-based critical section to another!  Similarly,
	using a reference outside of the critical section in which
	it was acquired is just as illegal as doing so with normal
	locking.

	As with rcu_assign_pointer(), an important function of
	rcu_dereference() is to document which pointers are protected by
	RCU, in particular, flagging a pointer that is subject to changing
	at any time, including immediately after the rcu_dereference().
	And, again like rcu_assign_pointer(), rcu_dereference() is
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu() [2]_.

.. 	[1] The variant rcu_dereference_protected() can be used outside
	of an RCU read-side critical section as long as the usage is
	protected by locks acquired by the update-side code.  This variant
	avoids the lockdep warning that would happen when using (for
	example) rcu_dereference() without rcu_read_lock() protection.
	Using rcu_dereference_protected() also has the advantage
	of permitting compiler optimizations that rcu_dereference()
	must prohibit.  The rcu_dereference_protected() variant takes
	a lockdep expression to indicate which locks must be acquired
	by the caller.  If the indicated protection is not provided,
	a lockdep splat is emitted.  See Documentation/RCU/Design/Requirements/Requirements.rst
	and the API's code comments for more details and example usage.

.. 	[2] If the list_for_each_entry_rcu() instance might be used by
	update-side code as well as by RCU readers, then an additional
	lockdep expression can be added to its list of arguments.
	For example, given an additional "lock_is_held(&mylock)" argument,
	the RCU lockdep code would complain only if this instance was
	invoked outside of an RCU read-side critical section and without
	the protection of mylock.

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


	    rcu_assign_pointer()
	                            +--------+
	    +---------------------->| reader |---------+
	    |                       +--------+         |
	    |                           |              |
	    |                           |              | Protect:
	    |                           |              | rcu_read_lock()
	    |                           |              | rcu_read_unlock()
	    |        rcu_dereference()  |              |
	    +---------+                 |              |
	    | updater |<----------------+              |
	    +---------+                                V
	    |                                    +-----------+
	    +----------------------------------->| reclaimer |
	                                         +-----------+
	      Defer:
	      synchronize_rcu() & call_rcu()

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  353) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  354) The RCU infrastructure observes the time sequence of rcu_read_lock(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  355) rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  356) order to determine when (1) synchronize_rcu() invocations may return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  357) to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  358) implementations of the RCU infrastructure make heavy use of batching in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  359) order to amortize their overhead over many uses of the corresponding APIs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  361) There are at least three flavors of RCU usage in the Linux kernel. The diagram
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  362) above shows the most common one. On the updater side, the rcu_assign_pointer(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  363) synchronize_rcu() and call_rcu() primitives used are the same for all three
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  364) flavors. However for protection (on the reader side), the primitives used vary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  365) depending on the flavor:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  366) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  367) a.	rcu_read_lock() / rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  368) 	rcu_dereference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  370) b.	rcu_read_lock_bh() / rcu_read_unlock_bh()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  371) 	local_bh_disable() / local_bh_enable()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  372) 	rcu_dereference_bh()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  373) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  374) c.	rcu_read_lock_sched() / rcu_read_unlock_sched()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  375) 	preempt_disable() / preempt_enable()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  376) 	local_irq_save() / local_irq_restore()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  377) 	hardirq enter / hardirq exit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  378) 	NMI enter / NMI exit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  379) 	rcu_dereference_sched()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  380) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  381) These three flavors are used as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  382) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  383) a.	RCU applied to normal data structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  384) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  385) b.	RCU applied to networking data structures that may be subjected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  386) 	to remote denial-of-service attacks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  387) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  388) c.	RCU applied to scheduler and interrupt/NMI-handler tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  389) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  390) Again, most uses will be of (a).  The (b) and (c) cases are important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  391) for specialized uses, but are relatively uncommon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  392) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  393) .. _3_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  394) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  395) 3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  396) -----------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  397) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  398) This section shows a simple use of the core RCU API to protect a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  399) global pointer to a dynamically allocated structure.  More-typical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  400) uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  401) :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  402) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  404) 	struct foo {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  405) 		int a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  406) 		char b;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  407) 		long c;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  408) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  409) 	DEFINE_SPINLOCK(foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  410) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  411) 	struct foo __rcu *gbl_foo;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  413) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  414) 	 * Create a new struct foo that is the same as the one currently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  415) 	 * pointed to by gbl_foo, except that field "a" is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  416) 	 * with "new_a".  Points gbl_foo to the new structure, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  417) 	 * frees up the old structure after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  418) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  419) 	 * Uses rcu_assign_pointer() to ensure that concurrent readers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  420) 	 * see the initialized version of the new structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  421) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  422) 	 * Uses synchronize_rcu() to ensure that any readers that might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  423) 	 * have references to the old structure complete before freeing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  424) 	 * the old structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  425) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  426) 	void foo_update_a(int new_a)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  427) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  428) 		struct foo *new_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  429) 		struct foo *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  430) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  431) 		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  432) 		spin_lock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  433) 		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  434) 		*new_fp = *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  435) 		new_fp->a = new_a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  436) 		rcu_assign_pointer(gbl_foo, new_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  437) 		spin_unlock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  438) 		synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  439) 		kfree(old_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  440) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  441) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  442) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  443) 	 * Return the value of field "a" of the current gbl_foo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  444) 	 * structure.  Use rcu_read_lock() and rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  445) 	 * to ensure that the structure does not get deleted out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  446) 	 * from under us, and use rcu_dereference() to ensure that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  447) 	 * we see the initialized version of the structure (important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  448) 	 * for DEC Alpha and for people reading the code).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  449) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  450) 	int foo_get_a(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  451) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  452) 		int retval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  453) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  454) 		rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  455) 		retval = rcu_dereference(gbl_foo)->a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  456) 		rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  457) 		return retval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  458) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  459) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  460) So, to sum up:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  461) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  462) -	Use rcu_read_lock() and rcu_read_unlock() to guard RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  463) 	read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  464) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  465) -	Within an RCU read-side critical section, use rcu_dereference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  466) 	to dereference RCU-protected pointers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  467) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  468) -	Use some solid scheme (such as locks or semaphores) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  469) 	keep concurrent updates from interfering with each other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  470) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  471) -	Use rcu_assign_pointer() to update an RCU-protected pointer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  472) 	This primitive protects concurrent readers from the updater,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  473) 	**not** concurrent updates from each other!  You therefore still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  474) 	need to use locking (or something similar) to keep concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  475) 	rcu_assign_pointer() primitives from interfering with each other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  476) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  477) -	Use synchronize_rcu() **after** removing a data element from an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  478) 	RCU-protected data structure, but **before** reclaiming/freeing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  479) 	the data element, in order to wait for the completion of all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  480) 	RCU read-side critical sections that might be referencing that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  481) 	data item.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  482) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  483) See checklist.txt for additional rules to follow when using RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  484) And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  485) <list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  486) <NMI_rcu_doc>`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  487) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  488) .. _4_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  489) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  490) 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  491) --------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  492) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  493) In the example above, foo_update_a() blocks until a grace period elapses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  494) This is quite simple, but in some cases one cannot afford to wait so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  495) long -- there might be other high-priority work to be done.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  496) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  497) In such cases, one uses call_rcu() rather than synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  498) The call_rcu() API is as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  499) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  500) 	void call_rcu(struct rcu_head * head,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  501) 		      void (*func)(struct rcu_head *head));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  502) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  503) This function invokes func(head) after a grace period has elapsed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  504) This invocation might happen from either softirq or process context,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  505) so the function is not permitted to block.  The foo struct needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  506) have an rcu_head structure added, perhaps as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  507) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  508) 	struct foo {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  509) 		int a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  510) 		char b;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  511) 		long c;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  512) 		struct rcu_head rcu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  513) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  514) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  515) The foo_update_a() function might then be written as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  516) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  517) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  518) 	 * Create a new struct foo that is the same as the one currently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  519) 	 * pointed to by gbl_foo, except that field "a" is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  520) 	 * with "new_a".  Points gbl_foo to the new structure, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  521) 	 * frees up the old structure after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  522) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  523) 	 * Uses rcu_assign_pointer() to ensure that concurrent readers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  524) 	 * see the initialized version of the new structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  525) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  526) 	 * Uses call_rcu() to ensure that any readers that might have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  527) 	 * references to the old structure complete before freeing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  528) 	 * old structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  529) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  530) 	void foo_update_a(int new_a)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  531) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  532) 		struct foo *new_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  533) 		struct foo *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  534) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  535) 		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  536) 		spin_lock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  537) 		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  538) 		*new_fp = *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  539) 		new_fp->a = new_a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  540) 		rcu_assign_pointer(gbl_foo, new_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  541) 		spin_unlock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  542) 		call_rcu(&old_fp->rcu, foo_reclaim);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  543) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  544) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  545) The foo_reclaim() function might appear as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  546) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  547) 	void foo_reclaim(struct rcu_head *rp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  548) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  549) 		struct foo *fp = container_of(rp, struct foo, rcu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  550) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  551) 		foo_cleanup(fp->a);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  552) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  553) 		kfree(fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  554) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  555) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  556) The container_of() primitive is a macro that, given a pointer into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  557) struct, the type of the struct, and the pointed-to field within the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  558) struct, returns a pointer to the beginning of the struct.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  559) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  560) The use of call_rcu() permits the caller of foo_update_a() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  561) immediately regain control, without needing to worry further about the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  562) old version of the newly updated element.  It also clearly shows the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  563) RCU distinction between updater, namely foo_update_a(), and reclaimer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  564) namely foo_reclaim().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  565) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  566) The summary of advice is the same as for the previous section, except
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  567) that we are now using call_rcu() rather than synchronize_rcu():
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  568) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  569) -	Use call_rcu() **after** removing a data element from an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  570) 	RCU-protected data structure in order to register a callback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  571) 	function that will be invoked after the completion of all RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  572) 	read-side critical sections that might be referencing that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  573) 	data item.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  574) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  575) If the callback for call_rcu() is not doing anything more than calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  576) kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  577) to avoid having to write your own callback::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  578) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  579) 	kfree_rcu(old_fp, rcu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  580) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  581) Again, see checklist.txt for additional rules governing the use of RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  582) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  583) .. _5_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  584) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  585) 5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  586) ------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  587) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  588) One of the nice things about RCU is that it has extremely simple "toy"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  589) implementations that are a good first step towards understanding the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  590) production-quality implementations in the Linux kernel.  This section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  591) presents two such "toy" implementations of RCU, one that is implemented
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  592) in terms of familiar locking primitives, and another that more closely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  593) resembles "classic" RCU.  Both are way too simple for real-world use,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  594) lacking both functionality and performance.  However, they are useful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  595) in getting a feel for how RCU works.  See kernel/rcu/update.c for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  596) production-quality implementation, and see:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  597) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  598) 	http://www.rdrop.com/users/paulmck/RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  599) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  600) for papers describing the Linux kernel RCU implementation.  The OLS'01
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  601) and OLS'02 papers are a good introduction, and the dissertation provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  602) more details on the current implementation as of early 2004.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  603) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  604) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  605) 5A.  "TOY" IMPLEMENTATION #1: LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  606) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  607) This section presents a "toy" RCU implementation that is based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  608) familiar locking primitives.  Its overhead makes it a non-starter for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  609) real-life use, as does its lack of scalability.  It is also unsuitable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  610) for realtime use, since it allows scheduling latency to "bleed" from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  611) one read-side critical section to another.  It also assumes recursive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  612) reader-writer locks:  If you try this with non-recursive locks, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  613) you allow nested rcu_read_lock() calls, you can deadlock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  614) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  615) However, it is probably the easiest implementation to relate to, so is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  616) a good starting point.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  617) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  618) It is extremely simple::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  619) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  620) 	static DEFINE_RWLOCK(rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  621) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  622) 	void rcu_read_lock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  623) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  624) 		read_lock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  625) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  626) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  627) 	void rcu_read_unlock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  628) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  629) 		read_unlock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  630) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  631) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  632) 	void synchronize_rcu(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  633) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  634) 		write_lock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  635) 		smp_mb__after_spinlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  636) 		write_unlock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  637) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  638) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  639) [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  640) much.  But here are simplified versions anyway.  And whatever you do,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  641) don't forget about them when submitting patches making use of RCU!]::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  642) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  643) 	#define rcu_assign_pointer(p, v) \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  644) 	({ \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  645) 		smp_store_release(&(p), (v)); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  646) 	})
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  647) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  648) 	#define rcu_dereference(p) \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  649) 	({ \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  650) 		typeof(p) _________p1 = READ_ONCE(p); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  651) 		(_________p1); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  652) 	})
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  653) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  654) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  655) The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  656) and release a global reader-writer lock.  The synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  657) primitive write-acquires this same lock, then releases it.  This means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  658) that once synchronize_rcu() exits, all RCU read-side critical sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  659) that were in progress before synchronize_rcu() was called are guaranteed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  660) to have completed -- there is no way that synchronize_rcu() would have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  661) been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  662) promotes synchronize_rcu() to a full memory barrier in compliance with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  663) the "Memory-Barrier Guarantees" listed in:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  664) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  665) 	Documentation/RCU/Design/Requirements/Requirements.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  666) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  667) It is possible to nest rcu_read_lock(), since reader-writer locks may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  668) be recursively acquired.  Note also that rcu_read_lock() is immune
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  669) from deadlock (an important property of RCU).  The reason for this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  670) that the only thing that can block rcu_read_lock() is a synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  671) But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  672) so there can be no deadlock cycle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  673) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  674) .. _quiz_1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  675) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  676) Quick Quiz #1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  677) 		Why is this argument naive?  How could a deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  678) 		occur when using this algorithm in a real-world Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  679) 		kernel?  How could this deadlock be avoided?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  680) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  681) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  682) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  683) 5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  684) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  685) This section presents a "toy" RCU implementation that is based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  686) "classic RCU".  It is also short on performance (but only for updates) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  687) on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  688) kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  689) are the same as those shown in the preceding section, so they are omitted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  690) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  691) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  692) 	void rcu_read_lock(void) { }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  693) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  694) 	void rcu_read_unlock(void) { }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  695) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  696) 	void synchronize_rcu(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  697) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  698) 		int cpu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  699) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  700) 		for_each_possible_cpu(cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  701) 			run_on(cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  702) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  703) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  704) Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  705) This is the great strength of classic RCU in a non-preemptive kernel:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  706) read-side overhead is precisely zero, at least on non-Alpha CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  707) And there is absolutely no way that rcu_read_lock() can possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  708) participate in a deadlock cycle!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  709) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  710) The implementation of synchronize_rcu() simply schedules itself on each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  711) CPU in turn.  The run_on() primitive can be implemented straightforwardly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  712) in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  713) "toy" implementation would restore the affinity upon completion rather
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  714) than just leaving all tasks running on the last CPU, but when I said
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  715) "toy", I meant **toy**!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  716) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  717) So how the heck is this supposed to work???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  718) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  719) Remember that it is illegal to block while in an RCU read-side critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  720) section.  Therefore, if a given CPU executes a context switch, we know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  721) that it must have completed all preceding RCU read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  722) Once **all** CPUs have executed a context switch, then **all** preceding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  723) RCU read-side critical sections will have completed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  724) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  725) So, suppose that we remove a data item from its structure and then invoke
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  726) synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  727) that there are no RCU read-side critical sections holding a reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  728) to that data item, so we can safely reclaim it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  729) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  730) .. _quiz_2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  731) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  732) Quick Quiz #2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  733) 		Give an example where Classic RCU's read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  734) 		overhead is **negative**.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  735) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  736) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  737) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  738) .. _quiz_3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  739) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  740) Quick Quiz #3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  741) 		If it is illegal to block in an RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  742) 		critical section, what the heck do you do in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  743) 		PREEMPT_RT, where normal spinlocks can block???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  744) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  745) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  746) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  747) .. _6_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  748) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  749) 6.  ANALOGY WITH READER-WRITER LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  750) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  751) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  752) Although RCU can be used in many different ways, a very common use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  753) RCU is analogous to reader-writer locking.  The following unified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  754) diff shows how closely related RCU and reader-writer locking can be.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  755) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  756) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  757) 	@@ -5,5 +5,5 @@ struct el {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  758) 	 	int data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  759) 	 	/* Other data fields */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  760) 	 };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  761) 	-rwlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  762) 	+spinlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  763) 	 struct el head;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  764) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  765) 	@@ -13,15 +14,15 @@
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  766) 		struct list_head *lp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  767) 		struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  768) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  769) 	-	read_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  770) 	-	list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  771) 	+	rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  772) 	+	list_for_each_entry_rcu(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  773) 			if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  774) 				*result = p->data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  775) 	-			read_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  776) 	+			rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  777) 				return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  778) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  779) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  780) 	-	read_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  781) 	+	rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  782) 		return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  783) 	 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  784) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  785) 	@@ -29,15 +30,16 @@
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  786) 	 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  787) 		struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  788) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  789) 	-	write_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  790) 	+	spin_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  791) 		list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  792) 			if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  793) 	-			list_del(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  794) 	-			write_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  795) 	+			list_del_rcu(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  796) 	+			spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  797) 	+			synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  798) 				kfree(p);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  799) 				return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  800) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  801) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  802) 	-	write_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  803) 	+	spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  804) 		return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  805) 	 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  806) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  807) Or, for those who prefer a side-by-side listing::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  808) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  809)  1 struct el {                          1 struct el {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  810)  2   struct list_head list;             2   struct list_head list;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  811)  3   long key;                          3   long key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  812)  4   spinlock_t mutex;                  4   spinlock_t mutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  813)  5   int data;                          5   int data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  814)  6   /* Other data fields */            6   /* Other data fields */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  815)  7 };                                   7 };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  816)  8 rwlock_t listmutex;                  8 spinlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  817)  9 struct el head;                      9 struct el head;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  818) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  819) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  820) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  821)   1 int search(long key, int *result)    1 int search(long key, int *result)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  822)   2 {                                    2 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  823)   3   struct list_head *lp;              3   struct list_head *lp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  824)   4   struct el *p;                      4   struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  825)   5                                      5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  826)   6   read_lock(&listmutex);             6   rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  827)   7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  828)   8     if (p->key == key) {             8     if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  829)   9       *result = p->data;             9       *result = p->data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  830)  10       read_unlock(&listmutex);      10       rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  831)  11       return 1;                     11       return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  832)  12     }                               12     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  833)  13   }                                 13   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  834)  14   read_unlock(&listmutex);          14   rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  835)  15   return 0;                         15   return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  836)  16 }                                   16 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  837) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  838) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  839) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  840)   1 int delete(long key)                 1 int delete(long key)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  841)   2 {                                    2 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  842)   3   struct el *p;                      3   struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  843)   4                                      4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  844)   5   write_lock(&listmutex);            5   spin_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  845)   6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  846)   7     if (p->key == key) {             7     if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  847)   8       list_del(&p->list);            8       list_del_rcu(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  848)   9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  849)                                         10       synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  850)  10       kfree(p);                     11       kfree(p);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  851)  11       return 1;                     12       return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  852)  12     }                               13     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  853)  13   }                                 14   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  854)  14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  855)  15   return 0;                         16   return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  856)  16 }                                   17 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  857) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  858) Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  860) a reader-writer lock to a simple spinlock, and a synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  861) precedes the kfree().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  862) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  863) However, there is one potential catch: the read-side and update-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  864) critical sections can now run concurrently.  In many cases, this will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  865) not be a problem, but it is necessary to check carefully regardless.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  866) For example, if multiple independent list updates must be seen as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  867) a single atomic update, converting to RCU will require special care.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  868) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  869) Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), can
be used in place of synchronize_rcu().
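
The reason call_rcu() never blocks is that it merely queues a callback
to be invoked after a grace period elapses.  A minimal userspace sketch
of that queuing (the toy_* names are invented here; the kernel actually
invokes the callbacks from softirq context once its grace-period
machinery detects completion):

```c
#include <stddef.h>

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *);
};

static struct toy_rcu_head *cb_list;

/* Queue a callback for invocation after a grace period.  This
 * never blocks, so it is safe in paths that cannot sleep. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *))
{
	head->func = func;
	head->next = cb_list;
	cb_list = head;
}

/* Run after a grace period has elapsed: drain the queue, invoking
 * each callback.  Returns the number of callbacks invoked. */
static int toy_rcu_do_batch(void)
{
	int n = 0;

	while (cb_list) {
		struct toy_rcu_head *head = cb_list;

		cb_list = head->next;
		head->func(head);
		n++;
	}
	return n;
}

static int toy_freed;
static void toy_free_cb(struct toy_rcu_head *head)
{
	(void)head;		/* a real callback would kfree() here */
	toy_freed++;
}
```

In the kernel, delete() would replace the synchronize_rcu()/kfree()
pair with a single call_rcu() on an rcu_head field embedded in the
element, returning immediately.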
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  873) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  874) .. _7_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  875) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  876) 7.  FULL LIST OF RCU APIs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  877) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  878) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  879) The RCU APIs are documented in docbook-format header comments in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  880) Linux-kernel source code, but it helps to have a full list of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  881) APIs, since there does not appear to be a way to categorize them
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  882) in docbook.  Here is the list, by category.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  883) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  884) RCU list traversal::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  885) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  886) 	list_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  887) 	list_entry_lockless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  888) 	list_first_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  889) 	list_next_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  890) 	list_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  891) 	list_for_each_entry_continue_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  892) 	list_for_each_entry_from_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  893) 	list_first_or_null_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  894) 	list_next_or_null_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  895) 	hlist_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  896) 	hlist_next_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  897) 	hlist_pprev_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  898) 	hlist_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  899) 	hlist_for_each_entry_rcu_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  900) 	hlist_for_each_entry_from_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  901) 	hlist_for_each_entry_continue_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  902) 	hlist_for_each_entry_continue_rcu_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  903) 	hlist_nulls_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  904) 	hlist_nulls_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  905) 	hlist_bl_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  906) 	hlist_bl_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  907) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  908) RCU pointer/list update::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  909) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  910) 	rcu_assign_pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  911) 	list_add_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  912) 	list_add_tail_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  913) 	list_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  914) 	list_replace_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  915) 	hlist_add_behind_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  916) 	hlist_add_before_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  917) 	hlist_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  918) 	hlist_add_tail_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  919) 	hlist_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  920) 	hlist_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  921) 	hlist_replace_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  922) 	list_splice_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  923) 	list_splice_tail_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  924) 	hlist_nulls_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  925) 	hlist_nulls_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  926) 	hlist_nulls_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  927) 	hlist_bl_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  928) 	hlist_bl_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  929) 	hlist_bl_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  930) 	hlist_bl_set_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  931) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  932) RCU::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  933) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  934) 	Critical sections	Grace period		Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  935) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  936) 	rcu_read_lock		synchronize_net		rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  937) 	rcu_read_unlock		synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  938) 	rcu_dereference		synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  939) 	rcu_read_lock_held	call_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  940) 	rcu_dereference_check	kfree_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  941) 	rcu_dereference_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  942) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  943) bh::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  944) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  945) 	Critical sections	Grace period		Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  946) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  947) 	rcu_read_lock_bh	call_rcu		rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  948) 	rcu_read_unlock_bh	synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  949) 	[local_bh_disable]	synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  950) 	[and friends]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  951) 	rcu_dereference_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  952) 	rcu_dereference_bh_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  953) 	rcu_dereference_bh_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  954) 	rcu_read_lock_bh_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  955) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  956) sched::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  957) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  958) 	Critical sections	Grace period		Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  959) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  960) 	rcu_read_lock_sched	call_rcu		rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  961) 	rcu_read_unlock_sched	synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  962) 	[preempt_disable]	synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  963) 	[and friends]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  964) 	rcu_read_lock_sched_notrace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  965) 	rcu_read_unlock_sched_notrace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  966) 	rcu_dereference_sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  967) 	rcu_dereference_sched_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  968) 	rcu_dereference_sched_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  969) 	rcu_read_lock_sched_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  970) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  971) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  972) SRCU::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  973) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  974) 	Critical sections	Grace period		Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  975) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  976) 	srcu_read_lock		call_srcu		srcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  977) 	srcu_read_unlock	synchronize_srcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  978) 	srcu_dereference	synchronize_srcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  979) 	srcu_dereference_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  980) 	srcu_read_lock_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  981) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  982) SRCU: Initialization/cleanup::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  983) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  984) 	DEFINE_SRCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  985) 	DEFINE_STATIC_SRCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  986) 	init_srcu_struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  987) 	cleanup_srcu_struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  988) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  989) All: lockdep-checked RCU-protected pointer access::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  990) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  991) 	rcu_access_pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  992) 	rcu_dereference_raw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  993) 	RCU_LOCKDEP_WARN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  994) 	rcu_sleep_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  995) 	RCU_NONIDLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  996) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  997) See the comment headers in the source code (or the docbook generated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  998) from them) for more information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  999) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) However, given that there are no fewer than four families of RCU APIs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) in the Linux kernel, how do you choose which one to use?  The following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002) list can be helpful:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) a.	Will readers need to block?  If so, you need SRCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) b.	What about the -rt patchset?  If readers would need to block
	in a non-rt kernel, you need SRCU.  If readers would block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) 	in a -rt kernel, but not in a non-rt kernel, SRCU is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) 	necessary.  (The -rt patchset turns spinlocks into sleeplocks,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) 	hence this distinction.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) c.	Do you need to treat NMI handlers, hardirq handlers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) 	and code segments with preemption disabled (whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) 	via preempt_disable(), local_irq_save(), local_bh_disable(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) 	or some other mechanism) as if they were explicit RCU readers?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) 	If so, RCU-sched is the only choice that will work for you.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) d.	Do you need RCU grace periods to complete even in the face
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) 	of softirq monopolization of one or more of the CPUs?  For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) 	example, is your code subject to network-based denial-of-service
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) 	attacks?  If so, you should disable softirq across your readers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) 	for example, by using rcu_read_lock_bh().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) e.	Is your workload too update-intensive for normal use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) 	RCU, but inappropriate for other synchronization mechanisms?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026) 	If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) 	named SLAB_DESTROY_BY_RCU).  But please be careful!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) f.	Do you need read-side critical sections that are respected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) 	even though they are in the middle of the idle loop, during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) 	user-mode execution, or on an offlined CPU?  If so, SRCU is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) 	only choice that will work for you.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) g.	Otherwise, use RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) Of course, this all assumes that you have determined that RCU is in fact
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) the right tool for your job.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) .. _8_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) 8.  ANSWERS TO QUICK QUIZZES
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) Quick Quiz #1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) 		Why is this argument naive?  How could a deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) 		occur when using this algorithm in a real-world Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) 		kernel?  [Referring to the lock-based "toy" RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) 		algorithm.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) 		Consider the following sequence of events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) 		1.	CPU 0 acquires some unrelated lock, call it
			"problematic_lock", disabling irqs via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) 			spin_lock_irqsave().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) 		2.	CPU 1 enters synchronize_rcu(), write-acquiring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) 			rcu_gp_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) 		3.	CPU 0 enters rcu_read_lock(), but must wait
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) 			because CPU 1 holds rcu_gp_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) 		4.	CPU 1 is interrupted, and the irq handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) 			attempts to acquire problematic_lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) 		The system is now deadlocked.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) 		One way to avoid this deadlock is to use an approach like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) 		that of CONFIG_PREEMPT_RT, where all normal spinlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) 		become blocking locks, and all irq handlers execute in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) 		the context of special tasks.  In this case, in step 4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) 		above, the irq handler would block, allowing CPU 1 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) 		release rcu_gp_mutex, avoiding the deadlock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) 		Even in the absence of deadlock, this RCU implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) 		allows latency to "bleed" from readers to other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) 		readers through synchronize_rcu().  To see this,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) 		consider task A in an RCU read-side critical section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) 		(thus read-holding rcu_gp_mutex), task B blocked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) 		attempting to write-acquire rcu_gp_mutex, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081) 		task C blocked in rcu_read_lock() attempting to
		read-acquire rcu_gp_mutex.  Task A's RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) 		latency is holding up task C, albeit indirectly via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) 		task B.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) 		Realtime RCU implementations therefore use a counter-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) 		approach where tasks in RCU read-side critical sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088) 		cannot be blocked by tasks executing synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) :ref:`Back to Quick Quiz #1 <quiz_1>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) Quick Quiz #2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) 		Give an example where Classic RCU's read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) 		overhead is **negative**.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) 		Imagine a single-CPU system with a non-CONFIG_PREEMPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) 		kernel where a routing table is used by process-context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) 		code, but can be updated by irq-context code (for example,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) 		by an "ICMP REDIRECT" packet).	The usual way of handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101) 		this would be to have the process-context code disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) 		interrupts while searching the routing table.  Use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) 		RCU allows such interrupt-disabling to be dispensed with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) 		Thus, without RCU, you pay the cost of disabling interrupts,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) 		and with RCU you don't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) 		One can argue that the overhead of RCU in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) 		case is negative with respect to the single-CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) 		interrupt-disabling approach.  Others might argue that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) 		the overhead of RCU is merely zero, and that replacing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) 		the positive overhead of the interrupt-disabling scheme
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) 		with the zero-overhead RCU scheme does not constitute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) 		negative overhead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) 		In real life, of course, things are more complex.  But
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) 		even the theoretical possibility of negative overhead for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117) 		a synchronization primitive is a bit unexpected.  ;-)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) :ref:`Back to Quick Quiz #2 <quiz_2>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) Quick Quiz #3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) 		If it is illegal to block in an RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) 		critical section, what the heck do you do in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) 		PREEMPT_RT, where normal spinlocks can block???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) 		Just as PREEMPT_RT permits preemption of spinlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) 		critical sections, it permits preemption of RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) 		read-side critical sections.  It also permits
		spinlocks to block while in RCU read-side critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) 		sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) 		Why the apparent inconsistency?  Because it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) 		possible to use priority boosting to keep the RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) 		grace periods short if need be (for example, if running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) 		short of memory).  In contrast, if blocking waiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) 		for (say) network reception, there is no way to know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) 		what should be boosted.  Especially given that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) 		process we need to boost might well be a human being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) 		who just went out for a pizza or something.  And although
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) 		a computer-operated cattle prod might arouse serious
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) 		interest, it might also provoke serious objections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) 		Besides, how does the computer know what pizza parlor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) 		the human being went to???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) :ref:`Back to Quick Quiz #3 <quiz_3>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) ACKNOWLEDGEMENTS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) My thanks to the people who helped make this human-readable, including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) For more information, see http://www.rdrop.com/users/paulmck/RCU.