.. _whatisrcu_doc:

What is RCU? -- "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.  What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
| 2.  What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
| 3.  RCU part 3: the RCU API      http://lwn.net/Articles/264090/
| 4.  The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
|     2010 Big API Table           http://lwn.net/Articles/419086/
| 5.  The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
|     2014 Big API Table           http://lwn.net/Articles/609973/

What is RCU?

RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations. Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge. Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU. Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU. This document provides several
different paths, as follows:

:ref:`1. RCU OVERVIEW <1_whatisRCU>`

:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>`

:ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point. People who prefer to start with an API that they can then
experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Section 6. Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway. ;-)

.. _1_whatisRCU:

1. RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases. The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than
a partially updated reference. The reclamation phase does the work of
reclaiming (e.g., freeing) the data items removed from the data structure
during the removal phase. Because reclaiming data items can disrupt any
readers concurrently referencing those data items, the reclamation phase
must not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish. Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a. Remove pointers to a data structure, so that subsequent
   readers cannot gain a reference to it.

b. Wait for all previous readers to complete their RCU read-side
   critical sections.

c. At this point, there cannot be any readers who hold references
   to the data structure, so it now may safely be reclaimed
   (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, and in some cases absolutely no
synchronization at all. In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers. In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers. Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache). Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
.. _2_whatisRCU:

2. WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a. rcu_read_lock()
b. rcu_read_unlock()
c. synchronize_rcu() / call_rcu()
d. rcu_assign_pointer()
e. rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the others will be enumerated
later. See the kernel docbook documentation for more info, or look
directly at the function header comments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)
rcu_read_lock()
^^^^^^^^^^^^^^^
void rcu_read_lock(void);

Used by a reader to inform the reclaimer that the reader is
entering an RCU read-side critical section. It is illegal
to block while in an RCU read-side critical section, though
kernels built with CONFIG_PREEMPT_RCU can preempt RCU
read-side critical sections. Any RCU-protected data structure
accessed during an RCU read-side critical section is guaranteed to
remain unreclaimed for the full duration of that critical section.
Reference counts may be used in conjunction with RCU to maintain
longer-term references to data structures.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
void rcu_read_unlock(void);

Used by a reader to inform the reclaimer that the reader is
exiting an RCU read-side critical section. Note that RCU
read-side critical sections may be nested and/or overlapping.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
synchronize_rcu()
^^^^^^^^^^^^^^^^^
void synchronize_rcu(void);

Marks the end of updater code and the beginning of reclaimer
code. It does this by blocking until all pre-existing RCU
read-side critical sections on all CPUs have completed.
Note that synchronize_rcu() will **not** necessarily wait for
any subsequent RCU read-side critical sections to complete.
For example, consider the following sequence of events::

    CPU 0                  CPU 1                 CPU 2
    ----------------- ------------------------- ---------------
    1.  rcu_read_lock()
    2.                    enters synchronize_rcu()
    3.                                               rcu_read_lock()
    4.  rcu_read_unlock()
    5.                     exits synchronize_rcu()
    6.                                              rcu_read_unlock()

To reiterate, synchronize_rcu() waits only for ongoing RCU
read-side critical sections to complete, not necessarily for
any that begin after synchronize_rcu() is invoked.

Of course, synchronize_rcu() does not necessarily return
**immediately** after the last pre-existing RCU read-side critical
section completes. For one thing, there might well be scheduling
delays. For another thing, many RCU implementations process
requests in batches in order to improve efficiencies, which can
further delay synchronize_rcu().

Since synchronize_rcu() is the API that must figure out when
readers are done, its implementation is key to RCU. For RCU
to be useful in all but the most read-intensive situations,
synchronize_rcu()'s overhead must also be quite small.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
The call_rcu() API is a callback form of synchronize_rcu(),
and is described in more detail in a later section. Instead of
blocking, it registers a function and argument which are invoked
after all ongoing RCU read-side critical sections have completed.
This callback variant is particularly useful in situations where
it is illegal to block or where update-side performance is
critically important.

However, the call_rcu() API should not be used lightly, as use
of the synchronize_rcu() API generally results in simpler code.
In addition, the synchronize_rcu() API has the nice property
of automatically limiting update rate should grace periods
be delayed. This property results in system resilience in the face
of denial-of-service attacks. Code using call_rcu() should limit
update rate in order to gain this same sort of resilience. See
Documentation/RCU/checklist.rst for some approaches to limiting
the update rate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)
rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
void rcu_assign_pointer(p, typeof(p) v);

Yes, rcu_assign_pointer() **is** implemented as a macro, though it
would be cool to be able to declare a function in this manner.
(Compiler experts will no doubt disagree.)

The updater uses this function to assign a new value to an
RCU-protected pointer, in order to safely communicate the change
in value from the updater to the reader. This macro does not
evaluate to an rvalue, but it does execute any memory-barrier
instructions required for a given CPU architecture.

Perhaps just as important, it serves to document (1) which
pointers are protected by RCU and (2) the point at which a
given structure becomes accessible to other CPUs. That said,
rcu_assign_pointer() is most frequently used indirectly, via
the _rcu list-manipulation primitives such as list_add_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)
rcu_dereference()
^^^^^^^^^^^^^^^^^
typeof(p) rcu_dereference(p);

Like rcu_assign_pointer(), rcu_dereference() must be implemented
as a macro.

The reader uses rcu_dereference() to fetch an RCU-protected
pointer, which returns a value that may then be safely
dereferenced. Note that rcu_dereference() does not actually
dereference the pointer; instead, it protects the pointer for
later dereferencing. It also executes any needed memory-barrier
instructions for a given CPU architecture. Currently, only Alpha
needs memory barriers within rcu_dereference() -- on other CPUs,
it compiles to nothing, not even a compiler directive.

Common coding practice uses rcu_dereference() to copy an
RCU-protected pointer to a local variable, then dereferences
this local variable, for example as follows::

    p = rcu_dereference(head.next);
    return p->data;

However, in this case, one could just as easily combine these
into one statement::

    return rcu_dereference(head.next)->data;

If you are going to be fetching multiple fields from the
RCU-protected structure, using the local variable is of
course preferred. Repeated rcu_dereference() calls look
ugly, do not guarantee that the same pointer will be returned
if an update happened while in the critical section, and incur
unnecessary overhead on Alpha CPUs.

Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section [1]_.
For example, the following is **not** legal::

    rcu_read_lock();
    p = rcu_dereference(head.next);
    rcu_read_unlock();
    x = p->address; /* BUG!!! */
    rcu_read_lock();
    y = p->data;    /* BUG!!! */
    rcu_read_unlock();

Holding a reference from one RCU read-side critical section
to another is just as illegal as holding a reference from
one lock-based critical section to another! Similarly,
using a reference outside of the critical section in which
it was acquired is just as illegal as doing so with normal
locking.

As with rcu_assign_pointer(), an important function of
rcu_dereference() is to document which pointers are protected by
RCU, in particular, flagging a pointer that is subject to changing
at any time, including immediately after the rcu_dereference().
And, again like rcu_assign_pointer(), rcu_dereference() is
typically used indirectly, via the _rcu list-manipulation
primitives, such as list_for_each_entry_rcu() [2]_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)
.. [1] The variant rcu_dereference_protected() can be used outside
   of an RCU read-side critical section as long as the usage is
   protected by locks acquired by the update-side code. This variant
   avoids the lockdep warning that would happen when using (for
   example) rcu_dereference() without rcu_read_lock() protection.
   Using rcu_dereference_protected() also has the advantage
   of permitting compiler optimizations that rcu_dereference()
   must prohibit. The rcu_dereference_protected() variant takes
   a lockdep expression to indicate which locks must be acquired
   by the caller. If the indicated protection is not provided,
   a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst
   and the API's code comments for more details and example usage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)
.. [2] If the list_for_each_entry_rcu() instance might be used by
   update-side code as well as by RCU readers, then an additional
   lockdep expression can be added to its list of arguments.
   For example, given an additional "lock_is_held(&mylock)" argument,
   the RCU lockdep code would complain only if this instance was
   invoked outside of an RCU read-side critical section and without
   the protection of mylock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)
The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


    rcu_assign_pointer()
                            +--------+
    +---------------------->| reader |---------+
    |                       +--------+         |
    |                           |              |
    |                           |              | Protect:
    |                           |              | rcu_read_lock()
    |                           |              | rcu_read_unlock()
    |    rcu_dereference()      |              |
    +---------+                 |              |
    | updater |<----------------+              |
    +---------+                                V
    |                                    +-----------+
    +----------------------------------->| reclaimer |
                                         +-----------+
                                           Defer:
                                           synchronize_rcu() & call_rcu()


The RCU infrastructure observes the time sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked. Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) There are at least three flavors of RCU usage in the Linux kernel. The diagram
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) above shows the most common one. On the updater side, the rcu_assign_pointer(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) synchronize_rcu() and call_rcu() primitives used are the same for all three
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) flavors. However for protection (on the reader side), the primitives used vary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) depending on the flavor:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) a. rcu_read_lock() / rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) rcu_dereference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) b. rcu_read_lock_bh() / rcu_read_unlock_bh()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) local_bh_disable() / local_bh_enable()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) rcu_dereference_bh()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) c. rcu_read_lock_sched() / rcu_read_unlock_sched()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) preempt_disable() / preempt_enable()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) local_irq_save() / local_irq_restore()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) hardirq enter / hardirq exit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) NMI enter / NMI exit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) rcu_dereference_sched()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) These three flavors are used as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) a. RCU applied to normal data structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) b. RCU applied to networking data structures that may be subjected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) to remote denial-of-service attacks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) c. RCU applied to scheduler and interrupt/NMI-handler tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) Again, most uses will be of (a). The (b) and (c) cases are important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) for specialized uses, but are relatively uncommon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) .. _3_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) -----------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) This section shows a simple use of the core RCU API to protect a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) global pointer to a dynamically allocated structure. More-typical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) struct foo {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) int a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) char b;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) long c;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) DEFINE_SPINLOCK(foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) struct foo __rcu *gbl_foo;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) * Create a new struct foo that is the same as the one currently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) * pointed to by gbl_foo, except that field "a" is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) * with "new_a". Points gbl_foo to the new structure, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) * frees up the old structure after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) * Uses rcu_assign_pointer() to ensure that concurrent readers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) * see the initialized version of the new structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) * Uses synchronize_rcu() to ensure that any readers that might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) * have references to the old structure complete before freeing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) * the old structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) void foo_update_a(int new_a)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) struct foo *new_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) struct foo *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) spin_lock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) *new_fp = *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) new_fp->a = new_a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) rcu_assign_pointer(gbl_foo, new_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) spin_unlock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) kfree(old_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) * Return the value of field "a" of the current gbl_foo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) * structure. Use rcu_read_lock() and rcu_read_unlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) * to ensure that the structure does not get deleted out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) * from under us, and use rcu_dereference() to ensure that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) * we see the initialized version of the structure (important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) * for DEC Alpha and for people reading the code).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) int foo_get_a(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) int retval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) retval = rcu_dereference(gbl_foo)->a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) return retval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) So, to sum up:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) - Use rcu_read_lock() and rcu_read_unlock() to guard RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) - Within an RCU read-side critical section, use rcu_dereference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) to dereference RCU-protected pointers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) - Use some solid scheme (such as locks or semaphores) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) keep concurrent updates from interfering with each other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) - Use rcu_assign_pointer() to update an RCU-protected pointer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) This primitive protects concurrent readers from the updater,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) **not** concurrent updates from each other! You therefore still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) need to use locking (or something similar) to keep concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) rcu_assign_pointer() primitives from interfering with each other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) - Use synchronize_rcu() **after** removing a data element from an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) RCU-protected data structure, but **before** reclaiming/freeing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) the data element, in order to wait for the completion of all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) RCU read-side critical sections that might be referencing that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) data item.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) See checklist.txt for additional rules to follow when using RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) <list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) <NMI_rcu_doc>`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) .. _4_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) --------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) In the example above, foo_update_a() blocks until a grace period elapses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) This is quite simple, but in some cases one cannot afford to wait so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) long -- there might be other high-priority work to be done.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) In such cases, one uses call_rcu() rather than synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) The call_rcu() API is as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) void call_rcu(struct rcu_head * head,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) void (*func)(struct rcu_head *head));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) This function invokes func(head) after a grace period has elapsed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) This invocation might happen from either softirq or process context,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) so the function is not permitted to block. The foo struct needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) have an rcu_head structure added, perhaps as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) struct foo {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) int a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) char b;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) long c;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) struct rcu_head rcu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) The foo_update_a() function might then be written as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) * Create a new struct foo that is the same as the one currently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) * pointed to by gbl_foo, except that field "a" is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) * with "new_a". Points gbl_foo to the new structure, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) * frees up the old structure after a grace period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) * Uses rcu_assign_pointer() to ensure that concurrent readers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) * see the initialized version of the new structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) * Uses call_rcu() to ensure that any readers that might have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) * references to the old structure complete before freeing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) * old structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) void foo_update_a(int new_a)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) struct foo *new_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) struct foo *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) spin_lock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) *new_fp = *old_fp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) new_fp->a = new_a;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) rcu_assign_pointer(gbl_foo, new_fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) spin_unlock(&foo_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) call_rcu(&old_fp->rcu, foo_reclaim);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) The foo_reclaim() function might appear as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) void foo_reclaim(struct rcu_head *rp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) struct foo *fp = container_of(rp, struct foo, rcu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) foo_cleanup(fp->a);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) kfree(fp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) The container_of() primitive is a macro that, given a pointer into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) struct, the type of the struct, and the pointed-to field within the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) struct, returns a pointer to the beginning of the struct.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) The use of call_rcu() permits the caller of foo_update_a() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) immediately regain control, without needing to worry further about the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) old version of the newly updated element. It also clearly shows the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) RCU distinction between updater, namely foo_update_a(), and reclaimer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) namely foo_reclaim().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) The summary of advice is the same as for the previous section, except
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) that we are now using call_rcu() rather than synchronize_rcu():
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) - Use call_rcu() **after** removing a data element from an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) RCU-protected data structure in order to register a callback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) function that will be invoked after the completion of all RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) read-side critical sections that might be referencing that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) data item.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) If the callback for call_rcu() is not doing anything more than calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) to avoid having to write your own callback::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) kfree_rcu(old_fp, rcu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) Again, see checklist.txt for additional rules governing the use of RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) .. _5_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) ------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) One of the nice things about RCU is that it has extremely simple "toy"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) implementations that are a good first step towards understanding the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) production-quality implementations in the Linux kernel. This section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) presents two such "toy" implementations of RCU, one that is implemented
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) in terms of familiar locking primitives, and another that more closely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) resembles "classic" RCU. Both are way too simple for real-world use,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) lacking both functionality and performance. However, they are useful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) in getting a feel for how RCU works. See kernel/rcu/update.c for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) production-quality implementation, and see:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) http://www.rdrop.com/users/paulmck/RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) for papers describing the Linux kernel RCU implementation. The OLS'01
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) and OLS'02 papers are a good introduction, and the dissertation provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) more details on the current implementation as of early 2004.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) 5A. "TOY" IMPLEMENTATION #1: LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) This section presents a "toy" RCU implementation that is based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) familiar locking primitives. Its overhead makes it a non-starter for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) real-life use, as does its lack of scalability. It is also unsuitable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) for realtime use, since it allows scheduling latency to "bleed" from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) one read-side critical section to another. It also assumes recursive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) reader-writer locks: If you try this with non-recursive locks, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) you allow nested rcu_read_lock() calls, you can deadlock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) However, it is probably the easiest implementation to relate to, so is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) a good starting point.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) It is extremely simple::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) static DEFINE_RWLOCK(rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) void rcu_read_lock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) read_lock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) void rcu_read_unlock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) read_unlock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) void synchronize_rcu(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) write_lock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) smp_mb__after_spinlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) write_unlock(&rcu_gp_mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) much. But here are simplified versions anyway. And whatever you do,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) don't forget about them when submitting patches making use of RCU!]::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) #define rcu_assign_pointer(p, v) \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) ({ \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) smp_store_release(&(p), (v)); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) })
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) #define rcu_dereference(p) \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) ({ \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) typeof(p) _________p1 = READ_ONCE(p); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) (_________p1); \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) })
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) and release a global reader-writer lock. The synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) primitive write-acquires this same lock, then releases it. This means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) that once synchronize_rcu() exits, all RCU read-side critical sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) that were in progress before synchronize_rcu() was called are guaranteed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) to have completed -- there is no way that synchronize_rcu() would have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) promotes synchronize_rcu() to a full memory barrier in compliance with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) the "Memory-Barrier Guarantees" listed in:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) Documentation/RCU/Design/Requirements/Requirements.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) It is possible to nest rcu_read_lock(), since reader-writer locks may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) be recursively acquired. Note also that rcu_read_lock() is immune
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) from deadlock (an important property of RCU). The reason for this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) that the only thing that can block rcu_read_lock() is a synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) so there can be no deadlock cycle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) .. _quiz_1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) Quick Quiz #1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) Why is this argument naive? How could a deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) occur when using this algorithm in a real-world Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) kernel? How could this deadlock be avoided?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) 5B. "TOY" EXAMPLE #2: CLASSIC RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) This section presents a "toy" RCU implementation that is based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) "classic RCU". It is also short on performance (but only for updates) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) are the same as those shown in the preceding section, so they are omitted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) void rcu_read_lock(void) { }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) void rcu_read_unlock(void) { }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) void synchronize_rcu(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) int cpu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) for_each_possible_cpu(cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) run_on(cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) This is the great strength of classic RCU in a non-preemptive kernel:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) read-side overhead is precisely zero, at least on non-Alpha CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) And there is absolutely no way that rcu_read_lock() can possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) participate in a deadlock cycle!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) The implementation of synchronize_rcu() simply schedules itself on each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) CPU in turn. The run_on() primitive can be implemented straightforwardly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) in terms of the sched_setaffinity() primitive. Of course, a somewhat less
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) "toy" implementation would restore the affinity upon completion rather
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) than just leaving all tasks running on the last CPU, but when I said
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) "toy", I meant **toy**!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) So how the heck is this supposed to work???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) Remember that it is illegal to block while in an RCU read-side critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) section. Therefore, if a given CPU executes a context switch, we know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) that it must have completed all preceding RCU read-side critical sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) Once **all** CPUs have executed a context switch, then **all** preceding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) RCU read-side critical sections will have completed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) So, suppose that we remove a data item from its structure and then invoke
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) that there are no RCU read-side critical sections holding a reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) to that data item, so we can safely reclaim it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) .. _quiz_2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) Quick Quiz #2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) Give an example where Classic RCU's read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) overhead is **negative**.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) .. _quiz_3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) Quick Quiz #3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) If it is illegal to block in an RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) critical section, what the heck do you do in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) PREEMPT_RT, where normal spinlocks can block???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) :ref:`Answers to Quick Quiz <8_whatisRCU>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) .. _6_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) 6. ANALOGY WITH READER-WRITER LOCKING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) Although RCU can be used in many different ways, a very common use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) RCU is analogous to reader-writer locking. The following unified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) diff shows how closely related RCU and reader-writer locking can be.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) @@ -5,5 +5,5 @@ struct el {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) int data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) /* Other data fields */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) -rwlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) +spinlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) struct el head;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) @@ -13,15 +14,15 @@
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) struct list_head *lp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) - read_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) - list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) + rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) + list_for_each_entry_rcu(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) *result = p->data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) - read_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) + rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) - read_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) + rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) @@ -29,15 +30,16 @@
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789) - write_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) + spin_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) - list_del(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) - write_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) + list_del_rcu(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) + spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) + synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) kfree(p);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799) return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) - write_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) + spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) Or, for those who prefer a side-by-side listing::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) 1 struct el { 1 struct el {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810) 2 struct list_head list; 2 struct list_head list;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) 3 long key; 3 long key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) 4 spinlock_t mutex; 4 spinlock_t mutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) 5 int data; 5 int data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) 6 /* Other data fields */ 6 /* Other data fields */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815) 7 }; 7 };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) 8 rwlock_t listmutex; 8 spinlock_t listmutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) 9 struct el head; 9 struct el head;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) 1 int search(long key, int *result) 1 int search(long key, int *result)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) 2 { 2 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823) 3 struct list_head *lp; 3 struct list_head *lp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824) 4 struct el *p; 4 struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) 5 5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) 6 read_lock(&listmutex); 6 rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827) 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828) 8 if (p->key == key) { 8 if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) 9 *result = p->data; 9 *result = p->data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) 10 read_unlock(&listmutex); 10 rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831) 11 return 1; 11 return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) 12 } 12 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) 13 } 13 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) 14 read_unlock(&listmutex); 14 rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) 15 return 0; 15 return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836) 16 } 16 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) 1 int delete(long key) 1 int delete(long key)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) 2 { 2 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842) 3 struct el *p; 3 struct el *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) 4 4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845) 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) 7 if (p->key == key) { 7 if (p->key == key) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) 8 list_del(&p->list); 8 list_del_rcu(&p->list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) 10 synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850) 10 kfree(p); 11 kfree(p);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851) 11 return 1; 12 return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) 12 } 13 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) 13 } 14 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) 15 return 0; 16 return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) 16 } 17 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) Either way, the differences are quite small. Read-side locking moves
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859) to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) a reader-writer lock to a simple spinlock, and a synchronize_rcu()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) precedes the kfree().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) However, there is one potential catch: the read-side and update-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) critical sections can now run concurrently. In many cases, this will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865) not be a problem, but it is necessary to check carefully regardless.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) For example, if multiple independent list updates must be seen as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) a single atomic update, converting to RCU will require special care.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869) Also, the presence of synchronize_rcu() means that the RCU version of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) delete() can now block. If this is a problem, there is a callback-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872) be used in place of synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) .. _7_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) 7. FULL LIST OF RCU APIs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879) The RCU APIs are documented in docbook-format header comments in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) Linux-kernel source code, but it helps to have a full list of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) APIs, since there does not appear to be a way to categorize them
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) in docbook. Here is the list, by category.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) RCU list traversal::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) list_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) list_entry_lockless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888) list_first_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) list_next_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890) list_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891) list_for_each_entry_continue_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892) list_for_each_entry_from_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) list_first_or_null_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894) list_next_or_null_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) hlist_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) hlist_next_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) hlist_pprev_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) hlist_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) hlist_for_each_entry_rcu_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) hlist_for_each_entry_from_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) hlist_for_each_entry_continue_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902) hlist_for_each_entry_continue_rcu_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903) hlist_nulls_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) hlist_nulls_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) hlist_bl_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) hlist_bl_for_each_entry_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) RCU pointer/list update::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910) rcu_assign_pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) list_add_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912) list_add_tail_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913) list_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) list_replace_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) hlist_add_behind_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916) hlist_add_before_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) hlist_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) hlist_add_tail_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) hlist_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920) hlist_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921) hlist_replace_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) list_splice_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) list_splice_tail_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924) hlist_nulls_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) hlist_nulls_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) hlist_nulls_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) hlist_bl_add_head_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) hlist_bl_del_init_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) hlist_bl_del_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) hlist_bl_set_first_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932) RCU::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) Critical sections Grace period Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) rcu_read_lock synchronize_net rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937) rcu_read_unlock synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) rcu_dereference synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939) rcu_read_lock_held call_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) rcu_dereference_check kfree_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) rcu_dereference_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) bh::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945) Critical sections Grace period Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) rcu_read_lock_bh call_rcu rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) rcu_read_unlock_bh synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) [local_bh_disable] synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) [and friends]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951) rcu_dereference_bh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952) rcu_dereference_bh_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) rcu_dereference_bh_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954) rcu_read_lock_bh_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) sched::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958) Critical sections Grace period Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) rcu_read_lock_sched call_rcu rcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961) rcu_read_unlock_sched synchronize_rcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) [preempt_disable] synchronize_rcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) [and friends]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) rcu_read_lock_sched_notrace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) rcu_read_unlock_sched_notrace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966) rcu_dereference_sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967) rcu_dereference_sched_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) rcu_dereference_sched_protected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969) rcu_read_lock_sched_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) SRCU::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) Critical sections Grace period Barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) srcu_read_lock call_srcu srcu_barrier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) srcu_read_unlock synchronize_srcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978) srcu_dereference synchronize_srcu_expedited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) srcu_dereference_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) srcu_read_lock_held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) SRCU: Initialization/cleanup::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984) DEFINE_SRCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985) DEFINE_STATIC_SRCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) init_srcu_struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987) cleanup_srcu_struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) All: lockdep-checked RCU-protected pointer access::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) rcu_access_pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992) rcu_dereference_raw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) RCU_LOCKDEP_WARN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994) rcu_sleep_check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) RCU_NONIDLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997) See the comment headers in the source code (or the docbook generated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998) from them) for more information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) However, given that there are no fewer than four families of RCU APIs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) in the Linux kernel, how do you choose which one to use? The following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002) list can be helpful:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) a. Will readers need to block? If so, you need SRCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) b. What about the -rt patchset? If readers would need to block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) in an non-rt kernel, you need SRCU. If readers would block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) in a -rt kernel, but not in a non-rt kernel, SRCU is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) necessary. (The -rt patchset turns spinlocks into sleeplocks,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) hence this distinction.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) c. Do you need to treat NMI handlers, hardirq handlers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) and code segments with preemption disabled (whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) via preempt_disable(), local_irq_save(), local_bh_disable(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) or some other mechanism) as if they were explicit RCU readers?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) If so, RCU-sched is the only choice that will work for you.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) d. Do you need RCU grace periods to complete even in the face
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) of softirq monopolization of one or more of the CPUs? For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) example, is your code subject to network-based denial-of-service
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) attacks? If so, you should disable softirq across your readers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) for example, by using rcu_read_lock_bh().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) e. Is your workload too update-intensive for normal use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) RCU, but inappropriate for other synchronization mechanisms?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026) If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) named SLAB_DESTROY_BY_RCU). But please be careful!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) f. Do you need read-side critical sections that are respected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) even though they are in the middle of the idle loop, during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) user-mode execution, or on an offlined CPU? If so, SRCU is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) only choice that will work for you.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) g. Otherwise, use RCU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) Of course, this all assumes that you have determined that RCU is in fact
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) the right tool for your job.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) .. _8_whatisRCU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) 8. ANSWERS TO QUICK QUIZZES
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) Quick Quiz #1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) Why is this argument naive? How could a deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) occur when using this algorithm in a real-world Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) kernel? [Referring to the lock-based "toy" RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) algorithm.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) Consider the following sequence of events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) 1. CPU 0 acquires some unrelated lock, call it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) "problematic_lock", disabling irq via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) spin_lock_irqsave().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) 2. CPU 1 enters synchronize_rcu(), write-acquiring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) rcu_gp_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) 3. CPU 0 enters rcu_read_lock(), but must wait
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) because CPU 1 holds rcu_gp_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) 4. CPU 1 is interrupted, and the irq handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) attempts to acquire problematic_lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) The system is now deadlocked.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) One way to avoid this deadlock is to use an approach like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) that of CONFIG_PREEMPT_RT, where all normal spinlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) become blocking locks, and all irq handlers execute in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) the context of special tasks. In this case, in step 4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) above, the irq handler would block, allowing CPU 1 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) release rcu_gp_mutex, avoiding the deadlock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) Even in the absence of deadlock, this RCU implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) allows latency to "bleed" from readers to other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) readers through synchronize_rcu(). To see this,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) consider task A in an RCU read-side critical section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) (thus read-holding rcu_gp_mutex), task B blocked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) attempting to write-acquire rcu_gp_mutex, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081) task C blocked in rcu_read_lock() attempting to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) read_acquire rcu_gp_mutex. Task A's RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) latency is holding up task C, albeit indirectly via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) task B.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) Realtime RCU implementations therefore use a counter-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) approach where tasks in RCU read-side critical sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088) cannot be blocked by tasks executing synchronize_rcu().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) :ref:`Back to Quick Quiz #1 <quiz_1>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) Quick Quiz #2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) Give an example where Classic RCU's read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) overhead is **negative**.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) Imagine a single-CPU system with a non-CONFIG_PREEMPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) kernel where a routing table is used by process-context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) code, but can be updated by irq-context code (for example,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) by an "ICMP REDIRECT" packet). The usual way of handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101) this would be to have the process-context code disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) interrupts while searching the routing table. Use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) RCU allows such interrupt-disabling to be dispensed with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) Thus, without RCU, you pay the cost of disabling interrupts,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) and with RCU you don't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) One can argue that the overhead of RCU in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) case is negative with respect to the single-CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) interrupt-disabling approach. Others might argue that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) the overhead of RCU is merely zero, and that replacing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) the positive overhead of the interrupt-disabling scheme
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) with the zero-overhead RCU scheme does not constitute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) negative overhead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) In real life, of course, things are more complex. But
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) even the theoretical possibility of negative overhead for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117) a synchronization primitive is a bit unexpected. ;-)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) :ref:`Back to Quick Quiz #2 <quiz_2>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) Quick Quiz #3:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) If it is illegal to block in an RCU read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) critical section, what the heck do you do in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) PREEMPT_RT, where normal spinlocks can block???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) Answer:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) Just as PREEMPT_RT permits preemption of spinlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) critical sections, it permits preemption of RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) read-side critical sections. It also permits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) spinlocks blocking while in RCU read-side critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) sections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) Why the apparent inconsistency? Because it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) possible to use priority boosting to keep the RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) grace periods short if need be (for example, if running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) short of memory). In contrast, if blocking waiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) for (say) network reception, there is no way to know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) what should be boosted. Especially given that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) process we need to boost might well be a human being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) who just went out for a pizza or something. And although
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) a computer-operated cattle prod might arouse serious
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) interest, it might also provoke serious objections.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) Besides, how does the computer know what pizza parlor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) the human being went to???
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) :ref:`Back to Quick Quiz #3 <quiz_3>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) ACKNOWLEDGEMENTS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) My thanks to the people who helped make this human-readable, including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) For more information, see http://www.rdrop.com/users/paulmck/RCU.