.. _up_doc:

RCU on Uniprocessor Systems
===========================

A common misconception is that, on UP systems, the call_rcu() primitive
may immediately invoke its function. The basis of this misconception
is that since there is only one CPU, it should not be necessary to
wait for anything else to get done, since there are no other CPUs for
anything else to be happening on. Although this approach will *sort of*
work a surprising amount of the time, it is a very bad idea in general.
This document presents three examples that demonstrate exactly how bad
an idea this is.

Example 1: softirq Suicide
--------------------------

Suppose that an RCU-based algorithm scans a linked list containing
elements A, B, and C in process context, and can delete elements from
this same list in softirq context. Suppose that the process-context scan
is referencing element B when it is interrupted by softirq processing,
which deletes element B, and then invokes call_rcu() to free element B
after a grace period.

Now, if call_rcu() were to directly invoke its arguments, then upon return
from softirq, the list scan would find itself referencing a newly freed
element B. This situation can greatly decrease the life expectancy of
your kernel.

This same problem can occur if call_rcu() is invoked from a hardware
interrupt handler.
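
To make the scenario concrete, here is a minimal sketch of such an
algorithm. The struct foo, foo_list, and do_something_with() names
are invented for illustration, and are not taken from any particular
kernel subsystem::

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
	};

	static LIST_HEAD(foo_list);

	/* Process-context scan: an RCU read-side critical section. */
	void scan_foo_list(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			do_something_with(p);	/* softirq may fire here */
		rcu_read_unlock();
	}

	static void free_foo(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rcu));
	}

	/* Softirq-context deletion of an element. */
	void delete_foo(struct foo *p)
	{
		list_del_rcu(&p->list);

		/*
		 * If this call were to invoke free_foo() immediately,
		 * the interrupted scan above would resume holding a
		 * pointer to freed memory.
		 */
		call_rcu(&p->rcu, free_foo);
	}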

Example 2: Function-Call Fatality
---------------------------------

Of course, one could avert the suicide described in the preceding example
by having call_rcu() directly invoke its arguments only if it was called
from process context. However, this can fail in a similar manner.

Suppose that an RCU-based algorithm again scans a linked list containing
elements A, B, and C in process context, but that it invokes a function
on each element as it is scanned. Suppose further that this function
deletes element B from the list, then passes it to call_rcu() for deferred
freeing. This may be a bit unconventional, but it is perfectly legal
RCU usage, since call_rcu() must wait for a grace period to elapse.
Therefore, in this case, allowing call_rcu() to immediately invoke
its arguments would cause it to fail to make the fundamental guarantee
underlying RCU, namely that call_rcu() defers invoking its arguments until
all RCU read-side critical sections currently executing have completed.
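
A sketch of this admittedly unconventional (but legal) pattern might
look as follows, reusing the hypothetical struct foo and free_foo()
from the Example 1 sketch, and assuming an illustrative foo_lock that
serializes updaters along with a made-up no_longer_needed() predicate::

	static DEFINE_SPINLOCK(foo_lock);

	/*
	 * Invoked on each element while the caller below is still
	 * inside its RCU read-side critical section.
	 */
	static void maybe_delete(struct foo *p)
	{
		if (no_longer_needed(p)) {
			spin_lock(&foo_lock);	/* serialize updaters */
			list_del_rcu(&p->list);
			spin_unlock(&foo_lock);

			/*
			 * Legal only because call_rcu() defers free_foo()
			 * until after the caller's read-side critical
			 * section has completed.
			 */
			call_rcu(&p->rcu, free_foo);
		}
	}

	void scan_and_filter(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			maybe_delete(p);
		rcu_read_unlock();
	}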

Quick Quiz #1:
	Why is it *not* legal to invoke synchronize_rcu() in this case?

:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`

Example 3: Death by Deadlock
----------------------------

Suppose that call_rcu() is invoked while holding a lock, and that the
callback function must acquire this same lock. In this case, if
call_rcu() were to directly invoke the callback, the result would
be self-deadlock.
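
For instance, in the following sketch (mylock, foo_reclaim(), and the
nr_reclaimed counter are again invented names), the callback needs the
very lock that its caller holds::

	static DEFINE_SPINLOCK(mylock);
	static unsigned long nr_reclaimed;

	/* RCU callback that acquires mylock to update bookkeeping. */
	static void foo_reclaim(struct rcu_head *rhp)
	{
		struct foo *p = container_of(rhp, struct foo, rcu);

		spin_lock(&mylock);	/* plain variant: callbacks run in BH context */
		nr_reclaimed++;
		spin_unlock(&mylock);
		kfree(p);
	}

	void remove_foo(struct foo *p)
	{
		spin_lock_bh(&mylock);	/* _bh variant: see Quick Quiz #2 */
		list_del_rcu(&p->list);

		/*
		 * Were call_rcu() to invoke foo_reclaim() right here,
		 * the callback's spin_lock(&mylock) could never
		 * succeed: self-deadlock.
		 */
		call_rcu(&p->rcu, foo_reclaim);
		spin_unlock_bh(&mylock);
	}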

In some cases, it would be possible to restructure the code so that
the call_rcu() is delayed until after the lock is released. However,
there are cases where this can be quite ugly:

1.	If a number of items need to be passed to call_rcu() within
	the same critical section, then the code would need to create
	a list of them, then traverse the list once the lock was
	released, as in the sketch following this list.

2.	In some cases, the lock will be held across some kernel API,
	so that delaying the call_rcu() until the lock is released
	requires that the data item be passed up via a common API.
	It is far better to guarantee that callbacks are invoked
	with no locks held than to have to modify such APIs to allow
	arbitrary data items to be passed back up through them.
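
Here is one possible sketch of the restructuring described in item 1
above. It assumes that the hypothetical struct foo gains an
updater-private pending list_head and a key field; the reader-visible
list field must be left untouched, because concurrent RCU readers may
still be traversing through the deleted elements::

	void remove_matching(int key)
	{
		struct foo *p, *next;
		LIST_HEAD(doomed);	/* updater-private victim list */

		spin_lock_bh(&mylock);
		list_for_each_entry_safe(p, next, &foo_list, list) {
			if (p->key == key) {
				list_del_rcu(&p->list);
				list_add(&p->pending, &doomed);
			}
		}
		spin_unlock_bh(&mylock);

		/*
		 * Only now, with no locks held, hand the victims to
		 * call_rcu().
		 */
		list_for_each_entry_safe(p, next, &doomed, pending)
			call_rcu(&p->rcu, free_foo);
	}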

If call_rcu() were to directly invoke the callback, painful locking
restrictions or API changes would be required.

Quick Quiz #2:
	What locking restriction must RCU callbacks respect?

:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`

Summary
-------

Permitting call_rcu() to immediately invoke its arguments breaks RCU,
even on a UP system. So do not do it! Even on a UP system, the RCU
infrastructure *must* respect grace periods, and *must* invoke callbacks
from a known environment in which no locks are held.

Note that it *is* safe for synchronize_rcu() to return immediately on
UP systems, including !PREEMPT SMP builds running on UP systems.

Quick Quiz #3:
	Why can't synchronize_rcu() return immediately on UP systems
	running preemptable RCU?

.. _answer_quick_quiz_up:

Answer to Quick Quiz #1:
	Why is it *not* legal to invoke synchronize_rcu() in this case?

	Because the calling function is scanning an RCU-protected linked
	list, and is therefore within an RCU read-side critical section.
	Therefore, the called function has been invoked within an RCU
	read-side critical section, and is not permitted to block.

Answer to Quick Quiz #2:
	What locking restriction must RCU callbacks respect?

	Any lock that is acquired within an RCU callback must be acquired
	elsewhere using an _bh variant of the spinlock primitive.
	For example, if "mylock" is acquired by an RCU callback, then
	a process-context acquisition of this lock must use something
	like spin_lock_bh() to acquire the lock. Please note that
	it is also OK to use _irq variants of spinlocks, for example,
	spin_lock_irqsave().

	If the process-context code were to simply use spin_lock(),
	then, since RCU callbacks can be invoked from softirq context,
	the callback might be called from a softirq that interrupted
	the process-context critical section. This would result in
	self-deadlock.
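
	Continuing the hypothetical mylock/foo_reclaim() sketch from
	Example 3, the process-context side might look like this::

		/* Process context: must block softirqs while holding mylock. */
		void reset_reclaim_stats(void)
		{
			/*
			 * NOT spin_lock(): foo_reclaim() may run from a
			 * softirq that interrupts this critical section
			 * and then attempt to re-acquire mylock.
			 */
			spin_lock_bh(&mylock);
			nr_reclaimed = 0;
			spin_unlock_bh(&mylock);
		}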

	This restriction might seem gratuitous, since very few RCU
	callbacks acquire locks directly. However, a great many RCU
	callbacks do acquire locks *indirectly*, for example, via
	the kfree() primitive.

Answer to Quick Quiz #3:
	Why can't synchronize_rcu() return immediately on UP systems
	running preemptable RCU?

	Because some other task might have been preempted in the middle
	of an RCU read-side critical section. If synchronize_rcu()
	simply immediately returned, it would prematurely signal the
	end of the grace period, which would come as a nasty shock to
	that other task when it started running again.