.. _up_doc:

RCU on Uniprocessor Systems
===========================

A common misconception is that, on UP systems, the call_rcu() primitive
may immediately invoke its function. The basis of this misconception
is that since there is only one CPU, it should not be necessary to
wait for anything else to get done, since there are no other CPUs for
anything else to be happening on. Although this approach will *sort of*
work a surprising amount of the time, it is a very bad idea in general.
This document presents three examples that demonstrate exactly how bad
an idea this is.

Example 1: softirq Suicide
--------------------------

Suppose that an RCU-based algorithm scans a linked list containing
elements A, B, and C in process context, and can delete elements from
this same list in softirq context. Suppose that the process-context scan
is referencing element B when it is interrupted by softirq processing,
which deletes element B, and then invokes call_rcu() to free element B
after a grace period.

Now, if call_rcu() were to directly invoke its arguments, then upon return
from softirq, the list scan would find itself referencing a newly freed
element B. This situation can greatly decrease the life expectancy of
your kernel.

This same problem can occur if call_rcu() is invoked from a hardware
interrupt handler.
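
To make the scenario concrete, here is a minimal sketch of such an
algorithm. The struct foo, foo_list, and do_something_with() names
are invented for illustration, and are not taken from any particular
kernel subsystem::

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
	};

	static LIST_HEAD(foo_list);

	/* Process-context scan: an RCU read-side critical section. */
	void scan_foo_list(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			do_something_with(p);	/* softirq may fire here */
		rcu_read_unlock();
	}

	static void free_foo(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rcu));
	}

	/* Softirq-context deletion of an element. */
	void delete_foo(struct foo *p)
	{
		list_del_rcu(&p->list);

		/*
		 * If this call were to invoke free_foo() immediately,
		 * the interrupted scan above would resume holding a
		 * pointer to freed memory.
		 */
		call_rcu(&p->rcu, free_foo);
	}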

Example 2: Function-Call Fatality
---------------------------------

Of course, one could avert the suicide described in the preceding example
by having call_rcu() directly invoke its arguments only if it was called
from process context. However, this can fail in a similar manner.

Suppose that an RCU-based algorithm again scans a linked list containing
elements A, B, and C in process context, but that it invokes a function
on each element as it is scanned. Suppose further that this function
deletes element B from the list, then passes it to call_rcu() for deferred
freeing. This may be a bit unconventional, but it is perfectly legal
RCU usage, since call_rcu() must wait for a grace period to elapse.
Therefore, in this case, allowing call_rcu() to immediately invoke
its arguments would cause it to fail to make the fundamental guarantee
underlying RCU, namely that call_rcu() defers invoking its arguments until
all RCU read-side critical sections currently executing have completed.
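
A sketch of this admittedly unconventional (but legal) pattern might
look as follows, reusing the hypothetical struct foo and free_foo()
from the Example 1 sketch, and assuming an illustrative foo_lock that
serializes updaters along with a made-up no_longer_needed() predicate::

	static DEFINE_SPINLOCK(foo_lock);

	/*
	 * Invoked on each element while the caller below is still
	 * inside its RCU read-side critical section.
	 */
	static void maybe_delete(struct foo *p)
	{
		if (no_longer_needed(p)) {
			spin_lock(&foo_lock);	/* serialize updaters */
			list_del_rcu(&p->list);
			spin_unlock(&foo_lock);

			/*
			 * Legal only because call_rcu() defers free_foo()
			 * until after the caller's read-side critical
			 * section has completed.
			 */
			call_rcu(&p->rcu, free_foo);
		}
	}

	void scan_and_filter(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			maybe_delete(p);
		rcu_read_unlock();
	}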

Quick Quiz #1:
	Why is it *not* legal to invoke synchronize_rcu() in this case?

:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`

Example 3: Death by Deadlock
----------------------------

Suppose that call_rcu() is invoked while holding a lock, and that the
callback function must acquire this same lock. In this case, if
call_rcu() were to directly invoke the callback, the result would
be self-deadlock.
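
For instance, in the following sketch (mylock, foo_reclaim(), and the
nr_reclaimed counter are again invented names), the callback needs the
very lock that its caller holds::

	static DEFINE_SPINLOCK(mylock);
	static unsigned long nr_reclaimed;

	/* RCU callback that acquires mylock to update bookkeeping. */
	static void foo_reclaim(struct rcu_head *rhp)
	{
		struct foo *p = container_of(rhp, struct foo, rcu);

		spin_lock(&mylock);	/* plain variant: callbacks run in BH context */
		nr_reclaimed++;
		spin_unlock(&mylock);
		kfree(p);
	}

	void remove_foo(struct foo *p)
	{
		spin_lock_bh(&mylock);	/* _bh variant: see Quick Quiz #2 */
		list_del_rcu(&p->list);

		/*
		 * Were call_rcu() to invoke foo_reclaim() right here,
		 * the callback's spin_lock(&mylock) could never
		 * succeed: self-deadlock.
		 */
		call_rcu(&p->rcu, foo_reclaim);
		spin_unlock_bh(&mylock);
	}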

In some cases, it would be possible to restructure the code so that
the call_rcu() is delayed until after the lock is released. However,
there are cases where this can be quite ugly:

1.	If a number of items need to be passed to call_rcu() within
	the same critical section, then the code would need to create
	a list of them, then traverse the list once the lock was
	released, as in the sketch following this list.

2.	In some cases, the lock will be held across some kernel API,
	so that delaying the call_rcu() until the lock is released
	requires that the data item be passed up via a common API.
	It is far better to guarantee that callbacks are invoked
	with no locks held than to have to modify such APIs to allow
	arbitrary data items to be passed back up through them.
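
Here is one possible sketch of the restructuring described in item 1
above. It assumes that the hypothetical struct foo gains an
updater-private pending list_head and a key field; the reader-visible
list field must be left untouched, because concurrent RCU readers may
still be traversing through the deleted elements::

	void remove_matching(int key)
	{
		struct foo *p, *next;
		LIST_HEAD(doomed);	/* updater-private victim list */

		spin_lock_bh(&mylock);
		list_for_each_entry_safe(p, next, &foo_list, list) {
			if (p->key == key) {
				list_del_rcu(&p->list);
				list_add(&p->pending, &doomed);
			}
		}
		spin_unlock_bh(&mylock);

		/*
		 * Only now, with no locks held, hand the victims to
		 * call_rcu().
		 */
		list_for_each_entry_safe(p, next, &doomed, pending)
			call_rcu(&p->rcu, free_foo);
	}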

If call_rcu() were to directly invoke the callback, painful locking
restrictions or API changes would be required.

Quick Quiz #2:
	What locking restriction must RCU callbacks respect?

:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`

Summary
-------

Permitting call_rcu() to immediately invoke its arguments breaks RCU,
even on a UP system. So do not do it! Even on a UP system, the RCU
infrastructure *must* respect grace periods, and *must* invoke callbacks
from a known environment in which no locks are held.

Note that it *is* safe for synchronize_rcu() to return immediately on
UP systems, including !PREEMPT SMP builds running on UP systems.

Quick Quiz #3:
	Why can't synchronize_rcu() return immediately on UP systems
	running preemptable RCU?

.. _answer_quick_quiz_up:

Answer to Quick Quiz #1:
	Why is it *not* legal to invoke synchronize_rcu() in this case?

	Because the calling function is scanning an RCU-protected linked
	list, and is therefore within an RCU read-side critical section.
	Therefore, the called function has been invoked within an RCU
	read-side critical section, and is not permitted to block.

Answer to Quick Quiz #2:
	What locking restriction must RCU callbacks respect?

	Any lock that is acquired within an RCU callback must be acquired
	elsewhere using an _bh variant of the spinlock primitive.
	For example, if "mylock" is acquired by an RCU callback, then
	a process-context acquisition of this lock must use something
	like spin_lock_bh() to acquire the lock. Please note that
	it is also OK to use _irq variants of spinlocks, for example,
	spin_lock_irqsave().

	If the process-context code were to simply use spin_lock(),
	then, since RCU callbacks can be invoked from softirq context,
	the callback might be called from a softirq that interrupted
	the process-context critical section. This would result in
	self-deadlock.
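
	Continuing the hypothetical mylock/foo_reclaim() sketch from
	Example 3, the process-context side might look like this::

		/* Process context: must block softirqs while holding mylock. */
		void reset_reclaim_stats(void)
		{
			/*
			 * NOT spin_lock(): foo_reclaim() may run from a
			 * softirq that interrupts this critical section
			 * and then attempt to re-acquire mylock.
			 */
			spin_lock_bh(&mylock);
			nr_reclaimed = 0;
			spin_unlock_bh(&mylock);
		}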

	This restriction might seem gratuitous, since very few RCU
	callbacks acquire locks directly. However, a great many RCU
	callbacks do acquire locks *indirectly*, for example, via
	the kfree() primitive.

Answer to Quick Quiz #3:
	Why can't synchronize_rcu() return immediately on UP systems
	running preemptable RCU?

	Because some other task might have been preempted in the middle
	of an RCU read-side critical section. If synchronize_rcu()
	simply immediately returned, it would prematurely signal the
	end of the grace period, which would come as a nasty shock to
	that other task when it started running again.