Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============

A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking in only a few new situations.

This document is for all kernel hackers.  Developing code in the kernel
requires protecting these situations.

RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise.  An example code snippet::

	struct this_needs_locking tux[NR_CPUS];
	tux[smp_processor_id()] = some_value;
	/* task is preempted here... */
	something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, but
it still requires protection under preemption.  Second, when a preempted task
is finally rescheduled, the previous value of smp_processor_id may not equal
the current one.  You must protect these situations by disabling preemption
around them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current processor id, and put_cpu() re-enables preemption.


RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected.  This is arch-
dependent, but includes CPU structures and state not preserved over a context
switch.  For example, on x86, entering and exiting FPU mode is now a critical
section that must occur while preemption is disabled.  Think what would happen
if the kernel is executing a floating-point instruction and is then preempted.
Remember, the kernel does not save FPU state except for user tasks.  Therefore,
upon preemption, the FPU registers will be sold to the lowest bidder.  Thus,
preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe.  For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.

RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_enable()		decrement the preempt counter
  preempt_disable()		increment the preempt counter
  preempt_enable_no_resched()	decrement, but do not immediately preempt
  preempt_check_resched()	if needed, reschedule
  preempt_count()		return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable.  The preempt statements expand to nothing if
preemption is not enabled.

Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption - any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  Best policy is to use
this only for small, atomic code that you wrote and which calls no complex
functions.

Example::

	cpucache_t *cc; /* this is per-CPU */
	preempt_disable();
	cc = cc_data(searchp);
	if (cc && cc->avail) {
		__free_block(searchp, cc_entry(cc), cc->avail);
		cc->avail = 0;
	}
	preempt_enable();
	return 0;

Notice how the preemption statements must encompass every reference of the
critical variables.  Another example::

	int buf[NR_CPUS];
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	spin_lock(&buf_lock);
	/* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.


Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable and
local_irq_save.  Note, when doing so, you must be very careful not to cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that, as of 2.5, interrupt disabling is only per-CPU (i.e. local).

An additional concern is the proper usage of local_irq_disable and
local_irq_save.  These may be used to protect against preemption; however, on
exit, if preemption may be enabled, a test to see if preemption is required
should be made.  If these are called from the spin_lock and read/write lock
macros, the right thing is done.  They may also be called within a spin-lock
protected region; however, if they are ever called outside of this context, a
test for preemption should be made.  Do note that calls from interrupt context
or bottom halves/tasklets are also protected by preemption locks and so may
use the versions which do not check preemption.