Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

3 Commits   0 Branches   0 Tags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ====================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) Reference-count design for elements of lists/arrays protected by RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ====================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) Please note that the percpu-ref feature is likely your first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) stop if you need to combine reference counts and RCU.  Please see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) include/linux/percpu-refcount.h for more information.  However, in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) those unusual cases where percpu-ref would consume too much memory,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) please read on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) ------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) Reference counting on elements of lists which are protected by traditional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) reader/writer spinlocks or semaphores are straightforward:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) CODE LISTING A::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21)     1.					    2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22)     add()				    search_and_reference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23)     {					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 	alloc_object				read_lock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 	...					search_for_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 	atomic_set(&el->rc, 1);			atomic_inc(&el->rc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 	write_lock(&list_lock);			 ...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 	add_element				read_unlock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 	...					...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 	write_unlock(&list_lock);	   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33)     3.					    4.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34)     release_referenced()		    delete()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35)     {					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 	...					write_lock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 	if(atomic_dec_and_test(&el->rc))	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 	    kfree(el);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 	...					remove_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40)     }						write_unlock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 						if (atomic_dec_and_test(&el->rc))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 						    kfree(el);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 					    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) If this list/array is made lock free using RCU as in changing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) write_lock() in add() and delete() to spin_lock() and changing read_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) in search_and_reference() to rcu_read_lock(), the atomic_inc() in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) search_and_reference() could potentially hold reference to an element which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) has already been deleted from the list/array.  Use atomic_inc_not_zero()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) in this scenario as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) CODE LISTING B::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)     1.					    2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)     add()				    search_and_reference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58)     {					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 	alloc_object				rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 	...					search_for_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 	atomic_set(&el->rc, 1);			if (!atomic_inc_not_zero(&el->rc)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 	spin_lock(&list_lock);			    rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 						    return FAIL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 	add_element				}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 	...					...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 	spin_unlock(&list_lock);		rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)     }					    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)     3.					    4.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69)     release_referenced()		    delete()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)     {					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 	...					spin_lock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 	if (atomic_dec_and_test(&el->rc))	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 	    call_rcu(&el->head, el_free);	remove_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 	...					spin_unlock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)     }						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 						if (atomic_dec_and_test(&el->rc))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 						    call_rcu(&el->head, el_free);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 					    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) Sometimes, a reference to the element needs to be obtained in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) update (write) stream.	In such cases, atomic_inc_not_zero() might be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) overkill, since we hold the update-side spinlock.  One might instead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) use atomic_inc() in such cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) It is not always convenient to deal with "FAIL" in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) search_and_reference() code path.  In such cases, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) atomic_dec_and_test() may be moved from delete() to el_free()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) CODE LISTING C::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)     1.					    2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)     add()				    search_and_reference()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)     {					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 	alloc_object				rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 	...					search_for_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 	atomic_set(&el->rc, 1);			atomic_inc(&el->rc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 	spin_lock(&list_lock);			...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 	add_element				rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	...				    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 	spin_unlock(&list_lock);	    4.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)     }					    delete()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)     3.					    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)     release_referenced()			spin_lock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)     {						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 	...					remove_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 	if (atomic_dec_and_test(&el->rc))	spin_unlock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 	    kfree(el);				...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 	...					call_rcu(&el->head, el_free);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)     }						...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)     5.					    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)     void el_free(struct rcu_head *rhp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 	release_referenced();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) The key point is that the initial reference added by add() is not removed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) until after a grace period has elapsed following removal.  This means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) search_and_reference() cannot find this element, which means that the value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) of el->rc cannot increase.  Thus, once it reaches zero, there are no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) readers that can or ever will be able to reference the element.	 The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) element can therefore safely be freed.	This in turn guarantees that if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) any reader finds the element, that reader may safely acquire a reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) without checking the value of the reference counter.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) A clear advantage of the RCU-based pattern in listing C over the one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) in listing B is that any call to search_and_reference() that locates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) a given object will succeed in obtaining a reference to that object,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) even given a concurrent invocation of delete() for that same object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) Similarly, a clear advantage of both listings B and C over listing A is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) that a call to delete() is not delayed even if there are an arbitrarily
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) large number of calls to search_and_reference() searching for the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) object that delete() was invoked on.  Instead, all that is delayed is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) the eventual invocation of kfree(), which is usually not a problem on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) modern computer systems, even the small ones.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) In cases where delete() can sleep, synchronize_rcu() can be called from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) delete(), so that el_free() can be subsumed into delete as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)     4.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)     delete()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	spin_lock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 	remove_element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 	spin_unlock(&list_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 	synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 	if (atomic_dec_and_test(&el->rc))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 	    kfree(el);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) As additional examples in the kernel, the pattern in listing C is used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) reference counting of struct pid, while the pattern in listing B is used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) struct posix_acl.