Orange Pi5 kernel

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) vlocks for Bare-Metal Mutual Exclusion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) mechanism, with reasonable but minimal requirements on the memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) These are intended to be used to coordinate critical activity among CPUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) which are otherwise non-coherent, in situations where the hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) provides no other mechanism to support this and ordinary spinlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) cannot be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) vlocks make use of the atomicity provided by the memory system for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) writes to a single memory location.  To arbitrate, every CPU "votes for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) itself", by storing a unique number to a common memory location.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) final value seen in that memory location when all the votes have been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) cast identifies the winner.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) In order to make sure that the election produces an unambiguous result
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) in finite time, a CPU will only enter the election in the first place if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) no winner has been chosen and the election does not appear to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) started yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) Algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) The easiest way to explain the vlocks algorithm is with some pseudo-code::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 	int currently_voting[NR_CPUS] = { 0, };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 	int last_vote = -1; /* no votes yet */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 	bool vlock_trylock(int this_cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 		/* signal our desire to vote */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 		currently_voting[this_cpu] = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 		if (last_vote != -1) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 			/* someone already volunteered himself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 			currently_voting[this_cpu] = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 			return false; /* not ourself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 		/* let's suggest ourself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 		last_vote = this_cpu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 		currently_voting[this_cpu] = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 		/* then wait until everyone else is done voting */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 		for_each_cpu(i) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 			while (currently_voting[i] != 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 				/* wait */;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 		/* result */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 		if (last_vote == this_cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 			return true; /* we won */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 	bool vlock_unlock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 		last_vote = -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) The currently_voting[] array provides a way for the CPUs to determine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) whether an election is in progress, and plays a role analogous to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) "entering" array in Lamport's bakery algorithm [1].
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) However, once the election has started, the underlying memory system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) atomicity is used to pick the winner.  This avoids the need for a static
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) priority rule to act as a tie-breaker, or any counters which could
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) overflow.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) As long as the last_vote variable is globally visible to all CPUs, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) will contain only one value that won't change once every CPU has cleared
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) its currently_voting flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) Features and limitations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85)  * vlocks are not intended to be fair.  In the contended case, it is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)    _last_ CPU which attempts to get the lock which will be most likely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)    to win.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89)    vlocks are therefore best suited to situations where it is necessary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)    to pick a unique winner, but it does not matter which CPU actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91)    wins.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)  * Like other similar mechanisms, vlocks will not scale well to a large
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)    number of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)    vlocks can be cascaded in a voting hierarchy to permit better scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)    if necessary, as in the following hypothetical example for 4096 CPUs::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 	/* first level: local election */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 	my_town = towns[(this_cpu >> 4) & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 	I_won = vlock_trylock(my_town, this_cpu & 0xf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 		/* we won the town election, let's go for the state */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 		my_state = states[(this_cpu >> 8) & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 		I_won = vlock_lock(my_state, this_cpu & 0xf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 		if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 			/* and so on */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 			I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 			if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 				/* ... */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 			vlock_unlock(the_whole_country);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 		vlock_unlock(my_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 	vlock_unlock(my_town);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) ARM implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) The current ARM implementation [2] contains some optimisations beyond
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) the basic algorithm:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)  * By packing the members of the currently_voting array close together,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)    we can read the whole array in one transaction (providing the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)    of CPUs potentially contending the lock is small enough).  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)    reduces the number of round-trips required to external memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)    In the ARM implementation, this means that we can use a single load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)    and comparison::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 	LDR	Rt, [Rn]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 	CMP	Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)    ...in place of code equivalent to::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 	LDRB	Rt, [Rn]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 	CMP	Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 	LDRBEQ	Rt, [Rn, #1]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 	CMPEQ	Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 	LDRBEQ	Rt, [Rn, #2]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 	CMPEQ	Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 	LDRBEQ	Rt, [Rn, #3]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	CMPEQ	Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)    This cuts down on the fast-path latency, as well as potentially
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)    reducing bus contention in contended cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)    The optimisation relies on the fact that the ARM memory system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)    guarantees coherency between overlapping memory accesses of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)    different sizes, similarly to many other architectures.  Note that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)    we do not care which element of currently_voting appears in which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)    bits of Rt, so there is no need to worry about endianness in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)    optimisation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)    If there are too many CPUs to read the currently_voting array in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)    one transaction then multiple transations are still required.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)    implementation uses a simple loop of word-sized loads for this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)    case.  The number of transactions is still fewer than would be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)    required if bytes were loaded individually.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)    In principle, we could aggregate further by using LDRD or LDM, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)    to keep the code simple this was not attempted in the initial
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)    implementation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)  * vlocks are currently only used to coordinate between CPUs which are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)    unable to enable their caches yet.  This means that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)    implementation removes many of the barriers which would be required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)    when executing the algorithm in cached memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)    packing of the currently_voting array does not work with cached
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)    memory unless all CPUs contending the lock are cache-coherent, due
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)    to cache writebacks from one CPU clobbering values written by other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)    CPUs.  (Though if all the CPUs are cache-coherent, you should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)    probably be using proper spinlocks instead anyway).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)  * The "no votes yet" value used for the last_vote variable is 0 (not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)    -1 as in the pseudocode).  This allows statically-allocated vlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)    to be implicitly initialised to an unlocked state simply by putting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)    them in .bss.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)    An offset is added to each CPU's ID for the purpose of setting this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)    variable, so that no CPU uses the value 0 for its ID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) Colophon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) Originally created and documented by Dave Martin for Linaro Limited, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) use in ARM-based big.LITTLE platforms, with review and input gratefully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) grabbing most of this text out of the relevant mail thread and writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) up the pseudocode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) Copyright (C) 2012-2013  Linaro Limited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) Distributed under the terms of Version 2 of the GNU General Public
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) License, as defined in linux/COPYING.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) [1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)     Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)     https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) [2] linux/arch/arm/common/vlock.S, www.kernel.org.