^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) vlocks for Bare-Metal Mutual Exclusion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) mechanism, with reasonable but minimal requirements on the memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) These are intended to be used to coordinate critical activity among CPUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) which are otherwise non-coherent, in situations where the hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) provides no other mechanism to support this and ordinary spinlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) cannot be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) vlocks make use of the atomicity provided by the memory system for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) writes to a single memory location. To arbitrate, every CPU "votes for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) itself", by storing a unique number to a common memory location. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) final value seen in that memory location when all the votes have been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) cast identifies the winner.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) In order to make sure that the election produces an unambiguous result
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) in finite time, a CPU will only enter the election in the first place if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) no winner has been chosen and the election does not appear to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) started yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) Algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) The easiest way to explain the vlocks algorithm is with some pseudo-code::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) int currently_voting[NR_CPUS] = { 0, };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) int last_vote = -1; /* no votes yet */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) bool vlock_trylock(int this_cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) /* signal our desire to vote */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) currently_voting[this_cpu] = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) if (last_vote != -1) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) /* someone already volunteered himself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) currently_voting[this_cpu] = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) return false; /* not ourself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) /* let's suggest ourself */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) last_vote = this_cpu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) currently_voting[this_cpu] = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) /* then wait until everyone else is done voting */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) for_each_cpu(i) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) while (currently_voting[i] != 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) /* wait */;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) /* result */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) if (last_vote == this_cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) return true; /* we won */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) bool vlock_unlock(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) last_vote = -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) The currently_voting[] array provides a way for the CPUs to determine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) whether an election is in progress, and plays a role analogous to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) "entering" array in Lamport's bakery algorithm [1].
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) However, once the election has started, the underlying memory system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) atomicity is used to pick the winner. This avoids the need for a static
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) priority rule to act as a tie-breaker, or any counters which could
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) overflow.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) As long as the last_vote variable is globally visible to all CPUs, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) will contain only one value that won't change once every CPU has cleared
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) its currently_voting flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) Features and limitations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) * vlocks are not intended to be fair. In the contended case, it is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) _last_ CPU which attempts to get the lock which will be most likely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) to win.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) vlocks are therefore best suited to situations where it is necessary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) to pick a unique winner, but it does not matter which CPU actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) wins.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) * Like other similar mechanisms, vlocks will not scale well to a large
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) number of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) vlocks can be cascaded in a voting hierarchy to permit better scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) if necessary, as in the following hypothetical example for 4096 CPUs::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) /* first level: local election */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) my_town = towns[(this_cpu >> 4) & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) I_won = vlock_trylock(my_town, this_cpu & 0xf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) /* we won the town election, let's go for the state */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) my_state = states[(this_cpu >> 8) & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) I_won = vlock_lock(my_state, this_cpu & 0xf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) /* and so on */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) if (I_won) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) /* ... */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) vlock_unlock(the_whole_country);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) vlock_unlock(my_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) vlock_unlock(my_town);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) ARM implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) The current ARM implementation [2] contains some optimisations beyond
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) the basic algorithm:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) * By packing the members of the currently_voting array close together,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) we can read the whole array in one transaction (providing the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) of CPUs potentially contending the lock is small enough). This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) reduces the number of round-trips required to external memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) In the ARM implementation, this means that we can use a single load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) and comparison::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) LDR Rt, [Rn]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) CMP Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) ...in place of code equivalent to::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) LDRB Rt, [Rn]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) CMP Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) LDRBEQ Rt, [Rn, #1]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) CMPEQ Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) LDRBEQ Rt, [Rn, #2]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) CMPEQ Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) LDRBEQ Rt, [Rn, #3]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) CMPEQ Rt, #0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) This cuts down on the fast-path latency, as well as potentially
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) reducing bus contention in contended cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) The optimisation relies on the fact that the ARM memory system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) guarantees coherency between overlapping memory accesses of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) different sizes, similarly to many other architectures. Note that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) we do not care which element of currently_voting appears in which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) bits of Rt, so there is no need to worry about endianness in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) optimisation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) If there are too many CPUs to read the currently_voting array in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) one transaction then multiple transations are still required. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) implementation uses a simple loop of word-sized loads for this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) case. The number of transactions is still fewer than would be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) required if bytes were loaded individually.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) In principle, we could aggregate further by using LDRD or LDM, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) to keep the code simple this was not attempted in the initial
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) implementation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) * vlocks are currently only used to coordinate between CPUs which are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) unable to enable their caches yet. This means that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) implementation removes many of the barriers which would be required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) when executing the algorithm in cached memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) packing of the currently_voting array does not work with cached
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) memory unless all CPUs contending the lock are cache-coherent, due
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) to cache writebacks from one CPU clobbering values written by other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) CPUs. (Though if all the CPUs are cache-coherent, you should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) probably be using proper spinlocks instead anyway).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) * The "no votes yet" value used for the last_vote variable is 0 (not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) -1 as in the pseudocode). This allows statically-allocated vlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) to be implicitly initialised to an unlocked state simply by putting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) them in .bss.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) An offset is added to each CPU's ID for the purpose of setting this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) variable, so that no CPU uses the value 0 for its ID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) Colophon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) Originally created and documented by Dave Martin for Linaro Limited, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) use in ARM-based big.LITTLE platforms, with review and input gratefully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) grabbing most of this text out of the relevant mail thread and writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) up the pseudocode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) Copyright (C) 2012-2013 Linaro Limited
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) Distributed under the terms of Version 2 of the GNU General Public
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) License, as defined in linux/COPYING.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) [1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) [2] linux/arch/arm/common/vlock.S, www.kernel.org.