Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Cluster-wide Power-up/power-down race avoidance algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) This file documents the algorithm which is used to coordinate CPU and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) cluster setup and teardown operations and to manage hardware coherency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) controls safely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) The section "Rationale" explains what the algorithm is for and why it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) needed.  "Basic model" explains general concepts using a simplified view
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) of the system.  The other sections explain the actual details of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) algorithm in use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) Rationale
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) In a system containing multiple CPUs, it is desirable to have the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) ability to turn off individual CPUs when the system is idle, reducing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) power consumption and thermal dissipation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) In a system containing multiple clusters of CPUs, it is also desirable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) to have the ability to turn off entire clusters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) Turning entire clusters off and on is a risky business, because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) involves performing potentially destructive operations affecting a group
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) of independently running CPUs, while the OS continues to run.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) means that we need some coordination in order to ensure that critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) cluster-level operations are only performed when it is truly safe to do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) so.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) Simple locking may not be sufficient to solve this problem, because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) mechanisms like Linux spinlocks may rely on coherency mechanisms which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) are not immediately enabled when a cluster powers up.  Since enabling or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) disabling those mechanisms may itself be a non-atomic operation (such as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) writing some hardware registers and invalidating large caches), other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) methods of coordination are required in order to guarantee safe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) power-down and power-up at the cluster level.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) The mechanism presented in this document describes a coherent memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) based protocol for performing the needed coordination.  It aims to be as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) lightweight as possible, while providing the required safety properties.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) Basic model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) Each cluster and CPU is assigned a state, as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 	- DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 	- COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 	- UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 	- GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 	    +---------> UP ----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 	    |                        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 	COMING_UP                GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 	    ^                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 	    +--------- DOWN <--------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) DOWN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 	The CPU or cluster is not coherent, and is either powered off or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 	suspended, or is ready to be powered off or suspended.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 	The CPU or cluster has committed to moving to the UP state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 	It may be part way through the process of initialisation and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 	enabling coherency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 	The CPU or cluster is active and coherent at the hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 	level.  A CPU in this state is not necessarily being used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 	actively by the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) GOING_DOWN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 	The CPU or cluster has committed to moving to the DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 	state.  It may be part way through the process of teardown and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 	coherency exit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) Each CPU has one of these states assigned to it at any point in time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) The CPU states are described in the "CPU state" section, below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) Each cluster is also assigned a state, but it is necessary to split the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) state value into two parts (the "cluster" state and "inbound" state) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) to introduce additional states in order to avoid races between different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) CPUs in the cluster simultaneously modifying the state.  The cluster-
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) level states are described in the "Cluster state" section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) To help distinguish the CPU states from cluster states in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) discussion, the state names are given a `CPU_` prefix for the CPU states,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) and a `CLUSTER_` or `INBOUND_` prefix for the cluster states.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) CPU state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) In this algorithm, each individual core in a multi-core processor is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) referred to as a "CPU".  CPUs are assumed to be single-threaded:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) therefore, a CPU can only be doing one thing at a single point in time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) This means that CPUs fit the basic model closely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) The algorithm defines the following states for each CPU in the system:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 	- CPU_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 	- CPU_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 	- CPU_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 	- CPU_GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 	 cluster setup and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 	CPU setup complete          policy decision
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 	      +-----------> CPU_UP ------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 	      |                                v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 	CPU_COMING_UP                   CPU_GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 	      ^                                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 	      +----------- CPU_DOWN <----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 	 policy decision           CPU teardown complete
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 	or hardware event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) The definitions of the four states correspond closely to the states of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) the basic model.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) Transitions between states occur as follows.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) A trigger event (spontaneous) means that the CPU can transition to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) next state as a result of making local progress only, with no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) requirement for any external event to happen.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) CPU_DOWN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 	A CPU reaches the CPU_DOWN state when it is ready for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 	power-down.  On reaching this state, the CPU will typically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 	power itself down or suspend itself, via a WFI instruction or a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	firmware call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 		CPU_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 		none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 		a) an explicit hardware power-up operation, resulting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 		   from a policy decision on another CPU;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 		b) a hardware event, such as an interrupt.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) CPU_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 	A CPU cannot start participating in hardware coherency until the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 	cluster is set up and coherent.  If the cluster is not ready,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 	then the CPU will wait in the CPU_COMING_UP state until the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 	cluster has been set up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 		CPU_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 		The CPU's parent cluster must be in CLUSTER_UP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 		Transition of the parent cluster to CLUSTER_UP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 	Refer to the "Cluster state" section for a description of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 	CLUSTER_UP state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) CPU_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 	When a CPU reaches the CPU_UP state, it is safe for the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 	start participating in local coherency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 	This is done by jumping to the kernel's CPU resume code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) 	Note that the definition of this state is slightly different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 	from the basic model definition: CPU_UP does not mean that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 	CPU is coherent yet, but it does mean that it is safe to resume
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 	the kernel.  The kernel handles the rest of the resume
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 	procedure, so the remaining steps are not visible as part of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 	race avoidance algorithm.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 	The CPU remains in this state until an explicit policy decision
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 	is made to shut down or suspend the CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 		CPU_GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 		none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 		explicit policy decision
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) CPU_GOING_DOWN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) 	While in this state, the CPU exits coherency, including any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 	operations required to achieve this (such as cleaning data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) 	caches).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) 		CPU_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) 		local CPU teardown complete
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 		(spontaneous)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) 
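The per-CPU transitions described above can be summarised as a small state function.  This is only a minimal sketch for illustration: the enum values, the event names (`EV_*`) and the function `cpu_next_state()` are hypothetical and are not taken from any kernel source.  It also ignores memory ordering and cache maintenance, which a real implementation would have to handle.

```c
#include <stdbool.h>

/* Hypothetical encoding of the four CPU states; values are illustrative. */
enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

/* Hypothetical trigger events, named after the text above. */
enum cpu_event {
	EV_POWER_UP,		/* explicit power-up, or a hardware event such as an interrupt */
	EV_CLUSTER_UP,		/* parent cluster has moved to CLUSTER_UP */
	EV_POLICY_DOWN,		/* explicit policy decision to shut down or suspend */
	EV_TEARDOWN_DONE,	/* local CPU teardown complete (spontaneous) */
};

/*
 * Return the next CPU state.  An event that is not a valid trigger for
 * the current state, or whose condition is not met, leaves the state
 * unchanged.
 */
static enum cpu_state cpu_next_state(enum cpu_state s, enum cpu_event ev,
				     bool cluster_is_up)
{
	switch (s) {
	case CPU_DOWN:
		if (ev == EV_POWER_UP)
			return CPU_COMING_UP;
		break;
	case CPU_COMING_UP:
		/* Condition: the parent cluster must be in CLUSTER_UP. */
		if (ev == EV_CLUSTER_UP && cluster_is_up)
			return CPU_UP;
		break;
	case CPU_UP:
		if (ev == EV_POLICY_DOWN)
			return CPU_GOING_DOWN;
		break;
	case CPU_GOING_DOWN:
		if (ev == EV_TEARDOWN_DONE)
			return CPU_DOWN;
		break;
	}
	return s;
}
```

Note that the sketch keeps the condition (`cluster_is_up`) separate from the trigger event, mirroring the "Conditions" and "Trigger events" entries listed for each state.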
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) Cluster state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) A cluster is a group of connected CPUs with some common resources.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) Because a cluster contains multiple CPUs, it can be doing multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) things at the same time.  This has some implications.  In particular, a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) CPU can start up while another CPU is tearing the cluster down.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) In this discussion, the "outbound side" is the view of the cluster state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) as seen by a CPU tearing the cluster down.  The "inbound side" is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) view of the cluster state as seen by a CPU setting the cluster up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) In order to enable safe coordination in such situations, it is important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) that a CPU which is setting up the cluster can advertise its state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) independently of the CPU which is tearing down the cluster.  For this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) reason, the cluster state is split into two parts:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 	"cluster" state: The global state of the cluster; or the state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 	on the outbound side:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 		- CLUSTER_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 		- CLUSTER_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) 		- CLUSTER_GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) 	"inbound" state: The state of the cluster on the inbound side.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) 		- INBOUND_NOT_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 		- INBOUND_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) 	The different pairings of these states result in six possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) 	states for the cluster as a whole::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) 	                            CLUSTER_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 	          +==========> INBOUND_NOT_COMING_UP -------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) 	          #                                               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) 	                                                          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) 	     CLUSTER_UP     <----+                                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) 	  INBOUND_COMING_UP      |                                v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) 	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) 	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) 	    CLUSTER_DOWN         |                                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) 	  INBOUND_COMING_UP <----+                                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) 	                                                          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 	          ^                                               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) 	          +===========     CLUSTER_DOWN      <------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 	                       INBOUND_NOT_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 	Transitions -----> can only be made by the outbound CPU, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) 	only involve changes to the "cluster" state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) 	Transitions ===##> can only be made by the inbound CPU, and only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) 	involve changes to the "inbound" state, except where there is no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 	further transition possible on the outbound side (i.e., the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) 	outbound CPU has put the cluster into the CLUSTER_DOWN state).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 	The race avoidance algorithm does not provide a way to determine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 	which exact CPUs within the cluster play these roles.  This must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 	be decided in advance by some other means.  Refer to the section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 	"Last man and first man selection" for more explanation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 	cluster can actually be powered down.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 	The parallelism of the inbound and outbound CPUs is observed by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 	the existence of two different paths from CLUSTER_GOING_DOWN/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) 	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 	COMING_UP in the basic model).  The second path avoids cluster
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 	teardown completely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) 	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) 	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) 	is trivial and merely resets the state machine ready for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 	next cycle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) 	Details of the allowable transitions follow.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) 	The next state in each case is notated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 		<cluster state>/<inbound state> (<transitioner>)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 	where the <transitioner> is the side on which the transition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 	can occur; either the inbound or the outbound side.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) 
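As a rough illustration of the split state, the two parts can be held in separate fields so that each side only writes its own field.  The structure `cluster_sync` and both helper functions below are hypothetical names introduced for this sketch, and the enum values are arbitrary; this is not the kernel's actual data layout.

```c
#include <stdbool.h>

/* Hypothetical encodings; values are illustrative only. */
enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };

struct cluster_sync {
	/*
	 * Written by the outbound CPU, except for the step out of
	 * CLUSTER_DOWN, which the inbound CPU performs once the
	 * outbound side has no further transition to make.
	 */
	enum cluster_state cluster;
	/* Written only by the inbound CPU. */
	enum inbound_state inbound;
};

/* The cluster may only be powered off in CLUSTER_DOWN/INBOUND_NOT_COMING_UP. */
static bool cluster_may_power_off(const struct cluster_sync *c)
{
	return c->cluster == CLUSTER_DOWN &&
	       c->inbound == INBOUND_NOT_COMING_UP;
}

/*
 * Other CPUs may enter coherency whenever the "cluster" part is
 * CLUSTER_UP, whatever the "inbound" part says.
 */
static bool cluster_allows_coherency(const struct cluster_sync *c)
{
	return c->cluster == CLUSTER_UP;
}
```

Keeping the two parts in separate fields means the inbound and outbound CPUs never need to perform a read-modify-write on a shared value, which is the property the split is meant to provide.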
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 		CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) 		none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 		a) an explicit hardware power-up operation, resulting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) 		   from a policy decision on another CPU;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 		b) a hardware event, such as an interrupt.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) CLUSTER_DOWN/INBOUND_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) 	In this state, an inbound CPU sets up the cluster, including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) 	enabling of hardware coherency at the cluster level and any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) 	other operations (such as cache invalidation) which are required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) 	in order to achieve this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) 	The purpose of this state is to do sufficient cluster-level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) 	setup to enable other CPUs in the cluster to enter coherency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 	safely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) 		CLUSTER_UP/INBOUND_COMING_UP (inbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) 		cluster-level setup and hardware coherency complete
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 		(spontaneous)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) CLUSTER_UP/INBOUND_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) 	Cluster-level setup is complete and hardware coherency is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 	enabled for the cluster.  Other CPUs in the cluster can safely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) 	enter coherency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) 	This is a transient state, leading immediately to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) 	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) 	should treat these two states as equivalent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) 		CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) 		none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) 		(spontaneous)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) CLUSTER_UP/INBOUND_NOT_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) 	Cluster-level setup is complete and hardware coherency is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 	enabled for the cluster.  Other CPUs in the cluster can safely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) 	enter coherency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) 	The cluster will remain in this state until a policy decision is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) 	made to power the cluster down.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) 	Next state:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) 		CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 	Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) 		none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 	Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 		policy decision to power down the cluster
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) 	An outbound CPU is tearing the cluster down.  The selected CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 	must wait in this state until all CPUs in the cluster are in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) 	CPU_DOWN state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 	When all CPUs are in the CPU_DOWN state, the cluster can be torn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) 	down, for example by cleaning data caches and exiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) 	cluster-level coherency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) 	To avoid wasteful unnecessary teardown operations, the outbound
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 	CPU should check the inbound cluster state for asynchronous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) 	transitions to INBOUND_COMING_UP.  Alternatively, individual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) 	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 	Next states:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) 	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 		Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) 			cluster torn down and ready to power off
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) 		Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) 			(spontaneous)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) 	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) 		Conditions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 			none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) 		Trigger events:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) 			a) an explicit hardware power-up operation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) 			   resulting from a policy decision on another
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) 			   CPU;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) 			b) a hardware event, such as an interrupt.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) 
CLUSTER_GOING_DOWN/INBOUND_COMING_UP:

	The cluster is (or was) being torn down, but another CPU has
	come online in the meantime and is trying to set up the cluster
	again.

	If the outbound CPU observes this state, it has two choices:

		a) back out of teardown, restoring the cluster to the
		   CLUSTER_UP state;

		b) finish tearing the cluster down and put the cluster
		   in the CLUSTER_DOWN state; the inbound CPU will
		   set up the cluster again from there.

	Choice (a) permits the removal of some latency by avoiding
	unnecessary teardown and setup operations in situations where
	the cluster is not really going to be powered down.


	Next states:

	CLUSTER_UP/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster-level setup and hardware
			coherency complete

		Trigger events:
			(spontaneous)

	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster torn down and ready to power off

		Trigger events:
			(spontaneous)

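As a rough illustration of the two outbound choices listed for the
CLUSTER_GOING_DOWN/INBOUND_COMING_UP state, the following C sketch models
the cluster state as a pair of values.  The enumerator names follow the
state names used in this document, but their numeric values, struct
cluster_state and outbound_finish() are hypothetical and exist only for
this example; this is not how the kernel represents the state.

```c
#include <assert.h>
#include <stdbool.h>

/* State names follow this document; the values are arbitrary here. */
enum cluster { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };

/* Hypothetical container for the two halves of the cluster state. */
struct cluster_state {
	enum cluster cluster;	/* only changed by the outbound side */
	enum inbound inbound;	/* only changed by the inbound side */
};

/*
 * Hypothetical helper: the outbound CPU leaves CLUSTER_GOING_DOWN.
 *
 * If an inbound CPU has appeared and the caller allows it, take
 * choice (a) and go back to CLUSTER_UP.  Otherwise take choice (b)
 * and finish the teardown.  Returns true if teardown was aborted.
 */
static bool outbound_finish(struct cluster_state *s, bool allow_abort)
{
	/* Only meaningful while the outbound side is tearing down. */
	assert(s->cluster == CLUSTER_GOING_DOWN);

	if (s->inbound == INBOUND_COMING_UP && allow_abort) {
		s->cluster = CLUSTER_UP;	/* choice (a) */
		return true;
	}

	s->cluster = CLUSTER_DOWN;		/* choice (b) */
	return false;
}
```

In both cases the inbound half of the state is left untouched, which
matches the two next states listed above: only the outbound half changes.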

Last man and First man selection
--------------------------------

The CPU which performs cluster teardown operations on the outbound side
is commonly referred to as the "last man".

The CPU which performs cluster setup on the inbound side is commonly
referred to as the "first man".

The race avoidance algorithm documented above does not provide a
mechanism to choose which CPUs should play these roles.


Last man:

When shutting down the cluster, all the CPUs involved are initially
executing Linux and hence coherent.  Therefore, ordinary spinlocks can
be used to select a last man safely, before the CPUs become
non-coherent.

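One possible shape for lock-based last man selection is sketched below:
each departing CPU decrements a shared count under a lock, and the CPU
that takes the count to zero is the last man.  This is a userspace
illustration only.  The C11 atomic_flag spinlock stands in for a kernel
spinlock, and NR_CPUS, cpus_running and cpu_leave_is_last_man() are
hypothetical names that do not appear in the kernel sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical number of CPUs in the cluster */

/* Userspace stand-in for a kernel spinlock. */
static atomic_flag cluster_lock = ATOMIC_FLAG_INIT;

/* CPUs in the cluster that have not yet started going down. */
static int cpus_running = NR_CPUS;

static void cluster_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&cluster_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void cluster_lock_release(void)
{
	atomic_flag_clear_explicit(&cluster_lock, memory_order_release);
}

/*
 * Called by a CPU that is about to go down, while it is still
 * coherent.  Returns true if the caller is the last man.
 */
static bool cpu_leave_is_last_man(void)
{
	bool last;

	cluster_lock_acquire();
	assert(cpus_running > 0);
	last = (--cpus_running == 0);
	cluster_lock_release();

	return last;
}
```

The important property is that the decision is made while every CPU
involved is still coherent, so the lock behaves as usual.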

First man:

Because CPUs may power up asynchronously in response to external wake-up
events, a dynamic mechanism is needed to make sure that only one CPU
attempts to play the first man role and do the cluster-level
initialisation: any other CPUs must wait for this to complete before
proceeding.

Cluster-level initialisation may involve actions such as configuring
coherency controls in the bus fabric.

The current implementation in mcpm_head.S uses a separate mutual exclusion
mechanism to do this arbitration.  This mechanism is documented in
detail in vlocks.rst.

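To give a feel for how first man arbitration can work without relying on
ordinary locks, the sketch below shows a simplified voting scheme loosely
modelled on the vlocks idea.  It is an illustration under stated
assumptions, not the implementation in mcpm_head.S.  The names NR_CPUS,
NO_VOTE, currently_voting, last_vote and first_man_trylock() are chosen
for this example, and the memory barriers and cache maintenance that real
non-coherent CPUs would need are deliberately omitted.  The vlocks
document remains the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS	4	/* hypothetical number of CPUs in the cluster */
#define NO_VOTE	(-1)	/* nobody has volunteered yet */

static volatile int currently_voting[NR_CPUS];
static volatile int last_vote = NO_VOTE;

/*
 * Returns true if 'this_cpu' wins the election and must perform the
 * cluster-level setup; false if another CPU has taken the role.
 */
static bool first_man_trylock(int this_cpu)
{
	int i;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	currently_voting[this_cpu] = 1;	/* announce intent to vote */

	if (last_vote != NO_VOTE) {
		/* Somebody has already volunteered: back off. */
		currently_voting[this_cpu] = 0;
		return false;
	}

	last_vote = this_cpu;		/* volunteer ourselves */
	currently_voting[this_cpu] = 0;

	/* Wait until every CPU that started voting has finished. */
	for (i = 0; i < NR_CPUS; i++)
		while (currently_voting[i])
			;

	/* Several CPUs may have voted; the last vote written wins. */
	return last_vote == this_cpu;
}
```

Called sequentially, the first caller wins and every later caller sees
that a vote has already been cast and returns false.  CPUs that lose
must still wait for the winner to finish the cluster-level setup before
proceeding, which this sketch does not show.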

Features and Limitations
------------------------

Implementation:

	The current ARM-based implementation is split between
	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
	arch/arm/common/mcpm_entry.c (everything else):

	__mcpm_cpu_going_down() signals the transition of a CPU to the
	CPU_GOING_DOWN state.

	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
	state.

	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
	low-level power-up code in mcpm_head.S.  This could
	involve CPU-specific setup code, but in the current
	implementation it does not.

	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
	handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
	and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
	the case of an aborted cluster power-down).

	These functions are more complex than the __mcpm_cpu_*()
	functions due to the extra inter-CPU coordination which
	is needed for safe transitions at the cluster level.

	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
	the low-level power-up code in mcpm_head.S.  This
	typically involves platform-specific setup code,
	provided by the platform-specific power_up_setup
	function registered via mcpm_sync_init().

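The order in which an outbound CPU might call the functions named above
can be sketched as follows.  This is a userspace mock, not kernel code:
the four __mcpm_*() functions are replaced by stubs that only record
that they were called, their signatures are assumed to match the
declarations in the kernel's mcpm.h header, and sketch_power_down(),
call_log, log_call() and inbound_coming_up are hypothetical names
introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { CLUSTER_DOWN, CLUSTER_UP };	/* arbitrary values for the sketch */

static char call_log[64];		/* records the order of the calls */
static bool inbound_coming_up;		/* knob: simulate an inbound CPU */

static void log_call(char c)
{
	size_t n = strlen(call_log);

	if (n + 1 < sizeof(call_log)) {
		call_log[n] = c;
		call_log[n + 1] = '\0';
	}
}

/* Stand-in stubs: they record the call and do nothing else. */
static void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	log_call('G');
}

static void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	log_call('D');
}

static bool __mcpm_outbound_enter_critical(unsigned int cpu,
					   unsigned int cluster)
{
	(void)cpu; (void)cluster;
	log_call('E');
	return !inbound_coming_up;	/* false: teardown must be aborted */
}

static void __mcpm_outbound_leave_critical(unsigned int cluster, int state)
{
	(void)cluster;
	log_call(state == CLUSTER_DOWN ? 'L' : 'l');
}

/* Hypothetical outbound path for one CPU. */
static void sketch_power_down(unsigned int cpu, unsigned int cluster,
			      bool last_man)
{
	__mcpm_cpu_going_down(cpu, cluster);

	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
		/* cluster-level teardown would happen here */
		__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
	} else {
		/* CPU-level teardown only */
	}

	__mcpm_cpu_down(cpu, cluster);
}
```

With these stubs, a CPU that is not the last man records "GD", a last
man that completes the teardown records "GELD", and a last man whose
teardown is refused because an inbound CPU has appeared records "GED".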
Deep topologies:

	As currently described and implemented, the algorithm does not
	support CPU topologies involving more than two levels (i.e.,
	clusters of clusters are not supported).  The algorithm could be
	extended by replicating the cluster-level states for the
	additional topological levels, and modifying the transition
	rules for the intermediate (non-outermost) cluster levels.


Colophon
--------

Originally created and documented by Dave Martin for Linaro Limited, in
collaboration with Nicolas Pitre and Achin Gupta.

Copyright (C) 2012-2013  Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.