^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ====================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Interaction of Suspend code (S3) with the CPU hotplug infrastructure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ====================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) (C) 2011 - 2014 Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) I. Differences between CPU hotplug and Suspend-to-RAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) ======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) How does the regular CPU hotplug code differ from how the Suspend-to-RAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) infrastructure uses it internally? And where do they share common code?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) Well, a picture is worth a thousand words... So ASCII art follows :-)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) [This depicts the current design in the kernel, and focusses only on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) interactions involving the freezer and CPU hotplug and also tries to explain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) the locking involved. It outlines the notifications involved as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) But please note that here, only the call paths are illustrated, with the aim
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) of describing where they take different paths and where they share code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) What happens when regular CPU hotplug and Suspend-to-RAM race with each other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) is not depicted here.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) On a high level, the suspend-resume cycle goes like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) |Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) |tasks | | cpus | | | | cpus | |tasks|
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) More details follow::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) Suspend call path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) Write 'mem' to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) /sys/power/state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) sysfs file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) Acquire system_transition_mutex lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) Send PM_SUSPEND_PREPARE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) notifications
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) Freeze tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) freeze_secondary_cpus()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) /* start */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) Acquire cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) Iterate over CURRENTLY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) online CPUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) | ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) v | L
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) ======> _cpu_down() |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) | [This takes cpuhotplug.lock |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) Common | before taking down the CPU |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) code | and releases it when done] | O
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) | While it is at it, notifications |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) | are sent when notable events occur, |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) ======> by running all registered callbacks. |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) | | O
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) v |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) Note down these cpus in | P
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) frozen_cpus mask ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) Disable regular cpu hotplug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) by increasing cpu_hotplug_disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) Release cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) /* freeze_secondary_cpus() complete */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) Do suspend
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) Resume mirrors this sequence, with the counterparts being (in the order of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) execution during resume):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) * thaw_secondary_cpus() which involves::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) | Acquire cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) | Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) | Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) | Release cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) * thaw tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) * send PM_POST_SUSPEND notifications
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) * Release system_transition_mutex lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) It is to be noted here that the system_transition_mutex lock is acquired at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) very beginning, when we are just starting out to suspend, and then released only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) after the entire cycle is complete (i.e., suspend + resume).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) Regular CPU hotplug call path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) Write 0 (or 1) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) /sys/devices/system/cpu/cpu*/online
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) sysfs file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) cpu_down()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) Acquire cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) If cpu_hotplug_disabled > 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) return gracefully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) ======> _cpu_down()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) | [This takes cpuhotplug.lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Common | before taking down the CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) code | and releases it when done]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) | While it is at it, notifications
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) | are sent when notable events occur,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) ======> by running all registered callbacks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) Release cpu_add_remove_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) [That's it, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) regular CPU hotplug]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) So, as can be seen from the two diagrams (the parts marked as "Common code"),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) regular CPU hotplug and the suspend code path converge at the _cpu_down() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) _cpu_up() functions. They differ in the arguments passed to these functions,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) in that during regular CPU hotplug, 0 is passed for the 'tasks_frozen'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) argument. But during suspend, since the tasks are already frozen by the time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) the non-boot CPUs are offlined or onlined, the _cpu_*() functions are called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) with the 'tasks_frozen' argument set to 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) [See below for some known issues regarding this.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)
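The two call paths above can be sketched as a toy user-space model. This is
not kernel code: the class layout, the `log` list and the use of -EBUSY as the
"return gracefully" value are assumptions made for illustration. Only the
lock, counter, mask and function names are taken from the diagrams.

```python
import threading

EBUSY = 16  # illustrative value for a refused hotplug request (assumption)


class HotplugModel:
    """Toy model of the suspend and regular CPU hotplug call paths."""

    def __init__(self, ncpus):
        self.cpu_add_remove_lock = threading.Lock()
        self.cpu_hotplug_disabled = 0
        self.online = set(range(ncpus))
        self.frozen_cpus = set()
        self.log = []  # (function, cpu, tasks_frozen) tuples

    # Common code: both paths converge here.
    def _cpu_down(self, cpu, tasks_frozen):
        self.online.discard(cpu)
        self.log.append(("_cpu_down", cpu, tasks_frozen))

    def _cpu_up(self, cpu, tasks_frozen):
        self.online.add(cpu)
        self.log.append(("_cpu_up", cpu, tasks_frozen))

    # Regular CPU hotplug path: tasks_frozen is always 0.
    def cpu_down(self, cpu):
        with self.cpu_add_remove_lock:
            if self.cpu_hotplug_disabled > 0:
                return -EBUSY
            self._cpu_down(cpu, tasks_frozen=0)
            return 0

    # Suspend path: tasks are already frozen, so tasks_frozen is 1.
    def freeze_secondary_cpus(self, primary=0):
        with self.cpu_add_remove_lock:
            for cpu in sorted(self.online):  # only CURRENTLY online CPUs
                if cpu == primary:
                    continue
                self._cpu_down(cpu, tasks_frozen=1)
                self.frozen_cpus.add(cpu)
            self.cpu_hotplug_disabled += 1

    def thaw_secondary_cpus(self):
        with self.cpu_add_remove_lock:
            self.cpu_hotplug_disabled -= 1
            for cpu in sorted(self.frozen_cpus):
                self._cpu_up(cpu, tasks_frozen=1)
            self.frozen_cpus.clear()
```

In this model, a CPU that was taken offline through the regular path before
suspend is not recorded in frozen_cpus, so thaw_secondary_cpus() leaves it
offline. A cpu_down() issued between freeze_secondary_cpus() and
thaw_secondary_cpus() is refused because cpu_hotplug_disabled is non-zero.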
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) Important files and functions/entry points:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) -------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) - kernel/power/process.c : freeze_processes(), thaw_processes()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) - kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) - kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) [freeze|thaw]_secondary_cpus()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) II. What are the issues involved in CPU hotplug?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) ================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) There are some interesting situations involving CPU hotplug and microcode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) update on the CPUs, as discussed below:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) [Please bear in mind that the kernel requests the microcode images from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) userspace, using the request_firmware() function defined in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) drivers/base/firmware_loader/main.c]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) a. When all the CPUs are identical:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) This is the most common situation and it is quite straightforward: we want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) to apply the same microcode revision to each of the CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) To give an example of x86, the collect_cpu_info() function defined in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) arch/x86/kernel/microcode_core.c helps in discovering the type of the CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) and thereby in applying the correct microcode revision to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) But note that the kernel does not maintain a common microcode image for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) all the CPUs, in order to handle case 'b' described below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) b. When some of the CPUs are different from the rest:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) In this case since we probably need to apply different microcode revisions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) to different CPUs, the kernel maintains a copy of the correct microcode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) image for each CPU (after appropriate CPU type/model discovery using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) functions such as collect_cpu_info()).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) c. When a CPU is physically hot-unplugged and a new (and possibly different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) type of) CPU is hot-plugged into the system:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) In the current design of the kernel, whenever a CPU is taken offline during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) a regular CPU hotplug operation, upon receiving the CPU_DEAD notification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) (which is sent by the CPU hotplug code), the microcode update driver's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) callback for that event reacts by freeing the kernel's copy of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) microcode image for that CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) Hence, when a new CPU is brought online, since the kernel finds that it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) doesn't have the microcode image, it does the CPU type/model discovery
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) afresh and then asks userspace for the appropriate microcode image
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) for that CPU, which is subsequently applied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) For example, in x86, the mc_cpu_callback() function (which is the microcode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) update driver's callback registered for CPU hotplug events) calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) microcode_update_cpu() which would call microcode_init_cpu() in this case,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) instead of microcode_resume_cpu() when it finds that the kernel doesn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) have a valid microcode image. This ensures that the CPU type/model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) discovery is performed and the right microcode is applied to the CPU after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) getting it from userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) d. Handling microcode update during suspend/hibernate:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) Strictly speaking, during a CPU hotplug operation which does not involve
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) physically removing or inserting CPUs, the CPUs are not actually powered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) off during a CPU offline. They are just put to the lowest C-states possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) Hence, in such a case, it is not really necessary to re-apply microcode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) when the CPUs are brought back online, since they wouldn't have lost the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) image during the CPU offline operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) This is the usual scenario encountered during a resume after a suspend.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) However, in the case of hibernation, since all the CPUs are completely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) powered off, during restore it becomes necessary to apply the microcode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) images to all the CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) [Note that we don't expect someone to physically pull out nodes and insert
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) nodes with a different type of CPUs in-between a suspend-resume or a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) hibernate/restore cycle.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) In the current design of the kernel, however, during a CPU offline operation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) as part of the suspend/hibernate cycle (cpuhp_tasks_frozen is set),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) the existing copy of microcode image in the kernel is not freed up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) And during the CPU online operations (during resume/restore), since the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) kernel finds that it already has copies of the microcode images for all the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) CPUs, it just applies them to the CPUs, avoiding any re-discovery of CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) type/model and the need for validating whether the microcode revisions are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) right for the CPUs or not (due to the above assumption that physical CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) hotplug will not be done in-between suspend/resume or hibernate/restore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) cycles).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
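The handling of the per-CPU microcode image in cases 'c' and 'd' can be
sketched as a small user-space model. The class and method names below are
hypothetical, and `request_image` is only a stand-in for the request to
userspace. The "init" and "resume" results loosely correspond to the
microcode_init_cpu() and microcode_resume_cpu() paths named above.

```python
class MicrocodeCache:
    """Toy model of the kernel's per-CPU copies of the microcode image."""

    def __init__(self, request_image):
        # request_image(cpu_type) stands in for asking userspace
        # for the image that matches the discovered CPU type.
        self.request_image = request_image
        self.images = {}  # cpu number -> cached image

    def on_cpu_offline(self, cpu, tasks_frozen):
        # Regular hotplug: drop the copy, the CPU might get replaced.
        # Suspend/hibernate (tasks_frozen set): keep the copy.
        if not tasks_frozen:
            self.images.pop(cpu, None)

    def on_cpu_online(self, cpu, cpu_type):
        if cpu in self.images:
            # A cached copy exists: apply it without re-discovery.
            return ("resume", self.images[cpu])
        # No copy: discover the CPU type afresh and fetch the image.
        self.images[cpu] = self.request_image(cpu_type)
        return ("init", self.images[cpu])
```

Note how the "resume" branch never looks at cpu_type. That is exactly the
assumption stated above: nobody swaps in a different type of CPU in-between
a suspend/resume or hibernate/restore cycle.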
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) III. Known problems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) Are there any known problems when regular CPU hotplug and suspend race
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) with each other?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) Yes, they are listed below:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) 1. When invoking regular CPU hotplug, the 'tasks_frozen' argument passed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) the _cpu_down() and _cpu_up() functions is *always* 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) This might not reflect the true current state of the system, since the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) tasks could have been frozen by an out-of-band event such as a suspend
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) operation in progress. Hence, the cpuhp_tasks_frozen variable will not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) reflect the frozen state and the CPU hotplug callbacks which evaluate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) that variable might execute the wrong code path.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 2. If a regular CPU hotplug stress test happens to race with the freezer due
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) to a suspend operation in progress at the same time, then we could hit the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) situation described below:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) * A regular cpu online operation continues its journey from userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) into the kernel, since the freezing has not yet begun.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) * Then the freezer gets to work and freezes userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) * If cpu online has not yet completed the microcode update stuff by now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) it will now start waiting on the frozen userspace in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) TASK_UNINTERRUPTIBLE state, in order to get the microcode image.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) * Now the freezer continues and tries to freeze the remaining tasks. But
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) due to this wait mentioned above, the freezer won't be able to freeze
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) the cpu online hotplug task and hence freezing of tasks fails.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) As a result of this task freezing failure, the suspend operation gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) aborted.
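
The failure sequence in problem 2 can be replayed with a very small model.
Everything here is an assumption made for illustration: the task names, the
dictionary layout, and the rule that a task blocked on an already-frozen task
cannot be frozen (standing in for the TASK_UNINTERRUPTIBLE wait).

```python
def freeze_tasks(tasks, order):
    """Try to freeze tasks in the given order; return those that refused."""
    failed = []
    for name in order:
        waits_on = tasks[name]["waits_on"]
        if waits_on is not None and tasks[waits_on]["frozen"]:
            # Blocked on a frozen task: modelled as an uninterruptible
            # wait, which the freezer cannot interrupt.
            failed.append(name)
        else:
            tasks[name]["frozen"] = True
    return failed


def suspend(tasks, order):
    """Freeze all tasks, or thaw them again and abort if any refused."""
    failed = freeze_tasks(tasks, order)
    if failed:
        for task in tasks.values():
            task["frozen"] = False
        return ("aborted", failed)
    return ("suspended", failed)
```

With a userspace helper frozen first and a CPU online task waiting on that
helper for the microcode image, suspend() reports the abort and names the
task that could not be frozen.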