^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ==========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Reducing OS jitter due to per-cpu kthreads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ==========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) This document lists per-CPU kthreads in the Linux kernel and presents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) options to control their OS jitter. Note that non-per-CPU kthreads are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) them to a "housekeeping" CPU dedicated to such work.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) ==========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) - Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) - Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) - man taskset: Using the taskset command to bind tasks to sets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) - man sched_setaffinity: Using the sched_setaffinity() system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) call to bind tasks to sets of CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) - /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) writing "0" to offline and "1" to online.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) - In order to locate kernel-generated OS jitter on CPU N:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) cd /sys/kernel/debug/tracing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) echo 1 > max_graph_depth # Increase the "1" for more detail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) echo function_graph > current_tracer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) # run workload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) cat per_cpu/cpuN/trace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) kthreads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) ehca_comp/%u
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) Periodically process Infiniband-related work.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) To reduce its OS jitter, do any of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) 1. Don't use eHCA Infiniband hardware, instead choosing hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) that does not require per-CPU kthreads. This will prevent these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) kthreads from being created in the first place. (This will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) work for most people, as this hardware, though important, is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) relatively old and is produced in relatively low unit volumes.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) 2. Do all eHCA-Infiniband-related work on other CPUs, including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) interrupts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) 3. Rework the eHCA driver so that its per-CPU kthreads are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) provisioned only on selected CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) irq/%d-%s
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) Handle threaded interrupts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) To reduce its OS jitter, do the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) 1. Use irq affinity to force the irq threads to execute on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) some other CPU.
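
A minimal sketch of the step above, assuming the standard
/proc/irq/N/smp_affinity_list interface described in irq-affinity.rst.
The function name move_irqs and the PROC_IRQ_DIR override variable are
hypothetical and exist only for illustration. Some interrupts cannot be
moved, so failed writes are ignored.

```shell
# Steer every movable interrupt (and hence its irq/%d-%s thread) onto
# the given CPU list. PROC_IRQ_DIR is a hypothetical override that lets
# the function be tried against a scratch directory.
move_irqs() {
    cpulist="$1"
    dir="${PROC_IRQ_DIR:-/proc/irq}"
    for f in "$dir"/*/smp_affinity_list; do
        [ -e "$f" ] || continue
        # Per-CPU and some other interrupts reject the write; skip them.
        { echo "$cpulist" > "$f"; } 2>/dev/null || true
    done
}
```

Run as root, "move_irqs 0-1" would request that all movable interrupts
be handled on CPUs 0 and 1, which are assumed here to be the
housekeeping CPUs.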
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) kcmtpd_ctr_%d
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) Handle Bluetooth work.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) To reduce its OS jitter, do one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) 1. Don't use Bluetooth, in which case these kthreads won't be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) created in the first place.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) 2. Use irq affinity to force Bluetooth-related interrupts to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) occur on some other CPU and furthermore initiate all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) Bluetooth activity on some other CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) ksoftirqd/%u
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) Execute softirq handlers when threaded or when under heavy load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) To reduce its OS jitter, each softirq vector must be handled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) separately as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) TIMER_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) 1. To the extent possible, keep the CPU out of the kernel when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) is non-idle, for example, by avoiding system calls and by forcing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) both kernel threads and interrupts to execute elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) 2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) the CPU offline, then bring it back online. This forces
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) recurring timers to migrate elsewhere. If you are concerned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) with multiple CPUs, force them all offline before bringing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) first one back online. Once you have onlined the CPUs in question,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) do not offline any other CPUs, because doing so could force the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) timer back onto one of the CPUs in question.
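
The offline/online cycle in step 2 can be sketched as follows, using the
/sys/devices/system/cpu/cpuN/online files listed under References. The
function name cycle_cpus and the CPU_SYSFS_DIR override variable are
hypothetical. The sketch assumes CONFIG_HOTPLUG_CPU=y and root
privileges.

```shell
# Offline every listed CPU first, then bring them all back online, so
# that recurring timers cannot migrate from one listed CPU to another.
# CPU_SYSFS_DIR is a hypothetical override for trying this out safely.
cycle_cpus() {
    dir="${CPU_SYSFS_DIR:-/sys/devices/system/cpu}"
    for n in "$@"; do
        echo 0 > "$dir/cpu$n/online" || return 1
    done
    for n in "$@"; do
        echo 1 > "$dir/cpu$n/online" || return 1
    done
}
```

For example, "cycle_cpus 2 3" takes CPUs 2 and 3 offline together
before onlining either of them. After this point, no other CPU should
be taken offline, for the reason given in step 2.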
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 1. Force networking interrupts onto other CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 2. Initiate any network I/O on other CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 3. Once your application has started, prevent CPU-hotplug operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) from being initiated from tasks that might run on the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) be de-jittered. (It is OK to force this CPU offline and then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) bring it back online before you start your application.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) BLOCK_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 1. Force block-device interrupts onto some other CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 2. Initiate any block I/O on other CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 3. Once your application has started, prevent CPU-hotplug operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) from being initiated from tasks that might run on the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) be de-jittered. (It is OK to force this CPU offline and then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) bring it back online before you start your application.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) IRQ_POLL_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 1. Force block-device interrupts onto some other CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 2. Initiate any block I/O and block-I/O polling on other CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 3. Once your application has started, prevent CPU-hotplug operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) from being initiated from tasks that might run on the CPU to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) be de-jittered. (It is OK to force this CPU offline and then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) bring it back online before you start your application.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) TASKLET_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) Do one or more of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 1. Avoid use of drivers that use tasklets. (Such drivers will contain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) calls to things like tasklet_schedule().)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 2. Convert all drivers that you must use from tasklets to workqueues.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 3. Force interrupts for drivers using tasklets onto other CPUs,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) and also do I/O involving these drivers on other CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) SCHED_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 1. Avoid sending scheduler IPIs to the CPU to be de-jittered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) For example, ensure that at most one runnable kthread is present
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) on that CPU. If a thread that expects to run on the de-jittered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) CPU awakens, the scheduler will send an IPI that can result in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) a subsequent SCHED_SOFTIRQ.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 2. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) is marked as an adaptive-ticks CPU using the "nohz_full="
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) boot parameter. This reduces the number of scheduler-clock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) interrupts that the de-jittered CPU receives, minimizing its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) chances of being selected to do the load balancing work that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) runs in SCHED_SOFTIRQ context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 3. To the extent possible, keep the CPU out of the kernel when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) is non-idle, for example, by avoiding system calls and by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) forcing both kernel threads and interrupts to execute elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) This further reduces the number of scheduler-clock interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) received by the de-jittered CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) HRTIMER_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) Do all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 1. To the extent possible, keep the CPU out of the kernel when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) is non-idle. For example, avoid system calls and force both
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) kernel threads and interrupts to execute elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) CPU offline, then bring it back online. This forces recurring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) timers to migrate elsewhere. If you are concerned with multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) CPUs, force them all offline before bringing the first one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) back online. Once you have onlined the CPUs in question, do not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) offline any other CPUs, because doing so could force the timer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) back onto one of the CPUs in question.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) RCU_SOFTIRQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) Do at least one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 1. Offload callbacks and keep the CPU in either dyntick-idle or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) adaptive-ticks state by doing all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) a. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) de-jittered is marked as an adaptive-ticks CPU using the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) "nohz_full=" boot parameter. Bind the rcuo kthreads to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) housekeeping CPUs, which can tolerate OS jitter.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) b. To the extent possible, keep the CPU out of the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) when it is non-idle, for example, by avoiding system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) calls and by forcing both kernel threads and interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) to execute elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) 2. Enable RCU to do its processing remotely via dyntick-idle by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) doing all of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) b. Ensure that the CPU goes idle frequently, allowing other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) CPUs to detect that it has passed through an RCU quiescent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) userspace execution also allows other CPUs to detect that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) the CPU in question has passed through a quiescent state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) c. To the extent possible, keep the CPU out of the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) when it is non-idle, for example, by avoiding system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) calls and by forcing both kernel threads and interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) to execute elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) kworker/%u:%d%s (cpu, id, priority)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) Execute workqueue requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) To reduce its OS jitter, do any of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 1. Run your workload at a real-time priority, which will allow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) preempting the kworker daemons.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 2. A given workqueue can be made visible in the sysfs filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) by passing the WQ_SYSFS flag to that workqueue's alloc_workqueue().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) Such a workqueue can be confined to a given subset of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) files. The set of WQ_SYSFS workqueues can be displayed using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) "ls /sys/devices/virtual/workqueue". That said, the workqueues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) maintainer would like to caution people against indiscriminately
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) sprinkling WQ_SYSFS across all the workqueues. The reason for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) caution is that it is easy to add WQ_SYSFS, but because sysfs is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) part of the formal user/kernel API, it can be nearly impossible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) to remove it, even if its addition was a mistake.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) 3. Do any of the following needed to avoid jitter that your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) application cannot tolerate:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) a. Build your kernel with CONFIG_SLUB=y rather than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) use of each CPU's workqueues to run its cache_reap()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) b. Avoid using oprofile, thus avoiding OS jitter from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) wq_sync_buffer().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) c. Limit your CPU frequency so that a CPU-frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) governor is not required, possibly enlisting the aid of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) special heatsinks or other cooling technologies. If done
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) correctly, and if your CPU architecture permits, you should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) be able to build your kernel with CONFIG_CPU_FREQ=n to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) avoid the CPU-frequency governor periodically running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) on each CPU, including cs_dbs_timer() and od_dbs_timer().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) WARNING: Please check your CPU specifications to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) make sure that this is safe on your particular system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) d. As of v3.18, Christoph Lameter's on-demand vmstat workers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) commit prevents OS jitter due to vmstat_update() on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) CONFIG_SMP=y systems. Before v3.18, it is not possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) to entirely get rid of the OS jitter, but you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) decrease its frequency by writing a large value to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) /proc/sys/vm/stat_interval. The default value is HZ,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) for an interval of one second. Of course, larger values
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) will make your virtual-memory statistics update more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) slowly. You can also run your workload at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) a real-time priority, thus preempting vmstat_update(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) but if your workload is CPU-bound, this is a bad idea.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) However, there is an RFC patch from Christoph Lameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) (based on an earlier one from Gilad Ben-Yossef) that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) reduces or even eliminates vmstat overhead for some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) workloads at https://lkml.org/lkml/2013/9/4/379.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) e. If running on high-end powerpc servers, build with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) daemon from running on each CPU every second or so.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) (This will require editing Kconfig files and will defeat
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) this platform's RAS functionality.) This avoids jitter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) due to the rtas_event_scan() function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) WARNING: Please check your CPU specifications to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) make sure that this is safe on your particular system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) f. If running on Cell Processor, build your kernel with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) CONFIG_CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) spu_gov_work().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) WARNING: Please check your CPU specifications to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) make sure that this is safe on your particular system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) g. If running on PowerMAC, build your kernel with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) avoiding OS jitter from rackmeter_do_timer().
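
Option 2 above, confining WQ_SYSFS workqueues through their cpumask
files, can be sketched as follows. The function name confine_workqueues
and the WQ_SYSFS_DIR override variable are hypothetical. The sketch
touches only workqueues that actually expose a cpumask file and reports
any that refuse the write.

```shell
# Confine every WQ_SYSFS workqueue that exposes a cpumask file to the
# CPUs in the given hexadecimal mask (for example, 3 means CPUs 0-1).
# WQ_SYSFS_DIR is a hypothetical override for trying this out safely.
confine_workqueues() {
    mask="$1"
    dir="${WQ_SYSFS_DIR:-/sys/devices/virtual/workqueue}"
    for f in "$dir"/*/cpumask; do
        [ -e "$f" ] || continue
        { echo "$mask" > "$f"; } 2>/dev/null || echo "skipped $f" >&2
    done
}
```

Run as root, "confine_workqueues 3" would restrict those workqueues to
CPUs 0 and 1, assumed here to be the housekeeping CPUs.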
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) rcuc/%u
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) To reduce its OS jitter, do at least one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) kthreads from being created in the first place, and also obviates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) the need for RCU priority boosting. This approach is feasible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) for workloads that do not require high degrees of responsiveness.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) 2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) kthreads from being created in the first place. This approach
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) is feasible only if your workload never requires RCU priority
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) boosting, for example, if you ensure frequent idle time on all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) CPUs that might execute within the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) boot parameter offloading RCU callbacks from all CPUs susceptible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) to OS jitter. This approach prevents the rcuc/%u kthreads from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) having any work to do, so that they are never awakened.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) 4. Ensure that the CPU never enters the kernel, and, in particular,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) avoid initiating any CPU hotplug operations on this CPU. This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) another way of preventing any callbacks from being queued on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) CPU, again preventing the rcuc/%u kthreads from having any work
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) to do.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) rcuop/%d and rcuos/%d
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) Offload RCU callbacks from the corresponding CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) To reduce its OS jitter, do at least one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) 1. Use affinity, cgroups, or another mechanism to force these kthreads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) to execute on some other CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) kthreads from being created in the first place. However, please
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) note that this will not eliminate OS jitter, but will instead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) shift it to RCU_SOFTIRQ.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) Name:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) watchdog/%u
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) Purpose:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) Detect software lockups on each CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) To reduce its OS jitter, do at least one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) kthreads from being created in the first place.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 2. Boot with "nosoftlockup=0", which will also prevent these kthreads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) from being created. Other related watchdog and softlockup boot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) parameters may be found in Documentation/admin-guide/kernel-parameters.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) and Documentation/watchdog/watchdog-parameters.rst.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) 3. Echo a zero to /proc/sys/kernel/watchdog to disable the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) watchdog timer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) 4. Echo a large number to /proc/sys/kernel/watchdog_thresh in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) order to reduce the frequency of OS jitter due to the watchdog
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) timer down to a level that is acceptable for your workload.
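
Options 3 and 4 can be sketched as a single helper. The function name
tune_watchdog and the KERNEL_SYSCTL_DIR override variable are
hypothetical. The two /proc files are the ones named above, and the
threshold is in seconds.

```shell
# With no argument, disable the watchdog entirely (option 3). With a
# threshold in seconds, leave it enabled but raise watchdog_thresh
# (option 4). KERNEL_SYSCTL_DIR is a hypothetical override for trying
# this out safely.
tune_watchdog() {
    dir="${KERNEL_SYSCTL_DIR:-/proc/sys/kernel}"
    if [ -n "${1:-}" ]; then
        echo "$1" > "$dir/watchdog_thresh"
    else
        echo 0 > "$dir/watchdog"
    fi
}
```

Run as root, "tune_watchdog" disables the watchdog, while
"tune_watchdog 60" keeps lockup detection but makes the watchdog timer
fire less often.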