.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

.. |intel_pstate| replace:: :doc:`intel_pstate <intel_pstate>`

=======================
CPU Performance Scaling
=======================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


The Concept of CPU Performance Scaling
======================================

The majority of modern processors are capable of operating in a number of
different clock frequency and voltage configurations, often referred to as
Operating Performance Points or P-states (in ACPI terminology). As a rule,
the higher the clock frequency and the higher the voltage, the more instructions
can be retired by the CPU over a unit of time, but also the higher the clock
frequency and the higher the voltage, the more energy is consumed over a unit of
time (or the more power is drawn) by the CPU in the given P-state. Therefore
there is a natural tradeoff between the CPU capacity (the number of instructions
that can be executed over a unit of time) and the power drawn by the CPU.
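As a rough illustration of this tradeoff, the sketch below uses the common
first-order approximation that dynamic power scales with capacitance times
voltage squared times frequency. The model, the function name, the capacitance
value and the two operating points are all assumptions made for illustration;
none of them comes from a real processor.

```python
def dynamic_power(capacitance, voltage, frequency_hz):
    """First-order dynamic power estimate, P = C * V^2 * f (illustrative only)."""
    return capacitance * voltage ** 2 * frequency_hz


# Assumed effective switched capacitance, in farads (made-up value).
C_EFF = 1.0e-9

# Two hypothetical operating points: (frequency in Hz, voltage in V).
LOW = (1.0e9, 0.8)
HIGH = (2.0e9, 1.0)

power_low = dynamic_power(C_EFF, LOW[1], LOW[0])
power_high = dynamic_power(C_EFF, HIGH[1], HIGH[0])

# How much more power the faster point draws under this simplified model.
power_ratio = power_high / power_low
```

Under these assumed numbers, doubling the frequency costs about three times
the power, which is why running at the highest P-state all the time may be
regarded as wasteful.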

In some situations it is desirable or even necessary to run the program as fast
as possible and then there is no reason to use any P-states different from the
highest one (i.e. the highest-performance frequency/voltage configuration
available). In some other cases, however, it may not be necessary to execute
instructions so quickly and maintaining the highest available CPU capacity for a
relatively long time without utilizing it entirely may be regarded as wasteful.
It also may not be physically possible to maintain maximum CPU capacity for too
long for thermal or power supply capacity reasons or similar. To cover those
cases, there are hardware interfaces allowing CPUs to be switched between
different frequency/voltage configurations or (in the ACPI terminology) to be
put into different P-states.

Typically, they are used along with algorithms to estimate the required CPU
capacity, so as to decide which P-states to put the CPUs into. Of course, since
the utilization of the system generally changes over time, that has to be done
repeatedly on a regular basis. The activity by which this happens is referred
to as CPU performance scaling or CPU frequency scaling (because it involves
adjusting the CPU clock frequency).


CPU Performance Scaling in Linux
================================

The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
(CPU Frequency scaling) subsystem that consists of three layers of code: the
core, scaling governors and scaling drivers.

The ``CPUFreq`` core provides the common code infrastructure and user space
interfaces for all platforms that support CPU performance scaling. It defines
the basic framework in which the other components operate.

Scaling governors implement algorithms to estimate the required CPU capacity.
As a rule, each governor implements one, possibly parametrized, scaling
algorithm.

Scaling drivers talk to the hardware. They provide scaling governors with
information on the available P-states (or P-state ranges in some cases) and
access platform-specific hardware interfaces to change CPU P-states as requested
by scaling governors.

In principle, all available scaling governors can be used with every scaling
driver. That design is based on the observation that the information used by
performance scaling algorithms for P-state selection can be represented in a
platform-independent form in the majority of cases, so it should be possible
to use the same performance scaling algorithm implemented in exactly the same
way regardless of which scaling driver is used. Consequently, the same set of
scaling governors should be suitable for every supported platform.

However, that observation may not hold for performance scaling algorithms
based on information provided by the hardware itself, for example through
feedback registers, as that information is typically specific to the hardware
interface it comes from and may not be easily represented in an abstract,
platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
to bypass the governor layer and implement their own performance scaling
algorithms. That is done by the |intel_pstate| scaling driver.


``CPUFreq`` Policy Objects
==========================

In some cases the hardware interface for P-state control is shared by multiple
CPUs. That is, for example, the same register (or set of registers) is used to
control the P-state of multiple CPUs at the same time and writing to it affects
all of those CPUs simultaneously.

Sets of CPUs sharing hardware P-state control interfaces are represented by
``CPUFreq`` as struct cpufreq_policy objects. For consistency,
struct cpufreq_policy is also used when there is only one CPU in the given
set.

The ``CPUFreq`` core maintains a pointer to a struct cpufreq_policy object for
every CPU in the system, including CPUs that are currently offline. If multiple
CPUs share the same hardware P-state control interface, all of the pointers
corresponding to them point to the same struct cpufreq_policy object.

``CPUFreq`` uses struct cpufreq_policy as its basic data type and the design
of its user space interface is based on the policy concept.


CPU Initialization
==================

First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
It is only possible to register one scaling driver at a time, so the scaling
driver is expected to be able to handle all CPUs in the system.

The scaling driver may be registered before or after CPU registration. If
CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
take note of all of the already registered CPUs during the registration of the
scaling driver. In turn, if any CPUs are registered after the registration of
the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
at their registration time.

In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
has not seen so far as soon as it is ready to handle that CPU. [Note that the
logical CPU may be a physical single-core processor, or a single core in a
multicore processor, or a hardware thread in a physical processor or processor
core. In what follows "CPU" always means "logical CPU" unless explicitly stated
otherwise and the word "processor" is used to refer to the physical part
possibly including multiple logical CPUs.]

Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
for the given CPU and if so, it skips the policy object creation. Otherwise,
a new policy object is created and initialized, which involves the creation of
a new policy directory in ``sysfs``, and the policy pointer corresponding to
the given CPU is set to the new policy object's address in memory.

Next, the scaling driver's ``->init()`` callback is invoked with the policy
pointer of the new CPU passed to it as the argument. That callback is expected
to initialize the performance scaling hardware interface for the given CPU (or,
more precisely, for the set of CPUs sharing the hardware interface it belongs
to, represented by its policy object) and, if the policy object it has been
called for is new, to set parameters of the policy, like the minimum and maximum
frequencies supported by the hardware, the table of available frequencies (if
the set of supported P-states is not a continuous range), and the mask of CPUs
that belong to the same policy (including both online and offline CPUs). That
mask is then used by the core to populate the policy pointers for all of the
CPUs in it.
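The bookkeeping described above can be modelled with a small user-space toy.
It is only a sketch of the control flow, not kernel code: the ``Policy`` and
``ToyCore`` classes, the ``toy_driver_init()`` function, the pairing of CPUs
and the frequency values are all hypothetical.

```python
class Policy:
    """Toy stand-in for struct cpufreq_policy (illustrative fields only)."""

    def __init__(self, first_cpu):
        self.first_cpu = first_cpu
        self.related_cpus = set()   # mask of CPUs sharing this policy
        self.min_freq = None        # kHz
        self.max_freq = None        # kHz


class ToyCore:
    """Models the per-CPU policy pointers kept by the core."""

    def __init__(self, driver_init):
        self.driver_init = driver_init  # plays the role of the driver's ->init()
        self.policy_ptr = {}            # CPU number -> Policy
        self.created = 0                # number of policy objects created

    def add_cpu(self, cpu):
        policy = self.policy_ptr.get(cpu)
        is_new = policy is None
        if is_new:
            # Pointer not set yet: create a policy and point this CPU at it.
            policy = Policy(cpu)
            self.created += 1
            self.policy_ptr[cpu] = policy
        self.driver_init(policy)
        if is_new:
            # Use the mask filled in by the driver to populate the
            # pointers of all CPUs that share the policy.
            for other in policy.related_cpus:
                self.policy_ptr[other] = policy
        return policy


def toy_driver_init(policy):
    """Hypothetical driver: CPUs share an interface in pairs (0,1), (2,3), ..."""
    if not policy.related_cpus:  # only a new policy needs its parameters set
        base = policy.first_cpu - policy.first_cpu % 2
        policy.related_cpus = {base, base + 1}
        policy.min_freq, policy.max_freq = 800_000, 3_000_000  # made-up values
```

In this model, calling ``add_cpu(0)`` followed by ``add_cpu(1)`` creates a
single policy object, because the pointer for CPU 1 was already populated from
the mask when CPU 0 was handled.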

The next major initialization step for a new policy object is to attach a
scaling governor to it (to begin with, that is the default scaling governor
determined by the kernel command line or configuration, but it may be changed
later via ``sysfs``). First, a pointer to the new policy object is passed to
the governor's ``->init()`` callback which is expected to initialize all of the
data structures necessary to handle the given policy and, possibly, to add
a governor ``sysfs`` interface to it. Next, the governor is started by
invoking its ``->start()`` callback.

That callback is expected to register per-CPU utilization update callbacks for
all of the online CPUs belonging to the given policy with the CPU scheduler.
The utilization update callbacks will be invoked by the CPU scheduler on
important events, like task enqueue and dequeue, on every iteration of the
scheduler tick or generally whenever the CPU utilization may change (from the
scheduler's perspective). They are expected to carry out computations needed
to determine the P-state to use for the given policy going forward and to
invoke the scaling driver to make changes to the hardware in accordance with
the P-state selection. The scaling driver may be invoked directly from
scheduler context or asynchronously, via a kernel thread or workqueue, depending
on the configuration and capabilities of the scaling driver and the governor.

Similar steps are taken for policy objects that are not new, but were "inactive"
previously, meaning that all of the CPUs belonging to them were offline. The
only practical difference in that case is that the ``CPUFreq`` core will attempt
to use the scaling governor previously used with the policy that became
"inactive" (and is re-initialized now) instead of the default governor.

In turn, if a previously offline CPU is being brought back online, but some
other CPUs sharing the policy object with it are online already, there is no
need to re-initialize the policy object at all. In that case, it is only
necessary to restart the scaling governor so that it can take the new online CPU
into account. That is achieved by invoking the governor's ``->stop()`` and
``->start()`` callbacks, in this order, for the entire policy.

As mentioned before, the |intel_pstate| scaling driver bypasses the scaling
governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
Consequently, if |intel_pstate| is used, scaling governors are not attached to
new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
to register per-CPU utilization update callbacks for each policy. These
callbacks are invoked by the CPU scheduler in the same way as for scaling
governors, but in the |intel_pstate| case they both determine the P-state to
use and change the hardware configuration accordingly in one go from scheduler
context.

The policy objects created during CPU initialization and other data structures
associated with them are torn down when the scaling driver is unregistered
(which happens when the kernel module containing it is unloaded, for example) or
when the last CPU belonging to the given policy is unregistered.


Policy Interface in ``sysfs``
=============================

During the initialization of the kernel, the ``CPUFreq`` core creates a
``sysfs`` directory (kobject) called ``cpufreq`` under
:file:`/sys/devices/system/cpu/`.

That directory contains a ``policyX`` subdirectory (where ``X`` represents an
integer number) for every policy object maintained by the ``CPUFreq`` core.
Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
that may be different from the one represented by ``X``) for all of the CPUs
associated with (or belonging to) the given policy. The ``policyX`` directories
in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
objects (that is, for all of the CPUs associated with them).
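The directory layout described above can be walked from user space. The
following sketch groups CPUs by the ``policyX`` directory that their
``cpufreq`` symbolic link resolves to. The function name ``cpus_by_policy()``
and its ``cpu_root`` parameter are hypothetical and exist only for this
example.

```python
import os
import re


def cpus_by_policy(cpu_root="/sys/devices/system/cpu"):
    """Map each policyX name to the CPU numbers whose cpufreq link points at it."""
    result = {}
    for entry in os.listdir(cpu_root):
        match = re.fullmatch(r"cpu(\d+)", entry)
        if not match:
            continue  # not a cpuY directory (for example, cpufreq itself)
        link = os.path.join(cpu_root, entry, "cpufreq")
        if not os.path.exists(link):
            continue  # no cpufreq link for this CPU
        # The link target is the policyX directory shared by related CPUs.
        policy = os.path.basename(os.path.realpath(link))
        result.setdefault(policy, []).append(int(match.group(1)))
    return {name: sorted(cpus) for name, cpus in result.items()}
```

The ``cpu_root`` parameter makes it possible to try the function against a
copy of the directory tree instead of the live ``sysfs``.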

Some of those attributes are generic. They are created by the ``CPUFreq`` core
and their behavior generally does not depend on what scaling driver is in use
and what scaling governor is attached to the given policy. Some scaling drivers
also add driver-specific attributes to the policy directories in ``sysfs`` to
control policy-specific aspects of driver behavior.

The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
are the following:

``affected_cpus``
    List of online CPUs belonging to this policy (i.e. sharing the hardware
    performance scaling interface represented by the ``policyX`` policy
    object).

``bios_limit``
    If the platform firmware (BIOS) tells the OS to apply an upper limit to
    CPU frequencies, that limit will be reported through this attribute (if
    present).

    The existence of the limit may be a result of some (often unintentional)
    BIOS settings, restrictions coming from a service processor or other
    BIOS/HW-based mechanisms.

    This does not cover ACPI thermal limitations which can be discovered
    through a generic thermal driver.

    This attribute is not present if the scaling driver in use does not
    support it.

``cpuinfo_cur_freq``
    Current frequency of the CPUs belonging to this policy as obtained from
    the hardware (in kHz).

    This is expected to be the frequency the hardware actually runs at.
    If that frequency cannot be determined, this attribute should not
    be present.

``cpuinfo_max_freq``
    Maximum possible operating frequency the CPUs belonging to this policy
    can run at (in kHz).

``cpuinfo_min_freq``
    Minimum possible operating frequency the CPUs belonging to this policy
    can run at (in kHz).

``cpuinfo_transition_latency``
    The time it takes to switch the CPUs belonging to this policy from one
    P-state to another, in nanoseconds.

    If unknown or if known to be so high that the scaling driver does not
    work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
    will be returned by reads from this attribute.

``related_cpus``
    List of all (online and offline) CPUs belonging to this policy.

``scaling_available_governors``
    List of ``CPUFreq`` scaling governors present in the kernel that can
    be attached to this policy or (if the |intel_pstate| scaling driver is
    in use) list of scaling algorithms provided by the driver that can be
    applied to this policy.

    [Note that some governors are modular and it may be necessary to load a
    kernel module for the governor held by it to become available and be
    listed by this attribute.]

``scaling_cur_freq``
    Current frequency of all of the CPUs belonging to this policy (in kHz).

    In the majority of cases, this is the frequency of the last P-state
    requested by the scaling driver from the hardware using the scaling
    interface provided by it, which may or may not reflect the frequency
    the CPU is actually running at (due to hardware design and other
    limitations).

    Some architectures (e.g. ``x86``) may attempt to provide information
    more precisely reflecting the current CPU frequency through this
    attribute, but that still may not be the exact current CPU frequency as
    seen by the hardware at the moment.

``scaling_driver``
    The scaling driver currently in use.

``scaling_governor``
    The scaling governor currently attached to this policy or (if the
    |intel_pstate| scaling driver is in use) the scaling algorithm
    provided by the driver that is currently applied to this policy.

    This attribute is read-write and writing to it will cause a new scaling
    governor to be attached to this policy or a new scaling algorithm
    provided by the scaling driver to be applied to it (in the
    |intel_pstate| case), as indicated by the string written to this
    attribute (which must be one of the names listed by the
    ``scaling_available_governors`` attribute described above).

``scaling_max_freq``
    Maximum frequency the CPUs belonging to this policy are allowed to be
    running at (in kHz).

    This attribute is read-write and writing a string representing an
    integer to it will cause a new limit to be set (it must not be lower
    than the value of the ``scaling_min_freq`` attribute).

``scaling_min_freq``
    Minimum frequency the CPUs belonging to this policy are allowed to be
    running at (in kHz).

    This attribute is read-write and writing a string representing a
    non-negative integer to it will cause a new limit to be set (it must not
    be higher than the value of the ``scaling_max_freq`` attribute).

``scaling_setspeed``
    This attribute is functional only if the `userspace`_ scaling governor
    is attached to the given policy.

    It returns the last frequency requested by the governor (in kHz) or can
    be written to in order to set a new frequency for the policy.
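As an illustration of how the generic attributes might be consumed, the sketch
below reads one policy directory into a dictionary. The helper names
(``parse_cpu_list()``, ``read_policy()``) are hypothetical. Attributes that are
absent, unreadable by the current user or not plain numbers are skipped, which
is an assumption of this example rather than documented behavior.

```python
from pathlib import Path

# Attributes documented above as frequencies in kHz; some may be absent.
KHZ_ATTRS = (
    "bios_limit", "cpuinfo_cur_freq", "cpuinfo_max_freq", "cpuinfo_min_freq",
    "scaling_cur_freq", "scaling_max_freq", "scaling_min_freq",
)


def parse_cpu_list(text):
    """Parse a CPU list such as '0 1 2 3' or '0-3,8' into a list of ints."""
    cpus = []
    for chunk in text.replace(",", " ").split():
        first, _, last = chunk.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def _read(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_policy(policy_dir):
    """Collect the generic attributes found in one policyX directory."""
    base = Path(policy_dir)
    info = {}
    for name in KHZ_ATTRS:
        text = _read(base / name)
        if text is not None and text.isdigit():
            info[name] = int(text)  # value in kHz
    for name in ("affected_cpus", "related_cpus"):
        text = _read(base / name)
        if text is not None:
            info[name] = parse_cpu_list(text)
    for name in ("scaling_driver", "scaling_governor"):
        text = _read(base / name)
        if text is not None:
            info[name] = text
    return info
```

A call such as ``read_policy("/sys/devices/system/cpu/cpufreq/policy0")`` would
then return only the attributes that the scaling driver in use provides.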


Generic Scaling Governors
=========================

``CPUFreq`` provides generic scaling governors that can be used with all
scaling drivers. As stated before, each of them implements a single, possibly
parametrized, performance scaling algorithm.

Scaling governors are attached to policy objects and different policy objects
can be handled by different scaling governors at the same time (although that
may lead to suboptimal results in some cases).

The scaling governor for a given policy object can be changed at any time with
the help of the ``scaling_governor`` policy attribute in ``sysfs``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) algorithms implemented by them. Those attributes, referred to as governor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) tunables, can be either global (system-wide) or per-policy, depending on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) scaling driver in use. If the driver requires governor tunables to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) per-policy, they are located in a subdirectory of each policy directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) Otherwise, they are located in a subdirectory under
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) :file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) subdirectory containing the governor tunables is the name of the governor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) providing them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) ``performance``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) When attached to a policy object, this governor causes the highest frequency,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) within the ``scaling_max_freq`` policy limit, to be requested for that policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) The request is made once at the time the governor for the policy is set to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) ``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) policy limits change after that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) ``powersave``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) When attached to a policy object, this governor causes the lowest frequency,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) within the ``scaling_min_freq`` policy limit, to be requested for that policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) The request is made once at the time the governor for the policy is set to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) ``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) policy limits change after that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) ``userspace``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) This governor does not do anything by itself. Instead, it allows user space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) to set the CPU frequency for the policy it is attached to by writing to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) ``scaling_setspeed`` attribute of that policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) ``schedutil``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) This governor uses CPU utilization data available from the CPU scheduler. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) generally is regarded as a part of the CPU scheduler, so it can access the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) scheduler's internal data structures directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) It runs entirely in scheduler context, although in some cases it may need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) invoke the scaling driver asynchronously when it decides that the CPU frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) should be changed for a given policy (that depends on whether or not the driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) is capable of changing the CPU frequency from scheduler context).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) The actions of this governor for a particular CPU depend on the scheduling class
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) invoking its utilization update callback for that CPU. If it is invoked by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) RT or deadline scheduling classes, the governor will increase the frequency to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) if it is invoked by the CFS scheduling class, the governor will use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) Per-Entity Load Tracking (PELT) metric for the root control group of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) given CPU as the CPU utilization estimate (see the *Per-entity load tracking*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) LWN.net article [1]_ for a description of the PELT mechanism). Then, the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) CPU frequency to apply is computed in accordance with the formula
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) f = 1.25 * ``f_0`` * ``util`` / ``max``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) where ``util`` is the PELT number, ``max`` is the theoretical maximum of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) ``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) policy (if the PELT number is frequency-invariant), or the current CPU frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) (otherwise).
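
As a rough illustration of the formula above, the following shell sketch
evaluates it with integer arithmetic. The function name
``schedutil_next_freq`` and the sample values are hypothetical. The sketch
leaves out any clamping to the policy limits and any mapping to frequencies
supported by the scaling driver, so it should not be read as the governor's
actual implementation.

```shell
# Hypothetical helper: evaluate f = 1.25 * f_0 * util / max with integers.
# Arguments: f_0 (kHz), util, max (max is in the same units as util).
schedutil_next_freq() {
    f0=$1
    util=$2
    max=$3
    # 1.25 * f_0 is written as f_0 * 5 / 4 to stay in integer arithmetic.
    echo $(( f0 * 5 / 4 * util / max ))
}

# Example: f_0 = 2000000 kHz and a utilization of 512 out of 1024.
schedutil_next_freq 2000000 512 1024    # prints 1250000
```

Note that the result equals ``f_0`` when ``util`` is 80% of ``max``, because
1.25 * 0.8 = 1.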
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) This governor also employs a mechanism allowing it to temporarily bump up the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) CPU frequency for tasks that have been waiting on I/O most recently, called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) "IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) is passed by the scheduler to the governor callback, which causes the frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) to go up to the allowed maximum immediately and then draw back to the value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) returned by the above formula over time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) This governor exposes only one tunable:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) ``rate_limit_us``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) Minimum time (in microseconds) that has to pass between two consecutive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) runs of governor computations (default: 1000 times the scaling driver's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) transition latency).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) The purpose of this tunable is to reduce the scheduler context overhead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) of the governor which might be excessive without it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) This governor generally is regarded as a replacement for the older `ondemand`_
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) and `conservative`_ governors (described below), as it is simpler and more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) tightly integrated with the CPU scheduler, its overhead in terms of CPU context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) switches and similar is less significant, and it uses the scheduler's own CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) utilization metric, so in principle its decisions should not contradict the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) decisions made by the other parts of the scheduler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) ``ondemand``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) This governor uses CPU load as a CPU frequency selection metric.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) In order to estimate the current CPU load, it measures the time elapsed between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) consecutive invocations of its worker routine and computes the fraction of that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) time in which the given CPU was not idle. The ratio of the non-idle (active)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) time to the total CPU time is taken as an estimate of the load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) If this governor is attached to a policy shared by multiple CPUs, the load is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) estimated for all of them and the greatest result is taken as the load estimate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) for the entire policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) The worker routine of this governor has to run in process context, so it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) invoked asynchronously (via a workqueue) and CPU P-states are updated from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) there if necessary. As a result, the scheduler context overhead from this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) governor is minimal, but it causes additional CPU context switches to happen
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) relatively often and the CPU P-state updates triggered by it can be relatively
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) irregular. Also, it affects its own CPU load metric by running code that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) reduces the CPU idle time (even though the CPU idle time is only reduced very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) slightly by it).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) It generally selects CPU frequencies proportional to the estimated load, so that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) 1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) corresponds to the load of 0, unless the load exceeds a (configurable)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) speedup threshold, in which case it will go straight for the highest frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) it is allowed to use (the ``scaling_max_freq`` policy limit).
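
The selection rule described above can be sketched in shell as a linear
interpolation between ``cpuinfo_min_freq`` and ``cpuinfo_max_freq``. The
function name ``ondemand_target`` and the numbers used are hypothetical, and
reading "proportional" as a linear interpolation between the two attribute
values is an assumption. Clamping of the interpolated value to the policy
limits and rounding to a frequency supported by the driver are not modeled.

```shell
# Hypothetical helper; arguments:
#   load (percent), up_threshold (percent),
#   cpuinfo_min_freq, cpuinfo_max_freq, scaling_max_freq (all in kHz).
ondemand_target() {
    load=$1; up_threshold=$2
    min_f=$3; max_f=$4; limit_f=$5
    if [ "$load" -gt "$up_threshold" ]; then
        # Above the speedup threshold: go straight to the policy limit.
        echo "$limit_f"
    else
        # Load 0 maps to cpuinfo_min_freq, load 100 to cpuinfo_max_freq.
        echo $(( min_f + load * (max_f - min_f) / 100 ))
    fi
}

ondemand_target 50 95 800000 2400000 2000000    # prints 1600000
ondemand_target 96 95 800000 2400000 2000000    # prints 2000000
```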
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) This governor exposes the following tunables:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) ``sampling_rate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) This is how often the governor's worker routine should run, in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) microseconds.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) Typically, it is set to values of the order of 10000 (10 ms). Its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) default value is equal to the value of ``cpuinfo_transition_latency``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) for each policy this governor is attached to (but since the unit here
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) is greater by 1000, this means that the time represented by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) ``sampling_rate`` is 1000 times greater than the transition latency by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) default).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) If this tunable is per-policy, the following shell command sets the time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) represented by it to be 750 times as high as the transition latency::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) # echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) ``up_threshold``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) If the estimated CPU load is above this value (in percent), the governor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) will set the frequency to the maximum value allowed for the policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) Otherwise, the selected frequency will be proportional to the estimated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) CPU load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) ``ignore_nice_load``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) If set to 1 (default 0), it will cause the CPU load estimation code to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) treat the CPU time spent on executing tasks with "nice" levels greater
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) than 0 as CPU idle time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) This may be useful if there are tasks in the system that should not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) taken into account when deciding what frequency to run the CPUs at.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) Then, to make that happen it is sufficient to increase the "nice" level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) of those tasks above 0 and set this attribute to 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) ``sampling_down_factor``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) This causes the next execution of the governor's worker routine (after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) setting the frequency to the allowed maximum) to be delayed, so the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) frequency stays at the maximum level for a longer time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) Frequency fluctuations in some bursty workloads may be avoided this way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) at the cost of additional energy spent on maintaining the maximum CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) capacity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) ``powersave_bias``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) Reduction factor to apply to the original frequency target of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) governor (including the maximum value used when the ``up_threshold``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) value is exceeded by the estimated CPU load) or sensitivity threshold
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) for the AMD frequency sensitivity powersave bias driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) (:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) inclusive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) If the AMD frequency sensitivity powersave bias driver is not loaded,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) the effective frequency to apply is given by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) f * (1 - ``powersave_bias`` / 1000)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) where f is the governor's original frequency target. The default value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) of this attribute is 0 in that case.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) If the AMD frequency sensitivity powersave bias driver is loaded, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) value of this attribute is 400 by default and it is used in a different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) way.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) On Family 16h (and later) AMD processors there is a mechanism to get a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) measured workload sensitivity, between 0 and 100% inclusive, from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) hardware. That value can be used to estimate how the performance of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) workload running on a CPU will change in response to frequency changes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) The performance of a workload with the sensitivity of 0 (memory-bound or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) IO-bound) is not expected to increase at all as a result of increasing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) the CPU frequency, whereas workloads with the sensitivity of 100%
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) (CPU-bound) are expected to perform much better if the CPU frequency is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) increased.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) If the workload sensitivity is less than the threshold represented by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) the ``powersave_bias`` value, the sensitivity powersave bias driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) will cause the governor to select a frequency lower than its original
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) target, so as to avoid over-provisioning workloads that will not benefit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) from running at higher CPU frequencies.
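
For the case in which the AMD frequency sensitivity powersave bias driver is
not loaded, the ``powersave_bias`` formula given above can be evaluated as in
the following sketch. The function name ``apply_powersave_bias`` and the
sample values are hypothetical.

```shell
# Hypothetical helper: f * (1 - powersave_bias / 1000), in integer arithmetic.
# Arguments: original frequency target (kHz), powersave_bias (0..1000).
apply_powersave_bias() {
    f=$1
    bias=$2
    echo $(( f * (1000 - bias) / 1000 ))
}

apply_powersave_bias 2000000 100    # prints 1800000
apply_powersave_bias 2000000 0      # prints 2000000
```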
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) ``conservative``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) This governor uses CPU load as a CPU frequency selection metric.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) It estimates the CPU load in the same way as the `ondemand`_ governor described
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) above, but the CPU frequency selection algorithm implemented by it is different.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) Namely, it avoids changing the frequency significantly over short time intervals,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) as such changes may not be suitable for systems with limited power supply capacity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) (e.g. battery-powered). To achieve that, it changes the frequency in relatively
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) small steps, one step at a time, up or down - depending on whether or not a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) (configurable) threshold has been exceeded by the estimated CPU load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) This governor exposes the following tunables:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) ``freq_step``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) Frequency step in percent of the maximum frequency the governor is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) 100 (5 by default).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) This is how much the frequency is allowed to change in one go. Setting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) it to 0 will cause the default frequency step (5 percent) to be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) and setting it to 100 effectively causes the governor to periodically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) switch the frequency between the ``scaling_min_freq`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) ``scaling_max_freq`` policy limits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) ``down_threshold``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) Threshold value (in percent, 20 by default) used to determine the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) frequency change direction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) If the estimated CPU load is greater than this value, the frequency will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) go up (by ``freq_step``). If the load is less than this value (and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) ``sampling_down_factor`` mechanism is not in effect), the frequency will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) go down. Otherwise, the frequency will not be changed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) ``sampling_down_factor``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) Frequency decrease deferral factor, between 1 (default) and 10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) inclusive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) It effectively causes the frequency to go down ``sampling_down_factor``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) times slower than it ramps up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) ``interactive``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) The ``interactive`` governor is designed for latency-sensitive,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) interactive workloads. It sets the CPU speed depending on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) usage, similarly to the `ondemand`_ and `conservative`_ governors, but with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) different set of configurable behaviors.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) The tunable values for this governor are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) ``above_hispeed_delay``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) When speed is at or above ``hispeed_freq``, wait for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) this long before raising speed in response to continued high load.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) The format is a single delay value, optionally followed by pairs of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) CPU speeds and the delay to use at or above those speeds. Colons can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) be used between the speeds and associated delays for readability. For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) 80000 1300000:200000 1500000:40000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) uses a delay of 80000 us until the CPU speed reaches 1.3 GHz, a delay of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) 200000 us from 1.3 GHz up to 1.5 GHz, and a delay of 40000 us at 1.5 GHz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) and above. If speeds are specified, they must appear in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) ascending order. The default is 20000 us.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) ``boost``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) If non-zero, immediately boost speed of all CPUs to at least
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) ``hispeed_freq`` until zero is written to this attribute. If zero, allow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) CPU speeds to drop below ``hispeed_freq`` according to load as usual.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) Default is zero.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) ``boostpulse``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) On each write, immediately boost speed of all CPUs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) ``hispeed_freq`` for at least the period of time specified by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) ``boostpulse_duration``, after which speeds are allowed to drop below
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) ``hispeed_freq`` according to load as usual. This is a write-only file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) ``boostpulse_duration``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) Length of time to hold CPU speed at ``hispeed_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) on a write to ``boostpulse``, before allowing speed to drop according to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) load as usual. The default is 80000 us.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) ``go_hispeed_load``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) The CPU load at which to ramp to ``hispeed_freq``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) Default is 99%.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) ``hispeed_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) An intermediate "high speed" to ramp to initially
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) when CPU load hits the value specified in ``go_hispeed_load``. If load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) stays high for the amount of time specified in ``above_hispeed_delay``,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) then speed may be bumped higher. Default is the maximum speed allowed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) by the policy at governor initialization time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) ``io_is_busy``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) If set, the governor accounts IO time as CPU busy time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) ``min_sample_time``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) The minimum amount of time to spend at the current
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) Frequency Boost Support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) Background
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) Some processors support a mechanism to raise the operating frequency of some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) cores in a multicore package temporarily (and above the sustainable frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) threshold for the whole package) under certain conditions, for example if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) whole chip is not fully utilized and below its intended thermal or power budget.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) Different names are used by different vendors to refer to this functionality.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) For Intel processors it is referred to as "Turbo Boost", AMD calls it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) "Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) As a rule, it also is implemented differently by different vendors. The simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) term "frequency boost" is used here for brevity to refer to all of those
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) implementations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) The frequency boost mechanism may be either hardware-based or software-based.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) made by the hardware (although in general it requires the hardware to be put
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) into a special state in which it can control the CPU frequency within certain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) limits). If it is software-based (e.g. on ARM), the scaling driver decides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675) whether or not to trigger boosting and when to do that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) The ``boost`` File in ``sysfs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) -------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680) This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) the "boost" setting for the whole system. It is not present if the underlying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) scaling driver does not support the frequency boost mechanism (or supports it,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) but provides a driver-specific interface for controlling it, like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) |intel_pstate|).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) If the value in this file is 1, the frequency boost mechanism is enabled. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) means that either the hardware can be put into states in which it is able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) trigger boosting (in the hardware-based case), or the software is allowed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) trigger boosting (in the software-based case). It does not mean that boosting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) is actually in use at the moment on any CPUs in the system. It only means a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) permission to use the frequency boost mechanism (which still may never be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) for other reasons).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) If the value in this file is 0, the frequency boost mechanism is disabled and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695) cannot be used at all.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) The only values that can be written to this file are 0 and 1.
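
The semantics described above can be illustrated with a minimal Python sketch.
The helper names ``read_boost`` and ``write_boost`` are hypothetical and are not
part of any kernel or library interface. The sketch assumes only the file
location given in this section, and writing to the file usually requires root
privileges.

```python
from pathlib import Path

# Location given in the text above. The file is absent when the scaling
# driver does not expose the global boost knob.
BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost(path=BOOST_PATH):
    """Return True for 1, False for 0, or None if the file is not present."""
    path = Path(path)
    if not path.exists():
        return None
    value = path.read_text().strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected boost value: {value!r}")
    return value == "1"


def write_boost(enabled, path=BOOST_PATH):
    """Write 1 or 0, the only values the file accepts."""
    Path(path).write_text("1\n" if enabled else "0\n")
```

A return value of ``True`` from ``read_boost()`` only means that boosting is
permitted, not that any CPU is currently boosted.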
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) Rationale for Boost Control Knob
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) The frequency boost mechanism is generally intended to help achieve optimum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) CPU performance on time scales below software resolution (e.g. below the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) scheduler tick interval) and it is demonstrably suitable for many workloads, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) it may lead to problems in certain situations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) For this reason, many systems make it possible to disable the frequency boost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) mechanism in the platform firmware (BIOS) setup, but that requires the system to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) be restarted for the setting to be adjusted as desired, which may not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) practical, at least in some cases. For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) 1. Boosting means overclocking the processor, although under controlled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) conditions. Generally, the processor's energy consumption increases
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) as a result of increasing its frequency and voltage, even temporarily.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) That may not be desirable on systems that switch to power sources of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) limited capacity, such as batteries, so the ability to disable the boost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) mechanism while the system is running may help there (but that depends on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) the workload too).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) 2. In some situations deterministic behavior is more important than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) performance or energy consumption (or both) and the ability to disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) boosting while the system is running may be useful then.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) 3. To examine the impact of the frequency boost mechanism itself, it is useful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) to be able to run tests with and without boosting, preferably without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) restarting the system in the meantime.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) 4. Reproducible results are important when running benchmarks. Since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) the boosting functionality depends on the load of the whole package,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) single-thread performance may vary because of it, which may lead to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) unreproducible results sometimes. That can be avoided by disabling the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) frequency boost mechanism before running benchmarks sensitive to that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) issue.
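
For the testing and benchmarking cases above, a run with boosting disabled can
be wrapped as in the following Python sketch. The ``boost_disabled`` helper is
hypothetical. It assumes that the global ``boost`` file is present and
writable (usually by root), and it restores the previous value afterwards.

```python
from contextlib import contextmanager
from pathlib import Path

# Location of the global knob as described in this document.
BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


@contextmanager
def boost_disabled(path=BOOST_PATH):
    """Disable boosting for the duration of the block, then restore it."""
    path = Path(path)
    previous = path.read_text().strip()  # remember the current setting
    path.write_text("0\n")               # 0 disables the boost mechanism
    try:
        yield
    finally:
        path.write_text(previous + "\n")  # put the old value back
```

A benchmark would then run inside ``with boost_disabled():`` and could be
repeated outside of it for comparison, without restarting the system.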
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735) Legacy AMD ``cpb`` Knob
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) the global ``boost`` one. It is used for disabling/enabling the "Core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) Performance Boost" feature of some AMD processors.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) If present, that knob is located in every ``CPUFreq`` policy directory in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) ``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) ``cpb``, which suggests a more fine-grained control interface. The actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) implementation, however, works on a system-wide basis and setting that knob
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) for one policy causes the same value of it to be set for all of the other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) policies at the same time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) That knob is still supported on AMD processors that support its underlying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) hardware feature, but it may be configured out of the kernel (via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) :c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) ``boost`` knob is present regardless. Thus it is always possible to use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) ``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) is more consistent with what all of the other systems do (and the ``cpb`` knob
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) may not be supported any more in the future).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) The ``cpb`` knob is never present for any processors without the underlying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) hardware feature (e.g. all Intel ones), even if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) :c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) ==========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) .. [1] Jonathan Corbet, *Per-entity load tracking*,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) https://lwn.net/Articles/531853/