.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``intel_pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_pstate`` is a part of the
:doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
(``CPUFreq``). It is a scaling driver for the Sandy Bridge and later
generations of Intel processors. Note, however, that some of those processors
may not be supported. [To understand ``intel_pstate`` it is necessary to know
how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
you have not done that yet.]

For the processors supported by ``intel_pstate``, the P-state concept is broader
than just an operating frequency or an operating performance point (see the
LinuxCon Europe 2015 presentation by Kristen Accardi [1]_ for more
information about that). For this reason, the representation of P-states used
by ``intel_pstate`` internally follows the hardware specification (for details
refer to the Intel Software Developer’s Manual [2]_). However, the ``CPUFreq``
core uses frequencies to identify the operating performance points of CPUs, and
its user space interface is expressed in terms of frequencies, so
``intel_pstate`` maps its internal representation of P-states to frequencies too
(fortunately, that mapping is unambiguous). At the same time, it would not be
practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
available frequencies, because that table could be very large, so the driver
does not do that. This limits some of the functionality of the core.

Since the hardware P-state selection interface used by ``intel_pstate`` is
available at the logical CPU level, the driver always works with individual
CPUs. Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
equivalent to CPUs. In particular, this means that they become "inactive" every
time the corresponding CPU is taken offline and need to be re-initialized when
it goes back online.

``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
only way to pass early-configuration-time parameters to it is via the kernel
command line. However, its configuration can be adjusted via ``sysfs`` to a
great extent. In some configurations it is even possible to unregister it via
``sysfs``, which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).


Operation Modes
===============

``intel_pstate`` can operate in two different modes, active or passive. In the
active mode, it uses its own internal performance scaling governor algorithm or
allows the hardware to do performance scaling by itself, while in the passive
mode it responds to requests made by a generic ``CPUFreq`` governor implementing
a certain performance scaling algorithm. Which mode is in effect depends on the
kernel command line options used and on the capabilities of the processor.
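
The rules for choosing the initial operation mode, as stated in the following
subsections, can be summarized by the minimal Python sketch below. The function
``initial_mode()`` is hypothetical: it only models the documented effect of the
``intel_pstate=passive``, ``intel_pstate=active`` and ``intel_pstate=no_hwp``
arguments and of HWP support, and it ignores any other options the driver may
accept.

.. code-block:: python

    def initial_mode(cmdline: str, hwp_supported: bool) -> str:
        """Return "active" or "passive" for the given kernel command line.

        Hypothetical model of the documented rules only; not driver code.
        """
        # Values of every intel_pstate=<value> argument on the command line
        opts = {
            token.split("=", 1)[1]
            for token in cmdline.split()
            if token.startswith("intel_pstate=")
        }

        if "passive" in opts:
            return "passive"  # passive regardless of HWP support
        if hwp_supported and "no_hwp" not in opts:
            return "active"   # default when HWP gets enabled
        # HWP absent or disabled: active only if explicitly requested
        return "active" if "active" in opts else "passive"

For example, ``initial_mode("intel_pstate=no_hwp", True)`` returns
``"passive"``, while adding ``intel_pstate=active`` to the same command line
makes it return ``"active"``.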

Active Mode
-----------

This is the default operation mode of ``intel_pstate`` for processors with
hardware-managed P-states (HWP) support. If it works in this mode, the
``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
contains the string "intel_pstate".

In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
can be applied to ``CPUFreq`` policies in the same way as generic scaling
governors (that is, through the ``scaling_governor`` policy attribute in
``sysfs``). [Note that different P-state selection algorithms may be chosen for
different policies, but that is not recommended.]

These algorithms are not generic scaling governors, but their names are the same
as the names of some of those governors. Moreover, confusingly enough, they
generally do not work in the same way as the generic governors they share the
names with. For example, the ``powersave`` P-state selection algorithm provided
by ``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
(roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).

There are two P-state selection algorithms provided by ``intel_pstate`` in the
active mode: ``powersave`` and ``performance``. The way they both operate
depends on whether or not the hardware-managed P-states (HWP) feature has been
enabled in the processor and possibly on the processor model.

Which of the P-state selection algorithms is used by default depends on the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
Namely, if that option is set, the ``performance`` algorithm is used by
default; otherwise, the ``powersave`` algorithm is the default.

Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~

If the processor supports the HWP feature, it will be enabled during the
processor initialization and cannot be disabled after that. It is possible
to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
kernel in the command line.

If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
select P-states by itself, but it can still give hints to the processor's
internal P-state selection logic. What those hints are depends on which P-state
selection algorithm has been applied to the given policy (or to the CPU it
corresponds to).

Even though the P-state selection is carried out by the processor automatically,
``intel_pstate`` registers utilization update callbacks with the CPU scheduler
in this mode. However, they are not used for running a P-state selection
algorithm, but for periodic updates of the current CPU frequency information to
be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.

HWP + ``performance``
.....................

In this configuration ``intel_pstate`` will write 0 to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.

This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.

Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
(that is, the maximum P-state that the driver is allowed to use).

HWP + ``powersave``
...................

In this configuration ``intel_pstate`` will set the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
previously set to via ``sysfs`` (or whatever default value it was
set to by the platform firmware). This usually causes the processor's
internal P-state selection logic to be less performance-focused.
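
The EPP-related rules in the two subsections above can be illustrated with a toy
Python model. The class ``HwpPolicyModel`` and its methods are hypothetical and
do not correspond to kernel code. For simplicity the model assumes that writing
0 under ``performance`` does not change the value remembered for ``powersave``,
and it raises ``ValueError`` where the real ``sysfs`` write is rejected.

.. code-block:: python

    class HwpPolicyModel:
        """Toy model of the HWP + performance/powersave rules (hypothetical)."""

        def __init__(self, min_perf: int, max_perf: int, stored_epp: int) -> None:
            self.min_perf = min_perf      # lower limit of the allowed P-state range
            self.max_perf = max_perf      # upper limit of the allowed P-state range
            self.stored_epp = stored_epp  # last value from sysfs or firmware
            self.governor = "powersave"

        def write_epp(self, value: int) -> None:
            """Model a write to the EPP attribute in sysfs."""
            if self.governor == "performance":
                if value != 0:
                    raise ValueError("only 0 is accepted with 'performance'")
                return  # assumption: 0 is already in effect, nothing is stored
            self.stored_epp = value

        def effective_epp(self) -> int:
            # 'performance' overrides whatever was set via sysfs
            return 0 if self.governor == "performance" else self.stored_epp

        def hwp_range(self) -> tuple:
            # 'performance' restricts the range to its upper boundary
            if self.governor == "performance":
                return (self.max_perf, self.max_perf)
            return (self.min_perf, self.max_perf)

Switching ``governor`` from ``"performance"`` back to ``"powersave"`` makes
``effective_epp()`` return the previously stored value again, which mirrors the
behavior described for HWP + ``powersave``.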

Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~

This operation mode is optional for processors that do not support the HWP
feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
the command line. The active mode is used in those cases if the
``intel_pstate=active`` argument is passed to the kernel in the command line.
In this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it. [Note that ``intel_pstate`` will never refuse to work with
any processor with the HWP feature enabled.]

In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
``powersave`` or ``performance``, depending on the ``scaling_governor`` policy
setting in ``sysfs``. The current CPU frequency information to be made
available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
periodically updated by those utilization update callbacks too.

``performance``
...............

Without HWP, this P-state selection algorithm is always the same regardless of
the processor model and platform configuration.

It selects the maximum P-state it is allowed to use, subject to limits set via
``sysfs``, every time the driver configuration for the given CPU is updated
(e.g. via ``sysfs``).

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is set.

``powersave``
.............

Without HWP, this P-state selection algorithm is similar to the algorithm
implemented by the generic ``schedutil`` scaling governor except that the
utilization metric used by it is based on numbers coming from feedback
registers of the CPU. It generally selects P-states proportional to the
current CPU utilization.

This algorithm is run by the driver's utilization update callback for the
given CPU when it is invoked by the CPU scheduler, but not more often than
every 10 ms. As in the ``performance`` case, the hardware configuration
is not touched if the new P-state turns out to be the same as the current
one.
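
The rate-limited, utilization-proportional selection described above can be
sketched as follows. This is a simplified Python model under stated
assumptions, not the driver's algorithm: the target P-state is taken to be
``ceil(util * max_pstate)`` clamped to ``[min_pstate, max_pstate]``, where
``util`` is a busy fraction between 0 and 1, and hardware writes are merely
recorded in a list. The class name and its attributes are hypothetical.

.. code-block:: python

    import math

    class ProportionalSelector:
        """Simplified, hypothetical model of non-HWP 'powersave' selection."""

        RATE_LIMIT_NS = 10_000_000  # run at most once every 10 ms

        def __init__(self, min_pstate: int, max_pstate: int) -> None:
            self.min_pstate = min_pstate
            self.max_pstate = max_pstate
            self.current = min_pstate
            self.last_run_ns = None
            self.writes = []  # stands in for writes to the hardware

        def update(self, now_ns: int, util: float) -> None:
            """Scheduler callback; util is the busy fraction in [0.0, 1.0]."""
            if (self.last_run_ns is not None
                    and now_ns - self.last_run_ns < self.RATE_LIMIT_NS):
                return  # invoked too soon after the previous run
            self.last_run_ns = now_ns
            # Assumed formula: proportional to utilization, clamped to the limits
            target = math.ceil(util * self.max_pstate)
            target = max(self.min_pstate, min(self.max_pstate, target))
            if target != self.current:  # leave the hardware alone if unchanged
                self.current = target
                self.writes.append(target)

Calling ``update()`` twice within 10 ms leaves the second call without effect,
and a run that yields the current P-state again appends nothing to ``writes``,
which corresponds to leaving the hardware configuration untouched.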

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.

Passive Mode
------------

This is the default operation mode of ``intel_pstate`` for processors without
hardware-managed P-states (HWP) support. It is always used if the
``intel_pstate=passive`` argument is passed to the kernel in the command line
regardless of whether or not the given processor supports HWP. [Note that the
``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
if it is not combined with ``intel_pstate=active``.] As in the active mode
without HWP support, in this mode ``intel_pstate`` may refuse to work with
processors that are not recognized by it if HWP is prevented from being enabled
through the kernel command line.

If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
Then, the driver behaves like a regular ``CPUFreq`` scaling driver. That is,
it is invoked by generic scaling governors when necessary to talk to the
hardware in order to change the P-state of a CPU (in particular, the
``schedutil`` governor can invoke it directly from scheduler context).

While in this mode, ``intel_pstate`` can be used with all of the (generic)
scaling governors listed by the ``scaling_available_governors`` policy attribute
in ``sysfs`` (and the P-state selection algorithms described above are not
used). It is then responsible for the configuration of policy objects
corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
governors attached to the policy objects) with accurate information on the
maximum and minimum operating frequencies supported by the hardware (including
the so-called "turbo" frequency ranges). In other words, in the passive mode
the entire range of available P-states is exposed by ``intel_pstate`` to the
``CPUFreq`` core. However, in this mode the driver does not register
utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
information comes from the ``CPUFreq`` core (and is the last frequency selected
by the current scaling governor for the given policy).
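
Because the ``scaling_driver`` string differs between the two modes
(``intel_pstate`` in the active mode and ``intel_cpufreq`` in the passive mode),
the mode in effect can be inferred from ``sysfs``. The Python sketch below
assumes the usual ``/sys/devices/system/cpu/cpufreq/policyN`` layout of the
``CPUFreq`` core; the helper name ``detect_mode()`` is hypothetical, and the
``cpufreq_root`` parameter exists only so that the function can be pointed at a
test directory.

.. code-block:: python

    from pathlib import Path

    def detect_mode(cpufreq_root: str = "/sys/devices/system/cpu/cpufreq") -> dict:
        """Map each policy directory to the intel_pstate mode it reports."""
        modes = {}
        for policy in sorted(Path(cpufreq_root).glob("policy*")):
            driver = (policy / "scaling_driver").read_text().strip()
            if driver == "intel_pstate":
                modes[policy.name] = "active"
            elif driver == "intel_cpufreq":
                modes[policy.name] = "passive"
            else:
                modes[policy.name] = "other: " + driver  # some other driver
        return modes

Since ``intel_pstate`` uses one policy per logical CPU, the result has one
entry per online CPU on a system running this driver.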


.. _turbo:

Turbo P-states Support
======================

In the majority of cases, the entire range of P-states available to
``intel_pstate`` can be divided into two sub-ranges that correspond to
different types of processor behavior, above and below a boundary that
will be referred to as the "turbo threshold" in what follows.

The P-states above the turbo threshold are referred to as "turbo P-states" and
the whole sub-range of P-states they belong to is referred to as the "turbo
range". These names are related to the Turbo Boost technology allowing a
multicore processor to opportunistically increase the P-state of one or more
cores if there is enough power to do that and if that is not going to cause the
thermal envelope of the processor package to be exceeded.

Specifically, if software sets the P-state of a CPU core within the turbo range
(that is, above the turbo threshold), the processor is permitted to take over
performance scaling control for that core and put it into turbo P-states of its
choice going forward. However, that permission is interpreted differently by
different processor generations. Namely, the Sandy Bridge generation of
processors will never use any P-states above the last one set by software for
the given core, even if it is within the turbo range, whereas all of the later
processor generations will take it as a license to use any P-states from the
turbo range, even above the one set by software. In other words, on those later
processors setting any P-state from the turbo range will enable the processor
to put the given core into all turbo P-states up to and including the maximum
supported one as it sees fit.
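
The generation-dependent interpretation of a turbo request can be condensed into
a small Python function. ``max_allowed_pstate()`` is hypothetical and returns
only the upper bound the processor is permitted to use for the core; as
explained below, turbo P-states are not sustainable, so the P-state actually
used may be lower. The sketch also ignores the influence of other CPUs in the
same package.

.. code-block:: python

    def max_allowed_pstate(requested: int, turbo_threshold: int,
                           max_turbo: int, sandy_bridge: bool) -> int:
        """Highest P-state the processor may pick for a core after a request.

        Hypothetical model of the rules above; ignores other CPUs in the package.
        """
        if requested <= turbo_threshold:
            return requested  # not a turbo request: no license to go higher
        if sandy_bridge:
            return requested  # Sandy Bridge never exceeds the last requested P-state
        return max_turbo      # later generations: the whole turbo range is allowed

With a turbo threshold of 34 and a maximum turbo P-state of 42 (illustrative
numbers), a request for 38 yields 38 on Sandy Bridge and 42 on later
generations.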

One important property of turbo P-states is that they are not sustainable. More
precisely, there is no guarantee that any CPUs will be able to stay in any of
those states indefinitely, because the power distribution within the processor
package may change over time or the thermal envelope it was designed for might
be exceeded if a turbo P-state was used for too long.

In turn, the P-states below the turbo threshold generally are sustainable. In
fact, if one of them is set by software, the processor is not expected to change
it to a lower one unless it is under thermal stress or a power limit is violated
(a higher P-state may still be used if it is set for another CPU in the same
package at the same time, for example).

Some processors allow multiple cores to be in turbo P-states at the same time,
but the maximum P-state that can be set for them generally depends on the number
of cores running concurrently. The maximum turbo P-state that can be set for 3
cores at the same time is usually lower than the analogous maximum P-state for
2 cores, which in turn is usually lower than the maximum turbo P-state that can
be set for 1 core. The one-core maximum turbo P-state is thus the maximum
supported one overall.
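
On many Intel Core processors the per-core-count turbo limits described above
are reported in ``MSR_TURBO_RATIO_LIMIT`` (address ``0x1AD``), in which byte N
holds the maximum ratio with N + 1 active cores. That layout is an assumption
of the Python sketch below and does not hold for every model (check the Intel
Software Developer’s Manual [2]_ for the processor at hand); the function name
is hypothetical and it only decodes a raw 64-bit value that has already been
read.

.. code-block:: python

    def turbo_ratio_limits(msr_value: int, ncores: int) -> dict:
        """Map active-core count -> maximum turbo ratio (assumed byte layout)."""
        if not 1 <= ncores <= 8:
            raise ValueError("a 64-bit value holds at most 8 one-byte limits")
        # Byte n (bits 8n+7:8n) is assumed to hold the limit for n + 1 active cores
        return {n + 1: (msr_value >> (8 * n)) & 0xFF for n in range(ncores)}

For instance, ``turbo_ratio_limits(0x24252627, 4)`` gives
``{1: 39, 2: 38, 3: 37, 4: 36}``, so the one-core limit is the largest one, as
stated above.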

The maximum supported turbo P-state, the turbo threshold (the maximum supported
non-turbo P-state) and the minimum supported P-state are specific to the
processor model and can be determined by reading the processor's model-specific
registers (MSRs). Moreover, some processors support the Configurable TDP
(Thermal Design Power) feature and, when that feature is enabled, the turbo
threshold effectively becomes a configurable value that can be set by the
platform firmware.
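
As an illustration of the MSR-based approach, on many Intel Core processors the
turbo threshold and the minimum P-state can be read from ``MSR_PLATFORM_INFO``
(address ``0xCE``): bits 15:8 hold the maximum non-turbo ratio and bits 47:40
hold the maximum efficiency ratio, commonly used as the minimum P-state. The
Python sketch below assumes that layout and the ``/dev/cpu/N/msr`` interface of
the ``msr`` driver (root privileges and the ``msr`` module are required). The
helper names are hypothetical, and with Configurable TDP enabled the effective
threshold may differ from the value decoded here.

.. code-block:: python

    import os

    MSR_PLATFORM_INFO = 0xCE

    def read_msr(cpu: int, reg: int) -> int:
        """Read a 64-bit MSR via the msr driver (needs root and the msr module)."""
        fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
        try:
            # The msr driver uses the register address as the file offset
            return int.from_bytes(os.pread(fd, 8, reg), "little")
        finally:
            os.close(fd)

    def decode_platform_info(value: int) -> tuple:
        """Return (minimum P-state, turbo threshold) under the assumed layout."""
        max_non_turbo = (value >> 8) & 0xFF    # bits 15:8
        max_efficiency = (value >> 40) & 0xFF  # bits 47:40
        return max_efficiency, max_non_turbo

On a real system, ``decode_platform_info(read_msr(0, MSR_PLATFORM_INFO))``
returns the two ratios for CPU 0; the decoding step alone can be tried on any
value, e.g. ``decode_platform_info((8 << 40) | (34 << 8))`` gives ``(8, 34)``.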

Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).

Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
work as expected in all cases (that is, if set to disable turbo P-states, it
should always prevent ``intel_pstate`` from using them).
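
Toggling that attribute from a program amounts to writing ``1`` or ``0`` to
``/sys/devices/system/cpu/intel_pstate/no_turbo``. The Python sketch below is a
minimal helper under that assumption; the function name and the ``root``
parameter (useful for testing) are hypothetical, root privileges are needed on
a real system, and the write may be refused in some configurations.

.. code-block:: python

    from pathlib import Path

    def set_no_turbo(disable: bool,
                     root: str = "/sys/devices/system/cpu/intel_pstate") -> bool:
        """Write no_turbo and return True if turbo P-states are now disabled."""
        attr = Path(root) / "no_turbo"
        attr.write_text("1\n" if disable else "0\n")  # 1 means no turbo P-states
        return attr.read_text().strip() == "1"

Reading the attribute back after the write, as done here, confirms the setting
that is actually in effect.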


Processor Support
=================

To handle a given processor, ``intel_pstate`` needs to know a number of
different pieces of information about it, including:

* The minimum supported P-state.

* The maximum supported `non-turbo P-state <turbo_>`_.

* Whether or not turbo P-states are supported at all.

* The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
  are supported).

* The scaling formula to translate the driver's internal representation
  of P-states into frequencies and the other way around.
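
The items above can be collected in a small record together with a scaling
formula. In the Python sketch below the class and field names are hypothetical,
and the formula assumes a fixed step of 100 MHz per P-state, which is common on
Intel Core processors but is not universal (which is why the driver needs
per-model knowledge). Frequencies are in kHz, the unit used by ``CPUFreq``.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class PStateInfo:
        min_pstate: int
        max_non_turbo_pstate: int                 # the turbo threshold
        max_one_core_turbo_pstate: Optional[int]  # None if turbo is unsupported
        khz_per_step: int = 100_000               # assumed: 100 MHz per P-state

        @property
        def turbo_supported(self) -> bool:
            return self.max_one_core_turbo_pstate is not None

        def pstate_to_khz(self, pstate: int) -> int:
            return pstate * self.khz_per_step

        def khz_to_pstate(self, khz: int) -> int:
            # Round up so the P-state is at least as fast as the requested frequency
            return -(-khz // self.khz_per_step)

For example, ``PStateInfo(8, 34, 42).pstate_to_khz(34)`` returns ``3400000``
(3.4 GHz), and a request for 3350000 kHz maps back to P-state 34.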

Generally, ways to obtain that information are specific to the processor model
or family. Although it is often possible to obtain all of it from the processor
itself (using model-specific registers), in some cases hardware manuals have to
be consulted as well.

For this reason, there is a list of supported processors in ``intel_pstate`` and
the driver initialization will fail if the detected processor is not in that
list, unless it supports the HWP feature. [The interface to obtain all of the
information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]
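
Whether a processor advertises HWP can be checked from user space: on x86 the
``hwp`` flag appears in the ``flags`` lines of ``/proc/cpuinfo`` when the
feature is supported. The Python helper below is a hypothetical sketch that
parses such text; the flag only reports support, while whether the driver
enables HWP also depends on the kernel command line, as described earlier.

.. code-block:: python

    def hwp_supported(cpuinfo_text: str) -> bool:
        """Return True if the first "flags" line lists the hwp feature."""
        for line in cpuinfo_text.splitlines():
            if line.startswith("flags"):
                # Format: "flags\t\t: fpu vme ... hwp hwp_epp ..."
                return "hwp" in line.split(":", 1)[1].split()
        return False

Typical use is ``hwp_supported(open("/proc/cpuinfo").read())``. Matching whole
words matters here, because related flags such as ``hwp_epp`` share the
prefix.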


User Space Interface in ``sysfs``
=================================

Global Attributes
-----------------

``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs.

Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
argument is passed to the kernel in the command line.

``max_perf_pct``
	Maximum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present on the kernel
	command line.

``min_perf_pct``
	Minimum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present on the kernel
	command line.

``num_pstates``
	Number of P-states supported by the processor (between 0 and 255
	inclusive) including both turbo and non-turbo P-states (see
	`Turbo P-states Support`_).

	The value of this attribute is not affected by the ``no_turbo``
	setting described `below <no_turbo_attr_>`_.

	This attribute is read-only.

``turbo_pct``
	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
	range of supported P-states, in percent.

	This attribute is read-only.

.. _no_turbo_attr:

``no_turbo``
	If set (equal to 1), the driver is not allowed to set any turbo P-states
	(see `Turbo P-states Support`_). If unset (equal to 0, which is the
	default), turbo P-states can be set by the driver.
	[Note that ``intel_pstate`` does not support the general ``boost``
	attribute (supported by some other scaling drivers); this attribute
	replaces it.]

	This attribute does not affect the maximum supported frequency value
	supplied to the ``CPUFreq`` core and exposed via the policy interface,
	but it affects the maximum possible value of per-policy P-state limits
	(see `Interpretation of Policy Attributes`_ below for details).

``hwp_dynamic_boost``
	This attribute is only present if ``intel_pstate`` works in the
	`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
	the processor. If set (equal to 1), it causes the minimum P-state limit
	to be increased dynamically for a short time whenever a task previously
	waiting on I/O is selected to run on a given logical CPU (the purpose
	of this mechanism is to improve performance).

	This setting has no effect on logical CPUs whose minimum P-state limit
	is directly set to the highest non-turbo P-state or above it.

.. _status_attr:

``status``
	Operation mode of the driver: "active", "passive" or "off".

	"active"
		The driver is functional and in the `active mode
		<Active Mode_>`_.

	"passive"
		The driver is functional and in the `passive mode
		<Passive Mode_>`_.

	"off"
		The driver is not functional (it is not registered as a scaling
		driver with the ``CPUFreq`` core).

	This attribute can be written to in order to change the driver's
	operation mode or to unregister it. The string written to it must be
	one of the values listed above and, if the write is successful, the
	driver switches over to the operation mode represented by that string
	or, in the "off" case, is unregistered. [Actually, switching over from
	the active mode to the passive mode or the other way around causes the
	driver to be unregistered and registered again with a different set of
	callbacks, so all of its settings (the global as well as the per-policy
	ones) are then reset to their default values, possibly depending on the
	target operation mode.]

``energy_efficiency``
	This attribute is only present on platforms with CPUs matching the Kaby
	Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
	optimizations are disabled on these CPU models if HWP is enabled.
	Enabling energy-efficiency optimizations may limit the maximum operating
	frequency with or without the HWP feature. With HWP enabled, the
	optimizations are done only in the turbo frequency range. Without it,
	they are done in the entire available frequency range. Setting this
	attribute to "1" enables the energy-efficiency optimizations and setting
	it to "0" disables them.
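
The percentage-based limits and the read-only statistics above can be related
to P-state numbers with integer arithmetic. The following is a minimal sketch,
not the driver's code: the function names are hypothetical, the lowest, highest
non-turbo and highest turbo P-states are assumed to be known, rounding the
percentage limit up is an assumption, and the truncation used for
``turbo_pct`` may differ from the driver's own fixed-point arithmetic::

    # pct_to_pstate TURBO_MAX PCT: hypothetical helper. Converts a percentage
    # of the highest turbo P-state (as used by max_perf_pct and min_perf_pct)
    # into a P-state number, rounding up.
    pct_to_pstate() {
        echo $(( ($1 * $2 + 99) / 100 ))
    }

    # pstate_stats MIN MAX_NON_TURBO TURBO_MAX: hypothetical helper. Prints
    # num_pstates (the inclusive range MIN..TURBO_MAX) and turbo_pct
    # (100 minus the truncated non-turbo share of that range, in percent).
    pstate_stats() {
        local min=$1 max=$2 turbo=$3
        local total=$(( turbo - min + 1 ))
        local no_turbo=$(( max - min + 1 ))
        echo "num_pstates=$total turbo_pct=$(( 100 - 100 * no_turbo / total ))"
    }

For a hypothetical processor with P-states 8 to 40 and a highest non-turbo
P-state of 24, ``pstate_stats 8 24 40`` prints ``num_pstates=33 turbo_pct=49``
and ``pct_to_pstate 40 50`` prints ``20``.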

Interpretation of Policy Attributes
-----------------------------------

The interpretation of some ``CPUFreq`` policy attributes described in
:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
and it generally depends on the driver's `operation mode <Operation Modes_>`_.

First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
multiplier to the internal P-state representation used by ``intel_pstate``.
Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.
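
As a worked example, assume a hypothetical multiplier of 100000 kHz (100 MHz)
per P-state unit; the real multiplier is processor-specific. P-state 24 would
then be reported as 2400000 kHz, the unit used by the ``CPUFreq`` policy
attributes. A minimal sketch of the conversion in both directions, with
hypothetical function names (rounding down in the reverse direction is a choice
made for the sketch)::

    # pstate_to_khz PSTATE SCALING_KHZ: hypothetical helper. Frequency in kHz
    # corresponding to an internal P-state for a given multiplier.
    pstate_to_khz() {
        echo $(( $1 * $2 ))
    }

    # khz_to_pstate KHZ SCALING_KHZ: hypothetical helper. Inverse conversion,
    # rounding down to a whole P-state.
    khz_to_pstate() {
        echo $(( $1 / $2 ))
    }

With that multiplier, ``pstate_to_khz 24 100000`` prints ``2400000`` and
``khz_to_pstate 2450000 100000`` prints ``24``.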

If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
restored after unsetting ``no_turbo``, unless these attributes have been written
to after ``no_turbo`` was set.

If ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.
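
The capping described above can be sketched as follows, assuming the maximum
non-turbo and maximum turbo frequencies are known in kHz. The helper name and
its arguments are hypothetical, and the sketch does not model the restoring of
old values after ``no_turbo`` is unset::

    # cap_policy_freq NO_TURBO NON_TURBO_MAX_KHZ TURBO_MAX_KHZ REQUEST_KHZ:
    # hypothetical helper. Prints the value that a scaling_max_freq or
    # scaling_min_freq request would be capped to.
    cap_policy_freq() {
        local no_turbo=$1 non_turbo_max=$2 turbo_max=$3 req=$4
        local cap=$turbo_max
        # With no_turbo set, only non-turbo P-states may be used.
        if [ "$no_turbo" -eq 1 ]; then
            cap=$non_turbo_max
        fi
        if [ "$req" -gt "$cap" ]; then
            echo "$cap"
        else
            echo "$req"
        fi
    }

For example, with hypothetical limits of 2400000 kHz (non-turbo) and 3600000
kHz (turbo), a request for 3000000 kHz is kept as is when ``no_turbo`` is 0
and capped to 2400000 kHz when it is 1.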

Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:

``scaling_available_governors``
	List of P-state selection algorithms provided by ``intel_pstate``.

``scaling_governor``
	P-state selection algorithm provided by ``intel_pstate`` currently in
	use with the given policy.

``scaling_cur_freq``
	Frequency of the average P-state of the CPU represented by the given
	policy for the time interval between the last two invocations of the
	driver's utilization update callback by the CPU scheduler for that CPU.

One more policy attribute is present if the HWP feature is enabled in the
processor:

``base_frequency``
	Shows the base frequency of the CPU. Any frequency above this will be
	in the turbo frequency range.

The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
same as for other scaling drivers.

Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver. Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).
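
A script can therefore infer the operation mode from ``scaling_driver``. The
following is a minimal sketch; the helper name is hypothetical and the policy
directory is passed as an argument, with
``/sys/devices/system/cpu/cpufreq/policy0`` assumed as the default::

    # pstate_mode [POLICY_DIR]: hypothetical helper. Prints "active" or
    # "passive" based on the scaling_driver attribute, or "other" if a
    # different scaling driver (or none) is in use.
    pstate_mode() {
        local dir="${1:-/sys/devices/system/cpu/cpufreq/policy0}"
        case "$(cat "$dir/scaling_driver" 2>/dev/null)" in
            intel_pstate)  echo active ;;
            intel_cpufreq) echo passive ;;
            *)             echo other ;;
        esac
    }

Reading the ``status`` global attribute gives the same information directly,
and writing to it as root changes the mode, for example:
``echo passive > /sys/devices/system/cpu/intel_pstate/status``.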

Coordination of P-State Limits
------------------------------

``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes. The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:

1. All CPUs are affected by the global limits (that is, none of them can be
   requested to run faster than the global maximum and none of them can be
   requested to run slower than the global minimum).

2. Each individual CPU is affected by its own per-policy limits (that is, it
   cannot be requested to run faster than its own per-policy maximum and it
   cannot be requested to run slower than its own per-policy minimum). The
   effective performance depends on whether the platform supports per-core
   P-states, on whether hyper-threading is enabled and on current performance
   requests from other CPUs. When the platform does not support per-core
   P-states, the effective performance can be higher than the policy limits
   set on a CPU if other CPUs are requesting higher performance at that
   moment. Even with per-core P-states support, when hyper-threading is
   enabled, if a sibling CPU is requesting higher performance, the other
   siblings will get higher performance than their policy limits.

3. The global and per-policy limits can be set independently.
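
Rules 1 and 2 amount to intersecting the global range with the per-policy
range. The following is a minimal sketch in terms of P-state numbers; the
helper name is hypothetical, and letting the maximum win when the two ranges
do not overlap is an assumption of the sketch rather than documented
behavior::

    # effective_limits GMIN GMAX PMIN PMAX: hypothetical helper. Prints the
    # "min max" P-state range that may be requested for one CPU given the
    # global (G*) and per-policy (P*) limits.
    effective_limits() {
        local gmin=$1 gmax=$2 pmin=$3 pmax=$4
        # The upper bound is the lower of the two maxima ...
        local max=$(( gmax < pmax ? gmax : pmax ))
        # ... and the lower bound is the higher of the two minima.
        local min=$(( gmin > pmin ? gmin : pmin ))
        # Assumption: if the ranges do not overlap, the maximum wins.
        if [ "$min" -gt "$max" ]; then
            min=$max
        fi
        echo "$min $max"
    }

For example, ``effective_limits 10 30 16 40`` prints ``16 30``. As rule 2
notes, the performance actually delivered may still exceed the resulting
limits.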

In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
resulting effective values are written into hardware registers whenever the
limits change, in order to request the processor's internal P-state selection
logic to always set P-states within these limits. Otherwise, the limits are
taken into account by scaling governors (in the `passive mode <Passive Mode_>`_)
and by the driver every time before setting a new P-state for a CPU.

Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.


Energy vs Performance Hints
---------------------------

If the hardware-managed P-states (HWP) feature is enabled in the processor,
additional attributes are present in every ``CPUFreq`` policy directory in
``sysfs``. They are intended to allow user space to help ``intel_pstate``
adjust the processor's internal P-state selection logic by focusing it on
performance or on energy-efficiency, or somewhere between the two extremes.
They are:

``energy_performance_preference``
	Current value of the energy vs performance hint for the given policy
	(or the CPU represented by it).

	The hint can be changed by writing to this attribute.

``energy_performance_available_preferences``
	List of strings that can be written to the
	``energy_performance_preference`` attribute.

	They represent different energy vs performance hints and should be
	self-explanatory, except that ``default`` represents whatever hint
	value was set by the platform firmware.

Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob. If the EPP feature is present, it is also
possible to write an integer value between 0 and 255 (inclusive). If the EPP
feature is not present, writing an integer value to this attribute is not
supported; in that case, the
``/sys/devices/system/cpu/cpu*/power/energy_perf_bias`` interface can be used
instead.

[Note that tasks may be migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
set for those CPUs, that may lead to undesirable outcomes. To avoid such
issues it is better to set the same energy vs performance hint for all CPUs
or to pin every task potentially sensitive to them to a specific CPU.]
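
Following that advice, the same hint can be applied to every policy at once.
The following is a minimal sketch with a hypothetical helper name; it assumes
the policy directories match ``policy*`` under a cpufreq directory passed as an
argument (``/sys/devices/system/cpu/cpufreq`` by default) and accepts either
one of the listed preference strings or an integer from 0 to 255 (integers are
only accepted by the kernel if EPP is supported, as explained above)::

    # set_epp_all HINT [CPUFREQ_DIR]: hypothetical helper. Writes HINT to
    # energy_performance_preference in every policy directory after checking
    # it against energy_performance_available_preferences (strings) or the
    # 0..255 range (integers). Returns 1 on an invalid hint or failed write.
    set_epp_all() {
        local hint=$1 root="${2:-/sys/devices/system/cpu/cpufreq}" p
        [ -n "$hint" ] || return 1
        for p in "$root"/policy*; do
            [ -d "$p" ] || continue
            case "$hint" in
                *[!0-9]*)
                    # Not a plain number: must be one of the listed strings.
                    grep -qwF -- "$hint" \
                        "$p/energy_performance_available_preferences" ||
                        return 1 ;;
                *)
                    [ "$hint" -le 255 ] || return 1 ;;
            esac
            echo "$hint" > "$p/energy_performance_preference" || return 1
        done
    }

For example, ``set_epp_all balance_power`` (run as root) sets the
``balance_power`` hint for all CPUs. The helper stops at the first failure, so
policies processed before it keep the new value.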

.. _acpi-cpufreq:

``intel_pstate`` vs ``acpi-cpufreq``
====================================

On the majority of systems supported by ``intel_pstate``, the ACPI tables
provided by the platform firmware contain ``_PSS`` objects returning information
that can be used for CPU performance scaling (refer to the ACPI specification
[3]_ for details on the ``_PSS`` objects and the format of the information
returned by them).

The information returned by the ACPI ``_PSS`` objects is used by the
``acpi-cpufreq`` scaling driver. On systems supported by ``intel_pstate``
the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
interface, but the set of P-states it can use is limited by the ``_PSS``
output.

On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one). By
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").

The list of P-states returned by ``_PSS`` is reflected by the table of
available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
scaling governors and the minimum and maximum supported frequencies reported by
it come from that list as well. In particular, given the special representation
of the turbo range described above, this means that the maximum supported
frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
affects decisions made by the scaling governors, except for ``powersave`` and
``performance``.

For example, if a given governor attempts to select a frequency proportional to
estimated CPU load and maps the load of 100% to the maximum supported frequency
(possibly multiplied by a constant), then it will tend to choose P-states below
the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
in that case the turbo range corresponds to a small fraction of the frequency
band it can use (1 MHz vs 1 GHz or more). In consequence, it will only go to
the turbo range for the highest loads and the other loads above 50% that might
benefit from running at turbo frequencies will be given non-turbo P-states
instead.
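
The effect can be illustrated with numbers. Assume, purely for illustration, a
processor whose highest non-turbo P-state corresponds to 2400 MHz and whose
highest turbo P-state corresponds to 3600 MHz, so that ``acpi-cpufreq`` reports
a maximum of 2401 MHz while ``intel_pstate`` reports 3600 MHz, and a governor
that maps load linearly onto the reported maximum (no extra constant). The
helper name and the values are hypothetical::

    # pick_khz LOAD_PCT MAX_KHZ: hypothetical helper modelling a governor
    # that maps load linearly onto the reported maximum frequency.
    pick_khz() {
        echo $(( $1 * $2 / 100 ))
    }

    # Assumed example values in kHz.
    NON_TURBO_MAX=2400000
    ACPI_MAX=$(( NON_TURBO_MAX + 1000 ))  # topmost _PSS entry: +1 MHz
    PSTATE_MAX=3600000                    # highest turbo P-state

    for load in 50 80 99; do
        echo "load=$load% acpi-cpufreq=$(pick_khz "$load" "$ACPI_MAX")" \
             "intel_pstate=$(pick_khz "$load" "$PSTATE_MAX")"
    done

At 80% load this picks 1920800 kHz with ``acpi-cpufreq`` but 2880000 kHz (a
turbo frequency) with ``intel_pstate``. With ``acpi-cpufreq`` only a load of
100% reaches the topmost ``_PSS`` entry, while with ``intel_pstate`` every load
of 67% or more lands in the turbo range in this example.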
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) One more issue related to that may appear on systems supporting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) `Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) turbo threshold. Namely, if that is not coordinated with the lists of P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) returned by ``_PSS`` properly, there may be more than one item corresponding to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) a turbo P-state in those lists and there may be a problem with avoiding the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) turbo range (if desirable or necessary). Usually, to avoid using turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) by ``_PSS``, but that is not sufficient when there are other turbo P-states in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) the list returned by it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) `passive mode <Passive Mode_>`_, except that the number of P-states it can set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) is limited to the ones listed by the ACPI ``_PSS`` objects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) Kernel Command Line Options for ``intel_pstate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) ================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) Several kernel command line options can be used to pass early-configuration-time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) parameters to ``intel_pstate`` in order to enforce specific behavior of it. All
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) of them have to be prepended with the ``intel_pstate=`` prefix.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) ``disable``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) Do not register ``intel_pstate`` as the scaling driver even if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) processor is supported by it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) ``active``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) ``passive``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) start with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) ``force``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) Register ``intel_pstate`` as the scaling driver instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) ``acpi-cpufreq`` even if the latter is preferred on the given system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) This may prevent some platform features (such as thermal controls and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) power capping) that rely on the availability of ACPI P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) information from functioning as expected, so it should be used with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) caution.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) This option does not work with processors that are not supported by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) ``intel_pstate`` and on platforms where the ``pcc-cpufreq`` scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) driver is used instead of ``acpi-cpufreq``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) ``no_hwp``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) Do not enable the hardware-managed P-states (HWP) feature even if it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) supported by the processor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) ``hwp_only``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) Register ``intel_pstate`` as the scaling driver only if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) hardware-managed P-states (HWP) feature is supported by the processor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679)
``support_acpi_ppc``
	Take ACPI ``_PPC`` performance limits into account.

	If the preferred power management profile in the FADT (Fixed ACPI
	Description Table) is set to "Enterprise Server" or "Performance
	Server", the ACPI ``_PPC`` limits are taken into account by default
	and this option has no effect.

``per_cpu_perf_limits``
	Use per-logical-CPU P-state limits (see `Coordination of P-state
	Limits`_ for details).

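Whether these options are in effect can be checked from user space. The
following shell sketch is not part of any kernel tooling, and its function
names (``intel_pstate_opts``, ``has_hwp``, ``ppc_by_default`` and
``scaling_driver``) are hypothetical. It assumes that each option is passed in
a separate ``intel_pstate=`` argument, that the ``hwp`` flag in
``/proc/cpuinfo`` indicates HWP support, and that
``/sys/firmware/acpi/pm_profile`` exposes the FADT preferred power management
profile as a number (4 for "Enterprise Server", 7 for "Performance Server")::

	#!/bin/sh
	# Print the value of every intel_pstate= argument in a kernel command
	# line, one per line.  The command line is $1 if given, else /proc/cmdline.
	intel_pstate_opts() {
	    if [ $# -gt 0 ]; then printf '%s\n' "$1"; else cat /proc/cmdline; fi |
	        tr -s ' \t' '\n\n' |             # one argument per line
	        sed -n 's/^intel_pstate=//p'     # keep only intel_pstate= values
	}

	# Succeed if the "hwp" flag appears in the cpuinfo file given as $1
	# (default /proc/cpuinfo), i.e. if the processor advertises HWP.
	has_hwp() {
	    grep -qsw hwp "${1:-/proc/cpuinfo}"
	}

	# Succeed if the FADT preferred PM profile read from pm_profile (path
	# optionally given as $1) is 4 ("Enterprise Server") or 7 ("Performance
	# Server"), in which case support_acpi_ppc has no effect.
	ppc_by_default() {
	    case $(cat "${1:-/sys/firmware/acpi/pm_profile}" 2>/dev/null) in
	        4|7) return 0 ;;
	        *)   return 1 ;;
	    esac
	}

	# Print the scaling driver registered for CPU0, e.g. "intel_pstate" or
	# "acpi-cpufreq"; prints nothing if no driver is registered.
	scaling_driver() {
	    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver 2>/dev/null
	}

For example, on a system booted with ``intel_pstate=no_hwp``,
``intel_pstate_opts`` prints ``no_hwp``, and ``has_hwp`` shows whether the
processor advertises the feature being disabled. If ``ppc_by_default``
succeeds, passing ``support_acpi_ppc`` is redundant. If ``scaling_driver``
still prints ``acpi-cpufreq`` after booting with ``intel_pstate=force``, the
option most likely did not take effect (in the passive mode, ``intel_cpufreq``
rather than ``intel_pstate`` is reported).
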

Diagnostics and Tuning
======================

Trace Events
------------

There are two static trace events that can be used for ``intel_pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.

The following sequence of shell commands can be used to enable them and see
their output (provided that the kernel is configured to support event tracing)::

	# cd /sys/kernel/debug/tracing/
	# echo 1 > events/power/pstate_sample/enable
	# echo 1 > events/power/cpu_frequency/enable
	# cat trace
	gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
	cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

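To make ``pstate_sample`` records easier to scan, the fields of interest can
be extracted with a short filter. The following shell and ``awk`` sketch (the
function name ``pstate_samples`` is hypothetical) assumes the record layout
shown above, with ``key=value`` pairs after the event name. It prints the CPU
number together with the ``from``, ``to`` and ``freq`` fields of each
``pstate_sample`` record and ignores all other lines::

	#!/bin/sh
	# Read trace output on stdin and print "[CPU] from=.. to=.. freq=.." for
	# every pstate_sample record; all other lines are ignored.
	pstate_samples() {
	    awk '/ pstate_sample: / {
	        cpu = ""
	        split("", v)                # portable way to empty array v
	        for (i = 1; i <= NF; i++) {
	            if (cpu == "" && $i ~ /^\[[0-9]+\]$/)
	                cpu = $i            # first "[NNN]" field is the CPU
	            else if (split($i, kv, "=") == 2)
	                v[kv[1]] = kv[2]    # remember key=value pairs
	        }
	        print cpu, "from=" v["from"], "to=" v["to"], "freq=" v["freq"]
	    }'
	}

For instance, ``pstate_samples < trace`` applied to the output above prints
``[001] from=26 to=26 freq=2474476``. Reading ``trace_pipe`` instead of
``trace`` follows new records as they arrive.
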
If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).

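Because the events to expect depend on the operation mode, it may be useful to
check the mode before enabling them. The shell sketch below (the function name
``expected_pstate_events`` is hypothetical) reads the ``status`` attribute in
``/sys/devices/system/cpu/intel_pstate/``, which reports ``active``,
``passive`` or ``off``. Its path can be overridden for testing. The function
prints the events to expect according to the rules above::

	#!/bin/sh
	# Print the trace events to expect, given the intel_pstate operation mode.
	# $1 optionally overrides the path of the status attribute.
	expected_pstate_events() {
	    status_file=${1:-/sys/devices/system/cpu/intel_pstate/status}
	    mode=$(cat "$status_file" 2>/dev/null) || mode=unknown
	    case $mode in
	        active)  echo "pstate_sample cpu_frequency" ;;
	        passive) echo "cpu_frequency" ;;  # from schedutil or the core
	        *)       echo "unknown" ;;        # "off" or no intel_pstate
	    esac
	}

In the passive mode ``cpu_frequency`` records still appear, but they come from
the governor or the ``CPUFreq`` core rather than from ``intel_pstate`` itself.
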
``ftrace``
----------

The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``. For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::

	# cd /sys/kernel/debug/tracing/
	# cat available_filter_functions | grep -i pstate
	intel_pstate_set_pstate
	intel_pstate_cpu_init
	...
	# echo intel_pstate_set_pstate > set_ftrace_filter
	# echo function > current_tracer
	# cat trace | head -15
	# tracer: function
	#
	# entries-in-buffer/entries-written: 80/80   #P:4
	#
	#                              _-----=> irqs-off
	#                             / _----=> need-resched
	#                            | / _---=> hardirq/softirq
	#                            || / _--=> preempt-depth
	#                            ||| /     delay
	#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
	#              | |       |   ||||       |         |
	            Xorg-3129  [000] ..s.  2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
	 gnome-terminal--4510  [002] ..s.  2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
	     gnome-shell-3409  [001] ..s.  2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
	          <idle>-0     [000] ..s.  2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func

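To turn such a trace into a per-CPU call count, the records can be tallied
with ``awk``. The following shell sketch (the function name
``count_set_pstate`` is hypothetical) assumes the ``function`` tracer output
format shown above. It skips the ``#`` header lines and prints one
``[CPU] count`` line per CPU::

	#!/bin/sh
	# Count intel_pstate_set_pstate entries per CPU in function tracer
	# output read from stdin.  Header lines starting with "#" are skipped.
	count_set_pstate() {
	    awk '
	    /^#/ { next }
	    / intel_pstate_set_pstate / {
	        for (i = 1; i <= NF; i++)
	            if ($i ~ /^\[[0-9]+\]$/) {      # the "[NNN]" CPU field
	                n[$i]++
	                break
	            }
	    }
	    END { for (c in n) print c, n[c] }' | LC_ALL=C sort
	}

Applied to the four records above, it prints ``[000] 2``, ``[001] 1`` and
``[002] 1``. Dividing the counts by the time between the first and the last
``TIMESTAMP`` gives an approximate call rate.
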

References
==========

.. [1] Kristen Accardi, *Balancing Power and Performance in the Linux Kernel*,
       https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

.. [2] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

.. [3] *Advanced Configuration and Power Interface Specification*,
       https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf