Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) .. include:: <isonum.txt>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) ===============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ``intel_pstate`` CPU Performance Scaling Driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) ===============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) :Copyright: |copy| 2017 Intel Corporation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) :Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) General Information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) ``intel_pstate`` is a part of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) :doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) (``CPUFreq``).  It is a scaling driver for the Sandy Bridge and later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) generations of Intel processors.  Note, however, that some of those processors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) may not be supported.  [To understand ``intel_pstate`` it is necessary to know
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) you have not done that yet.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) For the processors supported by ``intel_pstate``, the P-state concept is broader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) than just an operating frequency or an operating performance point (see the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) LinuxCon Europe 2015 presentation by Kristen Accardi [1]_ for more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) information about that).  For this reason, the representation of P-states used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) by ``intel_pstate`` internally follows the hardware specification (for details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) refer to Intel Software Developer’s Manual [2]_).  However, the ``CPUFreq`` core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) uses frequencies for identifying operating performance points of CPUs and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) frequencies are involved in the user space interface exposed by it, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) ``intel_pstate`` maps its internal representation of P-states to frequencies too
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) (fortunately, that mapping is unambiguous).  At the same time, it would not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) available frequencies due to the possible size of it, so the driver does not do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) that.  Some functionality of the core is limited by that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) Since the hardware P-state selection interface used by ``intel_pstate`` is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) available at the logical CPU level, the driver always works with individual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) CPUs.  Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) equivalent to CPUs.  In particular, this means that they become "inactive" every
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) time the corresponding CPU is taken offline and need to be re-initialized when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) it goes back online.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) ``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) only way to pass early-configuration-time parameters to it is via the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) command line.  However, its configuration can be adjusted via ``sysfs`` to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) great extent.  In some configurations it is even possible to unregister it via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) ``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) registered (see `below <status_attr_>`_).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) Operation Modes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) ``intel_pstate`` can operate in two different modes, active or passive.  In the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) active mode, it uses its own internal performance scaling governor algorithm or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) allows the hardware to do performance scaling by itself, while in the passive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) mode it responds to requests made by a generic ``CPUFreq`` governor implementing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) a certain performance scaling algorithm.  Which of them will be in effect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) depends on what kernel command line options are used and on the capabilities of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) the processor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) Active Mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) This is the default operation mode of ``intel_pstate`` for processors with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) hardware-managed P-states (HWP) support.  If it works in this mode, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) contains the string "intel_pstate".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) provides its own scaling algorithms for P-state selection.  Those algorithms
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) can be applied to ``CPUFreq`` policies in the same way as generic scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) governors (that is, through the ``scaling_governor`` policy attribute in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) ``sysfs``).  [Note that different P-state selection algorithms may be chosen for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) different policies, but that is not recommended.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) They are not generic scaling governors, but their names are the same as the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) names of some of those governors.  Moreover, confusingly enough, they generally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) do not work in the same way as the generic governors they share the names with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) For example, the ``powersave`` P-state selection algorithm provided by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) ``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) (roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) There are two P-state selection algorithms provided by ``intel_pstate`` in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) active mode: ``powersave`` and ``performance``.  The way they both operate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) depends on whether or not the hardware-managed P-states (HWP) feature has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) enabled in the processor and possibly on the processor model.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) Which of the P-state selection algorithms is used by default depends on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) Namely, if that option is set, the ``performance`` algorithm will be used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) default, and the other one will be used by default if it is not set.
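
The rule above can be sketched in a few lines of Python.  This is only an
illustration, not driver code: it assumes the text of a kernel ``.config``
file is available as a string, and the function name is hypothetical.

```python
OPTION = "CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE"


def default_algorithm(config_text: str) -> str:
    """Return the default active-mode algorithm implied by a kernel config.

    An option that is set appears as ``OPTION=y``; an unset option appears
    as a comment (``# OPTION is not set``) or not at all.
    """
    for line in config_text.splitlines():
        if line.strip() == f"{OPTION}=y":
            return "performance"
    # The option is not set, so the other algorithm is the default.
    return "powersave"
```

The result only describes the default; the ``scaling_governor`` policy
attribute can still select the other algorithm at run time.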
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) Active Mode With HWP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) ~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) If the processor supports the HWP feature, it will be enabled during the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) processor initialization and cannot be disabled after that.  It is possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) kernel in the command line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) select P-states by itself, but still it can give hints to the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) internal P-state selection logic.  What those hints are depends on which P-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) selection algorithm has been applied to the given policy (or to the CPU it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) corresponds to).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) Even though the P-state selection is carried out by the processor automatically,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) ``intel_pstate`` registers utilization update callbacks with the CPU scheduler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) in this mode.  However, they are not used for running a P-state selection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) algorithm, but for periodic updates of the current CPU frequency information to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) HWP + ``performance``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) .....................
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) In this configuration ``intel_pstate`` will write 0 to the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Energy-Performance Preference (EPP) knob (if supported) or its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) internal P-state selection logic is expected to focus entirely on performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) This will override the EPP/EPB setting coming from the ``sysfs`` interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) (see `Energy vs Performance Hints`_ below).  Moreover, any attempts to change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) configuration will be rejected.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) Also, in this configuration the range of P-states available to the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) internal P-state selection logic is always restricted to the upper boundary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) (that is, the maximum P-state that the driver is allowed to use).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) HWP + ``powersave``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) ...................
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) In this configuration ``intel_pstate`` will set the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) Energy-Performance Preference (EPP) knob (if supported) or its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) previously set to via ``sysfs`` (or whatever default value it was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) set to by the platform firmware).  This usually causes the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) internal P-state selection logic to be less performance-focused.
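
The difference between the two HWP configurations can be contrasted in a toy
Python model.  It is a sketch under stated assumptions rather than a
description of the driver's code: the class and method names are hypothetical,
a single integer stands in for either the EPP or the EPB value, and treating a
write of 0 under ``performance`` as a no-op is an assumption of the model.

```python
class HwpHintModel:
    """Toy model of the energy-performance hint handling (not driver code)."""

    PERFORMANCE_HINT = 0  # value written when ``performance`` is selected

    def __init__(self, saved_hint: int):
        # Value previously set via sysfs, or the platform firmware default.
        self.saved_hint = saved_hint
        self.algorithm = "powersave"

    def select_algorithm(self, name: str) -> None:
        if name not in ("powersave", "performance"):
            raise ValueError(f"unknown algorithm: {name}")
        self.algorithm = name

    def effective_hint(self) -> int:
        """Hint the processor sees under the currently selected algorithm."""
        if self.algorithm == "performance":
            return self.PERFORMANCE_HINT
        return self.saved_hint

    def write_hint(self, value: int) -> None:
        """Model a hint change requested via sysfs."""
        if self.algorithm == "performance":
            if value != self.PERFORMANCE_HINT:
                raise ValueError("only 0 is accepted under 'performance'")
            return  # assumption: writing 0 changes nothing in this model
        self.saved_hint = value
```

Switching the model back to ``powersave`` makes the previously saved hint
effective again, which mirrors the behavior described for that configuration.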
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) Active Mode Without HWP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) ~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) This operation mode is optional for processors that do not support the HWP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) the command line.  The active mode is used in those cases if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) ``intel_pstate=active`` argument is passed to the kernel in the command line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) In this mode ``intel_pstate`` may refuse to work with processors that are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) recognized by it.  [Note that ``intel_pstate`` will never refuse to work with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) any processor with the HWP feature enabled.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) In this mode ``intel_pstate`` registers utilization update callbacks with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) CPU scheduler in order to run a P-state selection algorithm, either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) ``powersave`` or ``performance``, depending on the ``scaling_governor`` policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) setting in ``sysfs``.  The current CPU frequency information to be made
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) periodically updated by those utilization update callbacks too.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) ``performance``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) ...............
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) Without HWP, this P-state selection algorithm is always the same regardless of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) the processor model and platform configuration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) It selects the maximum P-state it is allowed to use, subject to limits set via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) ``sysfs``, every time the driver configuration for the given CPU is updated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) (e.g. via ``sysfs``).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) This is the default P-state selection algorithm if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) is set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) ``powersave``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) .............
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) Without HWP, this P-state selection algorithm is similar to the algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) implemented by the generic ``schedutil`` scaling governor except that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) utilization metric used by it is based on numbers coming from feedback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) registers of the CPU.  It generally selects P-states proportional to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) current CPU utilization.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) This algorithm is run by the driver's utilization update callback for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) given CPU when it is invoked by the CPU scheduler, but not more often than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) every 10 ms.  Like in the ``performance`` case, the hardware configuration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) is not touched if the new P-state turns out to be the same as the current
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) one.
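
The callback behavior described above can be sketched as follows.  This is a
simplified illustration, not the driver's algorithm: the proportional formula,
the class name and the ``hw_writes`` counter are assumptions, and the
utilization is passed in as a fraction between 0 and 1 instead of being
derived from CPU feedback registers.

```python
MIN_INTERVAL_NS = 10_000_000  # the algorithm runs at most every 10 ms


class PowersaveSketch:
    """Utilization-proportional P-state selection with a rate limit."""

    def __init__(self, min_pstate: int, max_pstate: int):
        self.min_pstate = min_pstate
        self.max_pstate = max_pstate
        self.current = min_pstate
        self.last_run_ns = None
        self.hw_writes = 0  # counts simulated hardware updates

    def update(self, now_ns: int, utilization: float) -> int:
        """Model one utilization update callback invocation."""
        if (self.last_run_ns is not None
                and now_ns - self.last_run_ns < MIN_INTERVAL_NS):
            return self.current  # invoked too soon, algorithm not run
        self.last_run_ns = now_ns

        # Assumed formula: scale the maximum P-state by the utilization,
        # then clamp the result to the allowed range.
        target = round(self.max_pstate * utilization)
        target = max(self.min_pstate, min(self.max_pstate, target))

        if target != self.current:
            # The hardware is only touched when the P-state changes.
            self.current = target
            self.hw_writes += 1
        return self.current
```

Calling ``update()`` twice within 10 ms, or twice with the same resulting
P-state, leaves ``hw_writes`` unchanged.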
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) This is the default P-state selection algorithm if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) is not set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) Passive Mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) This is the default operation mode of ``intel_pstate`` for processors without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) hardware-managed P-states (HWP) support.  It is always used if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) ``intel_pstate=passive`` argument is passed to the kernel in the command line
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) regardless of whether or not the given processor supports HWP.  [Note that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) ``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) if it is not combined with ``intel_pstate=active``.]  Like in the active mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) without HWP support, in this mode ``intel_pstate`` may refuse to work with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) processors that are not recognized by it if HWP is prevented from being enabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) through the kernel command line.
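
The mode selection rules given so far can be summarized in a short Python
sketch.  It is not driver code: the function names are hypothetical, each
option is assumed to be passed as a separate ``intel_pstate=`` argument, and
giving ``passive`` priority when it is combined with ``active`` is an
assumption based on the statement that the passive mode is always used when
requested.

```python
def intel_pstate_args(cmdline: str) -> set:
    """Collect the values of every ``intel_pstate=`` argument."""
    prefix = "intel_pstate="
    return {tok[len(prefix):] for tok in cmdline.split()
            if tok.startswith(prefix)}


def expected_mode(cmdline: str, cpu_supports_hwp: bool) -> str:
    """Return the operation mode the driver is expected to start in."""
    args = intel_pstate_args(cmdline)

    # ``intel_pstate=passive`` wins regardless of HWP support.
    if "passive" in args:
        return "passive"

    # HWP is enabled unless it is unsupported or ``no_hwp`` was passed.
    if cpu_supports_hwp and "no_hwp" not in args:
        return "active with HWP"

    # Without HWP the active mode has to be requested explicitly.
    if "active" in args:
        return "active without HWP"
    return "passive"
```

For example, ``expected_mode("intel_pstate=no_hwp", True)`` yields
``"passive"``, while adding ``intel_pstate=active`` to the same command line
yields ``"active without HWP"``.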
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) If the driver works in this mode, the ``scaling_driver`` policy attribute in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) ``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) Then, the driver behaves like a regular ``CPUFreq`` scaling driver.  That is,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) it is invoked by generic scaling governors when necessary to talk to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) hardware in order to change the P-state of a CPU (in particular, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) ``schedutil`` governor can invoke it directly from scheduler context).
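
Since the ``scaling_driver`` string differs between the two modes, user space
can tell them apart by reading that attribute.  The following Python sketch
does that; the per-policy location under ``/sys/devices/system/cpu/cpufreq/``
is assumed here rather than stated in this document, and the function names
are illustrative.

```python
from pathlib import Path

# Strings reported by the ``scaling_driver`` attribute in each mode.
MODE_BY_DRIVER = {
    "intel_pstate": "active",
    "intel_cpufreq": "passive",
}


def classify_mode(scaling_driver: str) -> str:
    """Map a ``scaling_driver`` string to an intel_pstate operation mode."""
    return MODE_BY_DRIVER.get(scaling_driver.strip(), "not intel_pstate")


def read_modes(cpufreq_root: str = "/sys/devices/system/cpu/cpufreq") -> dict:
    """Return {policy name: mode} for every policy directory found.

    The default root is an assumption about the sysfs layout.
    """
    modes = {}
    for attr in sorted(Path(cpufreq_root).glob("policy*/scaling_driver")):
        modes[attr.parent.name] = classify_mode(attr.read_text())
    return modes


if __name__ == "__main__":
    for policy, mode in read_modes().items():
        print(f"{policy}: {mode}")
```

On a system where another scaling driver is registered, every policy is
reported as ``not intel_pstate``.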
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) While in this mode, ``intel_pstate`` can be used with all of the (generic)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) scaling governors listed by the ``scaling_available_governors`` policy attribute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) in ``sysfs`` (and the P-state selection algorithms described above are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) used).  Then, it is responsible for the configuration of policy objects
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) governors attached to the policy objects) with accurate information on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) maximum and minimum operating frequencies supported by the hardware (including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) the so-called "turbo" frequency ranges).  In other words, in the passive mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) the entire range of available P-states is exposed by ``intel_pstate`` to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) ``CPUFreq`` core.  However, in this mode the driver does not register
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) information comes from the ``CPUFreq`` core (and is the last frequency selected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) by the current scaling governor for the given policy).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) .. _turbo:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) Turbo P-states Support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) ======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) In the majority of cases, the entire range of P-states available to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) ``intel_pstate`` can be divided into two sub-ranges that correspond to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) different types of processor behavior, above and below a boundary that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) will be referred to as the "turbo threshold" in what follows.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) The P-states above the turbo threshold are referred to as "turbo P-states" and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) the whole sub-range of P-states they belong to is referred to as the "turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) range".  These names are related to the Turbo Boost technology allowing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) multicore processor to opportunistically increase the P-state of one or more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) cores if there is enough power to do that and if that is not going to cause the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) thermal envelope of the processor package to be exceeded.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) Specifically, if software sets the P-state of a CPU core within the turbo range
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) (that is, above the turbo threshold), the processor is permitted to take over
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) performance scaling control for that core and put it into turbo P-states of its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) choice going forward.  However, that permission is interpreted differently by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) different processor generations.  Namely, the Sandy Bridge generation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) processors will never use any P-states above the last one set by software for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) the given core, even if it is within the turbo range, whereas all of the later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) processor generations will take it as a license to use any P-states from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) turbo range, even above the one set by software.  In other words, on those
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) processors setting any P-state from the turbo range will enable the processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) to put the given core into all turbo P-states up to and including the maximum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) supported one as it sees fit.
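
The generational difference can be expressed as a small Python function.  It
is a sketch of the rule as stated above, with hypothetical parameter names; it
ignores power and thermal constraints as well as the influence of other CPUs
in the same package.

```python
def highest_pstate_hw_may_use(requested: int, turbo_threshold: int,
                              max_turbo: int, sandy_bridge: bool) -> int:
    """Return the highest P-state the processor may pick for one core.

    ``turbo_threshold`` is the maximum non-turbo P-state, so only values
    strictly above it belong to the turbo range.
    """
    if requested <= turbo_threshold:
        # Turbo range not requested: no permission to take over.
        return requested
    if sandy_bridge:
        # Sandy Bridge never goes above the last P-state set by software.
        return requested
    # Later generations may use any turbo P-state up to the maximum.
    return max_turbo
```

With a turbo threshold of 24 and a maximum turbo P-state of 36 (made-up
numbers), a request for 30 is capped at 30 on Sandy Bridge but allows up to 36
on later generations.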
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) One important property of turbo P-states is that they are not sustainable.  More
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) precisely, there is no guarantee that any CPUs will be able to stay in any of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) those states indefinitely, because the power distribution within the processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) package may change over time or the thermal envelope it was designed for might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) be exceeded if a turbo P-state was used for too long.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) In turn, the P-states below the turbo threshold generally are sustainable.  In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) fact, if one of them is set by software, the processor is not expected to change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) it to a lower one unless in a thermal stress or a power limit violation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) situation (a higher P-state may still be used if it is set for another CPU in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) the same package at the same time, for example).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) Some processors allow multiple cores to be in turbo P-states at the same time,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) but the maximum P-state that can be set for them generally depends on the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) of cores running concurrently.  The maximum turbo P-state that can be set for 3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) cores at the same time usually is lower than the analogous maximum P-state for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 2 cores, which in turn usually is lower than the maximum turbo P-state that can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) be set for 1 core.  The one-core maximum turbo P-state is thus the maximum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) supported one overall.
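
The dependence on the number of active cores can be illustrated with a lookup
table.  The values below are hypothetical and do not describe any real
processor; the helper names are illustrative as well.

```python
# Hypothetical maximum turbo P-states for 1, 2, 3 and 4 active cores.
HYPOTHETICAL_LIMITS = [39, 38, 37, 37]


def is_non_increasing(limits: list) -> bool:
    """Check the usual property: more active cores, no higher limit."""
    return all(a >= b for a, b in zip(limits, limits[1:]))


def turbo_limit(limits: list, active_cores: int) -> int:
    """Return the maximum turbo P-state for ``active_cores`` cores.

    Core counts beyond the table reuse its last entry (an assumption).
    """
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    return limits[min(active_cores, len(limits)) - 1]
```

In such a table the one-core entry is also the overall maximum, that is,
``turbo_limit(HYPOTHETICAL_LIMITS, 1) == max(HYPOTHETICAL_LIMITS)``.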
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) The maximum supported turbo P-state, the turbo threshold (the maximum supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) non-turbo P-state) and the minimum supported P-state are specific to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) processor model and can be determined by reading the processor's model-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) registers (MSRs).  Moreover, some processors support the Configurable TDP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) (Thermal Design Power) feature and, when that feature is enabled, the turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) threshold effectively becomes a configurable value that can be set by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) platform firmware.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) the entire range of available P-states, including the whole turbo range, to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) ``CPUFreq`` core and (in the passive mode) to generic scaling governors.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) generally causes turbo P-states to be set more often when ``intel_pstate`` is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) for more information).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) (even if the Configurable TDP feature is enabled in the processor), its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) ``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) work as expected in all cases (that is, if set to disable turbo P-states, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) always should prevent ``intel_pstate`` from using them).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) Processor Support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) To handle a given processor ``intel_pstate`` requires a number of different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) pieces of information on it to be known, including:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)  * The minimum supported P-state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)  * The maximum supported `non-turbo P-state <turbo_>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)  * Whether or not turbo P-states are supported at all.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314)  * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)    are supported).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)  * The scaling formula to translate the driver's internal representation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)    of P-states into frequencies and the other way around.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) Generally, ways to obtain that information are specific to the processor model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) or family.  Although it often is possible to obtain all of it from the processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) itself (using model-specific registers), there are cases in which hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) manuals need to be consulted to get to it too.
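
As an illustration of the last item in the list above, the Python sketch below
translates between P-states and frequencies.  It assumes a purely linear
scaling formula with a caller-supplied step in kHz; the actual formula is
processor-specific, and both the function names and the step used in the
example are hypothetical.

```python
def pstate_to_khz(pstate: int, scaling_khz: int) -> int:
    """Translate a P-state into a frequency, assuming linear scaling."""
    return pstate * scaling_khz


def khz_to_pstate(freq_khz: int, scaling_khz: int) -> int:
    """Translate a frequency back into a P-state, rounding down."""
    return freq_khz // scaling_khz
```

With an assumed step of 100000 kHz, P-state 24 maps to 2400000 kHz and that
frequency maps back to P-state 24, so the mapping is unambiguous in both
directions.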
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) For this reason, there is a list of supported processors in ``intel_pstate`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) the driver initialization will fail if the detected processor is not in that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) list, unless it supports the HWP feature.  [The interface to obtain all of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) information listed above is the same for all of the processors supporting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) HWP feature, which is why ``intel_pstate`` works with all of them.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) User Space Interface in ``sysfs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) =================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) Global Attributes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) ``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) control its functionality at the system level.  They are located in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) ``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) argument is passed on the kernel command line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) ``max_perf_pct``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 	Maximum P-state the driver is allowed to set in percent of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) 	maximum supported performance level (the highest supported `turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 	P-state <turbo_>`_).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) 	This attribute will not be exposed if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) 	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) 	command line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) ``min_perf_pct``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 	Minimum P-state the driver is allowed to set in percent of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) 	maximum supported performance level (the highest supported `turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) 	P-state <turbo_>`_).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) 	This attribute will not be exposed if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) 	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) 	command line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) ``num_pstates``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) 	Number of P-states supported by the processor (between 0 and 255
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 	inclusive) including both turbo and non-turbo P-states (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 	`Turbo P-states Support`_).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) 	The value of this attribute is not affected by the ``no_turbo``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 	setting described `below <no_turbo_attr_>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) 	This attribute is read-only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) ``turbo_pct``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) 	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 	range of supported P-states, in percent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) 	This attribute is read-only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 
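As a rough illustration of how ``num_pstates`` and ``turbo_pct`` relate, the
following Python sketch derives both from three P-state values.  The function
names and the example P-state numbers are hypothetical, and the driver's own
rounding may differ from the plain integer division used here.

```python
def num_pstates(min_pstate: int, max_turbo_pstate: int) -> int:
    """Count all P-states from the minimum up to the highest turbo one."""
    return max_turbo_pstate - min_pstate + 1


def turbo_pct(min_pstate: int, max_non_turbo_pstate: int,
              max_turbo_pstate: int) -> int:
    """Size of the turbo range as a percentage of the whole P-state range."""
    total = num_pstates(min_pstate, max_turbo_pstate)
    # P-states strictly above the highest non-turbo one form the turbo range.
    turbo_states = max_turbo_pstate - max_non_turbo_pstate
    return 100 * turbo_states // total


# Hypothetical processor: minimum 8, highest non-turbo 34, highest turbo 42.
print(num_pstates(8, 42), turbo_pct(8, 34, 42))
```

A processor without turbo P-states has equal non-turbo and turbo maxima in this
model, so the computed percentage is 0.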
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) .. _no_turbo_attr:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) ``no_turbo``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) 	If set (equal to 1), the driver is not allowed to set any turbo P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) 	(see `Turbo P-states Support`_).  If unset (equal to 0, which is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) 	default), turbo P-states can be set by the driver.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 	[Note that ``intel_pstate`` does not support the general ``boost``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) 	attribute (supported by some other scaling drivers) which is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) 	by this one.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) 	This attribute does not affect the maximum supported frequency value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) 	supplied to the ``CPUFreq`` core and exposed via the policy interface,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) 	but it affects the maximum possible value of per-policy P-state limits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) 	(see `Interpretation of Policy Attributes`_ below for details).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) ``hwp_dynamic_boost``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 	This attribute is only present if ``intel_pstate`` works in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) 	`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) 	the processor.  If set (equal to 1), it causes the minimum P-state limit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) 	to be increased dynamically for a short time whenever a task previously
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) 	waiting on I/O is selected to run on a given logical CPU (the purpose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) 	of this mechanism is to improve performance).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) 	This setting has no effect on logical CPUs whose minimum P-state limit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 	is directly set to the highest non-turbo P-state or above it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) .. _status_attr:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) ``status``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) 	Operation mode of the driver: "active", "passive" or "off".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) 	"active"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) 		The driver is functional and in the `active mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 		<Active Mode_>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) 	"passive"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) 		The driver is functional and in the `passive mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) 		<Passive Mode_>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) 	"off"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) 		The driver is not functional (it is not registered as a scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) 		driver with the ``CPUFreq`` core).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) 	This attribute can be written to in order to change the driver's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) 	operation mode or to unregister it.  The string written to it must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) 	one of its possible values and, if successful, the write will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) 	cause the driver to switch over to the operation mode represented by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) 	that string - or to be unregistered in the "off" case.  [Actually,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) 	switching over from the active mode to the passive mode or the other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) 	way around causes the driver to be unregistered and registered again
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) 	with a different set of callbacks, so all of its settings (the global
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) 	as well as the per-policy ones) are then reset to their default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) 	values, possibly depending on the target operation mode.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) ``energy_efficiency``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) 	This attribute is only present on platforms with CPUs matching the Kaby
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) 	Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 	optimizations are disabled on these CPU models if HWP is enabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) 	Enabling energy-efficiency optimizations may limit maximum operating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) 	frequency with or without the HWP feature.  With HWP enabled, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) 	optimizations are done only in the turbo frequency range.  Without it,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) 	they are done in the entire available frequency range.  Setting this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) 	attribute to "1" enables the energy-efficiency optimizations and setting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) 	it to "0" disables them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) Interpretation of Policy Attributes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) The interpretation of some ``CPUFreq`` policy attributes described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) :doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) and it generally depends on the driver's `operation mode <Operation Modes_>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) ``scaling_cur_freq`` attributes are produced by applying a processor-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) multiplier to the internal P-state representation used by ``intel_pstate``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) attributes are capped by the frequency corresponding to the maximum P-state that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) the driver is allowed to set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) ``scaling_min_freq`` to go down to that value if they were above it before.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) restored after unsetting ``no_turbo``, unless these attributes have been written
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) to after ``no_turbo`` was set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) If ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) which also is the value of ``cpuinfo_max_freq`` in either case.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) 
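The interaction between ``no_turbo`` and ``scaling_max_freq`` described above
can be sketched in Python as follows.  This is only a model of the documented
behavior, not driver code: the ``PolicyLimits`` class, its field names and the
example frequencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolicyLimits:
    """Hypothetical model of one policy's maximum frequency limit, in kHz."""
    max_non_turbo_khz: int
    max_turbo_khz: int
    requested_max_khz: int  # last value written to scaling_max_freq
    no_turbo: bool = False

    @property
    def cpuinfo_max_freq(self) -> int:
        # Always the maximum turbo frequency, whether or not no_turbo is set.
        return self.max_turbo_khz

    @property
    def scaling_max_freq(self) -> int:
        # The cap depends on no_turbo; the requested value is remembered, so
        # it becomes visible again once no_turbo is cleared.
        cap = self.max_non_turbo_khz if self.no_turbo else self.max_turbo_khz
        return min(self.requested_max_khz, cap)


policy = PolicyLimits(3_400_000, 4_200_000, requested_max_khz=4_200_000)
policy.no_turbo = True
print(policy.scaling_max_freq, policy.cpuinfo_max_freq)
policy.no_turbo = False
print(policy.scaling_max_freq)
```

Keeping the requested value separate from the reported one is what lets the old
limit reappear after ``no_turbo`` is unset, provided nothing was written in
between.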
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) Next, the following policy attributes have special meaning if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) ``intel_pstate`` works in the `active mode <Active Mode_>`_:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) ``scaling_available_governors``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) 	List of P-state selection algorithms provided by ``intel_pstate``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) ``scaling_governor``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) 	P-state selection algorithm provided by ``intel_pstate`` currently in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) 	use with the given policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) ``scaling_cur_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) 	Frequency of the average P-state of the CPU represented by the given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) 	policy for the time interval between the last two invocations of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) 	driver's utilization update callback by the CPU scheduler for that CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) One more policy attribute is present if the HWP feature is enabled in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) processor:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) ``base_frequency``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) 	Shows the base frequency of the CPU. Any frequency above this will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) 	in the turbo frequency range.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) same as for other scaling drivers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) depends on the operation mode of the driver.  Namely, it is either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) "intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) `passive mode <Passive Mode_>`_).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) Coordination of P-State Limits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) ``intel_pstate`` allows P-state limits to be set in two ways: with the help of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) <Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) ``CPUFreq`` policy attributes.  The coordination between those limits is based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) on the following rules, regardless of the current operation mode of the driver:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510)  1. All CPUs are affected by the global limits (that is, none of them can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511)     requested to run faster than the global maximum and none of them can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512)     requested to run slower than the global minimum).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514)  2. Each individual CPU is affected by its own per-policy limits (that is, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515)     cannot be requested to run faster than its own per-policy maximum and it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516)     cannot be requested to run slower than its own per-policy minimum). The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517)     effective performance depends on whether the platform supports per-core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518)     P-states, whether hyper-threading is enabled and on the current performance
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519)     requests from other CPUs. When the platform does not support per-core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520)     P-states, the effective performance can exceed the policy limits set on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521)     CPU if other CPUs are requesting higher performance at that moment. Even
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522)     with per-core P-states support, if hyper-threading is enabled and a sibling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523)     CPU is requesting higher performance, the other siblings will get higher
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524)     performance than their policy limits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526)  3. The global and per-policy limits can be set independently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) 
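The three rules above amount to intersecting the global and per-policy ranges.
A minimal Python sketch, assuming the global percentages are taken relative to
the maximum turbo frequency and that a minimum ending up above the maximum is
clamped down to it (the function name, parameters and that clamping step are
illustrative assumptions, not driver code):

```python
def effective_limits_khz(max_perf_pct: int, min_perf_pct: int,
                         scaling_max_khz: int, scaling_min_khz: int,
                         max_turbo_khz: int) -> tuple[int, int]:
    """Combine global (percent) and per-policy (kHz) limits for one CPU."""
    global_max = max_turbo_khz * max_perf_pct // 100
    global_min = max_turbo_khz * min_perf_pct // 100
    # Rules 1 and 2: neither the global nor the per-policy maximum may be
    # exceeded, and neither minimum may be undercut.
    eff_max = min(global_max, scaling_max_khz)
    eff_min = max(global_min, scaling_min_khz)
    # Assumption: keep the pair consistent if the minimum ends up too high.
    eff_min = min(eff_min, eff_max)
    return eff_min, eff_max


print(effective_limits_khz(80, 20, 4_000_000, 800_000, 4_200_000))
```

Because the two kinds of limits are only combined at this point, each can be
changed independently, as rule 3 states.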
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) resulting effective values are written into hardware registers whenever the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) limits change, to request that the processor's internal P-state selection logic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) always set P-states within these limits.  Otherwise, the limits are taken into account
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) every time before setting a new P-state for a CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) at all and the only way to set the limits is by using the policy attributes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) Energy vs Performance Hints
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) ---------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) If the hardware-managed P-states (HWP) feature is enabled in the processor,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) additional attributes, intended to allow user space to help ``intel_pstate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) adjust the processor's internal P-state selection logic by focusing it on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) performance or on energy-efficiency, or somewhere between the two extremes, are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) present in every ``CPUFreq`` policy directory in ``sysfs``.  They are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) ``energy_performance_preference``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) 	Current value of the energy vs performance hint for the given policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) 	(or the CPU represented by it).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) 	The hint can be changed by writing to this attribute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) ``energy_performance_available_preferences``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) 	List of strings that can be written to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) 	``energy_performance_preference`` attribute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) 	They represent different energy vs performance hints and should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) 	self-explanatory, except that ``default`` represents whatever hint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) 	value was set by the platform firmware.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) Strings written to the ``energy_performance_preference`` attribute are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) internally translated to integer values written to the processor's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) Energy-Performance Preference (EPP) knob (if supported) or its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) Energy-Performance Bias (EPB) knob. It is also possible to write an integer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) value between 0 and 255, if the EPP feature is present. If the EPP feature is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) not present, writing an integer value to this attribute is not supported. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) that case, the ``/sys/devices/system/cpu/cpu*/power/energy_perf_bias``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) interface can be used instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) 
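The translation of a written string into an EPP value can be sketched in Python
as shown below, assuming the EPP feature is present.  The hint names and the
numeric values in the table are assumptions made for illustration only; the
authoritative list of accepted strings on a given system is the content of
``energy_performance_available_preferences``, and ``default`` is left out
because it stands for whatever the platform firmware set.

```python
# Assumed hint names and EPP values, for illustration only.
ASSUMED_EPP_VALUES = {
    "performance": 0,
    "balance_performance": 128,
    "balance_power": 192,
    "power": 255,
}


def epp_value_for(text: str) -> int:
    """Translate a written string into an EPP value (EPP assumed present)."""
    text = text.strip()
    if text in ASSUMED_EPP_VALUES:
        return ASSUMED_EPP_VALUES[text]
    value = int(text)  # raises ValueError for anything that is not a number
    if not 0 <= value <= 255:
        raise ValueError("EPP value must be between 0 and 255")
    return value


print(epp_value_for("balance_power"), epp_value_for("64"))
```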
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) [Note that tasks may be migrated from one CPU to another by the scheduler's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) load-balancing algorithm and if different energy vs performance hints are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) set for those CPUs, that may lead to undesirable outcomes.  To avoid such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) issues it is better to set the same energy vs performance hint for all CPUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) or to pin every task potentially sensitive to them to a specific CPU.]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) .. _acpi-cpufreq:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) ``intel_pstate`` vs ``acpi-cpufreq``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) ====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) On the majority of systems supported by ``intel_pstate``, the ACPI tables
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) provided by the platform firmware contain ``_PSS`` objects returning information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) that can be used for CPU performance scaling (refer to the ACPI specification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) [3]_ for details on the ``_PSS`` objects and the format of the information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) returned by them).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) The information returned by the ACPI ``_PSS`` objects is used by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) ``acpi-cpufreq`` scaling driver.  On systems supported by ``intel_pstate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) interface, but the set of P-states it can use is limited by the ``_PSS``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) output.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) On those systems each ``_PSS`` object returns a list of P-states supported by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) the corresponding CPU which basically is a subset of the P-states range that can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) be used by ``intel_pstate`` on the same system, with one exception: the whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) `turbo range <turbo_>`_ is represented by one item in it (the topmost one).  By
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) than the frequency of the highest non-turbo P-state listed by it, but the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) corresponding P-state representation (following the hardware specification)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) returned for it matches the maximum supported turbo P-state (or is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) special value 255 meaning essentially "go as high as you can get").
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) The list of P-states returned by ``_PSS`` is reflected by the table of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) scaling governors and the minimum and maximum supported frequencies reported by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) it come from that list as well.  In particular, given the special representation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) of the turbo range described above, this means that the maximum supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) affects decisions made by the scaling governors, except for ``powersave`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) ``performance``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) For example, if a given governor attempts to select a frequency proportional to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) estimated CPU load and maps the load of 100% to the maximum supported frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) (possibly multiplied by a constant), then it will tend to choose P-states below
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) in that case the turbo range corresponds to a small fraction of the frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) band it can use (1 MHz vs 1 GHz or more).  In consequence, it will only go to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) the turbo range for the highest loads and the other loads above 50% that might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) benefit from running at turbo frequencies will be given non-turbo P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) 
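The arithmetic behind this example can be made concrete with a short Python
sketch.  It assumes a hypothetical governor that requests exactly
``load * maximum supported frequency`` and a hypothetical processor with a
2000 MHz highest non-turbo P-state and a 3000 MHz highest turbo P-state; none
of these numbers come from real hardware.

```python
def turbo_load_threshold(max_non_turbo_mhz: int, reported_max_mhz: int) -> float:
    """Load fraction above which load * reported_max exceeds non-turbo max."""
    return max_non_turbo_mhz / reported_max_mhz


def picks_turbo(load: float, max_non_turbo_mhz: int, reported_max_mhz: int) -> bool:
    """True if a load-proportional request lands in the turbo range."""
    return load * reported_max_mhz > max_non_turbo_mhz


# acpi-cpufreq reports non-turbo maximum + 1 MHz as the maximum frequency.
print(turbo_load_threshold(2000, 2001))
# A driver reporting the real turbo maximum reaches turbo much earlier.
print(turbo_load_threshold(2000, 3000))
print(picks_turbo(0.8, 2000, 2001), picks_turbo(0.8, 2000, 3000))
```

With the 1 MHz convention the threshold sits just below a load of 100%, whereas
with the real turbo maximum it sits at about two thirds, which is why an 80%
load stays in the non-turbo range in the first case only.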
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) One more issue related to that may appear on systems supporting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) `Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) turbo threshold.  Namely, if that is not coordinated with the lists of P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) returned by ``_PSS`` properly, there may be more than one item corresponding to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) a turbo P-state in those lists and there may be a problem with avoiding the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) turbo range (if desirable or necessary).  Usually, to avoid using turbo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) by ``_PSS``, but that is not sufficient when there are other turbo P-states in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) the list returned by it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) `passive mode <Passive Mode_>`_, except that the set of P-states it can use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) is limited to the ones listed by the ACPI ``_PSS`` objects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) Kernel Command Line Options for ``intel_pstate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) ================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) Several kernel command line options can be used to pass early-configuration-time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) parameters to ``intel_pstate`` in order to enforce specific driver behavior.  All
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) of them have to be prepended with the ``intel_pstate=`` prefix.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) ``disable``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) 	Do not register ``intel_pstate`` as the scaling driver even if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) 	processor is supported by it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) ``active``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) 	Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) 	with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) ``passive``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) 	Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) 	start with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) ``force``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) 	Register ``intel_pstate`` as the scaling driver instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) 	``acpi-cpufreq`` even if the latter is preferred on the given system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) 	This may prevent some platform features (such as thermal controls and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) 	power capping) that rely on the availability of ACPI P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) 	information from functioning as expected, so it should be used with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) 	caution.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) 	This option does not work with processors that are not supported by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) 	``intel_pstate`` or on platforms where the ``pcc-cpufreq`` scaling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) 	driver is used instead of ``acpi-cpufreq``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) 
``no_hwp``
	Do not enable the hardware-managed P-states (HWP) feature even if it is
	supported by the processor.

``hwp_only``
	Register ``intel_pstate`` as the scaling driver only if the
	hardware-managed P-states (HWP) feature is supported by the processor.

``support_acpi_ppc``
	Take ACPI ``_PPC`` performance limits into account.

	If the preferred power management profile in the FADT (Fixed ACPI
	Description Table) is set to "Enterprise Server" or "Performance
	Server", the ACPI ``_PPC`` limits are taken into account by default
	and this option has no effect.

``per_cpu_perf_limits``
	Use per-logical-CPU P-state limits (see `Coordination of P-state
	Limits`_ for details).

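As an illustration only, the options listed above can be looked up on a running
system by searching the kernel command line for ``intel_pstate=`` arguments.
The sketch below assumes a POSIX shell; the ``intel_pstate_args`` helper name is
hypothetical and is not part of the kernel or of any standard tool.  It reads
``/proc/cmdline`` when no argument is given.

```shell
# Hypothetical helper: print the value of every intel_pstate= argument
# found in a kernel command line, one per line.
# $1: command line text (optional, defaults to /proc/cmdline)
intel_pstate_args() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Put each argument on its own line, then keep only the
    # intel_pstate= ones and strip the prefix.
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^intel_pstate=//p'
}
```

On a system booted with ``intel_pstate=passive``, for instance, calling
``intel_pstate_args`` without arguments would be expected to print ``passive``.
No output means that no ``intel_pstate=`` argument was passed.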

Diagnostics and Tuning
======================

Trace Events
------------

There are two static trace events that can be used for ``intel_pstate``
diagnostics.  One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``.  Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.

The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::

 # cd /sys/kernel/debug/tracing/
 # echo 1 > events/power/pstate_sample/enable
 # echo 1 > events/power/cpu_frequency/enable
 # cat trace
 gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
 cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).
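
The ``key=value`` fields of the trace lines shown above can be pulled out with
standard tools for further processing.  The following is a minimal sketch,
assuming a POSIX shell and ``awk``; ``trace_field`` is a hypothetical helper
name, not something provided by the kernel or by the tracing infrastructure.

```shell
# Hypothetical helper: print the value of one key=value field for every
# input line that carries it.
# $1: field name (for example freq); stdin: trace output
trace_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")          # position of "=" in this field
            if (n > 0 && substr($i, 1, n - 1) == key)
                print substr($i, n + 1) # text after "="
        }
    }'
}
```

For instance, ``trace_field freq < trace`` run in the tracing directory would
print the ``freq`` value of every line in the buffer that has a ``freq=``
field, and ``trace_field state < trace`` would do the same for the ``state``
field of the ``cpu_frequency`` events.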

``ftrace``
----------

The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``.  For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::

 # cd /sys/kernel/debug/tracing/
 # cat available_filter_functions | grep -i pstate
 intel_pstate_set_pstate
 intel_pstate_cpu_init
 ...
 # echo intel_pstate_set_pstate > set_ftrace_filter
 # echo function > current_tracer
 # cat trace | head -15
 # tracer: function
 #
 # entries-in-buffer/entries-written: 80/80   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
             Xorg-3129  [000] ..s.  2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
  gnome-terminal--4510  [002] ..s.  2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
      gnome-shell-3409  [001] ..s.  2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
           <idle>-0     [000] ..s.  2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func

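To turn a function trace like the one above into a number, the matching lines
can simply be counted.  The following is a minimal sketch, assuming a POSIX
shell and ``awk``; ``count_calls`` is a hypothetical helper name.  It skips the
``#`` header lines and counts the lines in which the given function name
appears as a separate field.

```shell
# Hypothetical helper: count trace entries for one traced function.
# $1: function name; stdin: contents of the trace file
count_calls() {
    awk -v fn="$1" '
        /^[ \t]*#/ { next }             # skip the header lines
        {
            for (i = 1; i <= NF; i++)
                if ($i == fn) { n++; break }
        }
        END { print n + 0 }             # print 0 if nothing matched
    '
}
```

For instance, ``count_calls intel_pstate_set_pstate < trace`` run in the
tracing directory prints the number of matching entries currently in the
buffer.  Writing ``nop`` to ``current_tracer`` afterwards switches the function
tracer off again.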

References
==========

.. [1] Kristen Accardi, *Balancing Power and Performance in the Linux Kernel*,
       https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

.. [2] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

.. [3] *Advanced Configuration and Power Interface Specification*,
       https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf