.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==============================================
``intel_idle`` CPU Idle Time Management Driver
==============================================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_idle`` is part of the
:doc:`CPU idle time management subsystem <cpuidle>` in the Linux kernel
(``CPUIdle``).  It is the default CPU idle time management driver for the
Nehalem and later generations of Intel processors, but the level of support it
provides for a particular processor model depends on whether or not it
recognizes that model, and it may also depend on information coming from the
platform firmware.  [To understand ``intel_idle`` it is necessary to know how
``CPUIdle`` works in general, so this is the time to get familiar with
:doc:`cpuidle` if you have not done that yet.]

``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the
logical CPU executing it is idle, so it may be possible to put some of the
processor's functional blocks into low-power states.  That instruction takes two
arguments (passed in the ``EAX`` and ``ECX`` registers of the target CPU), the
first of which, referred to as a *hint*, can be used by the processor to
determine what can be done (for details refer to the Intel Software Developer’s
Manual [1]_).  Accordingly, ``intel_idle`` refuses to work with processors in
which support for the ``MWAIT`` instruction has been disabled (for example,
via the platform firmware configuration menu) or which do not support that
instruction at all.
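
To make the hint layout concrete, the sketch below splits an ``MWAIT`` hint into
the two fields defined in the SDM [1]_: bits 7:4 select the target C-state (0
means C1, 1 means C2 and so on, while 0xF means C0) and bits 3:0 select a
sub-state within it.  The function and type names are hypothetical.  Note that
these are processor-specific C-state numbers rather than ACPI C-states, and they
need not match the idle state names used by ``intel_idle``: for instance, the
state called ``C6`` in several of the driver's internal tables uses hint 0x20,
whose C-state field decodes to 3.

```python
from typing import NamedTuple


class MwaitHint(NamedTuple):
    cstate: int    # target C-state: 1 = C1, 2 = C2, ...; 0 = C0
    substate: int  # sub-state within that C-state


def decode_mwait_hint(hint: int) -> MwaitHint:
    """Split an MWAIT hint (the low byte of EAX) into its two fields."""
    if not 0 <= hint <= 0xFF:
        raise ValueError("the hint occupies bits 7:0 of EAX")
    field = (hint >> 4) & 0xF   # bits 7:4: target C-state minus one (0xF = C0)
    substate = hint & 0xF       # bits 3:0: sub C-state
    cstate = 0 if field == 0xF else field + 1
    return MwaitHint(cstate, substate)


def encode_mwait_hint(cstate: int, substate: int = 0) -> int:
    """Inverse of decode_mwait_hint() for C1 and deeper target C-states."""
    if not 1 <= cstate <= 15 or not 0 <= substate <= 0xF:
        raise ValueError("C-state must be 1..15 and sub-state 0..15")
    return ((cstate - 1) << 4) | substate
```

For example, ``decode_mwait_hint(0x01)`` yields C-state 1 with sub-state 1, and
``encode_mwait_hint(3)`` gives back 0x20.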

``intel_idle`` is not modular (it can only be built into the kernel image and
cannot be unloaded), so the only way to pass early-configuration-time parameters
to it is via the kernel command line.
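
On the kernel command line, parameters of built-in code such as ``intel_idle``
are given as ``intel_idle.<parameter>=<value>``, for instance
``intel_idle.max_cstate=2``.  A minimal sketch of extracting those settings from
a command line string (for example, the contents of :file:`/proc/cmdline`)
follows; the helper name is hypothetical, and unlike the kernel's own parser it
does not handle quoted values.

```python
def intel_idle_params(cmdline: str) -> dict[str, str]:
    """Collect intel_idle.<name>=<value> settings from a kernel command line.

    Simplified: tokens are split on whitespace and quoting is not handled.
    In this sketch a later occurrence of a parameter overrides an earlier one.
    """
    prefix = "intel_idle."
    params: dict[str, str] = {}
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        name, sep, value = token[len(prefix):].partition("=")
        # A bare "intel_idle.<name>" (no '=') is recorded with an empty value.
        params[name] = value if sep else ""
    return params
```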


.. _intel-idle-enumeration-of-states:

Enumeration of Idle States
==========================

Each ``MWAIT`` hint value is interpreted by the processor as a license to
reconfigure itself in a certain way in order to save energy.  The resulting
processor configurations (with reduced power draw) are referred to as C-states
(in the ACPI terminology) or idle states.  The list of meaningful ``MWAIT`` hint
values and of the idle states (i.e., low-power configurations of the processor)
corresponding to them depends on the processor model, and it may also depend on
the configuration of the platform.

In order to create the list of available idle states required by the
``CPUIdle`` subsystem (see :ref:`idle-states-representation` in
:doc:`cpuidle`), ``intel_idle`` can use two sources of information: static
tables of idle states for different processor models included in the driver
itself, and the ACPI tables of the system.  The former are always used if the
processor model at hand is recognized by ``intel_idle``.  The latter are used
if that is required for the given processor model (which is the case for all
server processor models recognized by ``intel_idle``) or if the processor model
is not recognized.  [There is a module parameter that can be used to make the
driver use the ACPI tables with any processor model recognized by it; see
`below <intel-idle-parameters_>`_.]
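
The rules for choosing between the two sources can be summarized as a small
decision function.  This is a simplified model rather than the driver's code:
the function and argument names are hypothetical, and ``no_acpi`` and
``use_acpi`` stand for the module parameters of the same names described in the
section on kernel command line options and module parameters.  An empty result
corresponds to the case in which no list of idle states can be built at all.

```python
def idle_state_sources(recognized: bool, model_requires_acpi: bool,
                       no_acpi: bool = False,
                       use_acpi: bool = False) -> set[str]:
    """Return the sources ("static", "acpi") used to build the idle state list.

    recognized:          the driver has an internal table for this model
    model_requires_acpi: the model is flagged as needing the ACPI tables
    no_acpi, use_acpi:   model the module parameters of the same names
    """
    sources: set[str] = set()
    if recognized:
        # The internal table is always used when the model is recognized.
        sources.add("static")
    wants_acpi = not recognized or model_requires_acpi or use_acpi
    if wants_acpi and not no_acpi:  # no_acpi wins over use_acpi
        sources.add("acpi")
    return sources
```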

If the ACPI tables are going to be used for building the list of available idle
states, ``intel_idle`` first looks for a ``_CST`` object under one of the ACPI
objects corresponding to the CPUs in the system (refer to the ACPI specification
[2]_ for the description of ``_CST`` and its output package).  The ``CPUIdle``
subsystem expects the list of idle states supplied by the driver to be suitable
for all of the CPUs handled by it, and ``intel_idle`` is registered as the
``CPUIdle`` driver for all of the CPUs in the system.  For this reason, the
driver looks for the first ``_CST`` object that returns at least one valid idle
state description and whose return package contains only idle states of the FFH
(Functional Fixed Hardware) type, which means that the ``MWAIT`` instruction is
expected to be used to tell the processor that it can enter one of them.  The
return package of that ``_CST`` is then assumed to be applicable to all of the
other CPUs in the system, and the idle state descriptions extracted from it are
stored in a preliminary list of idle states coming from the ACPI tables.  [This
step is skipped if ``intel_idle`` is configured to ignore the ACPI tables; see
`below <intel-idle-parameters_>`_.]
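
A simplified model of that search is sketched below.  The ``CstEntry`` type and
its fields are hypothetical stand-ins for a ``_CST`` package entry (in
particular, the entry's register description is reduced to a single FFH flag
plus the hint), and "at least one valid idle state description" is modeled
simply as "non-empty".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CstEntry:
    ctype: int       # ACPI C-state type (1, 2 or 3)
    ffh: bool        # entry uses Functional Fixed Hardware, i.e. MWAIT
    hint: int        # MWAIT hint carried by the entry
    latency_us: int  # worst-case exit latency in microseconds


def pick_cst(per_cpu_cst: list[Optional[list[CstEntry]]]) -> Optional[list[CstEntry]]:
    """Return the first _CST package usable for all CPUs, or None.

    Each element is the _CST return package of one CPU, or None if that CPU
    has no _CST object (or evaluating it failed).
    """
    for package in per_cpu_cst:
        if not package:
            continue  # no _CST here, or no valid idle state descriptions
        if all(entry.ffh for entry in package):
            # This package is then assumed to apply to every other CPU.
            return package
    return None
```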

Next, the first (index 0) entry in the list of available idle states is
initialized to represent a "polling idle state" (a pseudo-idle state in which
the target CPU continuously fetches and executes instructions), and the
subsequent (real) idle state entries are populated as follows.

If the processor model at hand is recognized by ``intel_idle``, there is a
(static) table of idle state descriptions for it in the driver.  In that case,
the "internal" table is the primary source of information on idle states, and
the information from it is copied to the final list of available idle states.
If using the ACPI tables for the enumeration of idle states is not required
(depending on the processor model), all of the listed idle states are enabled by
default (so all of them will be taken into consideration by ``CPUIdle``
governors during CPU idle state selection).  Otherwise, some of the listed idle
states may not be enabled by default if there are no matching entries in the
preliminary list of idle states coming from the ACPI tables.  In that case, user
space can still enable them later (on a per-CPU basis) with the help of the
``disable`` idle state attribute in ``sysfs`` (see
:ref:`idle-states-representation` in :doc:`cpuidle`).  This means that idle
states "known" to the driver may not be enabled by default if they have not
been exposed by the platform firmware (through the ACPI tables).
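
The following sketch models that rule for recognized processor models.  It
assumes, as a simplification, that an internal table entry "matches" a
preliminary ACPI entry when both use the same ``MWAIT`` hint; the function and
type names are hypothetical, and the polling entry at index 0 is called
``POLL`` here.

```python
from typing import NamedTuple, Optional


class IdleState(NamedTuple):
    name: str
    hint: Optional[int]  # MWAIT hint; None for the polling pseudo-state
    disabled: bool       # whether the state starts out disabled by default


def states_from_internal_table(table: list[tuple[str, int]],
                               acpi_hints: set[int],
                               acpi_required: bool) -> list[IdleState]:
    """Build the final list for a recognized model from its internal table.

    table:         (name, MWAIT hint) pairs from the driver's static table
    acpi_hints:    hints present in the preliminary list built from _CST
    acpi_required: whether the ACPI tables must be consulted for this model
    """
    states = [IdleState("POLL", None, False)]  # index 0 is the polling state
    for name, hint in table:
        # Only when ACPI is consulted can a missing match turn a state off.
        off = acpi_required and hint not in acpi_hints
        states.append(IdleState(name, hint, off))
    return states
```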

If the given processor model is not recognized by ``intel_idle``, but it
supports ``MWAIT``, the preliminary list of idle states coming from the ACPI
tables is used for building the final list that will be supplied to the
``CPUIdle`` core during driver registration.  For each idle state in that list,
the description, ``MWAIT`` hint and exit latency are copied to the corresponding
entry in the final list of idle states.  The name of the idle state represented
by that entry (to be returned by the ``name`` idle state attribute in ``sysfs``)
is "CX_ACPI", where X is the index of that idle state in the final list (note
that the minimum value of X is 1, because 0 is reserved for the "polling"
state).  Its target residency is derived from the exit latency value.
Specifically, for C1-type idle states the exit latency value is also used as the
target residency (for compatibility with the majority of the "internal" tables
of idle states for various processor models recognized by ``intel_idle``), and
for the other idle state types (C2 and C3) the target residency value is 3 times
the exit latency (again, because that reflects the ratio of target residency to
exit latency in the majority of cases for the processor models recognized by
``intel_idle``).  All of the idle states in the final list are enabled by
default in this case.
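
The naming and target residency rules can be written out as follows.
``AcpiState`` and ``FinalState`` are hypothetical containers for the fields
mentioned above, and the polling entry is left out, so numbering starts at 1.
For instance, a C2-type state with an exit latency of 50 us ends up with a
target residency of 150 us.

```python
from typing import NamedTuple


class AcpiState(NamedTuple):
    ctype: int         # ACPI C-state type: 1, 2 or 3
    desc: str          # description taken from the _CST entry
    hint: int          # MWAIT hint
    exit_latency: int  # microseconds


class FinalState(NamedTuple):
    name: str              # value of the "name" attribute in sysfs
    desc: str
    hint: int
    exit_latency: int      # microseconds
    target_residency: int  # microseconds


def states_from_acpi(preliminary: list[AcpiState]) -> list[FinalState]:
    """Final-list entries for an unrecognized model, excluding index 0.

    Index 0 is reserved for the polling state, so numbering starts at 1.
    All returned states are meant to be enabled by default.
    """
    final = []
    for x, st in enumerate(preliminary, start=1):
        # C1-type: residency = exit latency; C2/C3-type: 3 x exit latency.
        if st.ctype == 1:
            residency = st.exit_latency
        else:
            residency = 3 * st.exit_latency
        final.append(FinalState(f"C{x}_ACPI", st.desc, st.hint,
                                st.exit_latency, residency))
    return final
```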


.. _intel-idle-initialization:

Initialization
==============

The initialization of ``intel_idle`` starts with checking whether the kernel
command line options forbid the use of the ``MWAIT`` instruction.  If that is
the case, an error code is returned right away.

The next step is to check whether or not the processor model is known to the
driver, which determines the idle states enumeration method (see
`above <intel-idle-enumeration-of-states_>`_), and whether or not the processor
supports ``MWAIT`` (the initialization fails if it does not).  Then, the
``MWAIT`` support in the processor is enumerated through ``CPUID``, and the
driver initialization fails if the level of support is not as expected (for
example, if the total number of ``MWAIT`` substates returned is 0).
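
For reference, ``CPUID`` leaf 5 reports ``MWAIT`` capabilities: ``ECX`` bit 0
indicates that the ``MONITOR``/``MWAIT`` extensions are enumerated, ``ECX`` bit
1 indicates that interrupts can be treated as break events even when they are
masked, and ``EDX`` holds 4-bit sub-state counts, one nibble per C-state (bits
3:0 for C0, bits 7:4 for C1, and so on).  The sketch below applies checks of
that kind to raw register values; exactly which checks the driver performs, and
the names used here, are assumptions of this sketch, and obtaining the register
values is outside its scope.

```python
ECX_EXTENSIONS = 1 << 0       # MONITOR/MWAIT extensions are enumerated
ECX_INTERRUPT_BREAK = 1 << 1  # interrupts break MWAIT even when masked


def substates_for(edx: int, cstate: int) -> int:
    """Number of MWAIT sub-states CPUID reports for C-state `cstate` (0..7)."""
    return (edx >> (4 * cstate)) & 0xF


def mwait_usable(ecx: int, edx: int) -> bool:
    """Sanity checks of the kind described above for CPUID leaf 5."""
    if not ecx & ECX_EXTENSIONS or not ecx & ECX_INTERRUPT_BREAK:
        return False
    total = sum(substates_for(edx, c) for c in range(8))
    return total != 0  # no sub-states at all means there is nothing to use


def hint_supported(edx: int, hint: int) -> bool:
    """True if CPUID reports at least one sub-state for the hint's C-state."""
    cstate = ((hint >> 4) & 0xF) + 1  # hint bits 7:4 hold "C-state minus one"
    return cstate <= 7 and substates_for(edx, cstate) > 0
```

With ``EDX`` equal to 0x2220 (two sub-states each for C1, C2 and C3), hint 0x20
is reported as supported while hint 0x30 is not.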

Next, if the driver is not configured to ignore the ACPI tables (see
`below <intel-idle-parameters_>`_), the idle states information provided by the
platform firmware is extracted from them.

Then, ``CPUIdle`` device objects are allocated for all CPUs and the list of
available idle states is created as explained
`above <intel-idle-enumeration-of-states_>`_.

Finally, ``intel_idle`` is registered with the help of cpuidle_register_driver()
as the ``CPUIdle`` driver for all CPUs in the system, and a CPU online callback
for configuring individual CPUs is registered via cpuhp_setup_state().  Among
other things, this causes the callback routine to be invoked for all of the
CPUs present in the system at that time (each CPU executes its own instance of
the callback routine).  That routine registers a ``CPUIdle`` device for the CPU
running it (which enables the ``CPUIdle`` subsystem to operate that CPU) and
optionally performs some CPU-specific initialization actions that may be
required for the given processor model.
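
The order of the initialization steps described in this section can be
summarized by the sketch below.  Every argument is a hypothetical boolean
standing for the outcome of one of the checks above, ``IntelIdleInitError``
stands in for the error code the driver would return, and the step labels are
illustrative only; the real driver operates on kernel data structures.

```python
class IntelIdleInitError(Exception):
    """Stands in for the error code the driver would return."""


def intel_idle_init_steps(mwait_forbidden: bool, model_known: bool,
                          has_mwait: bool, cpuid_ok: bool,
                          ignore_acpi: bool) -> list[str]:
    """Return the initialization steps performed, or raise on failure."""
    if mwait_forbidden:        # e.g. idle=poll, idle=halt or idle=nomwait
        raise IntelIdleInitError("MWAIT use forbidden by the command line")
    if not has_mwait:
        raise IntelIdleInitError("the processor does not support MWAIT")
    if not cpuid_ok:           # e.g. CPUID reports no MWAIT sub-states at all
        raise IntelIdleInitError("unexpected level of MWAIT support")
    if not model_known and ignore_acpi:
        # An unrecognized model depends entirely on the ACPI tables.
        raise IntelIdleInitError("no source of idle state information")
    steps = ["known model: internal table is primary" if model_known
             else "unknown model: ACPI tables only"]
    if not ignore_acpi:
        steps.append("extract idle state information from ACPI")
    steps += [
        "allocate CPUIdle devices and build the idle state list",
        "cpuidle_register_driver()",
        "cpuhp_setup_state(): per-CPU callback registers CPUIdle devices",
    ]
    return steps
```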


.. _intel-idle-parameters:

Kernel Command Line Options and Module Parameters
=================================================

The *x86* architecture support code recognizes three kernel command line
options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
and ``idle=nomwait``.  If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.
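
A user-space check for these three options might look like the sketch below,
which scans a command line string (such as the contents of
:file:`/proc/cmdline`).  The function name is hypothetical, quoting is ignored,
and any other ``idle=`` value is treated as not affecting ``MWAIT``, which is an
assumption of the sketch.

```python
MWAIT_BLOCKING_IDLE_OPTIONS = {"poll", "halt", "nomwait"}


def mwait_allowed(cmdline: str) -> bool:
    """Return False if an idle= option that rules out MWAIT is present."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "idle" and sep and value in MWAIT_BLOCKING_IDLE_OPTIONS:
            return False
    return True
```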

Apart from that, there are four module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).

The ``max_cstate`` parameter value is the maximum idle state index in the list
of idle states supplied to the ``CPUIdle`` core during the registration of the
driver.  It is also the maximum number of regular (non-polling) idle states that
can be used by ``intel_idle``, so the enumeration of idle states is terminated
after finding that number of usable idle states (the other idle states that
might have been used if ``max_cstate`` had been greater are not taken into
consideration at all).  Setting ``max_cstate`` can prevent ``intel_idle`` from
exposing idle states that are regarded as "too deep" for some reason to the
``CPUIdle`` core, but it does so by making them effectively invisible until the
system is shut down and started again, which may not always be desirable.  In
practice, it is only really necessary to do that if the idle states in question
cannot be enabled during system startup, because in the working state of the
system the CPU power management quality of service (PM QoS) feature can be used
to prevent ``CPUIdle`` from touching those idle states even if they have been
enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`).  Setting ``max_cstate``
to 0 causes the ``intel_idle`` initialization to fail.
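
The effect of ``max_cstate`` on enumeration can be modeled as a truncation of
the usable regular states, taken in enumeration order, as in the sketch below
(hypothetical names; the polling state at index 0 is not counted).  With
``max_cstate=2`` and usable states C1, C1E and C6, only C1 and C1E are exposed,
at indices 1 and 2.

```python
def apply_max_cstate(usable_states: list[str], max_cstate: int) -> list[str]:
    """Return the regular idle states that survive the max_cstate limit.

    usable_states lists the regular (non-polling) states in enumeration order.
    """
    if max_cstate == 0:
        raise ValueError("max_cstate=0 makes intel_idle initialization fail")
    if max_cstate < 0:
        raise ValueError("max_cstate must not be negative")
    # Enumeration stops once max_cstate usable states have been found; the
    # rest are never shown to CPUIdle at all (rather than merely disabled).
    return usable_states[:max_cstate]
```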

The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
if the kernel has been configured with ACPI support) can be set to make the
driver ignore the system's ACPI tables entirely or use them for all of the
recognized processor models, respectively (both are unset by default, and
``use_acpi`` has no effect if ``no_acpi`` is set).

The value of the ``states_off`` module parameter (0 by default) represents a
list of idle states to be disabled by default in the form of a bitmask.

Namely, the positions of the bits that are set in the ``states_off`` value are
the indices of idle states to be disabled by default (as reflected by the names
of the corresponding idle state directories in ``sysfs``, :file:`state0`,
:file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given
idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`).

For example, if ``states_off`` is equal to 3, the driver will disable idle
states 0 and 1 by default, and if it is equal to 8, idle state 3 will be
disabled by default and so on (bit positions beyond the maximum idle state index
are ignored).
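
The bitmask arithmetic is spelled out below: ``states_off_indices()`` lists the
indices that a given ``states_off`` value disables, ignoring bits above the
highest index present, and ``states_off_mask()`` goes the other way.  Both are
hypothetical helpers, not part of the driver.

```python
def states_off_indices(states_off: int, max_index: int) -> list[int]:
    """Indices of idle states disabled by default for a states_off bitmask.

    Bit i set means state<i> starts out disabled; bits above max_index
    (the highest idle state index actually present) are ignored.
    """
    return [i for i in range(max_index + 1) if states_off & (1 << i)]


def states_off_mask(indices: list[int]) -> int:
    """Build a states_off value that disables the given indices."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask
```

So, to start with :file:`state3` disabled, ``intel_idle.states_off=8`` would be
passed on the kernel command line.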

The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.
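
From user space, that amounts to writing to the ``disable`` attribute of the
given state under each CPU's :file:`cpuidle` directory, for instance
:file:`/sys/devices/system/cpu/cpu0/cpuidle/state3/disable`, where "0" enables
the state and "1" disables it.  The helper names below are hypothetical, and the
``root`` argument exists only so that the sketch can be exercised against a
scratch directory; writing to the real ``sysfs`` tree normally requires root
privileges.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_idle_state_enabled(cpu: int, index: int, enabled: bool,
                           root: Path = SYSFS_CPU) -> None:
    """Write the per-CPU 'disable' attribute of idle state <index>."""
    attr = root / f"cpu{cpu}" / "cpuidle" / f"state{index}" / "disable"
    attr.write_text("0" if enabled else "1")  # "0" enables, "1" disables


def enable_everywhere(index: int, cpus: list[int],
                      root: Path = SYSFS_CPU) -> None:
    """The setting is per CPU, so it has to be repeated for every CPU."""
    for cpu in cpus:
        set_idle_state_enabled(cpu, index, True, root)
```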


.. _intel-idle-core-and-package-idle-states:

Core and Package Levels of Idle States
======================================

Typically, in a processor supporting the ``MWAIT`` instruction there are (at
least) two levels of idle states (or C-states).  One level, referred to as
"core C-states", covers individual cores in the processor, whereas the other
level, referred to as "package C-states", covers the entire processor package
and it may also involve other components of the system (GPUs, memory
controllers, I/O hubs, etc.).

Some of the ``MWAIT`` hint values allow the processor to use core C-states only
(most importantly, that is the case for the ``MWAIT`` hint value corresponding
to the ``C1`` idle state), but the majority of them give it a license to put
the target core (i.e., the core containing the logical CPU executing ``MWAIT``
with the given hint value) into a specific core C-state and then (if possible)
to enter a specific package C-state at the deeper level.  For example, the
``MWAIT`` hint value representing the ``C3`` idle state allows the processor to
put the target core into the low-power state referred to as "core ``C3``" (or
``CC3``), which happens if all of the logical CPUs (SMT siblings) in that core
have executed ``MWAIT`` with the ``C3`` hint value (or with a hint value
representing a deeper idle state).  In addition to that (in the majority of
cases), it gives the processor a license to put the entire package (possibly
including some non-CPU components such as a GPU or a memory controller) into the
low-power state referred to as "package ``C3``" (or ``PC3``), which happens if
all of the cores have gone into the ``CC3`` state and (possibly) some additional
conditions are satisfied (for instance, if the GPU is covered by ``PC3``, it may
be required to be in a certain GPU-specific low-power state for ``PC3`` to be
reachable).
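
The ``C3`` example above can be turned into a toy model: a core can go no
deeper than its shallowest SMT sibling allows, and the package no deeper than
its shallowest core, subject to additional conditions.  This is a deliberate
simplification for illustration only (real processors apply model-specific
rules); the function names and the integer encoding of depths (1 for ``C1``, 3
for ``C3`` and so on) are hypothetical.

```python
from typing import Optional


def core_cstate(sibling_requests: list[int]) -> int:
    """Deepest core C-state allowed by the requests of a core's SMT siblings.

    Each request is the depth of the hint one logical CPU passed to MWAIT.
    """
    return min(sibling_requests)


def package_cstate(core_states: list[int], core_only: set[int],
                   extra_conditions_met: bool = True) -> Optional[int]:
    """Package C-state depth allowed, or None if no package C-state is entered.

    core_only:            depths whose hints allow core C-states only (e.g. {1})
    extra_conditions_met: models non-CPU requirements such as a GPU state
    """
    depth = min(core_states)  # every core must reach the package's depth
    if depth in core_only or not extra_conditions_met:
        return None
    return depth
```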

As a rule, there is no simple way to make the processor use core C-states only
if the conditions for entering the corresponding package C-states are met, so
the logical CPU executing ``MWAIT`` with any hint value other than a
core-level-only one (such as the one for ``C1``) must always assume that this
may cause the processor to enter a package C-state.  [That is why the exit
latency and target residency values corresponding to the majority of ``MWAIT``
hint values in the "internal" tables of idle states in ``intel_idle`` reflect
the properties of package C-states.]  If using package C-states is not
desirable at all, either :ref:`PM QoS <cpu-pm-qos>` or the ``max_cstate``
module parameter of ``intel_idle`` described `above <intel-idle-parameters_>`_
must be used to restrict the range of permissible idle states to the ones with
core-level only ``MWAIT`` hint values (like ``C1``).
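
One way to apply such a restriction at run time is the CPU latency PM QoS
interface, :file:`/dev/cpu_dma_latency` (see :ref:`cpu-pm-qos` in
:doc:`cpuidle`): a process writes a maximum acceptable latency in microseconds,
either as a 4-byte binary value or as a hexadecimal string, and the limit stays
in effect only while the file remains open.  The class below is a hypothetical
wrapper sketching that.  Which idle states a given limit excludes depends on
their exit latencies, so the value that keeps only core-level states such as
``C1`` in use is platform-specific and has to be chosen for each system; opening
the device normally requires root privileges.

```python
import os
import struct
from typing import Optional

CPU_DMA_LATENCY = "/dev/cpu_dma_latency"


class CpuLatencyRequest:
    """Hold a CPU latency PM QoS request for as long as the object is open."""

    def __init__(self, max_latency_us: int, path: str = CPU_DMA_LATENCY) -> None:
        fd = os.open(path, os.O_WRONLY)
        try:
            # A 4-byte write is taken as a native-endian signed 32-bit value.
            os.write(fd, struct.pack("=i", max_latency_us))
        except BaseException:
            os.close(fd)
            raise
        self.fd: Optional[int] = fd

    def close(self) -> None:
        if self.fd is not None:
            os.close(self.fd)  # closing the file drops the request
            self.fd = None

    def __enter__(self) -> "CpuLatencyRequest":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()
```

Used as ``with CpuLatencyRequest(limit_us): ...``, the limit applies only while
the ``with`` block runs.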


References
==========

.. [1] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2b-manual.html

.. [2] *Advanced Configuration and Power Interface (ACPI) Specification*,
       https://uefi.org/specifications