.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection Counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the counter data and moves
the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters give us the IMC counters' data per core, and thread-level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The information for each
event contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the events must inherit the scale and unit properties
from the PMU.

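The inheritance rule can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the `Pmu` and `Event` classes and their
field names are hypothetical and do not mirror kernel data structures, and the
choice to let a per-event value take precedence over the PMU value is also an
assumption.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Optional


  @dataclass
  class Pmu:
      """Hypothetical PMU record; scale/unit are the PMU-wide defaults."""
      name: str
      scale: Optional[str] = None
      unit: Optional[str] = None


  @dataclass
  class Event:
      """Hypothetical event record built from the device tree properties."""
      name: str
      offset: int
      scale: Optional[str] = None
      unit: Optional[str] = None


  def effective_scale_unit(pmu, event):
      """Return (scale, unit) for an event, falling back to the PMU values."""
      scale = event.scale if event.scale is not None else pmu.scale
      unit = event.unit if event.unit is not None else pmu.unit
      return scale, unit

An event created without its own scale and unit then reports the values of its
PMU, which is the case described above.
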
The event offset is the location in memory where the counter data gets
accumulated.

The IMC catalog is available at:
	https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has the compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.

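Once a PMU is registered, user space can see it through the perf sysfs
interface. The following Python sketch lists the IMC PMUs and their events,
assuming the usual perf layout under `/sys/bus/event_source/devices`. The
function name `list_imc_events` and the prefix tuple are illustrative only; the
prefixes are taken from the PMU names that appear in the `perf list` output in
this document.

.. code-block:: python

  import os

  # PMU name prefixes as they appear in this document's perf list output.
  IMC_PREFIXES = ("nest_", "core_imc", "thread_imc", "trace_imc")


  def list_imc_events(root="/sys/bus/event_source/devices"):
      """Map each IMC PMU found under root to a sorted list of event names."""
      result = {}
      if not os.path.isdir(root):
          return result
      for pmu in sorted(os.listdir(root)):
          if not pmu.startswith(IMC_PREFIXES):
              continue
          events_dir = os.path.join(root, pmu, "events")
          names = []
          if os.path.isdir(events_dir):
              # Skip the companion "<event>.scale" and "<event>.unit" files.
              names = sorted(n for n in os.listdir(events_dir)
                             if not n.endswith((".scale", ".unit")))
          result[pmu] = names
      return result

On a machine without IMC support the function simply returns an empty
dictionary.
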
IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
==============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register; it has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

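As an illustration of the layout above, the following Python sketch splits a
64-bit LDBAR value into its non-reserved fields. It assumes that the bit
numbers in the table follow the IBM convention in which bit 0 is the most
significant bit; the names `LDBAR_FIELDS`, `get_field` and `decode_ldbar` are
hypothetical. The Counter Address field is returned as the raw field value,
with no attempt to turn it back into a memory address.

.. code-block:: python

  # (first_bit, last_bit) per the LDBAR layout table; reserved bits omitted.
  LDBAR_FIELDS = {
      "enable": (0, 0),
      "trace_mode": (1, 1),
      "pb_scope": (4, 6),
      "counter_address": (8, 50),
  }


  def get_field(value, first, last, width=64):
      """Extract bits first..last (inclusive); bit 0 is assumed to be the MSB."""
      shift = (width - 1) - last
      mask = (1 << (last - first + 1)) - 1
      return (value >> shift) & mask


  def decode_ldbar(value):
      """Return the raw value of each non-reserved LDBAR field."""
      return {name: get_field(value, first, last)
              for name, (first, last) in LDBAR_FIELDS.items()}

Under that numbering assumption, a value with only the two top bits set decodes
as enabled and in trace mode, with all other fields zero.
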
TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches the end.

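The field positions in the TRACE_IMC_SCOM table can be exercised with a small
Python sketch that packs field values into a 64-bit word. As before, it assumes
IBM bit numbering (bit 0 is the most significant bit). The helper name
`pack_trace_scom` is hypothetical, and any values passed to it in examples are
arbitrary placeholders rather than real hardware event selectors.

.. code-block:: python

  # (first_bit, last_bit) per the TRACE_IMC_SCOM table; reserved bits omitted.
  TRACE_IMC_SCOM_FIELDS = {
      "SAMPSEL": (0, 1),
      "CPMC_LOAD": (2, 33),
      "CPMC1SEL": (34, 40),
      "CPMC2SEL": (41, 47),
      "BUFFERSIZE": (48, 50),
  }


  def pack_trace_scom(**fields):
      """Pack field values into a 64-bit word; bit 0 is assumed to be the MSB."""
      word = 0
      for name, value in fields.items():
          first, last = TRACE_IMC_SCOM_FIELDS[name]
          width = last - first + 1
          if not 0 <= value < (1 << width):
              raise ValueError(f"{name} does not fit in {width} bits")
          # The last bit of the field sits (63 - last) positions above the LSB.
          word |= value << (63 - last)
      return word

The range check reflects the field widths in the table: CPMC_LOAD, for
instance, spans bits 2:33 and therefore holds at most a 32-bit value.
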
*Currently, the event monitored in trace mode is fixed as cycles.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [...]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

The PMI counts below were collected by running `perf top` first without and
then with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI counts do not increment when using the `trace_imc` event.
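
The same comparison can be scripted. The Python sketch below parses the `PMI:`
line of `/proc/interrupts` text and computes the per-CPU increase between two
snapshots. It assumes the line format shown in the output above, and the
function names `pmi_counts` and `pmi_delta` are illustrative.

.. code-block:: python

  def pmi_counts(interrupts_text):
      """Return the per-CPU PMI counts found in /proc/interrupts text."""
      for line in interrupts_text.splitlines():
          fields = line.split()
          if fields and fields[0] == "PMI:":
              counts = []
              for field in fields[1:]:
                  if not field.isdigit():
                      break  # reached the textual description
                  counts.append(int(field))
              return counts
      return []


  def pmi_delta(before, after):
      """Per-CPU increase in PMI counts between two snapshots."""
      return [b - a for a, b in zip(pmi_counts(before), pmi_counts(after))]

Taking one snapshot before and one after a `perf top` run, a delta of all zeros
corresponds to the trace-imc case shown above.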