^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) .. _imc:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) ===================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) IMC (In-Memory Collection Counters)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) ===================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) Anju T Sudhakar, 10 May 2019
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) .. contents::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)    :depth: 3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) Basic overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) IMC (In-Memory collection counters) is a hardware monitoring facility that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) collects large numbers of hardware performance events at Nest level (these are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) on-chip but off-core), Core level and Thread level.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) The Nest PMU counters are handled by a Nest IMC microcode which runs in the OCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) (On-Chip Controller) complex. The microcode collects the counter data and moves
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) the nest IMC counter data to memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) The Core and Thread IMC PMU counters are handled in the core. Core level PMU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) counters give us the IMC counters' data per core and thread level PMU counters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) give us the IMC counters' data per CPU thread.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) OPAL obtains the IMC PMU and supported events information from the IMC Catalog
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) and passes it on to the kernel via the device tree. The information for each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) event contains:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) - Event name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) - Event Offset
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) - Event description
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) and possibly also:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) - Event scale
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) - Event unit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) Some PMUs may have common scale and unit values for all their supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) events. In those cases, the scale and unit properties for those events must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) inherited from the PMU.
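
The inheritance rule can be sketched as a simple fallback lookup. This is only
an illustration in Python, not the kernel's parsing code: the dictionary keys,
the PMU and event names, and the scale and unit values are all hypothetical.

```python
def effective_properties(pmu, event):
    """Return the scale and unit for an event.

    An event-level value wins; otherwise the PMU-wide value is inherited.
    Both arguments are plain dicts with hypothetical keys.
    """
    return {
        "scale": event.get("scale", pmu.get("scale")),
        "unit": event.get("unit", pmu.get("unit")),
    }


# Made-up PMU with a common scale and unit for all of its events.
pmu = {"name": "example_pmu", "scale": "0.5", "unit": "MiB"}

# One event without its own properties, one that overrides the unit.
plain_event = {"name": "EXAMPLE_EVENT_A", "offset": 0x118}
own_unit_event = {"name": "EXAMPLE_EVENT_B", "offset": 0x120, "unit": "KiB"}

print(effective_properties(pmu, plain_event))
print(effective_properties(pmu, own_unit_event))
```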
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) The event offset in the memory is where the counter data gets accumulated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) IMC catalog is available at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) https://github.com/open-power/ima-catalog
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) The kernel discovers the IMC counters information in the device tree at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) `imc-counters` device node which has a compatible field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) `ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) and their events' information and registers each PMU and its attributes in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) IMC example usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)   # perf list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)   [...]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)   nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)   nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)   [...]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)   core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)   core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)   [...]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)   thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)   thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)   # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) To see non-idle instructions for core 0:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)   # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) To see non-idle cycles for a "make":
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)   # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) IMC Trace-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) Accumulation mode, event counts are accumulated in system memory. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) hypervisor then reads the posted counts periodically or when requested. In IMC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) Trace mode, the 64-bit trace SCOM value is initialized with the event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) event to be monitored and the sampling duration. On each overflow in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) CPMCxSEL, hardware snapshots the program counter along with event counts and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) writes them into the memory pointed to by LDBAR.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) LDBAR is a 64-bit special-purpose per-thread register; it has bits to indicate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) whether hardware is configured for accumulation or trace mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) LDBAR Register Layout
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) | 0     | Enable/Disable       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) | 1     | 0: Accumulation Mode |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) |       +----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) |       | 1: Trace Mode        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) | 2:3   | Reserved             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) | 4:6   | PB scope             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) | 7     | Reserved             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) | 8:50  | Counter Address      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) +-------+----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) | 51:63 | Reserved             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) +-------+----------------------+
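
As an illustration of the layout, the following Python sketch decodes a raw
LDBAR value into its fields. It assumes the bit positions in the table use
IBM (MSB-0) numbering, where bit 0 is the most significant bit of the 64-bit
register; the helper names and the sample value are hypothetical, and the
sketch returns the raw contents of the Counter Address field without
interpreting it as a physical address.

```python
def field(value, first, last, width=64):
    """Extract bits first..last (inclusive) using MSB-0 bit numbering."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << nbits) - 1)


def decode_ldbar(ldbar):
    """Split a 64-bit LDBAR value into the fields listed in the table."""
    return {
        "enabled": bool(field(ldbar, 0, 0)),
        "mode": "trace" if field(ldbar, 1, 1) else "accumulation",
        "pb_scope": field(ldbar, 4, 6),
        "counter_address_field": field(ldbar, 8, 50),
    }


# Made-up value: enable bit set, trace mode, arbitrary address field.
example = (1 << 63) | (1 << 62) | (0x1234 << 13)
info = decode_ldbar(example)
print(info)
```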
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) TRACE_IMC_SCOM bit representation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) | 0:1   | SAMPSEL    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) | 2:33  | CPMC_LOAD  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) | 34:40 | CPMC1SEL   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) | 41:47 | CPMC2SEL   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) | 48:50 | BUFFERSIZE |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) | 51:63 | RESERVED   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) +-------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) event to count. BUFFERSIZE indicates the memory range. On each overflow,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) hardware snapshots the program counter along with event counts, updates the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) memory and reloads the CPMC_LOAD value for the next sampling duration. IMC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) hardware does not support exceptions, so it quietly wraps around if the memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) buffer reaches the end.
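
To make the field positions concrete, here is a small Python sketch that packs
and unpacks a 64-bit TRACE_IMC_SCOM word from the fields in the table. It
assumes IBM (MSB-0) bit numbering, so bit 0 is the most significant bit. The
function names and the field values in the example are made up for
illustration and are not the values the kernel programs.

```python
# Field name -> (first bit, last bit), inclusive, MSB-0 numbering.
SCOM_FIELDS = {
    "SAMPSEL": (0, 1),
    "CPMC_LOAD": (2, 33),
    "CPMC1SEL": (34, 40),
    "CPMC2SEL": (41, 47),
    "BUFFERSIZE": (48, 50),
}


def pack_trace_scom(**values):
    """Build a 64-bit word from named field values; unset fields stay 0."""
    word = 0
    for name, val in values.items():
        first, last = SCOM_FIELDS[name]
        nbits = last - first + 1
        if not 0 <= val < (1 << nbits):
            raise ValueError(f"{name} does not fit in {nbits} bits")
        word |= val << (63 - last)
    return word


def unpack_trace_scom(word):
    """Recover every named field from a 64-bit word."""
    return {
        name: (word >> (63 - last)) & ((1 << (last - first + 1)) - 1)
        for name, (first, last) in SCOM_FIELDS.items()
    }


# Hypothetical field values, chosen only to exercise each field.
settings = dict(SAMPSEL=1, CPMC_LOAD=0xFFFF0000, CPMC1SEL=2,
                CPMC2SEL=0, BUFFERSIZE=1)
scom = pack_trace_scom(**settings)
print(hex(scom), unpack_trace_scom(scom))
```

The range check matters because an oversized value would otherwise spill into
the neighbouring field.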
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) *Currently the event monitored for trace-mode is fixed to cycles.*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) Trace IMC example usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)   # perf list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)   [...]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)   trace_imc/trace_cycles/                            [Kernel PMU event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) To record an application/process with the trace-imc event:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)   # perf record -e trace_imc/trace_cycles/ yes > /dev/null
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)   [ perf record: Woken up 1 times to write data ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)   [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) The generated `perf.data` can be read using `perf report`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) Benefits of using IMC trace-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) mode snapshots the program counter and writes it to memory. This also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) provides a way for the operating system to do instruction sampling in real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) time without PMI processing overhead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) Performance data using `perf top` with and without the trace-imc event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) PMI counts when `perf top` is executed first without and then with the trace-imc event:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) .. code-block:: sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)   # grep PMI /proc/interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)   PMI:          0          0          0          0   Performance monitoring interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)   # ./perf top
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)   ...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)   # grep PMI /proc/interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)   PMI:      39735       8710      17338      17801   Performance monitoring interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)   # ./perf top -e trace_imc/trace_cycles/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)   ...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)   # grep PMI /proc/interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)   PMI:      39735       8710      17338      17801   Performance monitoring interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) That is, the PMI interrupt counts do not increment when using the `trace_imc` event.