Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Performance Counters for Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) Performance counters are special hardware registers available on most modern
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) CPUs. These registers count the number of certain types of hw events: such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) as instructions executed, cache misses suffered, or branches mispredicted -
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) without slowing down the kernel or applications. These registers can also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) trigger interrupts when a threshold number of events have passed - and can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) thus be used to profile the code that runs on that CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) The Linux Performance Counter subsystem provides an abstraction of these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) hardware capabilities. It provides per task and per CPU counters, counter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) groups, and it provides event capabilities on top of those.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) provides "virtual" 64-bit counters, regardless of the width of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) underlying hardware counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) Performance counters are accessed via special file descriptors.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) There's one file descriptor per virtual counter used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) The special file descriptor is opened via the sys_perf_event_open()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) system call:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24)    int sys_perf_event_open(struct perf_event_attr *hw_event_uptr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 			     pid_t pid, int cpu, int group_fd,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 			     unsigned long flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) The syscall returns the new fd. The fd can be used via the normal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) VFS system calls: read() can be used to read the counter, fcntl()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) can be used to set the blocking mode, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) Multiple counters can be kept open at a time, and the counters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) can be poll()ed.
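
As a rough illustration of the open/read flow described above, the sketch
below counts user-space instructions around a function call. It assumes the
struct perf_event_attr layout shipped in <linux/perf_event.h> on current
kernels (which has separate 'type' and 'size' fields, unlike the historical
layout described in this document), and it invokes the syscall through
syscall(2). The helper names perf_event_open(), count_instructions() and
spin() are illustrative only, not part of any kernel or libc API.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin local wrapper (illustrative name): go through syscall(2) directly. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Example workload (hypothetical): a short busy loop. */
static void spin(void)
{
	volatile unsigned int i;

	for (i = 0; i < 100000; i++)
		;
}

/*
 * Count user-space instructions executed by the calling task while fn()
 * runs. Returns 0 and stores the count in *out, or -1 if the counter could
 * not be opened or read (no PMU, insufficient privilege, sandboxing, ...).
 */
static int count_instructions(void (*fn)(void), uint64_t *out)
{
	struct perf_event_attr attr;
	int fd;
	int ret = -1;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.disabled = 1;		/* start stopped, enable around fn() */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid == 0: current task; cpu == -1: any CPU; group_fd == -1: no group */
	fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;

	if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == 0 &&
	    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0) {
		fn();
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
		/* With read_format == 0, read() yields a single u64 value. */
		if (read(fd, out, sizeof(*out)) == (ssize_t)sizeof(*out))
			ret = 0;
	}
	close(fd);
	return ret;
}
```

A caller would invoke count_instructions(spin, &n) and treat a return value
of -1 as "counter unavailable" rather than as a fatal error.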
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) When creating a new counter fd, 'perf_event_attr' is:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) struct perf_event_attr {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39)          * The MSB of the config word signifies if the rest contains cpu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40)          * specific (raw) counter configuration data, if unset, the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41)          * 7 bits are an event type and the rest of the bits are the event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42)          * identifier.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44)         __u64                   config;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46)         __u64                   irq_period;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47)         __u32                   record_type;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48)         __u32                   read_format;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50)         __u64                   disabled       :  1, /* off by default        */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51)                                 inherit        :  1, /* children inherit it   */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52)                                 pinned         :  1, /* must always be on PMU */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53)                                 exclusive      :  1, /* only group on PMU     */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54)                                 exclude_user   :  1, /* don't count user      */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55)                                 exclude_kernel :  1, /* ditto kernel          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)                                 exclude_hv     :  1, /* ditto hypervisor      */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)                                 exclude_idle   :  1, /* don't count when idle */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58)                                 mmap           :  1, /* include mmap data     */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)                                 munmap         :  1, /* include munmap data   */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)                                 comm           :  1, /* include comm data     */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)                                 __reserved_1   : 52;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64)         __u32                   extra_config_len;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65)         __u32                   wakeup_events;  /* wakeup every n events */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)         __u64                   __reserved_2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)         __u64                   __reserved_3;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) The 'config' field specifies what the counter should count.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) is divided into 3 bit-fields:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) raw_type: 1 bit   (most significant bit)	0x8000_0000_0000_0000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) type:	  7 bits  (next most significant)	0x7f00_0000_0000_0000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) event_id: 56 bits (least significant)		0x00ff_ffff_ffff_ffff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) If 'raw_type' is 1, then the counter will count a hardware event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) specified by the remaining 63 bits of 'config'.  The encoding is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) machine-specific.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) If 'raw_type' is 0, then the 'type' field says what kind of counter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) this is, with the following encoding:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) enum perf_type_id {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 	PERF_TYPE_HARDWARE		= 0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 	PERF_TYPE_SOFTWARE		= 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 	PERF_TYPE_TRACEPOINT		= 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) };
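
The bit layout of 'config' given above can be sketched with a few helpers.
This is a minimal illustration of the layout as this document describes it;
the macro and function names are made up for the example and do not exist in
the kernel headers. Note that current kernels carry the event type in a
separate 'type' field instead of packing it into 'config'.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the text: 1 raw bit, 7 type bits, 56 event_id bits. */
#define CONFIG_RAW_BIT		(1ULL << 63)
#define CONFIG_TYPE_SHIFT	56
#define CONFIG_TYPE_MASK	0x7fULL
#define CONFIG_EVENT_MASK	0x00ffffffffffffffULL
#define CONFIG_RAW_MASK		0x7fffffffffffffffULL

/* Build a generic (non-raw) config word from a type and an event id. */
static uint64_t config_pack(unsigned int type, uint64_t event_id)
{
	assert(type <= CONFIG_TYPE_MASK);
	return ((uint64_t)type << CONFIG_TYPE_SHIFT) |
	       (event_id & CONFIG_EVENT_MASK);
}

/* Build a raw config word: MSB set, remaining 63 bits are CPU specific. */
static uint64_t config_pack_raw(uint64_t raw)
{
	return CONFIG_RAW_BIT | (raw & CONFIG_RAW_MASK);
}

static int config_is_raw(uint64_t config)
{
	return (config & CONFIG_RAW_BIT) != 0;
}

static unsigned int config_type(uint64_t config)
{
	return (unsigned int)((config >> CONFIG_TYPE_SHIFT) & CONFIG_TYPE_MASK);
}

static uint64_t config_event_id(uint64_t config)
{
	return config & CONFIG_EVENT_MASK;
}
```

For instance, config_pack(1, 3) selects type 1 (PERF_TYPE_SOFTWARE) with
event_id 3 and yields 0x0100000000000003.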
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) A counter of PERF_TYPE_HARDWARE will count the hardware event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) specified by 'event_id':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)  * Generalized performance counter event types, used by the hw_event.event_id
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)  * parameter of the sys_perf_event_open() syscall:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) enum perf_hw_id {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 	 * Common hardware events, generalized by the kernel:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	PERF_COUNT_HW_CPU_CYCLES		= 0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 	PERF_COUNT_HW_INSTRUCTIONS		= 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 	PERF_COUNT_HW_CACHE_REFERENCES		= 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 	PERF_COUNT_HW_CACHE_MISSES		= 3,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 	PERF_COUNT_HW_BRANCH_INSTRUCTIONS	= 4,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 	PERF_COUNT_HW_BRANCH_MISSES		= 5,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 	PERF_COUNT_HW_BUS_CYCLES		= 6,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) These are standardized types of events that work relatively uniformly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) on all CPUs that implement Performance Counters support under Linux,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) although there may be variations (e.g., different CPUs might count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) cache references and misses at different levels of the cache hierarchy).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) If a CPU is not able to count the selected event, then the system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) will return -EINVAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) More hw_event_types are supported as well, but they are CPU-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) and accessed as raw events.  For example, to count "External bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) cycles while bus lock signal asserted" events on Intel Core CPUs, pass
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) in a 0x4064 event_id value and set hw_event.raw_type to 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) A counter of type PERF_TYPE_SOFTWARE will count one of the available
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) software events, selected by 'event_id':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)  * Special "software" counters provided by the kernel, even if the hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)  * does not support performance counters. These counters measure various
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)  * physical and sw events of the kernel (and allow the profiling of them as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)  * well):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) enum perf_sw_ids {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 	PERF_COUNT_SW_CPU_CLOCK		= 0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 	PERF_COUNT_SW_TASK_CLOCK	= 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 	PERF_COUNT_SW_PAGE_FAULTS	= 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 	PERF_COUNT_SW_CONTEXT_SWITCHES	= 3,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 	PERF_COUNT_SW_CPU_MIGRATIONS	= 4,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 	PERF_COUNT_SW_PAGE_FAULTS_MIN	= 5,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 	PERF_COUNT_SW_PAGE_FAULTS_MAJ	= 6,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 	PERF_COUNT_SW_ALIGNMENT_FAULTS	= 7,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 	PERF_COUNT_SW_EMULATION_FAULTS	= 8,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) Counters of the type PERF_TYPE_TRACEPOINT are available when the ftrace event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) tracer is available, and event_id values can be obtained from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) /debug/tracing/events/*/*/id (with debugfs mounted at /debug).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) Counters come in two flavours: counting counters and sampling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) counters.  A "counting" counter is one that is used for counting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) number of events that occur, and is characterised by having
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) irq_period = 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) A read() on a counter returns the current value of the counter and possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) additional values as specified by 'read_format'; each value is a u64 (8 bytes)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) in size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)  * Bits that can be set in hw_event.read_format to request that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)  * reads on the counter should return the indicated quantities,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)  * in increasing order of bit value, after the counter value.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) enum perf_event_read_format {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)         PERF_FORMAT_TOTAL_TIME_ENABLED  =  1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)         PERF_FORMAT_TOTAL_TIME_RUNNING  =  2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) Using these additional values one can establish the overcommit ratio for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) particular counter allowing one to take the round-robin scheduling effect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) into account.
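
One common way to use the two time values is to scale the raw count by the
ratio of enabled time to running time. The sketch below assumes both
PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING were
requested, so that a read() returned three u64 values in the order shown.
The struct and function names are illustrative, and the result is only an
estimate, since it assumes the event rate was similar while the counter was
not on the PMU.

```c
#include <stdint.h>

/* Values returned by read() when both time bits are set in read_format. */
struct read_values {
	uint64_t value;		/* raw counter value */
	uint64_t time_enabled;	/* time the counter was enabled */
	uint64_t time_running;	/* time the counter was actually on the PMU */
};

/*
 * Estimate the count the counter would have reached had it been running
 * for the whole time it was enabled. Returns 0.0 if it never ran.
 */
static double scaled_count(const struct read_values *rv)
{
	if (rv->time_running == 0)
		return 0.0;
	return (double)rv->value *
	       ((double)rv->time_enabled / (double)rv->time_running);
}
```

A counter that was on the PMU for only half of its enabled time therefore
has its raw value doubled.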
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) A "sampling" counter is one that is set up to generate an interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) every N events, where N is given by 'irq_period'.  A sampling counter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) has irq_period > 0. The record_type controls what data is recorded on each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) interrupt:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)  * Bits that can be set in hw_event.record_type to request information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)  * in the overflow packets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) enum perf_event_record_format {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)         PERF_RECORD_IP          = 1U << 0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)         PERF_RECORD_TID         = 1U << 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)         PERF_RECORD_TIME        = 1U << 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)         PERF_RECORD_ADDR        = 1U << 3,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)         PERF_RECORD_GROUP       = 1U << 4,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)         PERF_RECORD_CALLCHAIN   = 1U << 5,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) Such (and other) events will be recorded in a ring-buffer, which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) available to user-space using mmap() (see below).
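
To give a feel for how record_type shapes each overflow record, the sketch
below computes the size of the fixed part of a record for a given bit mask.
It relies on the overflow record layout documented in the perf_event_type
section of this file (an 8-byte header, then one 8-byte slot each for IP,
pid/tid, time and addr when requested). The REC_* names and the function
name are local to this example; the GROUP and CALLCHAIN parts are variable
in length and are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the fixed-size record_type bits listed above. */
enum {
	REC_IP   = 1U << 0,
	REC_TID  = 1U << 1,
	REC_TIME = 1U << 2,
	REC_ADDR = 1U << 3,
};

/* Header is { u32 type; u16 misc; u16 size; }, i.e. 8 bytes. */
#define REC_HEADER_SIZE 8u

/* Size in bytes of the header plus the fixed-size optional fields. */
static size_t sample_fixed_size(uint32_t record_type)
{
	size_t size = REC_HEADER_SIZE;

	if (record_type & REC_IP)
		size += sizeof(uint64_t);	/* u64 ip */
	if (record_type & REC_TID)
		size += 2 * sizeof(uint32_t);	/* u32 pid, tid */
	if (record_type & REC_TIME)
		size += sizeof(uint64_t);	/* u64 time */
	if (record_type & REC_ADDR)
		size += sizeof(uint64_t);	/* u64 addr */
	return size;
}
```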
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) The 'disabled' bit specifies whether the counter starts out disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) or enabled.  If it is initially disabled, it can be enabled by ioctl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) or prctl (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) The 'inherit' bit, if set, specifies that this counter should count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) events on descendant tasks as well as the task specified.  This only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) applies to new descendants, not to any existing descendants at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) time the counter is created (nor to any new descendants of existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) descendants).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) The 'pinned' bit, if set, specifies that the counter should always be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) on the CPU if at all possible.  It only applies to hardware counters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) and only to group leaders.  If a pinned counter cannot be put onto the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) CPU (e.g. because there are not enough hardware counters or because of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) a conflict with some other event), then the counter goes into an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 'error' state, where reads return end-of-file (i.e. read() returns 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) until the counter is subsequently enabled or disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) The 'exclusive' bit, if set, specifies that when this counter's group
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) is on the CPU, it should be the only group using the CPU's counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) In future, this will allow sophisticated monitoring programs to supply
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) extra configuration information via 'extra_config_len' to exploit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) advanced features of the CPU's Performance Monitor Unit (PMU) that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) not otherwise accessible and that might disrupt other hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) way to request that counting of events be restricted to times when the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) CPU is in user, kernel and/or hypervisor mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) to request counting of events restricted to guest and host contexts when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) using Linux as the hypervisor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) operations. These can be used to relate userspace IP addresses to actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) code, even after the mapping (or even the whole process) is gone.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) These events are recorded in the ring-buffer (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) The 'comm' bit allows tracking of process comm data on process creation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) This too is recorded in the ring-buffer (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) The 'pid' parameter to the sys_perf_event_open() system call allows the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) counter to be specific to a task:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)  pid == 0: if the pid parameter is zero, the counter is attached to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)  current task.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)  pid > 0: the counter is attached to a specific task (if the current task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)  has sufficient privilege to do so)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)  pid < 0: all tasks are counted (per cpu counters)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) The 'cpu' parameter allows a counter to be made specific to a CPU:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)  cpu >= 0: the counter is restricted to a specific CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)  cpu == -1: the counter counts on all CPUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) (Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) events of that task and 'follows' that task to whatever CPU the task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) gets scheduled to. Per task counters can be created by any user, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) their own tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) privilege.
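
The pid/cpu combinations above can be summarised in a small classifier. This
is only a sketch of the rules as stated in this document; the enum and
function names are invented for the example, and treating 'cpu < -1' as
invalid is an assumption, since the text does not cover that case.

```c
#include <sys/types.h>

enum counter_scope {
	SCOPE_INVALID,		/* pid < 0 together with cpu == -1 */
	SCOPE_TASK_ANY_CPU,	/* per task counter, follows the task */
	SCOPE_TASK_ONE_CPU,	/* per task counter, restricted to one CPU */
	SCOPE_CPU_WIDE,		/* per CPU counter, all tasks on that CPU */
};

/* Classify what a (pid, cpu) pair asks sys_perf_event_open() to monitor. */
static enum counter_scope classify_target(pid_t pid, int cpu)
{
	if (cpu < -1)
		return SCOPE_INVALID;	/* assumption: not described above */
	if (pid < 0)
		return cpu >= 0 ? SCOPE_CPU_WIDE : SCOPE_INVALID;
	/* pid == 0 (current task) or pid > 0 (a specific task) */
	return cpu >= 0 ? SCOPE_TASK_ONE_CPU : SCOPE_TASK_ANY_CPU;
}
```

A monitoring tool could run such a check before issuing the syscall, for
example to decide up front whether elevated privilege is likely to be needed
(SCOPE_CPU_WIDE).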
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) The 'flags' parameter is currently unused and must be zero.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) The 'group_fd' parameter allows counter "groups" to be set up.  A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) counter group has one counter which is the group "leader".  The leader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) is created first, with group_fd = -1 in the sys_perf_event_open call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) that creates it.  The rest of the group members are created
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) subsequently, with group_fd giving the fd of the group leader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) (A single counter on its own is created with group_fd = -1 and is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) considered to be a group with only 1 member.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) A counter group is scheduled onto the CPU as a unit, that is, it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) only be put onto the CPU if all of the counters in the group can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) put onto the CPU.  This means that the values of the member counters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) can be meaningfully compared, added, divided (to get ratios), etc.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) with each other, since they have counted events for the same set of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) executed instructions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) As stated above, asynchronous events, such as counter overflow or PROT_EXEC mmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) tracking, are logged into a ring-buffer. This ring-buffer is created and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) accessed through mmap().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) The mmap size should be 1+2^n pages, where the first page is a meta-data page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) (struct perf_event_mmap_page) that contains various bits of information such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) as where the ring-buffer head is.
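
The 1+2^n sizing rule can be checked with two small helpers. This is a
sketch only: the function names are illustrative, and the caller is assumed
to pass the system page size (for example from sysconf(_SC_PAGESIZE)).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Length in bytes of a mapping with one meta-data page and 2^n data pages. */
static size_t ring_mmap_len(unsigned int n, size_t page_size)
{
	assert(n < sizeof(size_t) * CHAR_BIT - 1);
	return ((size_t)1 + ((size_t)1 << n)) * page_size;
}

/* Return 1 if len is exactly (1 + 2^n) pages for some n >= 0, else 0. */
static int ring_mmap_len_valid(size_t len, size_t page_size)
{
	size_t pages, data_pages;

	if (page_size == 0 || len % page_size != 0)
		return 0;
	pages = len / page_size;
	if (pages < 2)
		return 0;	/* need the meta-data page plus data */
	data_pages = pages - 1;
	return (data_pages & (data_pages - 1)) == 0;	/* power of two */
}
```

With 4096-byte pages and n = 3, the mapping is 9 pages, i.e. 36864 bytes.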
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)  * Structure of the page that can be mapped via mmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) struct perf_event_mmap_page {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)         __u32   version;                /* version number of this structure */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)         __u32   compat_version;         /* lowest version this is compat with */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)          * Bits needed to read the hw counters in user-space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)          *   u32 seq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)          *   s64 count;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)          *   do {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)          *     seq = pc->lock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)          *     barrier();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)          *     if (pc->index) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)          *       count = pmc_read(pc->index - 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)          *       count += pc->offset;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)          *     } else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311)          *       goto regular_read;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)          *     barrier();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314)          *   } while (pc->lock != seq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)          * NOTE: for obvious reasons this only works on self-monitoring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)          *       processes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)         __u32   lock;                   /* seqlock for synchronization */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)         __u32   index;                  /* hardware counter identifier */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)         __s64   offset;                 /* add to hardware counter value */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)          * Control data for the mmap() data buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)          * User-space reading this value should issue an rmb(), on SMP capable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)          * platforms, after reading this value -- see perf_event_wakeup().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)         __u32   data_head;              /* head in the data section */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) NOTE: the hw-counter userspace bits are arch specific and are currently only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)       implemented on powerpc.
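
The seqlock-style loop from the comment in struct perf_event_mmap_page can
be written out as compilable C roughly as follows. This is a sketch under
several assumptions: the struct mirrors only the fields listed above, the
hardware read is abstracted behind a caller-supplied function pointer
(fake_pmc_read() is a stand-in for the arch-specific counter read
instruction), and the barrier is a GCC/Clang compiler barrier only.

```c
#include <stdint.h>

/* Mirror of the meta-data page fields shown above (illustrative type). */
struct mmap_page_sketch {
	uint32_t version;
	uint32_t compat_version;
	uint32_t lock;		/* seqlock for synchronization */
	uint32_t index;		/* hardware counter identifier, 0 = none */
	int64_t  offset;	/* add to hardware counter value */
	uint32_t data_head;
};

#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

typedef uint64_t (*pmc_read_fn)(uint32_t hw_index);

/* Hypothetical stand-in for the real hardware counter read. */
static uint64_t fake_pmc_read(uint32_t hw_index)
{
	return 1000u + hw_index;
}

/*
 * Returns 1 and stores the counter value in *count if it could be read
 * from user space; returns 0 if the counter is not currently on the PMU,
 * in which case the caller should fall back to a regular read().
 */
static int read_self_counter(const volatile struct mmap_page_sketch *pc,
			     pmc_read_fn pmc_read, int64_t *count)
{
	uint32_t seq, index;
	int64_t val;

	do {
		seq = pc->lock;
		compiler_barrier();
		index = pc->index;
		if (index == 0)
			return 0;
		val = (int64_t)pmc_read(index - 1);
		val += pc->offset;
		compiler_barrier();
	} while (pc->lock != seq);	/* retry if the kernel updated the page */

	*count = val;
	return 1;
}
```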
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) The following 2^n pages are the ring-buffer which contains events of the form:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) #define PERF_RECORD_MISC_KERNEL          (1 << 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) #define PERF_RECORD_MISC_USER            (1 << 1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) #define PERF_RECORD_MISC_OVERFLOW        (1 << 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) struct perf_event_header {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342)         __u32   type;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343)         __u16   misc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344)         __u16   size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) enum perf_event_type {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350)          * The MMAP events record the PROT_EXEC mappings so that we can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351)          * correlate userspace IPs to code. They have the following structure:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)          * struct {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354)          *      struct perf_event_header        header;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)          *      u32                             pid, tid;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357)          *      u64                             addr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358)          *      u64                             len;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)          *      u64                             pgoff;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)          *      char                            filename[];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361)          * };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363)         PERF_RECORD_MMAP                 = 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364)         PERF_RECORD_MUNMAP               = 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367)          * struct {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368)          *      struct perf_event_header        header;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370)          *      u32                             pid, tid;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)          *      char                            comm[];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372)          * };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374)         PERF_RECORD_COMM                 = 3,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)         /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377)          * When header.misc & PERF_RECORD_MISC_OVERFLOW the event_type field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378)          * will be PERF_RECORD_*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380)          * struct {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)          *      struct perf_event_header        header;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)          *      { u64                   ip;       } && PERF_RECORD_IP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)          *      { u32                   pid, tid; } && PERF_RECORD_TID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385)          *      { u64                   time;     } && PERF_RECORD_TIME
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)          *      { u64                   addr;     } && PERF_RECORD_ADDR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388)          *      { u64                   nr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)          *        { u64 event, val; }   cnt[nr];  } && PERF_RECORD_GROUP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390)          *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)          *      { u16                   nr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)          *                              hv,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)          *                              kernel,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)          *                              user;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)          *        u64                   ips[nr];  } && PERF_RECORD_CALLCHAIN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396)          * };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)          */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) NOTE: PERF_RECORD_CALLCHAIN is arch specific and currently only implemented
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401)       on x86.
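As a rough illustration of how a consumer might walk records of this shape, the sketch below steps through a flat byte buffer using header.size and decodes PERF_RECORD_COMM entries. It assumes the record layout exactly as listed above and takes struct perf_event_header and PERF_RECORD_COMM from <linux/perf_event.h>. The names struct comm_info and decode_comm_records() are hypothetical, introduced only for this example, and ring-buffer wrap-around is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical holder for one decoded PERF_RECORD_COMM record. */
struct comm_info {
	uint32_t pid, tid;
	char comm[16];
};

/*
 * Walk a linear buffer of records and decode the COMM ones.
 * Returns the number of records stored in out[], or -1 if a record
 * header is inconsistent with the buffer length.
 */
static int decode_comm_records(const unsigned char *buf, size_t len,
			       struct comm_info *out, int max_out)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct perf_event_header) <= len) {
		struct perf_event_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || off + hdr.size > len)
			return -1;

		if (hdr.type == PERF_RECORD_COMM && n < max_out) {
			const unsigned char *p = buf + off + sizeof(hdr);
			size_t body = hdr.size - sizeof(hdr);
			size_t slen;

			if (body < 2 * sizeof(uint32_t))
				return -1;
			memcpy(&out[n].pid, p, sizeof(uint32_t));
			memcpy(&out[n].tid, p + 4, sizeof(uint32_t));

			/* The rest of the record is the (padded) comm string. */
			slen = body - 8;
			if (slen >= sizeof(out[n].comm))
				slen = sizeof(out[n].comm) - 1;
			memcpy(out[n].comm, p + 8, slen);
			out[n].comm[slen] = '\0';
			n++;
		}
		/* header.size covers the whole record, so it is the stride. */
		off += hdr.size;
	}
	return n;
}
```

Stepping by header.size rather than by a per-type structure size lets the loop skip record types it does not understand.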
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) Notification of new events is possible through poll()/select()/epoll() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) through signals set up with fcntl().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) Normally a notification is generated for every page filled; however, one can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) additionally set perf_event_attr.wakeup_events to N to generate one every
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) N counter overflow events.
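The notification setup described above might look roughly like the following sketch. It fills in a sampling perf_event_attr with wakeup_events set, and waits for data with poll(). The helper names init_sampling_attr() and wait_for_records() are hypothetical. The choice of a software CPU-clock event is an assumption made so the example does not depend on a hardware PMU. Note also that current <linux/perf_event.h> spells the sample-format flags PERF_SAMPLE_* rather than the PERF_RECORD_IP style used in the listing above. The descriptor passed to wait_for_records() is assumed to be a perf descriptor whose ring buffer has already been mmap()ed.

```c
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Describe a sampling counter that wakes the reader every `wakeup` overflows. */
static void init_sampling_attr(struct perf_event_attr *attr,
			       uint64_t period, uint32_t wakeup)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;	/* assumption: no hardware PMU needed */
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->wakeup_events = wakeup;
	attr->disabled = 1;			/* enable later via ioctl */
}

/*
 * Block for up to timeout_ms until fd reports readable data.
 * Returns 1 if data is ready, 0 on timeout, -1 on error or hang-up.
 */
static int wait_for_records(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	memset(&pfd, 0, sizeof(pfd));
	pfd.fd = fd;
	pfd.events = POLLIN;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A reader loop would typically call wait_for_records() and then drain whatever records have appeared in the mapped buffer before waiting again.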
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) Future work will include a splice() interface to the ring-buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) Counters can be enabled and disabled in two ways: via ioctl and via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) prctl.  When a counter is disabled, it doesn't count or generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) events but does continue to exist and maintain its count value.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) An individual counter can be enabled with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) 	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) or disabled with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) 	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) For a counter group, pass PERF_IOC_FLAG_GROUP as the third argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) Enabling or disabling the leader of a group enables or disables the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) whole group; that is, while the group leader is disabled, none of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) counters in the group will count.  Enabling or disabling a member of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) group other than the leader only affects that counter - disabling a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) non-leader stops that counter from counting but doesn't affect any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) other counter.
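To make the enable/disable semantics concrete, here is a minimal sketch that brackets a busy loop with the two ioctls and then reads the count. The helper names (counter_enable(), counter_disable(), open_task_clock(), time_busy_loop()) are hypothetical. The counter is assumed to be a task-clock software event on the calling thread, opened through the raw perf_event_open system call since the C library does not provide a wrapper. On systems where unprivileged use of perf is restricted, the open simply fails and the function returns -1.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Pass whole_group != 0 to apply the operation with PERF_IOC_FLAG_GROUP. */
static int counter_enable(int fd, int whole_group)
{
	unsigned long flags = whole_group ? PERF_IOC_FLAG_GROUP : 0;

	return ioctl(fd, PERF_EVENT_IOC_ENABLE, flags);
}

static int counter_disable(int fd, int whole_group)
{
	unsigned long flags = whole_group ? PERF_IOC_FLAG_GROUP : 0;

	return ioctl(fd, PERF_EVENT_IOC_DISABLE, flags);
}

/* Open a task-clock counter on the calling thread, initially disabled. */
static int open_task_clock(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid 0 = this thread, cpu -1 = any CPU, no group, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Returns the counter value accumulated around the loop, or -1 on failure. */
static int64_t time_busy_loop(unsigned long iters)
{
	volatile unsigned long sink = 0;
	uint64_t count = 0;
	int64_t result = -1;
	unsigned long i;
	int fd = open_task_clock();

	if (fd < 0)
		return -1;

	if (counter_enable(fd, 0) == 0) {
		for (i = 0; i < iters; i++)
			sink += i;
		/* A disabled counter keeps its value, so it can be read afterwards. */
		if (counter_disable(fd, 0) == 0 &&
		    read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
			result = (int64_t)count;
	}
	close(fd);
	return result;
}
```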
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) Additionally, non-inherited overflow counters can use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) 	ioctl(fd, PERF_EVENT_IOC_REFRESH, nr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) to enable a counter for 'nr' events, after which it gets disabled again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) A process can enable or disable all the counter groups that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) attached to it, using prctl:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) 	prctl(PR_TASK_PERF_EVENTS_ENABLE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) 	prctl(PR_TASK_PERF_EVENTS_DISABLE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) This applies to all counters on the current process, whether created
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) by this process or by another, and doesn't affect any counters that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) this process has created on other processes.  It only enables or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) disables the group leaders, not any other members in the groups.
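A process could use these two prctl operations to keep a region of its own code out of any counters attached to it, along the lines of the sketch below. The functions busy_sum() and sum_uncounted() are hypothetical and exist only to give the example something to bracket. The prctl constants come from <sys/prctl.h>. Whether the calls succeed depends on the kernel having perf events built in, so the result is reported rather than assumed.

```c
#include <stddef.h>
#include <sys/prctl.h>

/* Stand-in workload: sum of 0 .. n-1. */
static unsigned long busy_sum(unsigned long n)
{
	volatile unsigned long acc = 0;
	unsigned long i;

	for (i = 0; i < n; i++)
		acc += i;
	return acc;
}

/*
 * Run the workload with this process's counter groups disabled.
 * *paused is set to 1 if both prctl calls succeeded, 0 otherwise.
 */
static unsigned long sum_uncounted(unsigned long n, int *paused)
{
	int off = prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
	unsigned long result = busy_sum(n);
	int on = prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);

	if (paused != NULL)
		*paused = (off == 0 && on == 0);
	return result;
}
```

Because only group leaders are toggled, a group member that was individually disabled beforehand stays disabled after the enable call.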
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) Arch requirements
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) If your architecture does not have hardware performance metrics, you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) still use the generic software counters based on hrtimers for sampling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) To add HAVE_PERF_EVENTS to your Kconfig, you will need at least the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) 	- asm/perf_event.h - a basic stub will suffice at first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) 	- support for atomic64 types (and associated helper functions)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) If your architecture does have hardware capabilities, you can override the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) weak stub hw_perf_event_init() to register hardware counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) Architectures that have d-cache aliasing issues, such as Sparc and ARM,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) should select PERF_USE_VMALLOC in order to avoid these for perf mmap().