Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Notes on Analysing Behaviour Using Events and Tracepoints
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) :Author: Mel Gorman (PCL information heavily based on email from Ingo Molnar)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 1. Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Tracepoints (see Documentation/trace/tracepoints.rst) can be used without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) creating custom kernel modules to register probe functions using the event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) tracing infrastructure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) Simplistically, tracepoints represent important events that can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) taken in conjunction with other tracepoints to build a "Big Picture" of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) what is going on within the system. There are a large number of methods for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) gathering and interpreting these events. Lacking any current Best Practices,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) this document describes some of the methods that can be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) This document assumes that debugfs is mounted on /sys/kernel/debug and that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) the appropriate tracing options have been configured into the kernel. It is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) assumed that the PCL tool tools/perf has been installed and is in your path.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 2. Listing Available Events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 2.1 Standard Utilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) All possible events are visible from /sys/kernel/debug/tracing/events. Simply
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) calling::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32)   $ find /sys/kernel/debug/tracing/events -type d
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) will give a fair indication of the number of events available.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
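The raw directory listing can be condensed into a per-subsystem summary. The
following is a minimal sketch, assuming the usual events/<subsystem>/<event>/
layout; the function name and its optional directory argument are illustrative
and not part of any kernel tool.

```shell
# Count tracepoint events per subsystem.
# The optional first argument overrides the events directory (hypothetical
# convenience, mainly useful for testing against a copy of the tree).
count_events_by_subsystem() {
    events_dir="${1:-/sys/kernel/debug/tracing/events}"
    # Each event is a directory exactly two levels below the events
    # directory. -mindepth/-maxdepth are find extensions available in
    # GNU and BSD find.
    find "$events_dir" -mindepth 2 -maxdepth 2 -type d |
        awk -F/ '{ count[$(NF-1)]++ }
                 END { for (s in count) printf "%s %d\n", s, count[s] }' |
        sort
}
```

Calling count_events_by_subsystem with no argument reads the default path
used throughout this document and typically requires root.
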
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 2.2 PCL (Performance Counters for Linux)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) ----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Discovery and enumeration of all counters and events, including tracepoints,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) are available with the perf tool. Getting a list of available events is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) simple case of::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43)   $ perf list 2>&1 | grep Tracepoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44)   ext4:ext4_free_inode                     [Tracepoint event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45)   ext4:ext4_request_inode                  [Tracepoint event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46)   ext4:ext4_allocate_inode                 [Tracepoint event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47)   ext4:ext4_write_begin                    [Tracepoint event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48)   ext4:ext4_ordered_write_end              [Tracepoint event]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49)   [ .... remaining output snipped .... ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 3. Enabling Events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 3.1 System-Wide Event Enabling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) See Documentation/trace/events.rst for a proper description on how events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) can be enabled system-wide. A short example of enabling all events related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) to page allocation would look something like::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)   $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 
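The same idea can be wrapped in a small function so that events can be
disabled again afterwards. This is a sketch only: the function name and
argument order are made up for illustration, and unlike the one-liner above
it restricts itself to per-event enable files.

```shell
# Write VALUE (1 to enable, 0 to disable) to every per-event "enable" file
# whose path matches PATTERN. Requires write access to the tracing directory.
# Usage: set_matching_events PATTERN VALUE [EVENTS_DIR]
set_matching_events() {
    pattern="$1"
    value="$2"
    events_dir="${3:-/sys/kernel/debug/tracing/events}"
    # Per-event files live at <subsystem>/<event>/enable, three levels down.
    find "$events_dir" -mindepth 3 -maxdepth 3 -name enable |
        grep -- "$pattern" |
        while IFS= read -r file; do
            echo "$value" > "$file"
        done
}
```

For example, "set_matching_events mm_ 1" would enable the page allocation
events and "set_matching_events mm_ 0" would disable them again.
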
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 3.2 System-Wide Event Enabling with SystemTap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) ---------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) In SystemTap, tracepoints are accessible using the kernel.trace() function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) call. The following is an example that reports every 5 seconds what processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) were allocating the pages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)   global page_allocs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74)   probe kernel.trace("mm_page_alloc") {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)   	page_allocs[execname()]++
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78)   function print_count() {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79)   	printf ("%-25s %-s\n", "#Pages Allocated", "Process Name")
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80)   	foreach (proc in page_allocs-)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81)   		printf("%-25d %s\n", page_allocs[proc], proc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82)   	printf ("\n")
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83)   	delete page_allocs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)   probe timer.s(5) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)           print_count()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 3.3 System-Wide Event Enabling with PCL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) By specifying the -a switch and analysing sleep, the system-wide events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) for a duration of time can be examined.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)  $ perf stat -a \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 	-e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 	-e kmem:mm_page_free_batched \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 	sleep 10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)  Performance counter stats for 'sleep 10':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)            9630  kmem:mm_page_alloc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)            2143  kmem:mm_page_free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)            7424  kmem:mm_page_free_batched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)    10.002577764  seconds time elapsed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) Similarly, one could execute a shell and exit it as desired to get a report
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) at that point.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 3.4 Local Event Enabling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) Documentation/trace/ftrace.rst describes how to enable events on a per-thread
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) basis using set_ftrace_pid.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 3.5 Local Event Enablement with PCL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Events can be activated and tracked for the duration of a process on a local
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) basis using PCL as follows.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 		 -e kmem:mm_page_free_batched ./hackbench 10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)   Time: 0.909
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)     Performance counter stats for './hackbench 10':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)           17803  kmem:mm_page_alloc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)           12398  kmem:mm_page_free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)            4827  kmem:mm_page_free_batched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)     0.973913387  seconds time elapsed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 4. Event Filtering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) Documentation/trace/ftrace.rst covers in-depth how to filter events in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) ftrace.  Obviously, using grep and awk on trace_pipe is an option as well
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) as any script reading trace_pipe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
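As a rough illustration of the awk approach, the sketch below counts how often
each event name appears in trace text read from standard input. It assumes the
common "task-pid [cpu] flags timestamp: event: ..." line layout, which may
differ between kernel versions; the function name is hypothetical.

```shell
# Count occurrences of each event name in ftrace text output on stdin.
# The event name is taken to be the field that follows the
# "seconds.microseconds:" timestamp; lines without one are skipped.
count_trace_events() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9]+\.[0-9]+:$/) {
                name = $(i + 1)
                sub(/:$/, "", name)
                count[name]++
                break
            }
        }
    }
    END { for (n in count) printf "%d %s\n", count[n], n }' |
        sort -k1,1nr -k2
}
```

Because the totals are only printed at end of input, it is best fed from a
saved copy of a trace or a time-limited read, for example
"timeout 10 cat /sys/kernel/debug/tracing/trace_pipe | count_trace_events".
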
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 5. Analysing Event Variances with PCL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) Any workload can exhibit variances between runs and it can be important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) to know what the standard deviation is. By and large, this is left to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) performance analyst to do by hand. In the event that the discrete event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) occurrences are useful to the performance analyst, then perf can be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)   $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 			-e kmem:mm_page_free_batched ./hackbench 10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)   Time: 0.890
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)   Time: 0.895
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)   Time: 0.915
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)   Time: 1.001
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)   Time: 0.899
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)    Performance counter stats for './hackbench 10' (5 runs):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)           16630  kmem:mm_page_alloc         ( +-   3.542% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)           11486  kmem:mm_page_free          ( +-   4.771% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)            4730  kmem:mm_page_free_batched  ( +-   2.325% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)     0.982653002  seconds time elapsed   ( +-   1.448% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
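For values that perf does not summarise, such as the "Time:" lines printed by
the workload itself, the by-hand calculation can be scripted. The sketch below
computes the mean and the sample standard deviation (n - 1 in the divisor);
this is not necessarily the same statistic that perf prints after "+-". The
function name is illustrative.

```shell
# Print count, mean and sample standard deviation of the values found on
# lines of the form "Time: <seconds>" read from stdin.
time_stats() {
    awk '$1 == "Time:" { n++; sum += $2; sumsq += $2 * $2 }
         END {
             if (n < 2) { print "need at least two samples"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0    # guard against rounding error
             printf "n=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
         }'
}
```

Fed the five "Time:" lines from the run above, this reports a mean of 0.920
seconds.
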
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) In the event that some higher-level event is required that depends on some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) aggregation of discrete events, then a script would need to be developed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) Using --repeat, it is also possible to view how events are fluctuating over
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) time on a system-wide basis using -a and sleep.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 		-e kmem:mm_page_free_batched \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 		-a --repeat 10 \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 		sleep 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)   Performance counter stats for 'sleep 1' (10 runs):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)            1066  kmem:mm_page_alloc         ( +-  26.148% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)             182  kmem:mm_page_free          ( +-   5.464% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)             890  kmem:mm_page_free_batched  ( +-  30.079% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)     1.002251757  seconds time elapsed   ( +-   0.005% )
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 6. Higher-Level Analysis with Helper Scripts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) ============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) When events are enabled, the events that are triggering can be read from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) /sys/kernel/debug/tracing/trace_pipe in human-readable format although binary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) options exist as well. By post-processing the output, further information can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) be gathered on-line as appropriate. Examples of post-processing might include
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)   - Reading information from /proc for the PID that triggered the event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)   - Deriving a higher-level event from a series of lower-level events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)   - Calculating latencies between two events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) 
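The last item can be sketched with awk by pairing events on a shared key. The
example below reports the time between an mm_page_alloc event and the next
mm_page_free event for the same page frame. It assumes trace text lines that
carry a "seconds.microseconds:" timestamp and a "pfn=<number>" field; the
function name is hypothetical and batched frees are ignored.

```shell
# Report the lifetime of each page: the time from mm_page_alloc to the
# next mm_page_free with the same pfn. Reads ftrace text from stdin.
page_lifetimes() {
    awk '{
        ts = ""; pfn = ""; ev = ""
        for (i = 1; i <= NF; i++) {
            if (ts == "" && $i ~ /^[0-9]+\.[0-9]+:$/) {
                ts = $i; sub(/:$/, "", ts)            # timestamp
                ev = $(i + 1); sub(/:$/, "", ev)      # event name follows it
            } else if ($i ~ /^pfn=/) {
                pfn = substr($i, 5)                   # strip "pfn="
            }
        }
        if (ts == "" || pfn == "") next
        if (ev == "mm_page_alloc") {
            start[pfn] = ts
        } else if (ev == "mm_page_free" && (pfn in start)) {
            printf "pfn=%s lifetime=%.6f\n", pfn, ts - start[pfn]
            delete start[pfn]
        }
    }'
}
```

Frees for pages whose allocation was not seen in the input are silently
skipped, so the output only covers pages allocated while tracing was active.
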
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) script that can read trace_pipe from STDIN or a copy of a trace. When used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) on-line, it can be interrupted once to generate a report without exiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) and twice to exit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) Simplistically, the script just reads STDIN and counts up events but it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) also can do more such as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)   - Derive high-level events from many low-level events. If a number of pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)     are freed to the main allocator from the per-CPU lists, it recognises
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)     that as one per-CPU drain even though there is no specific tracepoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)     for that event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)   - It can aggregate based on PID or on process name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)   - In the event memory is getting externally fragmented, it reports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)     on whether the fragmentation event was severe or moderate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)   - When receiving an event about a PID, it can record who the parent was so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)     that if large numbers of events are coming from very short-lived
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)     processes, the parent process responsible for creating all the helpers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)     can be identified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 7. Lower-Level Analysis with PCL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) There may also be a requirement to identify what functions within a program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) were generating events within the kernel. To begin this sort of analysis, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) data must be recorded. At the time of writing, this required root:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)   $ perf record -c 1 \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 	-e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 	-e kmem:mm_page_free_batched \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 	./hackbench 10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)   Time: 0.894
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)   [ perf record: Captured and wrote 0.733 MB perf.data (~32010 samples) ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) Note the use of '-c 1' to sample every event. The default sample
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) period is quite high to minimise overhead but the information collected can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) very coarse as a result.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) The recording produced a file called perf.data, which can be analysed using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) perf report.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)   $ perf report
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)   # Samples: 30922
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)   # Overhead    Command                     Shared Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)   # ........  .........  ................................
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)       87.27%  hackbench  [vdso]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)        6.85%  hackbench  /lib/i686/cmov/libc-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)        2.62%  hackbench  /lib/ld-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)        1.52%       perf  [vdso]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)        1.22%  hackbench  ./hackbench
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)        0.48%  hackbench  [kernel]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)        0.02%       perf  /lib/i686/cmov/libc-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)        0.01%       perf  /usr/bin/perf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)        0.01%       perf  /lib/ld-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)        0.00%  hackbench  /lib/i686/cmov/libpthread-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)   # (For more details, try: perf report --sort comm,dso,symbol)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) According to this, the vast majority of events were triggered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) within the VDSO. With simple binaries, this will often be the case so let's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) take a slightly different example. In the course of writing this, it was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) noticed that X was generating an insane number of page allocations so let's look
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) at it:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)   $ perf record -c 1 -f \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 		-e kmem:mm_page_alloc -e kmem:mm_page_free \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 		-e kmem:mm_page_free_batched \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 		-p `pidof X`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) This was interrupted after a few seconds and the recording analysed:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)   $ perf report
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)   # Samples: 27666
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281)   # Overhead  Command                            Shared Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)   # ........  .......  .......................................
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)       51.95%     Xorg  [vdso]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)       47.95%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)        0.09%     Xorg  /lib/i686/cmov/libc-2.9.so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)        0.01%     Xorg  [kernel]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)   # (For more details, try: perf report --sort comm,dso,symbol)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) So, almost half of the events are occurring in a library. To get an idea of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) which symbol is responsible:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)   $ perf report --sort comm,dso,symbol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)   # Samples: 27666
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)   # Overhead  Command                            Shared Object  Symbol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)   # ........  .......  .......................................  ......
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)       51.95%     Xorg  [vdso]                                   [.] 0x000000ffffe424
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)       47.93%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixmanFillsse2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)        0.09%     Xorg  /lib/i686/cmov/libc-2.9.so               [.] _int_malloc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)        0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixman_region32_copy_f
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)        0.01%     Xorg  [kernel]                                 [k] read_hpet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)        0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] get_fast_path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)        0.00%     Xorg  [kernel]                                 [k] ftrace_trace_userstack
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) To see where within the function pixmanFillsse2 things are going wrong:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)   $ perf annotate pixmanFillsse2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314)   [ ... ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)     0.00 :         34eeb:       0f 18 08                prefetcht0 (%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)          :      }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)          :
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)          :      extern __inline void __attribute__((__gnu_inline__, __always_inline__, _
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)          :      _mm_store_si128 (__m128i *__P, __m128i __B) :      {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)          :        *__P = __B;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)    12.40 :         34eee:       66 0f 7f 80 40 ff ff    movdqa %xmm0,-0xc0(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)     0.00 :         34ef5:       ff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)    12.40 :         34ef6:       66 0f 7f 80 50 ff ff    movdqa %xmm0,-0xb0(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)     0.00 :         34efd:       ff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)    12.39 :         34efe:       66 0f 7f 80 60 ff ff    movdqa %xmm0,-0xa0(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)     0.00 :         34f05:       ff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)    12.67 :         34f06:       66 0f 7f 80 70 ff ff    movdqa %xmm0,-0x90(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)     0.00 :         34f0d:       ff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)    12.58 :         34f0e:       66 0f 7f 40 80          movdqa %xmm0,-0x80(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)    12.31 :         34f13:       66 0f 7f 40 90          movdqa %xmm0,-0x70(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)    12.40 :         34f18:       66 0f 7f 40 a0          movdqa %xmm0,-0x60(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)    12.31 :         34f1d:       66 0f 7f 40 b0          movdqa %xmm0,-0x50(%eax)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) At a glance, it looks like the time is being spent copying pixmaps to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) the card.  Further investigation would be needed to determine why pixmaps
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) are being copied around so much but a starting point would be to take an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) ancient build of libpixman out of the library path where it was totally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) forgotten about from months ago!