^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Event Tracing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) :Author: Theodore Ts'o
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) :Updated: Li Zefan and Tom Zanussi
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) 1. Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) Tracepoints (see Documentation/trace/tracepoints.rst) can be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) without creating custom kernel modules to register probe functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) using the event tracing infrastructure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) Not all tracepoints can be traced using the event tracing system;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) the kernel developer must provide code snippets which define how the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) tracing information is saved into the tracing buffer, and how the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) tracing information should be printed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) 2. Using Event Tracing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) ======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) 2.1 Via the 'set_event' interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) The events which are available for tracing can be found in the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) /sys/kernel/debug/tracing/available_events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) To enable a particular event, such as 'sched_wakeup', simply echo it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) to /sys/kernel/debug/tracing/set_event. For example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) .. Note:: '>>' is necessary, otherwise it will firstly disable all the events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) To disable an event, echo the event name to the set_event file prefixed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) with an exclamation point::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) To disable all events, echo an empty line to the set_event file::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) # echo > /sys/kernel/debug/tracing/set_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) To enable all events, echo ``*:*`` or ``*:`` to the set_event file::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) # echo *:* > /sys/kernel/debug/tracing/set_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) The events are organized into subsystems, such as ext4, irq, sched,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) etc., and a full event name looks like this: <subsystem>:<event>. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) subsystem name is optional, but it is displayed in the available_events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) file. All of the events in a subsystem can be specified via the syntax
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) ``<subsystem>:*``; for example, to enable all irq events, you can use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) command::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) 2.2 Via the 'enable' toggle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) ---------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) of directories.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) To enable event 'sched_wakeup'::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) To disable it::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) To enable all events in sched subsystem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) To enable all events::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) # echo 1 > /sys/kernel/debug/tracing/events/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) When reading one of these enable files, there are four results:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) - 0 - all events this file affects are disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) - 1 - all events this file affects are enabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) - X - there is a mixture of events enabled and disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) - ? - this file does not affect any event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) 2.3 Boot option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) In order to facilitate early boot debugging, use boot option::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) trace_event=[event-list]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) event-list is a comma separated list of events. See section 2.1 for event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) format.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) 3. Defining an event-enabled tracepoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) =======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) See The example provided in samples/trace_events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 4. Event formats
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) Each trace event has a 'format' file associated with it that contains
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) a description of each field in a logged event. This information can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) be used to parse the binary trace stream, and is also the place to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) find the field names that can be used in event filters (see section 5).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) It also displays the format string that will be used to print the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) event in text mode, along with the event name and ID used for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) profiling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Every event has a set of ``common`` fields associated with it; these are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) the fields prefixed with ``common_``. The other fields vary between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) events and correspond to the fields defined in the TRACE_EVENT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) definition for that event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Each field in the format has the form::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) field:field-type field-name; offset:N; size:N;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) where offset is the offset of the field in the trace record and size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) is the size of the data item, in bytes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) For example, here's the information displayed for the 'sched_wakeup'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) name: sched_wakeup
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) ID: 60
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) format:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) field:unsigned short common_type; offset:0; size:2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) field:unsigned char common_flags; offset:2; size:1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) field:unsigned char common_preempt_count; offset:3; size:1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) field:int common_pid; offset:4; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) field:int common_tgid; offset:8; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) field:char comm[TASK_COMM_LEN]; offset:12; size:16;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) field:pid_t pid; offset:28; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) field:int prio; offset:32; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) field:int success; offset:36; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) field:int cpu; offset:40; size:4;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) REC->prio, REC->success, REC->cpu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) This event contains 10 fields, the first 5 common and the remaining 5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) event-specific. All the fields for this event are numeric, except for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 'comm' which is a string, a distinction important for event filtering.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 5. Event filtering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Trace events can be filtered in the kernel by associating boolean
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 'filter expressions' with them. As soon as an event is logged into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) the trace buffer, its fields are checked against the filter expression
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) associated with that event type. An event with field values that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 'match' the filter will appear in the trace output, and an event whose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) values don't match will be discarded. An event with no filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) associated with it matches everything, and is the default when no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) filter has been set for an event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 5.1 Expression syntax
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) A filter expression consists of one or more 'predicates' that can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) combined using the logical operators '&&' and '||'. A predicate is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) simply a clause that compares the value of a field contained within a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) logged event with a constant value and returns either 0 or 1 depending
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) on whether the field value matched (1) or didn't match (0)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) field-name relational-operator value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) Parentheses can be used to provide arbitrary logical groupings and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) double-quotes can be used to prevent the shell from interpreting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) operators as shell metacharacters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) The field-names available for use in filters can be found in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 'format' files for trace events (see section 4).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) The relational-operators depend on the type of the field being tested:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) The operators available for numeric fields are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) ==, !=, <, <=, >, >=, &
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) And for string fields they are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) ==, !=, ~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) The glob (~) accepts a wild card character (\*,?) and character classes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) ([). For example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) prev_comm ~ "*sh"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) prev_comm ~ "sh*"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) prev_comm ~ "*sh*"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) prev_comm ~ "ba*sh"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) If the field is a pointer that points into user space (for example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) "filename" from sys_enter_openat), then you have to append ".ustring" to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) field name::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) filename.ustring ~ "password"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) As the kernel will have to know how to retrieve the memory that the pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) is at from user space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 5.2 Setting filters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) A filter for an individual event is set by writing a filter expression
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) to the 'filter' file for the given event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) For example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) # echo "common_preempt_count > 4" > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) A slightly more involved example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) # cd /sys/kernel/debug/tracing/events/signal/signal_generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) If there is an error in the expression, you'll get an 'Invalid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) argument' error when setting it, and the erroneous string along with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) an error message can be seen by looking at the filter e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) # cd /sys/kernel/debug/tracing/events/signal/signal_generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) -bash: echo: write error: Invalid argument
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) # cat filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) parse_error: Field not found
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) Currently the caret ('^') for an error always appears at the beginning of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) the filter string; the error message should still be useful though
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) even without more accurate position info.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 5.2.1 Filter limitations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) If a filter is placed on a string pointer ``(char *)`` that does not point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) to a string on the ring buffer, but instead points to kernel or user space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) memory, then, for safety reasons, at most 1024 bytes of the content is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) copied onto a temporary buffer to do the compare. If the copy of the memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) faults (the pointer points to memory that should not be accessed), then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) string compare will be treated as not matching.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 5.3 Clearing filters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) To clear the filter for an event, write a '0' to the event's filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) To clear the filters for all events in a subsystem, write a '0' to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) subsystem's filter file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 5.3 Subsystem filters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) For convenience, filters for every event in a subsystem can be set or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) cleared as a group by writing a filter expression into the filter file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) at the root of the subsystem. Note however, that if a filter for any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) event within the subsystem lacks a field specified in the subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) filter, or if the filter can't be applied for any other reason, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) filter for that event will retain its previous setting. This can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) result in an unintended mixture of filters which could lead to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) confusing (to the user who might think different filters are in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) effect) trace output. Only filters that reference just the common
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) fields can be guaranteed to propagate successfully to all events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) Here are a few subsystem filter examples that also illustrate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) above points:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) Clear the filters on all events in the sched subsystem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) # cd /sys/kernel/debug/tracing/events/sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) # echo 0 > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) # cat sched_switch/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) # cat sched_wakeup/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) Set a filter using only common fields for all events in the sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) subsystem (all events end up with the same filter)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) # cd /sys/kernel/debug/tracing/events/sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) # echo common_pid == 0 > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) # cat sched_switch/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) common_pid == 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) # cat sched_wakeup/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) common_pid == 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) Attempt to set a filter using a non-common field for all events in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) sched subsystem (all events but those that have a prev_pid field retain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) their old filters)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) # cd /sys/kernel/debug/tracing/events/sched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) # echo prev_pid == 0 > filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) # cat sched_switch/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) prev_pid == 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) # cat sched_wakeup/filter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) common_pid == 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 5.4 PID filtering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) The set_event_pid file in the same directory as the top events directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) exists, will filter all events from tracing any task that does not have the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) PID listed in the set_event_pid file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) # cd /sys/kernel/debug/tracing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) # echo $$ > set_event_pid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) # echo 1 > events/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) Will only trace events for the current task.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) To add more PIDs without losing the PIDs already included, use '>>'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) # echo 123 244 1 >> set_event_pid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) 6. Event triggers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) Trace events can be made to conditionally invoke trigger 'commands'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) which can take various forms and are described in detail below;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) examples would be enabling or disabling other trace events or invoking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) a stack trace whenever the trace event is hit. Whenever a trace event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) with attached triggers is invoked, the set of trigger commands
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) associated with that event is invoked. Any given trigger can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) additionally have an event filter of the same form as described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) section 5 (Event filtering) associated with it - the command will only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) be invoked if the event being invoked passes the associated filter.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) If no filter is associated with the trigger, it always passes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) Triggers are added to and removed from a particular event by writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) trigger expressions to the 'trigger' file for the given event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) A given event can have any number of triggers associated with it,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) subject to any restrictions that individual commands may have in that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) regard.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) Event triggers are implemented on top of "soft" mode, which means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) whenever a trace event has one or more triggers associated with it,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) the event is activated even if it isn't actually enabled, but is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) disabled in a "soft" mode. That is, the tracepoint will be called,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) but just will not be traced, unless of course it's actually enabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) This scheme allows triggers to be invoked even for events that aren't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) enabled, and also allows the current event filter implementation to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) used for conditionally invoking triggers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) The syntax for event triggers is roughly based on the syntax for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) section of Documentation/trace/ftrace.rst), but there are major
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) differences and the implementation isn't currently tied to it in any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) way, so beware about making generalizations between the two.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) .. Note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) Writing into trace_marker (See Documentation/trace/ftrace.rst)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) can also enable triggers that are written into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) /sys/kernel/tracing/events/ftrace/print/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 6.1 Expression syntax
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) Triggers are added by echoing the command to the 'trigger' file::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) # echo 'command[:count] [if filter]' > trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) Triggers are removed by echoing the same command but starting with '!'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) to the 'trigger' file::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) # echo '!command[:count] [if filter]' > trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) The [if filter] part isn't used in matching commands when removing, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) leaving that off in a '!' command will accomplish the same thing as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) having it in.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) The filter syntax is the same as that described in the 'Event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) filtering' section above.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) For ease of use, writing to the trigger file using '>' currently just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) adds or removes a single trigger and there's no explicit '>>' support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) ('>' actually behaves like '>>') or truncation support to remove all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) triggers (you have to use '!' for each one added.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) 6.2 Supported trigger commands
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) The following commands are supported:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) - enable_event/disable_event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) These commands can enable or disable another trace event whenever
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) the triggering event is hit. When these commands are registered,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) the other trace event is activated, but disabled in a "soft" mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) That is, the tracepoint will be called, but just will not be traced.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) The event tracepoint stays in this mode as long as there's a trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) in effect that can trigger it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) For example, the following trigger causes kmalloc events to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) traced when a read system call is entered, and the :1 at the end
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) specifies that this enablement happens only once::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) # echo 'enable_event:kmem:kmalloc:1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) The following trigger causes kmalloc events to stop being traced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) when a read system call exits. This disablement happens on every
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) read system call exit::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) # echo 'disable_event:kmem:kmalloc' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) The format is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) enable_event:<system>:<event>[:count]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) disable_event:<system>:<event>[:count]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) To remove the above commands::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) # echo '!enable_event:kmem:kmalloc:1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) # echo '!disable_event:kmem:kmalloc' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) Note that there can be any number of enable/disable_event triggers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) per triggering event, but there can only be one trigger per
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) triggered event. e.g. sys_enter_read can have triggers enabling both
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) could be combined into a single filter on kmem:kmalloc though).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) - stacktrace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) This command dumps a stacktrace in the trace buffer whenever the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) triggering event occurs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) For example, the following trigger dumps a stacktrace every time the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) kmalloc tracepoint is hit::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) # echo 'stacktrace' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) The following trigger dumps a stacktrace the first 5 times a kmalloc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) request happens with a size >= 64K::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) # echo 'stacktrace:5 if bytes_req >= 65536' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) The format is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) stacktrace[:count]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) To remove the above commands::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) # echo '!stacktrace' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) # echo '!stacktrace:5 if bytes_req >= 65536' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) The latter can also be removed more simply by the following (without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) the filter)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) # echo '!stacktrace:5' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) Note that there can be only one stacktrace trigger per triggering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) - snapshot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) This command causes a snapshot to be triggered whenever the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) triggering event occurs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) The following command creates a snapshot every time a block request
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) queue is unplugged with a depth > 1. If you were tracing a set of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) events or functions at the time, the snapshot trace buffer would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) capture those events when the trigger event occurred::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) # echo 'snapshot if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) To only snapshot once::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) # echo 'snapshot:1 if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) To remove the above commands::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) # echo '!snapshot if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) # echo '!snapshot:1 if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) Note that there can be only one snapshot trigger per triggering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) - traceon/traceoff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) These commands turn tracing on and off when the specified events are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) hit. The parameter determines how many times the tracing system is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) turned on and off. If unspecified, there is no limit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) The following command turns tracing off the first time a block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) request queue is unplugged with a depth > 1. If you were tracing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) set of events or functions at the time, you could then examine the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) trace buffer to see the sequence of events that led up to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) trigger event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) # echo 'traceoff:1 if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) To always disable tracing when nr_rq > 1::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) # echo 'traceoff if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) To remove the above commands::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) # echo '!traceoff:1 if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) # echo '!traceoff if nr_rq > 1' > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) /sys/kernel/debug/tracing/events/block/block_unplug/trigger
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) Note that there can be only one traceon or traceoff trigger per
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) triggering event.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) - hist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) This command aggregates event hits into a hash table keyed on one or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) more trace event format fields (or stacktrace) and a set of running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) totals derived from one or more trace event format fields and/or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) event counts (hitcount).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) See Documentation/trace/histogram.rst for details and examples.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) 7. In-kernel trace event API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) In most cases, the command-line interface to trace events is more than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) sufficient. Sometimes, however, applications might find the need for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) more complex relationships than can be expressed through a simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) series of linked command-line expressions, or putting together sets of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) commands may be simply too cumbersome. An example might be an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) application that needs to 'listen' to the trace stream in order to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) maintain an in-kernel state machine detecting, for instance, when an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) illegal kernel state occurs in the scheduler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) The trace event subsystem provides an in-kernel API allowing modules
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) or other kernel code to generate user-defined 'synthetic' events at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) will, which can be used to either augment the existing trace stream
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) and/or signal that a particular important state has occurred.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) A similar in-kernel API is also available for creating kprobe and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) kretprobe events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) Both the synthetic event and k/ret/probe event APIs are built on top
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) of a lower-level "dynevent_cmd" event command API, which is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) available for more specialized applications, or as the basis of other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) higher-level trace event APIs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) The API provided for these purposes is describe below and allows the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) - dynamically creating synthetic event definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) - dynamically creating kprobe and kretprobe event definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) - tracing synthetic events from in-kernel code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) - the low-level "dynevent_cmd" API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) 7.1 Dyamically creating synthetic event definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) ---------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) There are a couple ways to create a new synthetic event from a kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) module or other kernel code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) The first creates the event in one step, using synth_event_create().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) In this method, the name of the event to create and an array defining
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) the fields is supplied to synth_event_create(). If successful, a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) synthetic event with that name and fields will exist following that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) call. For example, to create a new "schedtest" synthetic event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) ret = synth_event_create("schedtest", sched_fields,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) ARRAY_SIZE(sched_fields), THIS_MODULE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) The sched_fields param in this example points to an array of struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) synth_field_desc, each of which describes an event field by type and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) name::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) static struct synth_field_desc sched_fields[] = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) { .type = "pid_t", .name = "next_pid_field" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) { .type = "char[16]", .name = "next_comm_field" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) { .type = "u64", .name = "ts_ns" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) { .type = "u64", .name = "ts_ms" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) { .type = "unsigned int", .name = "cpu" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) { .type = "char[64]", .name = "my_string_field" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) { .type = "int", .name = "my_int_field" },
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) See synth_field_size() for available types.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) If field_name contains [n], the field is considered to be a static array.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) If field_names contains[] (no subscript), the field is considered to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) be a dynamic array, which will only take as much space in the event as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) is required to hold the array.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) Because space for an event is reserved before assigning field values
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) to the event, using dynamic arrays implies that the piecewise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) in-kernel API described below can't be used with dynamic arrays. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) other non-piecewise in-kernel APIs can, however, be used with dynamic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) arrays.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) If the event is created from within a module, a pointer to the module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) must be passed to synth_event_create(). This will ensure that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) trace buffer won't contain unreadable events when the module is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) removed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) At this point, the event object is ready to be used for generating new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) In the second method, the event is created in several steps. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) allows events to be created dynamically and without the need to create
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) and populate an array of fields beforehand.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) To use this method, an empty or partially empty synthetic event should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) first be created using synth_event_gen_cmd_start() or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) synth_event_gen_cmd_array_start(). For synth_event_gen_cmd_start(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) the name of the event along with one or more pairs of args each pair
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) representing a 'type field_name;' field specification should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) supplied. For synth_event_gen_cmd_array_start(), the name of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) event along with an array of struct synth_field_desc should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) supplied. Before calling synth_event_gen_cmd_start() or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) synth_event_gen_cmd_array_start(), the user should create and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) initialize a dynevent_cmd object using synth_event_cmd_init().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) For example, to create a new "schedtest" synthetic event with two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) fields::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) struct dynevent_cmd cmd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) char *buf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) /* Create a buffer to hold the generated command */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) /* Before generating the command, initialize the cmd object */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) "pid_t", "next_pid_field",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) "u64", "ts_ns");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) Alternatively, using an array of struct synth_field_desc fields
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) containing the same information::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) fields, n_fields);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) Once the synthetic event object has been created, it can then be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) populated with more fields. Fields are added one by one using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) synth_event_add_field(), supplying the dynevent_cmd object, a field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) type, and a field name. For example, to add a new int field named
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) "intfield", the following call should be made::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) ret = synth_event_add_field(&cmd, "int", "intfield");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) See synth_field_size() for available types. If field_name contains [n]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) the field is considered to be an array.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) A group of fields can also be added all at once using an array of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) synth_field_desc with add_synth_fields(). For example, this would add
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) just the first four sched_fields::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) ret = synth_event_add_fields(&cmd, sched_fields, 4);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) If you already have a string of the form 'type field_name',
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) synth_event_add_field_str() can be used to add it as-is; it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) also automatically append a ';' to the string.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) Once all the fields have been added, the event should be finalized and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) registered by calling the synth_event_gen_cmd_end() function::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) ret = synth_event_gen_cmd_end(&cmd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) At this point, the event object is ready to be used for tracing new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) 7.2 Tracing synthetic events from in-kernel code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) ------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) To trace a synthetic event, there are several options. The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) option is to trace the event in one call, using synth_event_trace()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) with a variable number of values, or synth_event_trace_array() with an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) array of values to be set. A second option can be used to avoid the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) need for a pre-formed array of values or list of arguments, via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) synth_event_trace_start() and synth_event_trace_end() along with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) synth_event_add_next_val() or synth_event_add_val() to add the values
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) piecewise.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) 7.2.1 Tracing a synthetic event all at once
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) -------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) To trace a synthetic event all at once, the synth_event_trace() or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) synth_event_trace_array() functions can be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) The synth_event_trace() function is passed the trace_event_file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) representing the synthetic event (which can be retrieved using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) trace_get_event_file() using the synthetic event name, "synthetic" as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) the system name, and the trace instance name (NULL if using the global
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) trace array)), along with an variable number of u64 args, one for each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) synthetic event field, and the number of values being passed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) So, to trace an event corresponding to the synthetic event definition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) above, code like the following could be used::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) ret = synth_event_trace(create_synth_test, 7, /* number of values */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) 444, /* next_pid_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) (u64)"clackers", /* next_comm_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) 1000000, /* ts_ns */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) 1000, /* ts_ms */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) smp_processor_id(),/* cpu */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) (u64)"Thneed", /* my_string_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) 999); /* my_int_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) All vals should be cast to u64, and string vals are just pointers to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) strings, cast to u64. Strings will be copied into space reserved in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) the event for the string, using these pointers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) Alternatively, the synth_event_trace_array() function can be used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) accomplish the same thing. It is passed the trace_event_file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) representing the synthetic event (which can be retrieved using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) trace_get_event_file() using the synthetic event name, "synthetic" as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) the system name, and the trace instance name (NULL if using the global
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) trace array)), along with an array of u64, one for each synthetic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) event field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) To trace an event corresponding to the synthetic event definition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) above, code like the following could be used::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) u64 vals[7];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) vals[0] = 777; /* next_pid_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) vals[1] = (u64)"tiddlywinks"; /* next_comm_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) vals[2] = 1000000; /* ts_ns */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) vals[3] = 1000; /* ts_ms */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) vals[4] = smp_processor_id(); /* cpu */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) vals[5] = (u64)"thneed"; /* my_string_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) vals[6] = 398; /* my_int_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) The 'vals' array is just an array of u64, the number of which must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) match the number of field in the synthetic event, and which must be in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) the same order as the synthetic event fields.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) All vals should be cast to u64, and string vals are just pointers to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) strings, cast to u64. Strings will be copied into space reserved in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) the event for the string, using these pointers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) In order to trace a synthetic event, a pointer to the trace event file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) is needed. The trace_get_event_file() function can be used to get
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) it - it will find the file in the given trace instance (in this case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) NULL since the top trace array is being used) while at the same time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) preventing the instance containing it from going away::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) schedtest_event_file = trace_get_event_file(NULL, "synthetic",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) "schedtest");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) Before tracing the event, it should be enabled in some way, otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779) the synthetic event won't actually show up in the trace buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) To enable a synthetic event from the kernel, trace_array_set_clr_event()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) can be used (which is not specific to synthetic events, so does need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) the "synthetic" system name to be specified explicitly).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) To enable the event, pass 'true' to it::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) trace_array_set_clr_event(schedtest_event_file->tr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) "synthetic", "schedtest", true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) To disable it pass false::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) trace_array_set_clr_event(schedtest_event_file->tr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) "synthetic", "schedtest", false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) Finally, synth_event_trace_array() can be used to actually trace the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) event, which should be visible in the trace buffer afterwards::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) ret = synth_event_trace_array(schedtest_event_file, vals,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799) ARRAY_SIZE(vals));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) To remove the synthetic event, the event should be disabled, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) trace instance should be 'put' back using trace_put_event_file()::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) trace_array_set_clr_event(schedtest_event_file->tr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805) "synthetic", "schedtest", false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) trace_put_event_file(schedtest_event_file);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808) If those have been successful, synth_event_delete() can be called to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) remove the event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) ret = synth_event_delete("schedtest");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) 7.2.2 Tracing a synthetic event piecewise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) -----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) To trace a synthetic using the piecewise method described above, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) synth_event_trace_start() function is used to 'open' the synthetic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818) event trace::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) struct synth_trace_state trace_state;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) ret = synth_event_trace_start(schedtest_event_file, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824) It's passed the trace_event_file representing the synthetic event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) using the same methods as described above, along with a pointer to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) struct synth_trace_state object, which will be zeroed before use and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827) used to maintain state between this and following calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) Once the event has been opened, which means space for it has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) reserved in the trace buffer, the individual fields can be set. There
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831) are two ways to do that, either one after another for each field in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) the event, which requires no lookups, or by name, which does. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) tradeoff is flexibility in doing the assignments vs the cost of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) lookup per field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836) To assign the values one after the other without lookups,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) synth_event_add_next_val() should be used. Each call is passed the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) same synth_trace_state object used in the synth_event_trace_start(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839) along with the value to set the next field in the event. After each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) field is set, the 'cursor' points to the next field, which will be set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) by the subsequent call, continuing until all the fields have been set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842) in order. The same sequence of calls as in the above examples using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) this method would be (without error-handling code)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845) /* next_pid_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) ret = synth_event_add_next_val(777, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) /* next_comm_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) ret = synth_event_add_next_val((u64)"slinky", &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851) /* ts_ns */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) ret = synth_event_add_next_val(1000000, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) /* ts_ms */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) ret = synth_event_add_next_val(1000, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) /* cpu */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) ret = synth_event_add_next_val(smp_processor_id(), &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) /* my_string_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) /* my_int_field */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) ret = synth_event_add_next_val(395, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) To assign the values in any order, synth_event_add_val() should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) used. Each call is passed the same synth_trace_state object used in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) the synth_event_trace_start(), along with the field name of the field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869) to set and the value to set it to. The same sequence of calls as in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) the above examples using this method would be (without error-handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) code)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) ret = synth_event_add_val("next_pid_field", 777, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) ret = synth_event_add_val("next_comm_field", (u64)"silly putty",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875) &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) ret = synth_event_add_val("ts_ns", 1000000, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) ret = synth_event_add_val("ts_ms", 1000, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879) ret = synth_event_add_val("my_string_field", (u64)"thneed_9",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) ret = synth_event_add_val("my_int_field", 3999, &trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883) Note that synth_event_add_next_val() and synth_event_add_val() are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) incompatible if used within the same trace of an event - either one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885) can be used but not both at the same time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) Finally, the event won't be actually traced until it's 'closed',
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888) which is done using synth_event_trace_end(), which takes only the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) struct synth_trace_state object used in the previous calls::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891) ret = synth_event_trace_end(&trace_state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) Note that synth_event_trace_end() must be called at the end regardless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894) of whether any of the add calls failed (say due to a bad field name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) being passed in).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) 7.3 Dyamically creating kprobe and kretprobe event definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) --------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) To create a kprobe or kretprobe trace event from kernel code, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) kprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902) functions can be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) To create a kprobe event, an empty or partially empty kprobe event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) should first be created using kprobe_event_gen_cmd_start(). The name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) of the event and the probe location should be specfied along with one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) or args each representing a probe field should be supplied to this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) function. Before calling kprobe_event_gen_cmd_start(), the user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) should create and initialize a dynevent_cmd object using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910) kprobe_event_cmd_init().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912) For example, to create a new "schedtest" kprobe event with two fields::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) struct dynevent_cmd cmd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) char *buf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) /* Create a buffer to hold the generated command */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920) /* Before generating the command, initialize the cmd object */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921) kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924) * Define the gen_kprobe_test event with the first 2 kprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) * fields.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) "dfd=%ax", "filename=%dx");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) Once the kprobe event object has been created, it can then be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) populated with more fields. Fields can be added using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932) kprobe_event_add_fields(), supplying the dynevent_cmd object along
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) with a variable arg list of probe fields. For example, to add a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) couple additional fields, the following call could be made::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) Once all the fields have been added, the event should be finalized and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939) registered by calling the kprobe_event_gen_cmd_end() or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) kretprobe_event_gen_cmd_end() functions, depending on whether a kprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) or kretprobe command was started::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) ret = kprobe_event_gen_cmd_end(&cmd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945) or::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) ret = kretprobe_event_gen_cmd_end(&cmd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) At this point, the event object is ready to be used for tracing new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952) Similarly, a kretprobe event can be created using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) kretprobe_event_gen_cmd_start() with a probe name and location and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954) additional params such as $retval::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957) "do_sys_open", "$retval");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959) Similar to the synthetic event case, code like the following can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) used to enable the newly created kprobe event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) ret = trace_array_set_clr_event(gen_kprobe_test->tr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) "kprobes", "gen_kprobe_test", true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967) Finally, also similar to synthetic events, the following code can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) used to give the kprobe event file back and delete the event::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) trace_put_event_file(gen_kprobe_test);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) ret = kprobe_event_delete("gen_kprobe_test");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) 7.4 The "dynevent_cmd" low-level API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975) ------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) Both the in-kernel synthetic event and kprobe interfaces are built on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978) top of a lower-level "dynevent_cmd" interface. This interface is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) meant to provide the basis for higher-level interfaces such as the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) synthetic and kprobe interfaces, which can be used as examples.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) The basic idea is simple and amounts to providing a general-purpose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983) layer that can be used to generate trace event commands. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984) generated command strings can then be passed to the command-parsing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985) and event creation code that already exists in the trace event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) subystem for creating the corresponding trace events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988) In a nutshell, the way it works is that the higher-level interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) code creates a struct dynevent_cmd object, then uses a couple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990) functions, dynevent_arg_add() and dynevent_arg_pair_add() to build up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) a command string, which finally causes the command to be executed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992) using the dynevent_create() function. The details of the interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) are described below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) The first step in building a new command string is to create and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996) initialize an instance of a dynevent_cmd. Here, for instance, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997) create a dynevent_cmd on the stack and initialize it::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999) struct dynevent_cmd cmd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) char *buf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) foo_event_run_command);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) The dynevent_cmd initialization needs to be given a user-specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) buffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) for this purpose - at 2k it's generally too big to be comfortably put
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) on the stack, so is dynamically allocated), a dynevent type id, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) is meant to be used to check that further API calls are for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) correct command type, and a pointer to an event-specific run_command()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) callback that will be called to actually execute the event-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) command function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) Once that's done, the command string can by built up by successive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) calls to argument-adding functions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) To add a single argument, define and initialize a struct dynevent_arg
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) or struct dynevent_arg_pair object. Here's an example of the simplest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) possible arg addition, which is simply to append the given string as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) a whitespace-separated argument to the command::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) struct dynevent_arg arg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) dynevent_arg_init(&arg, NULL, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) arg.str = name;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) ret = dynevent_arg_add(cmd, &arg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) The arg object is first initialized using dynevent_arg_init() and in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) this case the parameters are NULL or 0, which means there's no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) optional sanity-checking function or separator appended to the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) the arg.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) Here's another more complicated example using an 'arg pair', which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) used to create an argument that consists of a couple components added
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) together as a unit, for example, a 'type field_name;' arg or a simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) expression arg e.g. 'flags=%cx'::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043) struct dynevent_arg_pair arg_pair;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';');
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) arg_pair.lhs = type;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) arg_pair.rhs = name;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) ret = dynevent_arg_pair_add(cmd, &arg_pair);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) Again, the arg_pair is first initialized, in this case with a callback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) function used to check the sanity of the args (for example, that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) neither part of the pair is NULL), along with a character to be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) to add an operator between the pair (here none) and a separator to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) appended onto the end of the arg pair (here ';').
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) There's also a dynevent_str_add() function that can be used to simply
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) add a string as-is, with no spaces, delimeters, or arg check.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) Any number of dynevent_*_add() calls can be made to build up the string
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) (until its length surpasses cmd->maxlen). When all the arguments have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) been added and the command string is complete, the only thing left to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) do is run the command, which happens by simply calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) dynevent_create()::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) ret = dynevent_create(&cmd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) At that point, if the return value is 0, the dynamic event has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) created and is ready to use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) See the dynevent_cmd function definitions themselves for the details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) of the API.