Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

=======================
Kernel Probes (Kprobes)
=======================

:Author: Jim Keniston <jkenisto@us.ibm.com>
:Author: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
:Author: Masami Hiramatsu <mhiramat@redhat.com>

.. CONTENTS

  1. Concepts: Kprobes and Return Probes
  2. Architectures Supported
  3. Configuring Kprobes
  4. API Reference
  5. Kprobes Features and Limitations
  6. Probe Overhead
  7. TODO
  8. Kprobes Example
  9. Kretprobes Example
  10. Deprecated Features
  Appendix A: The kprobes debugfs interface
  Appendix B: The kprobes sysctl interface
  Appendix C: References

Concepts: Kprobes and Return Probes
=========================================

Kprobes enables you to dynamically break into any kernel routine and
collect debugging and performance information non-disruptively. You
can trap at almost any kernel code address [1]_, specifying a handler
routine to be invoked when the breakpoint is hit.

.. [1] Some parts of the kernel code cannot be trapped; see
       :ref:`kprobes_blacklist`.

There are currently two types of probes: kprobes and kretprobes
(also called return probes).  A kprobe can be inserted on virtually
any instruction in the kernel.  A return probe fires when a specified
function returns.

In the typical case, Kprobes-based instrumentation is packaged as
a kernel module.  The module's init function installs ("registers")
one or more probes, and the exit function unregisters them.  A
registration function such as register_kprobe() specifies where
the probe is to be inserted and what handler is to be called when
the probe is hit.

There are also ``register_/unregister_*probes()`` functions for batch
registration/unregistration of a group of ``*probes``. These functions
can speed up the unregistration process when you have to unregister
a lot of probes at once.

The following subsections explain how the different types of
probes work and how jump optimization works.  They explain certain
things that you'll need to know in order to make the best use of
Kprobes -- e.g., the difference between a pre_handler and
a post_handler, and how to use the maxactive and nmissed fields of
a kretprobe.  But if you're in a hurry to start using Kprobes, you
can skip ahead to :ref:`kprobes_archs_supported`.

How Does a Kprobe Work?
-----------------------

When a kprobe is registered, Kprobes makes a copy of the probed
instruction and replaces the first byte(s) of the probed instruction
with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
registers are saved, and control passes to Kprobes via the
notifier_call_chain mechanism.  Kprobes executes the "pre_handler"
associated with the kprobe, passing the handler the addresses of the
kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction.  This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.
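
To make the sequence concrete, here is a toy userspace model in C. It is
not kernel code and not the real Kprobes implementation. It assumes a
hypothetical machine whose "instructions" are plain ints, OP_BREAKPOINT
stands in for int3, and every toy_* name is invented for illustration.
Registration saves a copy of the instruction and patches in the
breakpoint; on a hit, the pre_handler runs, the saved copy is
single-stepped, and then the post_handler runs::

	#include <stddef.h>

	#define OP_BREAKPOINT (-1)	/* hypothetical stand-in for int3 */

	struct toy_regs {
		size_t ip;	/* index of the next instruction */
		int acc;	/* state changed by each instruction */
	};

	struct toy_kprobe {
		size_t addr;		/* index of the probed instruction */
		int saved_insn;		/* copy of the original instruction */
		int (*pre_handler)(struct toy_kprobe *p, struct toy_regs *regs);
		void (*post_handler)(struct toy_kprobe *p, struct toy_regs *regs);
	};

	/* Executing an instruction adds its value to acc and advances ip. */
	static void toy_execute(int insn, struct toy_regs *regs)
	{
		regs->acc += insn;
		regs->ip++;
	}

	/* Registration: keep a copy, then patch a breakpoint over the original. */
	static void toy_register_kprobe(int *text, struct toy_kprobe *p)
	{
		p->saved_insn = text[p->addr];
		text[p->addr] = OP_BREAKPOINT;
	}

	/* Trap path: pre_handler, single-step the copy, then post_handler. */
	static void toy_trap(struct toy_kprobe *p, struct toy_regs *regs)
	{
		if (p->pre_handler && p->pre_handler(p, regs))
			return;	/* non-zero: handler changed the execution path */
		toy_execute(p->saved_insn, regs);	/* single-step the copy */
		if (p->post_handler)
			p->post_handler(p, regs);
	}

	/* Run text[0..len) from index 0 with one registered probe. */
	static int toy_run(const int *text, size_t len, struct toy_kprobe *p)
	{
		struct toy_regs regs = { 0, 0 };

		while (regs.ip < len) {
			if (regs.ip == p->addr && text[regs.ip] == OP_BREAKPOINT)
				toy_trap(p, &regs);
			else
				toy_execute(text[regs.ip], &regs);
		}
		return regs.acc;
	}

For example, with text = {1, 2, 3} and a handler-less probe at index 1,
toy_run() still returns 6, because the saved copy of the patched
instruction is executed on the probe's behalf.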

Changing Execution Path
-----------------------

Since kprobes can probe into running kernel code, a probe handler can
change the register set, including the instruction pointer. This
operation requires maximum care, such as keeping the stack frame intact
and recovering the execution path. Since it operates on a running kernel
and needs deep knowledge of computer architecture and concurrent
computing, you can easily shoot yourself in the foot.

If you change the instruction pointer (and set up other related
registers) in pre_handler, you must return a non-zero value so that
kprobes stops single-stepping and just returns to the given address.
This also means that the post_handler is not called.

Note that this operation may be harder on some architectures that use
a TOC (Table of Contents) for function calls, since you have to set up
a new TOC for your function in your module, and restore the old one
after returning from it.
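
The pre_handler return-value rule can be modeled in a few lines of
userspace C. This is only a sketch: struct toy_regs, toy_probe_hit() and
toy_divert() are hypothetical, addresses are plain integers, and the
probed instruction is assumed to be insn_len bytes long. A non-zero
return resumes execution wherever the handler left regs->ip and skips
both the single-step and the post_handler::

	#include <stdbool.h>

	struct toy_regs {
		unsigned long ip;	/* instruction pointer */
	};

	typedef int (*toy_pre_handler_t)(struct toy_regs *regs);

	/*
	 * Returns the address at which execution resumes after a probe hit at
	 * probe_addr.  *post_step reports whether the single-step and
	 * post_handler stage was reached.
	 */
	static unsigned long toy_probe_hit(unsigned long probe_addr,
					   unsigned long insn_len,
					   toy_pre_handler_t pre, bool *post_step)
	{
		struct toy_regs regs = { .ip = probe_addr };

		*post_step = false;
		if (pre && pre(&regs))
			return regs.ip;	/* redirected: no single-step, no post_handler */

		/* Normal path: single-step the copy, then the post_handler runs. */
		regs.ip = probe_addr + insn_len;
		*post_step = true;
		return regs.ip;
	}

	/* Example pre_handler: divert to a hypothetical address and return 1. */
	static int toy_divert(struct toy_regs *regs)
	{
		regs->ip = 0x2000;
		return 1;
	}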

Return Probes
-------------

How Does a Return Probe Work?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function.  When the probed function is called and this
probe is hit, Kprobes saves a copy of the return address, and replaces
the return address with the address of a "trampoline."  The trampoline
is an arbitrary piece of code -- typically just a nop instruction.
At boot time, Kprobes registers a kprobe at the trampoline.

When the probed function executes its return instruction, control
passes to the trampoline and that probe is hit.  Kprobes' trampoline
handler calls the user-specified return handler associated with the
kretprobe, then sets the saved instruction pointer to the saved return
address, and that's where execution resumes upon return from the trap.
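
A userspace sketch of the return-address hijack, under simplifying
assumptions: the return address lives in a single hypothetical stack
slot, TOY_TRAMPOLINE is an invented address, and toy_ret_handler() stands
in for the user's return handler. The entry probe saves the real return
address and substitutes the trampoline; the "return" then lands on the
trampoline, which calls the handler and resumes at the saved address::

	#define TOY_TRAMPOLINE 0xdead0000UL	/* hypothetical trampoline address */

	struct toy_instance {
		unsigned long ret_addr;	/* saved real return address */
	};

	static int toy_handler_calls;

	/* Hypothetical user return handler: just counts invocations. */
	static void toy_ret_handler(struct toy_instance *ri)
	{
		(void)ri;
		toy_handler_calls++;
	}

	/* Entry probe hit: save the return address, then hijack it. */
	static void toy_entry_hit(unsigned long *ret_slot, struct toy_instance *ri)
	{
		ri->ret_addr = *ret_slot;
		*ret_slot = TOY_TRAMPOLINE;
	}

	/*
	 * The probed function "returns" to *ret_slot.  If that is the
	 * trampoline, call the return handler and resume at the saved address.
	 */
	static unsigned long toy_function_return(const unsigned long *ret_slot,
						 struct toy_instance *ri)
	{
		unsigned long ip = *ret_slot;

		if (ip == TOY_TRAMPOLINE) {
			toy_ret_handler(ri);
			ip = ri->ret_addr;
		}
		return ip;
	}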

While the probed function is executing, its return address is
stored in an object of type kretprobe_instance.  Before calling
register_kretprobe(), the user sets the maxactive field of the
kretprobe struct to specify how many instances of the specified
function can be probed simultaneously.  register_kretprobe()
pre-allocates the indicated number of kretprobe_instance objects.

For example, if the function is non-recursive and is called with a
spinlock held, maxactive = 1 should be enough.  If the function is
non-recursive and can never relinquish the CPU (e.g., via a semaphore
or preemption), NR_CPUS should be enough.  If maxactive <= 0, it is
set to a default value.  If CONFIG_PREEMPT is enabled, the default
is max(10, 2*NR_CPUS).  Otherwise, the default is NR_CPUS.
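
The default rule for maxactive can be restated as a small C function.
toy_effective_maxactive() is hypothetical; nr_cpus and preempt stand in
for NR_CPUS and CONFIG_PREEMPT, and the kernel's own code may obtain the
CPU count differently::

	#include <stdbool.h>

	/* Effective maxactive as described above (hypothetical helper). */
	static int toy_effective_maxactive(int maxactive, int nr_cpus, bool preempt)
	{
		if (maxactive > 0)
			return maxactive;	/* caller's explicit choice */
		if (preempt)	/* max(10, 2*NR_CPUS) */
			return 2 * nr_cpus > 10 ? 2 * nr_cpus : 10;
		return nr_cpus;
	}

For instance, on a 4-CPU preemptible kernel the default is 10, while on
an 8-CPU preemptible kernel it is 16.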

It's not a disaster if you set maxactive too low; you'll just miss
some probes.  In the kretprobe struct, the nmissed field is set to
zero when the return probe is registered, and is incremented every
time the probed function is entered but there is no kretprobe_instance
object available for establishing the return probe.
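
The bookkeeping behind nmissed can be sketched as a pool of
preallocated instances. In this userspace model (all names hypothetical,
with maxactive fixed at 2 through TOY_MAXACTIVE), a third concurrent
activation finds the pool empty, so only nmissed changes and that call's
return goes unobserved; once an instance is recycled, new entries
succeed again::

	#include <stdbool.h>

	#define TOY_MAXACTIVE 2	/* hypothetical maxactive */

	struct toy_kretprobe {
		int free_instances;	/* preallocated instances not in use */
		int nmissed;		/* entries that found no free instance */
	};

	/* Registration: pre-allocate the pool and reset nmissed. */
	static void toy_register(struct toy_kretprobe *rp)
	{
		rp->free_instances = TOY_MAXACTIVE;
		rp->nmissed = 0;
	}

	/* Function entry: claim an instance, or count a miss. */
	static bool toy_enter(struct toy_kretprobe *rp)
	{
		if (rp->free_instances == 0) {
			rp->nmissed++;
			return false;	/* this call's return is not probed */
		}
		rp->free_instances--;
		return true;
	}

	/* Function return via the trampoline: recycle the instance. */
	static void toy_return(struct toy_kretprobe *rp)
	{
		rp->free_instances++;
	}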

Kretprobe entry-handler
^^^^^^^^^^^^^^^^^^^^^^^

Kretprobes also provides an optional user-specified handler which runs
on function entry. This handler is specified by setting the entry_handler
field of the kretprobe struct. Whenever the kprobe placed by kretprobe at the
function entry is hit, the user-defined entry_handler, if any, is invoked.
If the entry_handler returns 0 (success) then a corresponding return handler
is guaranteed to be called upon function return. If the entry_handler
returns a non-zero error then Kprobes leaves the return address as is, and
the kretprobe has no further effect for that particular function instance.

Multiple entry and return handler invocations are matched using the unique
kretprobe_instance object associated with them. Additionally, a user
may also specify per return-instance private data to be part of each
kretprobe_instance object. This is especially useful when sharing private
data between corresponding user entry and return handlers. The size of each
private data object can be specified at kretprobe registration time by
setting the data_size field of the kretprobe struct. This data can be
accessed through the data field of each kretprobe_instance object.

If the probed function is entered but there is no kretprobe_instance
object available, then in addition to incrementing the nmissed count,
the user entry_handler invocation is also skipped.
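
The following userspace sketch models the entry/return pairing. It is
hypothetical throughout: TOY_DATA_MAX plays the role of data_size, the
pool holds TOY_MAXACTIVE instances, and the example handlers stash the
function argument on entry and subtract it from the return value. It
shows three rules from the text: each activation gets its own instance
and private data, a non-zero entry_handler result disarms the return
probe for that call, and a missed entry skips the entry_handler
entirely::

	#include <stdbool.h>
	#include <stddef.h>
	#include <string.h>

	#define TOY_MAXACTIVE 2
	#define TOY_DATA_MAX 16	/* stands in for data_size; >= sizeof(long) */

	struct toy_instance {
		bool in_use;
		unsigned char data[TOY_DATA_MAX];	/* per-instance private data */
	};

	struct toy_kretprobe {
		int nmissed;
		int entry_calls;	/* how often the entry_handler actually ran */
		struct toy_instance pool[TOY_MAXACTIVE];
	};

	/* Example entry_handler: stash the argument; non-zero disarms the return. */
	static int toy_entry_handler(struct toy_instance *ri, long arg)
	{
		memcpy(ri->data, &arg, sizeof(arg));
		return arg < 0;
	}

	/* Example return handler: read back the stashed argument. */
	static long toy_ret_handler(struct toy_instance *ri, long retval)
	{
		long arg;

		memcpy(&arg, ri->data, sizeof(arg));
		return retval - arg;
	}

	/* Function entry: returns the armed instance, or NULL if none is armed. */
	static struct toy_instance *toy_enter(struct toy_kretprobe *rp, long arg)
	{
		for (size_t i = 0; i < TOY_MAXACTIVE; i++) {
			struct toy_instance *ri = &rp->pool[i];

			if (ri->in_use)
				continue;
			rp->entry_calls++;
			if (toy_entry_handler(ri, arg))
				return NULL;	/* return address left as is */
			ri->in_use = true;
			return ri;
		}
		rp->nmissed++;	/* no free instance: entry_handler is skipped too */
		return NULL;
	}

	/* Function return: run the return handler and recycle the instance. */
	static long toy_leave(struct toy_instance *ri, long retval)
	{
		long result = toy_ret_handler(ri, retval);

		ri->in_use = false;
		return result;
	}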

.. _kprobes_jump_optimization:

How Does Jump Optimization Work?
--------------------------------

If your kernel is built with CONFIG_OPTPROBES=y (currently this option
is automatically set to "y" on x86/x86-64 non-preemptive kernels) and
the "debug.kprobes-optimization" sysctl parameter is set to 1 (see
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
instruction instead of a breakpoint instruction at each probepoint.

Init a Kprobe
^^^^^^^^^^^^^

When a probe is registered, before attempting this optimization,
Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
address. So, even if it's not possible to optimize this particular
probepoint, there'll be a probe there.

Safety Check
^^^^^^^^^^^^

Before optimizing a probe, Kprobes performs the following safety checks:

- Kprobes verifies that the region that will be replaced by the jump
  instruction (the "optimized region") lies entirely within one function.
  (A jump instruction is multiple bytes, and so may overlay multiple
  instructions.)

- Kprobes analyzes the entire function and verifies that there is no
  jump into the optimized region.  Specifically:

  - the function contains no indirect jump;
  - the function contains no instruction that causes an exception (since
    the fixup code triggered by the exception could jump back into the
    optimized region -- Kprobes checks the exception tables to verify this);
  - there is no near jump to the optimized region (other than to the first
    byte).

- For each instruction in the optimized region, Kprobes verifies that
  the instruction can be executed out of line.
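
The first two checks amount to simple range tests once the function has
been analyzed. The sketch below assumes a hypothetical struct toy_func
that already summarizes that analysis (function bounds, indirect jumps,
exception-table entries, near-jump targets); decoding real instructions
and the out-of-line check are not modeled. A jump may target the
region's first byte, but nothing after it::

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical summary of one function's analysis results. */
	struct toy_func {
		unsigned long start, end;	/* function occupies [start, end) */
		bool has_indirect_jump;
		bool has_exception_insn;	/* has an exception-table entry */
		const unsigned long *jump_targets;	/* near-jump destinations */
		size_t njumps;
	};

	/* Can a jump of len bytes replace the code at addr? */
	static bool toy_region_is_safe(const struct toy_func *f,
				       unsigned long addr, unsigned long len)
	{
		/* The optimized region must lie entirely within the function. */
		if (addr < f->start || addr + len > f->end)
			return false;
		if (f->has_indirect_jump || f->has_exception_insn)
			return false;
		/* No near jump may land inside the region, except its first byte. */
		for (size_t i = 0; i < f->njumps; i++) {
			unsigned long t = f->jump_targets[i];

			if (t > addr && t < addr + len)
				return false;
		}
		return true;
	}

With a 5-byte region (the size of an x86 jmp rel32), a near jump landing
on the fourth byte of the region is enough to rule out optimization.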

Preparing Detour Buffer
^^^^^^^^^^^^^^^^^^^^^^^

Next, Kprobes prepares a "detour" buffer, which contains the following
instruction sequence:

- code to push the CPU's registers (emulating a breakpoint trap)
- a call to the trampoline code, which calls the user's probe handlers
- code to restore registers
- the instructions from the optimized region
- a jump back to the original execution path

Pre-optimization
^^^^^^^^^^^^^^^^

After preparing the detour buffer, Kprobes verifies that none of the
following situations exist:

- The probe has a post_handler.
- Other instructions in the optimized region are probed.
- The probe is disabled.

In any of the above cases, Kprobes won't start optimizing the probe.
Since these are temporary situations, Kprobes tries to start
optimizing it again once the situation changes.

If the kprobe can be optimized, Kprobes enqueues the kprobe on an
optimizing list, and kicks the kprobe-optimizer workqueue to optimize
it.  If the to-be-optimized probepoint is hit before being optimized,
Kprobes returns control to the original instruction path by setting
the CPU's instruction pointer to the copied code in the detour buffer
-- thus at least avoiding the single-step.
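
The pre-optimization gate reduces to a predicate that is re-evaluated
whenever the probe's state changes. A minimal sketch, using a
hypothetical struct toy_probe_state rather than any real kernel
structure::

	#include <stdbool.h>

	/* Hypothetical snapshot of a probe's state at pre-optimization time. */
	struct toy_probe_state {
		bool has_post_handler;
		bool other_probes_in_region;
		bool disabled;
	};

	/* True if the probe may be queued for the kprobe-optimizer. */
	static bool toy_can_optimize(const struct toy_probe_state *s)
	{
		return !s->has_post_handler && !s->other_probes_in_region &&
		       !s->disabled;
	}

Because any post_handler makes the predicate false, registering an empty
post_handler is enough to keep a probe on the breakpoint path, which is
one of the techniques listed in the "NOTE for geeks" later in this
section.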

Optimization
^^^^^^^^^^^^

The Kprobe-optimizer doesn't insert the jump instruction immediately;
rather, it calls synchronize_rcu() for safety first, because it's
possible for a CPU to be interrupted in the middle of executing the
optimized region [3]_.  synchronize_rcu() ensures that all
interruptions that were active when it was called have completed,
but only if CONFIG_PREEMPT=n.  So, this version of kprobe
optimization supports only kernels with CONFIG_PREEMPT=n [4]_.

After that, the Kprobe-optimizer calls stop_machine() to replace
the optimized region with a jump instruction to the detour buffer,
using text_poke_smp().

Unoptimization
^^^^^^^^^^^^^^

When an optimized kprobe is unregistered, disabled, or blocked by
another kprobe, it will be unoptimized.  If this happens before
the optimization is complete, the kprobe is just dequeued from the
optimizing list.  If the optimization has been done, the jump is
replaced with the original code (except for an int3 breakpoint in
the first byte) by using text_poke_smp().
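
A byte-level sketch of unoptimization, assuming x86 (0xcc is the int3
opcode, 0xe9 a jmp rel32) and a hypothetical 5-byte region.
toy_unoptimize() is invented for illustration and skips the
synchronization that text_poke_smp() provides. A probe that is still
queued only leaves the optimizing list, while an optimized one has its
original bytes restored with int3 kept in the first byte::

	#include <string.h>

	#define TOY_INT3 0xcc	/* x86 int3 opcode */
	#define TOY_REGION 5	/* hypothetical optimized-region length */

	enum toy_opt_state { TOY_QUEUED, TOY_OPTIMIZED, TOY_UNOPTIMIZED };

	struct toy_optprobe {
		enum toy_opt_state state;
		unsigned char orig[TOY_REGION];	/* saved original bytes */
	};

	/* Undo the jump, keeping a breakpoint-based kprobe at the first byte. */
	static void toy_unoptimize(struct toy_optprobe *op, unsigned char *text)
	{
		if (op->state == TOY_OPTIMIZED) {
			memcpy(text, op->orig, TOY_REGION);
			text[0] = TOY_INT3;	/* fall back to the int3 kprobe */
		}
		/* A queued probe only leaves the list; its text is untouched. */
		op->state = TOY_UNOPTIMIZED;
	}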

.. [3] Suppose the 2nd instruction is interrupted and then
   the optimizer replaces the 2nd instruction with the jump *address*
   while the interrupt handler is running. When the interrupt
   returns to the original address, there is no valid instruction,
   and it causes an unexpected result.

.. [4] This optimization-safety checking may be replaced with the
   stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
   kernel.

NOTE for geeks:
The jump optimization changes the kprobe's pre_handler behavior.
Without optimization, the pre_handler can change the kernel's execution
path by changing regs->ip and returning 1.  However, when the probe
is optimized, that modification is ignored.  Thus, if you want to
tweak the kernel's execution path, you need to suppress optimization,
using one of the following techniques:

- Specify an empty function for the kprobe's post_handler.

or

- Execute 'sysctl -w debug.kprobes-optimization=0'

.. _kprobes_blacklist:

Blacklist
---------

Kprobes can probe most of the kernel except itself. This means
that there are some functions that kprobes cannot probe. Probing
(trapping) such functions can cause a recursive trap (e.g. double
fault) or the nested probe handler may never be called.
Kprobes manages such functions as a blacklist.
If you want to add a function into the blacklist, you just need
to (1) include linux/kprobes.h and (2) use the NOKPROBE_SYMBOL() macro
to specify a blacklisted function.
Kprobes checks the given probe address against the blacklist and
rejects registering it if the given address is in the blacklist.
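
As a sketch, the check amounts to an address-range lookup. The blacklist
layout below (one [start, end) range per function) is an assumption made
for illustration, as is reporting the rejection as -EINVAL. In kernel
code, the function itself is blacklisted by placing a line such as
NOKPROBE_SYMBOL(my_func) after its definition, where my_func is a
hypothetical name::

	#include <errno.h>
	#include <stddef.h>

	/* Hypothetical blacklist entry: one function's address range. */
	struct toy_blacklist_entry {
		unsigned long start, end;	/* [start, end) */
	};

	/* Returns 0 if addr may be probed, or -EINVAL if it is blacklisted. */
	static int toy_check_blacklist(const struct toy_blacklist_entry *bl,
				       size_t n, unsigned long addr)
	{
		for (size_t i = 0; i < n; i++)
			if (addr >= bl[i].start && addr < bl[i].end)
				return -EINVAL;
		return 0;
	}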

.. _kprobes_archs_supported:

Architectures Supported
=======================

Kprobes and return probes are implemented on the following
architectures:

- i386 (Supports jump optimization)
- x86_64 (AMD-64, EM64T) (Supports jump optimization)
- ppc64
- ia64 (Does not support probes on instruction slot1.)
- sparc64 (Return probes not yet implemented.)
- arm
- ppc
- mips
- s390
- parisc

Configuring Kprobes
===================

When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
for "Kprobes".

So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
unloading" (CONFIG_MODULE_UNLOAD) are set to "y".

Also make sure that CONFIG_KALLSYMS and perhaps even CONFIG_KALLSYMS_ALL
are set to "y", since kallsyms_lookup_name() is used by the in-kernel
kprobe address resolution code.

If you need to insert a probe in the middle of a function, you may find
it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.

API Reference
=============

The Kprobes API includes a "register" function and an "unregister"
function for each type of probe. The API also includes "register_*probes"
and "unregister_*probes" functions for (un)registering arrays of probes.
Here are terse, mini-man-page specifications for these functions and
the associated probe handlers that you'll write. See the files in the
samples/kprobes/ subdirectory for examples.

register_kprobe
---------------

::

	#include <linux/kprobes.h>
	int register_kprobe(struct kprobe *kp);

Sets a breakpoint at the address kp->addr.  When the breakpoint is
hit, Kprobes calls kp->pre_handler.  After the probed instruction
is single-stepped, Kprobes calls kp->post_handler.  If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
or during single-stepping of the probed instruction, Kprobes calls
kp->fault_handler.  Any or all handlers can be NULL. If kp->flags
has KPROBE_FLAG_DISABLED set, kp is registered but disabled, so its
handlers aren't hit until enable_kprobe(kp) is called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)    1. With the introduction of the "symbol_name" field to struct kprobe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377)       the probepoint address resolution will now be taken care of by the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378)       The following will now work::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 	kp.symbol_name = "symbol_name";
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)       (64-bit powerpc intricacies such as function descriptors are handled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)       transparently)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385)    2. Use the "offset" field of struct kprobe if the offset into the symbol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)       to install a probepoint is known. This field is used to calculate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)       probepoint.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)    3. Specify either the kprobe "symbol_name" OR the "addr". If both are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390)       specified, kprobe registration will fail with -EINVAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)    4. With CISC architectures (such as i386 and x86_64), the kprobes code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)       does not validate if the kprobe.addr is at an instruction boundary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)       Use "offset" with caution.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 
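The addressing rules in the notes above can be modeled in user space. This is a minimal sketch, not kernel code: ``model_resolve_probepoint()`` and the two-entry ``model_symtab`` are hypothetical stand-ins for the kernel's symbol lookup, and rejecting the case where neither field is set (and returning -ENOENT for an unknown symbol) are assumptions of the model rather than statements made by this document.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's symbol table (illustration only). */
struct model_sym {
	const char *name;
	unsigned long addr;
};

static const struct model_sym model_symtab[] = {
	{ "vfs_read",   0xc015d71aUL },
	{ "tcp_v4_rcv", 0xc03dedc5UL },
};

/*
 * Resolve a probepoint: exactly one of symbol_name / addr must be set,
 * and offset is added to whichever base address results.
 * Returns 0 and stores the address in *out, or a negative errno.
 */
static int model_resolve_probepoint(const char *symbol_name,
				    unsigned long addr, unsigned int offset,
				    unsigned long *out)
{
	unsigned long base = addr;
	size_t i;

	assert(out != NULL);
	/* Both or neither given: reject (the "neither" case is assumed). */
	if ((symbol_name && addr) || (!symbol_name && !addr))
		return -EINVAL;

	if (symbol_name) {
		base = 0;
		for (i = 0; i < sizeof(model_symtab) / sizeof(model_symtab[0]); i++)
			if (strcmp(model_symtab[i].name, symbol_name) == 0)
				base = model_symtab[i].addr;
		if (base == 0)
			return -ENOENT;	/* unknown symbol (assumed errno) */
	}
	*out = base + offset;
	return 0;
}
```

For example, resolving "vfs_read" with an offset of 0x10 yields the symbol's address plus 0x10, while passing both a symbol and an address is rejected with -EINVAL.
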
register_kprobe() returns 0 on success, or a negative errno otherwise.

User's pre-handler (kp->pre_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int pre_handler(struct kprobe *p, struct pt_regs *regs);

Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit.  Return 0 here unless you're a Kprobes geek:
a non-zero return tells Kprobes that the handler has redirected
execution (e.g., by changing the instruction pointer in regs), so the
probed instruction is not single-stepped and kp->post_handler is not
called.

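To make the order of events concrete, here is a small user-space model of a single breakpoint hit, offered as a sketch under stated assumptions: ``struct model_probe``, ``model_hit()`` and ``MODEL_FLAG_DISABLED`` are hypothetical names, and the real kernel work (arming the breakpoint, single-stepping out of line) is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FLAG_DISABLED 1u	/* stands in for KPROBE_FLAG_DISABLED */

struct model_probe {
	unsigned int flags;
	int (*pre_handler)(struct model_probe *p);	/* may be NULL */
	void (*post_handler)(struct model_probe *p);	/* may be NULL */
	int stepped;	/* times the probed instruction was single-stepped */
};

/* Simulate one breakpoint hit on p. */
static void model_hit(struct model_probe *p)
{
	assert(p != NULL);
	if (p->flags & MODEL_FLAG_DISABLED)
		return;		/* disabled: no handlers run */

	/* Non-zero from pre_handler: execution was redirected, skip the step. */
	if (p->pre_handler && p->pre_handler(p))
		return;

	p->stepped++;		/* single-step the original instruction */
	if (p->post_handler)
		p->post_handler(p);
}

/* Example handlers that just count their calls. */
static int model_pre_calls, model_post_calls;

static int model_pre_ok(struct model_probe *p)
{
	(void)p;
	model_pre_calls++;
	return 0;	/* the usual case: continue normally */
}

static int model_pre_redirect(struct model_probe *p)
{
	(void)p;
	model_pre_calls++;
	return 1;	/* pretend we changed the instruction pointer */
}

static void model_post_count(struct model_probe *p)
{
	(void)p;
	model_post_calls++;
}
```

With ``model_pre_ok`` both handlers run and the instruction is stepped once; with ``model_pre_redirect`` only the pre-handler runs; with the disabled flag set, nothing runs.
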
User's post-handler (kp->post_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	void post_handler(struct kprobe *p, struct pt_regs *regs,
			  unsigned long flags);

p and regs are as described for the pre_handler.  flags always seems
to be zero.

User's fault-handler (kp->fault_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);

p and regs are as described for the pre_handler.  trapnr is the
architecture-specific trap number associated with the fault (e.g.,
on i386, 13 for a general protection fault or 14 for a page fault).
The handler should return 1 if it successfully handled the exception.

register_kretprobe
------------------

::

	#include <linux/kprobes.h>
	int register_kretprobe(struct kretprobe *rp);

Establishes a return probe for the function whose address is
rp->kp.addr.  When that function returns, Kprobes calls rp->handler.
You must set rp->maxactive appropriately before you call
register_kretprobe(); see "How Does a Return Probe Work?" for details.

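The effect of maxactive can be illustrated with a user-space model of the return-probe instance pool. This is a sketch, not the kernel's implementation: the names are hypothetical, and the default applied when maxactive is not positive (max(10, 2 * NR_CPUS) with preemption enabled, NR_CPUS otherwise) is assumed from the "How Does a Return Probe Work?" discussion; check your kernel if the exact value matters. Each entry to the probed function takes an instance; when none is free, that return is not reported and nmissed is incremented.

```c
#include <assert.h>
#include <stdbool.h>

/* Default applied when maxactive <= 0 (assumed rule, see lead-in). */
static int model_default_maxactive(int maxactive, int nr_cpus, bool preempt)
{
	assert(nr_cpus > 0);
	if (maxactive > 0)
		return maxactive;
	if (preempt)
		return 2 * nr_cpus > 10 ? 2 * nr_cpus : 10;
	return nr_cpus;
}

struct model_kretprobe {
	int maxactive;	/* size of the instance pool */
	int in_use;	/* instances currently tracking a pending return */
	int nmissed;	/* entries that found the pool empty */
};

/* Function entry: grab a free instance or count a miss. */
static bool model_kretprobe_enter(struct model_kretprobe *rp)
{
	if (rp->in_use >= rp->maxactive) {
		rp->nmissed++;
		return false;	/* this return will not be reported */
	}
	rp->in_use++;
	return true;
}

/* Function return: the handler runs and the instance goes back to the pool. */
static void model_kretprobe_return(struct model_kretprobe *rp)
{
	assert(rp->in_use > 0);
	rp->in_use--;
}
```

With maxactive set to 2, a third nested (or concurrent) call arrives while two instances are still in use, so it is counted in nmissed instead of being traced.
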
register_kretprobe() returns 0 on success, or a negative errno
otherwise.

User's return-probe handler (rp->handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int kretprobe_handler(struct kretprobe_instance *ri,
			      struct pt_regs *regs);

regs is as described for kprobe.pre_handler.  ri points to the
kretprobe_instance object, of which the following fields may be
of interest:

- ret_addr: the return address
- rp: points to the corresponding kretprobe object
- task: points to the corresponding task struct
- data: points to per return-instance private data; see "Kretprobe
  entry-handler" for details.

The regs_return_value(regs) macro provides a simple abstraction to
extract the return value from the appropriate register as defined by
the architecture's ABI.

The handler's return value is currently ignored.

unregister_*probe
------------------

::

	#include <linux/kprobes.h>
	void unregister_kprobe(struct kprobe *kp);
	void unregister_kretprobe(struct kretprobe *rp);

Removes the specified probe.  The unregister function can be called
at any time after the probe has been registered.

.. note::

   If the functions find an incorrect probe (e.g., an unregistered probe),
   they clear the addr field of the probe.

register_*probes
----------------

::

	#include <linux/kprobes.h>
	int register_kprobes(struct kprobe **kps, int num);
	int register_kretprobes(struct kretprobe **rps, int num);

Registers each of the num probes in the specified array.  If any
error occurs during registration, all probes in the array, up to
the bad probe, are safely unregistered before the register_*probes
function returns.

- kps/rps: an array of pointers to ``*probe`` data structures
- num: the number of array entries.

.. note::

   You have to allocate (or define) an array of pointers and set all
   of the array entries before using these functions.

unregister_*probes
------------------

::

	#include <linux/kprobes.h>
	void unregister_kprobes(struct kprobe **kps, int num);
	void unregister_kretprobes(struct kretprobe **rps, int num);

Removes each of the num probes in the specified array at once.

.. note::

   If the functions find some incorrect probes (e.g., unregistered
   probes) in the specified array, they clear the addr field of those
   incorrect probes. However, other probes in the array are
   unregistered correctly.

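A matching user-space sketch of the unregister_*probes behavior described in the note: every valid probe is unregistered, and any entry that was never registered has its addr cleared while processing continues. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_unreg_probe {
	void *addr;		/* probed address; cleared if the probe is bogus */
	bool registered;
};

/* Unregister all num probes; unregistered entries get addr = NULL. */
static void model_unregister_probes(struct model_unreg_probe **ps, int num)
{
	int i;

	assert(num <= 0 || ps != NULL);
	for (i = 0; i < num; i++) {
		if (!ps[i]->registered) {
			ps[i]->addr = NULL;	/* flag the incorrect probe */
			continue;		/* keep going with the rest */
		}
		ps[i]->registered = false;
	}
}
```
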
disable_*probe
--------------

::

	#include <linux/kprobes.h>
	int disable_kprobe(struct kprobe *kp);
	int disable_kretprobe(struct kretprobe *rp);

Temporarily disables the specified ``*probe``. You can enable it again by using
enable_*probe(). The specified probe must already be registered.

enable_*probe
-------------

::

	#include <linux/kprobes.h>
	int enable_kprobe(struct kprobe *kp);
	int enable_kretprobe(struct kretprobe *rp);

Enables a ``*probe`` that has been disabled by disable_*probe(). The specified
probe must already be registered.

Kprobes Features and Limitations
================================

Kprobes allows multiple probes at the same address. Also,
a probepoint for which there is a post_handler cannot be optimized.
So if you install a kprobe with a post_handler at an optimized
probepoint, the probepoint will be unoptimized automatically.

In general, you can install a probe anywhere in the kernel.
In particular, you can probe interrupt handlers.  Known exceptions
are discussed in this section.

The register_*probe functions will return -EINVAL if you attempt
to install a probe in the code that implements Kprobes (mostly
kernel/kprobes.c and ``arch/*/kernel/kprobes.c``, but also functions such
as do_page_fault and notifier_call_chain).

If you install a probe in an inline-able function, Kprobes makes
no attempt to chase down all inline instances of the function and
install probes there.  gcc may inline a function without being asked,
so keep this in mind if you're not seeing the probe hits you expect.

A probe handler can modify the environment of the probed function
-- e.g., by modifying kernel data structures, or by modifying the
contents of the pt_regs struct (which are restored to the registers
upon return from the breakpoint).  So Kprobes can be used, for example,
to install a bug fix or to inject faults for testing.  Kprobes, of
course, has no way to distinguish the deliberately injected faults
from the accidental ones.  Don't drink and probe.

Kprobes makes no attempt to prevent probe handlers from stepping on
each other -- e.g., probing printk() and then calling printk() from a
probe handler.  If a probe handler hits a probe, that second probe's
handlers won't be run in that instance, and the kprobe.nmissed member
of the second probe will be incremented.

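The nmissed behavior described above can be modeled with a single "handler running" flag. This is a single-threaded user-space sketch with hypothetical names; in the kernel the equivalent state is tracked per CPU. Probe A's handler triggers probe B, whose handlers are skipped and whose nmissed count goes up.

```c
#include <assert.h>
#include <stddef.h>

struct model_kprobe {
	unsigned long nmissed;
	void (*handler)(struct model_kprobe *self);	/* may be NULL */
};

/* Stand-in for the kernel's per-CPU "a kprobe handler is running" state. */
static struct model_kprobe *model_running;

static void model_probe_hit(struct model_kprobe *p)
{
	assert(p != NULL);
	if (model_running != NULL) {	/* hit from inside another handler */
		p->nmissed++;		/* count the miss, skip the handlers */
		return;
	}
	model_running = p;
	if (p->handler)
		p->handler(p);
	model_running = NULL;
}

/* Probe B sits on a function that probe A's handler calls (think printk()). */
static struct model_kprobe model_b;

static void model_handler_a(struct model_kprobe *self)
{
	(void)self;
	model_probe_hit(&model_b);
}
```

A hit on probe B outside any handler runs normally and does not touch its nmissed count.
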
As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
the same handler) may run concurrently on different CPUs.

Kprobes does not use mutexes or allocate memory except during
registration and unregistration.

Probe handlers run with either preemption or interrupts disabled,
depending on the architecture and optimization state (e.g.,
kretprobe handlers and optimized kprobe handlers run with interrupts
enabled on x86/x86-64).  In any case, your handler must not yield
the CPU (e.g., by attempting to acquire a semaphore or waiting for I/O).

Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
to __builtin_return_address() will typically yield the trampoline's
address instead of the real return address for kretprobed functions.
(As far as we can tell, __builtin_return_address() is used only
for instrumentation and error reporting.)

If the number of times a function is called does not match the number
of times it returns, registering a return probe on that function may
produce undesirable results. In such a case, a line like the following
gets printed::

	kretprobe BUG!: Processing kretprobe d000000000041aa8 @ c00000000004f48c

With this information, one will be able to correlate the
exact instance of the kretprobe that caused the problem. We have the
do_exit() case covered. do_execve() and do_fork() are not an issue.
We're unaware of other specific cases where this could be a problem.

If, upon entry to or exit from a function, the CPU is running on
a stack other than that of the current task, registering a return
probe on that function may produce undesirable results.  For this
reason, Kprobes doesn't support return probes (or kprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

On x86/x86-64, since the Jump Optimization of Kprobes modifies several
bytes of instructions rather than a single breakpoint byte, there are
some limitations to optimization. To explain them, we introduce some
terminology. Imagine a 3-instruction sequence consisting of two 2-byte
instructions and one 3-byte instruction.

::

		IA
		|
	[-2][-1][0][1][2][3][4][5][6][7]
		[ins1][ins2][  ins3 ]
		[<-     DCR       ->]
		[<- JTPR ->]

	ins1: 1st Instruction
	ins2: 2nd Instruction
	ins3: 3rd Instruction
	IA:  Insertion Address
	JTPR: Jump Target Prohibition Region
	DCR: Detoured Code Region

The instructions in DCR are copied to the out-of-line buffer
of the kprobe, because the bytes in DCR are replaced by
a 5-byte jump instruction. So there are several limitations.

a) The instructions in DCR must be relocatable.
b) The instructions in DCR must not include a call instruction.
c) JTPR must not be targeted by any jump or call instruction.
d) DCR must not straddle the border between functions.

Anyway, these limitations are checked by the in-kernel instruction
decoder, so you don't need to worry about that.

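The DCR and JTPR definitions can be expressed as two small checks. This is a user-space sketch with hypothetical names, not the kernel's decoder; it assumes the jump is the 5-byte ``jmp rel32`` mentioned above and that the JTPR is the four bytes following the first byte at IA (IA+1 through IA+4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_JMP_SIZE 5	/* x86 "jmp rel32": 1 opcode byte + 4 displacement bytes */

/*
 * Given the lengths of consecutive instructions starting at IA, return the
 * DCR size: the fewest whole instructions that cover the 5-byte jump.
 * Returns 0 if the instructions run out first (e.g. the function ends).
 */
static size_t model_dcr_size(const size_t *insn_len, size_t n)
{
	size_t i, total = 0;

	assert(insn_len != NULL || n == 0);
	for (i = 0; i < n && total < MODEL_JMP_SIZE; i++)
		total += insn_len[i];
	return total >= MODEL_JMP_SIZE ? total : 0;
}

/* Does a jump to offset `target` (relative to IA) land in the JTPR? */
static bool model_in_jtpr(long target)
{
	/* Assumed JTPR: IA+1 .. IA+4, the bytes after the jump's first byte. */
	return target >= 1 && target < MODEL_JMP_SIZE;
}
```

With the 2-, 2- and 3-byte instructions from the diagram, the DCR is 7 bytes, and a jump to the start of ins2 or ins3 (IA+2 or IA+4) would land inside the JTPR, which rule c) forbids. If only ins1 and ins2 were available before the function ended, the 4 bytes would not cover the jump, matching rule d).
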
Probe Overhead
==============

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
microseconds to process.  Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports 1-2
million hits per second, depending on the architecture.  A return-probe
hit typically takes 50-75% longer than a kprobe hit.
When you have a return probe set on a function, adding a kprobe at
the entry to that function adds essentially no overhead.

Here are sample overhead figures (in usec) for different architectures::

  k = kprobe; r = return probe; kr = kprobe + return probe
  on the same function

  i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
  k = 0.57 usec; r = 0.92; kr = 0.99

  x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
  k = 0.49 usec; r = 0.80; kr = 0.82

  ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
  k = 0.77 usec; r = 1.26; kr = 1.45

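The figures above are consistent with the summary at the start of this section. A small calculation, with hypothetical helper names, converts a per-hit cost into hits per second and expresses a return-probe hit as a percentage over a kprobe hit.

```c
#include <assert.h>

/* Hits per second implied by a per-hit cost in microseconds. */
static double model_hits_per_sec(double usec_per_hit)
{
	assert(usec_per_hit > 0.0);
	return 1e6 / usec_per_hit;
}

/* How much longer a return-probe hit is than a kprobe hit, in percent. */
static double model_extra_percent(double kprobe_usec, double retprobe_usec)
{
	assert(kprobe_usec > 0.0);
	return (retprobe_usec / kprobe_usec - 1.0) * 100.0;
}
```

For the i386 numbers, 0.57 usec per hit corresponds to roughly 1.75 million hits per second, and 0.92 usec is about 61% more than 0.57 usec, within the 50-75% range quoted for return probes.
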
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) Optimized Probe Overhead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) process. Here are sample overhead figures (in usec) for x86 architectures::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685)   k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686)   r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688)   i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689)   k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691)   x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692)   k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) TODO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695) ====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)    programming interface for probe-based instrumentation.  Try it out.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) b. Kernel return probes for sparc64.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) c. Support for other architectures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) d. User-space probes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) e. Watchpoint probes (which fire on data references).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) Kprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) See samples/kprobes/kprobe_example.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) Kretprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) See samples/kprobes/kretprobe_example.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) Deprecated Features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) Jprobes is now a deprecated feature. People who are depending on it should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) migrate to other tracing features or use older kernels. Please consider to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) migrate your tool to one of the following options:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) - Use trace-event to trace target function with arguments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723)   trace-event is a low-overhead (and almost no visible overhead if it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724)   is off) statically defined event interface. You can define new events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725)   and trace it via ftrace or any other tracing tools.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727)   See the following urls:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729)     - https://lwn.net/Articles/379903/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730)     - https://lwn.net/Articles/381064/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731)     - https://lwn.net/Articles/383362/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) - Use ftrace dynamic events (kprobe event) with perf-probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)   If you build your kernel with debug info (CONFIG_DEBUG_INFO=y), you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736)   find which register/stack is assigned to which local variable or arguments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737)   by using perf-probe and set up new event to trace it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739)   See following documents:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741)   - Documentation/trace/kprobetrace.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742)   - Documentation/trace/events.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743)   - tools/perf/Documentation/perf-probe.txt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) The kprobes debugfs interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) With recent kernels (> 2.6.20) the list of registered kprobes is visible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at //sys/kernel/debug).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) /sys/kernel/debug/kprobes/list: Lists all registered probes on the system::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) 	c015d71a  k  vfs_read+0x0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) 	c03dedc5  r  tcp_v4_rcv+0x0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) 
The first column gives the kernel address where the probe is inserted.
The second column identifies the type of probe (k for kprobe, r for
kretprobe), and the third column gives the symbol+offset of the probe.
If the probed function belongs to a module, the module name is also
shown. The remaining columns show the probe status:

- [GONE]: the probe is on a virtual address that is no longer valid
  (a module init section, or an address belonging to a module that has
  been unloaded).
- [DISABLED]: the probe is temporarily disabled.
- [OPTIMIZED]: the probe is optimized.
- [FTRACE]: the probe is ftrace-based.
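
A tool that reads this file has to split each line into the columns described
above. The following C sketch parses one line. It assumes the first three
columns are whitespace-separated and that any status markers come after them.
struct kp_entry, enum kp_status and parse_kprobe_line() are hypothetical names
chosen for illustration.

.. code-block:: c

	#include <stdio.h>
	#include <string.h>

	/* Status markers described for /sys/kernel/debug/kprobes/list. */
	enum kp_status {
		KP_GONE      = 1 << 0,
		KP_DISABLED  = 1 << 1,
		KP_OPTIMIZED = 1 << 2,
		KP_FTRACE    = 1 << 3,
	};

	/* Hypothetical record holding the parsed columns of one line. */
	struct kp_entry {
		unsigned long long addr; /* column 1: probe address (hex) */
		char type;               /* column 2: 'k' = kprobe, 'r' = kretprobe */
		char where[128];         /* column 3: symbol+offset */
		unsigned int status;     /* bitwise OR of enum kp_status values */
	};

	/*
	 * Parse one line of the list file into *e.
	 * Returns 0 on success, -1 if the first three columns are malformed.
	 */
	int parse_kprobe_line(const char *line, struct kp_entry *e)
	{
		static const struct {
			const char *tag;
			unsigned int flag;
		} markers[] = {
			{ "[GONE]", KP_GONE },
			{ "[DISABLED]", KP_DISABLED },
			{ "[OPTIMIZED]", KP_OPTIMIZED },
			{ "[FTRACE]", KP_FTRACE },
		};
		int consumed = 0;
		size_t i;

		memset(e, 0, sizeof(*e));
		if (sscanf(line, "%llx %c %127s%n", &e->addr, &e->type, e->where,
			   &consumed) != 3)
			return -1;
		if (e->type != 'k' && e->type != 'r')
			return -1;

		/* Look for status markers only after the third column. */
		for (i = 0; i < sizeof(markers) / sizeof(markers[0]); i++)
			if (strstr(line + consumed, markers[i].tag))
				e->status |= markers[i].flag;

		return 0;
	}

The sketch detects markers with strstr(), so it does not depend on whether
several markers are separated by spaces.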

/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.

Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
By default, all kprobes are enabled. Echoing "0" to this file disarms all
registered probes until "1" is echoed to it. Note that this knob only
disarms and arms all kprobes; it does not change each probe's disabled
state. This means that disabled kprobes (marked [DISABLED]) will not be
enabled when you turn all kprobes ON with this knob.
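
A program can get the same effect as echoing a value into the knob. This is a
minimal sketch that assumes the caller has write permission on the file
(normally root). set_kprobes_enabled() is a hypothetical helper. Like the knob
itself, it only arms or disarms probes globally and leaves each probe's
[DISABLED] state unchanged.

.. code-block:: c

	#include <stdio.h>

	/*
	 * Hypothetical helper: write "1" (arm) or "0" (disarm) to an on/off knob
	 * such as /sys/kernel/debug/kprobes/enabled.
	 * Returns 0 on success, -1 if the file cannot be opened or written.
	 */
	int set_kprobes_enabled(const char *path, int on)
	{
		FILE *fp = fopen(path, "w");
		int ret = 0;

		if (!fp)
			return -1;
		if (fputs(on ? "1\n" : "0\n", fp) == EOF)
			ret = -1;
		/* stdio buffers the data; a write error may only show up here. */
		if (fclose(fp) == EOF)
			ret = -1;
		return ret;
	}

Checking the result of fclose() matters here. Because stdio buffers the
write, an error returned by the kernel may only appear when the buffer is
flushed.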


The kprobes sysctl interface
============================

/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.

When CONFIG_OPTPROBES=y, this sysctl interface appears and provides
a knob to globally and forcibly turn jump optimization (see section
:ref:`kprobes_jump_optimization`) ON or OFF. By default, jump optimization
is allowed (ON). If you echo "0" to this file or set
"debug.kprobes-optimization" to 0 via sysctl, all optimized probes will be
unoptimized, and any new probes registered after that will not be optimized.

Note that this knob *changes* the optimized state. This means that optimized
probes (marked [OPTIMIZED]) will be unoptimized and their [OPTIMIZED] tag
removed. If the knob is turned back on, they will be optimized again.
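
Before relying on jump optimization, a tool might check the current setting.
The sketch below reads the knob and assumes it contains a single decimal
integer (0 or 1). kprobes_optimization_allowed() is a hypothetical helper. A
return value of -1 also covers kernels built without CONFIG_OPTPROBES, where
the file does not exist.

.. code-block:: c

	#include <stdio.h>

	/*
	 * Hypothetical helper: read a 0/1 knob such as
	 * /proc/sys/debug/kprobes-optimization.
	 * Returns 1 if jump optimization is allowed, 0 if it is turned off, or -1
	 * if the file is missing (e.g. CONFIG_OPTPROBES=n) or does not start with
	 * an integer.
	 */
	int kprobes_optimization_allowed(const char *path)
	{
		FILE *fp = fopen(path, "r");
		int val, ret;

		if (!fp)
			return -1;
		ret = (fscanf(fp, "%d", &val) == 1) ? (val != 0) : -1;
		fclose(fp);
		return ret;
	}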

References
==========

For additional information on Kprobes, refer to the following URLs:

- https://www.ibm.com/developerworks/library/l-kprobes/index.html
- https://www.kernel.org/doc/ols/2006/ols2006v2-pages-109-124.pdf
