^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Kernel Probes (Kprobes)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) :Author: Jim Keniston <jkenisto@us.ibm.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) :Author: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) :Author: Masami Hiramatsu <mhiramat@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) .. CONTENTS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) 1. Concepts: Kprobes and Return Probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) 2. Architectures Supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) 3. Configuring Kprobes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) 4. API Reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) 5. Kprobes Features and Limitations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) 6. Probe Overhead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) 7. TODO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) 8. Kprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) 9. Kretprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) 10. Deprecated Features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) Appendix A: The kprobes debugfs interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) Appendix B: The kprobes sysctl interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) Appendix C: References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) Concepts: Kprobes and Return Probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) =========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) Kprobes enables you to dynamically break into any kernel routine and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) collect debugging and performance information non-disruptively. You
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) can trap at almost any kernel code address [1]_, specifying a handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) routine to be invoked when the breakpoint is hit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) .. [1] Some parts of the kernel code cannot be trapped; see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) :ref:`kprobes_blacklist`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) There are currently two types of probes: kprobes and kretprobes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) (also called return probes). A kprobe can be inserted on virtually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) any instruction in the kernel. A return probe fires when a specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) function returns.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) In the typical case, Kprobes-based instrumentation is packaged as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) a kernel module. The module's init function installs ("registers")
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) one or more probes, and the exit function unregisters them. A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) registration function such as register_kprobe() specifies where
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) the probe is to be inserted and what handler is to be called when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) the probe is hit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) There are also ``register_/unregister_*probes()`` functions for batch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) registration/unregistration of a group of ``*probes``. These functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) can speed up the unregistration process when you have to unregister
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) a lot of probes at once.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) The following subsections explain how the different types of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) probes work and how jump optimization works. They explain certain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) things that you'll need to know in order to make the best use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) Kprobes -- e.g., the difference between a pre_handler and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) a post_handler, and how to use the maxactive and nmissed fields of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) a kretprobe. But if you're in a hurry to start using Kprobes, you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) can skip ahead to :ref:`kprobes_archs_supported`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) How Does a Kprobe Work?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) When a kprobe is registered, Kprobes makes a copy of the probed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) instruction and replaces the first byte(s) of the probed instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) with a breakpoint instruction (e.g., int3 on i386 and x86_64).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) registers are saved, and control passes to Kprobes via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) notifier_call_chain mechanism. Kprobes executes the "pre_handler"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) associated with the kprobe, passing the handler the addresses of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) kprobe struct and the saved registers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) Next, Kprobes single-steps its copy of the probed instruction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) (It would be simpler to single-step the actual instruction in place,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) but then Kprobes would have to temporarily remove the breakpoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) instruction. This would open a small time window when another CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) could sail right past the probepoint.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) After the instruction is single-stepped, Kprobes executes the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) "post_handler," if any, that is associated with the kprobe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) Execution then continues with the instruction following the probepoint.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) Changing Execution Path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) Since Kprobes can probe running kernel code, a handler can change the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) register set, including the instruction pointer. This operation requires
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) great care, such as preserving the stack frame and recovering the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) execution path. Because it operates on a running kernel and requires deep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) knowledge of computer architecture and concurrent computing, it is easy to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) shoot yourself in the foot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) If you change the instruction pointer (and set up other related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) registers) in pre_handler, you must return !0 so that Kprobes stops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) single-stepping and just returns to the given address.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) This also means that post_handler will not be called.
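
The ordering of handlers described in the two sections above can be
modelled in ordinary userspace C. The sketch below is not kernel code
and does not use the Kprobes API: ``struct model_regs``,
``struct model_probe``, ``model_hit()`` and the example handlers are
hypothetical names invented for illustration, and the single-step is
reduced to advancing the instruction pointer by an assumed instruction
length. It only illustrates when each handler runs and what a non-zero
pre_handler return changes.

```c
#include <stddef.h>

/* Hypothetical stand-in for the saved register set (not struct pt_regs). */
struct model_regs {
	unsigned long ip;
};

struct model_probe;
typedef int (*model_pre_t)(struct model_probe *p, struct model_regs *regs);
typedef void (*model_post_t)(struct model_probe *p, struct model_regs *regs);

/* Hypothetical stand-in for struct kprobe. */
struct model_probe {
	unsigned long addr;		/* probed address */
	unsigned long insn_len;		/* assumed length of the probed instruction */
	model_pre_t pre_handler;	/* may be NULL */
	model_post_t post_handler;	/* may be NULL */
	int stepped;			/* 1 if the copied instruction was "stepped" */
	int post_calls;			/* number of post_handler invocations */
};

/* Models one probe hit; returns the address where execution resumes. */
unsigned long model_hit(struct model_probe *p)
{
	struct model_regs regs = { .ip = p->addr };

	p->stepped = 0;

	/* A non-zero return means the handler redirected execution. */
	if (p->pre_handler && p->pre_handler(p, &regs))
		return regs.ip;

	/* "Single-step" the copy: here, just advance past the instruction. */
	regs.ip = p->addr + p->insn_len;
	p->stepped = 1;

	if (p->post_handler) {
		p->post_handler(p, &regs);
		p->post_calls++;
	}
	return regs.ip;
}

/* Example pre_handler that only observes and lets execution continue. */
int pre_observe(struct model_probe *p, struct model_regs *regs)
{
	(void)p;
	(void)regs;
	return 0;
}

/* Example pre_handler that redirects execution to an arbitrary target. */
int pre_redirect(struct model_probe *p, struct model_regs *regs)
{
	(void)p;
	regs->ip = 0x2000;	/* illustrative address only */
	return 1;		/* skip single-step and post_handler */
}

/* Example post_handler that does nothing. */
void post_noop(struct model_probe *p, struct model_regs *regs)
{
	(void)p;
	(void)regs;
}
```

With ``pre_observe`` the model resumes just past the probed instruction
and runs the post_handler once; with ``pre_redirect`` it resumes at the
address the handler stored and the post_handler never runs.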
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) Note that this operation may be harder on some architectures that use a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) TOC (Table of Contents) for function calls, since you have to set up a new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) TOC for your function in your module, and restore the old one after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) returning from it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) Return Probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) How Does a Return Probe Work?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) When you call register_kretprobe(), Kprobes establishes a kprobe at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) the entry to the function. When the probed function is called and this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) probe is hit, Kprobes saves a copy of the return address, and replaces
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) the return address with the address of a "trampoline." The trampoline
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) is an arbitrary piece of code -- typically just a nop instruction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) At boot time, Kprobes registers a kprobe at the trampoline.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) When the probed function executes its return instruction, control
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) passes to the trampoline and that probe is hit. Kprobes' trampoline
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) handler calls the user-specified return handler associated with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) kretprobe, then sets the saved instruction pointer to the saved return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) address, and that's where execution resumes upon return from the trap.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) While the probed function is executing, its return address is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) stored in an object of type kretprobe_instance. Before calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) register_kretprobe(), the user sets the maxactive field of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) kretprobe struct to specify how many instances of the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) function can be probed simultaneously. register_kretprobe()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) pre-allocates the indicated number of kretprobe_instance objects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) For example, if the function is non-recursive and is called with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) spinlock held, maxactive = 1 should be enough. If the function is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) non-recursive and can never relinquish the CPU (e.g., via a semaphore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) or preemption), NR_CPUS should be enough. If maxactive <= 0, it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) set to a default value. If CONFIG_PREEMPT is enabled, the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS.
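
The defaulting rule above can be written out as a small userspace
function. This is a sketch of the rule as stated in this section, not
the kernel's code: ``effective_maxactive()`` is a hypothetical helper,
and its ``nr_cpus`` and ``preempt`` parameters stand in for NR_CPUS and
CONFIG_PREEMPT.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns the number of kretprobe_instance objects
 * that would be pre-allocated for a requested maxactive value.
 */
int effective_maxactive(int maxactive, int nr_cpus, bool preempt)
{
	/* A positive request is used as given. */
	if (maxactive > 0)
		return maxactive;

	/* maxactive <= 0 selects the default described above. */
	if (preempt)
		return (2 * nr_cpus > 10) ? 2 * nr_cpus : 10;

	return nr_cpus;
}
```

For example, on a 4-CPU preemptible kernel a request of 0 yields 10
instances, while on an 8-CPU preemptible kernel it yields 16.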
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) It's not a disaster if you set maxactive too low; you'll just miss
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) some probes. In the kretprobe struct, the nmissed field is set to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) zero when the return probe is registered, and is incremented every
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) time the probed function is entered but there is no kretprobe_instance
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) object available for establishing the return probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) Kretprobe entry-handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) ^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) Kretprobes also provides an optional user-specified handler which runs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) on function entry. This handler is specified by setting the entry_handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) field of the kretprobe struct. Whenever the kprobe placed by kretprobe at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) function entry is hit, the user-defined entry_handler, if any, is invoked.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) If the entry_handler returns 0 (success) then a corresponding return handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) is guaranteed to be called upon function return. If the entry_handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) returns a non-zero error then Kprobes leaves the return address as is, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) the kretprobe has no further effect for that particular function instance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) Multiple entry and return handler invocations are matched using the unique
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) kretprobe_instance object associated with them. Additionally, a user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) may also specify per return-instance private data to be part of each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) kretprobe_instance object. This is especially useful when sharing private
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) data between corresponding user entry and return handlers. The size of each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) private data object can be specified at kretprobe registration time by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) setting the data_size field of the kretprobe struct. This data can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) accessed through the data field of each kretprobe_instance object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) If the probed function is entered but there is no kretprobe_instance
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) object available, then in addition to incrementing the nmissed count,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) the user entry_handler invocation is also skipped.
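
The interaction between maxactive, nmissed and the entry handler can be
illustrated with a userspace model of the instance pool. Everything in
this sketch is hypothetical (``struct model_kretprobe``,
``model_function_entry()``, ``model_function_return()``); it assumes the
entry handler always returns 0 and it ignores locking entirely.

```c
/* Hypothetical stand-in for the bookkeeping in struct kretprobe. */
struct model_kretprobe {
	int maxactive;		/* size of the pre-allocated instance pool */
	int in_use;		/* instances tracking a call in progress */
	int nmissed;		/* entries that found the pool empty */
	int entry_calls;	/* times the entry handler was invoked */
};

/* Models function entry; returns 1 if a return probe was established. */
int model_function_entry(struct model_kretprobe *rp)
{
	if (rp->in_use >= rp->maxactive) {
		/* No free instance: count the miss, skip the entry handler. */
		rp->nmissed++;
		return 0;
	}

	rp->in_use++;
	rp->entry_calls++;	/* entry handler runs (assumed to return 0) */
	return 1;
}

/* Models the return of a call that was being tracked. */
void model_function_return(struct model_kretprobe *rp)
{
	if (rp->in_use > 0)
		rp->in_use--;	/* the instance goes back to the pool */
}
```

With ``maxactive = 2``, a third overlapping entry is counted in
``nmissed`` and gets neither an entry handler nor a return handler; once
one of the tracked calls returns, the next entry is probed again.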
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) .. _kprobes_jump_optimization:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) How Does Jump Optimization Work?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) If your kernel is built with CONFIG_OPTPROBES=y (currently this option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) is automatically set to 'y' on non-preemptive x86/x86-64 kernels) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) the "debug.kprobes_optimization" kernel parameter is set to 1 (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) instruction instead of a breakpoint instruction at each probepoint.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) Init a Kprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) ^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) When a probe is registered, before attempting this optimization,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) address. So, even if it's not possible to optimize this particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) probepoint, there'll be a probe there.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) Safety Check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) ^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) Before optimizing a probe, Kprobes performs the following safety checks:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) - Kprobes verifies that the region that will be replaced by the jump
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) instruction (the "optimized region") lies entirely within one function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) (A jump instruction is multiple bytes, and so may overlay multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) instructions.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) - Kprobes analyzes the entire function and verifies that there is no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) jump into the optimized region. Specifically:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) - the function contains no indirect jump;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) - the function contains no instruction that causes an exception (since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) the fixup code triggered by the exception could jump back into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) optimized region -- Kprobes checks the exception tables to verify this);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) - there is no near jump to the optimized region (other than to the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) byte).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) - For each instruction in the optimized region, Kprobes verifies that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) the instruction can be executed out of line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) Preparing Detour Buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) ^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) Next, Kprobes prepares a "detour" buffer, which contains the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) instruction sequence:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) - code to push the CPU's registers (emulating a breakpoint trap)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) - a call to the trampoline code, which calls the user's probe handlers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) - code to restore registers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) - the instructions from the optimized region
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) - a jump back to the original execution path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) Pre-optimization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) ^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) After preparing the detour buffer, Kprobes verifies that none of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) following situations exist:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) - The probe has a post_handler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) - Other instructions in the optimized region are probed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) - The probe is disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) In any of the above cases, Kprobes won't start optimizing the probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) Since these are temporary situations, Kprobes tries to optimize the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) probe again once the situation changes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) If the kprobe can be optimized, Kprobes enqueues the kprobe to an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) optimizing list, and kicks the kprobe-optimizer workqueue to optimize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) it. If the to-be-optimized probepoint is hit before being optimized,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) Kprobes returns control to the original instruction path by setting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) the CPU's instruction pointer to the copied code in the detour buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) -- thus at least avoiding the single-step.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) Optimization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) ^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) The Kprobe-optimizer doesn't insert the jump instruction immediately;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) rather, it calls synchronize_rcu() for safety first, because it's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) possible for a CPU to be interrupted in the middle of executing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) optimized region [3]_. synchronize_rcu() can ensure that all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) interruptions that were active when synchronize_rcu() was called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) have completed, but only if CONFIG_PREEMPT=n. So, this version
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) of kprobe optimization supports only kernels with CONFIG_PREEMPT=n [4]_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) After that, the Kprobe-optimizer calls stop_machine() to replace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) the optimized region with a jump instruction to the detour buffer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) using text_poke_smp().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) Unoptimization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) ^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) When an optimized kprobe is unregistered, disabled, or blocked by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) another kprobe, it will be unoptimized. If this happens before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) the optimization is complete, the kprobe is just dequeued from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) optimizing list. If the optimization has been done, the jump is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) replaced with the original code (except for an int3 breakpoint in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) the first byte) by using text_poke_smp().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) .. [3] Imagine that the 2nd instruction is interrupted and then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) the optimizer replaces the 2nd instruction with the jump *address*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) while the interrupt handler is running. When the interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) returns to the original address, there is no valid instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) there, which causes an unexpected result.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) .. [4] This optimization-safety checking may be replaced with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) NOTE for geeks:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) The jump optimization changes the kprobe's pre_handler behavior.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) Without optimization, the pre_handler can change the kernel's execution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) path by changing regs->ip and returning 1. However, when the probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) is optimized, that modification is ignored. Thus, if you want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) tweak the kernel's execution path, you need to suppress optimization,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) using one of the following techniques:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) - Specify an empty function for the kprobe's post_handler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) - Execute 'sysctl -w debug.kprobes_optimization=0'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) .. _kprobes_blacklist:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) Blacklist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) Kprobes can probe most of the kernel except itself. This means
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) that there are some functions that Kprobes cannot probe. Probing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) (trapping) such functions can cause a recursive trap (e.g. a double
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) fault), or the nested probe handler may never be called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) Kprobes manages such functions as a blacklist.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) If you want to add a function to the blacklist, you just need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) to (1) include linux/kprobes.h and (2) use the NOKPROBE_SYMBOL() macro
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) to specify the blacklisted function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) Kprobes checks the given probe address against the blacklist and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) rejects the registration if the given address is in the blacklist.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) .. _kprobes_archs_supported:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) Architectures Supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) Kprobes and return probes are implemented on the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) architectures:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) - i386 (Supports jump optimization)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) - x86_64 (AMD-64, EM64T) (Supports jump optimization)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) - ppc64
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) - ia64 (Does not support probes on instruction slot1.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) - sparc64 (Return probes not yet implemented.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) - arm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) - ppc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) - mips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) - s390
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) - parisc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) Configuring Kprobes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) When configuring the kernel using make menuconfig/xconfig/oldconfig,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) for "Kprobes".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) So that you can load and unload Kprobes-based instrumentation modules,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) make sure "Loadable module support" (CONFIG_MODULES) and "Module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) Also make sure that CONFIG_KALLSYMS and perhaps even CONFIG_KALLSYMS_ALL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) are set to "y", since kallsyms_lookup_name() is used by the in-kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) kprobe address resolution code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) If you need to insert a probe in the middle of a function, you may find
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) so you can use "objdump -d -l vmlinux" to see the source-to-object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) code mapping.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) API Reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) The Kprobes API includes a "register" function and an "unregister"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) function for each type of probe. The API also includes "register_*probes"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) and "unregister_*probes" functions for (un)registering arrays of probes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) Here are terse, mini-man-page specifications for these functions and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) the associated probe handlers that you'll write. See the files in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) samples/kprobes/ sub-directory for examples.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) register_kprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) int register_kprobe(struct kprobe *kp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) Sets a breakpoint at the address kp->addr. When the breakpoint is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) hit, Kprobes calls kp->pre_handler. After the probed instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) is single-stepped, Kprobes calls kp->post_handler. If a fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) occurs during execution of kp->pre_handler or kp->post_handler,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) or during single-stepping of the probed instruction, Kprobes calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) kp->fault_handler. Any or all handlers can be NULL. If kp->flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) has KPROBE_FLAG_DISABLED set, that kp will be registered but disabled,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) so its handlers aren't called until enable_kprobe(kp) is called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) 1. With the introduction of the "symbol_name" field to struct kprobe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) the probepoint address resolution will now be taken care of by the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) The following will now work::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) kp.symbol_name = "symbol_name";
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) (64-bit powerpc intricacies such as function descriptors are handled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) transparently)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 2. Use the "offset" field of struct kprobe if the offset into the symbol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) to install a probepoint is known. This field is used to calculate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) probepoint.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) 3. Specify either the kprobe "symbol_name" OR the "addr". If both are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) specified, kprobe registration will fail with -EINVAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) 4. With CISC architectures (such as i386 and x86_64), the kprobes code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) does not validate whether kprobe.addr is at an instruction boundary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) Use "offset" with caution.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) register_kprobe() returns 0 on success, or a negative errno otherwise.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) User's pre-handler (kp->pre_handler)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) #include <linux/ptrace.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) int pre_handler(struct kprobe *p, struct pt_regs *regs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) Called with p pointing to the kprobe associated with the breakpoint,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) and regs pointing to the struct containing the registers saved when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) User's post-handler (kp->post_handler)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) #include <linux/ptrace.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) void post_handler(struct kprobe *p, struct pt_regs *regs,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) unsigned long flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) p and regs are as described for the pre_handler. flags always seems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) to be zero.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) User's fault-handler (kp->fault_handler)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) #include <linux/ptrace.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) p and regs are as described for the pre_handler. trapnr is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) architecture-specific trap number associated with the fault (e.g.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) on i386, 13 for a general protection fault or 14 for a page fault).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) Returns 1 if it successfully handled the exception.
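
As a rough illustration of the calls described above, here is a minimal
module sketch. It is not a replacement for samples/kprobes/kprobe_example.c.
It assumes that the symbol "kernel_clone" exists on the running kernel (the
"symbol" module parameter, which is part of this sketch only, lets you pick
another one), and the names sketch_pre, sketch_post, sketch_init and
sketch_exit are made up for the example. Only the pre- and post-handlers
are set; the remaining handler is left NULL.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* Assumption: this symbol exists on your kernel; override with symbol=... */
static char symbol[KSYM_NAME_LEN] = "kernel_clone";
module_param_string(symbol, symbol, KSYM_NAME_LEN, 0644);

/* Only symbol_name is set, so the kernel resolves kp.addr for us. */
static struct kprobe kp = {
	.symbol_name = symbol,
};

/* Called when the breakpoint is hit, before the probed instruction runs. */
static int sketch_pre(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("pre: %s at %pS\n", p->symbol_name, p->addr);
	return 0;
}

/* Called after the probed instruction has been single-stepped. */
static void sketch_post(struct kprobe *p, struct pt_regs *regs,
			unsigned long flags)
{
	pr_info("post: %s, flags = 0x%lx\n", p->symbol_name, flags);
}

static int __init sketch_init(void)
{
	int ret;

	kp.pre_handler = sketch_pre;
	kp.post_handler = sketch_post;

	ret = register_kprobe(&kp);	/* 0 on success, negative errno otherwise */
	if (ret < 0) {
		pr_err("register_kprobe failed: %d\n", ret);
		return ret;
	}
	pr_info("planted kprobe at %pS\n", kp.addr);
	return 0;
}

static void __exit sketch_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(sketch_init);
module_exit(sketch_exit);
MODULE_LICENSE("GPL");
```

Setting both symbol_name and addr would make registration fail with
-EINVAL, which is why the sketch sets only symbol_name.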
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) register_kretprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) int register_kretprobe(struct kretprobe *rp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) Establishes a return probe for the function whose address is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) rp->kp.addr. When that function returns, Kprobes calls rp->handler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) You must set rp->maxactive appropriately before you call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) register_kretprobe(); see "How Does a Return Probe Work?" for details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) register_kretprobe() returns 0 on success, or a negative errno
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) otherwise.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) User's return-probe handler (rp->handler)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) #include <linux/ptrace.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) int kretprobe_handler(struct kretprobe_instance *ri,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) struct pt_regs *regs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) regs is as described for kprobe.pre_handler. ri points to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) kretprobe_instance object, of which the following fields may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) of interest:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) - ret_addr: the return address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) - rp: points to the corresponding kretprobe object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) - task: points to the corresponding task struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) - data: points to per return-instance private data; see "Kretprobe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) entry-handler" for details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) The regs_return_value(regs) macro provides a simple abstraction to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) extract the return value from the appropriate register as defined by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) the architecture's ABI.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) The handler's return value is currently ignored.
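
A return probe can be sketched in the same way. The module below is only
an illustration under stated assumptions: the probed symbol "kernel_clone"
and the maxactive value of 20 are arbitrary choices for the example, and
sketch_ret, sketch_init and sketch_exit are hypothetical names. The handler
uses only regs_return_value(regs), so it does not depend on the layout of
struct kretprobe_instance.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* Called when the probed function returns. */
static int sketch_ret(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	unsigned long retval = regs_return_value(regs);

	pr_info("probed function returned %lu\n", retval);
	return 0;	/* the return value is currently ignored */
}

static struct kretprobe rp = {
	.handler	= sketch_ret,
	.maxactive	= 20,			/* assumption: arbitrary value */
	.kp.symbol_name	= "kernel_clone",	/* assumption: symbol exists */
};

static int __init sketch_init(void)
{
	int ret = register_kretprobe(&rp);

	if (ret < 0) {
		pr_err("register_kretprobe failed: %d\n", ret);
		return ret;
	}
	return 0;
}

static void __exit sketch_exit(void)
{
	unregister_kretprobe(&rp);
	/* nmissed counts hits that found no free kretprobe_instance. */
	pr_info("missed %d instances\n", rp.nmissed);
}

module_init(sketch_init);
module_exit(sketch_exit);
MODULE_LICENSE("GPL");
```

If the missed count reported at unload is non-zero, maxactive was probably
too small for the probed function.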
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) unregister_*probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) void unregister_kprobe(struct kprobe *kp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) void unregister_kretprobe(struct kretprobe *rp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) Removes the specified probe. The unregister function can be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) at any time after the probe has been registered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) If the functions find an incorrect probe (e.g., an unregistered probe),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) they clear the addr field of the probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) register_*probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) int register_kprobes(struct kprobe **kps, int num);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) int register_kretprobes(struct kretprobe **rps, int num);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) Registers each of the num probes in the specified array. If any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) error occurs during registration, all probes in the array, up to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) the bad probe, are safely unregistered before the register_*probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) function returns.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) - kps/rps: an array of pointers to ``*probe`` data structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) - num: the number of array entries.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) You have to allocate (or define) an array of pointers and set all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) of the array entries before using these functions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) unregister_*probes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) void unregister_kprobes(struct kprobe **kps, int num);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) void unregister_kretprobes(struct kretprobe **rps, int num);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) Removes each of the num probes in the specified array at once.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) If the functions find some incorrect probes (e.g., unregistered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) probes) in the specified array, they clear the addr field of those
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) incorrect probes. However, other probes in the array are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) unregistered correctly.
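
The array-based calls might be used as in the following sketch, which
defines the array of pointers up front as the notes above require. The
probed symbols "vfs_read" and "vfs_write" and all sketch_* names are
assumptions made for this example only.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* One handler shared by every probe in the array. */
static int sketch_pre(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("hit %s\n", p->symbol_name);
	return 0;
}

/* Assumption: both symbols exist and are probeable on your kernel. */
static struct kprobe kp_read = {
	.symbol_name = "vfs_read",
	.pre_handler = sketch_pre,
};
static struct kprobe kp_write = {
	.symbol_name = "vfs_write",
	.pre_handler = sketch_pre,
};

/* Array of pointers, fully populated before register_kprobes() is called. */
static struct kprobe *kps[] = { &kp_read, &kp_write };

static int __init sketch_init(void)
{
	/* On error, probes registered so far are unregistered for us. */
	int ret = register_kprobes(kps, ARRAY_SIZE(kps));

	if (ret < 0)
		pr_err("register_kprobes failed: %d\n", ret);
	return ret;
}

static void __exit sketch_exit(void)
{
	unregister_kprobes(kps, ARRAY_SIZE(kps));
}

module_init(sketch_init);
module_exit(sketch_exit);
MODULE_LICENSE("GPL");
```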
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) disable_*probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) int disable_kprobe(struct kprobe *kp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) int disable_kretprobe(struct kretprobe *rp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) Temporarily disables the specified ``*probe``. You can enable it again by using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) enable_*probe(). You must specify a probe that has been registered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) enable_*probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) #include <linux/kprobes.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) int enable_kprobe(struct kprobe *kp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) int enable_kretprobe(struct kretprobe *rp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) Enables a ``*probe`` that has been disabled by disable_*probe(). You must specify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) a probe that has been registered.
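
One possible way to combine these calls with KPROBE_FLAG_DISABLED is
sketched below: the probe is registered disarmed and armed only on demand.
The helper names sketch_register_disabled and sketch_set_armed are
hypothetical. The fragment is meant to be pasted into a module that already
defines the struct kprobe it is given.

```c
#include <linux/types.h>
#include <linux/kprobes.h>

/* Register kp without arming it; its handlers stay silent for now. */
static int sketch_register_disabled(struct kprobe *kp)
{
	kp->flags |= KPROBE_FLAG_DISABLED;
	return register_kprobe(kp);
}

/* Arm or disarm a probe that has already been registered. */
static int sketch_set_armed(struct kprobe *kp, bool armed)
{
	return armed ? enable_kprobe(kp) : disable_kprobe(kp);
}
```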
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) Kprobes Features and Limitations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) Kprobes allows multiple probes at the same address. Also,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) a probepoint for which there is a post_handler cannot be optimized.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) So if you install a kprobe with a post_handler, at an optimized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) probepoint, the probepoint will be unoptimized automatically.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) In general, you can install a probe anywhere in the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) In particular, you can probe interrupt handlers. Known exceptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) are discussed in this section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) The register_*probe functions will return -EINVAL if you attempt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) to install a probe in the code that implements Kprobes (mostly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) kernel/kprobes.c and ``arch/*/kernel/kprobes.c``, but also functions such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) as do_page_fault and notifier_call_chain).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) If you install a probe in an inline-able function, Kprobes makes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) no attempt to chase down all inline instances of the function and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) install probes there. gcc may inline a function without being asked,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) so keep this in mind if you're not seeing the probe hits you expect.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) A probe handler can modify the environment of the probed function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) -- e.g., by modifying kernel data structures, or by modifying the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) contents of the pt_regs struct (which are restored to the registers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) upon return from the breakpoint). So Kprobes can be used, for example,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) to install a bug fix or to inject faults for testing. Kprobes, of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) course, has no way to distinguish the deliberately injected faults
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) from the accidental ones. Don't drink and probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) Kprobes makes no attempt to prevent probe handlers from stepping on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) each other -- e.g., probing printk() and then calling printk() from a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) probe handler. If a probe handler hits a probe, that second probe's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) handlers won't be run in that instance, and the kprobe.nmissed member
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) of the second probe will be incremented.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) the same handler) may run concurrently on different CPUs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) Kprobes does not use mutexes or allocate memory except during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) registration and unregistration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) Probe handlers are run with preemption disabled or interrupts disabled,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) depending on the architecture and optimization state (e.g.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) kretprobe handlers and optimized kprobe handlers run without interrupts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) disabled on x86/x86-64). In any case, your handler should not yield
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) the CPU (e.g., by attempting to acquire a semaphore, or waiting for I/O).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) Since a return probe is implemented by replacing the return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) address with the trampoline's address, stack backtraces and calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) to __builtin_return_address() will typically yield the trampoline's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) address instead of the real return address for kretprobed functions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) (As far as we can tell, __builtin_return_address() is used only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) for instrumentation and error reporting.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) If the number of times a function is called does not match the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) of times it returns, registering a return probe on that function may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) produce undesirable results. In such a case, a line:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) kretprobe BUG!: Processing kretprobe d000000000041aa8 @ c00000000004f48c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) gets printed. With this information, one will be able to correlate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) exact instance of the kretprobe that caused the problem. We have the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) do_exit() case covered. do_execve() and do_fork() are not an issue.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) We're unaware of other specific cases where this could be a problem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) If, upon entry to or exit from a function, the CPU is running on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) a stack other than that of the current task, registering a return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) probe on that function may produce undesirable results. For this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) reason, Kprobes doesn't support return probes (or kprobes)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) on the x86_64 version of __switch_to(); the registration functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) return -EINVAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) On x86/x86-64, since the Jump Optimization of Kprobes modifies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) instructions widely, there are some limitations to optimization. To
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) explain it, we introduce some terminology. Imagine a 3-instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) sequence consisting of two 2-byte instructions and one 3-byte
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) instruction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628)         IA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629)         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) [-2][-1][0][1][2][3][4][5][6][7]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631)         [ins1][ins2][  ins3 ]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632)         [<-     DCR       ->]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633)            [<- JTPR ->]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) ins1: 1st Instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) ins2: 2nd Instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) ins3: 3rd Instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) IA: Insertion Address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) JTPR: Jump Target Prohibition Region
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) DCR: Detoured Code Region
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) The instructions in DCR are copied to the out-of-line buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) of the kprobe, because the bytes in DCR are replaced by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) a 5-byte jump instruction. So there are several limitations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) a) The instructions in DCR must be relocatable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) b) The instructions in DCR must not include a call instruction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) c) JTPR must not be targeted by any jump or call instruction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) d) DCR must not straddle the border between functions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) Anyway, these limitations are checked by the in-kernel instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) decoder, so you don't need to worry about that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) Probe Overhead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) microseconds to process. Specifically, a benchmark that hits the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) probepoint repeatedly, firing a simple handler each time, reports 1-2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) million hits per second, depending on the architecture. A return-probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) hit typically takes 50-75% longer than a kprobe hit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) When you have a return probe set on a function, adding a kprobe at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) the entry to that function adds essentially no overhead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) Here are sample overhead figures (in usec) for different architectures::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) k = kprobe; r = return probe; kr = kprobe + return probe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) on same function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) k = 0.57 usec; r = 0.92; kr = 0.99
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) k = 0.49 usec; r = 0.80; kr = 0.82
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) k = 0.77 usec; r = 1.26; kr = 1.45
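
The relationship between the per-hit cost and the hit rate quoted above is
simple arithmetic, and the user-space sketch below spells it out. The
function names hits_per_second, extra_percent and print_row are made up for
this illustration; the inputs are meant to be the k and r figures from the
table.

```c
#include <stdio.h>

/* Hits per second sustained when each hit costs usec_per_hit microseconds. */
double hits_per_second(double usec_per_hit)
{
	return 1e6 / usec_per_hit;
}

/* How much longer a return-probe hit takes than a kprobe hit, in percent. */
double extra_percent(double k_usec, double r_usec)
{
	return (r_usec / k_usec - 1.0) * 100.0;
}

/* Print both derived figures for one architecture row. */
void print_row(const char *arch, double k_usec, double r_usec)
{
	printf("%-7s %.2f Mhits/s, return probe +%.0f%%\n", arch,
	       hits_per_second(k_usec) / 1e6, extra_percent(k_usec, r_usec));
}
```

For instance, print_row("i386", 0.57, 0.92) derives the hit rate and the
relative return-probe cost for the Pentium M row. A cost of 0.5 to 1.0 usec
per hit corresponds to the quoted 1-2 million hits per second.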
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) Optimized Probe Overhead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) process. Here are sample overhead figures (in usec) for x86 architectures::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) TODO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695) ====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) programming interface for probe-based instrumentation. Try it out.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) b. Kernel return probes for sparc64.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) c. Support for other architectures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) d. User-space probes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) e. Watchpoint probes (which fire on data references).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) Kprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) See samples/kprobes/kprobe_example.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) Kretprobes Example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) See samples/kprobes/kretprobe_example.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) Deprecated Features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) Jprobes is now a deprecated feature. People who are depending on it should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) migrate to other tracing features or use older kernels. Please consider
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) migrating your tool to one of the following options:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) - Use trace-event to trace target function with arguments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) trace-event is a low-overhead (and almost no visible overhead if it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) is off) statically defined event interface. You can define new events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) and trace them via ftrace or any other tracing tools.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) See the following URLs:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) - https://lwn.net/Articles/379903/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) - https://lwn.net/Articles/381064/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) - https://lwn.net/Articles/383362/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) - Use ftrace dynamic events (kprobe event) with perf-probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735) If you build your kernel with debug info (CONFIG_DEBUG_INFO=y), you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) find which register/stack is assigned to which local variable or argument
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) by using perf-probe, and set up a new event to trace it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) See the following documents:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) - Documentation/trace/kprobetrace.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) - Documentation/trace/events.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) - tools/perf/Documentation/perf-probe.txt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) The kprobes debugfs interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) With recent kernels (> 2.6.20) the list of registered kprobes is visible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at /sys/kernel/debug).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) /sys/kernel/debug/kprobes/list: Lists all registered probes on the system::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) c015d71a k vfs_read+0x0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) c03dedc5 r tcp_v4_rcv+0x0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) The first column provides the kernel address where the probe is inserted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) The second column identifies the type of probe (k - kprobe and r - kretprobe)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) while the third column specifies the symbol+offset of the probe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) If the probed function belongs to a module, the module name is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) specified. The following columns show the probe status. If the probe is on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) a virtual address that is no longer valid (module init sections, module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) virtual addresses that correspond to modules that've been unloaded),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) such probes are marked with [GONE]. If the probe is temporarily disabled,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) such probes are marked with [DISABLED]. If the probe is optimized, it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) marked with [OPTIMIZED]. If the probe is ftrace-based, it is marked with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) [FTRACE].
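
A tool that reads this file could split each line along the columns just
described. The user-space sketch below is one hedged way to do that: the
struct kprobe_list_entry type, the parse_kprobe_list_line function and the
127-character symbol limit are assumptions of this example, not part of any
kernel interface. It ignores the optional module-name column and detects
the status markers by substring search.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one line of the list file. */
struct kprobe_list_entry {
	unsigned long long addr;	/* first column: probe address */
	char type;			/* second column: 'k' or 'r' */
	char symbol[128];		/* third column: symbol+offset */
	int gone, disabled, optimized, ftrace;	/* status markers */
};

/* Parse one line; returns 0 on success, -1 if the line is malformed. */
int parse_kprobe_list_line(const char *line, struct kprobe_list_entry *e)
{
	memset(e, 0, sizeof(*e));
	if (sscanf(line, "%llx %c %127s", &e->addr, &e->type, e->symbol) != 3)
		return -1;
	if (e->type != 'k' && e->type != 'r')
		return -1;
	e->gone      = strstr(line, "[GONE]") != NULL;
	e->disabled  = strstr(line, "[DISABLED]") != NULL;
	e->optimized = strstr(line, "[OPTIMIZED]") != NULL;
	e->ftrace    = strstr(line, "[FTRACE]") != NULL;
	return 0;
}
```

Feeding it the sample line for vfs_read shown above fills in the address,
the type 'k' and the symbol "vfs_read+0x0", with all status markers clear.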
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) /sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) By default, all kprobes are enabled. By echoing "0" to this file, all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) registered probes will be disarmed until a "1" is echoed to this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) file. Note that this knob just disarms and arms all kprobes and doesn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) change each probe's disabling state. This means that disabled kprobes (marked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) [DISABLED]) will not be enabled if you turn ON all kprobes by this knob.
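
As a rough illustration of driving this knob from a script, the sketch below
wraps the write in a small helper. The name kprobes_set_enabled and the
optional file argument are assumptions made for this example. Writing to the
real file requires root and a mounted debugfs.

```shell
#!/bin/sh
# Write 0 (disarm all registered kprobes) or 1 (re-arm them) to the
# "enabled" knob. kprobes_set_enabled and its optional second argument
# are illustrative names, not part of the kernel interface.
kprobes_set_enabled() {
    value=$1
    knob=${2:-/sys/kernel/debug/kprobes/enabled}
    case $value in
        0|1) ;;
        *)
            # Reject anything other than the two values described above.
            echo "usage: kprobes_set_enabled 0|1 [file]" >&2
            return 2
            ;;
    esac
    echo "$value" > "$knob"
}
```

Calling "kprobes_set_enabled 0" followed later by "kprobes_set_enabled 1"
disarms and then re-arms the registered probes. Probes already marked
[DISABLED] keep that state.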
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) The kprobes sysctl interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) /proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786) a knob to globally and forcibly turn jump optimization (see section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) :ref:`kprobes_jump_optimization`) ON or OFF. By default, jump optimization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) is allowed (ON). If you echo "0" to this file or set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789) "debug.kprobes-optimization" to 0 via sysctl, all optimized probes will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) unoptimized, and any new probes registered after that will not be optimized.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) Note that this knob *changes* the optimized state. This means that optimized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) probes (marked [OPTIMIZED]) will be unoptimized (the [OPTIMIZED] tag will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) removed). If the knob is turned back on, they will be optimized again.
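
Because the file only appears when CONFIG_OPTPROBES=y, a script may want to
check for it before relying on jump optimization. The following is a minimal
sketch under that assumption. The helper name kprobes_optimization_state and
its optional file argument are hypothetical.

```shell
#!/bin/sh
# Report the state of the jump optimization knob as "on", "off",
# "unavailable" (file missing, e.g. CONFIG_OPTPROBES is not set) or
# "unknown" (unexpected content). The function name and the optional
# file argument are illustrative.
kprobes_optimization_state() {
    knob=${1:-/proc/sys/debug/kprobes-optimization}
    if [ ! -r "$knob" ]; then
        echo unavailable
        return 1
    fi
    read -r value < "$knob"
    case $value in
        0) echo off ;;
        1) echo on ;;
        *) echo unknown; return 1 ;;
    esac
}
```

The helper only reads the knob. Changing it still means writing "0" or "1" to
the file as described above.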
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) References
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) ==========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799) For additional information on Kprobes, refer to the following URLs:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) - https://www.ibm.com/developerworks/library/l-kprobes/index.html
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) - https://www.kernel.org/doc/ols/2006/ols2006v2-pages-109-124.pdf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803)