Bug hunting
===========

Kernel bug reports often come with a stack dump like the one below::

        ------------[ cut here ]------------
        WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
        Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
        CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
        Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
        00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
        c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
        f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
        Call Trace:
        [<c12ba080>] ? dump_stack+0x44/0x64
        [<c103ed6a>] ? __warn+0xfa/0x120
        [<c109e8a7>] ? module_put+0x57/0x70
        [<c109e8a7>] ? module_put+0x57/0x70
        [<c103ee33>] ? warn_slowpath_null+0x23/0x30
        [<c109e8a7>] ? module_put+0x57/0x70
        [<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
        [<c109f617>] ? symbol_put_addr+0x27/0x50
        [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
        [<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
        [<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
        [<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
        [<c13d2882>] ? usb_unbind_interface+0x62/0x250
        [<c136b514>] ? __pm_runtime_idle+0x44/0x70
        [<c13620d8>] ? __device_release_driver+0x78/0x120
        [<c1362907>] ? driver_detach+0x87/0x90
        [<c1361c48>] ? bus_remove_driver+0x38/0x90
        [<c13d1c18>] ? usb_deregister+0x58/0xb0
        [<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
        [<c1055654>] ? task_work_run+0x64/0x80
        [<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
        [<c10013f0>] ? do_fast_syscall_32+0x80/0x130
        [<c1549f43>] ? sysenter_past_esp+0x40/0x6a
        ---[ end trace 6ebc60ef3981792f ]---

Such stack traces provide enough information to identify the line inside the
kernel's source code where the bug happened. Depending on the severity of
the issue, it may also contain the word **Oops**, as in this one::

        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [<c06969d4>] iret_exc+0x7d0/0xa59
        *pdpt = 000000002258a001 *pde = 0000000000000000
        Oops: 0002 [#1] PREEMPT SMP
        ...

Whether it is an **Oops** or some other sort of stack trace, the offending
line is usually required to identify and handle the bug. Throughout this
chapter, we'll use "Oops" to refer to all kinds of stack traces that need to
be analyzed.

If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
quality of the stack trace by using :file:`scripts/decode_stacktrace.sh`.

Modules linked in
-----------------

Modules that are tainted or are being loaded or unloaded are marked with
"(...)", where the taint flags are described in
:file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
annotated with "+", and "being unloaded" is annotated with "-".
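
As a rough illustration of how these annotations can be read, the following C
sketch splits one entry of the "Modules linked in:" line into the module name,
its taint letters and its load/unload state. It assumes the ``name(FLAGS)``
layout shown in the example above; ``struct mod_entry`` and
``parse_mod_entry()`` are hypothetical helpers, not kernel APIs. In the example
trace, ``nvidia_drm(PO)`` carries the P (proprietary) and O (out-of-tree) taint
flags, while ``dvb_usb_gp8psk(-)`` was being unloaded by ``rmmod``.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parsed form of one "Modules linked in:" entry, e.g. "nvidia_drm(PO)". */
struct mod_entry {
	char name[64];    /* module name without the annotation */
	char taints[16];  /* taint letters such as "PO"; empty if none */
	bool loading;     /* '+': module is being loaded */
	bool unloading;   /* '-': module is being unloaded */
};

/* Split one entry into name, taint letters and load/unload state.
 * Returns false if the entry is malformed (e.g. unbalanced parenthesis). */
static bool parse_mod_entry(const char *tok, struct mod_entry *out)
{
	assert(tok != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	const char *paren = strchr(tok, '(');
	size_t name_len = paren ? (size_t)(paren - tok) : strlen(tok);
	if (name_len == 0 || name_len >= sizeof(out->name))
		return false;
	memcpy(out->name, tok, name_len);
	out->name[name_len] = '\0';

	if (paren == NULL)
		return true;                    /* no annotation at all */

	const char *close = strchr(paren, ')');
	if (close == NULL)
		return false;

	size_t nt = 0;
	for (const char *p = paren + 1; p < close; p++) {
		if (*p == '+')
			out->loading = true;
		else if (*p == '-')
			out->unloading = true;
		else if (nt + 1 < sizeof(out->taints))
			out->taints[nt++] = *p; /* any other letter is a taint flag */
	}
	out->taints[nt] = '\0';
	return true;
}
```

The letters themselves are only copied, not interpreted; their meaning is
listed in :file:`Documentation/admin-guide/tainted-kernels.rst`.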


Where is the Oops message located?
----------------------------------

Normally the Oops text is read from the kernel buffers by ``klogd`` and
handed to ``syslogd``, which writes it to a syslog file, typically
``/var/log/messages`` (depending on ``/etc/syslog.conf``). On systems with
systemd, it may also be stored by the ``journald`` daemon and accessed
by running the ``journalctl`` command.

Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Alternatively, you can run
``cat /proc/kmsg > file``; however, you have to interrupt it to stop the
transfer, since ``kmsg`` is a "never ending file".
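
Once the log has been saved to a file, the interesting part still has to be cut
out of it. The C sketch below assumes the report uses the ``[ cut here ]`` and
``---[ end trace`` markers seen in the first example; ``find_oops_report()`` is
a hypothetical helper. An Oops that starts directly with a ``BUG:`` line, like
the second example, may not carry these markers, so treat this as a starting
point rather than a complete extractor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the first "[ cut here ]" ... "---[ end trace" report in a saved
 * kernel log (for example the output of "dmesg > file").
 * On success, *start points at the beginning of the "cut here" line and
 * the return value is the report length, including the end-trace line.
 * Returns 0 if no complete report is found.
 */
static size_t find_oops_report(const char *buf, const char **start)
{
	assert(buf != NULL && start != NULL);

	const char *begin = strstr(buf, "[ cut here ]");
	if (begin == NULL)
		return 0;
	while (begin > buf && begin[-1] != '\n')   /* back up to line start */
		begin--;

	const char *end = strstr(begin, "---[ end trace");
	if (end == NULL)
		return 0;                              /* truncated report */
	const char *eol = strchr(end, '\n');
	end = eol != NULL ? eol + 1 : end + strlen(end);

	*start = begin;
	return (size_t)(end - begin);
}
```

The returned span can then be written to a separate file and attached to the
bug report.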

If the machine has crashed so badly that you cannot enter commands or
the disk is not available, then you have three options:

(1) Hand copy the text from the screen and type it in after the machine
    has restarted. Messy, but it is the only option if you have not
    planned for a crash. Alternatively, you can take a picture of
    the screen with a digital camera - not nice, but better than
    nothing. If the messages scroll off the top of the console, you
    may find that booting with a higher resolution (e.g., ``vga=791``)
    will allow you to read more of the text. (Caveat: This needs ``vesafb``,
    so it won't help for 'early' oopses.)

(2) Boot with a serial console (see
    :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
    connect a null modem cable to a second machine and capture the output
    there using your favourite communication program. Minicom works well.

(3) Use Kdump (see Documentation/admin-guide/kdump/kdump.rst) and
    extract the kernel ring buffer from old memory using the dmesg
    gdb macro in Documentation/admin-guide/kdump/gdbmacros.txt.

Finding the bug's location
--------------------------

Reporting a bug works best if you point out the location of the bug in the
kernel source file. There are two methods for doing that. Usually, using
``gdb`` is easier, but the kernel must have been compiled with debug info.

gdb
^^^

The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.

Using gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::

        $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
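
``scripts/config`` only edits the ``.config`` file; a later ``make`` (or
``make olddefconfig``) may still drop an option whose dependencies are not met.
A quick check that the option survived could look like the hypothetical C
helper ``config_enabled()`` below, which looks for an exact ``CONFIG_...=y``
line and ignores ``# ... is not set`` comments and longer names that merely
share the prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the .config text contains the exact line "<opt>=y".
 * "# CONFIG_DEBUG_INFO is not set", "CONFIG_DEBUG_INFO=m" and
 * "CONFIG_DEBUG_INFO_BTF=y" do not count as enabling CONFIG_DEBUG_INFO. */
static bool config_enabled(const char *config, const char *opt)
{
	assert(config != NULL && opt != NULL);
	size_t len = strlen(opt);

	for (const char *line = config; line != NULL && *line != '\0'; ) {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

		if (line_len == len + 2 && strncmp(line, opt, len) == 0 &&
		    line[len] == '=' && line[len + 1] == 'y')
			return true;
		line = eol ? eol + 1 : NULL;    /* advance to the next line */
	}
	return false;
}
```

From the shell, ``grep '^CONFIG_DEBUG_INFO=y' .config`` performs the same check.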

On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the OOPS::

        EIP: 0060:[<c021e50e>] Not tainted VLI

And use GDB to translate that to human-readable form::

        $ gdb vmlinux
        (gdb) l *0xc021e50e

If you don't have ``CONFIG_DEBUG_INFO`` enabled, you can use the function
offset from the OOPS::

        EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::

        $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
        $ make vmlinux
        $ gdb vmlinux
        (gdb) l *vt_ioctl+0xda8
        0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
        288     {
        289             struct vc_data *vc = NULL;
        290             int ret = 0;
        291
        292             console_lock();
        293             if (VT_BUSY(vc_num))
        294                     ret = -EBUSY;
        295             else if (vc_num)
        296                     vc = vc_deallocate(vc_num);
        297             console_unlock();

or, if you want to be more verbose::

        (gdb) p vt_ioctl
        $1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
        (gdb) l *0xae0+0xda8
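
The verbose form is plain address arithmetic: the symbol's start address plus
the offset from the Oops. The following C sketch (``struct sym_loc``,
``parse_sym_loc()`` and ``fault_addr()`` are illustrative names, not part of any
kernel tool) parses a ``symbol+offset/size`` string and reproduces that sum. It
also rejects an offset that is not smaller than the function size, which would
suggest the wrong symbol or a corrupted trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A "symbol+offset/size" location as printed in an Oops. */
struct sym_loc {
	char sym[64];
	unsigned long off;   /* offset of the faulting instruction */
	unsigned long size;  /* size of the whole function */
};

/* Parse e.g. "vt_ioctl+0xda8/0x1482"; %lx accepts the "0x" prefix. */
static bool parse_sym_loc(const char *s, struct sym_loc *loc)
{
	assert(s != NULL && loc != NULL);
	if (sscanf(s, "%63[^+]+%lx/%lx", loc->sym, &loc->off, &loc->size) != 3)
		return false;
	/* An offset outside the function hints at a wrong symbol. */
	return loc->off < loc->size;
}

/* Address to pass to gdb's "l *ADDR": symbol start plus offset. */
static unsigned long fault_addr(unsigned long sym_start,
				const struct sym_loc *loc)
{
	return sym_start + loc->off;
}
```

With the values above, 0xae0 + 0xda8 = 0x1888, the same address gdb reports in
``0x1888 is in vt_ioctl``.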

You could, instead, use the object file::

        $ make drivers/tty/
        $ gdb drivers/tty/vt/vt_ioctl.o
        (gdb) l *vt_ioctl+0xda8

If you have a call trace, such as::

        Call Trace:
        [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
        [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
        [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
        ...

this shows the problem is likely in the ``jbd`` module. You can load that
module in gdb and list the relevant code::

        $ gdb fs/jbd/jbd.ko
        (gdb) l *log_wait_commit+0xa3

.. note::

        You can also do the same for any function call in the stack trace,
        like this one::

                [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]

        The position where the above call happened can be seen with::

                $ gdb drivers/media/usb/dvb-usb/dvb-usb.o
                (gdb) l *dvb_usb_adapter_frontend_exit+0x3a
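
Both trace formats shown above can be turned into the ``l *function+offset``
command mechanically. The C sketch below is a hypothetical parser
(``struct trace_entry``, ``parse_trace_line()`` and ``gdb_list_cmd()`` are
illustrative names); it handles the older ``:module:function`` prefix and the
newer ``[module]`` suffix, and leaves ``module`` empty for code built into
``vmlinux``. Note that the module name does not always map directly to a file
name: ``dvb_usb`` lives in ``drivers/media/usb/dvb-usb/``, as in the example
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct trace_entry {
	char module[64];    /* empty for code built into vmlinux */
	char func[128];
	unsigned long off;  /* offset into func */
};

/* Parse one call-trace line in either of these formats:
 *   [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
 *   [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb] */
static bool parse_trace_line(const char *line, struct trace_entry *e)
{
	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));

	const char *p = strstr(line, "] ");    /* skip "[<address>]" */
	if (p == NULL)
		return false;
	p += 2;
	if (strncmp(p, "? ", 2) == 0)          /* entry the unwinder could not confirm */
		p += 2;

	if (*p == ':')                         /* older ":module:func" form */
		return sscanf(p, ":%63[^:]:%127[^+]+%lx",
			      e->module, e->func, &e->off) == 3;

	if (sscanf(p, "%127[^+]+%lx", e->func, &e->off) != 2)
		return false;
	const char *br = strrchr(p, '[');      /* newer "[module]" suffix */
	if (br != NULL && sscanf(br, "[%63[^]]", e->module) != 1)
		return false;
	return true;
}

/* Format the gdb command used in this chapter, e.g. "l *log_wait_commit+0xa3". */
static int gdb_list_cmd(const struct trace_entry *e, char *buf, size_t len)
{
	return snprintf(buf, len, "l *%s+0x%lx", e->func, e->off);
}
```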

objdump
^^^^^^^

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the relevant line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the "Kernel hacking" menu of the kernel configuration.) For example::

        $ objdump -r -S -l --disassemble net/dccp/ipv4.o

.. note::

        You need to be at the top level of the kernel tree for this to pick up
        your C files.

If you don't have access to the source code, you can still debug some crash
dumps using the following method (example crash dump output as shown by
Dave Miller)::

        EIP is at +0x14/0x4c0
        ...
        Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
        00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
        <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85

Put the bytes into a "foo.s" file like this::

                .text
                .globl foo
        foo:
                .byte  .... /* bytes from Code: part of OOPS dump */

Compile it with ``gcc -c -o foo.o foo.s``, then look at the output of
``objdump --disassemble foo.o``.
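
Typing the ``.byte`` line by hand is error-prone, so here is a small C sketch
that converts the hex bytes of a ``Code:`` line into a ``foo.s`` file and
reports which byte was marked with angle brackets (the faulting instruction).
``code_to_asm()`` is a hypothetical helper; it assumes the x86-style ``<8b>``
marker shown above and emits one ``.byte`` directive per byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert the hex bytes of an Oops "Code:" line into a foo.s file.
 * If 'out' is NULL nothing is written and the bytes are only counted.
 * Returns the number of bytes, or -1 if a token is not a hex byte.
 * *fault_idx receives the index of the byte shown as "<8b>" (the
 * faulting instruction), or -1 if no byte is marked.
 */
static int code_to_asm(const char *code, FILE *out, int *fault_idx)
{
	assert(code != NULL && fault_idx != NULL);

	const char *p = strstr(code, "Code:");
	p = p ? p + strlen("Code:") : code;
	*fault_idx = -1;

	if (out)
		fputs("\t.text\n\t.globl foo\nfoo:\n", out);

	int n = 0;
	for (;;) {
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (*p == '\0')
			break;

		bool marked = (*p == '<');
		if (marked)
			p++;

		char *end;
		unsigned long byte = strtoul(p, &end, 16);
		if (end == p || byte > 0xff)
			return -1;              /* not a hex byte: give up */
		if (marked) {
			*fault_idx = n;
			if (*end == '>')
				end++;
		}
		if (out)
			fprintf(out, "\t.byte 0x%02lx\n", byte);
		n++;
		p = end;
	}
	return n;
}
```

For the dump above it finds 64 bytes, with the marked byte at index 43. The
function prologue (``55``, ``push %ebp``) starts at index 23, and
43 - 23 = 20 = 0x14, which matches ``EIP is at +0x14/0x4c0``. To write the file,
pass a ``FILE *`` opened with ``fopen("foo.s", "w")`` instead of ``NULL``. Since
this dump comes from a 32-bit kernel, on an x86-64 host you will probably need
``gcc -m32 -c -o foo.o foo.s`` so that the bytes are decoded as 32-bit code.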

Output::

        ip_queue_xmit:
            push       %ebp
            push       %edi
            push       %esi
            push       %ebx
            sub        $0xbc, %esp
            mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
            mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
            mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

:file:`scripts/decodecode` can be used to automate most of this, depending
on what CPU architecture is being debugged.

Reporting the bug
-----------------

Once you find where the bug happened, by inspecting its location,
you could either try to fix it yourself or report it upstream.

In order to report it upstream, you should identify the mailing list
used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script.

For example, if you find a bug in the gspca sonixj.c file, you can get
its maintainers with::

        $ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
        Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
        Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
        Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
        Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
        linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
        linux-kernel@vger.kernel.org (open list)

Please note that it will point to:

- The last developers that touched the source code (if this is done inside
  a git tree). In the above example, Tejun and Bhaktipriya (in this
  specific case, neither was really involved in the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- The Linux Kernel mailing list (linux-kernel@vger.kernel.org).

Usually, the fastest way to have your bug fixed is to report it to the mailing
list used for the development of the code (linux-media ML), copying the
driver maintainer (Hans).
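
The advice above can be expressed as a simple recipient policy. The following
C sketch only illustrates that policy; it is not something ``get_maintainer.pl``
does itself, and ``classify()`` and ``enum recipient`` are hypothetical names.
It sorts each output line so that the subsystem list goes in To:, maintainer
entries (including "odd fixer") go in Cc:, and people listed only as commit
signers, as well as the catch-all ``(open list)`` LKML entry, are left out.
Roles other than those appearing in the example output are not handled.

```c
#include <assert.h>
#include <string.h>

enum recipient { RCPT_TO, RCPT_CC, RCPT_SKIP };

/* Illustrative policy for one line of get_maintainer.pl output:
 *  - a list tied to a MAINTAINERS entry ("open list:...")  -> To:
 *  - a maintainer or "odd fixer" of the code              -> Cc:
 *  - commit signers only, and the generic "(open list)"   -> skip */
static enum recipient classify(const char *line)
{
	assert(line != NULL);

	if (strstr(line, "(open list:") != NULL)
		return RCPT_TO;
	if (strstr(line, "maintainer:") != NULL ||
	    strstr(line, "odd fixer:") != NULL)
		return RCPT_CC;
	return RCPT_SKIP;
}
```

Applied to the sonixj.c output above, this sends the report to linux-media and
copies Hans and Mauro.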

If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
linux-kernel@vger.kernel.org.

Thanks for your help in making Linux as stable as humanly possible.

Fixing the bug
--------------

If you know programming, you could help us by not only reporting the bug,
but also providing us with a solution. After all, open source is about
sharing what you do and don't you want to be recognised for your genius?

If you decide to take this route, once you have worked out a fix please submit
it upstream.

Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.


---------------------------------------------------------------------------

Notes on Oops tracing with ``klogd``
------------------------------------

In order to help Linus and the other kernel developers, there has been
substantial support incorporated into ``klogd`` for processing protection
faults. In order to have full support for address resolution, at least
version 1.3-pl3 of the ``sysklogd`` package should be used.

When a protection fault occurs, the ``klogd`` daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents. This translated kernel message is then
forwarded through whatever reporting mechanism ``klogd`` is using. The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.

Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation.
Static translation uses the ``System.map`` file.
In order to do static translation, the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the ``klogd`` man page for information on how ``klogd`` searches for map
files.
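
Static translation boils down to a "nearest symbol at or below the address"
lookup in ``System.map``, whose lines have the form ``address type name`` and
are sorted by address. The C sketch below does this with a binary search;
``struct map_sym``, ``load_map()`` and ``lookup_sym()`` are hypothetical helpers,
not ``klogd`` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One System.map entry; the type letter (T, t, D, ...) is not kept. */
struct map_sym {
	unsigned long addr;
	char name[64];
};

/* Parse System.map text ("c109e850 T module_put\n...") into syms[].
 * Returns the number of entries stored (at most max). */
static size_t load_map(const char *text, struct map_sym *syms, size_t max)
{
	assert(text != NULL && syms != NULL);
	size_t n = 0;

	while (*text != '\0' && n < max) {
		char type;
		if (sscanf(text, "%lx %c %63s", &syms[n].addr, &type,
			   syms[n].name) == 3)
			n++;
		const char *eol = strchr(text, '\n');
		if (eol == NULL)
			break;
		text = eol + 1;
	}
	return n;
}

/* Binary search for the last symbol whose address is <= addr.
 * Returns NULL if addr lies below the first symbol. */
static const struct map_sym *lookup_sym(const struct map_sym *syms, size_t n,
					unsigned long addr, unsigned long *off)
{
	size_t lo = 0, hi = n;                  /* search in [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (syms[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;
	*off = addr - syms[lo - 1].addr;
	return &syms[lo - 1];
}
```

For example, if ``System.map`` lists ``module_put`` at ``c109e850``, then
``c109e8a7`` resolves to ``module_put+0x57``, as in the first call trace of this
chapter. Because ``System.map`` carries no symbol sizes, an address past the
end of the last function is still attributed to it.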

Dynamic address translation is important when kernel loadable modules
are being used. Since memory for kernel modules is allocated from the
kernel's dynamic memory pools, there are no fixed locations for either
the start of the module or for functions and symbols in the module.

The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory. Using these
system calls, the ``klogd`` daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.
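
On current kernels, the list of loaded modules and their load addresses can be
read from ``/proc/modules``, whose lines look like
``name size refcount users state address``. The C sketch below finds the module
whose address range contains a given address; ``find_module()`` is a
hypothetical helper, and this is not how ``klogd`` itself is implemented. It is
approximate: it treats ``size`` as one contiguous range starting at
``address``, and unprivileged readers may see the addresses as zero, in which
case nothing will match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the module in /proc/modules text whose [base, base + size) range
 * contains addr. On success, copy its name into name[] (>= 64 bytes),
 * store addr - base in *off and return 0; otherwise return -1. */
static int find_module(const char *text, unsigned long addr,
		       char *name, unsigned long *off)
{
	assert(text != NULL && name != NULL && off != NULL);

	for (const char *line = text; line != NULL && *line != '\0'; ) {
		char mod[64], users[256], state[16];
		unsigned long size, base;
		int refcnt;

		/* name size refcount users state address [(taints)] */
		if (sscanf(line, "%63s %lu %d %255s %15s %lx",
			   mod, &size, &refcnt, users, state, &base) == 6 &&
		    base != 0 && addr >= base && addr - base < size) {
			strcpy(name, mod);
			*off = addr - base;
			return 0;
		}
		const char *eol = strchr(line, '\n');
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

For instance, if ``dvb_usb_gp8psk`` were loaded at the hypothetical address
``0xf80c8000`` with a size of 16384 bytes, the address ``f80ca4d0`` from the
first call trace would resolve to offset 0x24d0 inside that module.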

At the very minimum, ``klogd`` will provide the name of the module which
generated the protection fault. There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.

Since the kernel module environment can be dynamic, there must be a
mechanism for notifying the ``klogd`` daemon when a change in module
environment occurs. There are command line options available which
allow ``klogd`` to signal the currently executing daemon that symbol
information should be refreshed. See the ``klogd`` manual page for more
information.

A patch is included with the ``sysklogd`` distribution which modifies the
``modules-2.0.0`` package to automatically signal ``klogd`` whenever a module
is loaded or unloaded. Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.

The following is an example of a protection fault in a loadable module
processed by ``klogd``::

        Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
        Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
        Aug 29 09:51:01 blizard kernel: *pde = 00000000
        Aug 29 09:51:01 blizard kernel: Oops: 0002
        Aug 29 09:51:01 blizard kernel: CPU: 0
        Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
        Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
        Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
        Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
        Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
        Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
        Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
        Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
        Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
        Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
        Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3

---------------------------------------------------------------------------

::

        Dr. G.W. Wettstein Oncology Research Div. Computing Facility
        Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
        820 4th St. N.
        Fargo, ND 58122
        Phone: 701-234-7556