.. git blame: ^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300)

The Kernel Concurrency Sanitizer (KCSAN)
========================================

The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which
relies on compile-time instrumentation, and uses a watchpoint-based sampling
approach to detect races. KCSAN's primary purpose is to detect `data races`_.

Usage
-----

KCSAN is supported by both GCC and Clang. Both compilers require version 11 or
later.

To enable KCSAN, configure the kernel with::

    CONFIG_KCSAN=y

KCSAN provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kcsan`` for more info).

Error reports
~~~~~~~~~~~~~

A typical data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode

    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
     kernfs_refresh_inode+0x70/0x170
     kernfs_iop_permission+0x4f/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     vfs_statx+0x9b/0x130
     __do_sys_newlstat+0x50/0xb0
     __x64_sys_newlstat+0x37/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
     generic_permission+0x5b/0x2a0
     kernfs_iop_permission+0x66/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     do_faccessat+0x11a/0x390
     __x64_sys_access+0x3c/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the functions involved in
the race. It is followed by the access types and stack traces of the two
threads involved in the data race.

The other, less common type of data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10

    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
     e1000_clean_rx_irq+0x551/0xb10
     e1000_clean+0x533/0xda0
     net_rx_action+0x329/0x900
     __do_softirq+0xdb/0x2db
     irq_exit+0x9b/0xa0
     do_IRQ+0x9c/0xf0
     ret_from_intr+0x0/0x18
     default_idle+0x3f/0x220
     arch_cpu_idle+0x21/0x30
     do_idle+0x1df/0x230
     cpu_startup_entry+0x14/0x20
     rest_init+0xc5/0xcb
     arch_call_rest_init+0x13/0x2b
     start_kernel+0x6db/0x700

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

This report is generated where it was not possible to determine the other
racing thread, but a race was inferred because the data value of the watched
memory location changed. Such races can occur due to missing instrumentation
or, for example, DMA accesses. These reports are only generated if
``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default).

Selective analysis
~~~~~~~~~~~~~~~~~~

It may be desirable to disable data race detection for specific accesses,
functions, compilation units, or entire subsystems. For static blacklisting,
the following options are available:

* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
  any data races due to accesses in ``expr`` should be ignored and that the
  resulting behaviour when encountering a data race is deemed safe.

* Disabling data race detection for entire functions can be accomplished by
  using the function attribute ``__no_kcsan``::

    __no_kcsan
    void foo(void) {
        ...
    }

  To dynamically limit the functions for which reports are generated, see the
  `DebugFS interface`_ blacklist/whitelist feature.

* To disable data race detection for a particular compilation unit, add to the
  ``Makefile``::

    KCSAN_SANITIZE_file.o := n

* To disable data race detection for all compilation units listed in a
  ``Makefile``, add to the respective ``Makefile``::

    KCSAN_SANITIZE := n
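
As a rough illustration of the first two options, the sketch below wraps an
intentionally racy statistics read in ``data_race()`` and excludes one whole
function with ``__no_kcsan``. The ``pkt_stats`` structure and all function
names are hypothetical, and the two macros are defined here as no-op stand-ins
only so that the example builds outside the kernel; in a kernel build the real
definitions come from the kernel headers.

```c
#include <assert.h>

/* No-op stand-ins (assumption) so the sketch compiles in userspace. */
#ifndef data_race
#define data_race(expr) (expr)
#endif
#ifndef __no_kcsan
#define __no_kcsan
#endif

/* Hypothetical counter updated by one context and read by others. */
struct pkt_stats {
    unsigned long rx_packets;
};

static struct pkt_stats stats;

/* Writer side: real code would serialize this, e.g. with a lock. */
static void stats_account_rx(void)
{
    stats.rx_packets++;
}

/*
 * Diagnostic read: a stale value is acceptable here, so the racy load
 * is wrapped in data_race() to document that intent and to keep KCSAN
 * from reporting it.
 */
static unsigned long stats_peek_rx(void)
{
    return data_race(stats.rx_packets);
}

/* Accesses inside this function are excluded from race detection. */
__no_kcsan
static unsigned long stats_debug_dump(void)
{
    return stats.rx_packets;
}
```

``data_race()`` is the narrower tool because it covers a single expression,
whereas ``__no_kcsan`` hides every access in the function and is better
reserved for code that cannot tolerate instrumentation.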

Furthermore, it is possible to tell KCSAN to show or hide entire classes of
data races, depending on preferences. These can be changed via the following
Kconfig options:

* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write
  is observed via a watchpoint, but the data value of the memory location was
  observed to remain unchanged, do not report the data race.

* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes
  up to word size are atomic by default. Assumes that such writes are not
  subject to unsafe compiler optimizations resulting in data races. The option
  causes KCSAN to not report data races due to conflicts where the only plain
  accesses are aligned writes up to word size.

DebugFS interface
~~~~~~~~~~~~~~~~~

The file ``/sys/kernel/debug/kcsan`` provides the following interface:

* Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics.

* Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN
  on or off, respectively.

* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
  ``some_func_name`` to the report filter list, which (by default) blacklists
  reporting data races where either one of the top stackframes is a function
  in the list.

* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
  changes the report filtering behaviour. For example, the blacklist feature
  can be used to silence frequently occurring data races; the whitelist feature
  can help with reproduction and testing of fixes.
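
A test harness might drive this interface programmatically rather than from a
shell. The following is a minimal sketch of such a helper, assuming debugfs is
mounted at the path used above and the caller has permission to write to it.
The helper names ``kcsan_filter_cmd()`` and ``kcsan_write_cmd()`` are
hypothetical and not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KCSAN_DEBUGFS "/sys/kernel/debug/kcsan"

/*
 * Build the "!func" command that adds func to the report filter list.
 * Returns 0 on success, -1 if the command does not fit into buf.
 */
static int kcsan_filter_cmd(char *buf, size_t len, const char *func)
{
    int n = snprintf(buf, len, "!%s", func);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a single command such as "on", "off", "blacklist",
 * "whitelist" or "!some_func_name" to the file at path.
 * Returns 0 on success, -1 on any I/O error.
 */
static int kcsan_write_cmd(const char *path, const char *cmd)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(cmd, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, calling ``kcsan_filter_cmd(buf, sizeof(buf), "some_func_name")``
and passing the result to ``kcsan_write_cmd(KCSAN_DEBUGFS, buf)`` would add
``some_func_name`` to the filter list. A further call with ``"whitelist"``
would then switch the filter list to whitelist behaviour.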

Tuning performance
~~~~~~~~~~~~~~~~~~

Core parameters that affect KCSAN's overall performance and bug detection
ability are exposed as kernel command-line arguments whose defaults can also be
changed via the corresponding Kconfig options.

* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory
  operations to skip before another watchpoint is set up. Setting up
  watchpoints more frequently increases the likelihood of observing races.
  This parameter has the most significant impact on overall system performance
  and race detection ability.

* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the
  microsecond delay to stall execution after a watchpoint has been set up.
  Larger values widen the window in which a race may be observed.

* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For
  interrupts, the microsecond delay to stall execution after a watchpoint has
  been set up. Interrupts have tighter latency requirements, and their delay
  should generally be smaller than the one chosen for tasks.

These parameters may be tweaked at runtime via
``/sys/module/kcsan/parameters/``.
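
To get a feel for how ``kcsan.skip_watch`` trades coverage for overhead, the
userspace model below counts how many watchpoints would be set up for a given
number of instrumented accesses. It is a deliberately simplified sketch, not
the kernel's implementation: the ``sampler`` type and function names are
hypothetical, a single counter stands in for the per-CPU state, and any
randomization of the skip count is ignored.

```c
#include <assert.h>
#include <stdbool.h>

struct sampler {
    long skip_watch; /* accesses to skip between two watchpoints */
    long remaining;  /* accesses still to skip before the next one */
};

/* Returns true when the current access should set up a watchpoint. */
static bool sampler_should_watch(struct sampler *s)
{
    if (s->remaining > 0) {
        s->remaining--;
        return false;
    }
    s->remaining = s->skip_watch;
    return true;
}

/* Number of watchpoints set up over a run of instrumented accesses. */
static long count_watchpoints(long skip_watch, long accesses)
{
    struct sampler s = { .skip_watch = skip_watch, .remaining = skip_watch };
    long watched = 0;

    for (long i = 0; i < accesses; i++)
        if (sampler_should_watch(&s))
            watched++;
    return watched;
}

/* Time spent stalling if every watchpoint waits for the full delay. */
static long total_stall_us(long watchpoints, long udelay_us)
{
    return watchpoints * udelay_us;
}
```

Under these assumptions, 1000 accesses yield 10 watchpoints with a skip count
of 99 and 100 watchpoints with a skip count of 9. The time spent stalling
scales with the number of watchpoints and with the configured delay, which is
why lowering the skip count costs so much performance.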

Data Races
----------

In an execution, two memory accesses form a *data race* if they *conflict*,
they happen concurrently in different threads, and at least one of them is a
*plain access*; they *conflict* if both access the same memory location, and at
least one is a write. For a more thorough discussion and definition, see `"Plain
Accesses and Data Races" in the LKMM`_.

.. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922
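
The definition above can be restated as a small predicate. The sketch below is
purely illustrative: the ``access`` structure is hypothetical, "same memory
location" is reduced to equal addresses (access sizes and partial overlap are
ignored), and whether two accesses happen concurrently is supplied by the
caller rather than derived from an execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct access {
    uintptr_t addr;  /* memory location accessed */
    int thread;      /* id of the accessing thread */
    bool is_write;
    bool is_marked;  /* READ_ONCE(), WRITE_ONCE(), atomic_*(), ... */
};

/* Two accesses conflict if they touch the same location and at least
 * one of them is a write. */
static bool conflicts(struct access a, struct access b)
{
    return a.addr == b.addr && (a.is_write || b.is_write);
}

/* A data race additionally requires concurrency, different threads,
 * and at least one plain (unmarked) access. */
static bool is_data_race(struct access a, struct access b, bool concurrent)
{
    return conflicts(a, b) && concurrent && a.thread != b.thread &&
           (!a.is_marked || !b.is_marked);
}
```

A plain write racing with a marked read is therefore still a data race, while
two marked accesses to the same location never are.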

Relationship with the Linux-Kernel Memory Consistency Model (LKMM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LKMM defines the propagation and ordering rules of various memory
operations, which gives developers the ability to reason about concurrent code.
Ultimately, this allows developers to determine the possible executions of
concurrent code, and whether that code is free from data races.

KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
``atomic_*``, etc.), but is oblivious to any ordering guarantees and simply
assumes that memory barriers are placed correctly. In other words, KCSAN
assumes that as long as a plain access is not observed to race with another
conflicting access, memory operations are correctly ordered.

This means that KCSAN will not report *potential* data races due to missing
memory ordering. Developers should therefore carefully consider the memory
ordering requirements that remain unchecked. If, however, missing memory
ordering (that is observable with a particular compiler and architecture) leads
to an observable data race (e.g. entering a critical section erroneously),
KCSAN would report the resulting data race.
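
A classic message-passing pattern shows the kind of bug this leaves unchecked.
The sketch below is a userspace illustration with hypothetical names, and the
``READ_ONCE()`` and ``WRITE_ONCE()`` definitions are simplified stand-ins for
the kernel macros. The flag is accessed only with marked operations, but
nothing orders the plain accesses to ``payload`` against it.

```c
#include <assert.h>

/* Simplified userspace stand-ins (assumption) for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int payload; /* accessed with plain loads and stores */
static int ready;   /* accessed only with marked operations */

static void producer(int value)
{
    payload = value;      /* plain write */
    /* Missing ordering: nothing makes the payload store visible
     * before the flag store on a weakly ordered system. */
    WRITE_ONCE(ready, 1);
}

static int consumer(void)
{
    if (!READ_ONCE(ready))
        return -1;        /* nothing published yet */
    /* Missing ordering: this plain read is not ordered after the
     * flag load either. */
    return payload;
}
```

With the producer and consumer in different threads, the accesses to
``payload`` are a potential data race because the required ordering is absent.
Following the description above, KCSAN would only report it if the two
``payload`` accesses were actually observed to overlap in some execution. On a
compiler and architecture where the reordering never manifests, the bug goes
unreported.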

Race Detection Beyond Data Races
--------------------------------

For code with complex concurrency design, race-condition bugs may not always
manifest as data races. Race conditions occur if concurrently executing
operations result in unexpected system behaviour. On the other hand, data races
are defined at the C-language level. The following macros can be used to check
properties of concurrent code where bugs would not manifest as data races.

.. kernel-doc:: include/linux/kcsan-checks.h
    :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED
                ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED
                ASSERT_EXCLUSIVE_BITS
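
As a hedged sketch of how two of these macros might be used, the example below
asserts that a sequence counter has a single writer and that one flag bit is
never modified concurrently. The variables, ``FLAG_READ_ONLY``, and the
function names are hypothetical. All four macros are defined as userspace
stand-ins so that the example builds outside the kernel, where the real
definitions come from ``include/linux/kcsan-checks.h`` and the kernel's
compiler headers.

```c
#include <assert.h>

/* Userspace stand-ins (assumption); the assertions become no-ops here. */
#define ASSERT_EXCLUSIVE_WRITER(var) \
    do { (void)sizeof(var); } while (0)
#define ASSERT_EXCLUSIVE_BITS(var, mask) \
    do { (void)sizeof(var); (void)(mask); } while (0)
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define FLAG_READ_ONLY 0x1UL /* hypothetical bit, set once at init */

static unsigned long shared_seq;             /* one writer, many readers */
static unsigned long flags = FLAG_READ_ONLY;

static void seq_advance(void)
{
    /* Claim: no other context writes shared_seq concurrently;
     * concurrent readers are fine. */
    ASSERT_EXCLUSIVE_WRITER(shared_seq);
    WRITE_ONCE(shared_seq, shared_seq + 1);
}

static unsigned long seq_read(void)
{
    /* Readers do not need to hold any lock. */
    return READ_ONCE(shared_seq);
}

static int is_read_only(void)
{
    /* Claim: the FLAG_READ_ONLY bit is not written concurrently,
     * even if other bits of flags are. */
    ASSERT_EXCLUSIVE_BITS(flags, FLAG_READ_ONLY);
    return (READ_ONCE(flags) & FLAG_READ_ONLY) != 0;
}
```

Because every access in this sketch is marked, a second writer to
``shared_seq`` would not show up as an ordinary data race. Catching that kind
of bug is what the assertions are for.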

Implementation Details
----------------------

KCSAN relies on observing that two accesses happen concurrently. Crucially, we
want to (a) increase the chances of observing races (especially for races that
manifest rarely), and (b) be able to actually observe them. We can accomplish
(a) by injecting various delays, and (b) by using address watchpoints (or
breakpoints).

If we deliberately stall a memory access while we have a watchpoint for its
address set up, and then observe the watchpoint to fire, two accesses to the
same address just raced. Using hardware watchpoints, this is the approach taken
in `DataCollider
<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
relies on compiler instrumentation and "soft watchpoints".

In KCSAN, watchpoints are implemented using an efficient encoding that stores
access type, size, and address in a long; the benefits of using "soft
watchpoints" are portability and greater flexibility. KCSAN then relies on the
compiler instrumenting plain accesses. For each instrumented plain access:

1. Check if a matching watchpoint exists; if yes, and at least one access is a
   write, then we encountered a racing access.

2. Periodically, if no matching watchpoint exists, set up a watchpoint and
   stall for a small randomized delay.

3. Also check the data value before the delay, and re-check the data value
   after the delay; if the values mismatch, we infer a race of unknown origin.
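
The three steps above can be modelled in a few lines of single-threaded C. This
is a sketch under stated assumptions, not KCSAN's code. The bit layout of the
encoded watchpoint, the slot count, and every name are hypothetical. A callback
passed as ``stall`` stands in for the randomized delay and simulates whatever
runs concurrently, and the periodic sampling of step 2 is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: bit 63 = write, bits 56..62 = size, rest = address. */
#define WP_WRITE      (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define WP_SLOTS      4

enum verdict { NO_RACE, RACE, RACE_UNKNOWN_ORIGIN };

static uint64_t watchpoints[WP_SLOTS]; /* 0 means the slot is free */
static bool wp_hit[WP_SLOTS];          /* set when another access matched */

static uint64_t wp_encode(const void *ptr, size_t size, bool is_write)
{
    return (is_write ? WP_WRITE : 0) | ((uint64_t)size << WP_SIZE_SHIFT) |
           ((uint64_t)(uintptr_t)ptr & WP_ADDR_MASK);
}

static bool wp_overlaps(uint64_t wp, const void *ptr, size_t size)
{
    uint64_t wp_addr = wp & WP_ADDR_MASK;
    uint64_t wp_size = (wp & ~WP_WRITE) >> WP_SIZE_SHIFT;
    uint64_t addr = (uint64_t)(uintptr_t)ptr & WP_ADDR_MASK;

    return addr < wp_addr + wp_size && wp_addr < addr + size;
}

/* Step 1: does this access conflict with an existing watchpoint? */
static bool check_access(const void *ptr, size_t size, bool is_write)
{
    for (int i = 0; i < WP_SLOTS; i++) {
        uint64_t wp = watchpoints[i];

        if (wp && wp_overlaps(wp, ptr, size) &&
            (is_write || (wp & WP_WRITE))) {
            wp_hit[i] = true;
            return true;
        }
    }
    return false;
}

/* Steps 2 and 3: set up a watchpoint, stall, then compare values. */
static enum verdict watch_access(const void *ptr, size_t size, bool is_write,
                                 void (*stall)(void))
{
    unsigned char before[8], after[8];
    enum verdict v = NO_RACE;
    int slot = -1;

    assert(size >= 1 && size <= sizeof(before));
    for (int i = 0; i < WP_SLOTS && slot < 0; i++)
        if (!watchpoints[i])
            slot = i;
    if (slot < 0)
        return NO_RACE; /* no free slot: skip this sample */

    watchpoints[slot] = wp_encode(ptr, size, is_write);
    wp_hit[slot] = false;
    memcpy(before, ptr, size);
    stall();
    memcpy(after, ptr, size);

    if (wp_hit[slot])
        v = RACE;
    else if (memcmp(before, after, size) != 0)
        v = RACE_UNKNOWN_ORIGIN;
    watchpoints[slot] = 0;
    return v;
}

/* Demo activity that runs "concurrently", i.e. inside the stall. */
static int demo_var;

static void stall_idle(void) { }

static void stall_racing_writer(void)
{
    check_access(&demo_var, sizeof(demo_var), true); /* instrumented write */
    demo_var++;
}

static void stall_device_write(void)
{
    demo_var++; /* uninstrumented, e.g. DMA: no watchpoint check */
}
```

Watching a read of ``demo_var`` with ``stall_idle`` yields ``NO_RACE``. With
``stall_racing_writer`` the instrumented write finds the watchpoint and the
result is ``RACE``. With ``stall_device_write`` only the value change is
visible, so the result is ``RACE_UNKNOWN_ORIGIN``.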

To detect data races between plain and marked accesses, KCSAN also annotates
marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never
sets up a watchpoint on marked accesses. By never setting up watchpoints for
marked operations, if all accesses to a variable that is accessed concurrently
are properly marked, KCSAN will never trigger a watchpoint and therefore never
report the accesses.

Key Properties
~~~~~~~~~~~~~~

1. **Memory Overhead:** The overall memory overhead is only a few MiB
   depending on configuration. The current implementation uses a small array of
   longs to encode watchpoint information, which is negligible.

2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
   efficient watchpoint encoding that does not require acquiring any shared
   locks in the fast-path. For kernel boot on a system with 8 CPUs:

   - 5.0x slow-down with the default KCSAN config;
   - 2.8x slow-down from runtime fast-path overhead only (set very large
     ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``).

3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN
   runtime. As a result, maintenance overheads are minimal as the kernel
   evolves.

4. **Detects Racy Writes from Devices:** Due to checking data values upon
   setting up watchpoints, racy writes from devices can also be detected.

5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
   rules; this may result in missed data races (false negatives).

6. **Analysis Accuracy:** For observed executions, due to using a sampling
   strategy, the analysis is *unsound* (false negatives possible), but aims to
   be complete (no false positives).

Alternatives Considered
-----------------------

An alternative data race detection approach for the kernel can be found in the
`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_.
KTSAN is a happens-before data race detector, which explicitly establishes the
happens-before order between memory operations, which can then be used to
determine data races as defined in `Data Races`_.

To build a correct happens-before relation, KTSAN must be aware of all ordering
rules of the LKMM and synchronization primitives. Unfortunately, any omission
leads to large numbers of false positives, which is especially detrimental in
the context of the kernel, which includes numerous custom synchronization
mechanisms. To track the happens-before relation, KTSAN's implementation
requires metadata for each memory location (shadow memory), which for each page
corresponds to 4 pages of shadow memory, and can translate into an overhead of
tens of GiB on a large system.