Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) The Kernel Concurrency Sanitizer (KCSAN)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) ========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) relies on compile-time instrumentation, and uses a watchpoint-based sampling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) approach to detect races. KCSAN's primary purpose is to detect `data races`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) Usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) KCSAN is supported by both GCC and Clang. Both compilers require version 11
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) or later.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) To enable KCSAN, configure the kernel with::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16)     CONFIG_KCSAN=y
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) KCSAN provides several other configuration options to customize behaviour (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) the respective help text in ``lib/Kconfig.kcsan`` for more info).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) Error reports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) ~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) A typical data race report looks like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26)     ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27)     BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29)     write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30)      kernfs_refresh_inode+0x70/0x170
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31)      kernfs_iop_permission+0x4f/0x90
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32)      inode_permission+0x190/0x200
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33)      link_path_walk.part.0+0x503/0x8e0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34)      path_lookupat.isra.0+0x69/0x4d0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35)      filename_lookup+0x136/0x280
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36)      user_path_at_empty+0x47/0x60
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37)      vfs_statx+0x9b/0x130
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38)      __do_sys_newlstat+0x50/0xb0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39)      __x64_sys_newlstat+0x37/0x50
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40)      do_syscall_64+0x85/0x260
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41)      entry_SYSCALL_64_after_hwframe+0x44/0xa9
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43)     read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44)      generic_permission+0x5b/0x2a0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45)      kernfs_iop_permission+0x66/0x90
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46)      inode_permission+0x190/0x200
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47)      link_path_walk.part.0+0x503/0x8e0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48)      path_lookupat.isra.0+0x69/0x4d0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49)      filename_lookup+0x136/0x280
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50)      user_path_at_empty+0x47/0x60
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51)      do_faccessat+0x11a/0x390
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52)      __x64_sys_access+0x3c/0x50
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53)      do_syscall_64+0x85/0x260
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54)      entry_SYSCALL_64_after_hwframe+0x44/0xa9
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)     Reported by Kernel Concurrency Sanitizer on:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)     CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58)     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)     ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) The header of the report provides a short summary of the functions involved in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) the race. It is followed by the access types and stack traces of the two threads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) involved in the data race.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) The other, less common type of data race report looks like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)     ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)     BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)     race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)      e1000_clean_rx_irq+0x551/0xb10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)      e1000_clean+0x533/0xda0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73)      net_rx_action+0x329/0x900
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74)      __do_softirq+0xdb/0x2db
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)      irq_exit+0x9b/0xa0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76)      do_IRQ+0x9c/0xf0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77)      ret_from_intr+0x0/0x18
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78)      default_idle+0x3f/0x220
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79)      arch_cpu_idle+0x21/0x30
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80)      do_idle+0x1df/0x230
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81)      cpu_startup_entry+0x14/0x20
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82)      rest_init+0xc5/0xcb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83)      arch_call_rest_init+0x13/0x2b
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84)      start_kernel+0x6db/0x700
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)     Reported by Kernel Concurrency Sanitizer on:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)     CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89)     ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) This report is generated when the other racing thread could not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) determined, but a race was inferred because the data value of the watched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) memory location changed. Such races can occur due to missing instrumentation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) or, for example, DMA accesses. These reports are only generated if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) ``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) Selective analysis
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) ~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) It may be desirable to disable data race detection for specific accesses,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) functions, compilation units, or entire subsystems. For static blacklisting,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) the following options are available:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)   any data races due to accesses in ``expr`` should be ignored and that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)   resulting behaviour when encountering a data race is deemed safe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) * Disabling data race detection for entire functions can be accomplished by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)   using the function attribute ``__no_kcsan``::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)     __no_kcsan
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)     void foo(void) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)         ...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)   To dynamically limit the functions for which reports are generated, see the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)   blacklist/whitelist feature of the `DebugFS interface`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) * To disable data race detection for a particular compilation unit, add to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)   ``Makefile``::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)     KCSAN_SANITIZE_file.o := n
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) * To disable data race detection for all compilation units listed in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)   ``Makefile``, add to the respective ``Makefile``::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)     KCSAN_SANITIZE := n
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
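As a minimal sketch of the first two options above, the following shows how
``data_race()`` and ``__no_kcsan`` might appear in code. The counter
``rx_packets`` and both functions are hypothetical names chosen for
illustration, and the fallback macro definitions exist only so that the
snippet also compiles outside the kernel tree; in the kernel the real
definitions come from the compiler headers.

```c
#include <stdio.h>

/*
 * Userspace stand-ins (assumption): they only mimic the shape of the kernel
 * macros and perform no instrumentation.
 */
#ifndef data_race
#define data_race(expr) (expr)
#endif
#ifndef __no_kcsan
#define __no_kcsan
#endif

/* Hypothetical statistics counter, updated elsewhere without locking. */
unsigned long rx_packets;

/* Approximate read for diagnostics: a stale value is acceptable here. */
unsigned long stats_snapshot(void)
{
	return data_race(rx_packets);
}

/* Hypothetical function excluded from KCSAN instrumentation as a whole. */
__no_kcsan
void debug_dump(void)
{
	printf("rx_packets=%lu\n", rx_packets);
}
```

``data_race()`` is the narrower tool: it marks a single expression as an
intentional race, while ``__no_kcsan`` silences every access in the function.
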
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) Furthermore, it is possible to tell KCSAN to show or hide entire classes of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) data races, depending on preferences. These can be changed via the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) Kconfig options:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) * ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)   is observed via a watchpoint, but the data value of the memory location was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)   observed to remain unchanged, do not report the data race.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) * ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)   up to word size are atomic by default. Assumes that such writes are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)   subject to unsafe compiler optimizations resulting in data races. The option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)   causes KCSAN to not report data races due to conflicts where the only plain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)   accesses are aligned writes up to word size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) DebugFS interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) ~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) The file ``/sys/kernel/debug/kcsan`` provides the following interface:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) * Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) * Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)   on or off, respectively.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) * Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)   ``some_func_name`` to the report filter list, which (by default) blacklists
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)   reporting of data races where either one of the top stack frames is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)   function in the list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) * Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)   changes the report filtering behaviour. For example, the blacklist feature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)   can be used to silence frequently occurring data races; the whitelist feature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)   can help with reproduction and testing of fixes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
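The commands listed above are plain strings written to the debugfs file, so a
test harness could drive them from a small userspace helper. The sketch below
is one possible shape; ``kcsan_ctl`` and ``KCSAN_DEBUGFS`` are hypothetical
names, and the path is passed as a parameter so that the helper can be tried
against a regular file on a system without KCSAN.

```c
#include <stdio.h>

/* Location of the KCSAN debugfs file, as given above. */
#define KCSAN_DEBUGFS "/sys/kernel/debug/kcsan"

/*
 * Write one command string (e.g. "on", "off", "whitelist", "!func") to the
 * given path. Returns 0 on success, -1 on failure.
 */
int kcsan_ctl(const char *path, const char *cmd)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer; a failed flush is reported here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, ``kcsan_ctl(KCSAN_DEBUGFS, "!some_func_name")`` followed by
``kcsan_ctl(KCSAN_DEBUGFS, "whitelist")`` would restrict reports to races
involving ``some_func_name``, assuming debugfs is mounted and the caller has
sufficient privileges.
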
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) Tuning performance
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) ~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) Core parameters that affect KCSAN's overall performance and bug detection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) ability are exposed as kernel command-line arguments whose defaults can also be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) changed via the corresponding Kconfig options.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) * ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)   operations to skip before another watchpoint is set up. Setting up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)   watchpoints more frequently increases the likelihood of observing races.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)   This parameter has the most significant impact on overall system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)   performance and race detection ability.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) * ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)   microsecond delay to stall execution after a watchpoint has been set up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)   Larger values widen the window in which a race may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)   observed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) * ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)   interrupts, the microsecond delay to stall execution after a watchpoint has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)   been set up. Interrupts have tighter latency requirements, and their delay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)   should generally be smaller than the one chosen for tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) These parameters may be tweaked at runtime via ``/sys/module/kcsan/parameters/``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 
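To make the effect of ``kcsan.skip_watch`` concrete, the sketch below models
the sampling decision as a simple countdown in userspace. It is an
illustration under stated assumptions rather than the kernel's code:
``struct sampler`` and its functions are hypothetical, one instance stands in
for one CPU's state, and any randomization of the skip count is ignored.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state: accesses left to skip before the next watchpoint. */
struct sampler {
	long skip_watch;	/* models kcsan.skip_watch */
	long remaining;		/* countdown until the next watchpoint */
};

void sampler_init(struct sampler *s, long skip_watch)
{
	s->skip_watch = skip_watch;
	s->remaining = skip_watch;
}

/*
 * Called once per instrumented access. Returns true when this access should
 * set up a watchpoint (after which execution would stall for the delay).
 */
bool sampler_should_watch(struct sampler *s)
{
	if (s->remaining-- > 0)
		return false;
	s->remaining = s->skip_watch;
	return true;
}

/* Count how many watchpoints a run of n accesses would set up. */
long count_watchpoints(long skip_watch, long n)
{
	struct sampler s;
	long hits = 0;

	sampler_init(&s, skip_watch);
	for (long i = 0; i < n; i++)
		hits += sampler_should_watch(&s);
	return hits;
}
```

In this model, ``skip_watch`` accesses are skipped and the next one sets up a
watchpoint, so the number of watchpoints for a fixed workload scales roughly
inversely with the parameter. Each watchpoint also costs one stall, which is
why the parameter affects both detection ability and overhead.
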
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) Data Races
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) In an execution, two memory accesses form a *data race* if they *conflict*,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) they happen concurrently in different threads, and at least one of them is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) *plain access*; they *conflict* if both access the same memory location, and at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) least one is a write. For a more thorough discussion and definition, see `"Plain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) Accesses and Data Races" in the LKMM`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) .. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
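The definition above can be restated as a pair of predicates. This is only a
sketch of the definition, not of how KCSAN detects races: ``struct access``
is a hypothetical record, "same memory location" is simplified to equal
addresses (access sizes and partial overlap are ignored), and whether two
accesses are concurrent is supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory access. */
struct access {
	uintptr_t addr;		/* memory location accessed */
	int thread;		/* id of the accessing thread */
	bool is_write;
	bool is_marked;		/* READ_ONCE(), WRITE_ONCE(), atomic_*(); plain otherwise */
};

/* Two accesses conflict if they hit the same location and one is a write. */
bool conflict(const struct access *a, const struct access *b)
{
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/*
 * A data race: conflicting, concurrent, in different threads, and at least
 * one of the two accesses is plain (unmarked).
 */
bool is_data_race(const struct access *a, const struct access *b,
		  bool concurrent)
{
	return conflict(a, b) && concurrent && a->thread != b->thread &&
	       (!a->is_marked || !b->is_marked);
}
```

Note that a plain write racing with a marked read still counts as a data
race, whereas two marked accesses never do under this definition.
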
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) Relationship with the Linux-Kernel Memory Consistency Model (LKMM)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) The LKMM defines the propagation and ordering rules of various memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) operations, which gives developers the ability to reason about concurrent code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) Ultimately, this allows one to determine the possible executions of concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) code, and whether that code is free from data races.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) ``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) assumes that memory barriers are placed correctly. In other words, KCSAN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) assumes that as long as a plain access is not observed to race with another
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) conflicting access, memory operations are correctly ordered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) This means that KCSAN will not report *potential* data races due to missing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) memory ordering. Developers should therefore carefully consider the memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) ordering requirements that remain unchecked. If, however, missing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) memory ordering (that is observable with a particular compiler and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) architecture) leads to an observable data race (e.g. entering a critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) section erroneously), KCSAN would report the resulting data race.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) Race Detection Beyond Data Races
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) For code with complex concurrency design, race-condition bugs may not always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) manifest as data races. Race conditions occur if concurrently executing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) operations result in unexpected system behaviour. On the other hand, data races
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) are defined at the C-language level. The following macros can be used to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) properties of concurrent code where bugs would not manifest as data races.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) .. kernel-doc:: include/linux/kcsan-checks.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)     :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)                 ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)                 ASSERT_EXCLUSIVE_BITS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 
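As a rough illustration of how one of these macros might be used, the sketch
below asserts that writers of a shared variable are serialized while readers
remain lockless. ``shared_foo``, ``update_foo`` and ``read_foo`` are
hypothetical names, the lock that serializes writers is assumed to be held by
the caller and is not shown, and the fallback macro definitions exist only so
that the snippet compiles outside the kernel tree (they perform no checking).

```c
/* Userspace stand-ins (assumption); the kernel provides the real macros. */
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) ((void)sizeof(var))
#endif
#ifndef WRITE_ONCE
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#endif
#ifndef READ_ONCE
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#endif

/* Hypothetical shared variable: one writer at a time, lockless readers. */
long shared_foo;

/* Caller is assumed to hold the lock that serializes all writers. */
void update_foo(long val)
{
	ASSERT_EXCLUSIVE_WRITER(shared_foo);	/* no other writer may run now */
	WRITE_ONCE(shared_foo, val);
}

/* Readers take no lock; they use a marked read. */
long read_foo(void)
{
	return READ_ONCE(shared_foo);
}
```

Because every access here is marked, a second concurrent writer would not be
a data race in the sense defined earlier; the assertion is what lets KCSAN
flag it as a violation of the intended design.
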
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) Implementation Details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) KCSAN relies on observing that two accesses happen concurrently. Crucially, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) want to (a) increase the chances of observing races (especially for races that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) manifest rarely), and (b) be able to actually observe them. We can accomplish
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) (a) by injecting various delays, and (b) by using address watchpoints (or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) breakpoints).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) If we deliberately stall a memory access, while we have a watchpoint for its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) address set up, and then observe the watchpoint to fire, two accesses to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) same address just raced. Using hardware watchpoints, this is the approach taken
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) in `DataCollider
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) <http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) relies on compiler instrumentation and "soft watchpoints".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) In KCSAN, watchpoints are implemented using an efficient encoding that stores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) access type, size, and address in a long; the benefits of using "soft
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) watchpoints" are portability and greater flexibility. KCSAN then relies on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) compiler instrumenting plain accesses. For each instrumented plain access:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) 1. Check if a matching watchpoint exists; if yes, and at least one access is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)    write, then we encountered a racing access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) 2. Periodically, if no matching watchpoint exists, set up a watchpoint and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)    stall for a small randomized delay.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 3. Also check the data value before the delay, and re-check the data value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)    after the delay; if the values mismatch, we infer a race of unknown origin.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 
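The encoding and the check in step 1 can be sketched in userspace as follows.
The bit layout is an assumption made for illustration (one write bit, seven
size bits, and the low 56 address bits in a 64-bit word) and is not claimed
to match the kernel's actual encoding; ``wp_encode`` and ``wp_races`` are
hypothetical names. Steps 2 and 3 (sampling, delay, and value comparison) are
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout: [63] write flag, [62:56] size, [55:0] address. */
#define WP_WRITE_BIT	(UINT64_C(1) << 63)
#define WP_SIZE_SHIFT	56
#define WP_SIZE_MASK	(UINT64_C(0x7f) << WP_SIZE_SHIFT)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_SIZE_SHIFT) - 1)

/* Pack access type, size, and (truncated) address into one 64-bit word. */
uint64_t wp_encode(uint64_t addr, unsigned int size, bool is_write)
{
	assert(size > 0 && size <= 0x7f);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

/* Step 1: does an access to [addr, addr + size) race with watchpoint wp? */
bool wp_races(uint64_t wp, uint64_t addr, unsigned int size, bool is_write)
{
	uint64_t wp_addr = wp & WP_ADDR_MASK;
	uint64_t wp_size = (wp & WP_SIZE_MASK) >> WP_SIZE_SHIFT;
	bool overlap;

	if (wp == 0)
		return false;	/* empty slot: no watchpoint set up */

	addr &= WP_ADDR_MASK;
	overlap = addr < wp_addr + wp_size && wp_addr < addr + size;

	/* A race needs overlapping ranges and at least one write. */
	return overlap && (is_write || (wp & WP_WRITE_BIT));
}
```

Storing everything in a single word means a watchpoint can be published and
checked with one atomic load or store, which is consistent with the goal of
avoiding shared locks in the fast path. The price in this sketch is that the
upper address bits are discarded, so two addresses differing only in those
bits would be treated as matching.
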
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) To detect data races between plain and marked accesses, KCSAN also annotates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) sets up a watchpoint on marked accesses. Because watchpoints are never set up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) for marked operations, if all accesses to a concurrently accessed variable are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) properly marked, KCSAN will never trigger a watchpoint and therefore never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) report the accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) Key Properties
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) ~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 1. **Memory Overhead:**  The overall memory overhead is only a few MiB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)    depending on configuration. The current implementation uses a small array of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)    longs to encode watchpoint information, which is negligible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)    efficient watchpoint encoding that does not require acquiring any shared
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)    locks in the fast-path. For kernel boot on a system with 8 CPUs:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)    - 5.0x slow-down with the default KCSAN config;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)    - 2.8x slow-down from runtime fast-path overhead only (set very large
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)      ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) 3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)    runtime. As a result, maintenance overheads are minimal as the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)    evolves.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 4. **Detects Racy Writes from Devices:** Due to checking data values upon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)    setting up watchpoints, racy writes from devices can also be detected.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)    rules; this may result in missed data races (false negatives).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 6. **Analysis Accuracy:** For observed executions, due to using a sampling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)    strategy, the analysis is *unsound* (false negatives possible), but aims to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)    be complete (no false positives).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) Alternatives Considered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) An alternative data race detection approach for the kernel can be found in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) `Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) KTSAN is a happens-before data race detector: it explicitly establishes the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) happens-before order between memory operations, which can then be used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) determine data races as defined in `Data Races`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) To build a correct happens-before relation, KTSAN must be aware of all ordering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) rules of the LKMM and synchronization primitives. Unfortunately, any omission
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) leads to large numbers of false positives, which is especially detrimental in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) the context of the kernel, which includes numerous custom synchronization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) mechanisms. To track the happens-before relation, KTSAN's implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) requires metadata for each memory location (shadow memory), which for each page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) corresponds to 4 pages of shadow memory, and can translate into an overhead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) tens of GiB on a large system.