^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Kernel mode NEON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) TL;DR summary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) * Use only NEON instructions, or VFP instructions that don't rely on support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) * Isolate your NEON code in a separate compilation unit, and compile it with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) NEON code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) * Don't sleep in your NEON code, and be aware that it will be executed with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) preemption disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) It is possible to use NEON instructions (and in some cases, VFP instructions) in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) code that runs in kernel mode. However, for performance reasons, the NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) register file is not preserved and restored at every context switch or taken
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) exception like the normal register file is, so some manual intervention is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) required. Furthermore, special care is required for code that may sleep [i.e.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) may call schedule()], as NEON or VFP instructions will be executed in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) non-preemptible section for reasons outlined below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) Lazy preserve and restore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) The NEON/VFP register file is managed using lazy preserve (on UP systems) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) lazy restore (on both SMP and UP systems). This means that the register file is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) kept 'live', and is only preserved and restored when multiple tasks are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) another core). Lazy restore is implemented by disabling the NEON/VFP unit after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) every context switch, resulting in a trap when subsequently a NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) instruction is issued, allowing the kernel to step in and perform the restore if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) necessary.
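
The lazy restore scheme described above can be pictured with a small userspace
model. This is only a sketch under simplifying assumptions: struct cpu_model,
context_switch() and use_neon() are hypothetical names invented for this
illustration, not kernel interfaces, and the model leaves out lazy preserve and
SMP migration entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's NEON/VFP unit; not a kernel structure. */
struct cpu_model {
	bool unit_enabled;	/* cleared on every context switch */
	int  hw_owner;		/* task whose registers are live, -1 if none */
	int  traps;		/* first-use traps taken so far */
	int  restores;		/* register file restores performed so far */
};

/* Context switch: only disable the unit, leave the register file live. */
static void context_switch(struct cpu_model *cpu)
{
	cpu->unit_enabled = false;
}

/* A task issues a NEON/VFP instruction. */
static void use_neon(struct cpu_model *cpu, int task)
{
	assert(task >= 0);	/* -1 is reserved for "no owner" */

	if (!cpu->unit_enabled) {
		/* Unit disabled: the instruction traps into the kernel. */
		cpu->traps++;
		if (cpu->hw_owner != task) {
			/* The live registers are not ours: restore needed. */
			cpu->restores++;
			cpu->hw_owner = task;
		}
		cpu->unit_enabled = true;
	}
	/* From here on the instruction executes normally. */
}
```

In this model, a task that is switched out and back in without any other task
touching the unit takes a trap on its next NEON/VFP instruction but needs no
restore, which is the saving the lazy scheme is after.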
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) it is required to do an 'eager' preserve of the NEON/VFP register file, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) enable the NEON/VFP unit explicitly so that no exception is generated on the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) subsequent use. This is handled by the function kernel_neon_begin(), which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) should be called before any kernel mode NEON or VFP instructions are issued.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) Likewise, the NEON/VFP unit should be disabled again after use to make sure user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) mode will hit the lazy restore trap upon next use. This is handled by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) function kernel_neon_end().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) Interruptions in kernel mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) For reasons of performance and simplicity, it was decided that there shall be no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) implies that interruptions of a kernel mode NEON section can only be allowed if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) they are guaranteed not to touch the NEON/VFP registers. For this reason, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) following rules and restrictions apply in the kernel:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) * NEON/VFP code is not allowed in interrupt context;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) * NEON/VFP code is not allowed to sleep;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) * NEON/VFP code is executed with preemption disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) If latency is a concern, it is possible to put back-to-back calls to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) kernel_neon_end() and kernel_neon_begin() in places in your code where none of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) the NEON registers are live. (Additional calls to kernel_neon_begin() should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) reasonably cheap if no context switch occurred in the meantime.)
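
One way to apply the latency advice is to process a large buffer in chunks and
leave the NEON section between chunks. The sketch below builds in userspace, so
kernel_neon_begin() and kernel_neon_end() are replaced by stand-ins that only
track state (the stand-ins treat nesting as an error, which is an assumption of
this sketch). CHUNK_BYTES, invert_chunk_neon() and invert_buffer() are
hypothetical names, and the chunk size is deliberately tiny.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins so the sketch builds outside the kernel. In kernel
 * code the real functions are declared in <asm/neon.h>.
 */
static int in_neon_section;	/* 1 while between begin and end */
static int begin_calls;		/* how many sections were opened */

static void kernel_neon_begin(void)
{
	assert(!in_neon_section);	/* assumption: sections do not nest */
	in_neon_section = 1;
	begin_calls++;
}

static void kernel_neon_end(void)
{
	assert(in_neon_section);
	in_neon_section = 0;
}

#define CHUNK_BYTES 4	/* hypothetical; tiny so the effect is visible */

/* Hypothetical worker; stands in for a routine built with -mfpu=neon. */
static void invert_chunk_neon(uint8_t *buf, size_t len)
{
	assert(in_neon_section);
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)~buf[i];
}

static void invert_buffer(uint8_t *buf, size_t len)
{
	kernel_neon_begin();
	while (len > 0) {
		size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

		invert_chunk_neon(buf, n);
		buf += n;
		len -= n;

		if (len > 0) {
			/*
			 * No NEON register holds live data between chunks,
			 * so briefly leave the non-preemptible section.
			 */
			kernel_neon_end();
			kernel_neon_begin();
		}
	}
	kernel_neon_end();
}
```

The worker must not keep any state in NEON registers across calls for this to
be safe; a real chunk size would be chosen by measuring latency on the target.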
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) VFP and support code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) Earlier versions of VFP (prior to version 3) rely on software support for things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) such as IEEE-754 compliant underflow handling. When the VFP unit needs such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) software assistance, it signals the kernel by raising an undefined instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) exception. The kernel responds by inspecting the VFP control registers and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) current instruction and arguments, and emulates the instruction in software.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) Such software assistance is currently not implemented for VFP instructions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) executed in kernel mode. If such a condition is encountered, the kernel will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) fail and generate an OOPS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) Separating NEON code from ordinary code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) The compiler is not aware of the special significance of kernel_neon_begin() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) between calls to these respective functions. Furthermore, GCC may generate NEON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) kernel is currently compiled at -O2, future changes may result in NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) instructions appearing in unexpected places if no special care is taken.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) Therefore, the recommended and only supported way of using NEON/VFP in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) kernel is by adhering to the following rules:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) * isolate the NEON code in a separate compilation unit and compile it with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) into the unit containing the NEON code from a compilation unit which is *not*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) built with the GCC flag '-mfpu=neon' set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) As the kernel is compiled with '-msoft-float', the above will guarantee that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) both NEON and VFP instructions will only ever appear in designated compilation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) units at any optimization level.
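
The split described by the two rules above might look roughly as follows. To
keep the sketch buildable in userspace it is shown as a single file with
comment banners marking where the two compilation units would be divided, and
kernel_neon_begin()/kernel_neon_end() are replaced by trivial stand-ins. The
file names in the banners and the functions xor_bytes() and
xor_bytes_neon_inner() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the functions declared in <asm/neon.h>. */
static int neon_enabled;
static void kernel_neon_begin(void) { neon_enabled = 1; }
static void kernel_neon_end(void)   { neon_enabled = 0; }

/* Shared prototype; would live in a small private header. */
void xor_bytes_neon_inner(uint8_t *dst, const uint8_t *src, size_t len);

/*
 * ---- would be xor-neon-inner.c ----
 * Built with '-march=armv7-a -mfpu=neon -mfloat-abi=softfp', so the
 * compiler is free to emit NEON instructions anywhere in this unit.
 * It never calls kernel_neon_begin() or kernel_neon_end() itself.
 */
void xor_bytes_neon_inner(uint8_t *dst, const uint8_t *src, size_t len)
{
	assert(neon_enabled);	/* sketch-only check of the calling rule */
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/*
 * ---- would be xor-glue.c ----
 * Built with the ordinary kernel flags (no -mfpu=neon), so no NEON
 * instruction can appear outside the begin/end pair.
 */
void xor_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
	kernel_neon_begin();
	xor_bytes_neon_inner(dst, src, len);
	kernel_neon_end();
}
```

Keeping the begin/end calls in the unit built without '-mfpu=neon' is what
prevents the compiler from hoisting a NEON instruction above
kernel_neon_begin() or sinking one below kernel_neon_end().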
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) NEON assembler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) NEON assembler is supported with no additional caveats as long as the rules
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) above are followed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) NEON code generated by GCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) parallelism, and generates NEON code from ordinary C source code. This is fully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) supported as long as the rules above are followed.
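
As an illustration, the loop below is the kind of ordinary C that the
vectorizer can often turn into NEON code when the unit is built with -O3 and
-mfpu=neon. Whether it actually does so depends on the compiler version and
options, so treat this as a sketch. The function name add_scaled_u32() is
hypothetical; in kernel code the types would come from <linux/types.h> rather
than the standard headers used here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Plain C with no intrinsics. Only integer arithmetic is used, so no VFP
 * instruction that might need support code is involved. The 'restrict'
 * qualifiers tell the compiler the arrays do not overlap, which makes
 * vectorization easier for it to prove safe.
 */
void add_scaled_u32(uint32_t *restrict dst, const uint32_t *restrict src,
		    uint32_t scale, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] += scale * src[i];
}
```

The caller still has to wrap the call in kernel_neon_begin() and
kernel_neon_end() from a unit built without '-mfpu=neon', exactly as for
hand-written NEON code.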
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) NEON intrinsics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) NEON intrinsics are also supported. However, as code using NEON intrinsics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) relies on the GCC header <arm_neon.h> (which #includes <stdint.h>), you should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) observe the following in addition to the rules above:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) uses its builtin version of <stdint.h> (this is a C99 header which the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) does not supply);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) * Include <arm_neon.h> last, or at least after <linux/types.h>.
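
A minimal sketch of an intrinsics unit that follows the include-order rule is
shown below. It is written to build in userspace as well, so <stdint.h> and
<stddef.h> stand in for <linux/types.h>, and the NEON path is guarded by the
compiler-defined __ARM_NEON macro with a scalar loop handling the tail (and
everything, on non-NEON builds). The function name add_u32_neon() is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>	/* stands in for <linux/types.h> in this sketch */

#ifdef __ARM_NEON
#include <arm_neon.h>	/* included last, after the type definitions */
#endif

/* dst[i] = a[i] + b[i] for n elements, four at a time where possible. */
void add_u32_neon(uint32_t *dst, const uint32_t *a, const uint32_t *b,
		  size_t n)
{
	size_t i = 0;

#ifdef __ARM_NEON
	for (; i + 4 <= n; i += 4) {
		uint32x4_t va = vld1q_u32(a + i);	/* load 4 lanes */
		uint32x4_t vb = vld1q_u32(b + i);

		vst1q_u32(dst + i, vaddq_u32(va, vb));	/* add and store */
	}
#endif
	/* Scalar tail, or the whole array when NEON is not available. */
	for (; i < n; i++)
		dst[i] = a[i] + b[i];
}
```

In a kernel build this unit would additionally be compiled with
'-ffreestanding' and the NEON flags listed earlier, and called only between
kernel_neon_begin() and kernel_neon_end().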