Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

3 Commits   0 Branches   0 Tags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Kernel mode NEON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) TL;DR summary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) * Use only NEON instructions, or VFP instructions that don't rely on support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8)   code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) * Isolate your NEON code in a separate compilation unit, and compile it with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10)   '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12)   NEON code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) * Don't sleep in your NEON code, and be aware that it will be executed with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14)   preemption disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) It is possible to use NEON instructions (and in some cases, VFP instructions) in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) code that runs in kernel mode. However, for performance reasons, the NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) register file is not preserved and restored at every context switch or taken
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) exception like the normal register file is, so some manual intervention is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) required. Furthermore, special care is required for code that may sleep [i.e.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) may call schedule()], as NEON or VFP instructions will be executed in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) non-preemptible section for reasons outlined below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) Lazy preserve and restore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) The NEON/VFP register file is managed using lazy preserve (on UP systems) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) lazy restore (on both SMP and UP systems). This means that the register file is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) kept 'live', and is only preserved and restored when multiple tasks are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) another core). Lazy restore is implemented by disabling the NEON/VFP unit after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) every context switch, resulting in a trap when subsequently a NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) instruction is issued, allowing the kernel to step in and perform the restore if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) necessary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) it is required to do an 'eager' preserve of the NEON/VFP register file, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) enable the NEON/VFP unit explicitly so no exceptions are generated on first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) subsequent use. This is handled by the function kernel_neon_begin(), which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) should be called before any kernel mode NEON or VFP instructions are issued.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) Likewise, the NEON/VFP unit should be disabled again after use to make sure user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) mode will hit the lazy restore trap upon next use. This is handled by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) function kernel_neon_end().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) Interruptions in kernel mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) For reasons of performance and simplicity, it was decided that there shall be no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) implies that interruptions of a kernel mode NEON section can only be allowed if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) they are guaranteed not to touch the NEON/VFP registers. For this reason, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) following rules and restrictions apply in the kernel:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) * NEON/VFP code is not allowed in interrupt context;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) * NEON/VFP code is not allowed to sleep;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) * NEON/VFP code is executed with preemption disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) If latency is a concern, it is possible to put back to back calls to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) kernel_neon_end() and kernel_neon_begin() in places in your code where none of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) the NEON registers are live. (Additional calls to kernel_neon_begin() should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) reasonably cheap if no context switch occurred in the meantime)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) VFP and support code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) Earlier versions of VFP (prior to version 3) rely on software support for things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) software assistance, it signals the kernel by raising an undefined instruction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) exception. The kernel responds by inspecting the VFP control registers and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) current instruction and arguments, and emulates the instruction in software.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) Such software assistance is currently not implemented for VFP instructions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) executed in kernel mode. If such a condition is encountered, the kernel will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) fail and generate an OOPS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) Separating NEON code from ordinary code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) The compiler is not aware of the special significance of kernel_neon_begin() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) between calls to these respective functions. Furthermore, GCC may generate NEON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) kernel is currently compiled at -O2, future changes may result in NEON/VFP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) instructions appearing in unexpected places if no special care is taken.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) Therefore, the recommended and only supported way of using NEON/VFP in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) kernel is by adhering to the following rules:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) * isolate the NEON code in a separate compilation unit and compile it with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)   '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)   into the unit containing the NEON code from a compilation unit which is *not*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)   built with the GCC flag '-mfpu=neon' set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) As the kernel is compiled with '-msoft-float', the above will guarantee that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) both NEON and VFP instructions will only ever appear in designated compilation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) units at any optimization level.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) NEON assembler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) NEON assembler is supported with no additional caveats as long as the rules
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) above are followed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) NEON code generated by GCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) parallelism, and generates NEON code from ordinary C source code. This is fully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) supported as long as the rules above are followed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) NEON intrinsics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) NEON intrinsics are also supported. However, as code using NEON intrinsics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) observe the following in addition to the rules above:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)   uses its builtin version of <stdint.h> (this is a C99 header which the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)   does not supply);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) * Include <arm_neon.h> last, or at least after <linux/types.h>