Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) Microarchitectural Data Sampling (MDS) mitigation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) =================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) .. _mds:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Microarchitectural Data Sampling (MDS) is a family of side channel attacks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) on internal buffers in Intel CPUs. The variants are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12)  - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13)  - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14)  - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15)  - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) dependent load (store-to-load forwarding) as an optimization. The forward
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) can also happen to a faulting or assisting load operation for a different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) memory address, which can be exploited under certain conditions. Store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) buffers are partitioned between Hyper-Threads so cross thread forwarding is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) not possible. But if a thread enters or exits a sleep state the store
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) buffer is repartitioned which can expose data from one thread to the other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) L1 miss situations and to hold data which is returned or sent in response
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) to a memory or I/O operation. Fill buffers can forward data to a load
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) operation and also write data to the cache. When the fill buffer is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) deallocated it can retain the stale data of the preceding operations which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) can then be forwarded to a faulting or assisting load operation, which can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) be exploited under certain conditions. Fill buffers are shared between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) Hyper-Threads so cross thread leakage is possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) MLPDS leaks Load Port Data. Load ports are used to perform load operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) from memory or I/O. The received data is then forwarded to the register
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) file or a subsequent operation. In some implementations the Load Port can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) contain stale data from a previous operation which can be forwarded to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) faulting or assisting loads under certain conditions, which again can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) exploited eventually. Load ports are shared between Hyper-Threads so cross
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) thread leakage is possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) memory that takes a fault or assist can leave data in a microarchitectural
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) structure that may later be observed using one of the same methods used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) MSBDS, MFBDS or MLPDS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) Exposure assumptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) It is assumed that attack code resides in user space or in a guest, with one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) exception. The rationale behind this assumption is that the code construct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) needed for exploiting MDS requires:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54)  - to control the load to trigger a fault or assist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)  - to have a disclosure gadget which exposes the speculatively accessed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)    data for consumption through a side channel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)  - to control the pointer through which the disclosure gadget exposes the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)    data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) The existence of such a construct in the kernel cannot be excluded with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 100% certainty, but the complexity involved makes it extremely unlikely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) There is one exception, which is untrusted BPF. The functionality of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) untrusted BPF is limited, but it needs to be thoroughly investigated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) whether it can be used to create such a construct.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) Mitigation strategy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) All variants have the same mitigation strategy at least for the single CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) thread case (SMT off): Force the CPU to clear the affected buffers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) This is achieved by using the otherwise unused and obsolete VERW
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) instruction in combination with a microcode update. The microcode clears
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) the affected CPU buffers when the VERW instruction is executed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) For virtualization there are two ways to achieve CPU buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) clearing: either via the modified VERW instruction or via the L1D Flush
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) command. The latter is issued when L1TF mitigation is enabled so the extra
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) VERW can be avoided. If the CPU is not affected by L1TF then VERW needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) be issued.
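
The choice between the two clearing methods on VM entry can be sketched as a
small decision function. This is an illustration only: the enum, the function
name and both parameters are hypothetical and do not correspond to kernel
symbols.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel symbols. */
enum vmentry_clear {
	VMENTRY_CLEAR_NONE,      /* no buffer clearing requested */
	VMENTRY_CLEAR_L1D_FLUSH, /* the L1D Flush already clears the buffers */
	VMENTRY_CLEAR_VERW,      /* an explicit VERW is needed */
};

/*
 * Pick the buffer clearing method for a VM entry.
 * l1d_flush_on_entry: the L1TF mitigation issues an L1D Flush on entry.
 * mds_mitigation_on:  the MDS mitigation is not disabled.
 */
static enum vmentry_clear pick_vmentry_clear(bool l1d_flush_on_entry,
					     bool mds_mitigation_on)
{
	if (l1d_flush_on_entry)
		return VMENTRY_CLEAR_L1D_FLUSH; /* extra VERW can be avoided */
	if (mds_mitigation_on)
		return VMENTRY_CLEAR_VERW;
	return VMENTRY_CLEAR_NONE;
}
```

With the L1D Flush active the function never asks for VERW, which mirrors the
statement that the extra VERW can be avoided in that case.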
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) If the VERW instruction with the supplied segment selector argument is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) executed on a CPU without the microcode update there is no side effect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) other than a small number of pointlessly wasted CPU cycles.
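
Because VERW is an unprivileged instruction, the effect described above can be
tried from user space. The following is a minimal sketch, not the kernel
implementation: the function name is hypothetical, the selector is simply the
current DS value, and the code compiles to a no-op on non-x86 targets. Whether
any buffer is actually cleared depends on the microcode.

```c
#include <stdint.h>

/*
 * Hypothetical user space sketch of the clearing primitive.
 * Returns 1 if VERW was executed, 0 when built for a non-x86 target.
 */
static inline int clear_cpu_buffers_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint16_t sel;

	/* Use the current data segment selector as the VERW argument. */
	__asm__ volatile("mov %%ds, %0" : "=r"(sel));

	/*
	 * Memory operand form of VERW. Architecturally it only updates ZF;
	 * with updated microcode it additionally clears the CPU buffers.
	 */
	__asm__ volatile("verw %0" : : "m"(sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```

On a CPU without the microcode update the call costs a few cycles and has no
other visible effect.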
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) This does not protect against cross Hyper-Thread attacks except for MSBDS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) which is only exploitable cross Hyper-thread when one of the Hyper-Threads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) enters a C-state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) The kernel provides a function to invoke the buffer clearing:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)     mds_clear_cpu_buffers()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) (idle) transitions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) As a special quirk to address virtualization scenarios where the host has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) the microcode updated, but the hypervisor does not (yet) expose the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) hope that it might actually clear the buffers. The state is reflected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) accordingly.
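
Whether MD_CLEAR is advertised can be checked from user space as well. The
sketch below assumes a GCC or Clang toolchain providing <cpuid.h>. The bit
position, CPUID leaf 7 subleaf 0 EDX bit 10, is taken from Intel's
documentation and is not stated in this document; the function name is
hypothetical.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID advertises MD_CLEAR, 0 otherwise or on non-x86. */
static int cpu_has_md_clear(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7, subleaf 0: structured extended feature flags. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;

	return (edx >> 10) & 1; /* EDX bit 10: MD_CLEAR */
#else
	return 0;
#endif
}
```

Inside a guest the result reflects what the hypervisor exposes, which is
exactly the case the quirk above is meant to handle.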
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) According to current knowledge additional mitigations inside the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) itself are not required because the necessary gadgets to expose the leaked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) data cannot be controlled in a way which allows exploitation from malicious
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) user space or VM guests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) Kernel internal mitigation modes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)  ======= ============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)  off      Mitigation is disabled. Either the CPU is not affected or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)           mds=off is supplied on the kernel command line
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)  full     Mitigation is enabled. CPU is affected and MD_CLEAR is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)           advertised in CPUID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)  vmwerv	  Mitigation is enabled. CPU is affected and MD_CLEAR is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 	  advertised in CPUID. That is mainly for virtualization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 	  scenarios where the host has the updated microcode but the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 	  hypervisor does not expose MD_CLEAR in CPUID. It's a best
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 	  effort approach without guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)  ======= ============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) If the CPU is affected and mds=off is not supplied on the kernel command
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) line then the kernel selects the appropriate mitigation mode depending on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) the availability of the MD_CLEAR CPUID bit.
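
The mode selection in the table above can be written as a short function. This
is a sketch under the stated rules only; the enum, the function name and the
boolean parameters are hypothetical stand-ins for the kernel's internal state.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three modes in the table above. */
enum mds_mode {
	MDS_MODE_OFF,
	MDS_MODE_FULL,
	MDS_MODE_VMWERV,
};

/*
 * cpu_affected:  the CPU is affected by at least one MDS variant.
 * cmdline_off:   mds=off was supplied on the kernel command line.
 * has_md_clear:  MD_CLEAR is advertised in CPUID.
 */
static enum mds_mode mds_select_mode(bool cpu_affected, bool cmdline_off,
				     bool has_md_clear)
{
	if (!cpu_affected || cmdline_off)
		return MDS_MODE_OFF;

	/* Affected and not disabled: pick by MD_CLEAR availability. */
	return has_md_clear ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}
```

For example, an affected guest CPU whose hypervisor hides MD_CLEAR ends up in
the best effort vmwerv mode.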
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) Mitigation points
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 1. Return to user space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) ^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)    When transitioning from kernel to user space the CPU buffers are flushed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)    on affected CPUs when the mitigation is not disabled on the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)    command line. The mitigation is enabled through the static key
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)    mds_user_clear.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)    The mitigation is invoked in prepare_exit_to_usermode() which covers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)    all but one of the kernel to user space transitions.  The exception
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)    is when we return from a Non Maskable Interrupt (NMI), which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)    handled directly in do_nmi().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)    (The reason that NMI is special is that prepare_exit_to_usermode() can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)     enable IRQs.  In NMI context, NMIs are blocked, and we don't want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)     enable IRQs with NMIs blocked.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 2. C-State transition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) ^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)    When a CPU goes idle and enters a C-State the CPU buffers need to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)    cleared on affected CPUs when SMT is active. This addresses the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)    repartitioning of the store buffer when one of the Hyper-Threads enters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)    a C-State.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)    When SMT is inactive, i.e. either the CPU does not support it or all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)    sibling threads are offline, CPU buffer clearing is not required.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)    The idle clearing is enabled on CPUs which are only affected by MSBDS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)    and not by any other MDS variant. The other MDS variants cannot be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)    protected against cross Hyper-Thread attacks because the Fill Buffer and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)    the Load Ports are shared. So on CPUs affected by other variants, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)    idle clearing would be a window dressing exercise and is therefore not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)    activated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)    The invocation is controlled by the static key mds_idle_clear which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)    switched depending on the chosen mitigation mode and the SMT state of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)    the system.
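
The conditions for idle clearing given so far can be summarized in one
predicate. This is a sketch that models the static key as a plain boolean
result; the function name and parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Decide whether idle clearing should be active.
 * mitigation_on: the mitigation mode is not "off".
 * smt_active:    SMT is supported and a sibling thread is online.
 * msbds_only:    the CPU is affected by MSBDS and by no other variant.
 */
static bool mds_idle_clear_wanted(bool mitigation_on, bool smt_active,
				  bool msbds_only)
{
	/* All three conditions must hold, otherwise clearing is pointless. */
	return mitigation_on && smt_active && msbds_only;
}
```

The result has to be re-evaluated whenever the SMT state changes, since the
key is switched depending on it.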
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)    The buffer clear is only invoked before entering the C-State to prevent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)    stale data from the idling CPU from spilling to the Hyper-Thread
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)    sibling after the store buffer got repartitioned and all entries are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)    available to the non idle sibling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)    When coming out of idle the store buffer is partitioned again so each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)    sibling has half of it available. The CPU coming back from idle could
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)    then be speculatively exposed to contents of the sibling. The buffers are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)    flushed either on exit to user space or on VMENTER so malicious code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)    in user space or the guest cannot speculatively access them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)    The mitigation is hooked into all variants of halt()/mwait(), but does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)    not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)    was superseded around 2010 by the intel_idle driver, which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)    preferred on all affected CPUs which are expected to gain the MD_CLEAR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)    functionality in microcode. Aside from that the IO-Port mechanism is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)    legacy interface which is only used on older systems which are either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)    not affected or do not receive microcode updates anymore.