Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

- Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
- Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
- Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
- Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer entries, which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads, so cross-thread forwarding is
not possible. However, if a thread enters or exits a sleep state, the store
buffer is repartitioned, which can expose data from one thread to the other.

MFBDS leaks Fill Buffer entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When a fill buffer is
deallocated, it can retain stale data from preceding operations, which
can then be forwarded to a faulting or assisting load operation and
exploited under certain conditions. Fill buffers are shared between
Hyper-Threads, so cross-thread leakage is possible.

MLPDS leaks Load Port data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation, which can be forwarded to
faulting or assisting loads under certain conditions and, again, be
exploited. Load ports are shared between Hyper-Threads, so cross-thread
leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest, with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires the attacker to:

- control the load so that it triggers a fault or assist

- have a disclosure gadget which exposes the speculatively accessed
  data for consumption through a side channel

- control the pointer through which the disclosure gadget exposes the
  data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but whether it can be used to create such a
construct needs to be thoroughly investigated.


Mitigation strategy
-------------------

All variants have the same mitigation strategy, at least for the single CPU
thread case (SMT off): force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either the modified VERW instruction or the L1D Flush command. The latter
is issued when the L1TF mitigation is enabled, so the extra VERW can be
avoided. If the CPU is not affected by L1TF, VERW needs to be issued.
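
The choice between the two clearing methods on VMENTER can be summarized
as a small decision function. This is a minimal sketch, not the KVM code:
the enum, vmenter_clear_method() and its two boolean inputs (standing in
for "L1D Flush enabled for L1TF" and "MDS buffer clearing enabled") are
hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they only model the decision described above. */
enum vmenter_clear {
	VMENTER_CLEAR_NONE,      /* no buffer clearing needed */
	VMENTER_CLEAR_L1D_FLUSH, /* L1D Flush command, VERW skipped */
	VMENTER_CLEAR_VERW,      /* VERW-based buffer clearing */
};

static enum vmenter_clear vmenter_clear_method(bool l1d_flush_enabled,
					       bool mds_clear_enabled)
{
	/* The L1D Flush issued for L1TF also clears the buffers. */
	if (l1d_flush_enabled)
		return VMENTER_CLEAR_L1D_FLUSH;
	/* No L1D Flush on this VMENTER: fall back to VERW. */
	if (mds_clear_enabled)
		return VMENTER_CLEAR_VERW;
	return VMENTER_CLEAR_NONE;
}
```

Checking the L1D Flush first is what lets the extra VERW be skipped on
CPUs where the L1TF mitigation already runs on every VMENTER.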

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update, there is no side effect
other than a small number of pointlessly wasted CPU cycles.

Buffer clearing does not protect against cross Hyper-Thread attacks,
except for MSBDS, which is only exploitable across Hyper-Threads when one
of the sibling threads enters a C-state.
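
The reasoning about which variants buffer clearing can cover across
Hyper-Threads fits in a small table. A minimal sketch: the struct, the
mds_variants array and verw_covers_cross_thread() are hypothetical, and
MDSUM is left out because it is observed through one of the other three
methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the variants described in the overview. */
struct mds_variant {
	const char *name;
	const char *structure;
	bool always_shared; /* siblings share the structure at all times */
};

static const struct mds_variant mds_variants[] = {
	/* Partitioned per thread; repartitioned only on sleep/wake. */
	{ "MSBDS", "store buffer", false },
	{ "MFBDS", "fill buffer",  true  },
	{ "MLPDS", "load port",    true  },
};

#define NR_MDS_VARIANTS (sizeof(mds_variants) / sizeof(mds_variants[0]))

/*
 * Clearing at transitions can only keep a sibling from sampling a
 * structure it does not share at all times; for MSBDS this means
 * clearing before a C-state entry.
 */
static bool verw_covers_cross_thread(size_t idx)
{
	assert(idx < NR_MDS_VARIANTS);
	return !mds_variants[idx].always_shared;
}
```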

The kernel provides a function to invoke the buffer clearing::

    mds_clear_cpu_buffers()
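
For illustration, the core of such a function is a single VERW with a
memory operand. The sketch below is a hypothetical user-space stand-in,
not the kernel implementation: sketch_clear_cpu_buffers() is an invented
name, it uses the current DS selector rather than a kernel segment, it
relies on VERW not being a privileged instruction, and it compiles to a
no-op on non-x86 targets. Without the microcode update it merely updates
ZF.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch of a VERW-based buffer clear.
 * Returns 1 if VERW was executed, 0 on non-x86 targets.
 */
static int sketch_clear_cpu_buffers(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint16_t sel;

	/* VERW accepts any selector; use the current data segment. */
	__asm__ volatile("mov %%ds, %0" : "=r"(sel));
	/*
	 * Memory-operand form, mirroring the kernel's choice of the
	 * documented variant. VERW writes ZF, hence the "cc" clobber.
	 */
	__asm__ volatile("verw %0" : : "m"(sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```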

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The reported mitigation
state reflects this (see the vmwerv mode below).

According to current knowledge, additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

======= ============================================================
off     Mitigation is disabled. Either the CPU is not affected or
        mds=off is supplied on the kernel command line

full    Mitigation is enabled. CPU is affected and MD_CLEAR is
        advertised in CPUID.

vmwerv  Mitigation is enabled. CPU is affected and MD_CLEAR is not
        advertised in CPUID. That is mainly for virtualization
        scenarios where the host has the updated microcode but the
        hypervisor does not expose MD_CLEAR in CPUID. It's a best
        effort approach without guarantee.
======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line, the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
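
The selection rule and the modes in the table above can be written as a
small pure function. A minimal sketch: the enum, select_mds_mode() and
user_clear_enabled() are hypothetical names, and the three booleans stand
in for the CPU bug state, the mds=off command line option and the
MD_CLEAR CPUID bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the modes in the table above. */
enum mds_mode { MDS_MODE_OFF, MDS_MODE_FULL, MDS_MODE_VMWERV };

static enum mds_mode select_mds_mode(bool cpu_affected, bool cmdline_mds_off,
				     bool md_clear_in_cpuid)
{
	if (!cpu_affected || cmdline_mds_off)
		return MDS_MODE_OFF;
	/* vmwerv: best effort, VERW may or may not clear the buffers. */
	return md_clear_in_cpuid ? MDS_MODE_FULL : MDS_MODE_VMWERV;
}

/* Both enabled modes clear the buffers on return to user space. */
static bool user_clear_enabled(enum mds_mode mode)
{
	return mode != MDS_MODE_OFF;
}
```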

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

When transitioning from kernel to user space, the CPU buffers are flushed
on affected CPUs unless the mitigation is disabled on the kernel
command line. The mitigation is enabled through the static key
mds_user_clear.

The mitigation is invoked in prepare_exit_to_usermode(), which covers
all but one of the kernel to user space transitions. The exception
is the return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().

(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
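
The gating on both exit paths can be modeled in plain C. In this
hypothetical sketch a bool stands in for the mds_user_clear static key and
a counter stands in for the VERW clear; model_exit_to_user() and
model_nmi_return() only mimic the roles of prepare_exit_to_usermode() and
the NMI return path and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the static key and for VERW. */
static bool model_mds_user_clear;
static unsigned int model_clears;

static void model_user_clear_cpu_buffers(void)
{
	if (model_mds_user_clear)
		model_clears++; /* would be mds_clear_cpu_buffers() */
}

/* Regular path: earlier work here may run with IRQs enabled. */
static void model_exit_to_user(void)
{
	/* ... handle pending work ... */
	model_user_clear_cpu_buffers(); /* last step before returning */
}

/* NMI path: no IRQ-enabling work; clear directly if returning to user. */
static void model_nmi_return(bool returning_to_user)
{
	if (returning_to_user)
		model_user_clear_cpu_buffers();
}
```

Both paths share one gated helper, so a single key switches the
mitigation for every kernel to user space transition.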


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

When a CPU goes idle and enters a C-State, the CPU buffers need to be
cleared on affected CPUs when SMT is active. This addresses the
repartitioning of the store buffer when one of the Hyper-Threads enters
a C-State.

When SMT is inactive, i.e. either the CPU does not support it or all
sibling threads are offline, CPU buffer clearing is not required.

The idle clearing is enabled on CPUs which are only affected by MSBDS
and not by any other MDS variant. The other MDS variants cannot be
protected against cross Hyper-Thread attacks because the Fill Buffer and
the Load Ports are shared. So on CPUs affected by other variants, the
idle clearing would be a window dressing exercise and is therefore not
activated.

The invocation is controlled by the static key mds_idle_clear, which is
switched depending on the chosen mitigation mode and the SMT state of
the system.
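
The decision for the mds_idle_clear key can be sketched as follows. The
enum and mds_idle_clear_wanted() are hypothetical; msbds_only stands in
for "affected only by MSBDS" and smt_active for the current SMT state,
which can change at runtime when sibling threads go online or offline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the kernel internal mitigation modes. */
enum mds_mode { MDS_MODE_OFF, MDS_MODE_FULL, MDS_MODE_VMWERV };

static bool mds_idle_clear_wanted(enum mds_mode mode, bool msbds_only,
				  bool smt_active)
{
	if (mode == MDS_MODE_OFF)
		return false;
	/* Fill buffers and load ports stay shared: window dressing. */
	if (!msbds_only)
		return false;
	/* Repartitioning only matters when a sibling is online. */
	return smt_active;
}
```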

The buffer clear is only invoked before entering the C-State to prevent
stale data from the idling CPU from spilling over to the Hyper-Thread
sibling after the store buffer has been repartitioned and all entries are
available to the non-idle sibling.
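
The ordering can be illustrated with a small event log. This is a
hypothetical model, not the kernel's halt()/mwait() code:
model_safe_halt() records a clear (if the stand-in for mds_idle_clear is
enabled) strictly before the halt, and records nothing extra on wake-up.

```c
#include <assert.h>
#include <stdbool.h>

enum idle_event { EV_CLEAR, EV_HALT, EV_WAKE };

/* Hypothetical stand-ins for the static key and an event trace. */
static bool model_idle_clear;
static enum idle_event model_events[8];
static int model_nevents;

static void model_record(enum idle_event ev)
{
	assert(model_nevents < 8);
	model_events[model_nevents++] = ev;
}

static void model_safe_halt(void)
{
	if (model_idle_clear)
		model_record(EV_CLEAR); /* clear before entering the C-state */
	model_record(EV_HALT);          /* stands in for the halt itself */
	model_record(EV_WAKE);          /* no clear on wake-up */
}
```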

When coming out of idle, the store buffer is partitioned again so each
sibling has half of it available. The CPU returning from idle could then
be speculatively exposed to contents of the sibling. The buffers are
flushed either on exit to user space or on VMENTER, so malicious code
in user space or the guest cannot speculatively access them.

The mitigation is hooked into all variants of halt()/mwait(), but does
not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
was superseded by the intel_idle driver around 2010. intel_idle is
preferred on all affected CPUs, which are expected to gain the MD_CLEAR
functionality in microcode. Aside from that, the IO-Port mechanism is a
legacy interface which is only used on older systems which are either
not affected or no longer receive microcode updates.