^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) iTLB multihit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) iTLB multihit is an erratum where some processors may incur a machine check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) error, possibly resulting in an unrecoverable CPU lockup, when an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) instruction fetch hits multiple entries in the instruction TLB. This can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) occur when the page size is changed along with either the physical address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) or cache type. A malicious guest running on a virtualized system can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) exploit this erratum to perform a denial of service attack.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) Affected processors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) Variations of this erratum are present on most Intel Core and Xeon processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) models. The erratum is not present on:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) - non-Intel processors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) - Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) - Intel processors that have the PSCHANGE_MC_NO bit set in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) IA32_ARCH_CAPABILITIES MSR.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) Related CVEs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) The following CVE entry is related to this issue:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) ============== =================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) CVE-2018-12207 Machine Check Error Avoidance on Page Size Change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) ============== =================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) Problem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) Privileged software, including the OS and virtual machine managers (VMM), is in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) charge of memory management. A key component in memory management is the control
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) of the page tables. Modern processors use virtual memory, a technique that creates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) the illusion of a very large memory for processors. This virtual space is split
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) into pages of a given size. Page tables translate virtual addresses to physical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) addresses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) To reduce latency when performing a virtual to physical address translation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) processors include a translation lookaside buffer (TLB) that caches recent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) translations. There are separate TLBs for instructions (iTLB) and data (dTLB).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) Under this erratum, instructions are fetched from a linear address translated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) using a 4 KB translation cached in the iTLB. Privileged software then modifies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) the paging structure so that the same linear address uses a large page size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) (2 MB, 4 MB, 1 GB) with a different physical address or memory type. After the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) page structure modification but before the software invalidates any iTLB entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) for the linear address, a code fetch that happens on the same linear address may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) cause a machine-check error which can result in a system hang or shutdown.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) Attack scenarios
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) Attacks against the iTLB multihit erratum can be mounted from malicious
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) guests in a virtualized system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) iTLB multihit system information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) The Linux kernel provides a sysfs interface to enumerate the current iTLB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) multihit status of the system: whether the system is vulnerable and which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) mitigations are active. The relevant sysfs file is:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) /sys/devices/system/cpu/vulnerabilities/itlb_multihit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) The possible values in this file are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) .. list-table::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)      * - Not affected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)        - The processor is not vulnerable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)      * - KVM: Mitigation: Split huge pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)        - Software changes mitigate this issue.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)      * - KVM: Mitigation: VMX unsupported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)        - KVM is not vulnerable because Virtual Machine Extensions (VMX) is not supported.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)      * - KVM: Mitigation: VMX disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)        - KVM is not vulnerable because Virtual Machine Extensions (VMX) is disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)      * - KVM: Vulnerable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)        - The processor is vulnerable, but no mitigation is enabled.
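
As an illustration of how a monitoring script might consume this file, the
following minimal Python sketch reads the status line and sorts it into a
coarse category. The function names and the category labels are hypothetical
and not part of any kernel interface; only the sysfs path and the status
strings come from the text above.

```python
from pathlib import Path

# sysfs path as given in the text above.
ITLB_MULTIHIT_PATH = Path("/sys/devices/system/cpu/vulnerabilities/itlb_multihit")


def classify_itlb_multihit(status: str) -> str:
    """Map a status line to a coarse label (labels are illustrative only)."""
    text = status.strip()
    if text == "Not affected":
        return "not affected"
    if "Mitigation" in text:
        return "mitigated"
    if "Vulnerable" in text:
        return "vulnerable"
    # Empty or unrecognized content, e.g. an older kernel without this file.
    return "unknown"


def read_itlb_multihit(path: Path = ITLB_MULTIHIT_PATH) -> str:
    """Return the raw status line, or an empty string if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    raw = read_itlb_multihit()
    print(f"{raw or '<file not present>'} -> {classify_itlb_multihit(raw)}")
```

The classification is kept separate from the file read so that it can be
exercised on systems that do not expose the file.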
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) Enumeration of the erratum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) A new bit, PSCHANGE_MC_NO, has been allocated in the IA32_ARCH_CAPABILITIES MSR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) and will be set on CPUs which are mitigated against this issue.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) =======================================   ===========   ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) IA32_ARCH_CAPABILITIES MSR                Not present   Possibly vulnerable, check model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO]    '0'           Likely vulnerable, check model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO]    '1'           Not vulnerable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) =======================================   ===========   ================================
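
The table can be expressed as a small decision function. The Python sketch
below is illustrative only. It assumes that IA32_ARCH_CAPABILITIES is MSR
0x10A and that PSCHANGE_MC_NO is bit 6, values that are not stated in this
document and should be checked against Intel's documentation. It also assumes
the ``msr`` driver is loaded and the script runs as root. The function names
are hypothetical.

```python
import os
import struct
from typing import Optional

# Assumptions (not stated in this document; verify against Intel's manuals).
IA32_ARCH_CAPABILITIES = 0x10A  # MSR index
PSCHANGE_MC_NO_BIT = 6          # bit position inside that MSR


def pschange_mc_status(arch_caps: Optional[int]) -> str:
    """Apply the enumeration table; None means the MSR could not be read."""
    if arch_caps is None:
        return "Possibly vulnerable, check model"
    if (arch_caps >> PSCHANGE_MC_NO_BIT) & 1:
        return "Not vulnerable"
    return "Likely vulnerable, check model"


def read_arch_capabilities(cpu: int = 0) -> Optional[int]:
    """Read the MSR via /dev/cpu/N/msr; return None on any failure.

    A failure may also mean missing privileges or an unloaded msr driver,
    not only an absent MSR, so None is a hint rather than proof.
    """
    try:
        fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    except OSError:
        return None
    try:
        data = os.pread(fd, 8, IA32_ARCH_CAPABILITIES)
    except OSError:
        return None
    finally:
        os.close(fd)
    if len(data) != 8:
        return None
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    print(pschange_mc_status(read_arch_capabilities()))
```

In both "check model" cases the result is only a hint: the processor model
still has to be compared against the list of affected processors.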
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) Mitigation mechanism
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) This erratum can be mitigated by restricting the use of large page sizes to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) non-executable pages. This forces all iTLB entries to be 4K, and removes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) the possibility of multiple hits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) In order to mitigate the vulnerability, KVM initially marks all huge pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) as non-executable. If the guest attempts to execute in one of those pages,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) the page is broken down into 4K pages, which are then marked executable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) If EPT is disabled or not available on the host, KVM is in control of TLB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) flushes and the problematic situation cannot happen. However, the shadow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) EPT paging mechanism used by nested virtualization is vulnerable, because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) the nested guest can trigger multiple iTLB hits by modifying its own
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) (non-nested) page tables. For simplicity, KVM will make large pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) non-executable in all shadow paging modes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) Mitigation control on the kernel command line and KVM module parameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) ----------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) The KVM hypervisor mitigation mechanism for marking huge pages as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) non-executable can be controlled with a module parameter "nx_huge_pages=".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) The kernel command line allows the iTLB multihit mitigations to be controlled at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) boot time with the option "kvm.nx_huge_pages=".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) The valid arguments for these options are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) ==========  =================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) force       Mitigation is enabled. In this case, the mitigation implements
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)             non-executable huge pages in the Linux kernel KVM module. All huge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)             pages in the EPT are marked as non-executable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)             If a guest attempts to execute in one of those pages, the page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)             broken down into 4K pages, which are then marked executable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) off         Mitigation is disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) auto        Enable mitigation only if the platform is affected and the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)             was not booted with the "mitigations=off" command line parameter.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)             This is the default option.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) ==========  =================================================================
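
The effect of the three arguments can be summarized as a small decision
function. The Python sketch below only models the table above. It is not the
kernel's implementation, and the function and parameter names are
hypothetical.

```python
def mitigations_off_on_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line contains "mitigations=off"."""
    return "mitigations=off" in cmdline.split()


def nx_huge_pages_enabled(option: str, cpu_affected: bool,
                          mitigations_off: bool) -> bool:
    """Model whether the mitigation ends up enabled for a given argument."""
    if option == "force":
        # Enabled regardless of the platform or other command line options.
        return True
    if option == "off":
        return False
    if option == "auto":
        # Default: enabled only on affected platforms that were not
        # booted with mitigations=off.
        return cpu_affected and not mitigations_off
    raise ValueError(f"unknown nx_huge_pages argument: {option!r}")


if __name__ == "__main__":
    cmdline = "root=/dev/sda1 ro quiet mitigations=off"  # example input
    off = mitigations_off_on_cmdline(cmdline)
    for opt in ("force", "off", "auto"):
        print(opt, nx_huge_pages_enabled(opt, cpu_affected=True,
                                         mitigations_off=off))
```

With the example command line and an affected CPU, only "force" yields an
enabled mitigation, because "auto" defers to "mitigations=off".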
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) Mitigation selection guide
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 1. No virtualization in use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) The system is protected by the kernel unconditionally and no further
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) action is required.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 2. Virtualization with trusted guests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) If the guest comes from a trusted source, you may assume that the guest will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) not attempt to maliciously exploit this erratum and no further action is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) required.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 3. Virtualization with untrusted guests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) If the guest comes from an untrusted source, the host kernel will need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) to apply the iTLB multihit mitigation via the kernel command line or the KVM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) module parameter.