^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) KVM Lock Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) 1. Acquisition Orders
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) The acquisition orders for mutexes are as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) - kvm->lock is taken outside vcpu->mutex
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)   them together is quite rare.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) Everything else is a leaf: no other lock is taken inside the critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) sections.
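
The first ordering rule can be illustrated outside the kernel with POSIX
mutexes. This is only a sketch: ``struct demo_kvm``, ``struct demo_vcpu``,
``demo_init()`` and ``demo_update_vcpu()`` are hypothetical stand-ins and do
not exist in KVM. The sketch shows the outer lock (modelling kvm->lock) being
taken before the inner lock (modelling vcpu->mutex) and released after it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for struct kvm and struct kvm_vcpu. */
struct demo_kvm {
	pthread_mutex_t lock;		/* models kvm->lock */
};

struct demo_vcpu {
	struct demo_kvm *kvm;
	pthread_mutex_t mutex;		/* models vcpu->mutex */
	int updates;			/* state guarded by both locks here */
};

static void demo_init(struct demo_kvm *kvm, struct demo_vcpu *vcpu)
{
	pthread_mutex_init(&kvm->lock, NULL);
	pthread_mutex_init(&vcpu->mutex, NULL);
	vcpu->kvm = kvm;
	vcpu->updates = 0;
}

/* Returns the update count observed while both locks were held. */
static int demo_update_vcpu(struct demo_vcpu *vcpu)
{
	int seen;

	pthread_mutex_lock(&vcpu->kvm->lock);	/* outer lock first */
	pthread_mutex_lock(&vcpu->mutex);	/* inner lock second */
	seen = ++vcpu->updates;
	pthread_mutex_unlock(&vcpu->mutex);	/* release in reverse order */
	pthread_mutex_unlock(&vcpu->kvm->lock);
	return seen;
}
```

Any path that took the two locks in the opposite order could deadlock against
this one, which is why the order is fixed.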
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) 2. Exception
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) Fast page fault:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) Fast page fault is the fast path which fixes the guest page fault without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) taking the mmu-lock on x86. Currently, the page fault can be fast in one of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) the following two cases:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) 1. Access Tracking: The SPTE is not present, but it is marked for access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)    tracking, i.e. the SPTE_SPECIAL_MASK is set. That means we need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)    restore the saved R/X bits. This is described in more detail below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) 2. Write-Protection: The SPTE is present and the fault is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)    caused by write-protection. That means we just need to change the W bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)    of the spte.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) To avoid all the races, we use the SPTE_HOST_WRITEABLE bit and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) SPTE_MMU_WRITEABLE bit on the spte:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) - SPTE_HOST_WRITEABLE means the gfn is writable on the host.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) - SPTE_MMU_WRITEABLE means the gfn is writable on the mmu. The bit is set when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)   the gfn is writable on the guest mmu and it is not write-protected by shadow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)   page write-protection.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) On the fast page fault path, we use cmpxchg to atomically set the spte W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) restore the saved R/X bits if VMX_EPT_TRACK_ACCESS mask is set, or both. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) is safe because any concurrent change to these bits is detected by cmpxchg.
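
The cmpxchg step can be sketched in user space with C11 atomics. This is an
illustration under stated assumptions, not KVM's implementation: the bit
positions, the ``DEMO_*`` names and ``demo_fast_fix_write_fault()`` are
hypothetical, and eligibility is modelled as "both writable bits described
above are set".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define DEMO_SPTE_W			(UINT64_C(1) << 1)
#define DEMO_SPTE_HOST_WRITEABLE	(UINT64_C(1) << 9)
#define DEMO_SPTE_MMU_WRITEABLE		(UINT64_C(1) << 10)

/*
 * Try to make *sptep writable without holding any lock.  old_spte is the
 * value the fault handler sampled earlier.  Returns true only if the spte
 * still held exactly old_spte at the moment the W bit was installed.
 */
static bool demo_fast_fix_write_fault(_Atomic uint64_t *sptep,
				      uint64_t old_spte)
{
	const uint64_t need = DEMO_SPTE_HOST_WRITEABLE |
			      DEMO_SPTE_MMU_WRITEABLE;

	if ((old_spte & need) != need)
		return false;	/* not eligible: leave it to the slow path */

	/* Fails and leaves *sptep untouched if any bit changed meanwhile. */
	return atomic_compare_exchange_strong(sptep, &old_spte,
					      old_spte | DEMO_SPTE_W);
}
```

A caller that gets ``false`` back would simply retry the fault or fall back
to the path that takes the mmu-lock.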
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) But we need to carefully check these cases:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) 1) The mapping from gfn to pfn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) The mapping from gfn to pfn may be changed since we can only ensure the pfn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) is not changed during cmpxchg. This is an ABA problem; for example, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) following case can happen:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) +------------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) | At the beginning::                                                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) |                                                                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) |     gpte = gfn1                                                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) |     gfn1 is mapped to pfn1 on host                                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) |     spte is the shadow page table entry corresponding with gpte and    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) |     spte = pfn1                                                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) +------------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) | On fast page fault path:                                               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) | CPU 0:                             | CPU 1:                            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) | ::                                 |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) |   old_spte = *spte;                |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) |                                    | pfn1 is swapped out::             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) |                                    |    spte = 0;                      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) |                                    | pfn1 is re-alloced for gfn2.      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) |                                    | gpte is changed to point to       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) |                                    | gfn2 by the guest::               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) |                                    |    spte = pfn1;                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) | ::                                                                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) |                                                                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) |   if (cmpxchg(spte, old_spte, old_spte+W))                             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) |       mark_page_dirty(vcpu->kvm, gfn1)                                 |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) |            OOPS!!!                                                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) +------------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) We dirty-log gfn1, which means the write to gfn2 is lost in the dirty bitmap.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) For a direct sp, we can easily avoid it since the spte of a direct sp is fixed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) to the gfn. For an indirect sp, fast page fault is disabled for simplicity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) A solution for indirect sp could be to pin the gfn, for example via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) - We hold a reference to the pfn, which means the pfn cannot be freed and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)   reused for another gfn.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) - The pfn is writable and therefore it cannot be shared between different gfns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)   by KSM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) Then, we can ensure the dirty bitmap is correctly set for a gfn.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 2) Dirty bit tracking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) In the original code, the spte can be fast updated (non-atomically) if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) spte is read-only and the Accessed bit has already been set, since the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) Accessed bit and Dirty bit cannot be lost.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) But this is no longer true with fast page fault, since the spte can be marked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) writable between reading the spte and updating it, as in the following case:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) +------------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) | At the beginning::                                                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) |                                                                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) |     spte.W = 0                                                         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) |     spte.Accessed = 1                                                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) | CPU 0:                             | CPU 1:                            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) | In mmu_spte_clear_track_bits()::   |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) |  old_spte = *spte;                 |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) |  /* 'if' condition is satisfied. */|                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) |  if (old_spte.Accessed == 1 &&     |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) |      old_spte.W == 0)              |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) |          spte = 0ull;              |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) |                                    | on fast page fault path::         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) |                                    |    spte.W = 1                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) |                                    | memory write on the spte::        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) |                                    |    spte.Dirty = 1                 |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) | ::                                 |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) |                                    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) |   else                             |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) |     old_spte = xchg(spte, 0ull)    |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) |   if (old_spte.Accessed == 1)      |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) |     kvm_set_pfn_accessed(spte.pfn);|                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) |   if (old_spte.Dirty == 1)         |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) |     kvm_set_pfn_dirty(spte.pfn);   |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) |     OOPS!!!                        |                                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) +------------------------------------+-----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) The Dirty bit is lost in this case.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) In order to avoid this kind of issue, we always treat the spte as "volatile"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) if it can be updated out of mmu-lock (see spte_has_volatile_bits()); that is,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) the spte is always atomically updated in this case.
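
The effect of the atomic update can be sketched with C11 atomics. This is an
assumption-laden illustration rather than the kernel code: the bit positions,
``struct demo_track`` and ``demo_clear_track_bits()`` are hypothetical. The
point is that reading and clearing the spte in a single exchange returns
whatever Accessed/Dirty state was present at that instant, so a W or Dirty
bit set concurrently by the fast page fault path cannot slip between the read
and the write.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define DEMO_SPTE_W		(UINT64_C(1) << 1)
#define DEMO_SPTE_ACCESSED	(UINT64_C(1) << 5)
#define DEMO_SPTE_DIRTY		(UINT64_C(1) << 6)

struct demo_track {
	bool accessed;
	bool dirty;
};

/* Clear a "volatile" spte and report the tracking bits it carried. */
static struct demo_track demo_clear_track_bits(_Atomic uint64_t *sptep)
{
	/* Read and clear in one atomic step; nothing can land in between. */
	uint64_t old_spte = atomic_exchange(sptep, 0);
	struct demo_track t = {
		.accessed = (old_spte & DEMO_SPTE_ACCESSED) != 0,
		.dirty = (old_spte & DEMO_SPTE_DIRTY) != 0,
	};

	return t;
}
```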
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 3) Flushing TLBs after an spte update
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) If the spte is updated from writable to read-only, we should flush all TLBs,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) otherwise rmap_write_protect will find a read-only spte, even though the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) writable spte might still be cached in a CPU's TLB.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) As mentioned before, the spte can be updated to writable out of mmu-lock on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) the fast page fault path. To make the path easy to audit, we check whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) TLBs need to be flushed for this reason in mmu_spte_update(), since this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) a common function to update spte (present -> present).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) Since the spte is "volatile" if it can be updated out of mmu-lock, we always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) atomically update the spte, so the race caused by fast page fault is avoided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) See the comments in spte_has_volatile_bits() and mmu_spte_update().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) Lockless Access Tracking:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) This is used for Intel CPUs that are using EPT but do not support the EPT A/D
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) bits. In this case, when the KVM MMU notifier is called to track accesses to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) by clearing the RWX bits in the PTE and storing the original R & X bits in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) PTE (using the ignored bit 62). When the VM tries to access the page later on,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) a fault is generated and the fast page fault mechanism described above is used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) to atomically restore the PTE to a Present state. The W bit is not saved when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) the PTE is marked for access tracking and during restoration to the Present
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) state, the W bit is set depending on whether or not it was a write access. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) it wasn't, then the W bit will remain clear until a write access happens, at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) which time it will be set using the Dirty tracking mechanism described above.
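
The save/restore of the permission bits can be sketched as plain bit
manipulation. This is a simplified illustration, not KVM's code: R/W/X are
placed at bits 0-2 and the special mask at bit 62 as the text describes, but
``DEMO_SAVED_SHIFT`` and both helper names are hypothetical, and the sketch
sets W on any write fault without the additional writability checks discussed
for the fast page fault path. In the real path the restored value would be
installed with cmpxchg as described earlier.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_R			(UINT64_C(1) << 0)
#define DEMO_W			(UINT64_C(1) << 1)
#define DEMO_X			(UINT64_C(1) << 2)
#define DEMO_RWX		(DEMO_R | DEMO_W | DEMO_X)
#define DEMO_SAVED_SHIFT	52	/* hypothetical home of the saved bits */
#define DEMO_SAVED_MASK		((DEMO_R | DEMO_X) << DEMO_SAVED_SHIFT)
#define DEMO_SPECIAL_MASK	(UINT64_C(1) << 62)

/* Make the pte not-present while remembering R and X; W is not saved. */
static uint64_t demo_mark_for_access_track(uint64_t spte)
{
	uint64_t saved = spte & (DEMO_R | DEMO_X);

	spte &= ~DEMO_RWX;
	spte |= saved << DEMO_SAVED_SHIFT;
	return spte | DEMO_SPECIAL_MASK;
}

/* Compute the present pte to install when the guest touches the page. */
static uint64_t demo_restore_access_track(uint64_t spte, bool write_fault)
{
	uint64_t saved = (spte & DEMO_SAVED_MASK) >> DEMO_SAVED_SHIFT;

	spte &= ~(DEMO_SPECIAL_MASK | DEMO_SAVED_MASK);
	spte |= saved;
	if (write_fault)
		spte |= DEMO_W;	/* W comes from the access type, not a save */
	return spte;
}
```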
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 3. Reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) :Name: kvm_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) :Type: mutex
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) :Arch: any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) :Protects: - vm_list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) :Name: kvm_count_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) :Type: raw_spinlock_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) :Arch: any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) :Protects: - hardware virtualization enable/disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) :Comment: 'raw' because hardware enabling/disabling must be atomic with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)           respect to migration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) :Name: kvm_arch::tsc_write_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) :Type: raw_spinlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) :Arch: x86
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) :Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)            - tsc offset in vmcb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) :Comment: 'raw' because updating the tsc offsets must not be preempted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) :Name: kvm->mmu_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) :Type: spinlock_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) :Arch: any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) :Protects: - shadow page/shadow tlb entry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) :Comment: it is a spinlock since it is used in mmu notifier.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) :Name: kvm->srcu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) :Type: srcu lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) :Arch: any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) :Protects: - kvm->memslots
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)            - kvm->buses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) :Comment: The srcu read lock must be held while accessing memslots (e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)           when using gfn_to_* functions) and while accessing in-kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)           MMIO/PIO address->device structure mapping (kvm->buses).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)           The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)           if it is needed by multiple functions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) :Name: blocked_vcpu_on_cpu_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) :Type: spinlock_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) :Arch: x86
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) :Protects: - blocked_vcpu_on_cpu
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) :Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)           When VT-d posted-interrupts are supported and the VM has assigned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)           devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)           protected by blocked_vcpu_on_cpu_lock. When VT-d hardware issues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)           a wakeup notification event because external interrupts from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)           assigned devices have arrived, we find the vCPU on the list and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)           wake it up.