Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

=================
KVM Lock Overview
=================

1. Acquisition Orders
---------------------

The acquisition orders for mutexes are as follows:

- kvm->lock is taken outside vcpu->mutex

- kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock

- kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
  them together is quite rare.

On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock.

Everything else is a leaf: no other lock is taken inside the critical
sections.

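The ordering rules above can be checked mechanically. The following is a toy sketch in the spirit of the kernel's lockdep validator, not KVM code: the enum, the outside table and the may_acquire()/acquire()/release() helpers are hypothetical. Only the documented pairwise relations are encoded, so no order is implied between locks the list does not relate (for example vcpu->mutex and kvm->slots_lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the locks named in the acquisition-order list. */
enum kvm_lock_id {
	KVM_LOCK,	/* kvm->lock                        */
	VCPU_MUTEX,	/* vcpu->mutex                      */
	SLOTS_LOCK,	/* kvm->slots_lock                  */
	IRQ_LOCK,	/* kvm->irq_lock                    */
	HV_LOCK,	/* kvm->arch.hyperv.hv_lock (x86)   */
	NR_LOCKS
};

/* outside[a][b] is true when the documentation says a is taken outside b. */
static const bool outside[NR_LOCKS][NR_LOCKS] = {
	[KVM_LOCK][VCPU_MUTEX] = true,
	[KVM_LOCK][SLOTS_LOCK] = true,
	[KVM_LOCK][IRQ_LOCK]   = true,
	[SLOTS_LOCK][IRQ_LOCK] = true,
	[VCPU_MUTEX][HV_LOCK]  = true,
};

/* Toy per-thread record of which locks are currently held. */
struct held_locks {
	bool held[NR_LOCKS];
};

/* Returns true if taking 'next' now respects the documented order, i.e. no
 * lock that is already held is one that 'next' must be taken outside of. */
static bool may_acquire(const struct held_locks *h, enum kvm_lock_id next)
{
	for (int l = 0; l < NR_LOCKS; l++)
		if (h->held[l] && outside[next][l])
			return false;
	return true;
}

/* Record an acquisition; the lock must not already be held (no recursion). */
static void acquire(struct held_locks *h, enum kvm_lock_id l)
{
	assert(!h->held[l]);
	h->held[l] = true;
}

/* Record a release; the lock must currently be held. */
static void release(struct held_locks *h, enum kvm_lock_id l)
{
	assert(h->held[l]);
	h->held[l] = false;
}
```

Encoding only the documented pairs keeps the sketch honest: any pair the list does not mention is accepted in either order.
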
2. Exception
------------

Fast page fault:

Fast page fault is the fast path that fixes a guest page fault outside
the mmu-lock on x86. Currently, a page fault can take the fast path in
one of the following two cases:

1. Access Tracking: The SPTE is not present, but it is marked for access
   tracking, i.e. the SPTE_SPECIAL_MASK is set. This means we need to
   restore the saved R/X bits. This is described in more detail below.

2. Write-Protection: The SPTE is present and the fault is caused by
   write protection. This means we just need to change the W bit of
   the spte.

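A minimal sketch of how a handler could tell the two fast cases apart, assuming a made-up bit layout: R/W/X in bits 0-2 (loosely following EPT, where those bits are read, write and execute) and bit 62 standing in for SPTE_SPECIAL_MASK. The TOY_* macros, enum fast_case and classify_fault() are hypothetical. The sketch deliberately ignores the SPTE_HOST_WRITEABLE/SPTE_MMU_WRITEABLE checks, so it only says which case a fault falls into, not whether the fast path may actually fix it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-2 are R/W/X, bit 62 marks access tracking. */
#define TOY_R        (1ULL << 0)
#define TOY_W        (1ULL << 1)
#define TOY_X        (1ULL << 2)
#define TOY_RWX      (TOY_R | TOY_W | TOY_X)
#define TOY_SPECIAL  (1ULL << 62)

enum fast_case { NOT_FAST, FAST_ACCESS_TRACK, FAST_WRITE_PROTECT };

static enum fast_case classify_fault(uint64_t spte, bool write_fault)
{
	/* In this toy model an spte is present if any of R/W/X is set. */
	bool present = (spte & TOY_RWX) != 0;

	/* Case 1: not present, but marked for access tracking. */
	if (!present && (spte & TOY_SPECIAL))
		return FAST_ACCESS_TRACK;

	/* Case 2: present, and a write faulted only because W is clear. */
	if (present && write_fault && !(spte & TOY_W))
		return FAST_WRITE_PROTECT;

	return NOT_FAST;
}
```
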
To avoid these races, we use the SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE
bits on the spte:

- SPTE_HOST_WRITEABLE means the gfn is writable on the host.
- SPTE_MMU_WRITEABLE means the gfn is writable on the mmu. The bit is set
  when the gfn is writable on the guest mmu and it is not write-protected
  by shadow page write-protection.

On the fast page fault path, we use cmpxchg to atomically set the spte W
bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_MMU_WRITEABLE = 1, or
restore the saved R/X bits if the VMX_EPT_TRACK_ACCESS mask is set, or both.
This is safe because any concurrent change to these bits is detected by
the cmpxchg.

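The write-protection case can be sketched with C11 atomics. This is an illustration under assumptions, not KVM's implementation: the bit positions and fast_make_writable() are hypothetical, and atomic_compare_exchange_strong() from <stdatomic.h> plays the role of cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real SPTE layout differs. */
#define TOY_W               (1ULL << 1)
#define TOY_HOST_WRITEABLE  (1ULL << 57)
#define TOY_MMU_WRITEABLE   (1ULL << 58)

/* Try to make the spte writable without holding mmu_lock. 'old' is the
 * value the fault handler read earlier. Returns true only if the spte still
 * held exactly 'old' at the moment of the swap. */
static bool fast_make_writable(_Atomic uint64_t *sptep, uint64_t old)
{
	const uint64_t need = TOY_HOST_WRITEABLE | TOY_MMU_WRITEABLE;

	if ((old & need) != need)
		return false;	/* must take the slow path under mmu_lock */

	/* Any concurrent change to *sptep makes the compare fail. */
	return atomic_compare_exchange_strong(sptep, &old, old | TOY_W);
}
```

If another CPU changes the spte between the read and the exchange (for example by clearing the MMU-writable bit during write-protection), the compare fails, nothing is written, and the handler can fall back to the slow path.
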
However, the following cases need careful consideration:

1) The mapping from gfn to pfn

The mapping from gfn to pfn may change, since cmpxchg can only ensure that
the pfn stored in the spte is unchanged at the time of the exchange. This
is an ABA problem; for example, the following can happen:

+------------------------------------------------------------------------+
| At the beginning::                                                     |
|                                                                        |
|	gpte = gfn1                                                      |
|	gfn1 is mapped to pfn1 on host                                   |
|	spte is the shadow page table entry corresponding with gpte and  |
|	spte = pfn1                                                      |
+------------------------------------------------------------------------+
| On fast page fault path:                                               |
+------------------------------------+-----------------------------------+
| CPU 0:                             | CPU 1:                            |
+------------------------------------+-----------------------------------+
| ::                                 |                                   |
|                                    |                                   |
|   old_spte = *spte;                |                                   |
+------------------------------------+-----------------------------------+
|                                    | pfn1 is swapped out::             |
|                                    |                                   |
|                                    |    spte = 0;                      |
|                                    |                                   |
|                                    | pfn1 is reallocated for gfn2.     |
|                                    |                                   |
|                                    | gpte is changed to point to       |
|                                    | gfn2 by the guest::               |
|                                    |                                   |
|                                    |    spte = pfn1;                   |
+------------------------------------+-----------------------------------+
| ::                                                                     |
|                                                                        |
|   if (cmpxchg(spte, old_spte, old_spte+W))                             |
|	mark_page_dirty(vcpu->kvm, gfn1)                                 |
|            OOPS!!!                                                     |
+------------------------------------------------------------------------+

The dirty log is updated for gfn1, which means the write to gfn2 is lost
from the dirty bitmap.

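The interleaving in the table can be replayed in a single thread to show why cmpxchg alone is not enough. In this toy model (all names and encodings hypothetical) the spte holds only a pfn plus a W bit, so when CPU 1 puts the same pfn back for a different gfn, CPU 0's cmpxchg still succeeds and the dirty log records the wrong gfn.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: W in bit 0, pfn1 represented by bit 12. */
#define TOY_W (1ULL << 0)
#define PFN1  (1ULL << 12)

static bool dirty_bitmap[4];	/* toy dirty log, indexed by gfn */

static void toy_mark_page_dirty(unsigned gfn)
{
	dirty_bitmap[gfn] = true;
}

/* Replays the table's interleaving. Returns the gfn whose page will really
 * be written once W is set. */
static unsigned replay_aba(_Atomic uint64_t *spte)
{
	const unsigned gfn1 = 1, gfn2 = 2;

	/* CPU 0: fast page fault reads the spte while gpte -> gfn1. */
	uint64_t old_spte = atomic_load(spte);

	/* CPU 1: pfn1 is swapped out ... */
	atomic_store(spte, 0);
	/* ... reallocated for gfn2, and the guest points gpte at gfn2. */
	atomic_store(spte, PFN1);

	/* CPU 0: the value is PFN1 again, so the cmpxchg succeeds (ABA). */
	if (atomic_compare_exchange_strong(spte, &old_spte, old_spte | TOY_W))
		toy_mark_page_dirty(gfn1);	/* logs the stale gfn */

	return gfn2;
}
```
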
For a direct sp, this is easily avoided, since the spte of a direct sp is
fixed to its gfn. For an indirect sp, fast page fault is disabled for
simplicity.

A solution for indirect sp could be to pin the gfn, for example via
kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:

- We hold a reference on the pfn, so the pfn cannot be freed and reused
  for another gfn.
- The pfn is writable and therefore it cannot be shared between different
  gfns by KSM.

Then, we can ensure the dirty bitmap is correctly set for a gfn.

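The first point of the pinning argument boils down to a refcount check on the host side. A toy sketch with a hypothetical struct toy_page standing in for the host page; it does not call kvm_vcpu_gfn_to_pfn_atomic, which the text names only as the kind of helper that would take the reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host page descriptor: a refcount plus the gfn it backs. */
struct toy_page {
	int refcount;
	unsigned gfn;
};

/* Pin before the cmpxchg: an elevated refcount keeps the page alive. */
static void toy_pin(struct toy_page *p)
{
	p->refcount++;
}

static void toy_unpin(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Host side: a page may only be freed and handed to another gfn when nobody
 * holds a reference to it. */
static bool toy_try_reuse(struct toy_page *p, unsigned new_gfn)
{
	if (p->refcount > 0)
		return false;
	p->gfn = new_gfn;
	return true;
}
```

While the fault handler holds the pin, the reuse step from the ABA table cannot happen, so a successful cmpxchg still refers to the gfn that was read.
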
2) Dirty bit tracking

In the original code, the spte could be updated with a plain, non-atomic
write if the spte was read-only and the Accessed bit had already been set,
since then neither the Accessed bit nor the Dirty bit could be lost.

This is no longer true with fast page fault, since the spte can be marked
writable between reading the spte and updating it, as in the following
case:

+------------------------------------------------------------------------+
| At the beginning::                                                     |
|                                                                        |
|	spte.W = 0                                                       |
|	spte.Accessed = 1                                                |
+------------------------------------+-----------------------------------+
| CPU 0:                             | CPU 1:                            |
+------------------------------------+-----------------------------------+
| In mmu_spte_clear_track_bits()::   |                                   |
|                                    |                                   |
|  old_spte = *spte;                 |                                   |
|                                    |                                   |
|                                    |                                   |
|  /* 'if' condition is satisfied. */|                                   |
|  if (old_spte.Accessed == 1 &&     |                                   |
|       old_spte.W == 0)             |                                   |
|     spte = 0ull;                   |                                   |
+------------------------------------+-----------------------------------+
|                                    | on fast page fault path::         |
|                                    |                                   |
|                                    |    spte.W = 1                     |
|                                    |                                   |
|                                    | memory write on the spte::        |
|                                    |                                   |
|                                    |    spte.Dirty = 1                 |
+------------------------------------+-----------------------------------+
|  ::                                |                                   |
|                                    |                                   |
|   else                             |                                   |
|     old_spte = xchg(spte, 0ull)    |                                   |
|   if (old_spte.Accessed == 1)      |                                   |
|     kvm_set_pfn_accessed(spte.pfn);|                                   |
|   if (old_spte.Dirty == 1)         |                                   |
|     kvm_set_pfn_dirty(spte.pfn);   |                                   |
|     OOPS!!!                        |                                   |
+------------------------------------+-----------------------------------+

The Dirty bit is lost in this case.

To avoid this kind of issue, we always treat the spte as "volatile" if it
can be updated outside the mmu-lock (see spte_has_volatile_bits()); this
means the spte is always updated atomically in that case.

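Both outcomes of the race from the table can be reproduced in a single thread. In this hypothetical model (toy bit positions and function names, not KVM's), the interfering CPU is a function call placed between the read and the clear. The non-atomic variant loses Dirty, while clearing with C11 atomic_exchange(), standing in for xchg, returns every bit that was set up to the moment of the clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bits for the toy spte. */
#define TOY_W        (1ULL << 1)
#define TOY_ACCESSED (1ULL << 8)
#define TOY_DIRTY    (1ULL << 9)

/* Models CPU 1 from the table: the fast page fault sets W and the guest's
 * write then sets Dirty. */
static void other_cpu_writes(_Atomic uint64_t *spte)
{
	atomic_fetch_or(spte, TOY_W);
	atomic_fetch_or(spte, TOY_DIRTY);
}

/* Old scheme: decide on a stale snapshot, then overwrite with a plain store.
 * Returns the bits the caller believes the spte had. */
static uint64_t clear_nonatomic(_Atomic uint64_t *spte)
{
	uint64_t old = atomic_load(spte);

	other_cpu_writes(spte);		/* the interleaving from the table */
	if ((old & TOY_ACCESSED) && !(old & TOY_W))
		atomic_store(spte, 0);	/* Dirty set meanwhile is dropped */
	else
		old = atomic_exchange(spte, 0);
	return old;
}

/* "Volatile" spte: always clear with an atomic exchange, so the returned
 * value includes any bit set before the clear. */
static uint64_t clear_atomic(_Atomic uint64_t *spte)
{
	other_cpu_writes(spte);		/* the same write, before the clear */
	return atomic_exchange(spte, 0);
}
```

If the guest write instead lands after the exchange, the spte is already zero, so the access faults rather than silently setting Dirty.
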
3) Flushing TLBs due to spte updates

If the spte is updated from writable to read-only, we should flush all
TLBs; otherwise rmap_write_protect will find a read-only spte even though
a writable spte might still be cached in a CPU's TLB.

As mentioned before, the spte can be updated to writable outside the
mmu-lock on the fast page fault path. To make this path easy to audit,
mmu_spte_update() checks whether TLBs need to be flushed for this reason,
since it is the common function for updating an spte (present -> present).

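A sketch of the kind of check described above, assuming hypothetical bit positions and helper names (these are not the kernel's functions): a present -> present update requests a flush when the old spte could have been made writable locklessly but the new spte is not writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real SPTE layout differs. */
#define TOY_W               (1ULL << 1)
#define TOY_HOST_WRITEABLE  (1ULL << 57)
#define TOY_MMU_WRITEABLE   (1ULL << 58)

/* True if the fast page fault path could set W on this spte without the
 * mmu-lock (both writeable bits set). */
static bool toy_can_be_made_writable_locklessly(uint64_t spte)
{
	const uint64_t need = TOY_HOST_WRITEABLE | TOY_MMU_WRITEABLE;

	return (spte & need) == need;
}

/* Present -> present update: flush if a writable translation may have been
 * cached for the old spte while the new spte is not writable. */
static bool toy_update_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	return toy_can_be_made_writable_locklessly(old_spte) &&
	       !(new_spte & TOY_W);
}
```

The check is conservative: it may request a flush even if the old spte never actually became writable.
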
Since the spte is "volatile" if it can be updated outside the mmu-lock, we
always update the spte atomically, so the race caused by fast page fault is
avoided. See the comments in spte_has_volatile_bits() and mmu_spte_update().

Lockless Access Tracking:

This is used for Intel CPUs that are using EPT but do not support the EPT A/D
bits. In this case, when the KVM MMU notifier is called to track accesses to a
page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present
by clearing the RWX bits in the PTE and storing the original R & X bits in
some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the
PTE (using the ignored bit 62). When the VM tries to access the page later on,
a fault is generated and the fast page fault mechanism described above is used
to atomically restore the PTE to a Present state. The W bit is not saved when
the PTE is marked for access tracking; during restoration to the Present
state, the W bit is set depending on whether or not it was a write access. If
it wasn't, then the W bit will remain clear until a write access happens, at
which time it will be set using the Dirty tracking mechanism described above.

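A sketch of the mark/restore round trip described above. The layout is hypothetical: R/W/X in bits 0-2, the saved R and X copied to bits 54 and 56, and bit 62 as the marker the text mentions; real KVM masks and shifts may differ, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for the toy spte. */
#define TOY_R           (1ULL << 0)
#define TOY_W           (1ULL << 1)
#define TOY_X           (1ULL << 2)
#define TOY_RWX         (TOY_R | TOY_W | TOY_X)
#define TOY_RX          (TOY_R | TOY_X)
#define TOY_SAVED_SHIFT 54	/* saved R -> bit 54, saved X -> bit 56 */
#define TOY_SPECIAL     (1ULL << 62)

/* MMU-notifier side: make the spte not-present but remember R and X.
 * W is deliberately not saved. */
static uint64_t toy_mark_access_tracked(uint64_t spte)
{
	uint64_t saved = (spte & TOY_RX) << TOY_SAVED_SHIFT;

	return (spte & ~TOY_RWX) | saved | TOY_SPECIAL;
}

/* Fast-path restore: bring back R/X, drop the saved copy and the marker,
 * and set W only for a write access. */
static uint64_t toy_restore_access_tracked(uint64_t spte, bool write)
{
	uint64_t rx = (spte >> TOY_SAVED_SHIFT) & TOY_RX;
	uint64_t restored = spte & ~((TOY_RX << TOY_SAVED_SHIFT) | TOY_SPECIAL);

	restored |= rx;
	if (write)
		restored |= TOY_W;
	return restored;
}
```

Because W is never saved, a read access restores a read-only spte, and W only comes back when a later write faults.
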
3. Reference
------------

:Name:		kvm_lock
:Type:		mutex
:Arch:		any
:Protects:	- vm_list

:Name:		kvm_count_lock
:Type:		raw_spinlock_t
:Arch:		any
:Protects:	- hardware virtualization enable/disable
:Comment:	'raw' because hardware enabling/disabling must be atomic with
		respect to migration.

:Name:		kvm_arch::tsc_write_lock
:Type:		raw_spinlock_t
:Arch:		x86
:Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
		- tsc offset in vmcb
:Comment:	'raw' because updating the tsc offsets must not be preempted.

:Name:		kvm->mmu_lock
:Type:		spinlock_t
:Arch:		any
:Protects:	- shadow page/shadow tlb entry
:Comment:	It is a spinlock since it is used in the mmu notifier.

:Name:		kvm->srcu
:Type:		srcu lock
:Arch:		any
:Protects:	- kvm->memslots
		- kvm->buses
:Comment:	The srcu read lock must be held while accessing memslots (e.g.
		when using gfn_to_* functions) and while accessing in-kernel
		MMIO/PIO address->device structure mapping (kvm->buses).
		The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
		if it is needed by multiple functions.

:Name:		blocked_vcpu_on_cpu_lock
:Type:		spinlock_t
:Arch:		x86
:Protects:	- blocked_vcpu_on_cpu
:Comment:	This is a per-CPU lock and it is used for VT-d posted-interrupts.
		When VT-d posted-interrupts are supported and the VM has assigned
		devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu,
		which is protected by blocked_vcpu_on_cpu_lock. When the VT-d
		hardware issues a wakeup notification event because an external
		interrupt from an assigned device has arrived, we find the vCPU
		on that list and wake it up.