=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity. For example, a thread may request a VCPU to flush
its TLB with a VCPU request. The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request. This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.
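The request side of the API can be modeled compactly. The following is an
illustrative, self-contained sketch, not the kernel implementation: the
names ``toy_vcpu``, ``toy_make_request`` and ``toy_check_request`` are
hypothetical stand-ins for ``vcpu->requests``, kvm_make_request() and
kvm_check_request(), and C11 atomics stand in for kernel bitops and
barriers:

```c
/* Toy model of the VCPU request API; names are illustrative only.
 * The real code lives in include/linux/kvm_host.h / virt/kvm/kvm_main.c. */
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vcpu {
	atomic_ulong requests;	/* stands in for vcpu->requests */
};

/* Set request bit @req, as kvm_make_request() would. */
static void toy_make_request(int req, struct toy_vcpu *vcpu)
{
	/* The seq_cst RMW supplies the full barrier the real API documents. */
	atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/* Check-and-clear request bit @req, as kvm_check_request() would. */
static bool toy_check_request(int req, struct toy_vcpu *vcpu)
{
	unsigned long mask = 1UL << req;

	if (!(atomic_load(&vcpu->requests) & mask))
		return false;
	atomic_fetch_and(&vcpu->requests, ~mask);
	return true;
}
```

In the real API the check-and-clear is the VCPU thread's side of the
protocol, while the set is the requester's side; a kick would follow
the set.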
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance. To do so, an IPI is sent, forcing
a guest mode exit. However, a VCPU thread may not be in guest mode at the
time of the kick. Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take. All three actions
are listed below:

1) Send an IPI. This forces a guest mode exit.
2) Wake a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues. Waking them removes the threads from
   the waitqueues, allowing the threads to run again. This behavior
   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing. When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, then there is nothing to do.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
VCPU is running in guest mode or not, as well as some specific
outside guest mode states. The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements"). The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction. The first 8 bits are reserved for architecture
independent requests; all remaining bits are available for architecture
dependent requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed it's
  necessary to inform each VCPU to completely refresh the tables. This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU. It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()). When that
  event or interrupt arrives kvm_vcpu_block() makes the request. This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops. This is because only the lower 8 bits are used to represent the
request's number. The upper bits are used as flags. Currently only two
flags are defined.
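The split between number bits and flag bits can be sketched as follows.
The constant values mirror the kernel's definitions in
include/linux/kvm_host.h at the time of writing, but treat the exact
values as an assumption rather than a contract, and the request
``TOY_REQ_EXAMPLE`` and helper ``toy_request_bit()`` are purely
hypothetical:

```c
/* Illustrative sketch of request number vs. flag bits. */
#define KVM_REQUEST_MASK	0xffU		/* lower 8 bits: request number */
#define KVM_REQUEST_NO_WAKEUP	(1U << 8)	/* flag bits live above them */
#define KVM_REQUEST_WAIT	(1U << 9)

/* A hypothetical request, number 5, carrying both flags. */
#define TOY_REQ_EXAMPLE		(5 | KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)

/* The bit index to use with bitops on vcpu->requests. */
static unsigned int toy_request_bit(unsigned int req)
{
	return req & KVM_REQUEST_MASK;
}
```

Masking first is what makes ``clear_bit(req & KVM_REQUEST_MASK, ...)``
in the earlier example correct even for requests that carry flags.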
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode. That is, sleeping VCPUs do not need
  to be awakened for these requests. Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding. This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait. This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)
VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request. This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit. Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it. See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provides
the memory barriers, allowing this requirement to be handled internally by
the API.
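The ordering contract can be sketched with C11 release/acquire semantics
standing in for the write and read barriers. This is an illustrative
model only; the names (``toy_vcpu``, ``toy_publish_state``,
``toy_consume_state``, ``TOY_REQ_NEW_STATE``) are hypothetical, and in
KVM the barriers are supplied by kvm_make_request() and
kvm_check_request() themselves:

```c
/* Toy model of a request with associated state; names are illustrative. */
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REQ_NEW_STATE 0

struct toy_vcpu {
	int new_state;		/* state associated with the request */
	atomic_ulong requests;
};

/* Requester side: write the state, then set the request bit. */
static void toy_publish_state(struct toy_vcpu *vcpu, int state)
{
	vcpu->new_state = state;
	/* Release ordering on the RMW makes the state write visible
	 * before the request bit: the "write barrier" of the text. */
	atomic_fetch_or_explicit(&vcpu->requests, 1UL << TOY_REQ_NEW_STATE,
				 memory_order_release);
}

/* Receiver side: read (and clear) the bit, then read the state. */
static bool toy_consume_state(struct toy_vcpu *vcpu, int *state)
{
	unsigned long mask = 1UL << TOY_REQ_NEW_STATE;

	/* Acquire ordering pairs with the release above: the
	 * "read barrier" before reading the associated state. */
	if (!(atomic_fetch_and_explicit(&vcpu->requests, ~mask,
					memory_order_acquire) & mask))
		return false;
	*state = vcpu->new_state;
	return true;
}
```

The essential shape is the Message and Flag scenario: the request bit is
the flag, the associated state is the message.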
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request. We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode. This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI. One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU. With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check. This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``. Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts. This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any and all pending
requests, multiple kicks may be coalesced into a single IPI. This is
easily done by having the first IPI-sending kick also change the VCPU
mode to something !IN_GUEST_MODE. The transitional state,
EXITING_GUEST_MODE, is used for this purpose.
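The coalescing idea can be sketched with a compare-and-swap: only the
kick that first moves the mode from IN_GUEST_MODE to EXITING_GUEST_MODE
sends the IPI, and later kicks see !IN_GUEST_MODE and skip it. This
mirrors the cmpxchg used by the kernel's kvm_vcpu_exiting_guest_mode()
helper, but the types and the ``toy_should_send_ipi()`` name here are
simplified stand-ins:

```c
/* Toy model of IPI coalescing via a mode transition; illustrative only. */
#include <stdatomic.h>
#include <stdbool.h>

enum toy_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

/* Returns true if this caller won the transition and should send the IPI. */
static bool toy_should_send_ipi(_Atomic int *mode)
{
	int expected = IN_GUEST_MODE;

	/* Only one kicker can swap IN_GUEST_MODE -> EXITING_GUEST_MODE;
	 * everyone else finds the mode already changed and backs off. */
	return atomic_compare_exchange_strong(mode, &expected,
					      EXITING_GUEST_MODE);
}
```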
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)
Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts. To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
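The widened condition can be expressed as a small predicate. This is a
sketch of the rule stated above, not kernel code; the enum layout and the
``toy_kick_needs_ipi()`` helper are illustrative:

```c
/* Toy predicate for the IPI-sending condition; illustrative only. */
#include <stdbool.h>

enum toy_mode {
	OUTSIDE_GUEST_MODE,
	IN_GUEST_MODE,
	EXITING_GUEST_MODE,
	READING_SHADOW_PAGE_TABLES,
};

static bool toy_kick_needs_ipi(enum toy_mode mode, bool req_wait)
{
	if (req_wait)
		/* KVM_REQUEST_WAIT: IPI anything not fully outside guest
		 * mode, which also covers EXITING_GUEST_MODE and
		 * READING_SHADOW_PAGE_TABLES. */
		return mode != OUTSIDE_GUEST_MODE;
	/* Plain requests: IPI only targets actually in guest mode. */
	return mode == IN_GUEST_MODE;
}
```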
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)
Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that
request-less VCPU kicks are almost never correct. Without the assurance
that a non-IPI-generating kick will still result in an action by the
receiving VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, the kick may not do anything useful at
all. If, for instance, a request-less kick was made to a VCPU that was
just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
the VCPU thread may continue its entry without actually having done
whatever it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism. In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)
Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request. However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread are executed serially, for example
when they are the same thread, or when they are using some form of
concurrency control to temporarily execute synchronously, then it's
possible to know that the request may be cleared immediately, rather
than waiting for the receiving VCPU thread to handle the request in VCPU
RUN. The only current examples of this are kvm_vcpu_block() calls made
by VCPUs to block themselves. A possible side-effect of that call is to
make the KVM_REQ_UNHALT request, which may then be cleared immediately
when the VCPU returns from the call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)
References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/