=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity. For example, a thread may request a VCPU to flush
its TLB with a VCPU request. The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request. This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.
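The request side of the API can be modeled compactly. The following is an
illustrative, self-contained sketch, not the kernel implementation: the
names ``toy_vcpu``, ``toy_make_request`` and ``toy_check_request`` are
hypothetical stand-ins for ``vcpu->requests``, kvm_make_request() and
kvm_check_request(), and C11 atomics stand in for kernel bitops and
barriers:

```c
/* Toy model of the VCPU request API; names are illustrative only.
 * The real code lives in include/linux/kvm_host.h / virt/kvm/kvm_main.c. */
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vcpu {
	atomic_ulong requests;	/* stands in for vcpu->requests */
};

/* Set request bit @req, as kvm_make_request() would. */
static void toy_make_request(int req, struct toy_vcpu *vcpu)
{
	/* The seq_cst RMW supplies the full barrier the real API documents. */
	atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/* Check-and-clear request bit @req, as kvm_check_request() would. */
static bool toy_check_request(int req, struct toy_vcpu *vcpu)
{
	unsigned long mask = 1UL << req;

	if (!(atomic_load(&vcpu->requests) & mask))
		return false;
	atomic_fetch_and(&vcpu->requests, ~mask);
	return true;
}
```

In the real API the check-and-clear is the VCPU thread's side of the
protocol, while the set is the requester's side; a kick would follow
the set.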
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance. To do so, an IPI is sent, forcing
a guest mode exit. However, a VCPU thread may not be in guest mode at the
time of the kick. Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take. All three actions
are listed below:

1) Send an IPI. This forces a guest mode exit.
2) Wake a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues. Waking them removes the threads from
   the waitqueues, allowing the threads to run again. This behavior
   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing. When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, then there is nothing to do.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
VCPU is running in guest mode or not, as well as some specific
outside guest mode states. The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements"). The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction. The first 8 bits are reserved for architecture
independent requests; all remaining bits are available for architecture
dependent requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed it's
  necessary to inform each VCPU to completely refresh the tables. This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU. It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()). When that
  event or interrupt arrives kvm_vcpu_block() makes the request. This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops. This is because only the lower 8 bits are used to represent the
request's number. The upper bits are used as flags. Currently only two
flags are defined.
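The split between number bits and flag bits can be sketched as follows.
The constant values mirror the kernel's definitions in
include/linux/kvm_host.h at the time of writing, but treat the exact
values as an assumption rather than a contract, and the request
``TOY_REQ_EXAMPLE`` and helper ``toy_request_bit()`` are purely
hypothetical:

```c
/* Illustrative sketch of request number vs. flag bits. */
#define KVM_REQUEST_MASK	0xffU		/* lower 8 bits: request number */
#define KVM_REQUEST_NO_WAKEUP	(1U << 8)	/* flag bits live above them */
#define KVM_REQUEST_WAIT	(1U << 9)

/* A hypothetical request, number 5, carrying both flags. */
#define TOY_REQ_EXAMPLE		(5 | KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)

/* The bit index to use with bitops on vcpu->requests. */
static unsigned int toy_request_bit(unsigned int req)
{
	return req & KVM_REQUEST_MASK;
}
```

Masking first is what makes ``clear_bit(req & KVM_REQUEST_MASK, ...)``
in the earlier example correct even for requests that carry flags.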
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode. That is, sleeping VCPUs do not need
  to be awakened for these requests. Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding. This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait. This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)
VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request. This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit. Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it. See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provides
the memory barriers, allowing this requirement to be handled internally by
the API.
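The ordering contract can be sketched with C11 release/acquire semantics
standing in for the write and read barriers. This is an illustrative
model only; the names (``toy_vcpu``, ``toy_publish_state``,
``toy_consume_state``, ``TOY_REQ_NEW_STATE``) are hypothetical, and in
KVM the barriers are supplied by kvm_make_request() and
kvm_check_request() themselves:

```c
/* Toy model of a request with associated state; names are illustrative. */
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REQ_NEW_STATE 0

struct toy_vcpu {
	int new_state;		/* state associated with the request */
	atomic_ulong requests;
};

/* Requester side: write the state, then set the request bit. */
static void toy_publish_state(struct toy_vcpu *vcpu, int state)
{
	vcpu->new_state = state;
	/* Release ordering on the RMW makes the state write visible
	 * before the request bit: the "write barrier" of the text. */
	atomic_fetch_or_explicit(&vcpu->requests, 1UL << TOY_REQ_NEW_STATE,
				 memory_order_release);
}

/* Receiver side: read (and clear) the bit, then read the state. */
static bool toy_consume_state(struct toy_vcpu *vcpu, int *state)
{
	unsigned long mask = 1UL << TOY_REQ_NEW_STATE;

	/* Acquire ordering pairs with the release above: the
	 * "read barrier" before reading the associated state. */
	if (!(atomic_fetch_and_explicit(&vcpu->requests, ~mask,
					memory_order_acquire) & mask))
		return false;
	*state = vcpu->new_state;
	return true;
}
```

The essential shape is the Message and Flag scenario: the request bit is
the flag, the associated state is the message.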
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request. We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode. This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI. One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU. With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check. This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``. Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts. This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any and all pending
requests, multiple kicks may be coalesced into a single IPI. This is
easily done by having the first IPI-sending kick also change the VCPU
mode to something !IN_GUEST_MODE. The transitional state,
EXITING_GUEST_MODE, is used for this purpose.
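The coalescing idea can be sketched with a compare-and-swap: only the
kick that first moves the mode from IN_GUEST_MODE to EXITING_GUEST_MODE
sends the IPI, and later kicks see !IN_GUEST_MODE and skip it. This
mirrors the cmpxchg used by the kernel's kvm_vcpu_exiting_guest_mode()
helper, but the types and the ``toy_should_send_ipi()`` name here are
simplified stand-ins:

```c
/* Toy model of IPI coalescing via a mode transition; illustrative only. */
#include <stdatomic.h>
#include <stdbool.h>

enum toy_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE, EXITING_GUEST_MODE };

/* Returns true if this caller won the transition and should send the IPI. */
static bool toy_should_send_ipi(_Atomic int *mode)
{
	int expected = IN_GUEST_MODE;

	/* Only one kicker can swap IN_GUEST_MODE -> EXITING_GUEST_MODE;
	 * everyone else finds the mode already changed and backs off. */
	return atomic_compare_exchange_strong(mode, &expected,
					      EXITING_GUEST_MODE);
}
```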
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)
Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts. To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
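The widened condition can be expressed as a small predicate. This is a
sketch of the rule stated above, not kernel code; the enum layout and the
``toy_kick_needs_ipi()`` helper are illustrative:

```c
/* Toy predicate for the IPI-sending condition; illustrative only. */
#include <stdbool.h>

enum toy_mode {
	OUTSIDE_GUEST_MODE,
	IN_GUEST_MODE,
	EXITING_GUEST_MODE,
	READING_SHADOW_PAGE_TABLES,
};

static bool toy_kick_needs_ipi(enum toy_mode mode, bool req_wait)
{
	if (req_wait)
		/* KVM_REQUEST_WAIT: IPI anything not fully outside guest
		 * mode, which also covers EXITING_GUEST_MODE and
		 * READING_SHADOW_PAGE_TABLES. */
		return mode != OUTSIDE_GUEST_MODE;
	/* Plain requests: IPI only targets actually in guest mode. */
	return mode == IN_GUEST_MODE;
}
```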
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)
Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that
request-less VCPU kicks are almost never correct. Without the assurance
that a non-IPI-generating kick will still result in an action by the
receiving VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, the kick may not do anything useful at
all. If, for instance, a request-less kick was made to a VCPU that was
just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
the VCPU thread may continue its entry without actually having done
whatever it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism. In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)
Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request. However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread are executed serially, for example
when they are the same thread, or when they are using some form of
concurrency control to temporarily execute synchronously, then it's
possible to know that the request may be cleared immediately, rather
than waiting for the receiving VCPU thread to handle the request in VCPU
RUN. The only current examples of this are kvm_vcpu_block() calls made
by VCPUs to block themselves. A possible side-effect of that call is to
make the KVM_REQ_UNHALT request, which may then be cleared immediately
when the VCPU returns from the call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)
References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/