^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) KVM-specific MSRs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) :Author: Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) KVM makes use of some custom MSRs to service some requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) Custom MSRs have a range reserved for them, that goes from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) 0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) but they are deprecated and their use is discouraged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) Custom MSR list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) The currently supported custom MSRs are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) MSR_KVM_WALL_CLOCK_NEW:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) 0x4b564d00
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) 4-byte aligned physical address of a memory area which must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) in guest RAM. This memory is expected to hold a copy of the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) struct pvclock_wall_clock {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) u32 version;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) u32 sec;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) u32 nsec;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) } __attribute__((__packed__));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) whose data will be filled in by the hypervisor. The hypervisor is only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) guaranteed to update this data at the moment of MSR write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) Users that want to reliably query this information more than once have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) to write more than once to this MSR. Fields have the following meanings:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) version:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) guest has to check version before and after grabbing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) time information and check that they are both equal and even.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) An odd version indicates an in-progress update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) sec:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) number of seconds for wallclock at time of boot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) nsec:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) number of nanoseconds for wallclock at time of boot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) In order to get the current wallclock time, the system_time from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) MSR_KVM_SYSTEM_TIME_NEW needs to be added.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) Note that although MSRs are per-CPU entities, the effect of this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) particular MSR is global.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) Availability of this MSR must be checked via bit 3 in 0x40000001 cpuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) leaf prior to usage.
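
The version check described above can be sketched as follows. This is a
minimal illustration rather than the kernel's implementation: the helper
name ``wall_clock_snapshot`` is hypothetical, the fences use the GCC/Clang
``__atomic_thread_fence`` builtin, and the caller is assumed to have already
written the structure's address to the MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Layout copied from the structure shown above. */
struct pvclock_wall_clock {
	uint32_t version;
	uint32_t sec;
	uint32_t nsec;
} __attribute__((__packed__));

/*
 * Hypothetical helper: copy sec/nsec out of the shared area.
 * Returns 0 if the snapshot is consistent, -1 if the version was odd
 * (update in progress) or changed while the fields were being read.
 */
static int wall_clock_snapshot(const volatile struct pvclock_wall_clock *wc,
			       uint32_t *sec, uint32_t *nsec)
{
	uint32_t before, after;

	assert(wc && sec && nsec);

	before = wc->version;
	__atomic_thread_fence(__ATOMIC_ACQUIRE); /* read version before data */
	*sec = wc->sec;
	*nsec = wc->nsec;
	__atomic_thread_fence(__ATOMIC_ACQUIRE); /* read data before re-check */
	after = wc->version;

	if ((before & 1) || before != after)
		return -1;
	return 0;
}
```

A caller that gets -1 would typically retry the read. Since the hypervisor
is only guaranteed to update the data on an MSR write, a guest that needs a
fresh value would write the MSR again before retrying.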
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) MSR_KVM_SYSTEM_TIME_NEW:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) 0x4b564d01
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) 4-byte aligned physical address of a memory area which must be in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) guest RAM, plus an enable bit in bit 0. This memory is expected to hold
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) a copy of the following structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) struct pvclock_vcpu_time_info {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) u32 version;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) u32 pad0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) u64 tsc_timestamp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) u64 system_time;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) u32 tsc_to_system_mul;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) s8 tsc_shift;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) u8 flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) u8 pad[2];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) } __attribute__((__packed__)); /* 32 bytes */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) whose data will be filled in by the hypervisor periodically. Only one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) write, or registration, is needed for each VCPU. The interval between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) updates of this structure is arbitrary and implementation-dependent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) The hypervisor may update this structure at any time it sees fit until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) anything with bit0 == 0 is written to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) Fields have the following meanings:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) version:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) guest has to check version before and after grabbing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) time information and check that they are both equal and even.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) An odd version indicates an in-progress update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) tsc_timestamp:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) the tsc value at the current VCPU at the time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) of the update of this structure. Guests can subtract this value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) from current tsc to derive a notion of elapsed time since the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) structure update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) system_time:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) a host notion of monotonic time, including sleep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) time at the time this structure was last updated. Unit is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) nanoseconds.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) tsc_to_system_mul:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) multiplier to be used when converting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) tsc-related quantity to nanoseconds
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) tsc_shift:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) shift to be used when converting tsc-related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) quantity to nanoseconds. This shift will ensure that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) multiplication with tsc_to_system_mul does not overflow.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) A positive value denotes a left shift, a negative value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) a right shift.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) The conversion from tsc to nanoseconds involves an additional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) right shift by 32 bits. With this information, guests can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) derive per-CPU time by doing::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) time = (current_tsc - tsc_timestamp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) if (tsc_shift >= 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) time <<= tsc_shift;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) time >>= -tsc_shift;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) time = (time * tsc_to_system_mul) >> 32
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) time = time + system_time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) flags:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) bits in this field indicate extended capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) coordinated between the guest and the hypervisor. Availability
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) of specific flags has to be checked in 0x40000001 cpuid leaf.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) Current flags are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) +-----------+--------------+----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) | flag bit | cpuid bit | meaning |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) +-----------+--------------+----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) | | | time measures taken across |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) | 0 | 24 | multiple cpus are guaranteed to |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) | | | be monotonic |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) +-----------+--------------+----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) | | | guest vcpu has been paused by |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) | 1 | N/A | the host |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) | | | See 4.70 in api.txt |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) +-----------+--------------+----------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) Availability of this MSR must be checked via bit 3 in 0x40000001 cpuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) leaf prior to usage.
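
The tsc-to-nanoseconds conversion given under tsc_shift can be written as a
self-contained C function. This is a sketch under assumptions: the function
name ``pvclock_ns`` is hypothetical, the current TSC value is passed in by
the caller, and the version check is omitted for brevity. The 64x32-bit
multiply is split into two halves so that the intermediate product does not
overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Layout copied from the structure shown above. */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__)); /* 32 bytes */

/* Hypothetical helper: per-CPU time in nanoseconds for a given TSC reading. */
static uint64_t pvclock_ns(const struct pvclock_vcpu_time_info *ti,
			   uint64_t current_tsc)
{
	uint64_t delta, hi, lo;

	assert(ti && sizeof(*ti) == 32);

	delta = current_tsc - ti->tsc_timestamp;
	if (ti->tsc_shift >= 0)
		delta <<= ti->tsc_shift;
	else
		delta >>= -ti->tsc_shift;

	/* (delta * mul) >> 32, computed on the upper and lower 32-bit halves */
	hi = delta >> 32;
	lo = delta & 0xffffffffu;
	delta = hi * ti->tsc_to_system_mul +
		((lo * ti->tsc_to_system_mul) >> 32);

	return delta + ti->system_time;
}
```

For example, with tsc_to_system_mul = 0x80000000 (a factor of 0.5) and
tsc_shift = 1, an elapsed TSC delta of 2000 is shifted to 4000 and scaled
to 2000 ns, which is then added to system_time.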
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) MSR_KVM_WALL_CLOCK:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 0x11
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) data and functioning:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) This MSR falls outside the reserved KVM range and may be removed in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) future. Its usage is deprecated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) Availability of this MSR must be checked via bit 0 in 0x40000001 cpuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) leaf prior to usage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) MSR_KVM_SYSTEM_TIME:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 0x12
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) data and functioning:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) This MSR falls outside the reserved KVM range and may be removed in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) future. Its usage is deprecated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) Availability of this MSR must be checked via bit 0 in 0x40000001 cpuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) leaf prior to usage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) The suggested algorithm for detecting kvmclock presence is then::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) if (!kvm_para_available()) /* refer to cpuid.txt */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) return NON_PRESENT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) flags = cpuid_eax(0x40000001);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) if (flags & (1 << 3)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) return PRESENT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) } else if (flags & (1 << 0)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) return PRESENT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) } else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) return NON_PRESENT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) MSR_KVM_ASYNC_PF_EN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 0x4b564d02
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) Asynchronous page fault (APF) control MSR.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) Bits 63-6 hold 64-byte aligned physical address of a 64 byte memory area
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) which must be in guest RAM and must be zeroed. This memory is expected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) to hold a copy of the following structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) struct kvm_vcpu_pv_apf_data {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) /* Used for 'page not present' events delivered via #PF */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) __u32 flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) /* Used for 'page ready' events delivered via interrupt notification */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) __u32 token;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) __u8 pad[56];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) __u32 enabled;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) when asynchronous page faults are enabled on the vcpu, 0 when disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) Bit 1 is 1 if asynchronous page faults can be injected when vcpu is in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) #PF vmexits. Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) CPUID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 'Page not present' events are currently always delivered as synthetic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) #PF exception. During delivery of these events APF CR2 register contains
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) a token that will be used to notify the guest when missing page becomes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) available. Also, to make it possible to distinguish between real #PF and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) APF, first 4 bytes of 64 byte memory location ('flags') will be written
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) to by the hypervisor at the time of injection. Only first bit of 'flags'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) is currently supported, when set, it indicates that the guest is dealing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) with asynchronous 'page not present' event. If during a page fault APF
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 'flags' is '0' it means that this is regular page fault. Guest is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) supposed to clear 'flags' when it is done handling #PF exception so the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) next event can be delivered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) Note, since APF 'page not present' events use the same exception vector
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) as regular page fault, guest must reset 'flags' to '0' before it does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) something that can generate normal page fault.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) Bytes 4-7 of 64 byte memory location ('token') will be written to by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) hypervisor at the time of APF 'page ready' event injection. The content
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) of these bytes is a token which was previously delivered as 'page not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) present' event. The event indicates the page is now available. Guest is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) supposed to write '0' to 'token' when it is done handling 'page ready'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) event and to write '1' to MSR_KVM_ASYNC_PF_ACK after clearing the location;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) writing to the MSR forces KVM to re-scan its queue and deliver the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) pending notification.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) ready' APF delivery needs to be written to before enabling APF mechanism
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) in MSR_KVM_ASYNC_PF_EN or interrupt #0 can get injected. The MSR is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) Note, previously, 'page ready' events were delivered via the same #PF
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) exception as 'page not present' events but this is now deprecated. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) bit 3 (interrupt based delivery) is not set, APF events are not delivered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) If APF is disabled while there are outstanding APFs, they will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) not be delivered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) Currently 'page ready' APF events will be always delivered on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) same vcpu as 'page not present' event was, but guest should not rely on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) that.
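
The bit layout of MSR_KVM_ASYNC_PF_EN described above can be illustrated
with a small helper that composes the value to write. The macro and function
names below are hypothetical and chosen for this sketch only; they simply
mirror the bit positions listed in the text. Issuing the actual MSR write,
and writing MSR_KVM_ASYNC_PF_INT beforehand, is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the control bits described above. */
#define APF_EN_ENABLED      (1ULL << 0) /* APF enabled on this vcpu */
#define APF_EN_SEND_ALWAYS  (1ULL << 1) /* may inject when cpl == 0 */
#define APF_EN_AS_PF_VMEXIT (1ULL << 2) /* deliver to L1 as #PF vmexit */
#define APF_EN_AS_INT       (1ULL << 3) /* 'page ready' via interrupt */
#define APF_EN_RESERVED     (3ULL << 4) /* bits 5-4, should be zero */
#define APF_EN_ADDR_MASK    (~0x3fULL)  /* bits 63-6, 64-byte aligned */

/* Hypothetical helper: build the MSR value from an address and control bits. */
static uint64_t apf_en_value(uint64_t gpa, uint64_t ctl)
{
	assert((gpa & ~APF_EN_ADDR_MASK) == 0); /* address is 64-byte aligned */
	assert((ctl & ~0xfULL) == 0);           /* only bits 3-0 are control */
	return gpa | ctl;
}
```

For instance, enabling APF with interrupt based 'page ready' delivery for a
shared area at guest physical address 0x12340 would use
``apf_en_value(0x12340, APF_EN_ENABLED | APF_EN_AS_INT)``.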
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) MSR_KVM_STEAL_TIME:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 0x4b564d03
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) 64-byte aligned physical address of a memory area which must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) in guest RAM, plus an enable bit in bit 0. This memory is expected to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) hold a copy of the following structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) struct kvm_steal_time {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) __u64 steal;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) __u32 version;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) __u32 flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) __u8 preempted;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) __u8 u8_pad[3];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) __u32 pad[11];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) whose data will be filled in by the hypervisor periodically. Only one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) write, or registration, is needed for each VCPU. The interval between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) updates of this structure is arbitrary and implementation-dependent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) The hypervisor may update this structure at any time it sees fit until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) anything with bit0 == 0 is written to it. Guest is required to make sure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) this structure is initialized to zero.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) Fields have the following meanings:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) version:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) a sequence counter. In other words, guest has to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) this field before and after grabbing time information and make
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) sure they are both equal and even. An odd version indicates an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) in-progress update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) flags:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) At this point, always zero. May be used to indicate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) changes in this structure in the future.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) steal:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) the amount of time in which this vCPU did not run, in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) nanoseconds. Time during which the vcpu is idle will not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) reported as steal time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) preempted:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) indicates whether the vCPU that owns this struct is running or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) not. Non-zero values mean the vCPU has been preempted. Zero
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) means the vCPU is not preempted. NOTE, it is always zero if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) the hypervisor doesn't support this field.
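
Reading 'steal' under the 'version' sequence counter could look like the
sketch below. The helper name ``steal_time_read`` is hypothetical, and the
fences use the GCC/Clang ``__atomic_thread_fence`` builtin as a stand-in for
whatever read barrier the guest normally uses.

```c
#include <assert.h>
#include <stdint.h>

/* Layout copied from the structure shown above. */
struct kvm_steal_time {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;
	uint8_t  u8_pad[3];
	uint32_t pad[11];
};

/*
 * Hypothetical helper: return a consistent 'steal' value in nanoseconds.
 * Retries while the version is odd (update in progress) or changes
 * between the two reads.
 */
static uint64_t steal_time_read(const volatile struct kvm_steal_time *st)
{
	uint32_t version;
	uint64_t steal;

	assert(st && sizeof(*st) == 64);

	do {
		version = st->version;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		steal = st->steal;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
	} while ((version & 1) || version != st->version);

	return steal;
}
```

The loop only terminates once the hypervisor has finished an update, so it
assumes the structure has been registered and zero-initialized as required
above.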
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) MSR_KVM_EOI_EN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 0x4b564d04
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) when disabled. Bit 1 is reserved and must be zero. When PV end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) interrupt is enabled (bit 0 set), bits 63-2 hold a 4-byte aligned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) physical address of a 4 byte memory area which must be in guest RAM and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) must be zeroed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) The first, least significant bit of 4 byte memory location will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) written to by the hypervisor, typically at the time of interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) injection. Value of 1 means that guest can skip writing EOI to the apic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) (using MSR or MMIO write); instead, it is sufficient to signal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) EOI by clearing the bit in guest memory - this location will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) later be polled by the hypervisor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) Value of 0 means that the EOI write is required.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) It is always safe for the guest to ignore the optimization and perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) the APIC EOI write anyway.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) Hypervisor is guaranteed to only modify this least
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) significant bit while in the current VCPU context, this means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) guest does not need to use either lock prefix or memory ordering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) primitives to synchronise with the hypervisor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) However, hypervisor can set and clear this memory bit at any time:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) therefore to make sure hypervisor does not interrupt the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) guest and clear the least significant bit in the memory area
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) in the window between guest testing it to detect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) whether it can skip EOI apic write and between guest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) clearing it to signal EOI to the hypervisor,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) guest must both read the least significant bit in the memory area and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) clear it using a single CPU instruction, such as test and clear, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) compare and exchange.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) MSR_KVM_POLL_CONTROL:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 0x4b564d05
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) Control host-side polling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) Bit 0 enables (1) or disables (0) host-side HLT polling logic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) KVM guests can request the host not to poll on HLT, for example if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) they are performing polling themselves.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) MSR_KVM_ASYNC_PF_INT:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 0x4b564d06
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) Second asynchronous page fault (APF) control MSR.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) Bits 0-7: APIC vector for delivery of 'page ready' APF events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) Bits 8-63: Reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) Interrupt vector for asynchronous 'page ready' notifications delivery.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) The vector has to be set up before asynchronous page fault mechanism
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) is enabled in MSR_KVM_ASYNC_PF_EN. The MSR is only available if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) MSR_KVM_ASYNC_PF_ACK:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 0x4b564d07
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) data:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) Asynchronous page fault (APF) acknowledgment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) When the guest is done processing 'page ready' APF event and 'token'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) field in 'struct kvm_vcpu_pv_apf_data' is cleared it is supposed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) write '1' to bit 0 of the MSR; this causes the host to re-scan its queue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) and check if there are more notifications pending. The MSR is available
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.