Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) Timekeeping Virtualization for X86-Based Architectures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) :Author: Zachary Amsden <zamsden@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) :Copyright: (c) 2010, Red Hat.  All rights reserved.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) .. Contents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12)    1) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13)    2) Timing Devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14)    3) TSC Hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15)    4) Virtualization Problems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 1. Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) One of the most complicated parts of the X86 platform, and specifically,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) the virtualization of this platform, is the plethora of timing devices available
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) and the complexity of emulating those devices.  In addition, virtualization of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) time introduces a new set of challenges because it introduces a multiplexed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) division of time beyond the control of the guest CPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) First, we will describe the various timekeeping hardware available, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) present some of the problems which arise and solutions available, giving
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) specific recommendations for certain classes of KVM guests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) The purpose of this document is to collect data and information relevant to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) timekeeping which may be difficult to find elsewhere, specifically,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) information relevant to KVM and hardware-based virtualization.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 2. Timing Devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) First we discuss the basic hardware devices available.  TSC and the related
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) KVM clock are special enough to warrant a full exposition and are described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) the following section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 2.1. i8254 - PIT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) One of the first timer devices available is the programmable interrupt timer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) or PIT.  The PIT has a fixed frequency 1.193182 MHz base clock and three
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) channels which can be programmed to deliver periodic or one-shot interrupts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) These three channels can be configured in different modes and have individual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) counters.  Channels 1 and 2 were not available for general use in the original
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) IBM PC, and historically were connected to control RAM refresh and the PC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) speaker.  Now the PIT is typically integrated as part of an emulated chipset
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) and a separate physical PIT is not used.
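
As a rough illustration of how the fixed base clock maps to an interrupt rate,
the sketch below computes the 16-bit reload count for a requested frequency.
The function names are hypothetical, and the clamp to 65536 assumes the usual
8254 behaviour in which a programmed count of 0 is treated as 65536 in binary
mode.

```python
PIT_BASE_HZ = 1_193_182  # fixed PIT input clock (1.193182 MHz)


def pit_reload_value(target_hz):
    """Return the reload count (1..65536) that best approximates target_hz."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    # Round to the nearest integer divisor of the base clock.
    count = (PIT_BASE_HZ + target_hz // 2) // target_hz
    # Assumption: the counter is 16 bits wide and 0 stands for 65536,
    # so the slowest achievable rate is about 18.2 Hz.
    return max(1, min(count, 65536))


def pit_actual_hz(count):
    """Return the frequency actually produced by a given reload count."""
    return PIT_BASE_HZ / count
```

For example, a 100 Hz tick needs a count of 11932, which yields roughly
99.998 Hz rather than exactly 100 Hz.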
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) The PIT uses I/O ports 0x40 - 0x43.  Access to the 16-bit counters is done
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) using single or multiple byte access to the I/O ports.  There are 6 modes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) available, but not all modes are available to all timers, as only timer 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) has a connected gate input, required for modes 1 and 5.  The gate line is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) controlled by port 61h, bit 0, as illustrated in the following diagram::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)   --------------             ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)   |            |           |                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61)   |  1.1932 MHz|---------->| CLOCK      OUT | ---------> IRQ 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)   |    Clock   |   |       |                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63)   --------------   |    +->| GATE  TIMER 0  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64)                    |        ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65)                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66)                    |        ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)                    |       |                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)                    |------>| CLOCK      OUT | ---------> 66.3 KHZ DRAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69)                    |       |                |            (aka /dev/null)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)                    |    +->| GATE  TIMER 1  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)                    |        ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73)                    |        ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74)                    |       |                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)                    |------>| CLOCK      OUT | ---------> Port 61h, bit 5
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76)                            |                |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77)   Port 61h, bit 0 -------->| GATE  TIMER 2  |       \_.----   ____
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78)                             ----------------         _|    )--|LPF|---Speaker
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79)                                                     / *----   \___/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80)   Port 61h, bit 1 ---------------------------------/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) The timer modes are now described.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) Mode 0: Single Timeout.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85)  This is a one-shot software timeout that counts down
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)  when the gate is high (always true for timers 0 and 1).  When the count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)  reaches zero, the output goes high.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) Mode 1: Triggered One-shot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)  The output is initially set high.  When the gate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91)  line is set high, a countdown is initiated (which does not stop if the gate is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)  lowered), during which the output is set low.  When the count reaches zero,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)  the output goes high.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) Mode 2: Rate Generator.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)  The output is initially set high.  When the countdown
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)  reaches 1, the output goes low for one count and then returns high.  The value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98)  is reloaded and the countdown automatically resumes.  If the gate line goes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99)  low, the count is halted.  If the output is low when the gate is lowered, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)  output automatically goes high (this only affects timer 2).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) Mode 3: Square Wave.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)  This generates a high / low square wave.  The count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)  determines the length of the pulse, which alternates between high and low
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)  when zero is reached.  The count only proceeds when gate is high and is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)  automatically reloaded on reaching zero.  The count is decremented twice at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)  each clock to generate a full high / low cycle at the full periodic rate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)  If the count is even, the clock remains high for N/2 counts and low for N/2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)  counts; if the count is odd, the clock is high for (N+1)/2 counts and low
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)  for (N-1)/2 counts.  Only even values are latched by the counter, so odd
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)  values are not observed when reading.  This is the intended mode for timer 2,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)  which generates sine-like tones by low-pass filtering the square wave output.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Mode 4: Software Strobe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)  After programming this mode and loading the counter,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)  the output remains high until the counter reaches zero.  Then the output
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)  goes low for 1 clock cycle and returns high.  The counter is not reloaded.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)  Counting only occurs when gate is high.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) Mode 5: Hardware Strobe.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)  After programming and loading the counter, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)  output remains high.  When the gate is raised, a countdown is initiated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)  (which does not stop if the gate is lowered).  When the counter reaches zero,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)  the output goes low for 1 clock cycle and then returns high.  The counter is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)  not reloaded.
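
The even / odd rule given for mode 3 can be written out as a small sketch.
The function name is hypothetical; the arithmetic follows the description of
the square wave mode above.

```python
def mode3_half_periods(count):
    """Return (high, low) durations in input clocks for a mode 3 count N."""
    if count < 2:
        raise ValueError("this sketch only handles counts of 2 or more")
    high = (count + 1) // 2  # N/2 for even N, (N+1)/2 for odd N
    low = count // 2         # N/2 for even N, (N-1)/2 for odd N
    return high, low
```

In both cases the two halves add up to N input clocks, so the output
frequency is the base clock divided by N.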
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) In addition to normal binary counting, the PIT supports BCD counting.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) command port, 0x43, is used to set the counter and mode for each of the three
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) timers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) PIT commands are issued to port 0x43 using the following bit encoding::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)   Bit 7-4: Command (See table below)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)   Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)   Bit 0  : Binary (0) / BCD (1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) Command table::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)   0000 - Latch Timer 0 count for port 0x40
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 	sample and hold the count to be read in port 0x40;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 	additional commands ignored until counter is read;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 	mode bits ignored.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)   0001 - Set Timer 0 LSB mode for port 0x40
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	set timer to read LSB only and force MSB to zero;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 	mode bits set timer mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)   0010 - Set Timer 0 MSB mode for port 0x40
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 	set timer to read MSB only and force LSB to zero;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 	mode bits set timer mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)   0011 - Set Timer 0 16-bit mode for port 0x40
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 	set timer to read / write LSB first, then MSB;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 	mode bits set timer mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)   0100 - Latch Timer 1 count for port 0x41 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)   0101 - Set Timer 1 LSB mode for port 0x41 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)   0110 - Set Timer 1 MSB mode for port 0x41 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)   0111 - Set Timer 1 16-bit mode for port 0x41 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)   1000 - Latch Timer 2 count for port 0x42 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)   1001 - Set Timer 2 LSB mode for port 0x42 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)   1010 - Set Timer 2 MSB mode for port 0x42 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)   1011 - Set Timer 2 16-bit mode for port 0x42 - as described above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)   1101 - General counter latch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 	Latch combination of counters into corresponding ports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 	Bit 3 = Counter 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 	Bit 2 = Counter 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 	Bit 1 = Counter 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 	Bit 0 = Unused
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)   1110 - Latch timer status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 	Latch combination of counter mode into corresponding ports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 	Bit 3 = Counter 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 	Bit 2 = Counter 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 	Bit 1 = Counter 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 	The output of ports 0x40-0x42 following this command will be:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 	Bit 7 = Output pin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) 	Bit 6 = Count loaded (0 if timer has expired)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 	Bit 5-4 = Read / Write mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 	    01 = LSB only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 	    10 = MSB only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 	    11 = LSB / MSB (16-bit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 	Bit 3-1 = Mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 	Bit 0 = Binary (0) / BCD mode (1)
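
The bit encoding above can be sketched as a pair of helpers, one that builds
a command byte for port 0x43 and one that splits a latched status byte into
its fields.  The names are hypothetical, and the timer number and access type
are simply the upper and lower halves of the 4-bit command field in the
command table.

```python
ACCESS_LATCH, ACCESS_LSB, ACCESS_MSB, ACCESS_16BIT = 0, 1, 2, 3


def pit_command(timer, access, mode, bcd=False):
    """Build a command byte: bits 7-4 command, bits 3-1 mode, bit 0 BCD."""
    if timer not in (0, 1, 2):
        raise ValueError("timer must be 0, 1 or 2")
    if access not in (0, 1, 2, 3):
        raise ValueError("access must be 0..3")
    if mode not in range(6):
        raise ValueError("mode must be 0..5")
    return (timer << 6) | (access << 4) | (mode << 1) | int(bool(bcd))


def pit_decode_status(status):
    """Split a latched status byte into its raw fields."""
    return {
        "output": (status >> 7) & 1,
        "bit6": (status >> 6) & 1,
        "rw_mode": (status >> 4) & 3,
        "mode": (status >> 1) & 7,
        "bcd": status & 1,
    }
```

With these helpers, `pit_command(0, ACCESS_16BIT, 2)` evaluates to 0x34
(timer 0, 16-bit access, rate generator) and
`pit_command(2, ACCESS_16BIT, 3)` to 0xB6 (timer 2, square wave).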
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 2.2. RTC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) The second device which was available in the original PC was the MC146818 real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) time clock.  The original device is now obsolete, and usually emulated by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) system chipset, sometimes by an HPET and some frankenstein IRQ routing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) The RTC is accessed through CMOS variables, which use an index register to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) control which bytes are read.  Since there is only one index register, reads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) of the CMOS and reads of the RTC require lock protection (in addition, it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) dangerous to allow userspace utilities such as hwclock to have direct RTC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) access, as they could corrupt kernel reads and writes of CMOS memory).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) The RTC generates an interrupt which is usually routed to IRQ 8.  The interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) can function as a periodic timer, an additional once a day alarm, and can issue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) interrupts after an update of the CMOS registers by the MC146818 is complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) The type of interrupt is signalled in the RTC status registers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) The RTC will update the current time fields by battery power even while the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) system is off.  The current time fields should not be read while an update is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) in progress, as indicated in the status register.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) The clock uses a 32.768kHz crystal, so bits 6-4 of register A should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) programmed to a 32kHz divider if the RTC is to count seconds.
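
A minimal sketch of the register A arithmetic follows.  The helper names are
hypothetical.  The periodic-rate formula assumes the 32.768 kHz time base and
covers only rate selections 3 through 15, which follow a power-of-two
pattern; selections 1 and 2 do not follow it and are left out.

```python
RTC_CRYSTAL_HZ = 32_768
REG_A_DIVIDER_MASK = 0x70          # bits 6-4
REG_A_DIVIDER_32KHZ = 0b010 << 4   # divider setting for a 32 kHz time base
REG_A_RATE_MASK = 0x0F             # bits 3-0


def reg_a_with_32khz_divider(reg_a):
    """Return reg_a with bits 6-4 set to 010, other bits preserved."""
    return (reg_a & 0xFF & ~REG_A_DIVIDER_MASK) | REG_A_DIVIDER_32KHZ


def periodic_hz(reg_a):
    """Return the periodic interrupt frequency in Hz, or None if disabled."""
    rate = reg_a & REG_A_RATE_MASK
    if rate == 0:
        return None
    if rate < 3:
        raise ValueError("rate selections 1 and 2 are not covered here")
    return RTC_CRYSTAL_HZ >> (rate - 1)
```

Under these assumptions a rate selection of 1111 gives 2 Hz (a 500 ms
period) and 1101 gives 8 Hz (125 ms).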
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) This is the RAM map originally used for the RTC/CMOS::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)   Location    Size    Description
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)   ------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)   00h         byte    Current second (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)   01h         byte    Seconds alarm (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)   02h         byte    Current minute (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)   03h         byte    Minutes alarm (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)   04h         byte    Current hour (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)   05h         byte    Hours alarm (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)   06h         byte    Current day of week (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)   07h         byte    Current day of month (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)   08h         byte    Current month (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)   09h         byte    Current year (BCD)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)   0Ah         byte    Register A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)                        bit 7   = Update in progress
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)                        bit 6-4 = Divider for clock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)                                   000 = 4.194 MHz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)                                   001 = 1.049 MHz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)                                   010 = 32 kHz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)                                   10X = test modes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)                                   110 = reset / disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)                                   111 = reset / disable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)                        bit 3-0 = Rate selection for periodic interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)                                  0000 = periodic timer disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)                                  0001 = 3.90625 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)                                  0010 = 7.8125 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)                                  0011 = .122070 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)                                  0100 = .244141 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)                                      ...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)                                  1101 = 125 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)                                  1110 = 250 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)                                  1111 = 500 mS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)   0Bh         byte    Register B
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)                        bit 7   = Run (0) / Halt (1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)                        bit 6   = Periodic interrupt enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)                        bit 5   = Alarm interrupt enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)                        bit 4   = Update-ended interrupt enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)                        bit 3   = Square wave enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)                        bit 2   = BCD calendar (0) / Binary (1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)                        bit 1   = 12-hour mode (0) / 24-hour mode (1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)                        bit 0   = 0 (DST off) / 1 (DST enabled)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)   0Ch         byte    Register C (read only)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)                        bit 7   = interrupt request flag (IRQF)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)                        bit 6   = periodic interrupt flag (PF)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)                        bit 5   = alarm interrupt flag (AF)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)                        bit 4   = update interrupt flag (UF)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)                        bit 3-0 = reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)   0Dh         byte    Register D (read only)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)                        bit 7   = RTC has power
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)                        bit 6-0 = reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)   32h         byte    Current century BCD (*)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)   (*) location vendor specific and now determined from ACPI global tables
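
To illustrate how the time fields in this map are interpreted, the sketch
below decodes a snapshot of the CMOS bytes.  It is only an illustration: the
`cmos` mapping and the function names are hypothetical, 24-hour mode is
assumed, and the century byte is ignored because its location is vendor
specific.

```python
def bcd_to_int(value):
    """Convert one packed BCD byte (for example 0x59) to an integer (59)."""
    high, low = (value >> 4) & 0x0F, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError("not a valid BCD byte: %#x" % value)
    return high * 10 + low


def decode_rtc_time(cmos, reg_b):
    """Return (year, month, day, hour, minute, second) from CMOS bytes.

    cmos maps a location (0x00..0x09) to the byte read from it.
    Bit 2 of register B selects binary (1) or BCD (0) calendar fields.
    Assumption: the clock runs in 24-hour mode.
    """
    binary = bool(reg_b & 0x04)
    conv = (lambda v: v) if binary else bcd_to_int
    fields = (0x09, 0x08, 0x07, 0x04, 0x02, 0x00)
    return tuple(conv(cmos[loc]) for loc in fields)
```

A real reader would also have to check the update-in-progress bit of
register A before sampling the fields, which this sketch does not model.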
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) 2.3. APIC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) On Pentium and later processors, an on-board timer is available to each CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) as part of the Advanced Programmable Interrupt Controller.  The APIC is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) accessed through memory-mapped registers and provides interrupt service to each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) CPU, used for IPIs and local timer interrupts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) Although in theory the APIC is a safe and stable source for local interrupts,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) in practice, many bugs and glitches have occurred due to the special nature of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) the APIC CPU-local memory-mapped hardware.  Beware that CPU errata may affect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) the use of the APIC and that workarounds may be required.  In addition, some of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) these workarounds pose unique constraints for virtualization - requiring either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) extra overhead incurred from extra reads of memory-mapped I/O or additional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) functionality that may be more computationally expensive to implement.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) Since the APIC is documented quite well in the Intel and AMD manuals, we will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) avoid repetition of the detail here.  It should be pointed out that the APIC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) timer is programmed through the LVT (local vector table) timer register, is capable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) of one-shot or periodic operation, and is based on the bus clock divided down
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) by the programmable divider register.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 2.4. HPET
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) HPET is quite complex, and was originally intended to replace the PIT / RTC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) support of the X86 PC.  It remains to be seen whether that will be the case, as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) the de facto standard of PC hardware is to emulate these older devices.  Some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) systems designated as legacy free may support only the HPET as a hardware timer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) The HPET spec is rather loose and vague, requiring at least 3 hardware timers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) but allowing implementation freedom to support many more.  It also imposes no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) fixed rate on the timer frequency, but does impose some extremal values on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) frequency, error and slew.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) In general, the HPET is recommended as a high precision (compared to PIT / RTC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) time source which is independent of local variation (as there is only one HPET
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) in any given system).  The HPET is also memory-mapped, and its presence is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) indicated through ACPI tables by the BIOS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) Detailed specification of the HPET is beyond the current scope of this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) document, as it is also very well documented elsewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 2.5. Offboard Timers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) Several cards, both proprietary (watchdog boards) and commonplace (e1000) have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) timing chips built into the cards which may have registers which are accessible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) to kernel or user drivers.  To the author's knowledge, using these to generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) a clocksource for a Linux or other kernel has not yet been attempted and is in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) general frowned upon as not playing by the agreed rules of the game.  Such a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) timer device would require additional support to be virtualized properly and is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) not considered important at this time as no known operating system does this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 3. TSC Hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) The TSC or time stamp counter is relatively simple in theory; it counts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) instruction cycles issued by the processor, which can be used as a measure of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) time.  In practice, due to a number of problems, it is the most complicated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) timekeeping device to use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) The TSC is represented internally as a 64-bit MSR which can be read with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) RDMSR, RDTSC, or RDTSCP (when available) instructions.  The TSC has long been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) writable, but on old hardware it was generally only possible to write the low
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) 32 bits of the 64-bit counter, and the upper 32 bits of the counter were
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) cleared by the write.  Now, however, on Intel processors family
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 0Fh, for models 3, 4 and 6, and family 06h, models e and f, this restriction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) has been lifted and all 64 bits are writable.  On AMD systems, the ability to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) write the TSC MSR is not an architectural guarantee.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) The TSC is always accessible from CPL 0 and conditionally from CPL > 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) software: setting the CR4.TSD bit disables TSC access at CPL > 0.
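
The access rule can be stated as a one-line predicate.  This is a sketch with
a hypothetical function name; the bit position of TSD within CR4 (bit 2) is
taken from the architecture manuals rather than from this document.

```python
CR4_TSD = 1 << 2  # assumption: TSD is bit 2 of CR4


def tsc_read_allowed(cr4, cpl):
    """Return True if software at privilege level cpl may read the TSC."""
    return cpl == 0 or not (cr4 & CR4_TSD)
```

When the predicate is false, the hardware faults on the read instead, which
gives a kernel or hypervisor the chance to emulate it.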
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) Some vendors have implemented an additional instruction, RDTSCP, which returns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) atomically not just the TSC, but an indicator which corresponds to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) processor number.  This can be used to index into an array of TSC variables to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) determine offset information in SMP systems where TSCs are not synchronized.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) The presence of this instruction must be determined by consulting CPUID feature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) bits.
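
The array lookup described above can be sketched as follows.  This is a minimal illustration, not kernel code: the function name, the ``offsets`` table and the assumption that the processor indicator has already been decoded into a plain CPU index are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: 'raw_tsc' and 'cpu' stand for the two values that RDTSCP
 * returns together; 'offsets' is a hypothetical per-CPU table of
 * corrections measured elsewhere.
 */
static uint64_t tsc_apply_cpu_offset(uint64_t raw_tsc, uint32_t cpu,
				     const int64_t *offsets, size_t nr_cpus)
{
	if (cpu >= nr_cpus)
		return raw_tsc;	/* unknown CPU: leave the value untouched */

	/* Unsigned wraparound makes adding a negative offset well defined. */
	return raw_tsc + (uint64_t)offsets[cpu];
}
```

Because both inputs come from a single instruction, the offset that is applied belongs to the CPU that produced the counter value, even if the thread migrates immediately afterwards.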
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) Both VMX and SVM provide extension fields in the virtualization hardware which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) allow the guest visible TSC to be offset by a constant.  Newer implementations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) promise to allow the TSC to additionally be scaled, but this hardware is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) yet widely available.
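
As a rough model of offsetting and scaling, the guest-visible TSC can be thought of as the host TSC multiplied by a ratio, plus a constant.  The sketch below assumes a 32.32 fixed-point ratio and uses hypothetical function names; the actual field formats are defined separately by each hardware vendor and are not reproduced here.

```c
#include <stdint.h>

/*
 * Assumed model: guest = host * ratio + offset, where 'ratio' is a
 * 32.32 fixed-point number (1.0 == 1ULL << 32).  unsigned __int128 is a
 * GCC/Clang extension, used here to keep the 96-bit product exact.
 */
static uint64_t tsc_scale(uint64_t host_tsc, uint64_t ratio)
{
	return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> 32);
}

static uint64_t tsc_guest_view(uint64_t host_tsc, uint64_t ratio,
			       int64_t offset)
{
	/* Scale first, then apply the (possibly negative) constant offset. */
	return tsc_scale(host_tsc, ratio) + (uint64_t)offset;
}
```

With a ratio of exactly 1.0 this reduces to the offset-only case that the text describes as generally available.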
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) 3.1. TSC synchronization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) The TSC is a CPU-local clock in most implementations.  This means, on SMP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) platforms, the TSCs of different CPUs may start at different times depending
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) on when the CPUs are powered on.  Generally, CPUs on the same die will share
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) the same clock, however, this is not always the case.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) The BIOS may attempt to resynchronize the TSCs during the poweron process and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) the operating system or other system software may attempt to do this as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) Several hardware limitations make the problem worse: if it is not possible to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) write the full 64 bits of the TSC, it may be impossible to match the TSC in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) newly arriving CPUs to that of the rest of the system, resulting in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) unsynchronized TSCs.  Resynchronization may be attempted by BIOS or system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) software, but in practice a perfectly synchronized TSC is not possible unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) all values are read from the same clock, which generally is possible only on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) single socket systems or those with special hardware support.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) 3.2. TSC and CPU hotplug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) As touched on already, CPUs which arrive later than the boot time of the system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) may not have a TSC value that is synchronized with the rest of the system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) Either system software, BIOS, or SMM code may actually try to establish the TSC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) to a value matching the rest of the system, but a perfect match is usually not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) a guarantee.  This can have the effect of bringing a system from a state where
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) TSC is synchronized back to a state where TSC synchronization flaws, however
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) small, may be exposed to the OS and any virtualization environment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 3.3. TSC and multi-socket / NUMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) Multi-socket systems, especially large multi-socket systems, are likely to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) individual clocksources rather than a single, universally distributed clock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) Since these clocks are driven by different crystals, they will not have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) perfectly matched frequency, and temperature and electrical variations will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) cause the CPU clocks, and thus the TSCs to drift over time.  Depending on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) exact clock and bus design, the drift may or may not be fixed in absolute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) error, and may accumulate over time.
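
To give a sense of scale, the arithmetic below estimates how far two TSCs diverge when their clocks differ by a small relative error.  The function name and the assumption of a constant error expressed in parts per million are illustrative only; real drift varies with temperature and voltage.

```c
#include <stdint.h>

/*
 * Cycles by which two counters diverge after 'seconds', assuming a
 * constant relative frequency error of 'ppm' parts per million.
 * 'nominal_hz' is truncated to whole MHz to keep the math in integers.
 */
static uint64_t tsc_drift_cycles(uint64_t nominal_hz, uint64_t ppm,
				 uint64_t seconds)
{
	return nominal_hz / 1000000 * ppm * seconds;
}
```

For example, under these assumptions a 2 GHz clock with a 5 ppm error diverges by 36 million cycles, or 18 ms, in one hour.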
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) In addition, very large systems may deliberately slew the clocks of individual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) cores.  This technique, known as spread-spectrum clocking, reduces EMI at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) clock frequency and harmonics of it, which may be required to pass FCC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) standards for telecommunications and computer equipment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) It is recommended not to trust the TSCs to remain synchronized on NUMA or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) multiple socket systems for these reasons.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) 3.4. TSC and C-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) C-states, or idling states of the processor, especially C1E and deeper sleep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) states, may be problematic for TSC as well.  The TSC may stop advancing in such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) a state, resulting in a TSC which is behind that of other CPUs when execution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) is resumed.  Such CPUs must be detected and flagged by the operating system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) based on CPU and chipset identifications.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) The TSC in such a case may be corrected by catching it up to a known external
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) clocksource.
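
One possible shape of such a catch-up is sketched below.  It assumes the idle duration has been measured in nanoseconds on an external clocksource and that the TSC frequency in kHz is known; the function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Return the value a stalled TSC should be advanced to after an idle
 * period.  'idle_ns' comes from an external clock.  The intermediate
 * product can overflow for very long idle periods; a real
 * implementation would need a wider multiply.
 */
static uint64_t tsc_catch_up(uint64_t tsc_at_idle, uint64_t tsc_now,
			     uint64_t idle_ns, uint64_t tsc_khz)
{
	/* cycles = ns * kHz / 10^6 */
	uint64_t expected = tsc_at_idle + idle_ns * tsc_khz / 1000000;

	/* Never move the counter backwards if it kept running after all. */
	return expected > tsc_now ? expected : tsc_now;
}
```

The final comparison covers CPUs whose TSC did not actually stop, so the correction cannot make time go backwards.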
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) 3.5. TSC frequency change / P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) ------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) To make things slightly more interesting, some CPUs may change frequency.  They
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) may or may not run the TSC at the same rate, and because the frequency change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) may be staggered or slewed, at some points in time, the TSC rate may not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) known other than falling within a range of values.  In this case, the TSC will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) not be a stable time source, and must be calibrated against a known, stable,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) external clock to be a usable source of time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) Whether the TSC runs at a constant rate or scales with the P-state is model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) dependent and must be determined by inspecting CPUID, chipset or vendor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) specific MSR fields.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) In addition, some vendors have known bugs where the P-state is actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) compensated for properly during normal operation, but when the processor is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) inactive, the P-state may be raised temporarily to service cache misses from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) other processors.  In such cases, the TSC on halted CPUs could advance faster
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) than that of non-halted processors.  AMD Turion processors are known to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) this problem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) 3.6. TSC and STPCLK / T-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) External signals given to the processor may also have the effect of stopping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) the TSC.  This is typically done for thermal emergency power control to prevent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) an overheating condition, and typically, there is no way to detect that this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) condition has happened.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) 3.7. TSC virtualization - VMX
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) VMX provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) instructions, which is enough for full virtualization of TSC in any manner.  In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) addition, VMX allows passing through the host TSC plus an additional TSC_OFFSET
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) field specified in the VMCS.  Special instructions must be used to read and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) write the VMCS field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) 3.8. TSC virtualization - SVM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) SVM provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) instructions, which is enough for full virtualization of TSC in any manner.  In
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) addition, SVM allows passing through the host TSC plus an additional offset
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) field specified in the SVM control block.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) 3.9. TSC feature bits in Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) In summary, there is no way to guarantee the TSC remains in perfect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) synchronization unless it is explicitly guaranteed by the architecture.  Even
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) if so, the TSCs in multi-socket or NUMA systems may still run independently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) despite being locally consistent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) The following feature bits are used by Linux to signal various TSC attributes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) but they can only be taken to be meaningful for UP or single node systems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) =========================	=======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) X86_FEATURE_TSC			The TSC is available in hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) X86_FEATURE_RDTSCP		The RDTSCP instruction is available
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) X86_FEATURE_CONSTANT_TSC	The TSC rate is unchanged with P-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) X86_FEATURE_NONSTOP_TSC		The TSC does not stop in C-states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) X86_FEATURE_TSC_RELIABLE	TSC sync checks are skipped (VMware)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) =========================	=======================================
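
From user space, these attributes are usually visible as lower-case names (for example ``constant_tsc`` and ``nonstop_tsc``) on the ``flags`` line of ``/proc/cpuinfo``.  The helper below is a minimal sketch of a whole-word search in such a line; the function name is hypothetical, and reading the file itself is left out.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'name' appears as a complete, space-separated word in
 * 'flags'.  A plain substring search would wrongly report "tsc" as
 * present whenever "constant_tsc" is.
 */
static bool cpu_flag_present(const char *flags, const char *name)
{
	size_t len = strlen(name);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool starts = p == flags || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```

As the text notes, a positive result is only meaningful on UP or single node systems.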
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) 4. Virtualization Problems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) Timekeeping is especially problematic for virtualization because a number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) challenges arise.  The most obvious problem is that time is now shared between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) the host and, potentially, a number of virtual machines.  Thus the virtual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) operating system does not run with 100% usage of the CPU, despite the fact that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) it may very well make that assumption.  It may expect it to remain true to very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) exacting bounds when interrupt sources are disabled, but in reality only its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) virtual interrupt sources are disabled, and the machine may still be preempted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) at any time.  This causes problems as the passage of real time, the injection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) of machine interrupts and the associated clock sources are no longer completely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) synchronized with real time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) This same problem can occur on native hardware to a degree, as SMM mode may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) steal cycles from the operating system on X86 systems when SMM mode is used by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) BIOS, but not in such an extreme fashion.  However, the fact that SMM mode may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) cause similar problems to virtualization makes it a good justification for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) solving many of these problems on bare metal.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) 4.1. Interrupt clocking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) One of the most immediate problems that occurs with legacy operating systems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) is that the system timekeeping routines are often designed to keep track of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) time by counting periodic interrupts.  These interrupts may come from the PIT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) or the RTC, but the problem is the same: the host virtualization engine may not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) be able to deliver the proper number of interrupts per second, and so guest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) time may fall behind.  This is especially problematic if a high interrupt rate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) is selected, such as 1000 HZ, which is unfortunately the default for many Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) guests.
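
The size of the effect follows directly from the tick period.  The sketch below computes how far a tick-counting guest has fallen behind; the function name is hypothetical and the guest is assumed to advance its clock by exactly one period per delivered interrupt.

```c
#include <stdint.h>

/*
 * Lag, in nanoseconds, of a guest that counts periodic interrupts at
 * 'hz' ticks per second but has only received 'ticks_delivered' of
 * them during 'real_elapsed_ns' of real time.
 */
static int64_t tick_lag_ns(uint64_t real_elapsed_ns,
			   uint64_t ticks_delivered, uint32_t hz)
{
	uint64_t tick_ns = 1000000000ULL / hz;

	return (int64_t)real_elapsed_ns - (int64_t)(ticks_delivered * tick_ns);
}
```

At 1000 HZ, delivering 950 ticks in one second of real time leaves the guest 50 ms behind.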
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) There are three approaches to solving this problem.  First, it may be possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) to simply ignore it.  Guests which have a separate time source for tracking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) 'wall clock' or 'real time' may not need any adjustment of their interrupts to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) maintain proper time.  Second, if this is not sufficient, it may be necessary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) to inject additional interrupts into the guest in order to increase the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) effective interrupt rate.  This approach leads to complications in extreme
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) conditions, where host load or guest lag is too much to compensate for.  Thus a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) third solution has arisen: the guest may need to become aware of lost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) ticks and compensate for them internally.  Although promising in theory, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) implementation of this policy in Linux has been extremely error prone, and a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) number of buggy variants of lost tick compensation are distributed across
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) commonly used Linux systems.
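
A simplified form of guest-side lost tick compensation is sketched below.  It is not the Linux implementation: the structure, its field names and the use of a cycle counter to measure the gap between interrupts are assumptions made for illustration.

```c
#include <stdint.h>

struct tick_state {
	uint64_t last_tsc;		/* counter value at the last accounted tick */
	uint64_t cycles_per_tick;	/* assumed known and constant */
	uint64_t jiffies;		/* ticks accounted so far */
};

/*
 * Called from the timer interrupt.  Accounts every whole tick period
 * that elapsed since the last call and returns how many of them were
 * lost, that is, never delivered as interrupts of their own.
 */
static uint64_t account_ticks(struct tick_state *s, uint64_t tsc_now)
{
	uint64_t ticks = (tsc_now - s->last_tsc) / s->cycles_per_tick;

	/* Advance by whole periods only, so the remainder carries over. */
	s->last_tsc += ticks * s->cycles_per_tick;
	s->jiffies += ticks;

	return ticks > 0 ? ticks - 1 : 0;
}
```

A scheme like this is easy to get wrong in combination with the second approach: if the host also injects catch-up interrupts, the same lost ticks may be compensated twice.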
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) Windows uses periodic RTC clocking as a means of keeping time internally, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) thus requires interrupt slewing to keep proper time.  It does use a low enough
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) rate (ed: is it 18.2 Hz?) however that it has not yet been a problem in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) practice.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) 4.2. TSC sampling and serialization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) As the highest precision time source available, the cycle counter of the CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) has aroused much interest from developers.  As explained above, this timer has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) many problems unique to its nature as a local, potentially unstable and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) potentially unsynchronized source.  One issue which is not unique to the TSC,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) but is highlighted because of its very precise nature, is sampling delay.  By
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) definition, the counter, once read, is already old.  However, it is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) possible for the counter to be read ahead of the actual use of the result.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) This is a consequence of the superscalar execution of the instruction stream,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) which may execute instructions out of order.  Such execution is called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) non-serialized.  Forcing serialized execution is necessary for precise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) measurement with the TSC, and requires a serializing instruction, such as CPUID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) or an MSR write (WRMSR).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) Since CPUID may actually be virtualized by a trap-and-emulate mechanism, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) serialization can pose a performance issue for hardware virtualization.  An
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) accurate time stamp counter reading may therefore not always be available, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) it may be necessary for an implementation to guard against "backwards" reads of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) the TSC as seen from other CPUs, even in an otherwise perfectly synchronized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) system.
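
One way to guard against such backwards reads is to remember the largest value handed out so far and never return less.  The sketch below uses C11 atomics and hypothetical names; it trades some scalability for monotonicity, since every reader touches the same shared variable.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Largest value returned to any caller so far, shared by all CPUs. */
static _Atomic uint64_t last_seen;

/*
 * Clamp a raw counter sample so that successive results never decrease,
 * even if 'raw' was sampled slightly out of order on another CPU.
 */
static uint64_t monotonic_read(uint64_t raw)
{
	uint64_t last = atomic_load(&last_seen);

	while (raw > last) {
		/* On failure 'last' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&last_seen, &last, raw))
			return raw;
	}
	return last;
}
```

A reader that loses the race simply reports the newer value published by another CPU.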
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) 4.3. Timespec aliasing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) Additionally, this lack of serialization from the TSC poses another challenge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) when using results of the TSC when measured against another time source.  As
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) the TSC is much higher precision, many possible values of the TSC may be read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) while another clock is still expressing the same value.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) That is, you may read any value in (T .. T+10) while external clock C maintains
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) the same value.  Due to non-serialized reads, you may actually end up with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) range which fluctuates, such as (T-1 .. T+10).  Thus, any time calculated from a TSC, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) calibrated against an external value may have a range of valid values.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) Re-calibrating this computation may actually cause time, as computed after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) calibration, to go backwards, compared with time computed before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) calibration.
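
The backwards step can be reproduced with a few lines of arithmetic.  The sketch below assumes, purely for readability, that one TSC cycle equals one nanosecond; the structure and function names are hypothetical.

```c
#include <stdint.h>

struct calib {
	uint64_t base_tsc;	/* TSC sample taken at calibration */
	uint64_t base_ns;	/* external clock value paired with it */
};

/* Time derived from the TSC, assuming 1 cycle == 1 ns. */
static uint64_t calib_time_ns(const struct calib *c, uint64_t tsc)
{
	return c->base_ns + (tsc - c->base_tsc);
}

/*
 * Recalibrate without letting computed time step backwards: if the
 * coarse external value is behind what the old calibration already
 * reported for this TSC sample, keep the old result as the new base.
 */
static struct calib recalibrate_clamped(const struct calib *old,
					uint64_t tsc, uint64_t ext_ns)
{
	uint64_t prev = calib_time_ns(old, tsc);
	struct calib c = { tsc, ext_ns > prev ? ext_ns : prev };

	return c;
}
```

With a calibration of (5000, 10000), the time at TSC 5900 is 10900 ns.  Naively recalibrating to (5950, 10000) while the external clock still reads 10000 yields 10010 ns at TSC 5960, which is earlier than a value already reported; the clamped variant yields 10960 ns instead.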
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) This problem is particularly pronounced with an internal time source in Linux,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) the kernel time, which is expressed in the theoretically high resolution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) timespec - but which advances in much larger granularity intervals, sometimes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) at the rate of jiffies, and possibly in catchup modes, at a much larger step.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) This aliasing requires care in the computation and recalibration of kvmclock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) and any other values derived from TSC computation (such as TSC virtualization
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) itself).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) 4.4. Migration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) Migration of a virtual machine raises problems for timekeeping in two ways.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) First, the migration itself may take time, during which interrupts cannot be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) delivered, and after which, the guest time may need to be caught up.  NTP may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) be able to help to some degree here, as the clock correction required is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) typically small enough to fall in the NTP-correctable window.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) An additional concern is that timers based off the TSC (or HPET, if the raw bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) clock is exposed) may now be running at different rates, requiring compensation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) in some way in the hypervisor by virtualizing these timers.  In addition,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) migrating to a faster machine may preclude the use of a passthrough TSC, as a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) faster clock cannot be made visible to a guest without the potential of time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) advancing faster than usual.  A slower clock is less of a problem, as it can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) always be caught up to the original rate.  KVM clock avoids these problems by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) simply storing multipliers and offsets against the TSC for the guest to convert
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) back into nanosecond resolution values.
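
The conversion the guest performs can be sketched as below.  The structure is a simplified, hypothetical stand-in for the parameters the text describes (a base TSC value, a base time, a fixed-point multiplier and a shift); it is not the actual shared-memory layout.

```c
#include <stdint.h>

struct clock_params {
	uint64_t tsc_base;	/* TSC value when the record was written */
	uint64_t ns_base;	/* nanoseconds corresponding to tsc_base */
	uint32_t mul;		/* 0.32 fixed-point multiplier */
	int8_t shift;		/* pre-shift applied to the TSC delta */
};

static uint64_t tsc_to_ns(const struct clock_params *p, uint64_t tsc)
{
	uint64_t delta = tsc - p->tsc_base;

	if (p->shift >= 0)
		delta <<= p->shift;
	else
		delta >>= -p->shift;

	/* (delta * mul) >> 32, split so no 128-bit type is needed. */
	return p->ns_base + (delta >> 32) * p->mul +
	       (((delta & 0xffffffffULL) * p->mul) >> 32);
}
```

Under this model a host with a different TSC frequency only has to publish a new multiplier, shift and base pair after migration; the guest-side code stays the same.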
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) 4.5. Scheduling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) Since scheduling may be based on precise timing and firing of interrupts, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) scheduling algorithms of an operating system may be adversely affected by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) virtualization.  In theory, the effect is random and should be uniformly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) distributed, but in contrived as well as real scenarios (guest device access,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) causes of virtualization exits, possible context switch), this may not always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) be the case.  The effect of this has not been well studied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) In an attempt to work around this, several implementations have provided a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) paravirtualized scheduler clock, which reveals the true amount of CPU time for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) which a virtual machine has been running.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) 4.6. Watchdogs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) Watchdog timers, such as the lockup detector in Linux, may fire accidentally when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) running under hardware virtualization due to timer interrupts being delayed or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) misinterpretation of the passage of real time.  Usually, these warnings are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) spurious and can be ignored, but in some circumstances it may be necessary to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) disable such detection.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) 4.7. Delays and precision timing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) Precise timing and delays may not be possible in a virtualized system.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) matters if the guest is controlling physical hardware, or issues delays to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) compensate for slower I/O to and from devices.  The first issue is not solvable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) in general for a virtualized system; hardware control software can't be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) adequately virtualized without a full real-time operating system, which would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) require an RT aware virtualization platform.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) The second issue may cause performance problems, but this is unlikely to be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) significant issue.  In many cases these delays may be eliminated through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) configuration or paravirtualization.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) 4.8. Covert channels and leaks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) In addition to the above problems, time information will inevitably leak to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) guest about the host in anything but a perfect implementation of virtualized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) time.  This may allow the guest to infer the presence of a hypervisor (as in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) red-pill type detection), and it may allow information to leak between guests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) by using CPU utilization itself as a signalling channel.  Preventing such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) problems would require completely isolated virtual time which may not track
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) real time any longer.  This may be useful in certain security or QA contexts,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) but in general isn't recommended for real-world deployment scenarios.