Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ===========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Clock sources, Clock events, sched_clock() and delay timers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ===========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) This document tries to briefly explain some basic kernel timekeeping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) abstractions. It partly pertains to the drivers usually found in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) drivers/clocksource in the kernel tree, but the code may be spread out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) across the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) If you grep through the kernel source you will find a number of architecture-
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) specific implementations of clock sources, clockevents and several likewise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) architecture-specific overrides of the sched_clock() function and some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) delay timers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) To provide timekeeping for your platform, the clock source provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) the basic timeline, whereas clock events fire interrupts at certain points
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) on this timeline, providing facilities such as high-resolution timers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) sched_clock() is used for scheduling and timestamping, and delay timers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) provide an accurate delay source using hardware counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) Clock sources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) The purpose of the clock source is to provide a timeline for the system that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) tells you where you are in time. For example issuing the command 'date' on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) a Linux system will eventually read the clock source to determine exactly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) what time it is.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) Typically the clock source is a monotonic, atomic counter which will provide
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) n bits which count from 0 to (2^n)-1 and then wrap around to 0 and start over.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) It will ideally NEVER stop ticking as long as the system is running. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) may stop during system suspend.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) The clock source shall have as high resolution as possible, and the frequency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) shall be as stable and correct as possible as compared to a real-world wall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) clock. It should not move unpredictably back and forth in time or miss a few
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) cycles here and there.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) It must be immune to the kind of effects that occur in hardware where e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) the counter register is read in two phases on the bus, lowest 16 bits first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) and the higher 16 bits in a second bus cycle, with the counter bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) potentially being updated in between, leading to the risk of very strange
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) values from the counter.
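
One common way to get such immunity is to read the high half, then the low
half, then the high half again, and retry if the high half changed. The
following is a minimal user-space sketch of that idea, not code taken from any
kernel driver; the function name read_split_counter() and the two 16-bit
register pointers are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine two 16-bit halves of a free-running 32-bit
 * counter into one consistent value. If the high half changes while the low
 * half is being read, the low half may belong to either side of the carry,
 * so the read is repeated.
 */
static uint32_t read_split_counter(const volatile uint16_t *lo_reg,
                                   const volatile uint16_t *hi_reg)
{
    uint16_t hi, lo, hi_again;

    assert(lo_reg && hi_reg);

    do {
        hi = *hi_reg;
        lo = *lo_reg;
        hi_again = *hi_reg;
    } while (hi != hi_again);   /* a carry happened in between: retry */

    return ((uint32_t)hi << 16) | lo;
}
```

On real hardware the two pointers would refer to memory-mapped registers; here
plain variables can stand in for them when experimenting.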
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) When the wall-clock accuracy of the clock source isn't satisfactory, there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) are various quirks and layers in the timekeeping code for e.g. synchronizing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) the user-visible time to RTC clocks in the system or against networked time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) servers using NTP, but all they do basically is update an offset against
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) the clock source, which provides the fundamental timeline for the system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) These measures do not affect the clock source per se, they only adapt the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) system to the shortcomings of it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) The clock source struct shall provide means to translate the provided counter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) Since this operation may be invoked very often, doing this in a strict
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) mathematical sense is not desirable: instead the number is taken as close as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) possible to a nanosecond value using only the arithmetic operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) multiply and shift, so in clocksource_cyc2ns() you find:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61)   ns ~= (clocksource * mult) >> shift
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) You will find a number of helper functions in the clock source code intended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) to aid in providing these mult and shift values, such as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) clocksource_khz2mult(), clocksource_hz2mult() that help determine the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) mult factor from a fixed shift, and clocksource_register_hz() and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) clocksource_register_khz() which will help out assigning both shift and mult
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) factors using the frequency of the clock source as the only input.
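
The multiply-and-shift conversion can be tried out in user space. The sketch
below derives a mult value from a frequency and a fixed shift with
round-to-nearest, in the spirit of clocksource_hz2mult(), and then converts
cycles to nanoseconds. The names hz_to_mult() and cyc_to_ns() are made up for
this example and are not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Pick mult for a given shift so that (cycles * mult) >> shift is close to
 * the number of nanoseconds: mult = round((10^9 << shift) / hz).
 */
static uint32_t hz_to_mult(uint32_t hz, uint32_t shift)
{
    uint64_t tmp;

    assert(hz != 0 && shift < 32);

    tmp = NSEC_PER_SEC << shift;
    tmp += hz / 2;              /* round to nearest instead of truncating */
    return (uint32_t)(tmp / hz);
}

/* Cycles to nanoseconds using only a multiply and a shift. */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

For a 1 MHz counter and a shift of 20, hz_to_mult() gives 1048576000, so one
cycle converts to 1000 ns. The caller has to keep cycles * mult within 64
bits, which is one reason the shift cannot be chosen arbitrarily large.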
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) For really simple clock sources accessed from a single I/O memory location
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) there is nowadays even clocksource_mmio_init() which will take a memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) location, bit width, a parameter telling whether the counter in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) register counts up or down, and the timer clock rate, and then conjure all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) necessary parameters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) Since a 32-bit counter at say 100 MHz will wrap around to zero after some 43
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) seconds, the code handling the clock source will have to compensate for this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) That is the reason why the clock source struct also contains a 'mask'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) member telling how many bits of the source are valid. This way the timekeeping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) code knows when the counter will wrap around and can insert the necessary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) compensation code on both sides of the wrap point so that the system timeline
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) remains monotonic.
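
A small user-space sketch of how a mask lets the elapsed cycle count survive a
wrap: the difference between two reads is taken modulo the counter width. The
helper names counter_mask() and cycles_elapsed() are assumptions for this
example, and the sketch only holds if the counter is sampled at least once per
wrap period.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the lowest 'bits' bits set, e.g. 0xffffffff for 32 bits. */
static uint64_t counter_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/*
 * Cycles elapsed between two reads of an n-bit up-counter. The unsigned
 * subtraction followed by the mask gives the right answer even when 'now'
 * has wrapped past zero and is numerically smaller than 'last'.
 */
static uint64_t cycles_elapsed(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}
```

With a 32-bit mask, a read of 0x10 following a read of 0xfffffff0 yields 0x20
elapsed cycles. At 100 MHz the mask value divided by the frequency gives the
wrap period of roughly 42.9 seconds mentioned above.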
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) Clock events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) Clock events are the conceptual reverse of clock sources: they take a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) desired time specification value and calculate the values to poke into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) hardware timer registers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) Clock events are orthogonal to clock sources. The same hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) and register range may be used for the clock event, but it is essentially
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) a different thing. The hardware driving clock events has to be able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) fire interrupts, so as to trigger events on the system timeline. On an SMP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) system, it is ideal (and customary) to have one such event driving timer per
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) CPU core, so that each core can trigger events independently of any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) core.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) You will notice that the clock event device code is based on the same basic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) idea about translating counters to nanoseconds using mult and shift
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) arithmetic, and you find the same family of helper functions again for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) assigning these values. The clock event driver does not need a 'mask'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) attribute however: the system will not try to plan events beyond the time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) horizon of the clock event.
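
For clock events the conversion runs in the opposite direction, from
nanoseconds to timer cycles. The following user-space sketch assumes
mult = (hz << shift) / 10^9 and clamps the result to the largest value the
timer can hold, which stands in for the time horizon. The names
evt_hz_to_mult() and ns_to_timer_cycles() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult such that (ns * mult) >> shift is close to the cycle count. */
static uint32_t evt_hz_to_mult(uint32_t hz, uint32_t shift)
{
    uint64_t mult;

    assert(shift <= 32);

    mult = ((uint64_t)hz << shift) / NSEC_PER_SEC;
    assert(mult > 0 && mult <= UINT32_MAX);
    return (uint32_t)mult;
}

/*
 * Convert a desired delay in nanoseconds to timer cycles, never exceeding
 * what the hardware register can be programmed with. The caller must keep
 * ns small enough that ns * mult fits in 64 bits.
 */
static uint64_t ns_to_timer_cycles(uint64_t ns, uint32_t mult, uint32_t shift,
                                   uint64_t max_cycles)
{
    uint64_t cycles = (ns * mult) >> shift;

    return cycles > max_cycles ? max_cycles : cycles;
}
```

Because both steps truncate, a one second delay on a 24 MHz timer with a shift
of 32 comes out as 23999999 cycles rather than 24000000, which illustrates why
the result is only "as close as possible".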
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) sched_clock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) In addition to the clock sources and clock events there is a special weak
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) function in the kernel called sched_clock(). This function shall return the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) number of nanoseconds since the system was started. An architecture may or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) may not provide an implementation of sched_clock() on its own. If a local
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) implementation is not provided, the system jiffy counter will be used as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) sched_clock().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) As the name suggests, sched_clock() is used for scheduling the system,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) determining the absolute timeslice for a certain process in the CFS scheduler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) for example. It is also used for printk timestamps when you have selected to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) include time information in printk for things like bootcharts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) Compared to clock sources, sched_clock() has to be very fast: it is called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) much more often, especially by the scheduler. If you have to make trade-offs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) between speed and accuracy relative to the clock source, you may sacrifice
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) accuracy for speed in sched_clock(). It however requires some of the same basic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) characteristics as the clock source, i.e. it should be monotonic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) The sched_clock() function may wrap only on unsigned long long boundaries,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) after circa 585 years. (For most practical systems this means "never".)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) If an architecture does not provide its own implementation of this function,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) it will fall back to using jiffies, limiting its resolution to one jiffy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) (1/HZ seconds) for the architecture. This will affect scheduling accuracy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) and will likely show up in system benchmarks.
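
To make the cost of the jiffies fallback concrete, the sketch below converts a
jiffy count to nanoseconds the way a jiffies-based sched_clock() conceptually
would. The function name jiffies_to_sched_ns() and the explicit hz parameter
are assumptions for this example; in the kernel HZ is a build-time constant.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Nanoseconds since boot as seen through the jiffy counter alone. The
 * value only advances in steps of NSEC_PER_SEC / hz, so nothing shorter
 * than one jiffy can be told apart.
 */
static uint64_t jiffies_to_sched_ns(uint64_t jiffies_since_boot,
                                    unsigned int hz)
{
    assert(hz > 0);
    return jiffies_since_boot * (NSEC_PER_SEC / hz);
}
```

With hz set to 100 the smallest visible step is 10 ms, with 250 it is 4 ms and
with 1000 it is 1 ms, all of which are coarse compared to a hardware counter
running at several MHz.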
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) The clock driving sched_clock() may stop or reset to zero during system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) suspend/sleep. This does not matter to the function it serves of scheduling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) events on the system. However it may result in interesting timestamps in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) printk().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) The sched_clock() function should be callable in any context, be IRQ- and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) NMI-safe, and return a sane value in any context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) Some architectures may have a limited set of time sources and lack a nice
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) counter to derive a 64-bit nanosecond value, so for example on the ARM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) architecture, special helper functions have been created to provide a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) same counter that is also used as clock source is used for this purpose.
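
The general idea behind such helpers can be sketched in user space: keep an
epoch (a cycle value and the nanosecond value it corresponds to), compute the
masked delta since the epoch on each read, and refresh the epoch more often
than the counter wraps. This is an illustration under those assumptions, not
the kernel's implementation; struct sc_state and both function names are
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for extending a narrow counter to 64-bit nanoseconds. */
struct sc_state {
    uint64_t epoch_ns;   /* nanoseconds at the last epoch update */
    uint64_t epoch_cyc;  /* raw counter value at the last epoch update */
    uint64_t mask;       /* valid bits of the counter, e.g. 0xffff */
    uint32_t mult;       /* cycles-to-ns multiplier */
    uint32_t shift;      /* cycles-to-ns shift */
};

/* Nanoseconds for raw counter value 'cyc', relative to the stored epoch. */
static uint64_t sc_read_ns(const struct sc_state *s, uint64_t cyc)
{
    uint64_t delta;

    assert(s != 0);
    delta = (cyc - s->epoch_cyc) & s->mask;
    return s->epoch_ns + ((delta * s->mult) >> s->shift);
}

/* Move the epoch forward; must run at least once per counter wrap. */
static void sc_update_epoch(struct sc_state *s, uint64_t cyc)
{
    s->epoch_ns = sc_read_ns(s, cyc);
    s->epoch_cyc = cyc;
}
```

For a 16-bit counter at 1 MHz (mult 1000, shift 0), updating the epoch at
0xfff0 and then reading 0x0010 after the wrap yields 65552000 ns, so the
nanosecond value keeps increasing although the raw counter went back to a
small number.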
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) On SMP systems, it is crucial for performance that sched_clock() can be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) independently on each CPU without any synchronization performance hits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) Some hardware (such as the x86 TSC) will cause the sched_clock() function to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) drift between the CPUs on the system. The kernel can work around this by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) that makes sched_clock() different from the ordinary clock source.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) Delay timers (some architectures only)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) On systems with variable CPU frequency, the various kernel delay() functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) will sometimes behave strangely. Basically these delays usually use a hard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) loop to delay a certain number of jiffy fractions using a "lpj" (loops per
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) jiffy) value, calibrated on boot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) Let's hope that your system is running on maximum frequency when this value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) is calibrated: as a result, when the frequency is geared down to half the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) full frequency, any delay() will be twice as long. Usually this does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) hurt, as you're commonly requesting that amount of delay *or more*. But
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) basically the semantics are quite unpredictable on such systems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) Enter timer-based delays. Using these, a timer read may be used instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) a hard-coded loop for providing the desired delay.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) This is done by declaring a struct delay_timer and assigning the appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) function pointers and rate settings for this delay timer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) This is available on some architectures like OpenRISC or ARM.
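
As a rough user-space illustration of a timer-based delay, the sketch below
models a delay timer as a read function plus a rate and busy-waits until
enough timer cycles have passed. The struct is a stand-in loosely modelled on
the description above, not the kernel's definition, and the fake timer that
advances by one tick per read exists only so the example can run without
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a delay timer: a read hook and its rate. */
struct delay_timer_sketch {
    unsigned long (*read_current_timer)(void);
    unsigned long freq;                 /* timer rate in Hz */
};

/* Fake free-running timer: advances by one tick on every read. */
static unsigned long fake_ticks;

static unsigned long fake_read_timer(void)
{
    return fake_ticks++;
}

/*
 * Busy-wait for at least 'usecs' microseconds as measured by the timer.
 * The wait depends on timer cycles only, so a change in CPU frequency
 * does not stretch or shrink it.
 */
static void timer_udelay(const struct delay_timer_sketch *t,
                         unsigned long usecs)
{
    unsigned long start, cycles;

    assert(t && t->read_current_timer && t->freq > 0);

    start = t->read_current_timer();
    cycles = (unsigned long)(((uint64_t)usecs * t->freq) / 1000000ULL);

    /* Unsigned subtraction keeps working if the timer wraps around. */
    while (t->read_current_timer() - start < cycles)
        ;
}
```

With the fake timer declared at 1 MHz, a 50 microsecond delay returns only
after the timer has advanced by at least 50 ticks, regardless of how fast the
loop itself executes.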