Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Real-Time group scheduling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) .. CONTENTS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7)    0. WARNING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8)    1. Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9)      1.1 The problem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10)      1.2 The solution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11)    2. The interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12)      2.1 System-wide settings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13)      2.2 Default behaviour
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14)      2.3 Basis for grouping tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15)    3. Future plans
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) 0. WARNING
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) ==========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21)  Fiddling with these settings can result in an unstable system. The knobs are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22)  root only and assume root knows what he is doing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) Most notable:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26)  * very small values in sched_rt_period_us can result in an unstable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27)    system when the period is smaller than either the available hrtimer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28)    resolution, or the time it takes to handle the budget refresh itself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30)  * very small values in sched_rt_runtime_us can result in an unstable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31)    system when the runtime is so small the system has difficulty making
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32)    forward progress (NOTE: the migration thread and kstopmachine are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33)    both real-time processes).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 1. Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 1.1 The problem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) Realtime scheduling is all about determinism: a group has to be able to rely on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) multiple groups of realtime tasks, each group must be assigned a fixed portion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) of the CPU time available.  Without a minimum guarantee a realtime group can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) obviously fall short. A fuzzy upper limit is of no use since it cannot be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) relied upon. That leaves us with just the single fixed portion.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 1.2 The solution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) CPU time is divided by specifying how much time can be spent running in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) given period. We allocate this "run time" to each realtime group, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) other realtime groups will not be permitted to use it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) Any time not allocated to a realtime group will be used to run normal priority
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) SCHED_OTHER.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) Let's consider an example: a fixed-frame-rate realtime renderer must deliver 25
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) frames a second, which yields a period of 0.04s per frame. Now say it will also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) have to play some music and respond to input, leaving it with around 80% CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) time dedicated for the graphics. We can then give this group a run time of 0.8
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) * 0.04s = 0.032s.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) This way the graphics group will have a 0.04s period with a 0.032s run time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) needs only about 3% CPU time to do so, it can make do with a run time of 0.03 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 0.005s = 0.00015s. So this group can be scheduled with a period of 0.005s and a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) run time of 0.00015s.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) The remaining CPU time will be used for user input and other tasks. Because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) realtime tasks have explicitly allocated the CPU time they need to perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) their tasks, buffer underruns in the graphics or audio can be eliminated.
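
The arithmetic of the renderer and audio example above can be checked with a
short Python sketch. It works in integer microseconds, the unit used by the
kernel interface. The helper name runtime_us() and the variable names are
illustrative only and do not correspond to anything in the kernel.

```python
# Worked version of the renderer/audio example, in integer microseconds.
# All names here are illustrative, not kernel identifiers.

US_PER_SEC = 1_000_000


def runtime_us(period_us: int, cpu_share_percent: int) -> int:
    """Run time (in us) that amounts to cpu_share_percent of one period."""
    return period_us * cpu_share_percent // 100


# Graphics: 25 frames per second, about 80% of the CPU time.
graphics_period_us = US_PER_SEC // 25                      # 0.04s
graphics_runtime_us = runtime_us(graphics_period_us, 80)   # 0.032s

# Audio: refill the DMA buffer every 0.005s, about 3% of the CPU time.
audio_period_us = 5_000                                    # 0.005s
audio_runtime_us = runtime_us(audio_period_us, 3)          # 0.00015s

# Share of CPU time left for user input and other tasks.
remaining_percent = 100 - 80 - 3

print("graphics:", graphics_period_us, graphics_runtime_us)
print("audio:   ", audio_period_us, audio_runtime_us)
print("remaining share (%):", remaining_percent)
```

The two groups together claim 83% of the CPU time, so roughly 17% remains for
everything else in this example.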
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) NOTE: the above example is not fully implemented yet. We still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) lack an EDF scheduler to make non-uniform periods usable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 2. The interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 2.1 System-wide settings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) The system-wide settings are configured under the /proc virtual file system:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) /proc/sys/kernel/sched_rt_period_us:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)   The scheduling period that is equivalent to 100% CPU bandwidth
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) /proc/sys/kernel/sched_rt_runtime_us:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)   A global limit on how much time realtime scheduling may use.  Even without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)   CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)   processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)   available to all realtime groups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98)   * Time is specified in us because the interface is s32. This gives an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99)     operating range from 1us to about 35 minutes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)   * sched_rt_period_us takes values from 1 to INT_MAX.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)   * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)   * A run time of -1 specifies runtime == period, i.e. no limit.
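
The value ranges listed above can be expressed as a small validation helper.
This is a sketch of the documented ranges only, not of the kernel's own checks,
which may reject further combinations. The function names are hypothetical.

```python
# Sketch of the documented value ranges for the two sysctl knobs.
# Function names are hypothetical; the kernel may apply additional checks.

INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer (s32)


def validate_rt_settings(period_us: int, runtime_us: int) -> None:
    """Raise ValueError if a value is outside the documented range."""
    if not 1 <= period_us <= INT_MAX:
        raise ValueError("sched_rt_period_us must be in 1..INT_MAX")
    if not -1 <= runtime_us <= INT_MAX - 1:
        raise ValueError("sched_rt_runtime_us must be in -1..INT_MAX-1")


def effective_runtime_us(period_us: int, runtime_us: int) -> int:
    """Resolve the special value -1, which means runtime == period."""
    validate_rt_settings(period_us, runtime_us)
    return period_us if runtime_us == -1 else runtime_us


# Upper end of the operating range: INT_MAX microseconds, in minutes.
max_minutes = INT_MAX / 1_000_000 / 60

print(effective_runtime_us(1_000_000, -1))
print(round(max_minutes, 1))
```

The last value confirms the "about 35 minutes" figure: INT_MAX microseconds is
a little under 36 minutes.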
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 2.2 Default behaviour
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) The default values are 1000000 (1s) for sched_rt_period_us and 950000 (0.95s)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) for sched_rt_runtime_us.  This leaves 0.05s to be used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) SCHED_OTHER (non-RT tasks). These defaults were chosen so that a runaway
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) realtime task will not lock up the machine but will leave a little time to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) recover it.  Setting runtime to -1 restores the old behaviour (no limit).
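
As a rough illustration, the time left for SCHED_OTHER can be computed from the
two /proc files named in section 2.1. The sketch below assumes it runs on a
Linux system where those files are readable. The function names are
hypothetical, and the directory is a parameter so the logic can also be tried
against ordinary files.

```python
# Sketch: read the two system-wide knobs and compute the time per period
# that realtime tasks cannot claim. Function names are hypothetical.
from pathlib import Path

PROC_DIR = Path("/proc/sys/kernel")


def read_rt_settings(proc_dir=PROC_DIR):
    """Return (period_us, runtime_us) read from the given directory."""
    proc_dir = Path(proc_dir)
    period_us = int((proc_dir / "sched_rt_period_us").read_text())
    runtime_us = int((proc_dir / "sched_rt_runtime_us").read_text())
    return period_us, runtime_us


def non_rt_reserve_us(period_us: int, runtime_us: int) -> int:
    """Time per period reserved for non-RT tasks; 0 when runtime is -1."""
    if runtime_us == -1:  # -1 means runtime == period, i.e. no limit
        return 0
    return period_us - runtime_us


if __name__ == "__main__":
    try:
        period, runtime = read_rt_settings()
        print("reserve per period (us):", non_rt_reserve_us(period, runtime))
    except (OSError, ValueError) as exc:
        print("could not read the settings:", exc)
```

With the default values this yields 50000us, the 0.05s mentioned above.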
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) By default all bandwidth is assigned to the root group and new groups get the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) want to assign bandwidth to another group, reduce the root group's bandwidth
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) and assign some or all of the difference to another group.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Realtime group scheduling means you have to assign a portion of total CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) bandwidth to the group before it will accept realtime tasks. Therefore you will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) not be able to run realtime tasks as any user other than root until you have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) done that, even if the user has the rights to run processes with realtime
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) priority!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 2.3 Basis for grouping tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) CPU bandwidth to task groups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) to control the CPU time reserved for each control group.
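
A minimal sketch of driving this file from Python follows. Only the file name
cpu.rt_runtime_us comes from the text above. The helper names are hypothetical,
and the cgroup directory is left to the caller because the mount point of the
cgroup file system varies between systems. Writing to a real cgroup requires
root, and the kernel may refuse values that break the limits described below.

```python
# Sketch: read and write a group's realtime run time through the cgroup
# file named above. Helper names are hypothetical; the caller supplies the
# cgroup directory because the mount point differs between systems.
from pathlib import Path

RT_RUNTIME_FILE = "cpu.rt_runtime_us"


def get_group_rt_runtime(cgroup_dir) -> int:
    """Return the run time (in us) currently assigned to the group."""
    return int((Path(cgroup_dir) / RT_RUNTIME_FILE).read_text())


def set_group_rt_runtime(cgroup_dir, runtime_us: int) -> None:
    """Assign a run time (in us) to the group.

    On a real cgroup the kernel validates the value and the write raises
    OSError if it is rejected.
    """
    (Path(cgroup_dir) / RT_RUNTIME_FILE).write_text(f"{runtime_us}\n")
```

A call such as set_group_rt_runtime(some_cgroup_dir, 150) would then reserve
150us per period for that group, where some_cgroup_dir stands for whatever
directory represents the group on the system at hand.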
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) For more information on working with control groups, you should read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) Group settings are checked against the following limits in order to keep the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) configuration schedulable:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)    \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) For now, this can be simplified to just the following (but see Future plans):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)    \Sum_{i} runtime_{i} <= global_runtime
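
The simplified limit can be sketched as a one-line check. This is a flat model
of the inequality above, with hypothetical group names and run times, and it
assumes a non-negative global run time. It is not the kernel's implementation.

```python
# Sketch of the simplified limit: Sum_i runtime_i <= global_runtime.
# Group names and values are made up for illustration.


def is_schedulable(group_runtimes_us: dict, global_runtime_us: int) -> bool:
    """True if the group run times together fit in the global run time."""
    return sum(group_runtimes_us.values()) <= global_runtime_us


GLOBAL_RUNTIME_US = 950_000  # default value of sched_rt_runtime_us

groups = {"root": 800_000, "graphics": 100_000, "audio": 50_000}
print(is_schedulable(groups, GLOBAL_RUNTIME_US))   # True: sum is 950000

groups["audio"] = 60_000
print(is_schedulable(groups, GLOBAL_RUNTIME_US))   # False: sum is 960000
```

Because every group currently shares the global period, dividing both sides by
global_period changes nothing, which is why the simplified form is enough.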
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 3. Future plans
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) There is work in progress to make the scheduling period for each group
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) ("<cgroup>/cpu.rt_period_us") configurable as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) The constraint on the period is that a subgroup must have a smaller or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) equal period to its parent. But realistically it's not very useful _yet_
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) as it's prone to starvation without deadline scheduling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) Consider two sibling groups A and B; both have 50% bandwidth, but A's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) period is twice the length of B's.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) * group A: period=100000us, runtime=50000us
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 	- this runs for 0.05s once every 0.1s
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) * group B: period= 50000us, runtime=25000us
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 	- this runs for 0.025s twice every 0.1s (or once every 0.05s).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) This means that currently a while (1) loop in A will run for the full period of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) B and can starve B's tasks (assuming they are of lower priority) for a whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) full deadline scheduling to the Linux kernel. Deadline scheduling the above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) groups and treating end of the period as a deadline will ensure that they both
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) get their allocated time.
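
The difference between the two policies for groups A and B can be illustrated
with a toy simulation. It is a deliberately simplified model and not how the
kernel works: one CPU, 1ms ticks, both groups always have something to run, and
a group's budget is refilled at the start of each of its periods. "fixed" picks
the first group in the list that still has budget (A has the higher priority),
while "edf" picks the group whose current period ends first. All names are
hypothetical.

```python
# Toy model of groups A and B on one CPU. Not the kernel's algorithm.

GROUPS = [  # listed in descending static priority
    {"name": "A", "period": 100_000, "runtime": 50_000},
    {"name": "B", "period": 50_000, "runtime": 25_000},
]


def simulate(groups, policy, horizon_us, tick_us=1_000):
    """Return {name: [run time received in each of the group's periods]}."""
    budget = {g["name"]: 0 for g in groups}
    received = {g["name"]: [] for g in groups}
    for now in range(0, horizon_us, tick_us):
        for g in groups:
            if now % g["period"] == 0:  # a new period starts: refill budget
                budget[g["name"]] = g["runtime"]
                received[g["name"]].append(0)
        ready = [g for g in groups if budget[g["name"]] > 0]
        if not ready:
            continue  # idle tick; non-RT tasks would run here
        if policy == "edf":
            # The deadline is the end of the group's current period.
            chosen = min(
                ready, key=lambda g: (now // g["period"] + 1) * g["period"]
            )
        else:  # "fixed": highest static priority that still has budget
            chosen = ready[0]
        budget[chosen["name"]] -= tick_us
        received[chosen["name"]][-1] += tick_us
    return received


print(simulate(GROUPS, "fixed", 200_000)["B"])  # [0, 25000, 0, 25000]
print(simulate(GROUPS, "edf", 200_000)["B"])    # [25000, 25000, 25000, 25000]
```

In this model the fixed-priority policy leaves B without any run time in every
other period, while treating the end of each period as a deadline gives both
groups their full allocation in every period.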
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) the biggest challenge as the current Linux PI infrastructure is geared towards
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) the limited static priority levels 0-99. With deadline scheduling you need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) do deadline inheritance (since priority is inversely proportional to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) deadline delta (deadline - now)).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) This means the whole PI machinery will have to be reworked - and that is one of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) the most complex pieces of code we have.