Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

=================
Scheduler Domains
=================

Each CPU has a "base" scheduling domain (struct sched_domain). The domain
hierarchy is built from these base domains via the ->parent pointer. ->parent
MUST be NULL terminated, and domain structures should be per-CPU as they are
locklessly updated.
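
As an illustration of why the NULL termination matters, here is a minimal
user-space sketch of walking a ->parent chain. struct toy_domain and
toy_domain_depth() are hypothetical names invented for this example; they are
not the kernel's struct sched_domain or any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sched_domain. */
struct toy_domain {
	struct toy_domain *parent;	/* next level up; NULL at the top */
	const char *name;		/* label used only by this sketch */
};

/* Count the levels reachable from a CPU's base domain via ->parent. */
static int toy_domain_depth(const struct toy_domain *base)
{
	const struct toy_domain *sd;
	int depth = 0;

	/* The walk ends only because the top domain's ->parent is NULL. */
	for (sd = base; sd != NULL; sd = sd->parent)
		depth++;
	return depth;
}
```

For a chain built as SMT -> MC -> NUMA, calling toy_domain_depth() on the SMT
domain would return 3. A chain that is not NULL terminated would make this loop
run past the top of the hierarchy.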

Each scheduling domain spans a number of CPUs (stored in the ->span field).
A domain's span MUST be a superset of its child's span (this restriction could
be relaxed if the need arises), and a base domain for CPU i MUST span at least
i. The top domain for each CPU will generally span all CPUs in the system,
although strictly it doesn't have to. Not doing so could lead to a case where
some CPUs will never be given tasks to run unless the CPUs allowed mask is
explicitly set. A sched domain's span means "balance process load among these
CPUs".
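
The two span rules can be checked mechanically. The sketch below is a
user-space model under stated assumptions: a cpumask is represented as a 64-bit
integer (so at most 64 CPUs), and toy_cpumask, toy_domain, toy_mask_subset() and
toy_spans_valid() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_cpumask;	/* assumption: bit i set => CPU i is in the mask */

/* Hypothetical stand-in for struct sched_domain, reduced to two fields. */
struct toy_domain {
	struct toy_domain *parent;
	toy_cpumask span;
};

/* True if every CPU in 'child' is also present in 'parent'. */
static bool toy_mask_subset(toy_cpumask child, toy_cpumask parent)
{
	return (child & ~parent) == 0;
}

/* Check both span rules for the chain starting at the base domain of 'cpu'. */
static bool toy_spans_valid(const struct toy_domain *base, unsigned int cpu)
{
	const struct toy_domain *sd;

	/* Rule 1: the base domain for CPU i must span at least i. */
	if (base == NULL || cpu >= 64 || !(base->span & ((toy_cpumask)1 << cpu)))
		return false;

	/* Rule 2: each domain's span must be a superset of its child's span. */
	for (sd = base; sd->parent != NULL; sd = sd->parent)
		if (!toy_mask_subset(sd->span, sd->parent->span))
			return false;
	return true;
}
```

With a base span of 0x0F and a parent span of 0xFF, the check passes for CPU 2
and fails for CPU 5, because CPU 5 is not in the base domain's span.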

Each scheduling domain must have one or more CPU groups (struct sched_group)
which are organised as a circular one-way linked list from the ->groups
pointer. The union of cpumasks of these groups MUST be the same as the
domain's span. The group pointed to by the ->groups pointer MUST contain the CPU
to which the domain belongs. Groups may be shared among CPUs as they contain
read-only data after they have been set up. The intersection of cpumasks from
any two of these groups may be non-empty. If this is the case the SD_OVERLAP
flag is set on the corresponding scheduling domain and its groups may not be
shared between CPUs.
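
The group rules above can be expressed as a small validity check. This is a
user-space sketch, not kernel code: cpumasks are modelled as 64-bit integers,
and toy_group, toy_domain and toy_groups_valid() are hypothetical names. The
sketch assumes the list is either NULL terminated (invalid) or loops back to the
first group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_cpumask;	/* assumption: bit i set => CPU i is in the mask */

/* Hypothetical stand-in for struct sched_group. */
struct toy_group {
	struct toy_group *next;	/* circular: the last group points to the first */
	toy_cpumask cpus;
};

/* Hypothetical stand-in for struct sched_domain. */
struct toy_domain {
	toy_cpumask span;
	struct toy_group *groups;	/* first group; must contain the owning CPU */
};

/*
 * Validate the groups of the domain owned by 'cpu'. '*overlap' reports
 * whether any two groups share a CPU (the situation SD_OVERLAP describes).
 */
static bool toy_groups_valid(const struct toy_domain *sd, unsigned int cpu,
			     bool *overlap)
{
	const struct toy_group *sg = sd->groups;
	toy_cpumask seen = 0;

	*overlap = false;
	/* The group reached through ->groups must contain the owning CPU. */
	if (sg == NULL || cpu >= 64 || !(sg->cpus & ((toy_cpumask)1 << cpu)))
		return false;

	do {
		if (sg->cpus & seen)
			*overlap = true;
		seen |= sg->cpus;
		sg = sg->next;
	} while (sg != NULL && sg != sd->groups);

	/* The list must loop back, and the union must equal the span. */
	return sg == sd->groups && seen == sd->span;
}
```

For a domain spanning 0x0F with groups 0x03 and 0x0C, the check succeeds for
CPU 0 without overlap. Changing the second group to 0x0E keeps the union equal
to the span but sets the overlap indicator.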

Balancing within a sched domain occurs between groups. That is, each group
is treated as one entity. The load of a group is defined as the sum of the
load of each of its member CPUs, and only when the load of a group becomes
out of balance are tasks moved between groups.
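
The definition of group load as a sum over member CPUs can be sketched as
follows. The per-CPU load array, the 64-bit cpumask representation and the name
toy_group_load() are assumptions made for this example; the kernel's actual load
metric is more involved.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_cpumask;	/* assumption: bit i set => CPU i is in the mask */

/* Sum the per-CPU loads of every CPU that is a member of the group mask. */
static unsigned long toy_group_load(toy_cpumask cpus,
				    const unsigned long *cpu_load,
				    unsigned int nr_cpus)
{
	unsigned long sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus && cpu < 64; cpu++)
		if (cpus & ((toy_cpumask)1 << cpu))
			sum += cpu_load[cpu];
	return sum;
}
```

With per-CPU loads {10, 20, 0, 5}, the group 0x3 (CPUs 0 and 1) has load 30 and
the group 0xC (CPUs 2 and 3) has load 5, so the balancer would compare 30
against 5 rather than looking at individual CPUs.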

In kernel/sched/core.c, scheduler_tick() periodically runs
trigger_load_balance() on each CPU. trigger_load_balance() raises a softirq once
the next regularly scheduled rebalancing event for the current runqueue has
arrived. The actual load balancing workhorse,
run_rebalance_domains()->rebalance_domains(), is then run in softirq context
(SCHED_SOFTIRQ).

The latter function takes two arguments: the current CPU's runqueue and whether
the CPU was idle at the time scheduler_tick() happened. It iterates over all
sched domains our CPU is on, starting from its base domain and going up the
->parent chain. While doing that, it checks whether the current domain has
exhausted its rebalance interval. If so, it runs load_balance() on that domain.
It then checks the parent sched_domain (if it exists), and the parent of the
parent and so forth.
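
The walk described above can be modelled in a few lines. This is a simplified
user-space sketch: the field names, toy_load_balance() and
toy_rebalance_domains() are hypothetical, time is a plain counter instead of
jiffies, and the idle argument (which influences the interval in the kernel) is
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain. */
struct toy_domain {
	struct toy_domain *parent;
	unsigned long last_balance;	/* time of the last balance attempt */
	unsigned long interval;		/* minimum time between attempts */
	int balanced;			/* how often this sketch balanced the domain */
};

/* Stand-in for load_balance(); it only records that it was called. */
static void toy_load_balance(struct toy_domain *sd)
{
	sd->balanced++;
}

/*
 * Walk from the base domain up the ->parent chain and balance every
 * domain whose interval has elapsed. Returns the number of domains balanced.
 */
static int toy_rebalance_domains(struct toy_domain *base, unsigned long now)
{
	struct toy_domain *sd;
	int ran = 0;

	for (sd = base; sd != NULL; sd = sd->parent) {
		if (now - sd->last_balance >= sd->interval) {
			toy_load_balance(sd);
			sd->last_balance = now;
			ran++;
		}
	}
	return ran;
}
```

Giving the base domain a short interval and the top domain a long one shows the
intended effect: the lower level is rebalanced often, while the wider and more
expensive level is visited only occasionally.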

Initially, load_balance() finds the busiest group in the current sched domain.
If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
that group. If it manages to find such a runqueue, it locks both our initial
CPU's runqueue and the newly found busiest one and starts moving tasks from it
to our runqueue. The exact number of tasks amounts to an imbalance previously
computed while iterating over this sched domain's groups.
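
The three steps (busiest group, busiest runqueue, move tasks) can be sketched
in user space as below. Everything here is an assumption made for illustration:
load is simply the number of runnable tasks, each group covers a contiguous CPU
range, the imbalance is taken as half the load difference, and locking is only
indicated by a comment. The kernel's load_balance() is considerably more
elaborate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_group. */
struct toy_group {
	struct toy_group *next;		/* circular list of groups */
	unsigned int first_cpu;		/* assumption: contiguous CPU range */
	unsigned int nr_cpus;
};

/* Group load: sum of runnable tasks on the group's CPUs. */
static unsigned int toy_group_load(const struct toy_group *sg,
				   const unsigned int *nr_running)
{
	unsigned int cpu, sum = 0;

	for (cpu = sg->first_cpu; cpu < sg->first_cpu + sg->nr_cpus; cpu++)
		sum += nr_running[cpu];
	return sum;
}

/*
 * Pull tasks towards 'this_cpu', which must belong to 'local' (the group
 * reached through ->groups). Returns the number of tasks moved.
 */
static unsigned int toy_load_balance(struct toy_group *local,
				     unsigned int this_cpu,
				     unsigned int *nr_running)
{
	struct toy_group *sg, *busiest = NULL;
	unsigned int local_load = toy_group_load(local, nr_running);
	unsigned int busiest_load = local_load;
	unsigned int cpu, src, imbalance, moved;

	/* Step 1: find the busiest group; the imbalance falls out of this pass. */
	for (sg = local->next; sg != local; sg = sg->next) {
		unsigned int load = toy_group_load(sg, nr_running);

		if (load > busiest_load) {
			busiest = sg;
			busiest_load = load;
		}
	}
	if (busiest == NULL)
		return 0;	/* nothing is busier than the local group */
	imbalance = (busiest_load - local_load) / 2;

	/* Step 2: find the busiest runqueue within that group. */
	src = busiest->first_cpu;
	for (cpu = src + 1; cpu < busiest->first_cpu + busiest->nr_cpus; cpu++)
		if (nr_running[cpu] > nr_running[src])
			src = cpu;

	/* Step 3: the kernel locks both runqueues here before moving tasks. */
	moved = imbalance < nr_running[src] ? imbalance : nr_running[src];
	nr_running[src] -= moved;
	nr_running[this_cpu] += moved;
	return moved;
}
```

With two groups covering CPUs 0-1 and 2-3 and task counts {1, 0, 6, 1},
balancing on behalf of CPU 0 moves three tasks from CPU 2, leaving {4, 0, 3, 1}.
A second call moves nothing because both groups now carry the same load.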

Implementing sched domains
==========================

The "base" domain will "span" the first level of the hierarchy. In the case
of SMT, you'll span all siblings of the physical CPU, with each group being
a single virtual CPU.

In SMP, the parent of the base domain will span all physical CPUs in the
node, with each group being a single physical CPU. Then with NUMA, the parent
of the SMP domain will span the entire machine, with each group having the
cpumask of a node. Alternatively, you could do multi-level NUMA; Opteron, for
example, might have just one domain covering its one NUMA level.
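
To make the SMT->SMP->NUMA layering concrete, the sketch below computes the
span of each level for a given CPU. It assumes a regular machine in which CPU
numbers are contiguous (sibling threads are adjacent, then cores, then nodes)
and at most 64 CPUs exist. toy_topology, toy_range() and toy_span() are
hypothetical names, not part of the kernel's domain builder.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_cpumask;	/* assumption: bit i set => CPU i is in the mask */

enum toy_level { TOY_SMT, TOY_SMP, TOY_NUMA };

/* Hypothetical description of a regular machine. */
struct toy_topology {
	unsigned int threads_per_core;
	unsigned int cores_per_node;
	unsigned int nr_nodes;
};

/* Mask of 'count' consecutive CPUs starting at 'first' (first + count <= 64). */
static toy_cpumask toy_range(unsigned int first, unsigned int count)
{
	toy_cpumask ones = count >= 64 ? ~(toy_cpumask)0
				       : (((toy_cpumask)1 << count) - 1);

	return ones << first;
}

/* Span of the domain at 'level' for 'cpu'. */
static toy_cpumask toy_span(const struct toy_topology *t, unsigned int cpu,
			    enum toy_level level)
{
	unsigned int per_core = t->threads_per_core;
	unsigned int per_node = per_core * t->cores_per_node;

	switch (level) {
	case TOY_SMT:	/* all sibling threads of this physical CPU */
		return toy_range(cpu / per_core * per_core, per_core);
	case TOY_SMP:	/* all CPUs in this node */
		return toy_range(cpu / per_node * per_node, per_node);
	default:	/* TOY_NUMA: the entire machine */
		return toy_range(0, per_node * t->nr_nodes);
	}
}
```

For a machine with 2 threads per core, 2 cores per node and 2 nodes, CPU 5 gets
the spans 0x30 (SMT), 0xF0 (SMP) and 0xFF (NUMA). Each span is a superset of the
one below it, as the span rule requires.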

The implementor should read the comments in include/linux/sched.h (struct
sched_domain fields, SD_FLAG_*, SD_*_INIT) to get an idea of the specifics and
what to tune.

Architectures may override the default SD_*_INIT flags while using the generic
domain builder in kernel/sched/core.c if they wish to retain the traditional
SMT->SMP->NUMA topology (or some subset of that). This can be done by
#define'ing ARCH_HAS_SCHED_TUNE.

Alternatively, the architecture may completely override the generic domain
builder by #define'ing ARCH_HAS_SCHED_DOMAIN and exporting its own
arch_init_sched_domains() function. This function will attach domains to all
CPUs using cpu_attach_domain().

The sched-domains debugging infrastructure can be enabled by enabling
CONFIG_SCHED_DEBUG. This enables an error-checking parse of the sched domains,
which should catch most possible errors (described above). It also prints out
the domain structure in a visual format.