Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

==============
Cgroup Freezer
==============

The cgroup freezer is useful to batch job management systems, which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer is also useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells::

	$ echo $$
	16644
	$ bash
	$ echo $$
	16690

	From a second, unrelated bash shell:
	$ kill -SIGSTOP 16690
	$ kill -SIGCONT 16690

	<at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

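The visibility of SIGCONT can also be shown without terminating any shell.
The following is only an illustrative sketch and not part of the freezer
interface: the function name ``demo_sigcont_visible`` and the temporary log
file are hypothetical, and it assumes a shell whose ``trap`` accepts ``CONT``
and a ``sleep`` that accepts fractional seconds. A child shell installs a
handler for SIGCONT, and the parent then sends SIGSTOP followed by SIGCONT.

```shell
demo_sigcont_visible() {
    log=$(mktemp) || return 1

    # Child: record every SIGCONT it receives, then idle in short sleeps.
    (
        trap 'echo resumed >> "$log"' CONT
        echo ready >> "$log"
        while :; do sleep 0.1; done
    ) >/dev/null 2>&1 &
    child=$!

    # Wait until the child has installed its handler.
    until grep -q ready "$log"; do sleep 0.1; done

    # Stop and resume the child using signals only.
    kill -STOP "$child"
    kill -CONT "$child"

    # Give the child up to about five seconds to run its handler.
    tries=0
    while [ "$tries" -lt 50 ] && ! grep -q resumed "$log"; do
        sleep 0.1
        tries=$((tries + 1))
    done

    # Clean up the child and show what it recorded.
    kill -TERM "$child"
    wait "$child" 2>/dev/null || true
    cat "$log"
    rm -f "$log"
}
```

If the printed log contains ``resumed``, the child observed the resume itself,
which is the kind of visibility described above.
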
Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The cgroup freezer is hierarchical. Freezing a cgroup freezes all
tasks belonging to the cgroup and all its descendant cgroups. Each
cgroup has its own state (self-state) and the state inherited from the
parent (parent-state). Iff both states are THAWED, the cgroup is
THAWED.

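The hierarchical rule can be modelled in a few lines of shell. This is a toy
model for illustration, not kernel code: the function name
``effective_state`` is hypothetical, and each argument stands for the
self-state of one cgroup on the path from the topmost freezable ancestor down
to the cgroup of interest (0 for THAWED, 1 for freezing).

```shell
# Print the effective state for a cgroup, given the self-states of its
# ancestors (top down) followed by its own self-state.
effective_state() {
    for self in "$@"; do
        if [ "$self" -ne 0 ]; then
            # Any freezing cgroup on the path makes the result freezing.
            echo freezing
            return 0
        fi
    done
    # Every self-state on the path is THAWED.
    echo THAWED
}
```

For example, ``effective_state 0 1 0`` prints ``freezing`` because the middle
ancestor is freezing, while ``effective_state 0 0 0`` prints ``THAWED``.
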
The following cgroupfs files are created by the cgroup freezer.

* freezer.state: Read-write.

  When read, returns the effective state of the cgroup - "THAWED",
  "FREEZING" or "FROZEN". This is the combined self and parent-states.
  If either is freezing, the cgroup is freezing (FREEZING or FROZEN).

  A FREEZING cgroup transitions into the FROZEN state when all tasks
  belonging to the cgroup and its descendants become frozen. Note that
  a cgroup reverts to FREEZING from FROZEN after a new task is added
  to the cgroup or one of its descendant cgroups until the new task is
  frozen.

  When written, sets the self-state of the cgroup. Two values are
  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
  if not already freezing, enters the FREEZING state along with all its
  descendant cgroups.

  If THAWED is written, the self-state of the cgroup is changed to
  THAWED.  Note that the effective state may not change to THAWED if
  the parent-state is still freezing. If a cgroup's effective state
  becomes THAWED, all its descendants which are freezing because of
  the cgroup also leave the freezing state.

* freezer.self_freezing: Read only.

  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
  This value is 1 iff the last write to freezer.state was "FROZEN".

* freezer.parent_freezing: Read only.

  Shows the parent-state.  0 if none of the cgroup's ancestors is
  frozen; otherwise, 1.

The root cgroup is non-freezable and the above interface files don't
exist there.

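The three files can be read together to see why a cgroup is or is not
thawed. The helper below is a minimal sketch: the function name
``freezer_report`` and its output format are hypothetical, and the argument
is assumed to be the directory of a non-root cgroup in a mounted freezer
hierarchy.

```shell
# Print the effective state and both contributing states of one cgroup.
freezer_report() {
    dir=$1
    # The root cgroup has no freezer files, so treat their absence as an error.
    if [ ! -r "$dir/freezer.state" ]; then
        echo "no freezer files in $dir" >&2
        return 1
    fi
    printf 'state=%s self=%s parent=%s\n' \
        "$(cat "$dir/freezer.state")" \
        "$(cat "$dir/freezer.self_freezing")" \
        "$(cat "$dir/freezer.parent_freezing")"
}
```

A result such as ``state=FROZEN self=0 parent=1`` would indicate that the
cgroup is frozen only because of an ancestor, so writing THAWED to its own
freezer.state would not change the effective state.
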
* Examples of usage::

   # mkdir /sys/fs/cgroup/freezer
   # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
   # mkdir /sys/fs/cgroup/freezer/0
   # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks

To get the status of the freezer subsystem::

   # cat /sys/fs/cgroup/freezer/0/freezer.state
   THAWED

To freeze all tasks in the container::

   # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   FREEZING
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   FROZEN

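Because the state passes through FREEZING before it reaches FROZEN, a script
normally has to poll after writing FROZEN. The following is a sketch under
stated assumptions: the function name ``wait_until_frozen``, the default of
50 attempts and the 0.1 second interval are all hypothetical choices, and
``sleep`` is assumed to accept fractional seconds.

```shell
# Poll freezer.state in the given cgroup directory until it reads FROZEN.
# Returns 0 on success, 1 if the attempts run out, 2 if the file is unreadable.
wait_until_frozen() {
    dir=$1
    tries=${2:-50}
    while [ "$tries" -gt 0 ]; do
        state=$(cat "$dir/freezer.state") || return 2
        [ "$state" = "FROZEN" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}
```

It could be used as
``echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state && wait_until_frozen /sys/fs/cgroup/freezer/0``.
A bounded number of attempts is used so that a caller is not blocked forever
if some task never becomes frozen.
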
To unfreeze all tasks in the container::

   # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   THAWED

This is the basic mechanism, which should do the right thing for user space
tasks in a simple scenario.