=========================
Process Number Controller
=========================

Abstract
--------

The process number controller is used to allow a cgroup hierarchy to stop any
new tasks from being fork()'d or clone()'d after a certain limit is reached.

Since it is trivial to hit the task limit without hitting any kmemcg limits in
place, PIDs are a fundamental resource. As such, PID exhaustion must be
preventable in the scope of a cgroup hierarchy by allowing resource limiting of
the number of tasks in a cgroup.

Usage
-----

In order to use the `pids` controller, set the maximum number of tasks in
pids.max (this file is not available in the root cgroup, which contains every
task on the system). The number of tasks currently in the cgroup is given by
pids.current.
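
The two files can be combined to see how close a cgroup is to its own limit.
The following is a minimal sketch, not part of any kernel or libcgroup
interface: `pids_headroom` is a hypothetical POSIX shell function that takes a
cgroup directory in a mounted pids hierarchy (for example
/sys/fs/cgroup/pids/parent) and prints how many more tasks that cgroup's own
pids.max still allows. Limits set on ancestor cgroups are deliberately ignored::

	# Hypothetical helper: print the remaining headroom of cgroup dir $1
	# against its own pids.max, or "max" if it is unlimited. Ancestor
	# limits are NOT considered. Uses only the read, [ and echo builtins.
	pids_headroom() {
		read -r max < "$1/pids.max" || return 1
		read -r cur < "$1/pids.current" || return 1
		if [ "$max" = "max" ]; then
			echo max
		elif [ "$cur" -ge "$max" ]; then
			echo 0	# at or over the limit
		else
			echo $((max - cur))
		fi
	}

Because the function relies only on commands that are builtins in common
shells such as bash and dash, calling it directly (rather than inside
`$(...)`, which forks) should work even from a shell whose cgroup is already
at its limit.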

Organisational operations are not blocked by cgroup policies, so it is possible
to have pids.current > pids.max. This can happen either by setting the limit to
be smaller than pids.current, or by attaching enough processes to the cgroup
that pids.current exceeds pids.max. However, it is not possible to violate a
cgroup policy through fork() or clone(): both return -EAGAIN (errno EAGAIN in
userspace) if creating a new process would cause a cgroup policy to be
violated.
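
Since migrations and limit changes can leave a cgroup above its limit, a
monitoring script may want to detect that state. A minimal sketch in POSIX
shell; `pids_over_limit` is a hypothetical helper name, and the result is only
a snapshot, because the counters can change between the two reads::

	# Hypothetical helper: succeed (status 0) if cgroup dir $1 currently
	# holds more tasks than its pids.max, fail (status 1) otherwise, and
	# return 2 if either file cannot be read. "max" is never exceeded.
	pids_over_limit() {
		read -r lim < "$1/pids.max" || return 2
		read -r now < "$1/pids.current" || return 2
		[ "$lim" != "max" ] && [ "$now" -gt "$lim" ] && return 0
		return 1
	}

For instance, after `echo 0 > /sys/fs/cgroup/pids/parent/pids.max` in the
example, `pids_over_limit /sys/fs/cgroup/pids/parent` would succeed while the
shell is still attached, since the shell itself counts towards pids.current.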

To set a cgroup to have no limit, set pids.max to "max". This is the default for
all new cgroups (note that PID limits are hierarchical: a new task is refused if
it would exceed pids.max of its own cgroup or of any ancestor, so the most
stringent limit in the hierarchy applies).
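
The hierarchical rule can be approximated from userspace by walking from a
cgroup up towards the root of the hierarchy. This is a sketch under stated
assumptions, not the kernel's implementation: the kernel performs the check
while charging each level at fork time, which a userspace snapshot cannot
reproduce without races. `pids_can_fork` is a hypothetical helper; its second
argument is assumed to be the mount point of the pids hierarchy (for example
/sys/fs/cgroup/pids), which is excluded from the walk because the root cgroup
has no pids.max. Paths are assumed to have no trailing slash::

	# Hypothetical helper: succeed if one more task fits under pids.max of
	# cgroup dir $1 and of every ancestor below the hierarchy root $2.
	# Returns 1 if some level would exceed its limit, 2 on read errors.
	pids_can_fork() {
		dir=$1
		while [ "$dir" != "$2" ]; do
			read -r lim < "$dir/pids.max" || return 2
			read -r now < "$dir/pids.current" || return 2
			if [ "$lim" != "max" ] && [ $((now + 1)) -gt "$lim" ]; then
				return 1	# this level would go over its limit
			fi
			dir=${dir%/*}	# move up to the parent cgroup
		done
		return 0
	}

With the limits from the example,
`pids_can_fork /sys/fs/cgroup/pids/parent/child /sys/fs/cgroup/pids` would
report failure whenever parent/pids.current has reached 2, even though
child/pids.max is "max".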

pids.current also counts the tasks in all descendant cgroups, so
parent/pids.current includes every task counted in parent/child/pids.current.
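
One consequence is that a parent's pids.current is at least the sum of its
direct children's counters, the difference being the tasks attached to the
parent itself. A minimal sketch that computes that sum, using a hypothetical
helper name; child directories without a pids.current file are skipped, and
the reads are not atomic, so the numbers can drift on a busy system::

	# Hypothetical helper: print the sum of pids.current over the direct
	# child cgroups (subdirectories) of cgroup dir $1.
	pids_children_sum() {
		total=0
		for child in "$1"/*/; do
			[ -f "${child}pids.current" ] || continue
			read -r n < "${child}pids.current" || return 2
			total=$((total + n))
		done
		echo "$total"
	}

Comparing `pids_children_sum /sys/fs/cgroup/pids/parent` with the contents of
parent/pids.current shows how many tasks live directly in parent.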

The pids.events file contains event counters:

- max: Number of times fork failed because the limit was hit.
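
The counter can be read with a small parser. This sketch assumes pids.events
uses a flat "key value" format with one counter per line (such as "max 0"),
skips any other keys that may be present, and uses a hypothetical helper
name::

	# Hypothetical helper: print the value of the "max" counter from the
	# pids.events file of cgroup dir $1. Returns 1 if the key is absent
	# and 2 if the file cannot be read.
	pids_max_events() {
		[ -r "$1/pids.events" ] || return 2
		while read -r key val; do
			if [ "$key" = "max" ]; then
				echo "$val"
				return 0
			fi
		done < "$1/pids.events"
		return 1
	}

A value that keeps increasing indicates that tasks in the cgroup are
repeatedly running into the limit.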

Example
-------

First, we mount the pids controller::

	# mkdir -p /sys/fs/cgroup/pids
	# mount -t cgroup -o pids none /sys/fs/cgroup/pids

Then we create a hierarchy, set limits and attach processes to it::

	# mkdir -p /sys/fs/cgroup/pids/parent/child
	# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
	# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs
	# cat /sys/fs/cgroup/pids/parent/pids.current
	2
	#

Here pids.current is 2 because it counts both the shell and the cat process
reading the file. Attempts to exceed the limit (2 in this case) will fail::

	# cat /sys/fs/cgroup/pids/parent/pids.current
	2
	# ( /bin/echo "Here's some processes for you." | cat )
	sh: fork: Resource temporarily unavailable
	#

Even if we migrate to a child cgroup (which doesn't have a set limit), we will
not be able to exceed the most stringent limit in the hierarchy (in this case,
the parent's)::

	# echo $$ > /sys/fs/cgroup/pids/parent/child/cgroup.procs
	# cat /sys/fs/cgroup/pids/parent/pids.current
	2
	# cat /sys/fs/cgroup/pids/parent/child/pids.current
	2
	# cat /sys/fs/cgroup/pids/parent/child/pids.max
	max
	# ( /bin/echo "Here's some processes for you." | cat )
	sh: fork: Resource temporarily unavailable
	#

We can also set a limit that is no larger than pids.current, which will stop
any new processes from being forked at all (note that the shell itself counts
towards pids.current)::

	# echo 1 > /sys/fs/cgroup/pids/parent/pids.max
	# /bin/echo "We can't even spawn a single process now."
	sh: fork: Resource temporarily unavailable
	# echo 0 > /sys/fs/cgroup/pids/parent/pids.max
	# /bin/echo "We can't even spawn a single process now."
	sh: fork: Resource temporarily unavailable
	#