===========================
Device Whitelist Controller
===========================

.. Provenance (from git blame): commit 8f3ce5b39, author kx, 2023-10-28 12:00:06 +0300.

1. Description
==============

Implement a cgroup to track and enforce open and mknod restrictions
on device files.  A device cgroup associates a device access
whitelist with each cgroup.  A whitelist entry has 4 fields: type,
major, minor, and access.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.
Major and minor are either an integer or * for all.  Access is a
composition of r (read), w (write), and m (mknod).

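The entry format described above can be checked before a string is written
to a control file.  The following is only a minimal sketch in POSIX shell:
the helper name ``valid_entry`` is hypothetical and not part of any kernel
interface, and it checks nothing beyond the four fields listed above (plus
the bare ``a`` shorthand used later in this document).

```shell
# valid_entry ENTRY  (hypothetical helper, not a kernel interface)
# Returns 0 if ENTRY looks like "type major:minor access", non-zero otherwise.
valid_entry() (
    entry=$1
    # A bare "a" is accepted as shorthand for all devices.
    if [ "$entry" = a ]; then exit 0; fi

    set -f                      # keep '*' literal while splitting into words
    set -- $entry
    [ $# -eq 3 ] || exit 1
    kind=$1 devno=$2 access=$3

    # type: a (all), c (char) or b (block)
    case $kind in
        a|b|c) ;;
        *) exit 1 ;;
    esac

    # major:minor, each either '*' or a decimal integer
    case $devno in
        *:*) ;;
        *) exit 1 ;;
    esac
    major=${devno%%:*}
    minor=${devno#*:}
    for n in "$major" "$minor"; do
        case $n in
            '*') ;;
            ''|*[!0-9]*) exit 1 ;;
        esac
    done

    # access: a non-empty combination of r, w and m
    case $access in
        ''|*[!rwm]*) exit 1 ;;
    esac
    exit 0
)
```

For example, ``valid_entry 'c 1:3 mr'`` succeeds, while
``valid_entry 'c 1:3 rwx'`` fails because ``x`` is not an access letter.
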
The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent's whitelist.  Administrators can then
remove devices from the whitelist or add new entries.  A child cgroup
can never receive a device access which is denied by its parent.

2. User Interface
=================

An entry is added using devices.allow, and removed using
devices.deny.  For instance::

        echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing::

        echo a > /sys/fs/cgroup/1/devices.deny

will remove the default 'a *:* rwm' entry.  Doing::

        echo a > /sys/fs/cgroup/1/devices.allow

will add the 'a *:* rwm' entry to the whitelist.

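The two control files can be combined to reduce a cgroup to a short
whitelist.  The sketch below is illustrative only: ``restrict_devices`` is a
hypothetical helper name, the cgroup directory is passed in as a parameter,
and the chosen devices (1:3, described above as /dev/null, and 1:5, which
is conventionally /dev/zero) are assumptions made for the example.  Each
entry is written separately, as in the examples above.

```shell
# restrict_devices CGROUP_DIR  (hypothetical helper)
# Drops the default 'a *:* rwm' entry, then re-adds two character devices.
restrict_devices() (
    dir=$1
    # Remove access to everything first.
    echo a > "$dir/devices.deny" || exit 1
    # Re-add the wanted entries, one write per entry.
    for entry in 'c 1:3 rwm' 'c 1:5 rwm'; do
        echo "$entry" > "$dir/devices.allow" || exit 1
    done
)
```

It would be called as ``restrict_devices /sys/fs/cgroup/1``, using the path
from the examples above.  Because it writes ``a`` to devices.deny, it is
meant for a cgroup that does not yet have children.
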
3. Security
===========

Any task can move itself between cgroups.  This clearly won't
suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this.  We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD.  We may want to just refuse moving to a cgroup which
isn't a descendant of the current one.  Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root.

CAP_SYS_ADMIN is needed to modify the whitelist or move another
task to a new cgroup.  (Again, we'll probably want to change that.)

A cgroup may not be granted more permissions than the cgroup's
parent has.

4. Hierarchy
============

Device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent.  Every time an entry is written to
a cgroup's devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated.  If one of the locally set whitelist entries would provide
more access than the cgroup's parent, it'll be removed from the whitelist.

Example::

              A
             / \
                B

        group        behavior    exceptions
        A            allow       "b 8:* rwm", "c 116:1 rw"
        B            deny        "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a device is denied in group A::

        # echo "c 116:* r" > A/devices.deny

it'll propagate down and after revalidating B's entries, the whitelist entry
"c 116:2 rwm" will be removed::

        group        whitelist entries                        denied devices
        A            all                                      "b 8:* rwm", "c 116:* rw"
        B            "c 1:3 rwm", "b 3:* rwm"                 all the rest

If the parent's exceptions change and local exceptions are no longer
allowed, they'll be deleted.

Notice that new whitelist entries will not be propagated::

              A
             / \
                B

        group        whitelist entries                        denied devices
        A            "c 1:3 rwm", "c 1:5 r"                   all the rest
        B            "c 1:3 rwm", "c 1:5 r"                   all the rest

when adding ``c *:3 rwm``::

        # echo "c *:3 rwm" >A/devices.allow

the result::

        group        whitelist entries                        denied devices
        A            "c *:3 rwm", "c 1:5 r"                   all the rest
        B            "c 1:3 rwm", "c 1:5 r"                   all the rest

but now it'll be possible to add new entries to B::

        # echo "c 2:3 rwm" >B/devices.allow
        # echo "c 50:3 r" >B/devices.allow

or even::

        # echo "c *:3 rwm" >B/devices.allow

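The rule behind the examples above (B may add an entry only if A already
permits it) can be illustrated with a small comparison of two entries.  This
is a simplified model, not the kernel's code: ``entry_covers`` is a
hypothetical helper, and it compares one parent entry against one child
entry rather than a whole whitelist.

```shell
# entry_covers PARENT_ENTRY CHILD_ENTRY  (hypothetical helper)
# Returns 0 if the child entry asks for nothing beyond the parent entry.
entry_covers() (
    set -f                      # keep '*' literal while splitting into words
    set -- $1 $2
    [ $# -eq 6 ] || exit 2      # both entries must have three fields
    ptype=$1 pdev=$2 pacc=$3
    ctype=$4 cdev=$5 cacc=$6

    # Type: the parent is 'a' (all) or the same type as the child.
    [ "$ptype" = a ] || [ "$ptype" = "$ctype" ] || exit 1

    # Major and minor: the parent is '*' or identical to the child.
    pmaj=${pdev%%:*}
    pmin=${pdev#*:}
    [ "$pmaj" = '*' ] || [ "$pmaj" = "${cdev%%:*}" ] || exit 1
    [ "$pmin" = '*' ] || [ "$pmin" = "${cdev#*:}" ] || exit 1

    # Access: every letter the child asks for must be in the parent entry.
    for a in r w m; do
        case $cacc in
            *"$a"*)
                case $pacc in
                    *"$a"*) ;;
                    *) exit 1 ;;
                esac ;;
        esac
    done
    exit 0
)
```

With A holding ``c *:3 rwm``, ``entry_covers 'c *:3 rwm' 'c 2:3 rwm'``
succeeds, which matches the additions to B shown above.  Before the wildcard
entry was added, ``entry_covers 'c 1:3 rwm' 'c 2:3 rwm'`` would fail.
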
Allowing or denying all by writing 'a' to devices.allow or devices.deny will
not be possible once the device cgroup has children.

4.1 Hierarchy (internal implementation)
---------------------------------------

The device cgroup is implemented internally using a behavior (ALLOW, DENY) and a
list of exceptions.  The internal state is controlled using the same user
interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on the parent's current access rules.
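
The behavior-plus-exceptions model can be illustrated with a toy evaluator.
This is a sketch under stated assumptions, not the kernel implementation:
the helper names ``exc_matches`` and ``access_allowed`` are hypothetical,
the behavior is spelled ``allow`` or ``deny`` as in the tables of section 4,
and under ``deny`` a request is accepted only when a single exception covers
all of its access letters.

```shell
# exc_matches EXCEPTION REQUEST MODE  (hypothetical helper)
# MODE "any": at least one requested access letter is in the exception.
# MODE "all": every requested access letter is in the exception.
exc_matches() (
    mode=$3
    set -f                      # keep '*' literal while splitting into words
    set -- $1 $2
    [ $# -eq 6 ] || exit 2
    etype=$1 edev=$2 eacc=$3
    rtype=$4 rdev=$5 racc=$6
    [ "$etype" = a ] || [ "$etype" = "$rtype" ] || exit 1
    emaj=${edev%%:*}
    emin=${edev#*:}
    [ "$emaj" = '*' ] || [ "$emaj" = "${rdev%%:*}" ] || exit 1
    [ "$emin" = '*' ] || [ "$emin" = "${rdev#*:}" ] || exit 1
    hit=0
    miss=0
    for a in r w m; do
        case $racc in
            *"$a"*)
                case $eacc in
                    *"$a"*) hit=1 ;;
                    *) miss=1 ;;
                esac ;;
        esac
    done
    if [ "$mode" = any ]; then
        [ "$hit" -eq 1 ]
    else
        [ "$miss" -eq 0 ]
    fi
)

# access_allowed BEHAVIOR REQUEST [EXCEPTION ...]  (hypothetical helper)
access_allowed() (
    behavior=$1
    request=$2
    shift 2
    for exc in "$@"; do
        if [ "$behavior" = allow ]; then
            # Default allow: the exceptions list what is denied.
            if exc_matches "$exc" "$request" any; then exit 1; fi
        else
            # Default deny: the exceptions list what is permitted.
            if exc_matches "$exc" "$request" all; then exit 0; fi
        fi
    done
    # No exception applied, so the default behavior decides.
    [ "$behavior" = allow ]
)
```

Using group B from the first example in section 4,
``access_allowed deny 'c 1:3 r' 'c 1:3 rwm' 'c 116:2 rwm' 'b 3:* rwm'``
succeeds, while the same call with the request ``c 1:5 r`` fails because no
exception covers it.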