Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

===========================
Device Whitelist Controller
===========================

1. Description
==============

The device cgroup tracks and enforces open and mknod restrictions
on device files.  It associates a device access whitelist with each
cgroup.  A whitelist entry has four fields: type, major, minor, and
access.  Type is a (all), c (char), or b (block); 'all' means the
entry applies to all types and all major and minor numbers.  Major
and minor are each either an integer or * for all.  Access is a
composition of r (read), w (write), and m (mknod).
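
The four-field format can be checked mechanically before anything is
written to a control file.  The following is a minimal POSIX shell sketch,
not the kernel's parser: the helper name devcg_entry_valid is hypothetical,
and the pattern assumes single spaces between fields and accepts either a
bare 'a' or the full 'a *:* rwm' form for the 'all' entry.

```shell
# devcg_entry_valid ENTRY
# Succeeds when ENTRY looks like a well-formed whitelist entry:
#   a                      (all devices)
#   TYPE MAJOR:MINOR ACCESS  with TYPE b or c, MAJOR/MINOR an integer or *,
#                            and ACCESS one to three letters from r, w, m.
# This is an approximation of the accepted syntax, for illustration only.
devcg_entry_valid() {
    printf '%s\n' "$1" |
        grep -Eq '^(a( \*:\* rwm)?|[bc] ([0-9]+|\*):([0-9]+|\*) [rwm]{1,3})$'
}
```

For example, ``devcg_entry_valid 'c 1:3 mr'`` succeeds, while
``devcg_entry_valid 'x 1:3 r'`` fails because x is not a known type.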

The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent.  Administrators can then remove
devices from the whitelist or add new entries.  A child cgroup can
never receive a device access which is denied by its parent.

2. User Interface
=================

An entry is added using devices.allow, and removed using
devices.deny.  For instance::

	echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing::

	echo a > /sys/fs/cgroup/1/devices.deny

will remove the default 'a *:* rwm' entry.  Doing::

	echo a > /sys/fs/cgroup/1/devices.allow

will add the 'a *:* rwm' entry to the whitelist.
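
The numbers in an entry are the major and minor numbers of the device
node.  As a sketch, the hypothetical helper below builds an entry string
from an existing node.  It assumes GNU stat, whose %t and %T formats print
the major and minor numbers in hexadecimal; other stat implementations may
need different options.

```shell
# devcg_entry_for NODE ACCESS
# Prints a whitelist entry such as "c 1:3 rm" for the device node NODE.
# Fails when NODE is not a character or block device.
devcg_entry_for() {
    node=$1
    access=$2
    if [ -c "$node" ]; then
        kind=c
    elif [ -b "$node" ]; then
        kind=b
    else
        echo "devcg_entry_for: $node is not a device node" >&2
        return 1
    fi
    # -L follows symlinks, matching what the -c and -b tests above do.
    hex_major=$(stat -L -c '%t' "$node") || return 1
    hex_minor=$(stat -L -c '%T' "$node") || return 1
    # printf converts the hexadecimal values to decimal.
    printf '%s %d:%d %s\n' "$kind" "0x$hex_major" "0x$hex_minor" "$access"
}
```

The output can then be written to a control file, for example
``devcg_entry_for /dev/null rm > /sys/fs/cgroup/1/devices.allow``.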

3. Security
===========

Any task can move itself between cgroups.  This clearly won't
suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this.  We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD.  We may want to just refuse moving to a cgroup which
isn't a descendant of the current one.  Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root.

CAP_SYS_ADMIN is needed to modify the whitelist or move another
task to a new cgroup.  (Again, we'll probably want to change that.)

A cgroup may not be granted more permissions than the cgroup's
parent has.

4. Hierarchy
============

Device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent.  Every time an entry is written to
a cgroup's devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated.  If one of the locally set whitelist entries would provide
more access than the cgroup's parent, it will be removed from the whitelist.

Example::

      A
     / \
        B

    group        behavior     exceptions
    A            allow        "b 8:* rwm", "c 116:1 rw"
    B            deny         "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a device is denied in group A::

	# echo "c 116:* r" > A/devices.deny

it will propagate down and, after revalidating B's entries, the whitelist
entry "c 116:2 rwm" will be removed::

    group        whitelist entries                        denied devices
    A            all                                      "b 8:* rwm", "c 116:* rw"
    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest

If the parent's exceptions change and local exceptions are no longer
allowed, they will be deleted.

Notice that new whitelist entries will not be propagated::

      A
     / \
        B

    group        whitelist entries                        denied devices
    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

when adding ``c *:3 rwm``::

	# echo "c *:3 rwm" >A/devices.allow

the result::

    group        whitelist entries                        denied devices
    A            "c *:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

but now it will be possible to add new entries to B::

	# echo "c 2:3 rwm" >B/devices.allow
	# echo "c 50:3 r" >B/devices.allow

or even::

	# echo "c *:3 rwm" >B/devices.allow
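
The reason these writes to B succeed can be sketched as a coverage test
between one parent entry and one child entry.  The helper below is a
simplified, whitelist-only model and not the kernel's algorithm: the name
devcg_entry_covers is hypothetical, both entries are assumed to be of type
b or c, and the parent is assumed to deny everything it does not list.

```shell
# devcg_entry_covers PARENT_ENTRY CHILD_ENTRY
# Succeeds when CHILD_ENTRY grants nothing beyond PARENT_ENTRY.
# Both arguments are "type major:minor access" strings with type b or c.
devcg_entry_covers() {
    p_type=${1%% *}; p_rest=${1#* }
    c_type=${2%% *}; c_rest=${2#* }
    p_dev=${p_rest%% *}; p_acc=${p_rest#* }
    c_dev=${c_rest%% *}; c_acc=${c_rest#* }

    [ "$p_type" = "$c_type" ] || return 1
    # A parent wildcard covers any child value; otherwise they must be equal.
    [ "${p_dev%%:*}" = '*' ] || [ "${p_dev%%:*}" = "${c_dev%%:*}" ] || return 1
    [ "${p_dev#*:}" = '*' ] || [ "${p_dev#*:}" = "${c_dev#*:}" ] || return 1
    # Every access letter requested by the child must be held by the parent.
    while [ -n "$c_acc" ]; do
        letter=${c_acc%"${c_acc#?}"}    # first remaining letter
        case $p_acc in
            *"$letter"*) ;;
            *) return 1 ;;
        esac
        c_acc=${c_acc#?}
    done
    return 0
}
```

With A holding "c *:3 rwm", ``devcg_entry_covers 'c *:3 rwm' 'c 2:3 rwm'``
succeeds, whereas against A's earlier entry "c 1:3 rwm" the same child entry
is not covered.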

Allowing or denying all by writing 'a' to devices.allow or devices.deny will
not be possible once the device cgroup has children.

4.1 Hierarchy (internal implementation)
---------------------------------------

The device cgroup is implemented internally using a behavior (ALLOW, DENY)
and a list of exceptions.  The internal state is controlled using the same
user interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on the parent's current access rules.
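
How a behavior and an exception list combine into an access decision can be
sketched in POSIX shell.  This is an illustrative model under stated
assumptions, not the kernel code: all function names are hypothetical, the
request names one concrete device, and the sketch assumes that under the
allow behavior any overlap with an exception denies the request, while under
the deny behavior a single exception must grant every requested access
letter.

```shell
# Succeeds when exception $1 ("c 116:1 rw") applies to device $2 $3:$4.
devcg_applies() {
    a_type=${1%% *}; a_rest=${1#* }; a_dev=${a_rest%% *}
    [ "$a_type" = "$2" ] || return 1
    [ "${a_dev%%:*}" = '*' ] || [ "${a_dev%%:*}" = "$3" ] || return 1
    [ "${a_dev#*:}" = '*' ] || [ "${a_dev#*:}" = "$4" ] || return 1
}

# Prints how many letters of $1 also occur in $2.
devcg_common() {
    n=0; rest=$1
    while [ -n "$rest" ]; do
        letter=${rest%"${rest#?}"}    # first remaining letter
        case $2 in *"$letter"*) n=$((n + 1)) ;; esac
        rest=${rest#?}
    done
    echo "$n"
}

# devcg_may_access BEHAVIOR TYPE MAJOR MINOR ACCESS [EXCEPTION...]
# BEHAVIOR is "allow" or "deny"; ACCESS is a non-empty string of r, w, m.
devcg_may_access() {
    behavior=$1; dtype=$2; major=$3; minor=$4; access=$5
    shift 5
    for ex in "$@"; do
        devcg_applies "$ex" "$dtype" "$major" "$minor" || continue
        common=$(devcg_common "$access" "${ex##* }")
        if [ "$behavior" = allow ]; then
            # Default allow: any overlap with a listed exception denies.
            [ "$common" -eq 0 ] || return 1
        else
            # Default deny: one exception must grant every requested letter.
            [ "$common" -ne "${#access}" ] || return 0
        fi
    done
    # No exception decided the request, so the default behavior applies.
    [ "$behavior" = allow ]
}
```

Using group A from the example in section 4,
``devcg_may_access allow c 116 1 r "b 8:* rwm" "c 116:1 rw"`` fails, because
read access to c 116:1 is listed as an exception to the allow behavior.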