.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, which
provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.

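As an illustration of the attach step, the following is a minimal user-space
sketch that issues ``BPF_PROG_ATTACH`` through the raw ``bpf(2)`` syscall. The
function name ``attach_sysctl_prog`` is hypothetical, and the sketch assumes
the caller already holds a file descriptor for the cgroup directory and one
for a loaded ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program. Real loaders would more
likely go through a library wrapper than the raw syscall.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach an already loaded program to a cgroup.
 * cgroup_fd: fd of the opened cgroup directory (assumed to be open).
 * prog_fd:   fd of the loaded BPF_PROG_TYPE_CGROUP_SYSCTL program.
 * Returns 0 on success, -1 with errno set on failure.
 */
int attach_sysctl_prog(int cgroup_fd, int prog_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.target_fd = cgroup_fd;
	attr.attach_bpf_fd = prog_fd;
	attr.attach_type = BPF_CGROUP_SYSCTL;

	return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

The call fails with ``-1`` when either descriptor is invalid or the caller
lacks the required privileges, so the return value should always be checked.
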
2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or
  written (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, whether read or written. This field is read-write. Writing to the
  field sets the starting position in the sysctl proc file that ``read(2)``
  will read from or ``write(2)`` will write to. Writing zero to the field can
  be used, for example, to override the whole sysctl value with
  ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when user space called
  it at ``file_pos > 0``. Writing a non-zero value to the field can be used to
  access part of the sysctl value starting from the specified ``file_pos``.
  Not all sysctls support access with ``file_pos != 0``; for example, writes
  to numeric sysctl entries must always be at file position ``0``. See also
  the ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how the context fields can be accessed.

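The two context fields can be exercised with a short sketch. It reads
``write`` and, for writes that start at a non-zero position, resets
``file_pos`` to zero. The struct is a local mirror of the context above so
that the sketch builds without kernel headers, ``SEC()`` is stubbed (in a real
build it is assumed to come from libbpf's ``bpf_helpers.h``), and the function
name ``sysctl_rewind_writes`` is hypothetical. Whether rewinding writes is a
sensible policy depends on the deployment; it is shown only to illustrate that
the field is writable.

```c
#include <stdint.h>

/* Local mirror of the context struct; a real program uses <linux/bpf.h>. */
struct bpf_sysctl {
	uint32_t write;
	uint32_t file_pos;
};

/* Assumption: SEC() is normally provided by libbpf; stubbed for user space. */
#ifndef SEC
#define SEC(name)
#endif

SEC("cgroup/sysctl")
int sysctl_rewind_writes(struct bpf_sysctl *ctx)
{
	/* write is read-only for the program: 0 means read, 1 means write. */
	if (ctx->write && ctx->file_pos != 0)
		ctx->file_pos = 0;	/* file_pos is read-write */

	return 1;			/* proceed with the access */
}
```

Reads are left untouched, so a ``read(2)`` at a non-zero position still starts
at the position user space asked for.
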
3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.

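The return codes above can be sketched with a hypothetical "read-only" policy
that lets every read through and rejects every write. As before, the struct is
a local mirror of the context, ``SEC()`` is a stub standing in for the libbpf
macro, and the function name is made up for the example.

```c
#include <stdint.h>

/* Local mirror of the context struct; a real program uses <linux/bpf.h>. */
struct bpf_sysctl {
	uint32_t write;
	uint32_t file_pos;
};

/* Assumption: SEC() is normally provided by libbpf; stubbed for user space. */
#ifndef SEC
#define SEC(name)
#endif

SEC("cgroup/sysctl")
int sysctl_read_only(struct bpf_sysctl *ctx)
{
	if (ctx->write)
		return 0;	/* reject: write(2) fails with EPERM */

	return 1;		/* proceed with the read */
}
```

With such a program attached, a ``write(2)`` to any sysctl from inside the
cgroup would be expected to fail with ``EPERM``, while reads behave as usual.
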
4. Helpers
**********

Since a sysctl knob is represented by a name and a value, the sysctl-specific
BPF helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
  the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to the sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value being written to
  the sysctl before the actual write happens. This helper can be used only
  when ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value being
  written to the sysctl before the actual write happens. The sysctl value will
  be overridden starting from the current ``ctx->file_pos``. If the whole
  value has to be overridden, the BPF program can set ``file_pos`` to zero
  before calling the helper. This helper can be used only when
  ``ctx->write == 1``. The new string value set by the helper is treated and
  verified by the kernel in the same way as an equivalent string passed by
  user space.

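A rough sketch of how the name helper might drive a decision follows. Kernel
helpers are not callable from user space, so ``bpf_sysctl_get_name()`` is
replaced here by a stand-in that only imitates the helper's signature and
return convention; ``stub_name`` and the policy function
``sysctl_guard_tcp_mem`` are hypothetical. The policy allows all reads and
rejects writes to ``net/ipv4/tcp_mem``.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the context struct; a real program uses <linux/bpf.h>. */
struct bpf_sysctl {
	uint32_t write;
	uint32_t file_pos;
};

/*
 * User-space stand-in for the kernel helper (assumption, illustration only).
 * It reports stub_name instead of the sysctl that is actually accessed.
 */
static const char *stub_name = "net/ipv4/tcp_mem";

static long bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf,
				size_t buf_len, uint64_t flags)
{
	size_t len = strlen(stub_name);

	(void)ctx;
	(void)flags;
	if (len >= buf_len)
		return -E2BIG;	/* name does not fit into the buffer */
	memcpy(buf, stub_name, len + 1);
	return (long)len;	/* length without the trailing NUL */
}

/* Hypothetical policy: allow reads, deny writes to net/ipv4/tcp_mem. */
int sysctl_guard_tcp_mem(struct bpf_sysctl *ctx)
{
	static const char guarded[] = "net/ipv4/tcp_mem";
	char name[64];
	long len;
	size_t i;

	if (!ctx->write)
		return 1;

	len = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
	if (len < 0)
		return 0;	/* could not read the name: fail closed */
	if ((size_t)len != sizeof(guarded) - 1)
		return 1;	/* different length, so a different sysctl */

	/* Byte-wise compare, since libc string functions are not available
	 * inside a BPF program. */
	for (i = 0; i < sizeof(guarded) - 1; i++)
		if (name[i] != guarded[i])
			return 1;

	return 0;		/* write to the guarded sysctl: reject */
}
```

Failing closed when the name cannot be read is a design choice of this sketch,
not something the hook requires.
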
A BPF program sees the sysctl value the same way user space does in the proc
filesystem, i.e. as a string. Since many sysctl values represent an integer or
a vector of integers, the following helpers can be used to get a numeric value
from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
  similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.

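Parsing a vector of integers out of the string value can be sketched as
follows. The ``bpf_strtoul()`` below is a user-space approximation built on
``strtoul(3)`` that mimics the helper's "return the number of characters
consumed" convention; it is an assumption made for the example and does not
reproduce every detail of the kernel helper. The check itself, that three
values appear in non-decreasing order, is a hypothetical policy, and
``tcp_mem_is_sane`` is a made-up name. In a real program the string would come
from ``bpf_sysctl_get_new_value()`` or ``bpf_sysctl_get_current_value()``.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * User-space approximation of the kernel helper (assumption): parse one
 * unsigned long from buf, looking at no more than buf_len bytes, and return
 * the number of characters consumed or a negative errno value.
 */
static long bpf_strtoul(const char *buf, size_t buf_len, uint64_t flags,
			unsigned long *res)
{
	char tmp[32];
	char *end;
	size_t n = 0;

	/* Copy into a NUL-terminated scratch buffer for strtoul(3). */
	while (n < buf_len && n < sizeof(tmp) - 1 && buf[n] != '\0') {
		tmp[n] = buf[n];
		n++;
	}
	tmp[n] = '\0';

	errno = 0;
	*res = strtoul(tmp, &end, (int)(flags & 0x1f));	/* low bits: base */
	if (end == tmp)
		return -EINVAL;	/* no digits found */
	if (errno == ERANGE)
		return -ERANGE;	/* value does not fit */
	return (long)(end - tmp);
}

/* Returns 1 if value holds three integers in non-decreasing order, else 0. */
int tcp_mem_is_sane(const char *value, size_t value_len)
{
	unsigned long vec[3];
	size_t off = 0;
	long ret;
	int i;

	for (i = 0; i < 3; i++) {
		if (off >= value_len)
			return 0;
		ret = bpf_strtoul(value + off, value_len - off, 0, &vec[i]);
		if (ret <= 0)
			return 0;	/* not a number: treat as invalid */
		off += (size_t)ret;	/* continue after the parsed number */
	}

	return vec[0] <= vec[1] && vec[1] <= vec[2];
}
```

Advancing the offset by the helper's return value is what lets a single buffer
be parsed as a sequence of numbers.
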
5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or to catch unreasonable
values that an application, running as root in a separate cgroup, is trying to
set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time. That is,
the process that opened the sysctl file in the proc filesystem may differ from
the process that is trying to read from or write to it, and the two processes
may run in different cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL``
should not be used as a security mechanism to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to detach or
replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c