======================
No New Privileges Flag
======================

The execve system call can grant a newly-started program privileges that
its parent did not have. The most obvious examples are setuid/setgid
programs and file capabilities. To prevent the parent program from
gaining these privileges as well, the kernel and user code must be
careful to prevent the parent from doing anything that could subvert the
child. For example:

- The dynamic loader handles ``LD_*`` environment variables differently if
  a program is setuid.

- chroot is disallowed to unprivileged processes, since it would allow
  ``/etc/passwd`` to be replaced from the point of view of a process that
  inherited chroot.

- The exec code has special handling for ptrace.

These are all ad-hoc fixes. The ``no_new_privs`` bit (since Linux 3.5) is a
new, generic mechanism to make it safe for a process to modify its
execution environment in a manner that persists across execve. Any task
can set ``no_new_privs``. Once the bit is set, it is inherited across fork,
clone, and execve and cannot be unset. With ``no_new_privs`` set, ``execve()``
promises not to grant the privilege to do anything that could not have
been done without the execve call. For example, the setuid and setgid
bits will no longer change the uid or gid; file capabilities will not
add to the permitted set, and LSMs will not relax constraints after
execve.
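
The inheritance rule can be observed from user space. The following is a
minimal sketch, not a reference implementation: ``bit_seen_by_grandchild()``
is a hypothetical helper name, and the sketch relies on
``PR_GET_NO_NEW_PRIVS``, the ``prctl(2)`` query operation that reads the
bit back. A child sets the bit and forks again; the grandchild reports
what it inherited through its exit status.

.. code-block:: c

    #include <sys/prctl.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: returns the bit as seen by a grandchild whose
     * parent set no_new_privs before forking (1 if inherited, 0 if not),
     * or -1 if any step failed.
     */
    static int bit_seen_by_grandchild(void)
    {
        int status;
        pid_t child = fork();

        if (child < 0)
            return -1;

        if (child == 0) {
            pid_t grandchild;

            /* Only this child (and its descendants) get the bit. */
            if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
                _exit(2);

            grandchild = fork();
            if (grandchild < 0)
                _exit(2);
            if (grandchild == 0) {
                /* Exit status 1 or 0 mirrors the inherited bit. */
                _exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 1 : 0);
            }

            if (waitpid(grandchild, &status, 0) < 0 || !WIFEXITED(status))
                _exit(2);
            _exit(WEXITSTATUS(status));
        }

        if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        /* Exit status 2 marks a failure inside the child. */
        return WEXITSTATUS(status) <= 1 ? WEXITSTATUS(status) : -1;
    }

The caller's own bit is not touched by this helper, because the bit is
set only in the forked child; inheritance flows to descendants, never
back to the parent.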

To set ``no_new_privs``, use::

  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
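
As a slightly fuller sketch of the call above, the helpers below set the
bit, read it back, and show that an attempt to clear it is rejected. The
function names are hypothetical; ``PR_GET_NO_NEW_PRIVS`` and the
``EINVAL`` result for a second argument other than 1 are taken from the
``prctl(2)`` interface rather than from this document.

.. code-block:: c

    #include <errno.h>
    #include <sys/prctl.h>

    /* Set the bit on the calling thread; 0 on success, -1 with errno set. */
    static int set_no_new_privs(void)
    {
        return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    }

    /* Read the bit back: 1 if set, 0 if clear, -1 on error. */
    static int get_no_new_privs(void)
    {
        return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
    }

    /*
     * Try to clear the bit. The kernel accepts only 1 as the second
     * argument, so this is expected to fail. Returns the errno observed,
     * or 0 if the call unexpectedly succeeded.
     */
    static int try_clear_no_new_privs(void)
    {
        errno = 0;
        if (prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) == 0)
            return 0;
        return errno;
    }

A caller would typically invoke ``set_no_new_privs()`` once, early, and
treat a nonzero return as fatal, since later sandboxing steps may depend
on the bit being set.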

Be careful, though: LSMs might also not tighten constraints on exec
in ``no_new_privs`` mode. (This means that setting up a general-purpose
service launcher to set ``no_new_privs`` before execing daemons may
interfere with LSM-based sandboxing.)

Note that ``no_new_privs`` does not prevent privilege changes that do not
involve ``execve()``. An appropriately privileged task can still call
``setuid(2)`` and receive SCM_RIGHTS datagrams.

There are two main use cases for ``no_new_privs`` so far:

- Filters installed for the seccomp mode 2 sandbox persist across
  execve and can change the behavior of newly-executed programs.
  Unprivileged users are therefore only allowed to install such filters
  if ``no_new_privs`` is set.

- By itself, ``no_new_privs`` can be used to reduce the attack surface
  available to an unprivileged user. If everything running with a
  given uid has ``no_new_privs`` set, then that uid will be unable to
  escalate its privileges by directly attacking setuid, setgid, and
  fcap-using binaries; it will need to compromise something without the
  ``no_new_privs`` bit set first.
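
The seccomp use case can be sketched as follows. This is an illustration
under stated assumptions, not a usable sandbox: the helper names are
hypothetical, and the filter is a single instruction that allows every
system call, chosen only so that the example has no side effects. Without
``no_new_privs`` (and without ``CAP_SYS_ADMIN``), the
``PR_SET_SECCOMP`` call is expected to fail with ``EACCES``.

.. code-block:: c

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <sys/prctl.h>

    /* Install a one-instruction seccomp filter that allows every syscall. */
    static int install_allow_all_filter(void)
    {
        struct sock_filter insns[] = {
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns,
        };

        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
    }

    /*
     * Set no_new_privs first, then install the filter. Returns 0 on
     * success, -1 with errno set if either step fails.
     */
    static int sandbox_self(void)
    {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            return -1;
        return install_allow_all_filter();
    }

A real filter would inspect the architecture and system call number in
``struct seccomp_data`` and return something stricter than
``SECCOMP_RET_ALLOW`` for the calls it wants to block.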

In the future, other potentially dangerous kernel features could become
available to unprivileged tasks if ``no_new_privs`` is set. In principle,
several options to ``unshare(2)`` and ``clone(2)`` would be safe when
``no_new_privs`` is set, and ``no_new_privs`` + ``chroot`` is considerably less
dangerous than chroot by itself.