Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Credentials in Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) By: David Howells <dhowells@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) .. contents:: :local:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) There are several parts to the security check performed by Linux when one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) object acts upon another:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15)  1. Objects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17)      Objects are things in the system that may be acted upon directly by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18)      userspace programs.  Linux has a variety of actionable objects, including:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 	- Tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 	- Files/inodes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 	- Sockets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 	- Message queues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 	- Shared memory segments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 	- Semaphores
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 	- Keys
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28)      As a part of the description of all these objects there is a set of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29)      credentials.  What's in the set depends on the type of object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31)  2. Object ownership.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33)      Amongst the credentials of most objects, there will be a subset that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34)      indicates the ownership of that object.  This is used for resource
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35)      accounting and limitation (disk quotas and task rlimits for example).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37)      In a standard UNIX filesystem, for instance, this will be defined by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38)      UID marked on the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40)  3. The objective context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42)      Also amongst the credentials of those objects, there will be a subset that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43)      indicates the 'objective context' of that object.  This may or may not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44)      the same set as in (2) - in standard UNIX files, for instance, this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45)      defined by the UID and the GID marked on the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47)      The objective context is used as part of the security calculation that is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48)      carried out when an object is acted upon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50)  4. Subjects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52)      A subject is an object that is acting upon another object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54)      Most of the objects in the system are inactive: they don't act on other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55)      objects within the system.  Processes/tasks are the obvious exception:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)      they do stuff; they access and manipulate things.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58)      Objects other than tasks may under some circumstances also be subjects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)      For instance, an open file may send SIGIO to a task using the UID and EUID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)      given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61)      the file struct will have a subjective context too.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63)  5. The subjective context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65)      A subject has an additional interpretation of its credentials.  A subset
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66)      of its credentials forms the 'subjective context'.  The subjective context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)      is used as part of the security calculation that is carried out when a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)      subject acts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)      A Linux task, for example, has the FSUID, FSGID and the supplementary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)      group list for when it is acting upon a file - which are quite separate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)      from the real UID and GID that normally form the objective context of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73)      task.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)  6. Actions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77)      Linux has a number of actions available that a subject may perform upon an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78)      object.  The set of actions available depends on the nature of the subject
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79)      and the object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81)      Actions include reading, writing, creating and deleting files, and forking,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82)      signalling and tracing tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84)  7. Rules, access control lists and security calculations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)      When a subject acts upon an object, a security calculation is made.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)      involves taking the subjective context, the objective context and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)      action, and searching one or more sets of rules to see whether the subject
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89)      is granted or denied permission to act in the desired manner on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)      object, given those contexts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)      There are two main sources of rules:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)      a. Discretionary access control (DAC):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 	 Sometimes the object will include sets of rules as part of its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 	 description.  This is an 'Access Control List' or 'ACL'.  A Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 	 file may supply more than one ACL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 	 A traditional UNIX file, for example, includes a permissions mask that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 	 is an abbreviated ACL with three fixed classes of subject ('user',
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	 'group' and 'other'), each of which may be granted certain privileges
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 	 ('read', 'write' and 'execute' - whatever those map to for the object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 	 in question).  UNIX file permissions do not allow the arbitrary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 	 specification of subjects, however, and so are of limited use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 	 A Linux file might also sport a POSIX ACL.  This is a list of rules
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 	 that grants various permissions to arbitrary subjects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)      b. Mandatory access control (MAC):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 	 The system as a whole may have one or more sets of rules that get
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 	 applied to all subjects and objects, regardless of their source.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 	 SELinux and Smack are examples of this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 	 In the case of SELinux and Smack, each object is given a label as part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 	 of its credentials.  When an action is requested, they take the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 	 subject label, the object label and the action and look for a rule
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 	 that says that this action is either granted or denied.
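
The traditional permissions-mask calculation described under discretionary
access control above can be sketched in userspace C.  This is only an
illustration under stated assumptions: the ``demo_*`` types, macros and
functions are hypothetical names invented here, not kernel structures, and the
sketch ignores capabilities, POSIX ACLs and any MAC policy that a real check
would also consult.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical illustration types; these are NOT kernel structures. */
struct demo_subject {
	uid_t fsuid;            /* subjective context: file-access UID */
	gid_t fsgid;            /* subjective context: file-access GID */
	const gid_t *groups;    /* supplementary group list */
	size_t ngroups;
};

struct demo_object {
	uid_t uid;              /* objective context: owner */
	gid_t gid;              /* objective context: group */
	unsigned int mode;      /* low nine bits: rwxrwxrwx */
};

/* Requested actions, using the same bit values as one rwx triplet. */
#define DEMO_MAY_EXEC  1u
#define DEMO_MAY_WRITE 2u
#define DEMO_MAY_READ  4u

static bool demo_in_group(const struct demo_subject *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/*
 * Pick exactly one class ('user', 'group' or 'other') for the subject and
 * grant the action only if every requested bit is set for that class.
 */
static bool demo_may_access(const struct demo_subject *s,
			    const struct demo_object *o, unsigned int want)
{
	unsigned int bits;

	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7u;
	else if (demo_in_group(s, o->gid))
		bits = (o->mode >> 3) & 7u;
	else
		bits = o->mode & 7u;

	return (bits & want) == want;
}
```

Note that the classes are exclusive: a subject that matches the 'user' class is
judged by the user bits alone, even if the group or other bits are more
permissive.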
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) Types of Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) The Linux kernel supports the following types of credentials:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)  1. Traditional UNIX credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 	- Real User ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 	- Real Group ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)      The UID and GID are carried by most, if not all, Linux objects, even if in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)      some cases they have to be invented (FAT or CIFS files for example, which are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)      derived from Windows).  These (mostly) define the objective context of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)      that object, with tasks being slightly different in some cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 	- Effective, Saved and FS User ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 	- Effective, Saved and FS Group ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 	- Supplementary groups
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)      These are additional credentials used by tasks only.  Usually, an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)      EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)      will be used as the objective.  For tasks, it should be noted that this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)      not always true.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)  2. Capabilities.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 	- Set of permitted capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 	- Set of inheritable capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 	- Set of effective capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 	- Capability bounding set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)      These are only carried by tasks.  They indicate superior capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)      granted piecemeal to a task that an ordinary task wouldn't otherwise have.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)      These are manipulated implicitly by changes to the traditional UNIX
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)      credentials, but can also be manipulated directly by the ``capset()``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)      system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)      The permitted capabilities are those caps that the process might grant
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)      itself to its effective or permitted sets through ``capset()``.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)      inheritable set might also be so constrained.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)      The effective capabilities are the ones that a task is actually allowed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)      make use of itself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)      The inheritable capabilities are the ones that may get passed across
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)      ``execve()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)      The bounding set limits the capabilities that may be inherited across
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)      ``execve()``, especially when a binary is executed that will execute as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)      UID 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)  3. Secure management flags (securebits).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)      These are only carried by tasks.  These govern the way the above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)      credentials are manipulated and inherited over certain operations such as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)      ``execve()``.  They aren't used directly as objective or subjective
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)      credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)  4. Keys and keyrings.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)      These are only carried by tasks.  They carry and cache security tokens
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)      that don't fit into the other standard UNIX credentials.  They are for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)      making such things as network filesystem keys available to the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)      accesses performed by processes, without the necessity of ordinary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)      programs having to know about security details involved.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)      Keyrings are a special type of key.  They carry sets of other keys and can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)      be searched for the desired key.  Each process may subscribe to a number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)      of keyrings:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 	Per-thread keyring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 	Per-process keyring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 	Per-session keyring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)      When a process accesses a key, if not already present, it will normally be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)      cached on one of these keyrings for future accesses to find.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)      For more information on using keys, see ``Documentation/security/keys/*``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)  5. LSM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)      The Linux Security Module (LSM) framework allows extra controls to be placed over the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)      operations that a task may do.  Currently Linux supports several LSM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)      options.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)      Some work by labelling the objects in a system and then applying sets of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)      rules (policies) that say what operations a task with one label may do to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)      an object with another label.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)  6. AF_KEY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)      This is a socket-based approach to credential management for networking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)      stacks [RFC 2367].  It isn't discussed by this document as it doesn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)      interact directly with task and file credentials; rather it keeps system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)      level credentials.
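
Some of the traditional UNIX credentials listed in item 1 above can be observed
from userspace with standard POSIX calls.  The following is a minimal sketch,
assuming a POSIX system; ``struct demo_unix_creds`` and the ``demo_*``
functions are hypothetical names used only for illustration.  The saved and FS
IDs are deliberately left out, since reading them needs Linux-specific
interfaces.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs that POSIX lets a process read. */
struct demo_unix_creds {
	uid_t ruid, euid;       /* real and effective user ID */
	gid_t rgid, egid;       /* real and effective group ID */
	int ngroups;            /* number of supplementary groups */
};

/* Fill in *out from the calling process; returns 0 on success, -1 on error. */
static int demo_read_unix_creds(struct demo_unix_creds *out)
{
	out->ruid = getuid();
	out->euid = geteuid();
	out->rgid = getgid();
	out->egid = getegid();
	/* A size of 0 asks getgroups() only for the number of groups. */
	out->ngroups = getgroups(0, NULL);
	return out->ngroups < 0 ? -1 : 0;
}

static void demo_print_unix_creds(const struct demo_unix_creds *c)
{
	printf("real uid=%lu gid=%lu, effective uid=%lu gid=%lu, %d groups\n",
	       (unsigned long)c->ruid, (unsigned long)c->rgid,
	       (unsigned long)c->euid, (unsigned long)c->egid, c->ngroups);
}
```

For an ordinary process the real and effective values are usually equal; they
differ, for example, while a set-user-ID program is running.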
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) When a file is opened, part of the opening task's subjective context is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) recorded in the file struct created.  This allows operations using that file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) struct to use those credentials instead of the subjective context of the task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) that issued the operation.  An example of this would be a file opened on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) network filesystem where the credentials of the opened file should be presented
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) to the server, regardless of who is actually doing a read or a write upon it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) File Markings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) Files on disk or obtained over the network may have annotations that form the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) objective security context of that file.  Depending on the type of filesystem,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) this may include one or more of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)  * UNIX UID, GID, mode;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)  * Windows user ID;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)  * Access control list;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)  * LSM security label;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)  * UNIX exec privilege escalation bits (SUID/SGID);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)  * File capabilities exec privilege escalation bits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) These are compared to the task's subjective security context, and certain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) operations are allowed or disallowed as a result.  In the case of ``execve()``, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) privilege escalation bits come into play, and may allow the resulting process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) extra privileges, based on the annotations on the executable file.
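
How the SUID/SGID bits on an executable might feed into the resulting process's
effective IDs can be sketched as a pure function.  This is a simplified model
under stated assumptions: ``struct demo_ids`` and ``demo_exec_ids()`` are
hypothetical names, and the sketch ignores everything else that can suppress
the escalation in practice (nosuid mounts, tracing, LSM policy, file
capabilities and so on).

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical pair of effective IDs before or after an exec. */
struct demo_ids {
	uid_t euid;
	gid_t egid;
};

/*
 * Model of the exec-time escalation bits: SUID switches the effective UID to
 * the file's owner; SGID switches the effective GID to the file's group, but
 * only when the file is also group-executable.
 */
static struct demo_ids demo_exec_ids(struct demo_ids cur, uid_t file_uid,
				     gid_t file_gid, mode_t file_mode)
{
	struct demo_ids next = cur;

	if (file_mode & S_ISUID)
		next.euid = file_uid;
	if ((file_mode & S_ISGID) && (file_mode & S_IXGRP))
		next.egid = file_gid;
	return next;
}
```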
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) Task Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) In Linux, all of a task's credentials are held directly in (uid, gid) or referenced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) through (groups, keys, LSM security) a refcounted structure of type 'struct cred'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) Each task points to its credentials by a pointer called 'cred' in its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) task_struct.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) Once a set of credentials has been prepared and committed, it may not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) changed, barring the following exceptions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)  1. its reference count may be changed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)  2. the reference count on the group_info struct it points to may be changed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)  3. the reference count on the security data it points to may be changed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)  4. the reference count on any keyrings it points to may be changed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)  5. any keyrings it points to may be revoked, expired or have their security
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)     attributes changed; and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)  6. the contents of any keyrings to which it points may be changed (the whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)     point of keyrings being a shared set of credentials, modifiable by anyone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)     with appropriate access).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) To alter anything in the cred struct, the copy-and-replace principle must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) adhered to.  First take a copy, then alter the copy and then use RCU to change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) the task pointer to make it point to the new copy.  There are wrappers to aid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) with this (see below).
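
The copy-and-replace principle can be modelled in a few lines of userspace C.
This is a single-threaded sketch only: ``struct demo_cred``,
``struct demo_task``, ``demo_prepare()`` and ``demo_commit()`` are hypothetical
names, and a plain pointer assignment stands in for the RCU publication that
the kernel performs.  In the kernel the old structure must also outlive any
concurrent readers, which this model does not attempt to show.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_cred {
	unsigned int uid;
	unsigned int gid;
};

struct demo_task {
	const struct demo_cred *cred;   /* published credentials: read-only */
};

/* Step 1: take a private, writable copy of the task's current credentials. */
static struct demo_cred *demo_prepare(const struct demo_task *t)
{
	struct demo_cred *new = malloc(sizeof(*new));

	if (new)
		memcpy(new, t->cred, sizeof(*new));
	return new;
}

/*
 * Step 3: publish the altered copy by swapping the task's pointer.  The old
 * set is returned so that the caller can dispose of it; it was never modified.
 */
static const struct demo_cred *demo_commit(struct demo_task *t,
					   struct demo_cred *new)
{
	const struct demo_cred *old = t->cred;

	t->cred = new;
	return old;
}
```

Step 2, altering the copy, happens between the two calls: the caller changes
fields of the structure returned by ``demo_prepare()`` while the published set
stays untouched until ``demo_commit()`` runs.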
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) A task may only alter its _own_ credentials; it is no longer permitted for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) task to alter another's credentials.  This means the ``capset()`` system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) is no longer permitted to take any PID other than the one of the current
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) longer permit attachment to process-specific keyrings in the requesting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) process as the instantiating process may need to create them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) Immutable Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) Once a set of credentials has been made public (by calling ``commit_creds()``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) for example), it must be considered immutable, barring two exceptions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)  1. The reference count may be altered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)  2. While the keyring subscriptions of a set of credentials may not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)     changed, the keyrings subscribed to may have their contents altered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) To catch accidental credential alteration at compile time, struct task_struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) has _const_ pointers to its credential sets, as does struct file.  Furthermore,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) pointers, thus rendering casts unnecessary for callers, but they must internally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) ditch the const qualification temporarily to be able to alter the reference count.
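
The idea of reference-count helpers that accept const pointers can be
illustrated as follows.  This is a userspace sketch, not the kernel's code:
``struct demo_cred_ref``, ``demo_get()`` and ``demo_put()`` are hypothetical
names, and a plain ``int`` replaces the atomic counter a real implementation
would need.

```c
/* Hypothetical credential set with an embedded usage count. */
struct demo_cred_ref {
	int usage;              /* reference count (plain int in this sketch) */
	unsigned int uid;       /* stands in for the immutable payload */
};

/*
 * Take a reference.  The parameter is const so that callers holding a const
 * pointer need no cast; the const is dropped here, and only here, to bump
 * the count.
 */
static const struct demo_cred_ref *demo_get(const struct demo_cred_ref *c)
{
	struct demo_cred_ref *nonconst = (struct demo_cred_ref *)c;

	nonconst->usage++;
	return c;
}

/* Drop a reference; returns 1 if this was the last one, otherwise 0. */
static int demo_put(const struct demo_cred_ref *c)
{
	struct demo_cred_ref *nonconst = (struct demo_cred_ref *)c;

	return --nonconst->usage == 0;
}
```

Because every other field is reached only through const pointers, an
accidental write such as ``view->uid = 0`` fails to compile, while the
reference count remains adjustable through the helpers.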
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) Accessing Task Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) A task being able to alter only its own credentials permits the current process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) to read or replace its own credentials without the need for any form of locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) -- which simplifies things greatly.  It can just call::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 	const struct cred *current_cred()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) to get a pointer to its credentials structure, and it doesn't have to release
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) it afterwards.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) There are convenience wrappers for retrieving specific aspects of a task's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) credentials (the value is simply returned in each case)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) 	uid_t current_uid(void)		Current's real UID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) 	gid_t current_gid(void)		Current's real GID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) 	uid_t current_euid(void)	Current's effective UID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) 	gid_t current_egid(void)	Current's effective GID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) 	uid_t current_fsuid(void)	Current's file access UID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 	gid_t current_fsgid(void)	Current's file access GID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) 	kernel_cap_t current_cap(void)	Current's effective capabilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 	struct user_struct *current_user(void)  Current's user account
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) There are also convenience wrappers for retrieving specific associated pairs of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) a task's credentials::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 	void current_uid_gid(uid_t *, gid_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) 	void current_euid_egid(uid_t *, gid_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) 	void current_fsuid_fsgid(uid_t *, gid_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) which return these pairs of values through their arguments after retrieving
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) them from the current task's credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) In addition, there is a function for obtaining a reference on the current
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) process's current set of credentials::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) 	const struct cred *get_current_cred(void);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) and functions for getting references to one of the credentials that don't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) actually live in struct cred::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) 	struct user_struct *get_current_user(void);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 	struct group_info *get_current_groups(void);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) which get references to the current process's user accounting structure and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) supplementary groups list respectively.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) Once a reference has been obtained, it must be released with ``put_cred()``,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) ``free_uid()`` or ``put_group_info()`` as appropriate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) Accessing Another Task's Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) ------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) While a task may access its own credentials without the need for locking, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) same is not true of a task wanting to access another task's credentials.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) must use the RCU read lock and ``rcu_dereference()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) The ``rcu_dereference()`` is wrapped by::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 	const struct cred *__task_cred(struct task_struct *task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) This should be used inside the RCU read lock, as in the following example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) 	void foo(struct task_struct *t, struct foo_data *f)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 		const struct cred *tcred;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) 		...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) 		rcu_read_lock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 		tcred = __task_cred(t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) 		f->uid = tcred->uid;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) 		f->gid = tcred->gid;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 		f->groups = get_group_info(tcred->groups);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) 		rcu_read_unlock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 		...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) Should it be necessary to hold another task's credentials for a long period of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) time, and possibly to sleep while doing so, then the caller should get a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) reference on them using::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) 	const struct cred *get_task_cred(struct task_struct *task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) This does all the RCU magic internally.  The caller must call ``put_cred()`` on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) the credentials so obtained once it has finished with them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)    The result of ``__task_cred()`` should not be passed directly to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)    ``get_cred()`` as this may race with ``commit_creds()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) There are a couple of convenience functions to access bits of another task's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) credentials, hiding the RCU magic from the caller::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) 	kuid_t task_uid(task)		Task's real UID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) 	kuid_t task_euid(task)		Task's effective UID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) If the caller is holding the RCU read lock at the time anyway, then::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) 	__task_cred(task)->uid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) 	__task_cred(task)->euid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) should be used instead.  Similarly, if multiple aspects of a task's credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) need to be accessed, the RCU read lock should be taken, ``__task_cred()`` called,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) the result stored in a temporary pointer and then the credential aspects read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) from that before dropping the lock.  This prevents the potentially expensive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) RCU magic from being invoked multiple times.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) Should some other single aspect of another task's credentials need to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) accessed, then this can be used::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) 	task_cred_xxx(task, member)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) where 'member' is a non-pointer member of the cred struct.  For instance::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) 	kuid_t task_cred_xxx(task, suid);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) magic.  This may not be used for pointer members as what they point to may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) disappear the moment the RCU read lock is dropped.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) Altering Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) As previously mentioned, a task may only alter its own credentials, and may not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) alter those of another task.  This means that it doesn't need to use any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) locking to alter its own credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) To alter the current process's credentials, a function should first prepare a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) new set of credentials by calling::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) 	struct cred *prepare_creds(void);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) This locks ``current->cred_replace_mutex`` and then allocates and constructs a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) duplicate of the current process's credentials, returning with the mutex still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) held if successful.  It returns ``NULL`` if not successful (out of memory).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) The mutex prevents ``ptrace()`` from altering the ptrace state of a process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) while security checks on credentials construction and changing are taking place
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) as the ptrace state may alter the outcome, particularly in the case of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) ``execve()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) The new credentials set should be altered appropriately, and any security
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) checks and hooks done.  Both the current and the proposed sets of credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) are available for this purpose as ``current_cred()`` will still return the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) current set at this point.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) When replacing the group list, the new list must be sorted before it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) is added to the credential, as a binary search is used to test for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) membership.  In practice, this means ``groups_sort()`` should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) called before ``set_groups()`` or ``set_current_groups()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) ``groups_sort()`` must not be called on a ``struct group_info`` which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) is shared as it may permute elements as part of the sorting process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) even if the array is already sorted.
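
The reason for the sorting requirement can be seen in a small userspace model.
This is only a sketch under stated assumptions: ``struct demo_groups`` and the
``demo_*`` functions are hypothetical stand-ins written for illustration, not
the kernel's ``struct group_info``, ``groups_sort()`` or its membership test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for a group list; NOT the kernel's struct group_info. */
struct demo_groups {
	size_t ngroups;
	unsigned int gid[8];
};

/* Three-way comparison of two group IDs for qsort(). */
static int demo_gid_cmp(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/* Plays the role of groups_sort(): sort a private list in place. */
static void demo_groups_sort(struct demo_groups *g)
{
	qsort(g->gid, g->ngroups, sizeof(g->gid[0]), demo_gid_cmp);
}

/* Binary-search membership test; only reliable on a sorted list. */
static bool demo_in_group(const struct demo_groups *g, unsigned int gid)
{
	size_t left = 0, right = g->ngroups;

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (gid > g->gid[mid])
			left = mid + 1;
		else if (gid < g->gid[mid])
			right = mid;
		else
			return true;
	}
	return false;
}
```

With the unsorted list ``{1000, 27, 4, 100, 44}``, ``demo_in_group()`` misses
GID 27 even though it is present; after ``demo_groups_sort()`` the same lookup
succeeds.  Because the sort rewrites the array in place, it is only safe on a
list that no other user can observe yet.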
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) When the credential set is ready, it should be committed to the current process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) by calling::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) 	int commit_creds(struct cred *new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) This will alter various aspects of the credentials and the process, giving the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) actually commit the new credentials to ``current->cred``, release
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) ``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) notify the scheduler and others of the changes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) This function is guaranteed to return 0, so that it can be tail-called at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) end of such functions as ``sys_setresuid()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) Note that this function consumes the caller's reference to the new credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) The caller should *not* call ``put_cred()`` on the new credentials afterwards.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) Furthermore, once this function has been called on a new set of credentials,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) those credentials may *not* be changed further.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) Should the security checks fail or some other error occur after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) ``prepare_creds()`` has been called, then the following function should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) invoked::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) 	void abort_creds(struct cred *new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) This releases the lock on ``current->cred_replace_mutex`` that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) ``prepare_creds()`` got and then releases the new credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) A typical credentials alteration function would look something like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) 	int alter_suid(kuid_t suid)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) 		struct cred *new;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) 		int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) 		new = prepare_creds();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) 		if (!new)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) 			return -ENOMEM;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) 		new->suid = suid;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) 		ret = security_alter_suid(new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) 		if (ret < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) 			abort_creds(new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) 			return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) 		return commit_creds(new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) 
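The copy, modify and commit-or-abort flow can also be modelled outside the
kernel.  The following is a minimal userspace sketch, not kernel code: every
``demo_*`` name is hypothetical, the policy check is invented for the example,
and the locking, RCU and LSM hooks described above are deliberately left out.

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-in for struct cred; the field set is illustrative only. */
struct demo_cred {
	int usage;		/* reference count */
	unsigned int uid;
	unsigned int suid;
};

/* Stand-in for current->cred: the live set, never modified in place. */
static struct demo_cred *demo_current_cred;

/* Drop one reference and free the set when the last one goes away. */
static void demo_put_cred(struct demo_cred *c)
{
	if (c && --c->usage == 0)
		free(c);
}

/* Create the initial live credential set for the model. */
static int demo_init_creds(unsigned int uid)
{
	struct demo_cred *c = malloc(sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->usage = 1;
	c->uid = uid;
	c->suid = uid;
	demo_current_cred = c;
	return 0;
}

/* Like prepare_creds(): return a private, mutable copy, or NULL on OOM. */
static struct demo_cred *demo_prepare_creds(void)
{
	struct demo_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *demo_current_cred;
	new->usage = 1;		/* the caller's reference */
	return new;
}

/* Like commit_creds(): install the copy, consuming the caller's reference. */
static int demo_commit_creds(struct demo_cred *new)
{
	struct demo_cred *old = demo_current_cred;

	demo_current_cred = new;
	demo_put_cred(old);
	return 0;
}

/* Like abort_creds(): discard a copy that was never committed. */
static void demo_abort_creds(struct demo_cred *new)
{
	demo_put_cred(new);
}

/* Hypothetical policy for the example: refuse a saved UID of 0. */
static int demo_alter_suid(unsigned int suid)
{
	struct demo_cred *new = demo_prepare_creds();

	if (!new)
		return -ENOMEM;
	new->suid = suid;
	if (suid == 0) {
		demo_abort_creds(new);
		return -EPERM;
	}
	return demo_commit_creds(new);
}
```

In this model a rejected change leaves ``demo_current_cred`` pointing at the
original, untouched set, while a successful change swaps in the new set and
drops the old one.  The caller never releases ``new`` itself after a commit,
which mirrors the reference-consuming rule stated above.
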
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) Managing Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) There are some functions to help manage credentials:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519)  - ``void put_cred(const struct cred *cred);``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521)      This releases a reference to the given set of credentials.  If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522)      reference count reaches zero, the credentials will be scheduled for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523)      destruction by the RCU system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525)  - ``const struct cred *get_cred(const struct cred *cred);``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527)      This gets a reference on a live set of credentials, returning a pointer to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528)      that set of credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530)  - ``struct cred *get_new_cred(struct cred *cred);``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532)      This gets a reference on a set of credentials that is under construction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533)      and is thus still mutable, returning a pointer to that set of credentials.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) Open File Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) When a new file is opened, a reference is obtained on the opening task's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) credentials and this is attached to the file struct as ``f_cred`` in place of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) ``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) ``file->f_gid`` should now access ``file->f_cred->fsuid`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) ``file->f_cred->fsgid``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) It is safe to access ``f_cred`` without the use of RCU or locking because the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) pointer will not change over the lifetime of the file struct, and nor will the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) contents of the cred struct pointed to, barring the exceptions listed above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) (see the Task Credentials section).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) To avoid "confused deputy" privilege escalation attacks, access control checks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) during subsequent operations on an opened file should use these credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) instead of "current"'s credentials, as the file may have been passed to a more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) privileged process.
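
The difference between the two checks can be illustrated with a small userspace
model.  This is a sketch only: ``struct demo_cred``, ``struct demo_file``, the
``owner_uid`` field and the permission rule are all hypothetical and much
simpler than anything the VFS actually does.

```c
#include <stdbool.h>

/* Userspace stand-ins; NOT the kernel's struct cred or struct file. */
struct demo_cred {
	unsigned int fsuid;
};

struct demo_file {
	const struct demo_cred *f_cred;	/* opener's credentials, fixed at open */
	unsigned int owner_uid;		/* hypothetical owner of the object */
};

/* Invented rule for the example: fsuid 0 or the owner may modify. */
static bool demo_cred_may_modify(const struct demo_cred *cred,
				 unsigned int owner_uid)
{
	return cred->fsuid == 0 || cred->fsuid == owner_uid;
}

/* Preferred: decide using the credentials captured when the file was opened. */
static bool demo_may_modify(const struct demo_file *file)
{
	return demo_cred_may_modify(file->f_cred, file->owner_uid);
}

/* Risky: decide using whoever happens to be holding the file now. */
static bool demo_may_modify_as_caller(const struct demo_cred *caller,
				      const struct demo_file *file)
{
	return demo_cred_may_modify(caller, file->owner_uid);
}
```

If an unprivileged task opens the file and hands it to a task with fsuid 0,
``demo_may_modify_as_caller()`` grants access that the opener never had, whereas
``demo_may_modify()`` still answers according to the opener.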
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) Overriding the VFS's Use of Credentials
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) =======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) Under some circumstances it is desirable to override the credentials used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) the VFS, and that can be done by calling into functions such as ``vfs_mkdir()`` with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) different set of credentials.  This is done in the following places:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562)  * ``sys_faccessat()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563)  * ``do_coredump()``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564)  * nfs4recover.c.