^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ===============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Documentation for /proc/sys/fs/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) kernel version 2.2.10
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) For general info and legal blurb, please look in intro.rst.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) ------------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) This file contains documentation for the sysctl files in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) /proc/sys/fs/ and is valid for Linux kernel version 2.2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) The files in this directory can be used to tune and monitor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) miscellaneous and general things in the operation of the Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) kernel. Since some of the files _can_ be used to screw up your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) system, it is advisable to read both documentation and source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) before actually making adjustments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) 1. /proc/sys/fs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) Currently, these files are in /proc/sys/fs:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) - aio-max-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) - aio-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) - dentry-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) - dquot-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) - dquot-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) - file-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) - file-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) - inode-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) - inode-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) - inode-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) - nr_open
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) - overflowuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) - overflowgid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) - pipe-user-pages-hard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) - pipe-user-pages-soft
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) - protected_fifos
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) - protected_hardlinks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) - protected_regular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) - protected_symlinks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) - suid_dumpable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) - super-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) - super-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) aio-nr & aio-max-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) aio-nr is the running total of the number of events specified on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) io_setup system call for all currently active aio contexts. If aio-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) raising aio-max-nr does not result in the pre-allocation or re-sizing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) of any kernel data structures.
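The relationship between the two files can be sketched as a small
helper that reports how many more events io_setup could still reserve
before hitting EAGAIN. This is only an illustration: the function name
and the sample values are made up, and the strings stand in for the
contents of /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr.

```python
def aio_headroom(aio_nr_text: str, aio_max_nr_text: str) -> int:
    """Return the number of events still available below aio-max-nr.

    Both arguments are the raw text of the respective sysctl files.
    """
    aio_nr = int(aio_nr_text.strip())
    aio_max_nr = int(aio_max_nr_text.strip())
    # Never report a negative headroom, even if the limit was lowered
    # below the current running total.
    return max(aio_max_nr - aio_nr, 0)


# Made-up sample contents, not values read from a real system.
print(aio_headroom("2048\n", "65536\n"))
```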
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) dentry-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) From linux/include/linux/dcache.h::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)   struct dentry_stat_t dentry_stat {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)         int nr_dentry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)         int nr_unused;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)         int age_limit;         /* age in seconds */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)         int want_pages;        /* pages requested by system */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)         int nr_negative;       /* # of unused negative dentries */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)         int dummy;             /* Reserved for future use */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)   };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Dentries are dynamically allocated and deallocated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) nr_dentry shows the total number of dentries allocated (active
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) + unused). nr_unused shows the number of dentries that are not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) actively used, but are saved in the LRU list for future reuse.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) age_limit is the age in seconds after which dcache entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) can be reclaimed when memory is short. want_pages is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) nonzero when shrink_dcache_pages() has been called and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) dcache isn't pruned yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) nr_negative shows the number of unused dentries that are also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) negative dentries, which do not map to any files. Instead,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) they help speed up rejection of non-existing files provided
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) by the users.
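As a rough sketch, the six fields can be read into named values in the
order given by the structure above. The class and function names here
are hypothetical, and the sample line is invented rather than taken
from a real system.

```python
from typing import NamedTuple


class DentryState(NamedTuple):
    """Fields of dentry-state, in the order of struct dentry_stat_t."""
    nr_dentry: int
    nr_unused: int
    age_limit: int
    want_pages: int
    nr_negative: int
    dummy: int


def parse_dentry_state(text: str) -> DentryState:
    """Parse the whitespace-separated contents of dentry-state."""
    fields = [int(f) for f in text.split()]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return DentryState(*fields)


# Invented sample contents.
state = parse_dentry_state("81920\t61440\t45\t0\t1024\t0\n")
# nr_dentry counts active + unused, so the difference is the active part.
active = state.nr_dentry - state.nr_unused
print(state, active)
```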
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) dquot-max & dquot-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) The file dquot-max shows the maximum number of cached disk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) quota entries.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) The file dquot-nr shows the number of allocated disk quota
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) entries and the number of free disk quota entries.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) If the number of free cached disk quotas is very low and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) you have some awesome number of simultaneous system users,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) you might want to raise the limit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) file-max & file-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) The value in file-max denotes the maximum number of file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) handles that the Linux kernel will allocate. When you get lots
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) of error messages about running out of file handles, you might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) want to increase this limit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) Historically, the kernel was able to allocate file handles
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) dynamically, but not to free them again. The three values in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) file-nr denote the number of allocated file handles, the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) of allocated but unused file handles, and the maximum number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) file handles. Linux 2.6 always reports 0 as the number of free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) file handles -- this is not an error, it just means that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) number of allocated file handles exactly matches the number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) used file handles.
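A minimal sketch of how the three values in file-nr might be turned
into a usage ratio. The function names are hypothetical and the sample
string is invented; on a real system the text would come from
/proc/sys/fs/file-nr.

```python
def parse_file_nr(text: str) -> tuple[int, int, int]:
    """Return (allocated, unused, maximum) from the contents of file-nr."""
    allocated, unused, maximum = (int(f) for f in text.split())
    return allocated, unused, maximum


def file_handle_usage(text: str) -> float:
    """Return the fraction of file-max that is currently in use."""
    allocated, unused, maximum = parse_file_nr(text)
    return (allocated - unused) / maximum


# Invented sample: 1024 allocated, 0 unused, limit of 8192.
print(file_handle_usage("1024\t0\t8192\n"))
```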
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) Attempts to allocate more file descriptors than file-max are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) reported with printk; look for "VFS: file-max limit <number>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) reached".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) nr_open
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) This denotes the maximum number of file handles a process can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) allocate. The default value is 1024*1024 (1048576), which should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) enough for most machines. The actual limit depends on the RLIMIT_NOFILE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) resource limit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) inode-max, inode-nr & inode-state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) As with file handles, the kernel allocates the inode structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) dynamically, but can't free them yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) The value in inode-max denotes the maximum number of inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) handlers. This value should be 3-4 times larger than the value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) in file-max, since stdin, stdout and network sockets also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) need an inode struct to handle them. When you regularly run
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) out of inodes, you need to increase this value.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) The file inode-nr contains the first two items from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) inode-state, so we'll skip to that file...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) Inode-state contains three actual numbers and four dummies.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) The actual numbers are, in order of appearance, nr_inodes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) nr_free_inodes and preshrink.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) Nr_inodes stands for the number of inodes the system has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) allocated; this can be slightly more than inode-max because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) Linux allocates them one pageful at a time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) Nr_free_inodes represents the number of free inodes, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) preshrink is nonzero when nr_inodes > inode-max and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) system needs to prune the inode list instead of allocating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) more.
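The layout described above (three actual numbers followed by four
dummies) can be sketched as follows. The function name is hypothetical
and the sample contents are invented.

```python
def parse_inode_state(text: str) -> dict[str, int]:
    """Return the three meaningful inode-state fields by name."""
    fields = [int(f) for f in text.split()]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    # The remaining four fields are dummies and are ignored here.
    nr_inodes, nr_free_inodes, preshrink = fields[:3]
    return {
        "nr_inodes": nr_inodes,
        "nr_free_inodes": nr_free_inodes,
        "preshrink": preshrink,
    }


# Invented sample contents.
print(parse_inode_state("52000\t310\t0\t0\t0\t0\t0\n"))
```

Per the text above, the first two values are also what inode-nr
reports.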
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) overflowgid & overflowuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) Some filesystems only support 16-bit UIDs and GIDs, although in Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) UIDs and GIDs are 32 bits. When one of these filesystems is mounted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) with writes enabled, any UID or GID that would exceed 65535 is translated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) to a fixed value before being written to disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) These sysctls allow you to change the value of the fixed UID and GID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) The default is 65534.
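The translation can be illustrated with a small model. This is a sketch
of the rule as described here, not kernel code; the function name is
hypothetical and the overflow value is a parameter standing in for the
current overflowuid or overflowgid setting.

```python
DEFAULT_OVERFLOW_ID = 65534  # default stated in the text above


def id_for_16bit_fs(real_id: int, overflow_id: int = DEFAULT_OVERFLOW_ID) -> int:
    """Return the ID that would be stored on a 16-bit-only filesystem."""
    # IDs that fit in 16 bits are stored unchanged; larger ones are
    # replaced by the fixed overflow value.
    return real_id if real_id <= 0xFFFF else overflow_id


print(id_for_16bit_fs(1000))    # fits in 16 bits
print(id_for_16bit_fs(100000))  # does not fit, replaced
```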
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) pipe-user-pages-hard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) Maximum total number of pages a non-privileged user may allocate for pipes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) Once this limit is reached, no new pipes may be allocated until usage goes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) below the limit again. When set to 0, no limit is applied, which is the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) setting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) pipe-user-pages-soft
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) Maximum total number of pages a non-privileged user may allocate for pipes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) before the pipe size gets limited to a single page. Once this limit is reached,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) new pipes will be limited to a single page in size for this user in order to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) limit total memory usage, and trying to increase them using fcntl() will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) denied until usage goes below the limit again. The default value allows
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) allocating up to 1024 pipes at their default size. When set to 0, no limit is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) applied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) protected_fifos
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) The intent of this protection is to avoid unintentional writes to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) an attacker-controlled FIFO, where a program expected to create a regular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) When set to "0", writing to FIFOs is unrestricted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) When set to "1", don't allow O_CREAT open on FIFOs that we don't own
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) in world writable sticky directories, unless they are owned by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) owner of the directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) When set to "2", it also applies to group writable sticky directories.
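The three settings can be modelled as a decision function. This is a
simplified sketch of the rules as worded above, not the kernel
implementation; the function and parameter names are hypothetical.

```python
def may_open_fifo_with_o_creat(
    level: int,
    dir_sticky: bool,
    dir_world_writable: bool,
    dir_group_writable: bool,
    dir_owner_uid: int,
    fifo_owner_uid: int,
    opener_uid: int,
) -> bool:
    """Model whether an O_CREAT open of an existing FIFO is allowed."""
    if level == 0:
        return True  # unrestricted
    if not dir_sticky:
        return True  # only sticky directories are covered
    # Level 1 covers world-writable directories; level 2 adds
    # group-writable ones.
    covered = dir_world_writable or (level >= 2 and dir_group_writable)
    if not covered:
        return True
    # Allowed if we own the FIFO or the directory owner owns it.
    return fifo_owner_uid in (opener_uid, dir_owner_uid)


# A FIFO planted by uid 1001 in a root-owned, sticky, world-writable
# directory, opened by uid 1000.
print(may_open_fifo_with_o_creat(1, True, True, False, 0, 1001, 1000))
```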
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) This protection is based on the restrictions in Openwall.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) protected_hardlinks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) A long-standing class of security issues is the hardlink-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) time-of-check-time-of-use race, most commonly seen in world-writable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) directories like /tmp. The common method of exploitation of this flaw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) is to cross privilege boundaries when following a given hardlink (i.e. a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) root process follows a hardlink created by another user). Additionally,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) on systems without separated partitions, this stops unauthorized users
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) from "pinning" vulnerable setuid/setgid files against being upgraded by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) the administrator, or linking to special files.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) When set to "0", hardlink creation behavior is unrestricted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) When set to "1", hardlinks cannot be created by users if they do not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) already own the source file, or do not have read/write access to it.
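The rule for the "1" setting can be sketched as a predicate. This
models only the condition as worded above and leaves out any further
checks the kernel may perform; the function and parameter names are
hypothetical.

```python
def may_create_hardlink(
    protected_hardlinks: int,
    user_uid: int,
    file_owner_uid: int,
    can_read: bool,
    can_write: bool,
) -> bool:
    """Model whether a user may hardlink to an existing source file."""
    if protected_hardlinks == 0:
        return True  # unrestricted
    # Allowed for the owner, or for anyone with read/write access.
    return user_uid == file_owner_uid or (can_read and can_write)


# uid 1000 tries to link to a root-owned file it can only read.
print(may_create_hardlink(1, 1000, 0, can_read=True, can_write=False))
```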
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) This protection is based on the restrictions in Openwall and grsecurity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) protected_regular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) This protection is similar to protected_fifos, but it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) avoids writes to an attacker-controlled regular file, where a program
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) expected to create one.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) When set to "0", writing to regular files is unrestricted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) When set to "1", don't allow O_CREAT open on regular files that we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) don't own in world writable sticky directories, unless they are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) owned by the owner of the directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) When set to "2", it also applies to group writable sticky directories.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) protected_symlinks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) A long-standing class of security issues is the symlink-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) time-of-check-time-of-use race, most commonly seen in world-writable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) directories like /tmp. The common method of exploitation of this flaw
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) is to cross privilege boundaries when following a given symlink (i.e. a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) root process follows a symlink belonging to another user). For a likely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) incomplete list of hundreds of examples across the years, please see:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) When set to "0", symlink following behavior is unrestricted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) When set to "1", symlinks are permitted to be followed only when outside
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) a sticky world-writable directory, or when the uid of the symlink and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) follower match, or when the directory owner matches the symlink's owner.
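The three conditions for the "1" setting can be sketched as a
predicate. This is a model of the rule as stated, not kernel code; the
function and parameter names are hypothetical.

```python
def may_follow_symlink(
    protected_symlinks: int,
    dir_sticky: bool,
    dir_world_writable: bool,
    dir_owner_uid: int,
    link_owner_uid: int,
    follower_uid: int,
) -> bool:
    """Model whether a process may follow a symlink in a directory."""
    if protected_symlinks == 0:
        return True  # unrestricted
    # Outside a sticky world-writable directory there is no restriction.
    if not (dir_sticky and dir_world_writable):
        return True
    # The follower owns the symlink.
    if link_owner_uid == follower_uid:
        return True
    # The directory owner owns the symlink.
    return dir_owner_uid == link_owner_uid


# A root process (uid 0) meets a symlink owned by uid 1001 in a
# root-owned, sticky, world-writable directory such as /tmp.
print(may_follow_symlink(1, True, True, 0, 1001, 0))
```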
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) This protection is based on the restrictions in Openwall and grsecurity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) suid_dumpable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) This value can be used to query and set the core dump mode for setuid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) or otherwise protected/tainted binaries. The modes are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) = ========== ===============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 0 (default)  traditional behaviour. Any process which has changed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)              privilege levels or is execute only will not be dumped.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 1 (debug)    all processes dump core when possible. The core dump is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)              owned by the current user and no security is applied. This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)              intended for system debugging situations only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)              Ptrace is unchecked.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)              This is insecure as it allows regular users to examine the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)              memory contents of privileged processes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 2 (suidsafe) any binary which normally would not be dumped is dumped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)              anyway, but only if the "core_pattern" kernel sysctl is set to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)              either a pipe handler or a fully qualified path. (For more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)              details on this limitation, see CVE-2006-2451.) This mode is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)              appropriate when administrators are attempting to debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)              problems in a normal environment, and either have a core dump
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)              pipe handler that knows to treat privileged core dumps with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)              care, or specific directory defined for catching core dumps.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)              If a core dump happens without a pipe handler or fully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)              qualified path, a message will be emitted to syslog warning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)              about the lack of a correct setting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) = ========== ===============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) super-max & super-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) These numbers control the maximum number of superblocks, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) thus the maximum number of mounted filesystems the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) can have. You only need to increase super-max if you need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) mount more filesystems than the current value in super-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) allows you to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) aio-nr & aio-max-nr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) aio-nr shows the current system-wide number of asynchronous io
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) requests. aio-max-nr allows you to change the maximum value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) aio-nr can grow to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) mount-max
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) This denotes the maximum number of mounts that may exist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) in a mount namespace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 2. /proc/sys/fs/binfmt_misc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) Documentation for the files in /proc/sys/fs/binfmt_misc is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) in Documentation/admin-guide/binfmt-misc.rst.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 3. /proc/sys/fs/mqueue - POSIX message queues filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) ========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) The "mqueue" filesystem provides the necessary kernel features to enable the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) creation of a user space library that implements the POSIX message queues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) Interfaces specification.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) The "mqueue" filesystem contains values for determining/setting the amount of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) resources used by the file system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) /proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) maximum number of message queues allowed on the system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) /proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) maximum number of messages in a queue. In fact, it is the limiting value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) for another (user) limit which is set in the mq_open invocation. This attribute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) of a queue must be less than or equal to msg_max.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) /proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) maximum message size value (it is every message queue's attribute set during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) its creation).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) /proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) default number of messages in a queue if the attr parameter of mq_open(2) is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) NULL. If it exceeds msg_max, the default value is initialized to msg_max.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) /proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) the default message size if the attr parameter of mq_open(2) is NULL. If it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) exceeds msgsize_max, the default value is initialized to msgsize_max.
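The way the default and maximum values interact can be sketched as
follows. This only models the clamping and limit rules described in
this section; the function names and the sample numbers are made up.

```python
def effective_default(default_value: int, max_value: int) -> int:
    """Return the default actually used when mq_open(2) gets attr == NULL."""
    # A default above the maximum is initialized to the maximum.
    return min(default_value, max_value)


def attr_within_limit(requested: int, max_value: int) -> bool:
    """Check a queue attribute passed to mq_open against its limit."""
    return requested <= max_value


# Made-up settings: msg_default = 64, msg_max = 10.
print(effective_default(64, 10))
print(attr_within_limit(11, 10))
```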
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 4. /proc/sys/fs/epoll - Configuration options for the epoll interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) =====================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) This directory contains configuration options for the epoll(7) interface.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) max_user_watches
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) Every epoll file descriptor can store a number of files to be monitored
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) for event readiness. Each one of these monitored files constitutes a "watch".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) This configuration option sets the maximum number of "watches" that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) allowed for each user.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) on a 64-bit one.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) The current default value for max_user_watches is 1/32 of the available
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) low memory, divided by the "watch" cost in bytes.
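
The arithmetic can be sketched as below, using the fraction and the
rough per-watch costs quoted in this section. The function name is
hypothetical, the 1 GiB figure is an arbitrary example, and the divisor
is kept as a parameter because the fraction used by a given kernel
version may differ.

```python
def default_max_user_watches(
    low_memory_bytes: int,
    watch_cost_bytes: int,
    fraction_divisor: int = 32,
) -> int:
    """Estimate the default limit from low memory and per-watch cost."""
    budget = low_memory_bytes // fraction_divisor
    return budget // watch_cost_bytes


one_gib = 1024 ** 3  # arbitrary example amount of low memory
print(default_max_user_watches(one_gib, 160))  # rough 64-bit cost
print(default_max_user_watches(one_gib, 90))   # rough 32-bit cost
```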