Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) FUSE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) Definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) Userspace filesystem:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11)   A filesystem in which data and metadata are provided by an ordinary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12)   userspace process.  The filesystem can be accessed normally through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13)   the kernel interface.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) Filesystem daemon:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16)   The process(es) providing the data and metadata of the filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) Non-privileged mount (or user mount):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19)   A userspace filesystem mounted by a non-privileged (non-root) user.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20)   The filesystem daemon is running with the privileges of the mounting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21)   user.  NOTE: this is not the same as mounts allowed with the "user"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22)   option in /etc/fstab, which is not discussed here.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) Filesystem connection:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25)   A connection between the filesystem daemon and the kernel.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26)   connection exists until either the daemon dies, or the filesystem is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27)   umounted.  Note that detaching (or lazy umounting) the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28)   does *not* break the connection; in this case it will exist until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29)   the last reference to the filesystem is released.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) Mount owner:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32)   The user who does the mounting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) User:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35)   The user who is performing filesystem operations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) What is FUSE?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) FUSE is a userspace filesystem framework.  It consists of a kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) module (fuse.ko), a userspace library (libfuse.*) and a mount utility
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) (fusermount).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) One of the most important features of FUSE is allowing secure,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) non-privileged mounts.  This opens up new possibilities for the use of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) filesystems.  A good example is sshfs: a secure network filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) using the sftp protocol.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) The userspace library and utilities are available from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) `FUSE homepage: <https://github.com/libfuse/>`_
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) Filesystem type
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) The filesystem type given to mount(2) can be one of the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)     fuse
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58)       This is the usual way to mount a FUSE filesystem.  The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)       argument of the mount system call may contain an arbitrary string,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)       which is not interpreted by the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)     fuseblk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63)       The filesystem is block device based.  The first argument of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64)       mount system call is interpreted as the name of the device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) Mount options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) fd=N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)   The file descriptor to use for communication between the userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)   filesystem and the kernel.  The file descriptor must have been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)   obtained by opening the FUSE device ('/dev/fuse').
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) rootmode=M
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75)   The file mode of the filesystem's root in octal representation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) user_id=N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78)   The numeric user id of the mount owner.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) group_id=N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81)   The numeric group id of the mount owner.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) default_permissions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84)   By default FUSE doesn't check file access permissions; the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85)   filesystem is free to implement its access policy or leave it to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)   the underlying file access mechanism (e.g. in case of network
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)   filesystems).  This option enables permission checking, restricting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)   access based on file mode.  It is usually useful together with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89)   'allow_other' mount option.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) allow_other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)   This option overrides the security measure restricting file access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)   to the user mounting the filesystem.  This option is by default only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)   allowed to root, but this restriction can be removed with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)   (userspace) configuration option.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) max_read=N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98)   With this option the maximum size of read operations can be set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99)   The default is infinite.  Note that the size of read requests is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)   limited anyway to 32 pages (which is 128 KiB on i386).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) blksize=N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)   Set the block size for the filesystem.  The default is 512.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)   option is only valid for 'fuseblk' type mounts.
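
The options above are passed to mount(2) as one comma-separated string.
The following is a minimal sketch of assembling that string.  The helper
name ``build_fuse_mount_options`` and its keyword arguments are
hypothetical and exist only for illustration; they are not part of the
kernel or of libfuse.

```python
def build_fuse_mount_options(fd, rootmode, user_id, group_id, *,
                             default_permissions=False, allow_other=False,
                             max_read=None, blksize=None):
    """Return a comma-separated option string for a 'fuse' or 'fuseblk' mount.

    Hypothetical helper: it only formats the options described in this
    document and does not call mount(2) itself.
    """
    opts = [
        f"fd={fd}",                 # descriptor obtained by opening /dev/fuse
        f"rootmode={rootmode:o}",   # file mode of the root, in octal
        f"user_id={user_id}",       # numeric user id of the mount owner
        f"group_id={group_id}",     # numeric group id of the mount owner
    ]
    if default_permissions:
        opts.append("default_permissions")
    if allow_other:
        opts.append("allow_other")
    if max_read is not None:
        opts.append(f"max_read={max_read}")
    if blksize is not None:
        opts.append(f"blksize={blksize}")   # only valid for 'fuseblk' mounts
    return ",".join(opts)
```

For a directory root with mode 0755 the ``rootmode`` argument would be
``stat.S_IFDIR | 0o755``, which this sketch formats as ``rootmode=40755``.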
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) Control filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) There's a control filesystem for FUSE, which can be mounted by::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)   mount -t fusectl none /sys/fs/fuse/connections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) Mounting it under the '/sys/fs/fuse/connections' directory makes it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) backwards compatible with earlier versions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) Under the fuse control filesystem each connection has a directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) named by a unique number.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) For each connection the following files exist within this directory:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 	waiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 	  The number of requests which are waiting to be transferred to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 	  userspace or being processed by the filesystem daemon.  If there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 	  no filesystem activity and 'waiting' is non-zero, then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 	  filesystem is hung or deadlocked.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 	abort
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 	  Writing anything into this file will abort the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 	  connection.  This means that all waiting requests will be aborted and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 	  an error returned for all aborted and new requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) Only the owner of the mount may read or write these files.
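
As a rough illustration, the 'waiting' counters can be collected by
walking the control filesystem.  The sketch below assumes the control
filesystem is mounted at the path shown above and that each 'waiting'
file contains a single decimal number; the function name
``read_waiting_counts`` is hypothetical.

```python
import os


def read_waiting_counts(root="/sys/fs/fuse/connections"):
    """Map each connection directory name to the value of its 'waiting' file.

    Directories whose 'waiting' file cannot be read (for example because
    the caller is not the mount owner) are skipped.
    """
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "waiting")
        try:
            with open(path) as f:
                counts[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Unreadable, missing, or not a number: ignore this entry.
            continue
    return counts
```

A non-zero count for a connection with no filesystem activity would point
to a hung or deadlocked filesystem, as described for 'waiting' above.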
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) Interrupting filesystem operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) ==================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) If a process issuing a FUSE filesystem request is interrupted, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) following will happen:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)   -  If the request is not yet sent to userspace AND the signal is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)      fatal (SIGKILL or unhandled fatal signal), then the request is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)      dequeued and returns immediately.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)   -  If the request is not yet sent to userspace AND the signal is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)      fatal, then an interrupted flag is set for the request.  When
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)      the request has been successfully transferred to userspace and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)      this flag is set, an INTERRUPT request is queued.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)   -  If the request is already sent to userspace, then an INTERRUPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)      request is queued.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) INTERRUPT requests take precedence over other requests, so the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) userspace filesystem will receive queued INTERRUPTs before any others.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) The userspace filesystem may ignore the INTERRUPT requests entirely,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) or may honor them by sending a reply to the *original* request, with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) the error set to EINTR.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) It is also possible that there's a race between processing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) original request and its INTERRUPT request.  There are two possibilities:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)   1. The INTERRUPT request is processed before the original request is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)      processed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)   2. The INTERRUPT request is processed after the original request has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)      been answered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) If the filesystem cannot find the original request, it should wait for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) some timeout and/or a number of new requests to arrive, after which it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) should reply to the INTERRUPT request with an EAGAIN error.  In case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 1) the INTERRUPT request will be requeued.  In case 2) the INTERRUPT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) reply will be ignored.
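
The decision a filesystem daemon makes on receiving an INTERRUPT can be
sketched as a toy model.  Everything here is an assumption made for
illustration: the class ``InterruptModel``, the retry counter standing in
for "some timeout and/or a number of new requests", and the use of
positive errno values.  It does not talk to the kernel.

```python
import errno


class InterruptModel:
    """Toy model of how a daemon might answer INTERRUPT requests."""

    def __init__(self, max_retries=3):
        self.in_flight = set()   # ids of original requests being processed
        self.replies = []        # (request id, error) pairs "sent" to the kernel
        self.max_retries = max_retries

    def start(self, unique):
        """Record that the original request 'unique' is being processed."""
        self.in_flight.add(unique)

    def handle_interrupt(self, interrupt_id, target_id, retries=0):
        """Return 'honored', 'wait' or 'eagain' for one INTERRUPT request."""
        if target_id in self.in_flight:
            # Honor the interrupt: reply to the *original* request with EINTR.
            self.in_flight.discard(target_id)
            self.replies.append((target_id, errno.EINTR))
            return "honored"
        if retries < self.max_retries:
            # Original not found yet: wait for it to arrive.
            return "wait"
        # Give up: reply to the INTERRUPT request itself with EAGAIN.
        self.replies.append((interrupt_id, errno.EAGAIN))
        return "eagain"
```

In this model the kernel side is left out: per the text above, an EAGAIN
reply leads either to the INTERRUPT being requeued (case 1) or to the
reply being ignored (case 2).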
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) Aborting a filesystem connection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) It is possible to get into certain situations where the filesystem is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) not responding.  Reasons for this may be:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)   a) Broken userspace filesystem implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)   b) Network connection down
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)   c) Accidental deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)   d) Malicious deadlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) (For more on c) and d) see later sections)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) In any of these cases it may be useful to abort the connection to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) the filesystem.  There are several ways to do this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)   - Kill the filesystem daemon.  Works in case of a) and b)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)   - Kill the filesystem daemon and all users of the filesystem.  Works
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)     in all cases except some malicious deadlocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)   - Use forced umount (umount -f).  Works in all cases but only if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)     the filesystem is still attached (it hasn't been lazy unmounted)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)   - Abort filesystem through the FUSE control filesystem.  Most
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)     powerful method, always works.
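
The last method amounts to a single write to the connection's 'abort'
file.  A minimal sketch, assuming the control filesystem is mounted at
'/sys/fs/fuse/connections' and that the connection number is already
known; the function name ``abort_connection`` is hypothetical.

```python
import os


def abort_connection(conn_id, root="/sys/fs/fuse/connections"):
    """Abort one FUSE connection by writing to its 'abort' file.

    The content written is arbitrary: writing anything triggers the abort.
    Returns the path that was written, for logging.
    """
    path = os.path.join(root, str(conn_id), "abort")
    with open(path, "w") as f:
        f.write("1\n")
    return path
```

Only the owner of the mount may write this file, so the call is expected
to fail with a permission error for other users.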
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) How do non-privileged mounts work?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) ==================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) Since the mount() system call is a privileged operation, a helper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) program (fusermount) is needed, which is installed setuid root.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) The implication of providing non-privileged mounts is that the mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) owner must not be able to use this capability to compromise the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) system.  Obvious requirements arising from this are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)  A) mount owner should not be able to get elevated privileges with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)     help of the mounted filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)  B) mount owner should not get illegitimate access to information from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)     other users' and the super user's processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)  C) mount owner should not be able to induce undesired behavior in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)     other users' or the super user's processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) How are requirements fulfilled?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) ===============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)  A) The mount owner could gain elevated privileges by either:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)     1. creating a filesystem containing a device file, then opening this device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)     2. creating a filesystem containing a suid or sgid application, then executing this application
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)     The solution is not to allow opening device files and ignore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)     setuid and setgid bits when executing programs.  To ensure this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)     fusermount always adds "nosuid" and "nodev" to the mount options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)     for non-privileged mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)  B) If another user is accessing files or directories in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)     filesystem, the filesystem daemon serving requests can record the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)     exact sequence and timing of operations performed.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)     information is otherwise inaccessible to the mount owner, so this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)     counts as an information leak.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)     The solution to this problem will be presented in point 2) of C).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)  C) There are several ways in which the mount owner can induce
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)     undesired behavior in other users' processes, such as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)      1) mounting a filesystem over a file or directory which the mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)         owner would otherwise not be able to modify (or could only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)         make limited modifications).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)         This is solved in fusermount, by checking the access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)         permissions on the mountpoint and only allowing the mount if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)         the mount owner can do unlimited modification (has write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)         access to the mountpoint, and mountpoint is not a "sticky"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)         directory)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)      2) Even if 1) is solved the mount owner can change the behavior
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)         of other users' processes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)          i) It can slow down or indefinitely delay the execution of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)             filesystem operation, creating a DoS against the user or the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)             whole system.  For example, a suid application that locks a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)             system file and then accesses a file on the mount owner's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)             filesystem could be stopped, causing the system file to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)             locked forever.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)          ii) It can present files or directories of unlimited length, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)              directory structures of unlimited depth, possibly causing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)              system process to eat up diskspace, memory or other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)              resources, again causing *DoS*.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 	The solution to this as well as B) is not to allow processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 	which could otherwise not be monitored or manipulated by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) 	mount owner to access the filesystem.  Since if the mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 	owner can ptrace a process, it can do all of the above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 	without using a FUSE mount, the same criteria as used in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 	ptrace can be used to check if a process is allowed to access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 	the filesystem or not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 	Note that the *ptrace* check is not strictly necessary to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 	prevent C/2/i; it is enough to check if the mount owner has enough
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) 	privilege to send a signal to the process accessing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 	filesystem, since *SIGSTOP* can be used to get a similar effect.
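
Two of the checks described above, the forced "nosuid" and "nodev"
options from A) and the mountpoint test from C) 1), can be sketched as
follows.  This is a simplified illustration under stated assumptions and
not fusermount's actual code; the function names ``harden_options`` and
``mountpoint_allowed`` are hypothetical, and the real checks are more
thorough.

```python
import os
import stat


def harden_options(options):
    """Return 'options' with 'nosuid' and 'nodev' appended if missing."""
    opts = [o for o in options.split(",") if o]
    for required in ("nosuid", "nodev"):
        if required not in opts:
            opts.append(required)
    return ",".join(opts)


def mountpoint_allowed(path):
    """Approximate the mountpoint check for a non-privileged mount.

    The mount is allowed only if the caller has write access to 'path'
    and 'path' is not a "sticky" directory.
    """
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode) and (mode & stat.S_ISVTX):
        return False
    return os.access(path, os.W_OK)
```

A directory such as '/tmp' is typically writable by everyone but has the
sticky bit set, so this sketch would reject it as a mountpoint.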
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) I think these limitations are unacceptable?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) ===========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) If a sysadmin trusts the users enough, or can ensure through other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) measures, that system processes will never enter non-privileged
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) mounts, they can relax the last limitation with a 'user_allow_other'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) config option.  If this config option is set, the mounting user can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) add the 'allow_other' mount option which disables the check for other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) users' processes.
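
Checking for this option could look like the sketch below.  It assumes
the configuration is a plain-text file with one option per line and '#'
comments (libfuse typically reads it from '/etc/fuse.conf'); the parsing
rules and the function name ``user_allow_other_enabled`` are assumptions
for illustration.

```python
def user_allow_other_enabled(conf_text):
    """Return True if 'user_allow_other' appears on an uncommented line."""
    for raw in conf_text.splitlines():
        # Drop anything after '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line == "user_allow_other":
            return True
    return False
```

A mount helper would consult a check like this before accepting
'allow_other' from a non-root user.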
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) Kernel - userspace interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) The following diagram shows how a filesystem operation (in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) example unlink) is performed in FUSE. ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)  |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)  |                                    |  >sys_read()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)  |                                    |    >fuse_dev_read()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)  |                                    |      >request_wait()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)  |                                    |        [sleep on fc->waitq]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)  |  >sys_unlink()                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311)  |    >fuse_unlink()                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)  |      [get request from             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)  |       fc->unused_list]             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314)  |      >request_send()               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)  |        [queue req on fc->pending]  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)  |        [wake up fc->waitq]         |        [woken up]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)  |        >request_wait_answer()      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)  |          [sleep on req->waitq]     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)  |                                    |      <request_wait()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)  |                                    |      [remove req from fc->pending]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)  |                                    |      [copy req to read buffer]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)  |                                    |      [add req to fc->processing]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)  |                                    |    <fuse_dev_read()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)  |                                    |  <sys_read()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)  |                                    |  [perform unlink]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)  |                                    |  >sys_write()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)  |                                    |    >fuse_dev_write()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)  |                                    |      [look up req in fc->processing]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)  |                                    |      [remove from fc->processing]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)  |                                    |      [copy write buffer to req]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)  |          [woken up]                |      [wake up req->waitq]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334)  |                                    |    <fuse_dev_write()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)  |                                    |  <sys_write()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336)  |        <request_wait_answer()      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337)  |      <request_send()               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338)  |      [add request to               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339)  |       fc->unused_list]             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340)  |    <fuse_unlink()                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341)  |  <sys_unlink()                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) .. note:: Everything in the description above is greatly simplified
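
The queue movements in the diagram can be modelled in a few lines.  This
is a toy, single-threaded sketch: the class ``ConnectionModel`` and its
method names only loosely mirror the labels in the diagram, and the
sleeping and waking on wait queues is left out.

```python
from collections import deque


class ConnectionModel:
    """Toy model of the pending and processing queues of one connection."""

    def __init__(self):
        self.pending = deque()   # requests not yet read by the daemon
        self.processing = {}     # requests read but not yet answered, by id
        self.next_unique = 1

    def request_send(self, opcode, payload=None):
        """Caller side: queue a request on 'pending'."""
        req = {"unique": self.next_unique, "opcode": opcode,
               "payload": payload, "reply": None}
        self.next_unique += 1
        self.pending.append(req)
        return req

    def dev_read(self):
        """Daemon side: move one request from 'pending' to 'processing'."""
        if not self.pending:
            return None          # a real daemon would sleep here
        req = self.pending.popleft()
        self.processing[req["unique"]] = req
        return req["unique"], req["opcode"], req["payload"]

    def dev_write(self, unique, error=0):
        """Daemon side: look up the request, remove it, store the reply."""
        req = self.processing.pop(unique)   # KeyError for an unknown id
        req["reply"] = error
        return req
```

For the unlink example, ``request_send("UNLINK", "file")`` followed by
``dev_read()`` and ``dev_write(unique)`` walks one request through both
queues in the same order as the diagram.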
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) There are a couple of ways in which to deadlock a FUSE filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) Since we are talking about unprivileged userspace programs,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) something must be done about these.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) **Scenario 1 - Simple deadlock**::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351)  |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)  |  >sys_unlink("/mnt/fuse/file")     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354)  |    [acquire inode semaphore        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355)  |     for "file"]                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)  |    >fuse_unlink()                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357)  |      [sleep on req->waitq]         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358)  |                                    |  <sys_read()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)  |                                    |  >sys_unlink("/mnt/fuse/file")
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)  |                                    |    [acquire inode semaphore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361)  |                                    |     for "file"]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)  |                                    |    *DEADLOCK*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) The solution for this is to allow the filesystem to be aborted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) **Scenario 2 - Tricky deadlock**
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) This one needs a carefully crafted filesystem.  It's a variation on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) the above, only the call back to the filesystem is not explicit,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) but is caused by a pagefault. ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)  |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374)  |                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375)  |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)  |  [mmap fd to 'addr']               |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377)  |  [close fd]                        |  [FLUSH triggers 'magic' flag]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378)  |  [read a byte from addr]           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)  |    >do_page_fault()                |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380)  |      [find or create page]         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)  |      [lock page]                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)  |      >fuse_readpage()              |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)  |         [queue READ request]       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)  |         [sleep on req->waitq]      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385)  |                                    |  [read request to buffer]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)  |                                    |  [create reply header before addr]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)  |                                    |  >sys_write(addr - headerlength)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388)  |                                    |    >fuse_dev_write()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)  |                                    |      [look up req in fc->processing]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390)  |                                    |      [remove from fc->processing]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)  |                                    |      [copy write buffer to req]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)  |                                    |        >do_page_fault()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)  |                                    |           [find or create page]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)  |                                    |           [lock page]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)  |                                    |           * DEADLOCK *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) The solution is basically the same as above.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) An additional problem is that while the write buffer is being copied
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) to the request, the request must not be interrupted/aborted.  This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) because the destination address of the copy may not be valid after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) request has returned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) This is solved by doing the copy atomically, and allowing abort
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) while the page(s) belonging to the write buffer are faulted with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) get_user_pages().  The 'req->locked' flag indicates when the copy is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) taking place, and abort is delayed until this flag is unset.
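
The delayed abort can be pictured with a small state model.  This is only
a sketch of the idea described above: the class ``RequestModel`` and its
``abort_pending`` field are hypothetical, and the kernel's real locking
is more involved.

```python
class RequestModel:
    """Toy model of a request whose abort is delayed while it is locked."""

    def __init__(self):
        self.locked = False         # True while the copy is taking place
        self.abort_pending = False  # an abort arrived during the copy
        self.aborted = False

    def lock(self):
        """Mark the start of the copy."""
        self.locked = True

    def unlock(self):
        """Mark the end of the copy and carry out any delayed abort."""
        self.locked = False
        if self.abort_pending:
            self.abort_pending = False
            self.aborted = True

    def abort(self):
        """Abort now, or remember the abort if the copy is in progress."""
        if self.locked:
            self.abort_pending = True
        else:
            self.aborted = True
```

An abort issued between ``lock()`` and ``unlock()`` takes effect only at
``unlock()``, which corresponds to the abort being delayed until the
'req->locked' flag is unset.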