.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are provided by an ordinary
  userspace process. The filesystem can be accessed normally through
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metadata of the filesystem.

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-privileged (non-root) user.
  The filesystem daemon is running with the privileges of the mounting
  user. NOTE: this is not the same as mounts allowed with the "user"
  option in /etc/fstab, which is not discussed here.

Filesystem connection:
  A connection between the filesystem daemon and the kernel. The
  connection exists until either the daemon dies, or the filesystem is
  unmounted. Note that detaching (or lazy unmounting) the filesystem
  does *not* break the connection; in this case it will exist until
  the last reference to the filesystem is released.

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operations.

What is FUSE?
=============

FUSE is a userspace filesystem framework. It consists of a kernel
module (fuse.ko), a userspace library (libfuse.*) and a mount utility
(fusermount).

One of the most important features of FUSE is allowing secure,
non-privileged mounts. This opens up new possibilities for the use of
filesystems. A good example is sshfs: a secure network filesystem
using the sftp protocol.

The userspace library and utilities are available from the
`FUSE homepage: <https://github.com/libfuse/>`_

Filesystem type
===============

The filesystem type given to mount(2) can be one of the following:

fuse
  This is the usual way to mount a FUSE filesystem. The first
  argument of the mount system call may contain an arbitrary string,
  which is not interpreted by the kernel.

fuseblk
  The filesystem is block device based. The first argument of the
  mount system call is interpreted as the name of the device.

Mount options
=============

fd=N
  The file descriptor to use for communication between the userspace
  filesystem and the kernel. The file descriptor must have been
  obtained by opening the FUSE device ('/dev/fuse').

rootmode=M
  The file mode of the filesystem's root in octal representation.

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access permissions; the
  filesystem is free to implement its access policy or leave it to
  the underlying file access mechanism (e.g. in case of network
  filesystems). This option enables permission checking, restricting
  access based on file mode. It is usually useful together with the
  'allow_other' mount option.

allow_other
  This option overrides the security measure restricting file access
  to the user mounting the filesystem. This option is by default only
  allowed to root, but this restriction can be removed with a
  (userspace) configuration option.

max_read=N
  With this option the maximum size of read operations can be set.
  The default is infinite. Note that the size of read requests is
  limited anyway to 32 pages (which is 128 KiB on i386).

blksize=N
  Set the block size for the filesystem. The default is 512. This
  option is only valid for 'fuseblk' type mounts.
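
As a rough illustration of how these options fit together, the
following Python sketch assembles the comma-separated option string
that a mount helper could pass as the data argument of mount(2). The
function name ``build_fuse_mount_data`` and its parameters are
hypothetical; the sketch only builds the string and does not call
mount(2) itself.

.. code-block:: python

    def build_fuse_mount_data(fd, rootmode, user_id, group_id,
                              default_permissions=False, allow_other=False,
                              max_read=None, blksize=None):
        """Return a FUSE mount option string (illustrative sketch only)."""
        opts = [
            "fd=%d" % fd,                # descriptor from opening /dev/fuse
            "rootmode=%o" % rootmode,    # octal representation
            "user_id=%d" % user_id,      # numeric id of the mount owner
            "group_id=%d" % group_id,
        ]
        if default_permissions:
            opts.append("default_permissions")
        if allow_other:
            opts.append("allow_other")
        if max_read is not None:
            opts.append("max_read=%d" % max_read)
        if blksize is not None:
            # Only meaningful for 'fuseblk' type mounts.
            opts.append("blksize=%d" % blksize)
        return ",".join(opts)

For example, ``build_fuse_mount_data(3, 0o40000, 1000, 1000)`` yields
``fd=3,rootmode=40000,user_id=1000,group_id=1000``.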

Control filesystem
==================

There's a control filesystem for FUSE, which can be mounted by::

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

waiting
  The number of requests which are waiting to be transferred to
  userspace or being processed by the filesystem daemon. If there is
  no filesystem activity and 'waiting' is non-zero, then the
  filesystem is hung or deadlocked.

abort
  Writing anything into this file will abort the filesystem
  connection. This means that all waiting requests will be aborted and
  an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.
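
A minimal Python sketch of using these two files is shown here. The
helper names ``read_waiting`` and ``abort_connection`` are hypothetical,
and ``conn_dir`` is assumed to be one connection's directory, for
example a numbered directory below '/sys/fs/fuse/connections'.

.. code-block:: python

    import os

    def read_waiting(conn_dir):
        """Return the request count reported by the 'waiting' file."""
        with open(os.path.join(conn_dir, "waiting")) as f:
            return int(f.read().strip())

    def abort_connection(conn_dir):
        """Abort the connection by writing something into 'abort'."""
        with open(os.path.join(conn_dir, "abort"), "w") as f:
            f.write("1\n")   # the content is arbitrary; any write aborts

Both calls are expected to fail with a permission error unless they are
made by the owner of the mount.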

Interrupting filesystem operations
==================================

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

- If the request is not yet sent to userspace AND the signal is
  fatal (SIGKILL or unhandled fatal signal), then the request is
  dequeued and returns immediately.

- If the request is not yet sent to userspace AND the signal is not
  fatal, then an interrupted flag is set for the request. When
  the request has been successfully transferred to userspace and
  this flag is set, an INTERRUPT request is queued.

- If the request is already sent to userspace, then an INTERRUPT
  request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.
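
The three cases above can be summarised as a small decision function.
This is a Python model of the rules as described in this document, not
kernel code; the function name and the returned strings are
hypothetical.

.. code-block:: python

    def on_signal(sent_to_userspace, signal_is_fatal):
        """Return the action taken when the requesting process is signalled."""
        if sent_to_userspace:
            # The daemon already has the request: tell it via INTERRUPT.
            return "queue INTERRUPT"
        if signal_is_fatal:
            # Not yet sent and the process is dying: drop the request.
            return "dequeue and return"
        # Not yet sent, non-fatal signal: remember it; the INTERRUPT is
        # queued once the request has been transferred to userspace.
        return "set interrupted flag"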

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the *original* request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request. There are two possibilities:

1. The INTERRUPT request is processed before the original request is
   processed

2. The INTERRUPT request is processed after the original request has
   been answered

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error. In case 1)
the INTERRUPT request will be requeued. In case 2) the INTERRUPT
reply will be ignored.
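
The daemon side of this protocol could be modelled roughly as follows.
This Python sketch assumes that each request carries a unique id and
that the daemon tracks the ids it is currently processing. The class
``DaemonModel`` and its methods are hypothetical, the wait before
answering EAGAIN is omitted, and the way errors are encoded in a real
reply is not modelled.

.. code-block:: python

    import errno

    class DaemonModel:
        """Toy model of how a daemon might honor INTERRUPT requests."""

        def __init__(self):
            self.in_flight = set()   # ids of requests being processed
            self.replies = []        # (request id, error) pairs sent back

        def start(self, unique):
            self.in_flight.add(unique)

        def finish(self, unique, error=0):
            self.in_flight.discard(unique)
            self.replies.append((unique, error))

        def handle_interrupt(self, interrupt_unique, target_unique):
            if target_unique in self.in_flight:
                # Honor the interrupt: answer the *original* request.
                self.finish(target_unique, errno.EINTR)
            else:
                # Original not found (not yet seen, or already answered):
                # answer the INTERRUPT request itself with EAGAIN.
                self.replies.append((interrupt_unique, errno.EAGAIN))

In this model an INTERRUPT that arrives for an unknown request is
answered immediately; a real daemon would first wait as described
above.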

Aborting a filesystem connection
================================

It is possible to get into certain situations where the filesystem is
not responding. Reasons for this may be:

a) Broken userspace filesystem implementation

b) Network connection down

c) Accidental deadlock

d) Malicious deadlock

(For more on c) and d) see later sections)

In any of these cases it may be useful to abort the connection to
the filesystem. There are several ways to do this:

- Kill the filesystem daemon. Works in case of a) and b)

- Kill the filesystem daemon and all users of the filesystem. Works
  in all cases except some malicious deadlocks

- Use forced umount (umount -f). Works in all cases but only if
  filesystem is still attached (it hasn't been lazy unmounted)

- Abort filesystem through the FUSE control filesystem. Most
  powerful method, always works.

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged operation, a helper
program (fusermount) is needed, which is installed setuid root.

The implication of providing non-privileged mounts is that the mount
owner must not be able to use this capability to compromise the
system. Obvious requirements arising from this are:

A) mount owner should not be able to get elevated privileges with the
   help of the mounted filesystem

B) mount owner should not get illegitimate access to information from
   other users' and the super user's processes

C) mount owner should not be able to induce undesired behavior in
   other users' or the super user's processes

How are requirements fulfilled?
===============================

A) The mount owner could gain elevated privileges by either:

   1. creating a filesystem containing a device file, then opening
      this device

   2. creating a filesystem containing a suid or sgid application,
      then executing this application

   The solution is not to allow opening device files and to ignore
   setuid and setgid bits when executing programs. To ensure this,
   fusermount always adds "nosuid" and "nodev" to the mount options
   for non-privileged mounts.

B) If another user is accessing files or directories in the
   filesystem, the filesystem daemon serving requests can record the
   exact sequence and timing of operations performed. This
   information is otherwise inaccessible to the mount owner, so this
   counts as an information leak.

   The solution to this problem will be presented in point 2) of C).

C) There are several ways in which the mount owner can induce
   undesired behavior in other users' processes, such as:

   1) mounting a filesystem over a file or directory which the mount
      owner would otherwise not be able to modify (or could only
      make limited modifications).

      This is solved in fusermount, by checking the access
      permissions on the mountpoint and only allowing the mount if
      the mount owner can do unlimited modification (has write
      access to the mountpoint, and mountpoint is not a "sticky"
      directory)

   2) Even if 1) is solved the mount owner can change the behavior
      of other users' processes.

      i) It can slow down or indefinitely delay the execution of a
         filesystem operation, creating a DoS against the user or the
         whole system. For example, a suid application that locks a
         system file and then accesses a file on the mount owner's
         filesystem could be stopped, causing the system file to
         stay locked forever.

      ii) It can present files or directories of unlimited length, or
          directory structures of unlimited depth, possibly causing a
          system process to eat up diskspace, memory or other
          resources, again causing *DoS*.

      The solution to this as well as B) is not to allow access to
      the filesystem by processes which could otherwise not be
      monitored or manipulated by the mount owner. Since the mount
      owner can do all of the above to a process it can ptrace,
      without using a FUSE mount, the same criteria as used in
      ptrace can be used to check if a process is allowed to access
      the filesystem or not.

      Note that the *ptrace* check is not strictly necessary to
      prevent C/2/i; it is enough to check if the mount owner has
      enough privilege to send a signal to the process accessing the
      filesystem, since *SIGSTOP* can be used to get a similar effect.
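
Two of the checks described above, the forced "nosuid" and "nodev"
options from A) and the mountpoint test from C) 1), can be sketched in
Python as follows. This only models the rules as stated in this
document and is not fusermount's actual implementation; both function
names are hypothetical, and the access test applies to the calling
user.

.. code-block:: python

    import os
    import stat

    def effective_mount_flags(requested, privileged):
        """Add 'nosuid' and 'nodev' for non-privileged mounts."""
        flags = list(requested)
        if not privileged:
            for flag in ("nosuid", "nodev"):
                if flag not in flags:
                    flags.append(flag)
        return flags

    def mountpoint_allowed(path):
        """Allow the mount only if the caller can modify 'path' freely."""
        mode = os.stat(path).st_mode
        writable = os.access(path, os.W_OK)     # write access required
        sticky = bool(mode & stat.S_ISVTX)      # "sticky" is refused
        return writable and not sticky

Under these rules a freshly created private directory is accepted as a
mountpoint, while a sticky directory such as a typical '/tmp' is
refused.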

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can ensure through other
measures that system processes will never enter non-privileged
mounts, they can relax the last limitation with a 'user_allow_other'
config option. If this config option is set, the mounting user can
add the 'allow_other' mount option, which disables the check for other
users' processes.
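
The config option is read by the userspace tools rather than by the
kernel. With libfuse it lives in '/etc/fuse.conf', a file name this
document does not itself state. A simplified Python sketch of checking
for the option in such a file follows; the function name is
hypothetical and the parsing rules (one option per line, '#' starts a
comment) are an assumption made for illustration.

.. code-block:: python

    def user_allow_other_enabled(conf_text):
        """Return True if 'user_allow_other' is set in the config text."""
        for line in conf_text.splitlines():
            # Drop anything after '#' and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            if line == "user_allow_other":
                return True
        return False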

Kernel - userspace interface
============================

The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE. ::

  |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
  |                                    |
  |                                    |  >sys_read()
  |                                    |    >fuse_dev_read()
  |                                    |      >request_wait()
  |                                    |        [sleep on fc->waitq]
  |                                    |
  |  >sys_unlink()                     |
  |    >fuse_unlink()                  |
  |      [get request from             |
  |       fc->unused_list]             |
  |      >request_send()               |
  |        [queue req on fc->pending]  |
  |        [wake up fc->waitq]         |        [woken up]
  |        >request_wait_answer()      |
  |          [sleep on req->waitq]     |
  |                                    |      <request_wait()
  |                                    |      [remove req from fc->pending]
  |                                    |      [copy req to read buffer]
  |                                    |      [add req to fc->processing]
  |                                    |    <fuse_dev_read()
  |                                    |  <sys_read()
  |                                    |
  |                                    |  [perform unlink]
  |                                    |
  |                                    |  >sys_write()
  |                                    |    >fuse_dev_write()
  |                                    |      [look up req in fc->processing]
  |                                    |      [remove from fc->processing]
  |                                    |      [copy write buffer to req]
  |          [woken up]                |      [wake up req->waitq]
  |                                    |    <fuse_dev_write()
  |                                    |  <sys_write()
  |        <request_wait_answer()      |
  |      <request_send()               |
  |      [add request to               |
  |       fc->unused_list]             |
  |    <fuse_unlink()                  |
  |  <sys_unlink()                     |

.. note:: Everything in the description above is greatly simplified
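
The queue handling in the diagram can be mimicked with a small
single-threaded Python model. It keeps only the 'pending' and
'processing' lists and leaves out all sleeping, waking and locking.
The class ``ConnModel`` and the request layout are hypothetical and
only loosely follow the names used in the diagram.

.. code-block:: python

    from collections import deque

    class ConnModel:
        """Toy model of one FUSE connection's request queues."""

        def __init__(self):
            self.pending = deque()   # queued, not yet read by the daemon
            self.processing = {}     # read by the daemon, awaiting reply
            self.next_unique = 1

        def request_send(self, opcode, payload):
            """Caller side: queue a request (e.g. for unlink)."""
            req = {"unique": self.next_unique, "opcode": opcode,
                   "payload": payload, "reply": None}
            self.next_unique += 1
            self.pending.append(req)
            return req

        def dev_read(self):
            """Daemon side: take the oldest pending request."""
            req = self.pending.popleft()
            self.processing[req["unique"]] = req
            return req["unique"], req["opcode"], req["payload"]

        def dev_write(self, unique, reply):
            """Daemon side: answer a request that is being processed."""
            req = self.processing.pop(unique)
            req["reply"] = reply     # the caller would be woken up here

A round trip is then ``request_send()``, ``dev_read()``, the daemon
performing the operation, and ``dev_write()`` with the same unique id.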

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

**Scenario 1 - Simple deadlock**::

  |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
  |                                    |
  |  >sys_unlink("/mnt/fuse/file")     |
  |    [acquire inode semaphore        |
  |     for "file"]                    |
  |    >fuse_unlink()                  |
  |      [sleep on req->waitq]         |
  |                                    |  <sys_read()
  |                                    |  >sys_unlink("/mnt/fuse/file")
  |                                    |    [acquire inode semaphore
  |                                    |     for "file"]
  |                                    |    *DEADLOCK*

The solution for this is to allow the filesystem to be aborted.
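
The shape of this deadlock can be reproduced in miniature with an
ordinary non-reentrant lock. In the Python sketch below a
``threading.Lock`` stands in for the inode semaphore, and a timeout is
used so that the example terminates instead of hanging. The function
name is hypothetical and nothing here touches a real filesystem.

.. code-block:: python

    import threading

    def daemon_can_lock_same_file(timeout=0.1):
        """Return whether the daemon could take the lock the caller holds."""
        inode_lock = threading.Lock()   # stands in for the inode semaphore
        inode_lock.acquire()            # caller: held while awaiting reply
        # Daemon: serving the request needs the very same lock. Without
        # the timeout this call would block forever.
        acquired = inode_lock.acquire(timeout=timeout)
        inode_lock.release()            # clean up the caller's hold
        return acquired

The function returns ``False``: the second acquire can only succeed
after the first holder releases the lock, and the first holder is
waiting for the daemon.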

**Scenario 2 - Tricky deadlock**

This one needs a carefully crafted filesystem. It's a variation on
the above, only the call back to the filesystem is not explicit,
but is caused by a pagefault. ::

  |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
  |                                    |
  |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
  |  [mmap fd to 'addr']               |
  |  [close fd]                        |  [FLUSH triggers 'magic' flag]
  |  [read a byte from addr]           |
  |    >do_page_fault()                |
  |      [find or create page]         |
  |      [lock page]                   |
  |      >fuse_readpage()              |
  |        [queue READ request]        |
  |        [sleep on req->waitq]       |
  |                                    |  [read request to buffer]
  |                                    |  [create reply header before addr]
  |                                    |  >sys_write(addr - headerlength)
  |                                    |    >fuse_dev_write()
  |                                    |      [look up req in fc->processing]
  |                                    |      [remove from fc->processing]
  |                                    |      [copy write buffer to req]
  |                                    |        >do_page_fault()
  |                                    |          [find or create page]
  |                                    |          [lock page]
  |                                    |      * DEADLOCK *

The solution is basically the same as above.

An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted. This is
because the destination address of the copy may not be valid after the
request has returned.

This is solved by doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
get_user_pages(). The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.
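
The effect of the 'req->locked' flag can be pictured with a tiny state
model. The Python sketch below only illustrates the "delay abort while
the copy is in progress" rule described above. The class
``RequestModel`` and its attribute names are hypothetical and do not
mirror the kernel's data structures.

.. code-block:: python

    class RequestModel:
        """Toy model of a request whose abort is delayed during a copy."""

        def __init__(self):
            self.locked = False          # True while the copy takes place
            self.abort_pending = False   # abort arrived during the copy
            self.aborted = False

        def lock(self):
            self.locked = True

        def abort(self):
            if self.locked:
                self.abort_pending = True    # delay until the flag is unset
            else:
                self.aborted = True

        def unlock(self):
            self.locked = False
            if self.abort_pending:
                self.abort_pending = False
                self.aborted = True          # delayed abort takes effect

An abort issued between ``lock()`` and ``unlock()`` therefore takes
effect only once the copy has finished.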