Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) .. _fsverity:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) fs-verity: read-only file-based authenticity protection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) fs-verity (``fs/verity/``) is a support layer that filesystems can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) hook into to support transparent integrity and authenticity protection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) of read-only files.  Currently, it is supported by the ext4 and f2fs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) filesystems.  Like fscrypt, not too much filesystem-specific code is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) needed to support fs-verity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) fs-verity is similar to `dm-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) but works on files rather than block devices.  On regular files on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) filesystems supporting fs-verity, userspace can execute an ioctl that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) causes the filesystem to build a Merkle tree for the file and persist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) it to a filesystem-specific location associated with the file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) After this, the file is made read-only, and all reads from the file are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) automatically verified against the file's Merkle tree.  Reads of any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) corrupted data, including mmap reads, will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) Userspace can use another ioctl to retrieve the root hash (actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) the "fs-verity file digest", which is a hash that includes the Merkle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) tree root hash) that fs-verity is enforcing for the file.  This ioctl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) executes in constant time, regardless of the file size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) fs-verity is essentially a way to hash a file in constant time,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) subject to the caveat that reads which would violate the hash will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) fail at runtime.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) Use cases
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) By itself, the base fs-verity feature only provides integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) protection, i.e. detection of accidental (non-malicious) corruption.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) However, because fs-verity makes retrieving the file hash extremely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) efficient, it's primarily meant to be used as a tool to support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) authentication (detection of malicious modifications) or auditing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) (logging file hashes before use).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) Trusted userspace code (e.g. operating system code running on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) read-only partition that is itself authenticated by dm-verity) can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) authenticate the contents of an fs-verity file by using the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) `FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) digital signature of it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) A standard file hash could be used instead of fs-verity.  However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) this is inefficient if the file is large and only a small portion may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) be accessed.  This is often the case for Android application package
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) (APK) files, for example.  These typically contain many translations,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) classes, and other resources that are infrequently or even never
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) accessed on a particular device.  It would be slow and wasteful to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) read and hash the entire file before starting the application.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) Unlike an ahead-of-time hash, fs-verity also re-verifies data each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) time it's paged in.  This ensures that malicious disk firmware can't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) undetectably change the contents of the file at runtime.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) fs-verity does not replace or obsolete dm-verity.  dm-verity should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) still be used on read-only filesystems.  fs-verity is for files that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) must live on a read-write filesystem because they are independently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) updated and potentially user-installed, so dm-verity cannot be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) The base fs-verity feature is a hashing mechanism only; actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) authenticating the files is up to userspace.  However, to meet some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) users' needs, fs-verity optionally supports a simple signature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) verification mechanism where users can configure the kernel to require
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) that all fs-verity files be signed by a key loaded into a keyring; see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) `Built-in signature verification`_.  Support for fs-verity file hashes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) in IMA (Integrity Measurement Architecture) policies is also planned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) User API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) FS_IOC_ENABLE_VERITY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file.  It takes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) in a pointer to a struct fsverity_enable_arg, defined as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)     struct fsverity_enable_arg {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91)             __u32 version;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)             __u32 hash_algorithm;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)             __u32 block_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)             __u32 salt_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)             __u64 salt_ptr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)             __u32 sig_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)             __u32 __reserved1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98)             __u64 sig_ptr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99)             __u64 __reserved2[11];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) This structure contains the parameters of the Merkle tree to build for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) the file, and optionally contains a signature.  It must be initialized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) - ``version`` must be 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) - ``hash_algorithm`` must be the identifier for the hash algorithm to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)   use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256.  See
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)   ``include/uapi/linux/fsverity.h`` for the list of possible values.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) - ``block_size`` must be the Merkle tree block size.  Currently, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)   must be equal to the system page size, which is usually 4096 bytes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)   Other sizes may be supported in the future.  This value is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)   necessarily the same as the filesystem block size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)   provided.  The salt is a value that is prepended to every hashed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)   block; it can be used to personalize the hashing for a particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)   file or device.  Currently the maximum salt size is 32 bytes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)   provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) - ``sig_size`` is the size of the signature in bytes, or 0 if no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)   signature is provided.  Currently the signature is (somewhat
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)   arbitrarily) limited to 16128 bytes.  See `Built-in signature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)   verification`_ for more information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) - ``sig_ptr`` is the pointer to the signature, or NULL if no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)   signature is provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) - All reserved fields must be zeroed.
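
As a minimal sketch of the initialization rules above, the following C
fragment fills in ``struct fsverity_enable_arg`` for an unsalted, unsigned
SHA-256 tree and issues the ioctl on a read-only file descriptor.  The helper
names ``init_enable_arg()`` and ``enable_verity()`` are hypothetical, and
passing the system page size as the block size is an assumption that follows
the current restriction described above:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Fill in the argument for an unsalted, unsigned SHA-256 Merkle tree. */
void init_enable_arg(struct fsverity_enable_arg *arg, uint32_t block_size)
{
        memset(arg, 0, sizeof(*arg));   /* also zeroes the reserved fields */
        arg->version = 1;
        arg->hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
        arg->block_size = block_size;
        /* salt_size, salt_ptr, sig_size, sig_ptr stay 0: no salt, no signature */
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int enable_verity(const char *path)
{
        struct fsverity_enable_arg arg;
        int fd, ret, saved_errno;

        fd = open(path, O_RDONLY);      /* the ioctl needs an O_RDONLY descriptor */
        if (fd < 0)
                return -1;

        /* Assumption: block size equals the system page size. */
        init_enable_arg(&arg, (uint32_t)sysconf(_SC_PAGESIZE));

        ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
        saved_errno = errno;            /* close() must not clobber the ioctl's errno */
        close(fd);
        errno = saved_errno;
        return ret;
}
```

Zeroing the whole structure first is a convenient way to satisfy the rule
that all reserved fields must be zeroed.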
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) the file and persist it to a filesystem-specific location associated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) with the file, then mark the file as a verity file.  This ioctl may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) take a long time to execute on large files, and it is interruptible by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) fatal signals.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) FS_IOC_ENABLE_VERITY checks for write access to the inode.  However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) it must be executed on an O_RDONLY file descriptor and no processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) can have the file open for writing.  Attempts to open the file for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) writing while this ioctl is executing will fail with ETXTBSY.  (This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) is necessary to guarantee that no writable file descriptors will exist
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) after verity is enabled, and to guarantee that the file's contents are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) stable while the Merkle tree is being built over it.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) verity file.  On failure (including the case of interruption by a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) fatal signal), no changes are made to the file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) FS_IOC_ENABLE_VERITY can fail with the following errors:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) - ``EACCES``: the process does not have write access to the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) - ``EBADMSG``: the signature is malformed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) - ``EBUSY``: this ioctl is already running on the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) - ``EEXIST``: the file already has verity enabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) - ``EFAULT``: the caller provided inaccessible memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) - ``EINTR``: the operation was interrupted by a fatal signal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) - ``EINVAL``: unsupported version, hash algorithm, or block size; or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)   reserved bits are set; or the file descriptor refers to neither a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)   regular file nor a directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) - ``EISDIR``: the file descriptor refers to a directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) - ``EKEYREJECTED``: the signature doesn't match the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) - ``EMSGSIZE``: the salt or signature is too long
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) - ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)   needed to verify the signature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)   available in the kernel's crypto API as currently configured (e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)   for SHA-512, missing CONFIG_CRYPTO_SHA512).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) - ``ENOTTY``: this type of filesystem does not implement fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)   support; or the filesystem superblock has not had the 'verity'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)   feature enabled on it; or the filesystem does not support fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)   on this file.  (See `Filesystem support`_.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) - ``EPERM``: the file is append-only; or, a signature is required and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)   one was not provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) - ``EROFS``: the filesystem is read-only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) - ``ETXTBSY``: someone has the file open for writing.  This can be the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)   caller's file descriptor, another open file descriptor, or the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)   reference held by a writable memory map.
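
Callers often only need to know how to react to a failure rather than its
exact cause.  The sketch below shows one possible grouping of the errors
listed above.  The enum, its member names, and the grouping itself are
assumptions of this example, not something the kernel prescribes:

```c
#include <errno.h>

/* Hypothetical categories for FS_IOC_ENABLE_VERITY failures. */
enum verity_err_class {
        VERITY_ERR_ALREADY_ENABLED,     /* often safe to treat as success */
        VERITY_ERR_UNSUPPORTED,         /* kernel, filesystem, or hash unavailable */
        VERITY_ERR_BUSY,                /* possibly transient, may be retried later */
        VERITY_ERR_SIGNATURE,           /* problem with the supplied signature */
        VERITY_ERR_OTHER,               /* everything else, e.g. EACCES or EINVAL */
};

enum verity_err_class classify_enable_error(int err)
{
        switch (err) {
        case EEXIST:
                return VERITY_ERR_ALREADY_ENABLED;
        case ENOTTY:
        case EOPNOTSUPP:
        case ENOPKG:
                return VERITY_ERR_UNSUPPORTED;
        case EBUSY:
        case ETXTBSY:
                return VERITY_ERR_BUSY;
        case EBADMSG:
        case EKEYREJECTED:
        case ENOKEY:
                return VERITY_ERR_SIGNATURE;
        default:
                /* EPERM lands here because it has two unrelated meanings. */
                return VERITY_ERR_OTHER;
        }
}
```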
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) FS_IOC_MEASURE_VERITY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) The FS_IOC_MEASURE_VERITY ioctl retrieves the digest of a verity file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) The fs-verity file digest is a cryptographic digest that identifies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) the file contents that are being enforced on reads; it is computed via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) a Merkle tree and is different from a traditional full-file digest.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) This ioctl takes in a pointer to a variable-length structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)     struct fsverity_digest {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)             __u16 digest_algorithm;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)             __u16 digest_size; /* input/output */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)             __u8 digest[];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) ``digest_size`` is an input/output field.  On input, it must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) initialized to the number of bytes allocated for the variable-length
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) ``digest`` field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) On success, 0 is returned and the kernel fills in the structure as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) - ``digest_algorithm`` will be the hash algorithm used for the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)   digest.  It will match ``fsverity_enable_arg::hash_algorithm``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) - ``digest_size`` will be the size of the digest in bytes, e.g. 32
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)   for SHA-256.  (This can be redundant with ``digest_algorithm``.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) - ``digest`` will be the actual bytes of the digest.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) regardless of the size of the file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) FS_IOC_MEASURE_VERITY can fail with the following errors:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) - ``EFAULT``: the caller provided inaccessible memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) - ``ENODATA``: the file is not a verity file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) - ``ENOTTY``: this type of filesystem does not implement fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)   support, or the filesystem superblock has not had the 'verity'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)   feature enabled on it.  (See `Filesystem support`_.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) - ``EOVERFLOW``: the digest is longer than the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)   ``digest_size`` bytes.  Try providing a larger buffer.
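
The variable-length structure can be handled as in the following sketch,
which allocates room for the digest, sets ``digest_size`` on input, and reads
it back on output.  The names ``measure_verity()``, ``digest_to_hex()`` and
``MAX_DIGEST_SIZE`` are hypothetical, and the 64-byte buffer is an assumption
chosen to be large enough for SHA-512:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fsverity.h>

#define MAX_DIGEST_SIZE 64      /* assumption: large enough for SHA-512 */

/*
 * Hypothetical helper: copies the file digest into out[] (which must hold
 * MAX_DIGEST_SIZE bytes) and the algorithm identifier into *alg.
 * Returns the digest length in bytes, or -1 with errno set.
 */
int measure_verity(int fd, uint8_t *out, uint16_t *alg)
{
        struct fsverity_digest *d;
        int len, saved_errno;

        d = calloc(1, sizeof(*d) + MAX_DIGEST_SIZE);
        if (!d)
                return -1;
        d->digest_size = MAX_DIGEST_SIZE;       /* input: space in digest[] */

        if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0) {
                saved_errno = errno;
                free(d);
                errno = saved_errno;
                return -1;
        }
        len = d->digest_size;                   /* output: actual digest length */
        memcpy(out, d->digest, (size_t)len);
        *alg = d->digest_algorithm;
        free(d);
        return len;
}

/* Format n bytes as lowercase hex; hex must hold 2 * n + 1 characters. */
void digest_to_hex(const uint8_t *digest, size_t n, char *hex)
{
        for (size_t i = 0; i < n; i++)
                sprintf(hex + 2 * i, "%02x", digest[i]);
        hex[2 * n] = '\0';
}
```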
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) FS_IOC_READ_VERITY_METADATA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) ---------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) verity file.  This ioctl has been available since Linux v5.12.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) This ioctl allows writing a server program that takes a verity file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) and serves it to a client program, such that the client can do its own
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) fs-verity compatible verification of the file.  This only makes sense
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) if the client doesn't trust the server and if the server needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) provide the storage for the client.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) This is a fairly specialized use case, and most fs-verity users won't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) need this ioctl.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) This ioctl takes in a pointer to the following structure::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)    #define FS_VERITY_METADATA_TYPE_MERKLE_TREE     1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)    #define FS_VERITY_METADATA_TYPE_DESCRIPTOR      2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)    #define FS_VERITY_METADATA_TYPE_SIGNATURE       3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)    struct fsverity_read_metadata_arg {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)            __u64 metadata_type;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)            __u64 offset;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)            __u64 length;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)            __u64 buf_ptr;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)            __u64 __reserved;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)    };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) ``metadata_type`` specifies the type of metadata to read:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) - ``FS_VERITY_METADATA_TYPE_MERKLE_TREE`` reads the blocks of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)   Merkle tree.  The blocks are returned in order from the root level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)   to the leaf level.  Within each level, the blocks are returned in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)   the same order that their hashes are themselves hashed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)   See `Merkle tree`_ for more information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) - ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)   descriptor.  See `fs-verity descriptor`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) - ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the signature which was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)   passed to FS_IOC_ENABLE_VERITY, if any.  See `Built-in signature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)   verification`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) The semantics are similar to those of ``pread()``.  ``offset``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) specifies the offset in bytes into the metadata item to read from, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) ``length`` specifies the maximum number of bytes to read from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) metadata item.  ``buf_ptr`` is the pointer to the buffer to read into,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) cast to a 64-bit integer.  ``__reserved`` must be 0.  On success, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) number of bytes read is returned.  0 is returned at the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) metadata item.  The returned length may be less than ``length``, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) example if the ioctl is interrupted.
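
Because short reads are allowed, a caller would typically loop, much as with
``pread()``.  The sketch below reads one metadata item from offset 0 into a
caller-supplied buffer.  The helper name ``read_verity_metadata()`` is
hypothetical, and the ``#ifndef`` fallback is an assumption intended to mirror
the upstream UAPI header for build environments whose headers predate this
ioctl; check it against ``include/uapi/linux/fsverity.h`` before relying on
it:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fsverity.h>

#ifndef FS_IOC_READ_VERITY_METADATA
/* Fallback for older headers; assumed to match upstream linux/fsverity.h. */
struct fsverity_read_metadata_arg {
        __u64 metadata_type;
        __u64 offset;
        __u64 length;
        __u64 buf_ptr;
        __u64 __reserved;
};
#define FS_IOC_READ_VERITY_METADATA \
        _IOWR('f', 135, struct fsverity_read_metadata_arg)
#endif

/*
 * Hypothetical helper: reads up to len bytes of one metadata item, starting
 * at offset 0.  Returns the number of bytes stored, or -1 with errno set.
 */
ssize_t read_verity_metadata(int fd, uint64_t type, void *buf, size_t len)
{
        size_t done = 0;

        while (done < len) {
                struct fsverity_read_metadata_arg arg = {
                        .metadata_type = type,
                        .offset = done,
                        .length = len - done,
                        .buf_ptr = (uintptr_t)((char *)buf + done),
                        /* __reserved is implicitly zero-initialized */
                };
                int n = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted before any data was read */
                        return -1;
                }
                if (n == 0)
                        break;                  /* end of the metadata item */
                done += (size_t)n;              /* short read: ask for the rest */
        }
        return (ssize_t)done;
}
```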
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) The metadata returned by FS_IOC_READ_VERITY_METADATA isn't guaranteed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) to be authenticated against the file digest that would be returned by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) `FS_IOC_MEASURE_VERITY`_, as the metadata is expected to be used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) implement fs-verity compatible verification anyway (though absent a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) malicious disk, the metadata will indeed match).  E.g. to implement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) this ioctl, the filesystem is allowed to just read the Merkle tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) blocks from disk without actually verifying the path to the root node.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) FS_IOC_READ_VERITY_METADATA can fail with the following errors:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) - ``EFAULT``: the caller provided inaccessible memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) - ``EINTR``: the ioctl was interrupted before any data was read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) - ``EINVAL``: reserved fields were set, or ``offset + length``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)   overflowed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) - ``ENODATA``: the file is not a verity file, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)   FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)   have a built-in signature
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) - ``ENOTTY``: this type of filesystem does not implement fs-verity, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)   this ioctl is not yet implemented on it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) - ``EOPNOTSUPP``: the kernel was not configured with fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)   support, or the filesystem superblock has not had the 'verity'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)   feature enabled on it.  (See `Filesystem support`_.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) FS_IOC_GETFLAGS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) can also be used to check whether a file has fs-verity enabled or not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.
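
A minimal sketch of this check follows.  The helper names are hypothetical,
and the fallback definition of ``FS_VERITY_FL`` simply repeats the value
quoted above for headers that lack it:

```c
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FS_VERITY_FL
#define FS_VERITY_FL 0x00100000 /* value quoted in the text above */
#endif

/* Pure check on a flags word as returned by FS_IOC_GETFLAGS. */
int flags_have_verity(unsigned int flags)
{
        return (flags & FS_VERITY_FL) != 0;
}

/* Returns 1 if verity is enabled, 0 if not, -1 with errno set on error. */
int fd_has_verity(int fd)
{
        int flags = 0;  /* the flags are passed through an int */

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
                return -1;
        return flags_have_verity((unsigned int)flags);
}
```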
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) The verity flag is not settable via FS_IOC_SETFLAGS.  You must use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) statx
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) the file has fs-verity enabled.  This can perform better than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) opening the file, and opening verity files can be expensive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) Accessing verity files
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) ======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) Applications can transparently access a verity file just like a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) non-verity one, with the following exceptions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) - Verity files are read-only.  They cannot be opened for writing or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)   truncate()d, even if the file mode bits allow it.  Attempts to do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)   one of these things will fail with EPERM.  However, changes to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)   metadata such as owner, mode, timestamps, and xattrs are still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)   allowed, since these are not measured by fs-verity.  Verity files
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)   can also still be renamed, deleted, and linked to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) - Direct I/O is not supported on verity files.  Attempts to use direct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)   I/O on such files will fall back to buffered I/O.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) - DAX (Direct Access) is not supported on verity files, because this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)   would circumvent the data verification.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) - Reads of data that doesn't match the verity Merkle tree will fail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334)   with EIO (for read()) or SIGBUS (for mmap() reads).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) - If the sysctl "fs.verity.require_signatures" is set to 1 and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337)   file is not signed by a key in the fs-verity keyring, then opening
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338)   the file will fail.  See `Built-in signature verification`_.
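
For the ``read()`` case, a reader can be sketched as an ordinary read loop
that leaves ``errno`` intact so the caller can recognize ``EIO``.  The helper
names are hypothetical.  Note that ``EIO`` can also indicate an ordinary I/O
error, so treating it as a verification failure is a conservative assumption
of this example:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reads len bytes unless end-of-file comes first.
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t read_full(int fd, void *buf, size_t len)
{
        size_t done = 0;

        while (done < len) {
                ssize_t n = read(fd, (char *)buf + done, len - done);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;      /* on a verity file, EIO may mean a hash mismatch */
                }
                if (n == 0)
                        break;          /* end of file */
                done += (size_t)n;
        }
        return (ssize_t)done;
}

/* Nonzero if a failed read should be treated as corrupted or tampered data. */
int read_error_is_corruption(int err)
{
        return err == EIO;
}
```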
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) Direct access to the Merkle tree is not supported.  Therefore, if a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) verity file is copied, or is backed up and restored, then it will lose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) its "verity"-ness.  fs-verity is primarily meant for files like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) executables that are managed by a package manager.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) File digest computation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) This section describes how fs-verity hashes the file contents using a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) Merkle tree to produce the digest which cryptographically identifies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) the file contents.  This algorithm is the same for all filesystems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) that support fs-verity.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) Userspace only needs to be aware of this algorithm if it needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) compute fs-verity file digests itself, e.g. in order to sign files.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) .. _fsverity_merkle_tree:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) Merkle tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) The file contents are divided into blocks, where the block size is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) configurable but is usually 4096 bytes.  The end of the last block is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) zero-padded if needed.  Each block is then hashed, producing the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) level of hashes.  Then, the hashes in this first level are grouped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) into 'blocksize'-byte blocks (zero-padding the ends as needed) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) these blocks are hashed, producing the second level of hashes.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) proceeds up the tree until only a single block remains.  The hash of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) this block is the "Merkle tree root hash".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) If the file fits in one block and is nonempty, then the "Merkle tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) root hash" is simply the hash of the single data block.  If the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) is empty, then the "Merkle tree root hash" is all zeroes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) The "blocks" here are not necessarily the same as "filesystem blocks".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) If a salt was specified, then it's zero-padded to the next multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) of the input size of the hash algorithm's compression function, e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 64 bytes for SHA-256 or 128 bytes for SHA-512.  The padded salt is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) prepended to every data or Merkle tree block that is hashed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) The purpose of the block padding is to cause every hash to be taken
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) over the same amount of data, which simplifies the implementation and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) keeps open more possibilities for hardware acceleration.  The purpose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) of the salt padding is to make the salting "free" when the salted hash
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) state is precomputed, then imported for each hash.
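
The salt padding rule can be illustrated with a small userspace sketch.
``padded_salt_size()`` and the two enum constants are hypothetical names
introduced only for this example; they are not part of the kernel or of
any fs-verity library.  The sketch assumes that an empty salt means no
salt at all, so nothing is prepended in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Compression-function input sizes named in the text above (bytes). */
enum { SHA256_INPUT_BLOCK = 64, SHA512_INPUT_BLOCK = 128 };

/*
 * Hypothetical helper: length of the salt after zero-padding it up to
 * a multiple of the hash algorithm's input block size.  A zero-length
 * salt stays zero-length.
 */
static size_t padded_salt_size(size_t salt_size, size_t input_block_size)
{
	assert(input_block_size != 0);
	return (salt_size + input_block_size - 1) / input_block_size
		* input_block_size;
}
```

For instance, a 32-byte salt would occupy 64 bytes with SHA-256 and 128
bytes with SHA-512 under this rule.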
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) Example: in the recommended configuration of SHA-256 and 4K blocks,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 128 hash values fit in each block.  Thus, each level of the Merkle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) tree is approximately 128 times smaller than the previous, and for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) large files the Merkle tree's size converges to approximately 1/127 of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) the original file size.  However, for small files, the padding is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) significant, making the space overhead proportionally larger.
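
The level-by-level construction described above can be sketched as a
userspace calculation of how many Merkle tree blocks a file needs.
``merkle_tree_blocks()`` is a hypothetical helper written for this
example, not a kernel function; it only counts blocks and does no
hashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of Merkle tree blocks needed for a file
 * of data_size bytes, given the block size and digest size in bytes.
 * If num_levels is non-NULL, it receives the number of tree levels.
 */
static uint64_t merkle_tree_blocks(uint64_t data_size, uint32_t block_size,
				   uint32_t digest_size,
				   unsigned int *num_levels)
{
	uint64_t hashes_per_block, blocks, total = 0;
	unsigned int levels = 0;

	/* At least two hashes must fit in a block for the tree to shrink. */
	assert(digest_size != 0 && block_size >= 2 * digest_size);
	hashes_per_block = block_size / digest_size;

	/* Number of data blocks, counting a partial final block as one. */
	blocks = data_size / block_size + (data_size % block_size != 0);

	/* Each level holds one hash per block of the level below it. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
		levels++;
	}
	if (num_levels)
		*num_levels = levels;
	return total;
}
```

With SHA-256 (32-byte digests) and 4096-byte blocks, a 1 GiB file has
262144 data blocks, so the levels contain 2048, 16 and 1 blocks, for a
total of 2065 tree blocks.  That is roughly 8.1 MiB, in line with the
1/127 estimate.  A file that fits in a single block needs no tree blocks
at all in this model.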
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) .. _fsverity_descriptor:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) fs-verity descriptor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) By itself, the Merkle tree root hash is ambiguous.  For example, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) can't distinguish a large file from a small second file whose data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) is exactly the top-level hash block of the first file.  Ambiguities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) also arise from the convention of padding to the next block boundary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) To solve this problem, the fs-verity file digest is actually computed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) as a hash of the following structure, which contains the Merkle tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) root hash as well as other fields such as the file size::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408)     struct fsverity_descriptor {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409)             __u8 version;           /* must be 1 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410)             __u8 hash_algorithm;    /* Merkle tree hash algorithm */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411)             __u8 log_blocksize;     /* log2 of size of data and tree blocks */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412)             __u8 salt_size;         /* size of salt in bytes; 0 if none */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413)             __le32 __reserved_0x04; /* must be 0 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414)             __le64 data_size;       /* size of file the Merkle tree is built over */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415)             __u8 root_hash[64];     /* Merkle tree root hash */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416)             __u8 salt[32];          /* salt prepended to each hashed block */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417)             __u8 __reserved[144];   /* must be 0's */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418)     };
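
As a rough illustration of this layout, the following userspace sketch
serializes a descriptor into a 256-byte buffer (the sum of the field
sizes above), writing multi-byte fields in little-endian order
regardless of host endianness.  ``build_descriptor()``, ``put_le64()``
and the ``DESC_*`` constants are hypothetical names for this example
only.  The sketch does not compute any hash; it assumes the caller
already has the Merkle tree root hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets derived from the structure layout shown above. */
enum {
	DESC_OFF_DATA_SIZE = 8,
	DESC_OFF_ROOT_HASH = 16,
	DESC_OFF_SALT      = 80,
	DESC_SIZE          = 256,
};

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: fill out[] with a version-1 descriptor. */
static void build_descriptor(uint8_t out[DESC_SIZE], uint8_t hash_algorithm,
			     uint8_t log_blocksize, uint64_t data_size,
			     const uint8_t *root_hash, size_t root_hash_len,
			     const uint8_t *salt, uint8_t salt_size)
{
	assert(root_hash != NULL && root_hash_len <= 64 && salt_size <= 32);

	memset(out, 0, DESC_SIZE);	/* reserved fields must be 0 */
	out[0] = 1;			/* version */
	out[1] = hash_algorithm;
	out[2] = log_blocksize;
	out[3] = salt_size;
	put_le64(out + DESC_OFF_DATA_SIZE, data_size);
	/* A root hash shorter than 64 bytes is left zero-padded. */
	memcpy(out + DESC_OFF_ROOT_HASH, root_hash, root_hash_len);
	if (salt_size)
		memcpy(out + DESC_OFF_SALT, salt, salt_size);
}
```

Because the file size and the hashing parameters are part of the hashed
structure, two files with the same root hash but different sizes would
yield different descriptors, and therefore different file digests.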
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) Built-in signature verification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) ===============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) a portion of an authentication policy (see `Use cases`_) in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) kernel.  Specifically, it adds support for:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) 1. At fs-verity module initialization time, a keyring ".fs-verity" is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428)    created.  The root user can add trusted X.509 certificates to this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429)    keyring using the add_key() system call, then (when done)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430)    optionally use keyctl_restrict_keyring() to prevent additional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431)    certificates from being added.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434)    detached signature in DER format of the file's fs-verity digest.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435)    On success, this signature is persisted alongside the Merkle tree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436)    Then, any time the file is opened, the kernel will verify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437)    file's actual digest against this signature, using the certificates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438)    in the ".fs-verity" keyring.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) 3. A new sysctl "fs.verity.require_signatures" is made available.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)    When set to 1, the kernel requires that all verity files have a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442)    correctly signed digest as described in (2).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) fs-verity file digests must be signed in the following format, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447)     struct fsverity_formatted_digest {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448)             char magic[8];                  /* must be "FSVerity" */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449)             __le16 digest_algorithm;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450)             __le16 digest_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451)             __u8 digest[];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452)     };
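
A signing tool has to produce exactly these bytes before handing them to
its PKCS#7 signer.  The sketch below shows one way the buffer could be
assembled in userspace.  ``format_digest()`` is a hypothetical helper,
and the ``SKETCH_HASH_ALG_*`` constants are local stand-ins that mirror
the values of FS_VERITY_HASH_ALG_SHA256 (1) and
FS_VERITY_HASH_ALG_SHA512 (2) from the UAPI header.  The signing step
itself is out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the UAPI hash algorithm numbers. */
enum { SKETCH_HASH_ALG_SHA256 = 1, SKETCH_HASH_ALG_SHA512 = 2 };

/*
 * Hypothetical helper: write magic, algorithm, size and digest into
 * out[].  Returns the number of bytes written, or 0 if out[] is too
 * small.
 */
static size_t format_digest(uint8_t *out, size_t out_len, uint16_t alg,
			    const uint8_t *digest, uint16_t digest_size)
{
	size_t total = 8 + 2 + 2 + (size_t)digest_size;

	assert(out != NULL && digest != NULL);
	if (out_len < total)
		return 0;

	memcpy(out, "FSVerity", 8);		/* magic, no NUL terminator */
	out[8]  = (uint8_t)(alg & 0xff);	/* __le16 digest_algorithm */
	out[9]  = (uint8_t)(alg >> 8);
	out[10] = (uint8_t)(digest_size & 0xff);	/* __le16 digest_size */
	out[11] = (uint8_t)(digest_size >> 8);
	memcpy(out + 12, digest, digest_size);
	return total;
}
```

For a 32-byte SHA-256 file digest, the formatted structure is 44 bytes
long.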
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) fs-verity's built-in signature verification support is meant as a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) relatively simple mechanism that can be used to provide some level of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) authenticity protection for verity files, as an alternative to doing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) the signature verification in userspace or using IMA-appraisal.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) However, with this mechanism, userspace programs still need to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) that the verity bit is set, and there is no protection against verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) files being swapped around.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) Filesystem support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) fs-verity is currently supported by the ext4 and f2fs filesystems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) on either filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) ``include/linux/fsverity.h`` declares the interface between the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) ``fs/verity/`` support layer and filesystems.  Briefly, filesystems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) must provide an ``fsverity_operations`` structure that provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) methods to read and write the verity metadata to a filesystem-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) location, including the Merkle tree blocks and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) ``fsverity_descriptor``.  Filesystems must also call functions in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) ``fs/verity/`` at certain times, such as when a file is opened or when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) pages have been read into the pagecache.  (See `Verifying data`_.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) ext4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) ----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) To create verity files on an ext4 filesystem, the filesystem must have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) it.  "verity" is an RO_COMPAT filesystem feature, so once set, old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) kernels will only be able to mount the filesystem readonly, and old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) versions of e2fsck will be unable to check the filesystem.  Moreover,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) currently ext4 only supports mounting a filesystem with the "verity"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) feature when its block size is equal to PAGE_SIZE (often 4096 bytes).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) ext4 also supports encryption, which can be used simultaneously with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) fs-verity.  In this case, the plaintext data is verified rather than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) the ciphertext.  This is necessary in order to make the fs-verity file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) digest meaningful, since every file is encrypted differently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) past the end of the file, starting at the first 64K boundary beyond
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) i_size.  This approach works because (a) verity files are readonly,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) and (b) pages fully beyond i_size aren't visible to userspace but can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) be read/written internally by ext4 with only some relatively small
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) changes to ext4.  This approach avoids having to depend on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) EA_INODE feature and on rearchitecting ext4's xattr support to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) support paging multi-gigabyte xattrs into memory, and to support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) encrypting xattrs.  Note that the verity metadata *must* be encrypted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) when the file is, since it contains hashes of the plaintext data.
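
The placement rule can be sketched as a simple round-up of i_size to a
64K boundary.  ``verity_metadata_pos()`` is a hypothetical userspace
helper for illustration.  It assumes plain round-up semantics, so an
i_size that already sits on a 64K boundary maps to itself.

```c
#include <assert.h>
#include <stdint.h>

#define VERITY_METADATA_ALIGN 65536ULL	/* 64K */

/*
 * Hypothetical helper: file offset at which the verity metadata would
 * start for a file of i_size bytes (assumption: round up to 64K).
 */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	/* Guard against wrap-around for absurdly large sizes. */
	assert(i_size <= UINT64_MAX - (VERITY_METADATA_ALIGN - 1));
	return (i_size + VERITY_METADATA_ALIGN - 1) / VERITY_METADATA_ALIGN
		* VERITY_METADATA_ALIGN;
}
```

A 100000-byte file would therefore have its metadata start at offset
131072 in this model, leaving a gap between the end of the data and the
first Merkle tree block.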
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) Currently, ext4 verity only supports the case where the Merkle tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) block size, filesystem block size, and page size are all the same.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) also only supports extent-based files.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) f2fs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) ----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) To create verity files on an f2fs filesystem, the filesystem must have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) been formatted with ``-O verity``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) cleared.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) Like ext4, f2fs stores the verity metadata (Merkle tree and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) fsverity_descriptor) past the end of the file, starting at the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) 64K boundary beyond i_size.  See the explanation for ext4 above.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) which wouldn't be enough for even a single Merkle tree block.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) Currently, f2fs verity only supports a Merkle tree block size of 4096.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) Also, f2fs doesn't support enabling verity on files that currently
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) have atomic or volatile writes pending.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) Implementation details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) ======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) Verifying data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) fs-verity ensures that all reads of a verity file's data are verified,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) regardless of which syscall is used to do the read (e.g. mmap(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) read(), pread()) and regardless of whether it's the first read or a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) later read (unless the later read can return cached data that was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) already verified).  Below, we describe how filesystems implement this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) Pagecache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) ~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) For filesystems using Linux's pagecache, the ``->readpage()`` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) ``->readpages()`` methods must be modified to verify pages before they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) are marked Uptodate.  Merely hooking ``->read_iter()`` would be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) insufficient, since ``->read_iter()`` is not used for memory maps.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) Therefore, fs/verity/ provides a function fsverity_verify_page() which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) verifies a page that has been read into the pagecache of a verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) inode, but is still locked and not Uptodate, so it's not yet readable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) by userspace.  As needed to do the verification,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) fsverity_verify_page() will call back into the filesystem to read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) Merkle tree pages via fsverity_operations::read_merkle_tree_page().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) fsverity_verify_page() returns false if verification failed; in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) case, the filesystem must not set the page Uptodate.  Following this,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) as per the usual Linux pagecache behavior, attempts by userspace to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) read() from the part of the file containing the page will fail with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) EIO, and accesses to the page within a memory map will raise SIGBUS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) fsverity_verify_page() currently only supports the case where the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) In principle, fsverity_verify_page() verifies the entire path in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) Merkle tree from the data page to the root hash.  However, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) efficiency the filesystem may cache the hash pages.  Therefore,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) fsverity_verify_page() only ascends the tree reading hash pages until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) an already-verified hash page is seen, as indicated by the PageChecked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) bit being set.  It then verifies the path to that page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) This optimization, which is also used by dm-verity, results in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) excellent sequential read performance.  This is because usually (e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) 127 in 128 times for 4K blocks and SHA-256) the hash page from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) bottom level of the tree will already be cached and checked from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) reading a previous data page.  However, random reads perform worse.
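
To make the "127 in 128" figure concrete, the following userspace sketch
models where the hash covering a given data block lives at each tree
level, and counts how often a sequential read has to move on to a new
bottom-level hash block.  ``struct hash_location``, ``locate_hash()``
and ``bottom_level_misses()`` are hypothetical names for this example.
The sketch models indices only; it neither hashes nor caches anything.

```c
#include <assert.h>
#include <stdint.h>

struct hash_location {
	uint64_t block;	/* index of the hash block within its tree level */
	uint32_t slot;	/* index of the hash within that block */
};

/*
 * Hypothetical helper: locate the hash that covers data block
 * data_index at the given tree level (0 = bottom level).
 */
static struct hash_location locate_hash(uint64_t data_index,
					unsigned int level,
					uint32_t hashes_per_block)
{
	struct hash_location loc;
	uint64_t idx = data_index;
	unsigned int i;

	assert(hashes_per_block > 1);
	/* Each step up covers hashes_per_block times more data blocks. */
	for (i = 0; i < level; i++)
		idx /= hashes_per_block;
	loc.block = idx / hashes_per_block;
	loc.slot = (uint32_t)(idx % hashes_per_block);
	return loc;
}

/*
 * Count how many of n sequential data block reads are the first to use
 * their bottom-level hash block (slot 0), i.e. cannot rely on an
 * earlier read having already verified it.
 */
static uint64_t bottom_level_misses(uint64_t n, uint32_t hashes_per_block)
{
	uint64_t i, misses = 0;

	for (i = 0; i < n; i++)
		if (locate_hash(i, 0, hashes_per_block).slot == 0)
			misses++;
	return misses;
}
```

With 128 hashes per block, reading 1280 data blocks in order touches a
new bottom-level hash block only 10 times in this model; the other 1270
reads (127 in 128) find it already verified by a previous read.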
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) Block device based filesystems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) the pagecache, so the above subsection applies too.  However, they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) also usually read many pages from a file at once, grouped into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) structure called a "bio".  To make it easier for these types of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) filesystems to support fs-verity, fs/verity/ also provides a function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) fsverity_verify_bio() which verifies all pages in a bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) ext4 and f2fs also support encryption.  If a verity file is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) encrypted, the pages must be decrypted before being verified.  To
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) support this, these filesystems allocate a "post-read context" for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) each bio and store it in ``->bi_private``::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600)     struct bio_post_read_ctx {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601)            struct bio *bio;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602)            struct work_struct work;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603)            unsigned int cur_step;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604)            unsigned int enabled_steps;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) ``enabled_steps`` is a bitmask that specifies whether decryption,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) verity, or both are enabled.  After the bio completes, for each needed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) postprocessing step the filesystem enqueues the bio_post_read_ctx on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) workqueue, and then the workqueue work does the decryption or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) verification.  Finally, pages where no decryption or verity error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) occurred are marked Uptodate, and the pages are unlocked.
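
The step sequencing can be modeled in userspace as a small state
machine.  This is only a sketch: the ``STEP_*`` names,
``struct post_read_ctx_sketch`` and ``next_post_read_step()`` are
hypothetical and simplified, and the real filesystems dispatch each step
to a workqueue rather than returning it to a caller.

```c
#include <assert.h>

/* Hypothetical step numbering; decryption runs before verity. */
enum post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
	STEP_MAX,
};

/* Reduced model of the context: only the step bookkeeping is kept. */
struct post_read_ctx_sketch {
	unsigned int cur_step;
	unsigned int enabled_steps;	/* bitmask of (1 << STEP_*) */
};

/*
 * Advance to the next enabled step.  Returns STEP_MAX once every
 * enabled step has been handed out, at which point the pages could be
 * marked Uptodate (if no error occurred) and unlocked.
 */
static enum post_read_step
next_post_read_step(struct post_read_ctx_sketch *ctx)
{
	while (++ctx->cur_step < STEP_MAX) {
		if (ctx->enabled_steps & (1u << ctx->cur_step))
			return (enum post_read_step)ctx->cur_step;
	}
	ctx->cur_step = STEP_MAX;
	return STEP_MAX;
}
```

For a file that is both encrypted and verity-protected, successive calls
yield STEP_DECRYPT, then STEP_VERITY, then STEP_MAX, which matches the
requirement that pages be decrypted before they are verified.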
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) Files on ext4 and f2fs may contain holes.  Normally, ``->readpages()``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) simply zeroes holes and sets the corresponding pages Uptodate; no bios
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) are issued.  To prevent this case from bypassing fs-verity, these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) filesystems use fsverity_verify_page() to verify hole pages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) ext4 and f2fs disable direct I/O on verity files, since otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) direct I/O would bypass fs-verity.  (They also do the same for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) encrypted files.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) Userspace utility
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) This document focuses on the kernel, but a userspace utility for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) fs-verity can be found at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) 	https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) See the README.md file in the fsverity-utils source tree for details,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) including examples of setting up fs-verity protected files.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) Tests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) To test fs-verity, use xfstests.  For example, using `kvm-xfstests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640)     kvm-xfstests -c ext4,f2fs -g verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) FAQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) ===
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) This section answers frequently asked questions about fs-verity that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) weren't already directly answered in other parts of this document.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) :Q: Why isn't fs-verity part of IMA?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) :A: fs-verity and IMA (Integrity Measurement Architecture) have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650)     different focuses.  fs-verity is a filesystem-level mechanism for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651)     hashing individual files using a Merkle tree.  In contrast, IMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652)     specifies a system-wide policy that specifies which files are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)     hashed and what to do with those hashes, such as log them,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654)     authenticate them, or add them to a measurement list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)     IMA is planned to support the fs-verity hashing mechanism as an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657)     alternative to doing full file hashes, for people who want the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658)     performance and security benefits of the Merkle tree based hash.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659)     But it doesn't make sense to force all uses of fs-verity to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660)     through IMA.  As a standalone filesystem feature, fs-verity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661)     already meets many users' needs, and it's testable like other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662)     filesystem features, e.g. with xfstests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) :Q: Isn't fs-verity useless because the attacker can just modify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665)     hashes in the Merkle tree, which is stored on-disk?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) :A: To verify the authenticity of an fs-verity file you must verify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667)     the authenticity of the "fs-verity file digest", which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668)     incorporates the root hash of the Merkle tree.  See `Use cases`_.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) :Q: Isn't fs-verity useless because the attacker can just replace a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671)     verity file with a non-verity one?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) :A: See `Use cases`_.  In the initial use case, it's really trusted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673)     userspace code that authenticates the files; fs-verity is just a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674)     tool to do this job efficiently and securely.  The trusted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)     userspace code will consider non-verity files to be inauthentic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) :Q: Why does the Merkle tree need to be stored on-disk?  Couldn't you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678)     store just the root hash?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) :A: If the Merkle tree wasn't stored on-disk, then you'd have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)     compute the entire tree when the file is first accessed, even if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681)     just one byte is being read.  This is a fundamental consequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682)     how Merkle tree hashing works.  To verify a leaf node, you need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683)     verify the whole path to the root hash, including the root node
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684)     (the thing which the root hash is a hash of).  But if the root
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685)     node isn't stored on-disk, you have to compute it by hashing its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686)     children, and so on until you've actually hashed the entire file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688)     That defeats most of the point of doing a Merkle tree-based hash,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689)     since if you have to hash the whole file ahead of time anyway,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690)     then you could simply do sha256(file) instead.  That would be much
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691)     simpler, and a bit faster too.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693)     It's true that an in-memory Merkle tree could still provide the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694)     advantage of verification on every read rather than just on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)     first read.  However, it would be inefficient because every time a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696)     hash page gets evicted (you can't pin the entire Merkle tree into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697)     memory, since it may be very large), in order to restore it you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)     again need to hash everything below it in the tree.  This again
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699)     defeats most of the point of doing a Merkle tree-based hash, since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700)     a single block read could trigger re-hashing gigabytes of data.

:Q: But couldn't you store just the leaf nodes and compute the rest?
:A: See previous answer; this really just moves up one level, since
    one could alternatively interpret the data blocks as being the
    leaf nodes of the Merkle tree.  It's true that the tree can be
    computed much faster if the leaf level is stored rather than just
    the data, but that's only because each level is less than 1% of
    the size of the level below (assuming the recommended settings of
    SHA-256 and 4K blocks, where one hash block holds 128 hashes).
    For the exact same reason, by storing "just the leaf nodes" you'd
    already be storing over 99% of the tree, so you might as well
    simply store the whole tree.
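
The percentages above follow from simple arithmetic.  As a rough
sketch, assuming 32-byte SHA-256 digests, 4K blocks, and every level
padded to a whole number of blocks (the helper name
``merkle_level_sizes`` is hypothetical):

```python
def merkle_level_sizes(file_size, block_size=4096, digest_size=32):
    """Return the size in bytes of each Merkle tree level, leaf level first."""
    hashes_per_block = block_size // digest_size      # 128 with the defaults
    nblocks = -(-file_size // block_size)             # ceiling division
    sizes = []
    while nblocks > 1:
        # Each block of the level below contributes one digest to this level.
        nblocks = -(-nblocks // hashes_per_block)
        sizes.append(nblocks * block_size)
    return sizes


def leaf_fraction(file_size):
    """Fraction of the whole tree that is taken up by the leaf level."""
    sizes = merkle_level_sizes(file_size)
    return sizes[0] / sum(sizes)
```

For a 1 GiB file this gives levels of 8 MiB, 64 KiB and 4 KiB, so each
level is 1/128 (about 0.8%) of the one below, and the leaf level alone
accounts for about 99.2% of the tree.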

:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
    part of a package that is installed to many computers?
:A: This isn't currently supported.  It was part of the original
    design, but was removed to simplify the kernel UAPI and because it
    wasn't a critical use case.  Files are usually installed once and
    used many times, and cryptographic hashing is somewhat fast on
    most modern processors.

:Q: Why doesn't fs-verity support writes?
:A: Write support would be very difficult and would require a
    completely different design, so it's well outside the scope of
    fs-verity.  Write support would require:

    - A way to maintain consistency between the data and hashes,
      including all levels of hashes, since corruption after a crash
      (potentially of the entire file!) is unacceptable.  The main
      options for solving this are data journalling, copy-on-write,
      and a log-structured volume.  But it's very hard to retrofit
      existing filesystems with new consistency mechanisms.  Data
      journalling is available on ext4, but is very slow.

    - Rebuilding the Merkle tree after every write, which would be
      extremely inefficient.  Alternatively, a different authenticated
      dictionary structure such as an "authenticated skiplist" could
      be used.  However, this would be far more complex.

    Compare this to dm-verity vs. dm-integrity.  dm-verity is very
    simple: the kernel just verifies read-only data against a
    read-only Merkle tree.  In contrast, dm-integrity supports writes
    but is slow, is much more complex, and doesn't actually support
    full-device authentication since it authenticates each sector
    independently, i.e. there is no "root hash".  It doesn't really
    make sense for the same device-mapper target to support these two
    very different cases; the same applies to fs-verity.

:Q: Since verity files are immutable, why isn't the immutable bit set?
:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
    specific set of semantics which not only make the file contents
    read-only, but also prevent the file from being deleted, renamed,
    linked to, or having its owner or mode changed.  These extra
    properties are unwanted for fs-verity, so reusing the immutable
    bit isn't appropriate.
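
Because the two properties are separate inode flags, they can be told
apart from userspace.  The following is a minimal sketch, assuming a
64-bit Linux system (the ``FS_IOC_GETFLAGS`` number differs on 32-bit)
and flag values transcribed from ``include/uapi/linux/fs.h``; the
helper names ``describe_flags`` and ``get_flags`` are hypothetical.

```python
import fcntl
import struct

# Assumed values from include/uapi/linux/fs.h; verify against your headers.
FS_IOC_GETFLAGS = 0x80086601    # _IOR('f', 1, long) on 64-bit Linux
FS_IMMUTABLE_FL = 0x00000010    # the "immutable" bit
FS_VERITY_FL = 0x00100000       # set on files with fs-verity enabled


def describe_flags(flags):
    """Report the immutable and verity bits of an inode flags word."""
    return {
        "immutable": bool(flags & FS_IMMUTABLE_FL),
        "verity": bool(flags & FS_VERITY_FL),
    }


def get_flags(path):
    """Read the inode flags of `path`.

    Raises OSError if the filesystem does not implement FS_IOC_GETFLAGS.
    """
    with open(path, "rb") as f:
        raw = fcntl.ioctl(f.fileno(), FS_IOC_GETFLAGS, struct.pack("I", 0))
    return struct.unpack("I", raw)[0]
```

A verity file would normally be expected to report ``verity`` as true
and ``immutable`` as false, so it can still be deleted or renamed.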

:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
:A: Abusing the xattr interface for basically arbitrary syscalls is
    heavily frowned upon by most of the Linux filesystem developers.
    An xattr should really just be an xattr on-disk, not an API to
    e.g. magically trigger construction of a Merkle tree.
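
To illustrate the kind of structured request the ioctl carries, the
following is a minimal sketch of preparing the ``FS_IOC_ENABLE_VERITY``
argument from Python.  The struct layout and ioctl number are
transcribed from ``include/uapi/linux/fsverity.h`` and should be treated
as assumptions to check against your headers; the ``_IOW`` encoding
shown holds on common architectures such as x86 and arm64 but not on
all.  ``pack_enable_arg`` and ``enable_verity`` are hypothetical helper
names.

```python
import fcntl
import struct

# struct fsverity_enable_arg: version, hash_algorithm, block_size,
# salt_size (u32 each), salt_ptr (u64), sig_size, __reserved1 (u32 each),
# sig_ptr (u64), __reserved2[11] (u64 each).
ENABLE_ARG_FMT = "=IIIIQIIQ11Q"

FS_VERITY_HASH_ALG_SHA256 = 1


def _iow(type_char, nr, size):
    """Encode a write-direction ioctl number (asm-generic layout assumed)."""
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FS_IOC_ENABLE_VERITY = _iow("f", 133, struct.calcsize(ENABLE_ARG_FMT))


def pack_enable_arg(block_size=4096):
    """Build an argument with SHA-256, no salt and no signature."""
    return struct.pack(
        ENABLE_ARG_FMT,
        1,                              # version
        FS_VERITY_HASH_ALG_SHA256,      # hash_algorithm
        block_size,                     # block_size
        0, 0,                           # salt_size, salt_ptr
        0, 0, 0,                        # sig_size, __reserved1, sig_ptr
        *([0] * 11),                    # __reserved2
    )


def enable_verity(path):
    """Ask the filesystem to build the Merkle tree for `path`.

    The descriptor is opened read-only; raises OSError if the
    filesystem or kernel does not support fs-verity.
    """
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_ENABLE_VERITY, pack_enable_arg())
```

The request is a one-shot command with typed parameters, not a value
to be stored on disk.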

:Q: Does fs-verity support remote filesystems?
:A: Only ext4 and f2fs support is implemented currently, but in
    principle any filesystem that can store per-file verity metadata
    can support fs-verity, regardless of whether it's local or remote.
    Some filesystems may have fewer options of where to store the
    verity metadata; one possibility is to store it past the end of
    the file and "hide" it from userspace by manipulating i_size.  The
    data verification functions provided by ``fs/verity/`` also assume
    that the filesystem uses the Linux pagecache, but both local and
    remote filesystems normally do so.

:Q: Why is anything filesystem-specific at all?  Shouldn't fs-verity
    be implemented entirely at the VFS level?
:A: There are many reasons why this is not possible or would be very
    difficult, including the following:

    - To prevent bypassing verification, pages must not be marked
      Uptodate until they've been verified.  Currently, each
      filesystem is responsible for marking pages Uptodate via
      ``->readpages()``.  Therefore, currently it's not possible for
      the VFS to do the verification on its own.  Changing this would
      require significant changes to the VFS and all filesystems.

    - It would require defining a filesystem-independent way to store
      the verity metadata.  Extended attributes don't work for this
      because (a) the Merkle tree may be gigabytes, but many
      filesystems assume that all xattrs fit into a single 4K
      filesystem block, and (b) ext4 and f2fs encryption doesn't
      encrypt xattrs, yet the Merkle tree *must* be encrypted when the
      file contents are, because it stores hashes of the plaintext
      file contents.

      So the verity metadata would have to be stored in an actual
      file.  Using a separate file would be very ugly, since the
      metadata is fundamentally part of the file to be protected, and
      it could cause problems where users could delete the real file
      but not the metadata file or vice versa.  On the other hand,
      having it be in the same file would break applications unless
      filesystems' notion of i_size were divorced from the VFS's,
      which would be complex and require changes to all filesystems.

    - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
      transaction mechanism so that either the file ends up with
      verity enabled, or no changes were made.  Allowing intermediate
      states to occur after a crash may cause problems.