^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Overview of the Linux Virtual File System
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Original author: Richard Gooch <rgooch@atnf.csiro.au>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) - Copyright (C) 1999 Richard Gooch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) - Copyright (C) 2005 Pekka Enberg
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) The Virtual File System (also known as the Virtual Filesystem Switch) is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) the software layer in the kernel that provides the filesystem interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) to userspace programs. It also provides an abstraction within the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) kernel which allows different filesystem implementations to coexist.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) are called from a process context. Filesystem locking is described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) the document Documentation/filesystems/locking.rst.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) Directory Entry Cache (dcache)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) The VFS implements the open(2), stat(2), chmod(2), and similar system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) calls. The pathname argument that is passed to them is used by the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) to search through the directory entry cache (also known as the dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) cache or dcache). This provides a very fast look-up mechanism to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) translate a pathname (filename) into a specific dentry. Dentries live
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) in RAM and are never saved to disc: they exist only for performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) The dentry cache is meant to be a view into your entire filespace. As
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) most computers cannot fit all dentries in RAM at the same time, some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) bits of the cache are missing. In order to resolve your pathname into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) dentry, the VFS may have to resort to creating dentries along the way,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) and then loading the inode. This is done by looking up the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41)
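The walk described above can be modelled in userspace. The following is
a toy sketch, not kernel code: ``toy_dentry``, ``toy_lookup()``,
``toy_path_walk()`` and the ``toy_misses`` counter are all hypothetical
names, and a plain linked list stands in for the real cache. It only
illustrates the pattern of consulting the cache for each pathname
component and creating the missing dentries along the way:

.. code-block:: c

    #include <stdlib.h>
    #include <string.h>

    /* Toy stand-in for a dentry: one cached (parent, name) pair. */
    struct toy_dentry {
        struct toy_dentry *parent;
        struct toy_dentry *next;    /* chain of all cached entries */
        char name[32];
    };

    static struct toy_dentry toy_root;      /* stands for "/" */
    static struct toy_dentry *toy_cache;    /* cached non-root dentries */
    static int toy_misses;                  /* dentries created by walks */

    /* Find a cached child of 'parent', or return NULL on a cache miss. */
    static struct toy_dentry *toy_lookup(struct toy_dentry *parent,
                                         const char *name)
    {
        struct toy_dentry *d;

        for (d = toy_cache; d; d = d->next)
            if (d->parent == parent && strcmp(d->name, name) == 0)
                return d;
        return NULL;
    }

    /* Resolve an absolute path, creating missing dentries on the way. */
    static struct toy_dentry *toy_path_walk(const char *path)
    {
        struct toy_dentry *cur = &toy_root;

        while (*path) {
            char name[32];
            size_t len;
            struct toy_dentry *d;

            path += strspn(path, "/");      /* skip separators */
            len = strcspn(path, "/");
            if (len == 0)
                break;                      /* nothing left to resolve */
            if (len >= sizeof(name))
                return NULL;                /* component too long for the toy */
            memcpy(name, path, len);
            name[len] = '\0';
            path += len;

            d = toy_lookup(cur, name);
            if (!d) {
                /* Miss: a real VFS would ask the filesystem at this point. */
                d = calloc(1, sizeof(*d));
                if (!d)
                    return NULL;
                d->parent = cur;
                strcpy(d->name, name);
                d->next = toy_cache;
                toy_cache = d;
                toy_misses++;
            }
            cur = d;
        }
        return cur;
    }

Walking the same path twice creates dentries only on the first walk; the
second walk is served entirely from the toy cache.
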
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) The Inode Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) An individual dentry usually has a pointer to an inode. Inodes are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) filesystem objects such as regular files, directories, FIFOs and other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) beasts. They live either on the disc (for block device filesystems) or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) in memory (for pseudo filesystems). Inodes that live on the disc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) are copied into memory when required and changes to the inode are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) written back to disc. A single inode can be pointed to by multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) dentries (hard links, for example, do this).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) Looking up an inode requires the VFS to call the lookup() method of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) the parent directory inode. This method is installed by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) filesystem implementation that the inode lives in. Once the VFS has the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) required dentry (and hence the inode), we can do all those boring things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) like open(2) the file, or stat(2) it to peek at the inode data. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) stat(2) operation is fairly simple: once the VFS has the dentry, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) peeks at the inode data and passes some of it back to userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
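The relationship between names and inodes can be observed from
userspace. The sketch below uses only open(2), link(2), stat(2) and
unlink(2); the function name ``hard_link_demo()`` and the temporary file
names are made up for illustration, and the directory passed in is
assumed to be writable and to support hard links:

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /*
     * Create a file and a hard link to it inside 'dir', then compare what
     * stat(2) reports for the two names. Returns 1 if both names refer to
     * the same inode with a link count of 2, 0 if not, and -1 on error.
     */
    static int hard_link_demo(const char *dir)
    {
        char a[256], b[256];
        struct stat sa, sb;
        int fd, same = -1;

        snprintf(a, sizeof(a), "%s/vfs_demo_%ld_a", dir, (long)getpid());
        snprintf(b, sizeof(b), "%s/vfs_demo_%ld_b", dir, (long)getpid());

        fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);

        if (link(a, b) == 0) {
            /* Two names (two dentries), one inode behind both of them. */
            if (stat(a, &sa) == 0 && stat(b, &sb) == 0)
                same = sa.st_dev == sb.st_dev &&
                       sa.st_ino == sb.st_ino &&
                       sa.st_nlink == 2;
            unlink(b);
        }
        unlink(a);
        return same;
    }
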
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) The File Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) Opening a file requires another operation: allocation of a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) structure (this is the kernel-side implementation of file descriptors).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) The freshly allocated file structure is initialized with a pointer to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) the dentry and a set of file operation member functions. These are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) taken from the inode data. The open() file method is then called so the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) specific filesystem implementation can do its work. You can see that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) this is another switch performed by the VFS. The file structure is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) placed into the file descriptor table for the process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
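The "switch" can be pictured as an indirect call through a table of
function pointers. The model below is a userspace toy, not the kernel's
data structures: ``sw_file_ops``, ``sw_inode``, ``sw_file`` and the
``zerofs_*`` methods are hypothetical stand-ins, chosen only to show the
method table being copied from the inode into the file structure and
then called through:

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    struct sw_file;

    /* Toy method table, standing in for the file operations. */
    struct sw_file_ops {
        int (*open)(struct sw_file *f);
        size_t (*read)(struct sw_file *f, char *buf, size_t len);
    };

    /* Toy inode: carries the method table used for files opened on it. */
    struct sw_inode {
        const struct sw_file_ops *i_fop;
    };

    /* Toy file structure: the object a file descriptor refers to. */
    struct sw_file {
        const struct sw_file_ops *f_op;
        int opened;
    };

    /* Methods of one hypothetical filesystem whose files read as zeroes. */
    static int zerofs_open(struct sw_file *f)
    {
        f->opened = 1;
        return 0;
    }

    static size_t zerofs_read(struct sw_file *f, char *buf, size_t len)
    {
        (void)f;
        memset(buf, 0, len);
        return len;
    }

    static const struct sw_file_ops zerofs_fops = { zerofs_open, zerofs_read };

    /* "VFS" side: take the table from the inode, then call its open(). */
    static int sw_open(struct sw_inode *inode, struct sw_file *f)
    {
        f->f_op = inode->i_fop;
        f->opened = 0;
        return f->f_op->open ? f->f_op->open(f) : 0;
    }

    /* "VFS" side: later operations dispatch through the stored table. */
    static size_t sw_read(struct sw_file *f, char *buf, size_t len)
    {
        return f->f_op->read(f, buf, len);
    }

The generic ``sw_open()`` and ``sw_read()`` never name a filesystem; the
choice of implementation is made entirely by the table found in the
inode.
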
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) Reading, writing and closing files (and other assorted VFS operations)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) are done by using the userspace file descriptor to grab the appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) file structure, and then calling the required file structure method to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) do whatever is required. For as long as the file is open, it keeps the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) dentry in use, which in turn means that the VFS inode is still in use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
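One visible consequence of an open file keeping its inode in use is that
a file remains readable through its descriptor after its last name has
been removed. The following userspace sketch demonstrates this with
open(2), write(2), unlink(2), fstat(2) and read(2); the function name
``open_file_survives_unlink()`` and the temporary file name are
illustrative only, and ``dir`` is assumed to be a writable directory on
a local filesystem:

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /*
     * Returns 1 if data written before unlink(2) can still be read back
     * through the open descriptor afterwards, 0 if not, -1 on error.
     */
    static int open_file_survives_unlink(const char *dir)
    {
        char path[256], buf[8] = { 0 };
        struct stat st;
        int fd, ok = -1;

        snprintf(path, sizeof(path), "%s/vfs_demo_%ld_c", dir, (long)getpid());
        fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd < 0)
            return -1;

        if (write(fd, "hello", 5) == 5 && unlink(path) == 0) {
            /* The name is gone, but the open file still pins the inode. */
            ok = fstat(fd, &st) == 0 && st.st_size == 5 &&
                 lseek(fd, 0, SEEK_SET) == 0 &&
                 read(fd, buf, 5) == 5 &&
                 memcmp(buf, "hello", 5) == 0;
        } else {
            unlink(path);   /* best-effort cleanup on the error path */
        }
        close(fd);          /* last reference dropped: inode can go away */
        return ok;
    }
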
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) Registering and Mounting a Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) To register and unregister a filesystem, use the following API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) functions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) #include <linux/fs.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) extern int register_filesystem(struct file_system_type *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) extern int unregister_filesystem(struct file_system_type *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) The passed struct file_system_type describes your filesystem. When a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) request is made to mount a filesystem onto a directory in your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) namespace, the VFS will call the appropriate mount() method for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) specific filesystem. A new vfsmount referring to the tree returned by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) ->mount() will be attached to the mountpoint, so that when pathname
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) resolution reaches the mountpoint it will jump into the root of that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) vfsmount.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) You can see all filesystems that are registered to the kernel in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) file /proc/filesystems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
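Checking for a registered filesystem type from userspace amounts to
scanning that file. Each line holds an optional ``nodev`` marker, a tab,
and the type name. The helper names ``fs_type_listed()`` and
``fs_type_registered()`` below are hypothetical; this is a minimal
sketch rather than an established interface:

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /*
     * Scan a stream in /proc/filesystems format for 'name'.
     * Returns 1 if the type is listed, 0 if it is not.
     */
    static int fs_type_listed(FILE *fp, const char *name)
    {
        char line[128];

        while (fgets(line, sizeof(line), fp)) {
            /* The type name is the last tab-separated field on the line. */
            char *type = strrchr(line, '\t');

            type = type ? type + 1 : line;
            type[strcspn(type, "\n")] = '\0';
            if (strcmp(type, name) == 0)
                return 1;
        }
        return 0;
    }

    /* Wrapper for the live kernel list; -1 if the file cannot be opened. */
    static int fs_type_registered(const char *name)
    {
        FILE *fp = fopen("/proc/filesystems", "r");
        int found;

        if (!fp)
            return -1;
        found = fs_type_listed(fp, name);
        fclose(fp);
        return found;
    }
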
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) struct file_system_type
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) This describes the filesystem. As of kernel 2.6.39, the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) struct file_system_type {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) const char *name;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) int fs_flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) struct dentry *(*mount) (struct file_system_type *, int,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) const char *, void *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) void (*kill_sb) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) struct module *owner;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) struct file_system_type * next;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) struct list_head fs_supers;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) struct lock_class_key s_lock_key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) struct lock_class_key s_umount_key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) ``name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) the name of the filesystem type, such as "ext2", "iso9660",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) "msdos" and so on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) ``fs_flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) ``mount``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) the method to call when a new instance of this filesystem should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) be mounted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) ``kill_sb``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) the method to call when an instance of this filesystem should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) shut down
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) ``owner``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) for internal VFS use: you should initialize this to THIS_MODULE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) in most cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) ``next``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) for internal VFS use: you should initialize this to NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) ``s_lock_key``, ``s_umount_key``: lockdep-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
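Why ``next`` must start out as NULL becomes clearer with a small model
of a registry that chains the registered types through that field. The
code below is a userspace toy under stated assumptions, not the kernel
implementation: ``toy_fs_type``, ``toy_register_filesystem()`` and
``toy_unregister_filesystem()`` are hypothetical names, locking is
omitted, and the error codes are an assumption about how such a registry
would typically report a duplicate name or an unknown entry:

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <string.h>

    /* Toy stand-in for the filesystem type: only name and next. */
    struct toy_fs_type {
        const char *name;
        struct toy_fs_type *next;   /* owned by the registry once registered */
    };

    static struct toy_fs_type *toy_file_systems;    /* head of the chain */

    /* Return the link pointing at 'name', or the terminating NULL link. */
    static struct toy_fs_type **toy_find(const char *name)
    {
        struct toy_fs_type **p;

        for (p = &toy_file_systems; *p; p = &(*p)->next)
            if (strcmp((*p)->name, name) == 0)
                break;
        return p;
    }

    static int toy_register_filesystem(struct toy_fs_type *fs)
    {
        struct toy_fs_type **p;

        if (fs->next)               /* already linked into a chain */
            return -EBUSY;
        p = toy_find(fs->name);
        if (*p)                     /* a type with this name exists */
            return -EBUSY;
        *p = fs;                    /* append at the end of the chain */
        return 0;
    }

    static int toy_unregister_filesystem(struct toy_fs_type *fs)
    {
        struct toy_fs_type **p;

        for (p = &toy_file_systems; *p; p = &(*p)->next) {
            if (*p == fs) {
                *p = fs->next;      /* unlink and reset for reuse */
                fs->next = NULL;
                return 0;
            }
        }
        return -EINVAL;             /* was never registered */
    }
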
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) The mount() method has the following arguments:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) ``struct file_system_type *fs_type``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) describes the filesystem, partly initialized by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) filesystem code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) ``int flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) mount flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) ``const char *dev_name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) the device name we are mounting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) ``void *data``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) arbitrary mount options, usually passed as an ASCII string (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) "Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) The mount() method must return the root dentry of the tree requested by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) the caller. An active reference to its superblock must be grabbed and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) superblock must be locked. On failure it should return ERR_PTR(error).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) The arguments match those of mount(2) and their interpretation depends
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) on the filesystem type. For block filesystems, for example, dev_name is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) interpreted as a block device name; that device is opened and, if it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) contains a suitable filesystem image, the method creates and initializes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) struct super_block accordingly, returning its root dentry to the caller.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) ->mount() may choose to return a subtree of an existing filesystem; it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) doesn't have to create a new one. The main result from the caller's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) point of view is a reference to the dentry at the root of the (sub)tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) to be attached; creation of a new superblock is a common side effect.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) The most interesting member of the superblock structure that the mount()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) method fills in is the "s_op" field. This is a pointer to a "struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) super_operations" which describes the next level of the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) implementation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) Usually, a filesystem uses one of the generic mount() implementations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) and provides a fill_super() callback instead. The generic variants are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) ``mount_bdev``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) mount a filesystem residing on a block device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) ``mount_nodev``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) mount a filesystem that is not backed by a device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) ``mount_single``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) mount a filesystem which shares the instance between all mounts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) A fill_super() callback implementation has the following arguments:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) ``struct super_block *sb``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) the superblock structure. The callback must initialize this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) properly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) ``void *data``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) arbitrary mount options, usually passed as an ASCII string (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) "Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) ``int silent``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) whether or not to be silent on error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)
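Because ``data`` usually arrives as a comma-separated ASCII string, a
fill_super() implementation typically begins by splitting it into
options. The following userspace sketch shows one way that could look.
The option names ``quiet`` and ``size=``, the ``toy_mount_opts``
structure and ``toy_parse_options()`` are all invented for illustration;
a real filesystem would use the kernel's own parsing helpers rather than
this code:

.. code-block:: c

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical per-mount settings; not taken from any real filesystem. */
    struct toy_mount_opts {
        int quiet;              /* set by "quiet" */
        unsigned long size;     /* set by "size=<decimal number>" */
    };

    /*
     * Parse a comma-separated option string such as "quiet,size=4096".
     * Returns 0 on success and -1 on an unknown or malformed option.
     */
    static int toy_parse_options(const char *data, struct toy_mount_opts *opts)
    {
        char buf[256];
        char *p;

        opts->quiet = 0;
        opts->size = 0;
        if (!data)
            return 0;                   /* no options supplied */
        if (strlen(data) >= sizeof(buf))
            return -1;
        strcpy(buf, data);              /* work on a private copy */

        p = buf;
        while (*p) {
            size_t len = strcspn(p, ",");
            char *next = p[len] ? p + len + 1 : p + len;

            p[len] = '\0';              /* terminate the current option */
            if (*p == '\0') {
                /* empty option between two commas: ignore it */
            } else if (strcmp(p, "quiet") == 0) {
                opts->quiet = 1;
            } else if (strncmp(p, "size=", 5) == 0) {
                char *end;

                opts->size = strtoul(p + 5, &end, 10);
                if (end == p + 5 || *end != '\0')
                    return -1;          /* missing or trailing junk */
            } else {
                return -1;              /* unknown option */
            }
            p = next;
        }
        return 0;
    }
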
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) The Superblock Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) A superblock object represents a mounted filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) struct super_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) This describes how the VFS can manipulate the superblock of your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) filesystem. As of kernel 2.6.22, the following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) struct super_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) struct inode *(*alloc_inode)(struct super_block *sb);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) void (*destroy_inode)(struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) void (*dirty_inode) (struct inode *, int flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) int (*write_inode) (struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) void (*drop_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) void (*delete_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) void (*put_super) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) int (*sync_fs)(struct super_block *sb, int wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) int (*freeze_fs) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) int (*unfreeze_fs) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) int (*statfs) (struct dentry *, struct kstatfs *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) int (*remount_fs) (struct super_block *, int *, char *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) void (*clear_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) void (*umount_begin) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) int (*show_options)(struct seq_file *, struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) int (*nr_cached_objects)(struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) void (*free_cached_objects)(struct super_block *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) All methods are called without any locks being held, unless otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) noted. This means that most methods can block safely. All methods are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) only called from a process context (i.e. not from an interrupt handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) or bottom half).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) ``alloc_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) this method is called by alloc_inode() to allocate memory for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) struct inode and initialize it. If this function is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) defined, a simple 'struct inode' is allocated. Normally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) alloc_inode will be used to allocate a larger structure which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) contains a 'struct inode' embedded within it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) ``destroy_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) this method is called by destroy_inode() to release resources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) allocated for struct inode. It is only required if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) ->alloc_inode was defined and simply undoes anything done by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) ->alloc_inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) ``dirty_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) this method is called by the VFS to mark an inode dirty.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) ``write_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) this method is called when the VFS needs to write an inode to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) disc. The second parameter indicates whether the write should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) be synchronous or not; not all filesystems check this flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) ``drop_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) called when the last access to the inode is dropped, with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) inode->i_lock spinlock held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) This method should be either NULL (normal UNIX filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) semantics) or "generic_delete_inode" (for filesystems that do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) not want to cache inodes - causing "delete_inode" to always be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) called regardless of the value of i_nlink)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) The "generic_delete_inode()" behavior is equivalent to the old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) practice of using "force_delete" in the put_inode() case, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) does not have the races that the "force_delete()" approach had.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) ``delete_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) called when the VFS wants to delete an inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) ``put_super``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) called when the VFS wishes to free the superblock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) (i.e. unmount). This is called with the superblock lock held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) ``sync_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) called when VFS is writing out all dirty data associated with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) superblock. The second parameter indicates whether the method
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) should wait until the write out has been completed. Optional.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) ``freeze_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) called when VFS is locking a filesystem and forcing it into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) consistent state. This method is currently used by the Logical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) Volume Manager (LVM).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) ``unfreeze_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) called when VFS is unlocking a filesystem and making it writable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) ``statfs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) called when the VFS needs to get filesystem statistics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) ``remount_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) called when the filesystem is remounted. This is called with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) the kernel lock held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) ``clear_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) called when the VFS clears the inode. Optional.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) ``umount_begin``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) called when the VFS is unmounting a filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) ``show_options``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) called by the VFS to show mount options for /proc/<pid>/mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) (see "Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) ``quota_read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) called by the VFS to read from filesystem quota file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) ``quota_write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) called by the VFS to write to filesystem quota file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) ``nr_cached_objects``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) called by the sb cache shrinking function for the filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) return the number of freeable cached objects it contains.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) Optional.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) ``free_cached_objects``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) called by the sb cache shrinking function for the filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) scan the number of objects indicated to try to free them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) Optional, but any filesystem implementing this method needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) also implement ->nr_cached_objects for it to be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) correctly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) We can't do anything with any errors that the filesystem might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) encounter, hence the void return type. This will never be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) called if the VM is trying to reclaim under GFP_NOFS conditions,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) hence this method does not need to handle that situation itself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) Implementations must include conditional reschedule calls inside
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) any scanning loop that is done. This allows the VFS to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) determine appropriate scan batch sizes without having to worry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) about whether implementations will cause holdoff problems due to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) large scan batch sizes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) Whoever sets up the inode is responsible for filling in the "i_op"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) field. This is a pointer to a "struct inode_operations" which describes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) the methods that can be performed on individual inodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363)
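The ->alloc_inode description above mentions allocating a larger
structure with a 'struct inode' embedded within it. The pattern can be
sketched in plain userspace C. Everything named here is a hypothetical
stand-in (``toy_inode`` for the generic inode, ``toyfs_inode_info`` for
the filesystem-private wrapper, ``TOYFS_I()`` for the conversion
helper); only the embedding technique itself is the point:

.. code-block:: c

    #include <stddef.h>
    #include <stdlib.h>

    /* Toy stand-in for the generic inode. */
    struct toy_inode {
        unsigned long i_ino;
    };

    /* Hypothetical filesystem-private inode with the generic part inside. */
    struct toyfs_inode_info {
        unsigned int private_flags;
        struct toy_inode vfs_inode;
    };

    /* Recover the containing structure from the embedded member. */
    static struct toyfs_inode_info *TOYFS_I(struct toy_inode *inode)
    {
        return (struct toyfs_inode_info *)
            ((char *)inode - offsetof(struct toyfs_inode_info, vfs_inode));
    }

    /* Mirrors ->alloc_inode: allocate the wrapper, return the inner part. */
    static struct toy_inode *toyfs_alloc_inode(void)
    {
        struct toyfs_inode_info *info = calloc(1, sizeof(*info));

        return info ? &info->vfs_inode : NULL;
    }

    /* Mirrors ->destroy_inode: undo exactly what the allocation did. */
    static void toyfs_destroy_inode(struct toy_inode *inode)
    {
        free(TOYFS_I(inode));
    }

Generic code only ever sees the ``struct toy_inode`` pointer, while the
filesystem can reach its private fields through ``TOYFS_I()`` without a
separate allocation or lookup.
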
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) struct xattr_handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) On filesystems that support extended attributes (xattrs), the s_xattr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) superblock field points to a NULL-terminated array of xattr handlers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) Extended attributes are name:value pairs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) ``name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) Indicates that the handler matches attributes with the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) name (such as "system.posix_acl_access"); the prefix field must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) be NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) ``prefix``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) Indicates that the handler matches all attributes with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) specified name prefix (such as "user."); the name field must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) ``list``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) Determine if attributes matching this xattr handler should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) listed for a particular dentry. Used by some listxattr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) implementations like generic_listxattr.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) ``get``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) Called by the VFS to get the value of a particular extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) attribute. This method is called by the getxattr(2) system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) ``set``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) Called by the VFS to set the value of a particular extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) attribute. When the new value is NULL, called to remove a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) particular extended attribute. This method is called by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) setxattr(2) and removexattr(2) system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) When none of the xattr handlers of a filesystem match the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) attribute name or when a filesystem doesn't support extended attributes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) the various ``*xattr(2)`` system calls return -EOPNOTSUPP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) The Inode Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) An inode object represents an object within the filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) struct inode_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) This describes how the VFS can manipulate an inode in your filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) The following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) struct inode_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) int (*create) (struct inode *,struct dentry *, umode_t, bool);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) int (*link) (struct dentry *,struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) int (*unlink) (struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) int (*symlink) (struct inode *,struct dentry *,const char *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) int (*mkdir) (struct inode *,struct dentry *,umode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) int (*rmdir) (struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) int (*rename) (struct inode *, struct dentry *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) struct inode *, struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) int (*readlink) (struct dentry *, char __user *,int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) const char *(*get_link) (struct dentry *, struct inode *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) struct delayed_call *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) int (*permission) (struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) int (*get_acl)(struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) int (*setattr) (struct dentry *, struct iattr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) ssize_t (*listxattr) (struct dentry *, char *, size_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) void (*update_time)(struct inode *, struct timespec *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) int (*atomic_open)(struct inode *, struct dentry *, struct file *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) unsigned open_flag, umode_t create_mode);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) int (*tmpfile) (struct inode *, struct dentry *, umode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) Again, all methods are called without any locks being held, unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) otherwise noted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) ``create``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) called by the open(2) and creat(2) system calls. Only required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) if you want to support regular files. The dentry you get should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) not have an inode (i.e. it should be a negative dentry). Here
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) you will probably call d_instantiate() with the dentry and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) newly created inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) ``lookup``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) called when the VFS needs to look up an inode in a parent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) directory. The name to look for is found in the dentry. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) method must call d_add() to insert the found inode into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) dentry. The "i_count" field in the inode structure should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) incremented. If the named inode does not exist a NULL inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) should be inserted into the dentry (this is called a negative
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) dentry). Returning an error code from this routine must only be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) done on a real error, otherwise creating inodes with system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) calls like creat(2), mknod(2), mkdir(2) and so on will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) If you wish to overload the dentry methods then you should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) initialise the "d_dop" field in the dentry; this is a pointer to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) a struct "dentry_operations". This method is called with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) directory inode semaphore held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) ``link``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) called by the link(2) system call. Only required if you want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) support hard links. You will probably need to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) ``unlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) called by the unlink(2) system call. Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) to support deleting inodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) ``symlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) called by the symlink(2) system call. Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) to support symlinks. You will probably need to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) ``mkdir``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) called by the mkdir(2) system call. Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) to support creating subdirectories. You will probably need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) call d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) ``rmdir``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) called by the rmdir(2) system call. Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) to support deleting subdirectories.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) ``mknod``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) called by the mknod(2) system call to create a device (char,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) block) inode or a named pipe (FIFO) or socket. Only required if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) you want to support creating these types of inodes. You will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) probably need to call d_instantiate() just as you would in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) ``rename``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) called by the rename(2) system call to rename the object to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) the parent and name given by the second inode and dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) The filesystem must return -EINVAL for any unsupported or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) unknown flags. Currently the following flags are implemented:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) (1) RENAME_NOREPLACE: this flag indicates that if the target of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) the rename exists the rename should fail with -EEXIST instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) replacing the target. The VFS already checks for existence, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) for local filesystems the RENAME_NOREPLACE implementation is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) equivalent to plain rename.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) (2) RENAME_EXCHANGE: exchange source and target. Both must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) exist; this is checked by the VFS. Unlike plain rename, source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) and target may be of different type.
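
The flag rule above (reject anything the filesystem does not understand with
-EINVAL) might be sketched as follows. This is a userspace illustration under
stated assumptions: ``demo_rename_check_flags`` and ``demo_rename_example``
are hypothetical names, the sketch assumes a filesystem that supports both
flags described above, and the flag values are defined locally only so that
the example builds on its own.

```c
#include <assert.h>
#include <errno.h>

/* Local definitions so the sketch is self-contained. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE	(1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE		(1 << 1)
#endif

/*
 * Hypothetical check a ->rename implementation could perform first:
 * any bit outside the supported set is refused with -EINVAL.
 */
int demo_rename_check_flags(unsigned int flags)
{
	const unsigned int supported = RENAME_NOREPLACE | RENAME_EXCHANGE;

	if (flags & ~supported)
		return -EINVAL;
	return 0;
}

/* Usage example: known flags pass, an unknown bit is rejected. */
void demo_rename_example(void)
{
	assert(demo_rename_check_flags(0) == 0);
	assert(demo_rename_check_flags(RENAME_EXCHANGE) == 0);
	assert(demo_rename_check_flags(1u << 30) == -EINVAL);
}
```

Checking against a mask of supported bits, rather than testing each known flag
individually, means that flags added later are refused by default.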
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) ``get_link``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) called by the VFS to follow a symbolic link to the inode it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) points to. Only required if you want to support symbolic links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) This method returns the symlink body to traverse (and possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) resets the current position with nd_jump_link()). If the body
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) won't go away until the inode is gone, nothing else is needed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) if it needs to be otherwise pinned, arrange for its release by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) having get_link(..., ..., done) do set_delayed_call(done,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) destructor, argument). In that case destructor(argument) will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) be called once VFS is done with the body you've returned. May
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) be called in RCU mode; that is indicated by NULL dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) argument. If the request can't be handled without leaving RCU mode,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) have it return ERR_PTR(-ECHILD).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) If the filesystem stores the symlink target in ->i_link, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) VFS may use it directly without calling ->get_link(); however,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) ->get_link() must still be provided. ->i_link must not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) freed until after an RCU grace period. Writing to ->i_link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) post-iget() time requires a 'release' memory barrier.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) ``readlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) this is now just an override for use by readlink(2) for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) cases when ->get_link uses nd_jump_link() or the object is not in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) fact a symlink. Normally filesystems should only implement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) ->get_link for symlinks and readlink(2) will automatically use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) ``permission``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) called by the VFS to check for access rights on a POSIX-like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) rcu-walk mode, the filesystem must check the permission without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) blocking or storing to the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) If a situation is encountered that rcu-walk cannot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) handle, return -ECHILD and it will be called again in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) ref-walk mode.
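
The rcu-walk fallback described above can be modelled in a few lines of
userspace C. Everything here is an assumption made for illustration: the
``demo_`` types and functions are hypothetical, and the ``DEMO_MAY_*`` values
are placeholders rather than the kernel's own mask bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder mask bits for this sketch only. */
#define DEMO_MAY_READ		0x04
#define DEMO_MAY_NOT_BLOCK	0x80

/* Hypothetical inode state: is the answer available without blocking? */
struct demo_inode {
	bool perms_cached;
	bool readable;
};

int demo_permission(const struct demo_inode *inode, int mask)
{
	if (!inode->perms_cached) {
		/* rcu-walk: blocking is not allowed, so ask for a retry. */
		if (mask & DEMO_MAY_NOT_BLOCK)
			return -ECHILD;
		/* ref-walk: a real filesystem could block here to fetch data. */
	}
	if ((mask & DEMO_MAY_READ) && !inode->readable)
		return -EACCES;
	return 0;
}

/* Usage example: the first call bails out, the retry gives the answer. */
void demo_permission_example(void)
{
	struct demo_inode inode = { false, true };

	assert(demo_permission(&inode, DEMO_MAY_READ | DEMO_MAY_NOT_BLOCK)
	       == -ECHILD);
	assert(demo_permission(&inode, DEMO_MAY_READ) == 0);
}
```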
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) ``setattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) called by the VFS to set attributes for a file. This method is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) called by chmod(2) and related system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) ``getattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) called by the VFS to get attributes of a file. This method is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) called by stat(2) and related system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) ``listxattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) called by the VFS to list all extended attributes for a given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) file. This method is called by the listxattr(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) ``update_time``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) called by the VFS to update a specific time or the i_version of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) an inode. If this is not defined the VFS will update the inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) itself and call mark_inode_dirty_sync.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) ``atomic_open``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) called on the last component of an open. Using this optional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) method the filesystem can look up, possibly create and open the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) file in one atomic operation. If it wants to leave actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) opening to the caller (e.g. if the file turned out to be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) symlink, device, or just something the filesystem won't do atomic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) open for), it may signal this by returning finish_no_open(file,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) dentry). This method is only called if the last component is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) negative or needs lookup. Cached positive dentries are still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) handled by f_op->open(). If the file was created, FMODE_CREATED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) flag should be set in file->f_mode. In case of O_EXCL the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) method must only succeed if the file didn't exist and hence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) FMODE_CREATED shall always be set on success.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) ``tmpfile``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) called at the end of O_TMPFILE open(). Optional, equivalent to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) atomically creating, opening and unlinking a file in the given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) The Address Space Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) The address space object is used to group and manage pages in the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) cache. It can be used to keep track of the pages in a file (or anything
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) else) and also track the mapping of sections of the file into process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) address spaces.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) There are a number of distinct yet related services that an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) address-space can provide. These include communicating memory pressure,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) page lookup by address, and keeping track of pages tagged as Dirty or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) Writeback.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) The first can be used independently of the others. The VM can try to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) either write dirty pages in order to clean them, or release clean pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) in order to reuse them. To do this it can call the ->writepage method
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) on dirty pages, and ->releasepage on clean pages with PagePrivate set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) Clean pages without PagePrivate and with no external references will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) released without notice being given to the address_space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) To achieve this functionality, pages need to be placed on an LRU with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) lru_cache_add and mark_page_active needs to be called whenever the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) is used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) Pages are normally kept in a radix tree indexed by ->index. This tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) maintains information about the PG_Dirty and PG_Writeback status of each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) page, so that pages with either of these flags can be found quickly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) The Dirty tag is primarily used by mpage_writepages - the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) ->writepages method. It uses the tag to find dirty pages to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) ->writepage on. If mpage_writepages is not used (i.e. the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is almost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) unused. write_inode_now and sync_inode do use it (through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) __sync_single_inode) to check if ->writepages has been successful in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) writing out the whole address_space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) The Writeback tag is used by filemap*wait* and sync_page* functions, via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) filemap_fdatawait_range, to wait for all writeback to complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) An address_space handler may attach extra information to a page,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) typically using the 'private' field in the 'struct page'. If such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) information is attached, the PG_Private flag should be set. This will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) cause various VM routines to make extra calls into the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) handler to deal with that data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) An address space acts as an intermediary between storage and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) application. Data is read into the address space a whole page at a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) time, and provided to the application either by copying the page, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) by memory-mapping the page. Data is written into the address space by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) the application, and then written back to storage, typically in whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) pages; however, the address_space has finer control of write sizes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) The read process essentially only requires 'readpage'. The write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) process is more complicated and uses write_begin/write_end or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) set_page_dirty to write data into the address_space, and writepage and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) writepages to write back data to storage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) Adding and removing pages to/from an address_space is protected by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) inode's i_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) When data is written to a page, the PG_Dirty flag should be set. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) typically remains set until writepage asks for it to be written. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) should clear PG_Dirty and set PG_Writeback. It can be actually written
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) at any point after PG_Dirty is clear. Once it is known to be safe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) PG_Writeback is cleared.
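
The flag lifecycle in the paragraph above can be pictured as a tiny state
machine. The following userspace sketch is a model only: ``demo_page`` and
the ``demo_`` functions are hypothetical names, and two booleans stand in for
the PG_Dirty and PG_Writeback page flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: two booleans stand in for the page flags. */
struct demo_page {
	bool dirty;		/* models PG_Dirty */
	bool writeback;		/* models PG_Writeback */
};

/* Data was written into the page: it becomes dirty. */
void demo_write_data(struct demo_page *p)
{
	p->dirty = true;
}

/* Writeout is requested: clear dirty, mark the page under writeback. */
void demo_start_writeback(struct demo_page *p)
{
	assert(p->dirty);
	p->dirty = false;
	p->writeback = true;
}

/* The data is known to be safe on storage: writeback is over. */
void demo_end_writeback(struct demo_page *p)
{
	assert(p->writeback);
	p->writeback = false;
}

/* Usage example: a write during writeback re-dirties the page. */
void demo_page_example(void)
{
	struct demo_page p = { false, false };

	demo_write_data(&p);
	demo_start_writeback(&p);
	demo_write_data(&p);		/* new data arrives mid-writeback */
	demo_end_writeback(&p);
	assert(p.dirty && !p.writeback);	/* still needs another pass */
}
```

In this model, clearing the dirty state before the write is issued is what
lets a store that arrives during writeback be noticed: it sets the flag again,
so the page is picked up on a later pass.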
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) Writeback makes use of a writeback_control structure to direct the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) operations. This gives the writepage and writepages operations some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) information about the nature of and reason for the writeback request,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) and the constraints under which it is being done. It is also used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) return information back to the caller about the result of a writepage or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) writepages request.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) Handling errors during writeback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) Most applications that do buffered I/O will periodically call a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) synchronization call (fsync, fdatasync, msync or sync_file_range) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) ensure that data written has made it to the backing store. When there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) is an error during writeback, they expect that error to be reported when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) a file sync request is made. After an error has been reported on one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) request, subsequent requests on the same file descriptor should return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) 0, unless further writeback errors have occurred since the previous file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) synchronization.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) Ideally, the kernel would report errors only on file descriptions on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675) which writes were done that subsequently failed to be written back. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) generic pagecache infrastructure does not track the file descriptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) that have dirtied each individual page, however, so determining which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) file descriptors should get back an error is not possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680) Instead, the generic writeback error tracking infrastructure in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) kernel settles for reporting errors to fsync on all file descriptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) that were open at the time that the error occurred. In a situation with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) multiple writers, all of them will get back an error on a subsequent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) fsync, even if all of the writes done through that particular file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) descriptor succeeded (or even if there were no writes on that file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) descriptor at all).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) Filesystems that wish to use this infrastructure should call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) mapping_set_error to record the error in the address_space when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) occurs. Then, after writing back data from the pagecache in their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) file->fsync operation, they should call file_check_and_advance_wb_err to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) ensure that the struct file's error cursor has advanced to the correct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693) point in the stream of errors emitted by the backing device(s).
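
The "error cursor" idea can be illustrated with a simplified userspace model.
It is not the kernel's data structure: the ``demo_`` names are hypothetical,
a plain counter stands in for the stream of errors, and the model assumes
that a file opened after an error does not report it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the per-mapping error stream. */
struct demo_mapping {
	unsigned int err_seq;	/* bumped each time an error is recorded */
	int last_err;		/* most recent error, as a negative errno */
};

/* Hypothetical model of an open file with its own error cursor. */
struct demo_file {
	struct demo_mapping *mapping;
	unsigned int seen_seq;	/* position in the error stream */
};

void demo_open(struct demo_file *f, struct demo_mapping *m)
{
	f->mapping = m;
	f->seen_seq = m->err_seq;	/* start at the current position */
}

/* Models recording a writeback error in the address_space. */
void demo_mapping_set_error(struct demo_mapping *m, int err)
{
	m->last_err = err;
	m->err_seq++;
}

/* Models the check done in fsync: report once, then advance the cursor. */
int demo_file_check_and_advance(struct demo_file *f)
{
	if (f->seen_seq == f->mapping->err_seq)
		return 0;
	f->seen_seq = f->mapping->err_seq;
	return f->mapping->last_err;
}

/* Usage example: both open files see the error once, then get 0. */
void demo_wb_err_example(void)
{
	struct demo_mapping m = { 0, 0 };
	struct demo_file a, b;

	demo_open(&a, &m);
	demo_open(&b, &m);
	demo_mapping_set_error(&m, -EIO);
	assert(demo_file_check_and_advance(&a) == -EIO);
	assert(demo_file_check_and_advance(&a) == 0);
	assert(demo_file_check_and_advance(&b) == -EIO);
}
```

Because each open file keeps its own cursor, reporting the error to one file
does not hide it from the others, which matches the behaviour described above
for multiple writers.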
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) struct address_space_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) -------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) This describes how the VFS can manipulate mapping of a file to page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) cache in your filesystem. The following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) struct address_space_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) int (*writepage)(struct page *page, struct writeback_control *wbc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) int (*readpage)(struct file *, struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) int (*writepages)(struct address_space *, struct writeback_control *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) int (*set_page_dirty)(struct page *page);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) void (*readahead)(struct readahead_control *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) int (*readpages)(struct file *filp, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) struct list_head *pages, unsigned nr_pages);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) int (*write_begin)(struct file *, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) loff_t pos, unsigned len, unsigned flags,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) struct page **pagep, void **fsdata);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) int (*write_end)(struct file *, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) loff_t pos, unsigned len, unsigned copied,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) struct page *page, void *fsdata);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) sector_t (*bmap)(struct address_space *, sector_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) void (*invalidatepage) (struct page *, unsigned int, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) int (*releasepage) (struct page *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) void (*freepage)(struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) /* isolate a page for migration */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) bool (*isolate_page) (struct page *, isolate_mode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) /* migrate the contents of a page to the specified target */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) int (*migratepage) (struct page *, struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) /* put migration-failed page back to right list */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) void (*putback_page) (struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) int (*launder_page) (struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) int (*is_partially_uptodate) (struct page *, unsigned long,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) void (*is_dirty_writeback) (struct page *, bool *, bool *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) int (*error_remove_page) (struct address_space *mapping, struct page *page);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735) int (*swap_activate)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) int (*swap_deactivate)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) ``writepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) called by the VM to write a dirty page to backing store. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) may happen for data integrity reasons (i.e. 'sync'), or to free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) up memory (flush). The difference can be seen in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) wbc->sync_mode. The PG_Dirty flag has been cleared and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) PageLocked is true. writepage should start writeout, should set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) PG_Writeback, and should make sure the page is unlocked, either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) synchronously or asynchronously when the write operation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) completes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) try too hard if there are problems, and may choose to write out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) other pages from the mapping if that is easier (e.g. due to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) internal dependencies). If it chooses not to start writeout, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) should return AOP_WRITEPAGE_ACTIVATE so that the VM will not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) keep calling ->writepage on that page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) See Documentation/filesystems/locking.rst for more details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) ``readpage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) called by the VM to read a page from backing store. The page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) will be Locked when readpage is called, and should be unlocked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) and marked uptodate once the read completes. If ->readpage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) discovers that it needs to unlock the page for some reason, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) can do so, and then return AOP_TRUNCATED_PAGE. In this case,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) the page will be looked up again, relocked and if that all succeeds,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) ->readpage will be called again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) ``writepages``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) called by the VM to write out pages associated with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) address_space object. If wbc->sync_mode is WB_SYNC_ALL, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) the writeback_control will specify a range of pages that must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) written out. If it is WB_SYNC_NONE, then a nr_to_write is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) given and that many pages should be written if possible. If no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) ->writepages is given, then mpage_writepages is used instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) This will choose pages from the address space that are tagged as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) DIRTY and will pass them to ->writepage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) ``set_page_dirty``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) called by the VM to set a page dirty. This is particularly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779) needed if an address space attaches private data to a page, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) that data needs to be updated when a page is dirtied. This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) called, for example, when a memory mapped page gets modified.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) If defined, it should set the PageDirty flag, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) PAGECACHE_TAG_DIRTY tag in the radix tree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) ``readahead``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786) Called by the VM to read pages associated with the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) object. The pages are consecutive in the page cache and are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) locked. The implementation should decrement the page refcount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789) after starting I/O on each page. Usually the page will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) unlocked by the I/O completion handler. If the filesystem decides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) to stop attempting I/O before reaching the end of the readahead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) window, it can simply return. The caller will decrement the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) refcount and unlock the remaining pages for you. Set PageUptodate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) if the I/O completes successfully. Setting PageError on any page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) will be ignored; simply unlock the page if an I/O error occurs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) ``readpages``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) called by the VM to read pages associated with the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799) object. This is essentially just a vector version of readpage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) Instead of just one page, several pages are requested.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) readpages is only used for read-ahead, so read errors are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) ignored. If anything goes wrong, feel free to give up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) This interface is deprecated and will be removed by the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) 2020; implement readahead instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) ``write_begin``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) Called by the generic buffered write code to ask the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808) to prepare to write len bytes at the given offset in the file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) The address_space should check that the write will be able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810) complete, by allocating space if necessary and doing any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) internal housekeeping. If the write will update parts of any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) basic-blocks on storage, then those blocks should be pre-read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) (if they haven't been read already) so that the updated blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) can be written out properly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) The filesystem must return the locked pagecache page for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) specified offset, in ``*pagep``, for the caller to write into.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) It must be able to cope with short writes (where the length
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) passed to write_begin is greater than the number of bytes copied
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) into the page).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823) flags is a field for AOP_FLAG_xxx flags, described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824) include/linux/fs.h.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) A void * may be returned in fsdata, which then gets passed into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827) write_end.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) Returns 0 on success; < 0 on failure (which is the error code),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) in which case write_end is not called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) ``write_end``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) After a successful write_begin, and data copy, write_end must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) called. len is the original len passed to write_begin, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) copied is the amount that was able to be copied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) The filesystem must take care of unlocking the page and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) releasing its refcount, and updating i_size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) Returns < 0 on failure, otherwise the number of bytes (<=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) 'copied') that were able to be copied into pagecache.
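
The short-write rule described for write_begin and write_end can be
illustrated with a small userspace toy model. This is only a sketch:
``struct toy_inode`` and ``toy_write_end()`` are hypothetical names, not
kernel code, and the model ignores page locking and refcounts. It only
mirrors the rule that write_end receives both len and copied, returns at
most copied, and extends i_size no further than the copied data reaches.

.. code-block:: c

	#include <stddef.h>

	/* Hypothetical stand-in for the inode size field; not a kernel type. */
	struct toy_inode {
		long long i_size;
	};

	/*
	 * Toy model of the write_end contract: 'len' is what write_begin was
	 * asked to prepare, 'copied' is what the caller managed to copy.
	 * Returns the number of bytes accepted (<= copied) and grows i_size
	 * only as far as the copied data reaches. Returns -1 (a stand-in for
	 * a negative error code) if copied exceeds len.
	 */
	static long toy_write_end(struct toy_inode *inode, long long pos,
				  size_t len, size_t copied)
	{
		if (copied > len)
			return -1;

		if (pos + (long long)copied > inode->i_size)
			inode->i_size = pos + (long long)copied;

		return (long)copied;
	}

With i_size at 100, a call prepared for 16 bytes at offset 96 that only
copied 8 bytes leaves i_size at 104 in this model, not 112.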
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) ``bmap``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) called by the VFS to map a logical block offset within an object to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845) physical block number. This method is used by the FIBMAP ioctl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) and for working with swap-files. To be able to swap to a file,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) the file must have a stable mapping to a block device. The swap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) system does not go through the filesystem but instead uses bmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) to find out where the blocks in the file are and uses those
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850) addresses directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) ``invalidatepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) If a page has PagePrivate set, then invalidatepage will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) called when part or all of the page is to be removed from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) address space. This generally corresponds to either a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) truncation, punch hole or a complete invalidation of the address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) space (in the latter case 'offset' will always be 0 and 'length'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) will be PAGE_SIZE). Any private data associated with the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859) should be updated to reflect this truncation. If offset is 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) and length is PAGE_SIZE, then the private data should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) released, because the page must be able to be completely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862) discarded. This may be done by calling the ->releasepage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) function, but in this case the release MUST succeed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865) ``releasepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) releasepage is called on PagePrivate pages to indicate that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) page should be freed if possible. ->releasepage should remove
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) any private data from the page and clear the PagePrivate flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869) If releasepage() fails for some reason, it must indicate failure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) with a 0 return value. releasepage() is used in two distinct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) though related cases. The first is when the VM finds a clean
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872) page with no active users and wants to make it a free page. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) ->releasepage succeeds, the page will be removed from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) address_space and become free.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) The second case is when a request has been made to invalidate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) some or all pages in an address_space. This can happen through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) the fadvise(POSIX_FADV_DONTNEED) system call or by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879) filesystem explicitly requesting it as nfs and 9fs do (when they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) believe the cache may be out of date with storage) by calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) invalidate_inode_pages2(). If the filesystem makes such a call,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) and needs to be certain that all pages are invalidated, then its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883) releasepage will need to ensure this. Possibly it can clear the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) PageUptodate bit if it cannot free private data yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) ``freepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) freepage is called once the page is no longer visible in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888) page cache in order to allow the cleanup of any private data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) Since it may be called by the memory reclaimer, it should not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890) assume that the original address_space mapping still exists, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891) it should not block.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) ``direct_IO``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894) called by the generic read/write routines to perform direct I/O,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) that is, I/O requests which bypass the page cache and transfer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) data directly between the storage and the application's address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) ``isolate_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) Called by the VM when isolating a movable non-lru page. If the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) is successfully isolated, the VM marks the page as PG_isolated via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902) __SetPageIsolated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) ``migrate_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) This is used to compact the physical memory usage. If the VM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) wants to relocate a page (maybe off a memory card that is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) signalling imminent failure) it will pass a new page and an old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) page to this function. migrate_page should transfer any private
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) data across and update any references that it has to the page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) ``putback_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912) Called by the VM when migration of an isolated page fails.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) ``launder_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) Called before freeing a page - it writes back the dirty page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916) To prevent redirtying the page, it is kept locked during the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) whole operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) ``is_partially_uptodate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920) Called by the VM when reading a file through the pagecache when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921) the underlying blocksize != pagesize. If the required block is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) up to date then the read can complete without needing the IO to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) bring the whole page up to date.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) ``is_dirty_writeback``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) Called by the VM when attempting to reclaim a page. The VM uses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) dirty and writeback information to determine if it needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) stall to allow flushers a chance to complete some IO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) Ordinarily it can use PageDirty and PageWriteback but some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) filesystems have more complex state (unstable pages in NFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) prevent reclaim) or do not set those flags due to locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932) problems. This callback allows a filesystem to indicate to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) VM if a page should be treated as dirty or writeback for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) purposes of stalling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) ``error_remove_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937) normally set to generic_error_remove_page if truncation is ok
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) for this address space. Used for memory failure handling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939) Setting this implies you deal with pages going away under you,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) unless you have them locked or reference counts increased.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) ``swap_activate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) Called when swapon is used on a file to allocate space if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944) necessary and pin the block lookup information in memory. A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945) return value of zero indicates success, in which case this file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946) can be used to back swapspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) ``swap_deactivate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) Called during swapoff on files where swap_activate was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) successful.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) The File Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) A file object represents a file opened by a process. This is also known
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957) as an "open file description" in POSIX parlance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) struct file_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) This describes how the VFS can manipulate an open file. As of kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) 4.18, the following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) struct file_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969) struct module *owner;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) loff_t (*llseek) (struct file *, loff_t, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973) ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975) int (*iopoll)(struct kiocb *kiocb, bool spin);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) int (*iterate) (struct file *, struct dir_context *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) int (*iterate_shared) (struct file *, struct dir_context *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978) __poll_t (*poll) (struct file *, struct poll_table_struct *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981) int (*mmap) (struct file *, struct vm_area_struct *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) int (*open) (struct inode *, struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983) int (*flush) (struct file *, fl_owner_t id);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984) int (*release) (struct inode *, struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985) int (*fsync) (struct file *, loff_t, loff_t, int datasync);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) int (*fasync) (int, struct file *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987) int (*lock) (struct file *, int, struct file_lock *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988) ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990) int (*check_flags)(int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) int (*flock) (struct file *, int, struct file_lock *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992) ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994) int (*setlease)(struct file *, long, struct file_lock **, void **);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) long (*fallocate)(struct file *file, int mode, loff_t offset,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996) loff_t len);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997) void (*show_fdinfo)(struct seq_file *m, struct file *f);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998) #ifndef CONFIG_MMU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999) unsigned (*mmap_capabilities)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) #endif
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002) loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) struct file *file_out, loff_t pos_out,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) loff_t len, unsigned int remap_flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) int (*fadvise)(struct file *, loff_t, loff_t, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) Again, all methods are called without any locks being held, unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) otherwise noted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) ``llseek``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) called when the VFS needs to move the file position index
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) ``read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) called by read(2) and related system calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) ``read_iter``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) possibly asynchronous read with iov_iter as destination
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) ``write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) called by write(2) and related system calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) ``write_iter``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) possibly asynchronous write with iov_iter as source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026) ``iopoll``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) called when aio wants to poll for completions on HIPRI iocbs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) ``iterate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) called when the VFS needs to read the directory contents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) ``iterate_shared``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) called when the VFS needs to read the directory contents when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) the filesystem supports concurrent directory iterators
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) ``poll``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) called by the VFS when a process wants to check if there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) activity on this file and (optionally) go to sleep until there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) is activity. Called by the select(2) and poll(2) system calls
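
From userspace, the "check for activity" behaviour can be observed with
poll(2) on a pipe. The following is a minimal sketch; ``demo_poll_pipe()``
is a hypothetical helper name, and a timeout of 0 is used so the call
only checks for activity and never sleeps.

.. code-block:: c

	#include <poll.h>
	#include <unistd.h>

	/*
	 * Returns 1 if the read end of a pipe reports no activity while the
	 * pipe is empty and POLLIN after one byte has been written, 0 if the
	 * observed behaviour differs, and -1 on a setup error.
	 */
	static int demo_poll_pipe(void)
	{
		int fds[2];
		struct pollfd pfd;
		int before, after;

		if (pipe(fds) < 0)
			return -1;

		pfd.fd = fds[0];
		pfd.events = POLLIN;
		pfd.revents = 0;

		/* Timeout 0: report current readiness without sleeping. */
		before = poll(&pfd, 1, 0);

		if (write(fds[1], "x", 1) != 1) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}

		pfd.revents = 0;
		after = poll(&pfd, 1, 0);

		close(fds[0]);
		close(fds[1]);

		return before == 0 && after == 1 && (pfd.revents & POLLIN) != 0;
	}

Passing a positive or negative timeout instead of 0 is what lets the
caller go to sleep until there is activity.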
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) ``unlocked_ioctl``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) called by the ioctl(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) ``compat_ioctl``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) called by the ioctl(2) system call when 32 bit system calls are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) used on 64 bit kernels.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) ``mmap``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) called by the mmap(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) ``open``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) called by the VFS when an inode should be opened. When the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) opens a file, it creates a new "struct file". It then calls the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) open method for the newly allocated file structure. You might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) think that the open method really belongs in "struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) inode_operations", and you may be right. I think it's done the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) way it is because it makes filesystems simpler to implement.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) The open() method is a good place to initialize the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) "private_data" member in the file structure if you want to point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) to a device structure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) ``flush``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) called by the close(2) system call to flush a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) ``release``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) called when the last reference to an open file is closed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) ``fsync``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) called by the fsync(2) system call. Also see the section above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) entitled "Handling errors during writeback".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) ``fasync``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) called by the fcntl(2) system call when asynchronous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) (signal-driven) notification is enabled or disabled for a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) ``lock``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) called by the fcntl(2) system call for F_GETLK, F_SETLK, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) F_SETLKW commands
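
A userspace request using the F_SETLK command named above might look
like the following sketch. ``demo_record_lock()`` is a hypothetical
helper, and the use of a scratch file under /tmp is an assumption made
for illustration. Whether such a request reaches a filesystem's own lock
method or is handled by generic code depends on the filesystem.

.. code-block:: c

	#define _POSIX_C_SOURCE 200809L

	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	/*
	 * Takes and then drops a write lock on the first 16 bytes of a
	 * scratch file. Returns 0 on success, -1 on any failure.
	 */
	static int demo_record_lock(void)
	{
		char path[] = "/tmp/vfs-lock-demo-XXXXXX";	/* assumed location */
		struct flock fl = {0};
		int fd, ret;

		fd = mkstemp(path);
		if (fd < 0)
			return -1;
		unlink(path);		/* the open descriptor keeps the file alive */

		fl.l_type = F_WRLCK;
		fl.l_whence = SEEK_SET;
		fl.l_start = 0;
		fl.l_len = 16;

		/* F_SETLK fails immediately instead of waiting on a conflict. */
		ret = fcntl(fd, F_SETLK, &fl);
		if (ret == 0) {
			fl.l_type = F_UNLCK;
			ret = fcntl(fd, F_SETLK, &fl);
		}

		close(fd);
		return ret == 0 ? 0 : -1;
	}

F_SETLKW takes the same arguments but waits for a conflicting lock to be
released instead of failing.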
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) ``get_unmapped_area``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081) called by the mmap(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) ``check_flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) called by the fcntl(2) system call for F_SETFL command
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) ``flock``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) called by the flock(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) ``splice_write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) called by the VFS to splice data from a pipe to a file. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) method is used by the splice(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) ``splice_read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) called by the VFS to splice data from a file to a pipe. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) method is used by the splice(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) ``setlease``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) called by the VFS to set or release a file lock lease. setlease
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) implementations should call generic_setlease to record or remove
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) the lease in the inode after setting it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) ``fallocate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) called by the VFS to preallocate blocks or punch a hole.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) ``copy_file_range``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) called by the copy_file_range(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) ``remap_file_range``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) called by the ioctl(2) system call for the FICLONERANGE, FICLONE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) and FIDEDUPERANGE commands to remap file ranges. An
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) implementation should remap len bytes at pos_in of the source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) file into the dest file at pos_out. Implementations must handle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) callers passing in len == 0; this means "remap to the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) source file". The return value should be the number of bytes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) remapped, or the usual negative error code if errors occurred
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) before any bytes were remapped. The remap_flags parameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117) accepts REMAP_FILE_* flags. If REMAP_FILE_DEDUP is set then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) implementation must only remap if the requested file ranges have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) identical contents. If REMAP_FILE_CAN_SHORTEN is set, the caller is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) ok with the implementation shortening the request length to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) satisfy alignment or EOF requirements (or any other reason).
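
The length rules above can be sketched as a small userspace model.
``toy_remap_len()`` is a hypothetical helper, not a kernel function, and
it models only the EOF case. Rejecting a request that runs past the end
of the source file when shortening is not allowed is an assumption of
this sketch, and -1 stands in for a negative error code.

.. code-block:: c

	#include <stdint.h>

	/*
	 * Toy model: compute how many bytes a remap request covers.
	 * len == 0 means "remap to the end of the source file". When
	 * can_shorten is non-zero (modelling REMAP_FILE_CAN_SHORTEN), a
	 * request that runs past EOF is trimmed instead of rejected.
	 */
	static int64_t toy_remap_len(int64_t src_size, int64_t pos_in,
				     int64_t len, int can_shorten)
	{
		if (src_size < 0 || pos_in < 0 || len < 0 || pos_in > src_size)
			return -1;

		if (len == 0)
			return src_size - pos_in;

		if (len > src_size - pos_in) {
			if (!can_shorten)
				return -1;	/* assumed behaviour, see lead-in */
			return src_size - pos_in;
		}

		return len;
	}

For a 100 byte source file, a request at pos_in 40 with len 0 covers 60
bytes in this model, and a request for 20 bytes at pos_in 90 is trimmed
to 10 bytes only when shortening is allowed.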
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) ``fadvise``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) possibly called by the fadvise64() system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) Note that the file operations are implemented by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) filesystem in which the inode resides. When opening a device node
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) (character or block special) most filesystems will call special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) support routines in the VFS which will locate the required device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) driver information. These support routines replace the filesystem file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) operations with those for the device driver, and then proceed to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) the new open() method for the file. This is how opening a device file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) in the filesystem eventually ends up calling the device driver open()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) Directory Entry Cache (dcache)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) ==============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) struct dentry_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) This describes how a filesystem can overload the standard dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145) operations. Dentries and the dcache are the domain of the VFS and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) individual filesystem implementations. Device drivers have no business
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) here. These methods may be set to NULL, as they are either optional or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) the VFS uses a default. As of kernel 2.6.22, the following members are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) struct dentry_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) int (*d_revalidate)(struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155) int (*d_weak_revalidate)(struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156) int (*d_hash)(const struct dentry *, struct qstr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) int (*d_compare)(const struct dentry *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158) unsigned int, const char *, const struct qstr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159) int (*d_delete)(const struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160) int (*d_init)(struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) void (*d_release)(struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162) void (*d_iput)(struct dentry *, struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) char *(*d_dname)(struct dentry *, char *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) struct vfsmount *(*d_automount)(struct path *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) int (*d_manage)(const struct path *, bool);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166) struct dentry *(*d_real)(struct dentry *, const struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) ``d_revalidate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170) called when the VFS needs to revalidate a dentry. This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171) called whenever a name look-up finds a dentry in the dcache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172) Most local filesystems leave this as NULL, because all their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) dentries in the dcache are valid. Network filesystems are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) different since things can change on the server without the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175) client necessarily being aware of it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) This function should return a positive value if the dentry is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) still valid, and zero or a negative error code if it isn't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) d_revalidate may be called in rcu-walk mode (flags &
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) LOOKUP_RCU). If in rcu-walk mode, the filesystem must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) revalidate the dentry without blocking or storing to the dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) d_parent and d_inode should not be used without care (because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) they can change and, in the d_inode case, even become NULL under
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) us).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) If a situation is encountered that rcu-walk cannot handle,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) return -ECHILD and it will be called again in ref-walk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189) mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) ``d_weak_revalidate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) called when the VFS needs to revalidate a "jumped" dentry. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) is called when a path-walk ends at a dentry that was not acquired
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) by doing a lookup in the parent directory. This includes "/",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) "." and "..", as well as procfs-style symlinks and mountpoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) traversal.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) In this case, we are less concerned with whether the dentry is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) still fully correct, but rather that the inode is still valid.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200) As with d_revalidate, most local filesystems will set this to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) NULL since their dcache entries are always valid.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) This function has the same return code semantics as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) d_revalidate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206) d_weak_revalidate is only called after leaving rcu-walk mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) ``d_hash``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) called when the VFS adds a dentry to the hash table. The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210) dentry passed to d_hash is the parent directory that the name is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) to be hashed into.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213) Same locking and synchronisation rules as d_compare regarding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) what is safe to dereference etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216) ``d_compare``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) called to compare a dentry name with a given name. The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218) dentry is the parent of the dentry to be compared, the second is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) the child dentry. len and name string are properties of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) dentry to be compared. qstr is the name to compare it with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222) Must be constant and idempotent, should not take locks if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223) possible, and should not store into the dentry. Should not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) dereference pointers outside the dentry without lots of care
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) (eg. d_parent, d_inode, d_name should not be used).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) However, our vfsmount is pinned, and RCU held, so the dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228) and inodes won't disappear, neither will our sb or filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) module. ->d_sb may be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) It is a tricky calling convention because it needs to be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232) under "rcu-walk", ie. without any locks or references on things.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) ``d_delete``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) called when the last reference to a dentry is dropped and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236) dcache is deciding whether or not to cache it. Return 1 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) delete immediately, or 0 to cache the dentry. Default is NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) which means to always cache a reachable dentry. d_delete must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) be constant and idempotent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241) ``d_init``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242) called when a dentry is allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244) ``d_release``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) called when a dentry is really deallocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) ``d_iput``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) called when a dentry loses its inode (just prior to its being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249) deallocated). The default when this is NULL is that the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) calls iput(). If you define this method, you must call iput()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) yourself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253) ``d_dname``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) called when the pathname of a dentry should be generated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255) Useful for some pseudo filesystems (sockfs, pipefs, ...) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) delay pathname generation. (Instead of doing it when the dentry is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) created, it's done only when the path is needed.) Real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) filesystems probably don't want to use it, because their dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) are present in the global dcache hash, so their hash should be an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) invariant. As no lock is held, d_dname() should not try to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) modify the dentry itself, unless appropriate SMP safety is used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262) CAUTION: d_path() logic is quite tricky. The correct way to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) return for example "Hello" is to put it at the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) buffer, and return a pointer to the first char.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265) dynamic_dname() helper function is provided to take care of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) Example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274) return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) dentry->d_inode->i_ino);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) }
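
The "put it at the end of the buffer" convention described above can be
tried out in userspace. The following is a minimal sketch, not kernel
code: end_of_buffer_name() is a hypothetical helper written for this
illustration, and the 64-byte scratch buffer is an arbitrary assumption.
It formats a name, copies it into the last bytes of the caller's buffer,
and returns a pointer to the first character.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only; end_of_buffer_name() is a hypothetical
 * helper, not the kernel's dynamic_dname().
 *
 * Formats a name and places it, NUL-terminated, at the END of buffer.
 * Returns a pointer to the first character of the name, or NULL if the
 * name does not fit.
 */
static char *end_of_buffer_name(char *buffer, int buflen, const char *fmt, ...)
{
	char temp[64];	/* scratch space; size chosen arbitrarily */
	va_list args;
	int len;

	assert(buffer != NULL && buflen > 0);

	va_start(args, fmt);
	len = vsnprintf(temp, sizeof(temp), fmt, args);
	va_end(args);

	if (len < 0)
		return NULL;
	len += 1;	/* account for the trailing NUL */
	if ((size_t)len > sizeof(temp) || len > buflen)
		return NULL;

	buffer += buflen - len;	/* start of the last len bytes */
	memcpy(buffer, temp, len);
	return buffer;
}
```

With a 32-byte buffer and the format "pipe:[%lu]", the returned pointer
refers to the tail of the buffer rather than its start, which is why
callers must use the return value and not the buffer they passed in.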
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278) ``d_automount``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279) called when an automount dentry is to be traversed (optional).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280) This should create a new VFS mount record and return the record
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) to the caller. The caller is supplied with a path parameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) giving the automount directory to describe the automount target
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283) and the parent VFS mount record to provide inheritable mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) parameters. NULL should be returned if someone else managed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) make the automount first. If the vfsmount creation failed, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286) an error code should be returned. If -EISDIR is returned, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287) the directory will be treated as an ordinary directory and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) returned to pathwalk to continue walking.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) If a vfsmount is returned, the caller will attempt to mount it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) on the mountpoint and will remove the vfsmount from its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) expiration list in the case of failure. The vfsmount should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) returned with 2 refs on it to prevent automatic expiration - the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294) caller will clean up the additional ref.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296) This function is only used if DCACHE_NEED_AUTOMOUNT is set on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297) the dentry. This is set by __d_instantiate() if S_AUTOMOUNT is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) set on the inode being added.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300) ``d_manage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) called to allow the filesystem to manage the transition from a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) dentry (optional). This allows autofs, for example, to hold up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) clients waiting to explore behind a 'mountpoint' while letting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304) the daemon go past and construct the subtree there. 0 should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) returned to let the calling process continue. -EISDIR can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306) returned to tell pathwalk to use this directory as an ordinary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) directory and to ignore anything mounted on it and not to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308) the automount flag. Any other error code will abort pathwalk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) completely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311) If the 'rcu_walk' parameter is true, then the caller is doing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) pathwalk in RCU-walk mode. Sleeping is not permitted in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) mode, and the caller can be asked to leave it and call again by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314) returning -ECHILD. -EISDIR may also be returned to tell
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) pathwalk to ignore d_automount or any mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317) This function is only used if DCACHE_MANAGE_TRANSIT is set on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) the dentry being transited from.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) ``d_real``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321) overlay/union type filesystems implement this method to return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) one of the underlying dentries hidden by the overlay. It is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323) used in two different modes:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325) Called from file_dentry() it returns the real dentry matching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326) the inode argument. The real dentry may be from a lower layer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) already copied up, but still referenced from the file. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) mode is selected with a non-NULL inode argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) With NULL inode the topmost real underlying dentry is returned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) Each dentry has a pointer to its parent dentry, as well as a hash list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) of child dentries. Child dentries are basically like files in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) Directory Entry Cache API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) There are a number of functions defined which permit a filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341) manipulate dentries:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343) ``dget``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) open a new handle for an existing dentry (this just increments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) the usage count)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347) ``dput``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348) close a handle for a dentry (decrements the usage count). If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349) the usage count drops to 0, and the dentry is still in its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) parent's hash, the "d_delete" method is called to check whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351) it should be cached. If it should not be cached, or if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352) dentry is not hashed, it is deleted. Otherwise cached dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) are put into an LRU list to be reclaimed on memory shortage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) ``d_drop``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) unhashes a dentry from its parent's hash list. A subsequent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) call to dput() will deallocate the dentry if its usage count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358) drops to 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360) ``d_delete``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) delete a dentry. If there are no other open references to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362) dentry then the dentry is turned into a negative dentry (the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) d_iput() method is called). If there are other references, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364) d_drop() is called instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366) ``d_add``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) adds a dentry to its parent's hash list and then calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) d_instantiate().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) ``d_instantiate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371) adds a dentry to the alias hash list for the inode and updates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) the "d_inode" member. The "i_count" member in the inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) structure should be set/incremented. If the inode pointer is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) NULL, the dentry is called a "negative dentry". This function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) is commonly called when an inode is created for an existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376) negative dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) ``d_lookup``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379) looks up a dentry given its parent and path name component. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380) looks up the child of that given name from the dcache hash
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) table. If it is found, the reference count is incremented and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) the dentry is returned. The caller must use dput() to free the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) dentry when it finishes using it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) Mount Options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) Parsing options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393) On mount and remount the filesystem is passed a string containing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) comma separated list of mount options. The options can have either of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) these forms:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397) option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398) option=value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) The <linux/parser.h> header defines an API that helps parse these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1401) options. There are plenty of examples on how to use it in existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1402) filesystems.
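
To make the two option forms concrete, here is a minimal userspace
sketch that splits such a string. It deliberately does not use the
<linux/parser.h> API; split_mount_options(), option_fn, struct
option_log and log_option() are hypothetical names introduced only for
this illustration, and skipping empty entries is an assumption of the
sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: value is NULL for the bare "option" form. */
typedef void (*option_fn)(const char *name, const char *value, void *ctx);

/*
 * Userspace sketch: split a comma-separated option string in place and
 * report each "option" or "option=value" entry. Returns the number of
 * options seen. Empty entries (e.g. "a,,b") are skipped.
 */
static int split_mount_options(char *options, option_fn fn, void *ctx)
{
	int count = 0;
	char *p = options;

	assert(options != NULL && fn != NULL);

	while (p != NULL && *p != '\0') {
		char *next = strchr(p, ',');
		char *value;

		if (next != NULL)
			*next++ = '\0';	/* terminate this entry */

		if (*p != '\0') {
			value = strchr(p, '=');
			if (value != NULL)
				*value++ = '\0';	/* split name from value */
			fn(p, value, ctx);
			count++;
		}
		p = next;
	}
	return count;
}

/* Example callback: appends "name;" or "name=value;" to a text buffer. */
struct option_log {
	char text[128];
};

static void log_option(const char *name, const char *value, void *ctx)
{
	struct option_log *log = ctx;
	size_t used = strlen(log->text);

	if (value != NULL)
		snprintf(log->text + used, sizeof(log->text) - used,
			 "%s=%s;", name, value);
	else
		snprintf(log->text + used, sizeof(log->text) - used,
			 "%s;", name);
}
```

The string is modified in place, so the caller must pass a writable
copy. A real filesystem would map each name to a token and validate the
value instead of logging it.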
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1403)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1404)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1405) Showing options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1406) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1407)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1408) If a filesystem accepts mount options, it must define show_options() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1409) show all the currently active options. The rules are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1410)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1411) - options MUST be shown which are not default or whose values differ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1412) from the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1413)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1414) - options MAY be shown which are enabled by default or have their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1415) default value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1416)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1417) Options used only internally between a mount helper and the kernel (such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1418) as file descriptors), or which only have an effect during the mounting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1419) (such as ones controlling the creation of a journal) are exempt from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1420) above rules.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1421)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1422) The underlying reason for the above rules is to make sure that a mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1423) can be accurately replicated (e.g. umounting and mounting again) based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1424) on the information found in /proc/mounts.
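
The MUST rule can be sketched in userspace as follows. This is an
illustration under stated assumptions, not a real show_options()
implementation: struct demo_opts, DEMO_DEFAULT_MODE and
demo_show_options() are hypothetical, the output goes to a plain
character buffer rather than a seq_file, and each entry is written with
a leading comma on the assumption that it is appended to other options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option set for an imaginary filesystem. */
struct demo_opts {
	bool noatime;		/* default: false */
	unsigned int mode;	/* default: DEMO_DEFAULT_MODE */
};

#define DEMO_DEFAULT_MODE 0755u

/*
 * Userspace sketch of the "MUST show non-default options" rule.
 * Writes ",name" or ",name=value" into out for every option that
 * differs from its default. Returns 0 on success, -1 if out is too
 * small to hold the result.
 */
static int demo_show_options(const struct demo_opts *o, char *out, size_t outlen)
{
	int n = 0;

	assert(o != NULL && out != NULL && outlen > 0);
	out[0] = '\0';

	if (o->noatime)
		n += snprintf(out + n, outlen - n, ",noatime");
	if ((size_t)n >= outlen)
		return -1;	/* previous write was truncated */

	if (o->mode != DEMO_DEFAULT_MODE)
		n += snprintf(out + n, outlen - n, ",mode=%o", o->mode);
	if ((size_t)n >= outlen)
		return -1;

	return 0;
}
```

With every option at its default the output is empty, which is allowed
by the rules above; only the non-default settings are needed to
reproduce the mount.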
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1425)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1426)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1427) Resources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1428) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1429)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1430) (Note some of these resources are not up-to-date with the latest kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1431) version.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1432)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1433) Creating Linux virtual filesystems. 2002
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1434) <https://lwn.net/Articles/13325/>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1435)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1436) The Linux Virtual File-system Layer by Neil Brown. 1999
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1437) <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1438)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1439) A tour of the Linux VFS by Michael K. Johnson. 1996
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1440) <https://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1441)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1442) A small trail through the Linux kernel by Andries Brouwer. 2001
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1443) <https://www.win.tue.nl/~aeb/linux/vfs/trail.html>