Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    3) =========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    4) Overview of the Linux Virtual File System
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    5) =========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    7) Original author: Richard Gooch <rgooch@atnf.csiro.au>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300    9) - Copyright (C) 1999 Richard Gooch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   10) - Copyright (C) 2005 Pekka Enberg
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   13) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   14) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   16) The Virtual File System (also known as the Virtual Filesystem Switch) is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   17) the software layer in the kernel that provides the filesystem interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   18) to userspace programs.  It also provides an abstraction within the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   19) kernel which allows different filesystem implementations to coexist.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   21) VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   22) are called from a process context.  Filesystem locking is described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   23) the document Documentation/filesystems/locking.rst.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   26) Directory Entry Cache (dcache)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   27) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   29) The VFS implements the open(2), stat(2), chmod(2), and similar system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   30) calls.  The pathname argument that is passed to them is used by the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   31) to search through the directory entry cache (also known as the dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   32) cache or dcache).  This provides a very fast look-up mechanism to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   33) translate a pathname (filename) into a specific dentry.  Dentries live
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   34) in RAM and are never saved to disc: they exist only for performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   36) The dentry cache is meant to be a view into your entire filespace.  As
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   37) most computers cannot fit all dentries in the RAM at the same time, some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   38) bits of the cache are missing.  In order to resolve your pathname into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   39) dentry, the VFS may have to resort to creating dentries along the way,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   40) and then loading the inode.  This is done by looking up the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   42) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   43) The Inode Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   44) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   46) An individual dentry usually has a pointer to an inode.  Inodes are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   47) filesystem objects such as regular files, directories, FIFOs and other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   48) beasts.  They live either on the disc (for block device filesystems) or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   49) in the memory (for pseudo filesystems).  Inodes that live on the disc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   50) are copied into the memory when required and changes to the inode are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   51) written back to disc.  A single inode can be pointed to by multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   52) dentries (hard links, for example, do this).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   54) Looking up an inode requires the VFS to call the lookup() method of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   55) the parent directory inode.  This method is installed by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   56) filesystem implementation that the inode lives in.  Once the VFS has the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   57) required dentry (and hence the inode), we can do all those boring things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   58) like open(2) the file, or stat(2) it to peek at the inode data.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   59) stat(2) operation is fairly simple: once the VFS has the dentry, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   60) peeks at the inode data and passes some of it back to userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   63) The File Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   64) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   66) Opening a file requires another operation: allocation of a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   67) structure (this is the kernel-side implementation of file descriptors).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   68) The freshly allocated file structure is initialized with a pointer to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   69) the dentry and a set of file operation member functions.  These are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   70) taken from the inode data.  The open() file method is then called so the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   71) specific filesystem implementation can do its work.  You can see that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   72) this is another switch performed by the VFS.  The file structure is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   73) placed into the file descriptor table for the process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   75) Reading, writing and closing files (and other assorted VFS operations)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   76) are done by using the userspace file descriptor to grab the appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   77) file structure, and then calling the required file structure method to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   78) do whatever is required.  For as long as the file is open, it keeps the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   79) dentry in use, which in turn means that the VFS inode is still in use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   80) 
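The "switch" described above can be pictured with a small userspace
model.  This is only a sketch: ``model_file``, ``model_file_operations``,
``model_fd_table`` and ``model_vfs_read()`` are hypothetical names made up
for illustration and do not exist in the kernel, and the real structures
carry far more state.  It shows a descriptor being turned into a file
structure, and the call then being dispatched through the method table
that was installed when the file was opened.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names exist in the kernel. */
struct model_file;

struct model_file_operations {
	long (*read)(struct model_file *f, char *buf, size_t len);
};

struct model_file {
	const struct model_file_operations *f_op;	/* installed at open time */
	const char *data;				/* stand-in for dentry/inode state */
	size_t pos;					/* current file position */
};

/* "Filesystem A": copies out the bytes it holds, advancing the position. */
static long plain_read(struct model_file *f, char *buf, size_t len)
{
	size_t left = strlen(f->data) - f->pos;
	size_t n = len < left ? len : left;

	memcpy(buf, f->data + f->pos, n);
	f->pos += n;
	return (long)n;
}

static const struct model_file_operations plain_fops = { .read = plain_read };

/* "Filesystem B": provides no read method at all. */
static const struct model_file_operations noread_fops = { .read = NULL };

#define MODEL_NR_OPEN 4
static struct model_file *model_fd_table[MODEL_NR_OPEN];

/* The switch: descriptor -> file structure -> method chosen by the fs. */
static long model_vfs_read(int fd, char *buf, size_t len)
{
	struct model_file *f;

	if (fd < 0 || fd >= MODEL_NR_OPEN || !model_fd_table[fd])
		return -EBADF;
	f = model_fd_table[fd];
	if (!f->f_op || !f->f_op->read)
		return -EINVAL;
	return f->f_op->read(f, buf, len);
}
```

In this model the caller of ``model_vfs_read()`` never needs to know which
"filesystem" backs a descriptor; only the table entry and its ``f_op``
pointer decide which code runs.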
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   82) Registering and Mounting a Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   83) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   85) To register and unregister a filesystem, use the following API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   86) functions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   87) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   88) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   90) 	#include <linux/fs.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   92) 	extern int register_filesystem(struct file_system_type *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   93) 	extern int unregister_filesystem(struct file_system_type *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   95) The passed struct file_system_type describes your filesystem.  When a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   96) request is made to mount a filesystem onto a directory in your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   97) namespace, the VFS will call the appropriate mount() method for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   98) specific filesystem.  A new vfsmount referring to the tree returned by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   99) ->mount() will be attached to the mountpoint, so that when pathname
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  100) resolution reaches the mountpoint it will jump into the root of that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  101) vfsmount.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  102) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  103) You can see all filesystems that are registered to the kernel in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  104) file /proc/filesystems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  105) 
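As a rough illustration of what registration amounts to, the following
userspace sketch keeps filesystem types on a singly linked list threaded
through a ``next`` pointer.  All ``model_*`` names are hypothetical, and the
error codes and duplicate-name check are assumptions of this sketch rather
than something this document specifies for the kernel functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; the real structure has more members. */
struct model_fs_type {
	const char *name;
	struct model_fs_type *next;	/* must be NULL before registration */
};

static struct model_fs_type *model_fs_list;

/* Return the link that points at "name", or the terminating NULL link. */
static struct model_fs_type **model_find_fs(const char *name)
{
	struct model_fs_type **p;

	for (p = &model_fs_list; *p; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

static int model_register_fs(struct model_fs_type *fs)
{
	struct model_fs_type **p;

	if (fs->next)
		return -EBUSY;		/* already linked into a list */
	p = model_find_fs(fs->name);
	if (*p)
		return -EBUSY;		/* name already taken */
	*p = fs;			/* append at the tail */
	return 0;
}

static int model_unregister_fs(struct model_fs_type *fs)
{
	struct model_fs_type **p;

	for (p = &model_fs_list; *p; p = &(*p)->next) {
		if (*p == fs) {
			*p = fs->next;	/* unlink */
			fs->next = NULL;
			return 0;
		}
	}
	return -EINVAL;			/* was never registered */
}
```

Walking ``model_fs_list`` and printing each ``name`` would give the model's
counterpart of the listing in /proc/filesystems.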
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  107) struct file_system_type
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  108) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  110) This describes the filesystem.  As of kernel 2.6.39, the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  111) members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  113) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  115) 	struct file_system_type {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  116) 		const char *name;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  117) 		int fs_flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  118) 		struct dentry *(*mount) (struct file_system_type *, int,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  119) 					 const char *, void *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  120) 		void (*kill_sb) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  121) 		struct module *owner;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  122) 		struct file_system_type * next;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  123) 		struct list_head fs_supers;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  124) 		struct lock_class_key s_lock_key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  125) 		struct lock_class_key s_umount_key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  126) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  128) ``name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  129) 	the name of the filesystem type, such as "ext2", "iso9660",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  130) 	"msdos" and so on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  132) ``fs_flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  133) 	various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  135) ``mount``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  136) 	the method to call when a new instance of this filesystem should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  137) 	be mounted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  139) ``kill_sb``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  140) 	the method to call when an instance of this filesystem should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  141) 	shut down
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  144) ``owner``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  145) 	for internal VFS use: you should initialize this to THIS_MODULE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  146) 	in most cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  148) ``next``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  149) 	for internal VFS use: you should initialize this to NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  150) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  151)   s_lock_key, s_umount_key: lockdep-specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  153) The mount() method has the following arguments:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  154) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  155) ``struct file_system_type *fs_type``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  156) 	describes the filesystem, partly initialized by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  157) 	filesystem code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  158) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  159) ``int flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  160) 	mount flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  162) ``const char *dev_name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  163) 	the device name we are mounting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  165) ``void *data``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  166) 	arbitrary mount options, usually comes as an ASCII string (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  167) 	"Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  169) The mount() method must return the root dentry of the tree requested by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  170) caller.  An active reference to its superblock must be grabbed and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  171) superblock must be locked.  On failure it should return ERR_PTR(error).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  173) The arguments match those of mount(2) and their interpretation depends
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  174) on filesystem type.  E.g. for block filesystems, dev_name is interpreted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  175) as block device name, that device is opened and if it contains a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  176) suitable filesystem image the method creates and initializes struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  177) super_block accordingly, returning its root dentry to caller.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  178) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  179) ->mount() may choose to return a subtree of an existing filesystem - it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  180) doesn't have to create a new one.  The main result from the caller's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  181) point of view is a reference to dentry at the root of (sub)tree to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  182) attached; creation of new superblock is a common side effect.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  184) The most interesting member of the superblock structure that the mount()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  185) method fills in is the "s_op" field.  This is a pointer to a "struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  186) super_operations" which describes the next level of the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  187) implementation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  189) Usually, a filesystem uses one of the generic mount() implementations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  190) and provides a fill_super() callback instead.  The generic variants are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  192) ``mount_bdev``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  193) 	mount a filesystem residing on a block device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  195) ``mount_nodev``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  196) 	mount a filesystem that is not backed by a device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  198) ``mount_single``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  199) 	mount a filesystem which shares the instance between all mounts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  200) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  201) A fill_super() callback implementation has the following arguments:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  203) ``struct super_block *sb``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  204) 	the superblock structure.  The callback must initialize this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  205) 	properly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  206) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  207) ``void *data``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  208) 	arbitrary mount options, usually comes as an ASCII string (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  209) 	"Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  211) ``int silent``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  212) 	whether or not to be silent on error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  213) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  214) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  215) The Superblock Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  216) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  217) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  218) A superblock object represents a mounted filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  220) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  221) struct super_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  222) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  223) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  224) This describes how the VFS can manipulate the superblock of your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  225) filesystem.  As of kernel 2.6.22, the following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  227) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  229) 	struct super_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  230) 		struct inode *(*alloc_inode)(struct super_block *sb);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  231) 		void (*destroy_inode)(struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  232) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  233) 		void (*dirty_inode) (struct inode *, int flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  234) 		int (*write_inode) (struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  235) 		void (*drop_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  236) 		void (*delete_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  237) 		void (*put_super) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  238) 		int (*sync_fs)(struct super_block *sb, int wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  239) 		int (*freeze_fs) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  240) 		int (*unfreeze_fs) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  241) 		int (*statfs) (struct dentry *, struct kstatfs *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  242) 		int (*remount_fs) (struct super_block *, int *, char *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  243) 		void (*clear_inode) (struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  244) 		void (*umount_begin) (struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  246) 		int (*show_options)(struct seq_file *, struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  248) 		ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  249) 		ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  250) 		int (*nr_cached_objects)(struct super_block *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  251) 		void (*free_cached_objects)(struct super_block *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  252) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  253) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  254) All methods are called without any locks being held, unless otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  255) noted.  This means that most methods can block safely.  All methods are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  256) only called from a process context (i.e. not from an interrupt handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  257) or bottom half).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  258) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  259) ``alloc_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  260) 	this method is called by alloc_inode() to allocate memory for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  261) 	struct inode and initialize it.  If this function is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  262) 	defined, a simple 'struct inode' is allocated.  Normally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  263) 	alloc_inode will be used to allocate a larger structure which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  264) 	contains a 'struct inode' embedded within it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  266) ``destroy_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  267) 	this method is called by destroy_inode() to release resources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  268) 	allocated for struct inode.  It is only required if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  269) 	->alloc_inode was defined and simply undoes anything done by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  270) 	->alloc_inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  272) ``dirty_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  273) 	this method is called by the VFS to mark an inode dirty.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  274) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  275) ``write_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  276) 	this method is called when the VFS needs to write an inode to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  277) 	disc.  The second parameter indicates whether the write should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  278) 	be synchronous or not; not all filesystems check this flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  280) ``drop_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  281) 	called when the last access to the inode is dropped, with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  282) 	inode->i_lock spinlock held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  283) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  284) 	This method should be either NULL (normal UNIX filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  285) 	semantics) or "generic_delete_inode" (for filesystems that do
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  286) 	not want to cache inodes - causing "delete_inode" to always be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  287) 	called regardless of the value of i_nlink)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  288) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  289) 	The "generic_delete_inode()" behavior is equivalent to the old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  290) 	practice of using "force_delete" in the put_inode() case, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  291) 	does not have the races that the "force_delete()" approach had.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  292) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  293) ``delete_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  294) 	called when the VFS wants to delete an inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  296) ``put_super``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  297) 	called when the VFS wishes to free the superblock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  298) 	(i.e. unmount).  This is called with the superblock lock held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  300) ``sync_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  301) 	called when VFS is writing out all dirty data associated with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  302) 	superblock.  The second parameter indicates whether the method
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  303) 	should wait until the write out has been completed.  Optional.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  304) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  305) ``freeze_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  306) 	called when VFS is locking a filesystem and forcing it into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  307) 	consistent state.  This method is currently used by the Logical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  308) 	Volume Manager (LVM).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  310) ``unfreeze_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  311) 	called when VFS is unlocking a filesystem and making it writable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  312) 	again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  314) ``statfs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  315) 	called when the VFS needs to get filesystem statistics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  317) ``remount_fs``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  318) 	called when the filesystem is remounted.  This is called with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  319) 	the kernel lock held
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  320) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  321) ``clear_inode``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  322) 	called when the VFS clears the inode.  Optional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  323) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  324) ``umount_begin``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  325) 	called when the VFS is unmounting a filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  326) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  327) ``show_options``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  328) 	called by the VFS to show mount options for /proc/<pid>/mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  329) 	(see "Mount Options" section)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  330) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  331) ``quota_read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  332) 	called by the VFS to read from filesystem quota file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  333) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  334) ``quota_write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  335) 	called by the VFS to write to filesystem quota file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  336) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  337) ``nr_cached_objects``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  338) 	called by the sb cache shrinking function for the filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  339) 	return the number of freeable cached objects it contains.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  340) 	Optional.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  341) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  342) ``free_cached_objects``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  343) 	called by the sb cache shrinking function for the filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  344) 	scan the number of objects indicated to try to free them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  345) 	Optional, but any filesystem implementing this method needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  346) 	also implement ->nr_cached_objects for it to be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  347) 	correctly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  348) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  349) 	We can't do anything with any errors that the filesystem might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  350) 	have encountered, hence the void return type.  This will never be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  351) 	called if the VM is trying to reclaim under GFP_NOFS conditions,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  352) 	hence this method does not need to handle that situation itself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  353) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  354) 	Implementations must include conditional reschedule calls inside
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  355) 	any scanning loop that is done.  This allows the VFS to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  356) 	determine appropriate scan batch sizes without having to worry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  357) 	about whether implementations will cause holdoff problems due to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  358) 	large scan batch sizes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  359) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  360) Whoever sets up the inode is responsible for filling in the "i_op"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  361) field.  This is a pointer to a "struct inode_operations" which describes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  362) the methods that can be performed on individual inodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  363) 
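The ``alloc_inode`` and ``destroy_inode`` entries above mention allocating a
larger structure with a 'struct inode' embedded in it.  The userspace
sketch below illustrates that embedding pattern only.  ``model_inode``,
``myfs_inode_info``, ``MYFS_I()`` and the two functions are hypothetical
stand-ins, and the sketch uses calloc()/free() where a real filesystem
would use kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the VFS inode; the real one is far larger. */
struct model_inode {
	unsigned long i_ino;
};

/* Hypothetical per-filesystem inode with the generic inode embedded. */
struct myfs_inode_info {
	unsigned int myfs_flags;	/* filesystem-private state */
	struct model_inode vfs_inode;	/* the part the VFS sees */
};

/* Recover the containing structure from a pointer to the embedded member. */
#define MYFS_I(inode) \
	((struct myfs_inode_info *)((char *)(inode) - \
		offsetof(struct myfs_inode_info, vfs_inode)))

/* Plays the role of ->alloc_inode: allocate the larger structure. */
static struct model_inode *myfs_alloc_inode(void)
{
	struct myfs_inode_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;
	return &info->vfs_inode;	/* hand back only the embedded part */
}

/* Plays the role of ->destroy_inode: undo what the allocation did. */
static void myfs_destroy_inode(struct model_inode *inode)
{
	free(MYFS_I(inode));
}
```

Generic code in this model only ever holds a ``struct model_inode *``, while
the filesystem's own code gets back to its private fields with ``MYFS_I()``.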
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  364) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  365) struct xattr_handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  366) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  368) On filesystems that support extended attributes (xattrs), the s_xattr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  369) superblock field points to a NULL-terminated array of xattr handlers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  370) Extended attributes are name:value pairs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  371) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  372) ``name``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  373) 	Indicates that the handler matches attributes with the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  374) 	name (such as "system.posix_acl_access"); the prefix field must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  375) 	be NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  376) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  377) ``prefix``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  378) 	Indicates that the handler matches all attributes with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  379) 	specified name prefix (such as "user."); the name field must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  380) 	NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  381) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  382) ``list``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  383) 	Determine if attributes matching this xattr handler should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  384) 	listed for a particular dentry.  Used by some listxattr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  385) 	implementations like generic_listxattr.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  386) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  387) ``get``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  388) 	Called by the VFS to get the value of a particular extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  389) 	attribute.  This method is called by the getxattr(2) system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  390) 	call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  391) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  392) ``set``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  393) 	Called by the VFS to set the value of a particular extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  394) 	attribute.  When the new value is NULL, called to remove a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  395) 	particular extended attribute.  This method is called by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  396) 	setxattr(2) and removexattr(2) system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  397) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  398) When none of the xattr handlers of a filesystem match the specified
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  399) attribute name or when a filesystem doesn't support extended attributes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  400) the various ``*xattr(2)`` system calls return -EOPNOTSUPP.
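
The matching rule described above can be sketched in plain userspace C.
This is a simplified model under stated assumptions, not the kernel's
implementation: ``struct demo_xattr_handler`` and ``demo_resolve()`` are
hypothetical names, only the ``name`` and ``prefix`` fields are modelled,
and corner cases such as an empty suffix after a prefix are ignored.

.. code-block:: c

	#include <errno.h>
	#include <stddef.h>
	#include <string.h>

	/* Hypothetical, cut-down stand-in for an xattr handler. */
	struct demo_xattr_handler {
		const char *name;	/* exact attribute name, or NULL */
		const char *prefix;	/* attribute name prefix, or NULL */
	};

	/*
	 * Walk a NULL-terminated handler array and pick the first handler
	 * that matches @attr.  On success *match is the handler and *suffix
	 * points at the part of the name after the prefix (an empty string
	 * for an exact-name match).  Returns 0 on a match and -EOPNOTSUPP
	 * when nothing matches or the array is missing.
	 */
	int demo_resolve(const struct demo_xattr_handler *const *handlers,
			 const char *attr,
			 const struct demo_xattr_handler **match,
			 const char **suffix)
	{
		if (!handlers)	/* models a filesystem without xattr support */
			return -EOPNOTSUPP;

		for (; *handlers; handlers++) {
			const struct demo_xattr_handler *h = *handlers;

			if (h->name) {
				/* "name" handlers match one attribute only */
				if (strcmp(attr, h->name) == 0) {
					*match = h;
					*suffix = attr + strlen(attr);
					return 0;
				}
			} else if (h->prefix) {
				/* "prefix" handlers match a whole namespace */
				size_t len = strlen(h->prefix);

				if (strncmp(attr, h->prefix, len) == 0) {
					*match = h;
					*suffix = attr + len;
					return 0;
				}
			}
		}
		return -EOPNOTSUPP;
	}

With one handler named "system.posix_acl_access" and one with the prefix
"user.", a lookup of "user.comment" selects the second handler with the
suffix "comment", while "trusted.foo" yields -EOPNOTSUPP.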
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  401) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  402) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  403) The Inode Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  404) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  405) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  406) An inode object represents an object within the filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  407) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  408) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  409) struct inode_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  410) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  411) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  412) This describes how the VFS can manipulate an inode in your filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  413) The following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  414) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  415) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  416) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  417) 	struct inode_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  418) 		int (*create) (struct inode *,struct dentry *, umode_t, bool);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  419) 		struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  420) 		int (*link) (struct dentry *,struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  421) 		int (*unlink) (struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  422) 		int (*symlink) (struct inode *,struct dentry *,const char *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  423) 		int (*mkdir) (struct inode *,struct dentry *,umode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  424) 		int (*rmdir) (struct inode *,struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  425) 		int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  426) 		int (*rename) (struct inode *, struct dentry *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  427) 			       struct inode *, struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  428) 		int (*readlink) (struct dentry *, char __user *,int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  429) 		const char *(*get_link) (struct dentry *, struct inode *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  430) 					 struct delayed_call *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  431) 		int (*permission) (struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  432) 		struct posix_acl * (*get_acl)(struct inode *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  433) 		int (*setattr) (struct dentry *, struct iattr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  434) 		int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  435) 		ssize_t (*listxattr) (struct dentry *, char *, size_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  436) 		int (*update_time)(struct inode *, struct timespec64 *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  437) 		int (*atomic_open)(struct inode *, struct dentry *, struct file *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  438) 				   unsigned open_flag, umode_t create_mode);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  439) 		int (*tmpfile) (struct inode *, struct dentry *, umode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  440) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  441) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  442) Again, all methods are called without any locks being held, unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  443) otherwise noted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  444) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  445) ``create``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  446) 	called by the open(2) and creat(2) system calls.  Only required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  447) 	if you want to support regular files.  The dentry you get should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  448) 	not have an inode (i.e. it should be a negative dentry).  Here
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  449) 	you will probably call d_instantiate() with the dentry and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  450) 	newly created inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  451) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  452) ``lookup``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  453) 	called when the VFS needs to look up an inode in a parent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  454) 	directory.  The name to look for is found in the dentry.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  455) 	method must call d_add() to insert the found inode into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  456) 	dentry.  The "i_count" field in the inode structure should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  457) 	incremented.  If the named inode does not exist a NULL inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  458) 	should be inserted into the dentry (this is called a negative
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  459) 	dentry).  Returning an error code from this routine must only be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  460) 	done on a real error, otherwise creating inodes with system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  461) 	calls like creat(2), mknod(2), mkdir(2) and so on will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  462) 	If you wish to overload the dentry methods then you should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  463) 	initialise the "d_dop" field in the dentry; this is a pointer to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  464) 	a struct "dentry_operations".  This method is called with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  465) 	directory inode semaphore held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  466) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  467) ``link``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  468) 	called by the link(2) system call.  Only required if you want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  469) 	support hard links.  You will probably need to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  470) 	d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  471) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  472) ``unlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  473) 	called by the unlink(2) system call.  Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  474) 	to support deleting inodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  475) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  476) ``symlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  477) 	called by the symlink(2) system call.  Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  478) 	to support symlinks.  You will probably need to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  479) 	d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  481) ``mkdir``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  482) 	called by the mkdir(2) system call.  Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  483) 	to support creating subdirectories.  You will probably need to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  484) 	call d_instantiate() just as you would in the create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  485) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  486) ``rmdir``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  487) 	called by the rmdir(2) system call.  Only required if you want
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  488) 	to support deleting subdirectories.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  489) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  490) ``mknod``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  491) 	called by the mknod(2) system call to create a device (char,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  492) 	block) inode or a named pipe (FIFO) or socket.  Only required if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  493) 	you want to support creating these types of inodes.  You will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  494) 	probably need to call d_instantiate() just as you would in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  495) 	create() method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  496) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  497) ``rename``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  498) 	called by the rename(2) system call to rename the object to have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  499) 	the parent and name given by the second inode and dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  500) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  501) 	The filesystem must return -EINVAL for any unsupported or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  502) 	unknown flags.  Currently the following flags are implemented:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  503) 	(1) RENAME_NOREPLACE: this flag indicates that if the target of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  504) 	the rename exists the rename should fail with -EEXIST instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  505) 	replacing the target.  The VFS already checks for existence, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  506) 	for local filesystems the RENAME_NOREPLACE implementation is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  507) 	equivalent to plain rename.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  508) 	(2) RENAME_EXCHANGE: exchange source and target.  Both must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  509) 	exist; this is checked by the VFS.  Unlike plain rename, source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  510) 	and target may be of different type.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  511) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  512) ``get_link``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  513) 	called by the VFS to follow a symbolic link to the inode it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  514) 	points to.  Only required if you want to support symbolic links.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  515) 	This method returns the symlink body to traverse (and possibly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  516) 	resets the current position with nd_jump_link()).  If the body
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  517) 	won't go away until the inode is gone, nothing else is needed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  518) 	if it needs to be otherwise pinned, arrange for its release by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  519) 	having get_link(..., ..., done) do set_delayed_call(done,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  520) 	destructor, argument).  In that case destructor(argument) will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  521) 	be called once VFS is done with the body you've returned.  May
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  522) 	be called in RCU mode; that is indicated by NULL dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  523) 	argument.  If request can't be handled without leaving RCU mode,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  524) 	have it return ERR_PTR(-ECHILD).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  526) 	If the filesystem stores the symlink target in ->i_link, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  527) 	VFS may use it directly without calling ->get_link(); however,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  528) 	->get_link() must still be provided.  ->i_link must not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  529) 	freed until after an RCU grace period.  Writing to ->i_link
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  530) 	post-iget() time requires a 'release' memory barrier.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  531) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  532) ``readlink``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  533) 	this is now just an override for use by readlink(2) for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  534) 	cases when ->get_link uses nd_jump_link() or object is not in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  535) 	fact a symlink.  Normally filesystems should only implement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  536) 	->get_link for symlinks and readlink(2) will automatically use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  537) 	that.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  538) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  539) ``permission``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  540) 	called by the VFS to check for access rights on a POSIX-like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  541) 	filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  542) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  543) 	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  544) 	rcu-walk mode, the filesystem must check the permission without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  545) 	blocking or storing to the inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  546) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  547) 	If a situation is encountered that rcu-walk cannot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  548) 	handle, return -ECHILD and it will be called again in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  549) 	ref-walk mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  550) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  551) ``setattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  552) 	called by the VFS to set attributes for a file.  This method is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  553) 	called by chmod(2) and related system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  555) ``getattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  556) 	called by the VFS to get attributes of a file.  This method is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  557) 	called by stat(2) and related system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  558) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  559) ``listxattr``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  560) 	called by the VFS to list all extended attributes for a given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  561) 	file.  This method is called by the listxattr(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  562) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  563) ``update_time``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  564) 	called by the VFS to update a specific time or the i_version of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  565) 	an inode.  If this is not defined the VFS will update the inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  566) 	itself and call mark_inode_dirty_sync.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  567) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  568) ``atomic_open``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  569) 	called on the last component of an open.  Using this optional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  570) 	method the filesystem can look up, possibly create and open the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  571) 	file in one atomic operation.  If it wants to leave actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  572) 	opening to the caller (e.g. if the file turned out to be a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  573) 	symlink, device, or just something filesystem won't do atomic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  574) 	open for), it may signal this by returning finish_no_open(file,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  575) 	dentry).  This method is only called if the last component is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  576) 	negative or needs lookup.  Cached positive dentries are still
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  577) 	handled by f_op->open().  If the file was created, FMODE_CREATED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  578) 	flag should be set in file->f_mode.  In case of O_EXCL the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  579) 	method must only succeed if the file didn't exist and hence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  580) 	FMODE_CREATED shall always be set on success.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  581) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  582) ``tmpfile``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  583) 	called at the end of O_TMPFILE open().  Optional, equivalent to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  584) 	atomically creating, opening and unlinking a file in given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  585) 	directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  586) 
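The flag rule stated for the ``rename`` method above (return -EINVAL for
any unsupported or unknown flag) can be sketched as a small check that a
filesystem might run first.  This is a userspace illustration only:
``demo_rename_check_flags()`` and the ``DEMO_`` constants are hypothetical
names; the two constants mirror the values of RENAME_NOREPLACE and
RENAME_EXCHANGE from the uapi header <linux/fs.h>.

.. code-block:: c

	#include <errno.h>

	/* Mirror the uapi values of RENAME_NOREPLACE and RENAME_EXCHANGE. */
	#define DEMO_RENAME_NOREPLACE	(1u << 0)
	#define DEMO_RENAME_EXCHANGE	(1u << 1)

	/*
	 * Hypothetical first step of a ->rename() method.
	 * @flags is what the caller asked for, @supported is the set of
	 * flags this filesystem implements.  Returns 0 if the request may
	 * proceed and -EINVAL if it contains an unsupported or unknown bit.
	 */
	int demo_rename_check_flags(unsigned int flags, unsigned int supported)
	{
		if (flags & ~supported)
			return -EINVAL;
		return 0;
	}

A filesystem that only implements RENAME_NOREPLACE would pass
``DEMO_RENAME_NOREPLACE`` as ``supported``; a request carrying
``DEMO_RENAME_EXCHANGE`` is then rejected with -EINVAL.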
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  587) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  588) The Address Space Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  589) ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  590) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  591) The address space object is used to group and manage pages in the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  592) cache.  It can be used to keep track of the pages in a file (or anything
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  593) else) and also track the mapping of sections of the file into process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  594) address spaces.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  595) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  596) There are a number of distinct yet related services that an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  597) address-space can provide.  These include communicating memory pressure,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  598) page lookup by address, and keeping track of pages tagged as Dirty or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  599) Writeback.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  600) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  601) The first can be used independently of the others.  The VM can try to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  602) either write dirty pages in order to clean them, or release clean pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  603) in order to reuse them.  To do this it can call the ->writepage method
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  604) on dirty pages, and ->releasepage on clean pages with PagePrivate set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  605) Clean pages without PagePrivate and with no external references will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  606) released without notice being given to the address_space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  607) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  608) To achieve this functionality, pages need to be placed on an LRU with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  609) lru_cache_add and mark_page_accessed needs to be called whenever the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  610) is used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  611) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  612) Pages are normally kept in a radix tree indexed by ->index.  This tree
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  613) maintains information about the PG_Dirty and PG_Writeback status of each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  614) page, so that pages with either of these flags can be found quickly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  615) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  616) The Dirty tag is primarily used by mpage_writepages - the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  617) ->writepages method.  It uses the tag to find dirty pages to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  618) ->writepage on.  If mpage_writepages is not used (i.e. the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  619) address_space provides its own ->writepages), the PAGECACHE_TAG_DIRTY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  620) tag is almost unused.  write_inode_now and sync_inode do use it (through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  621) __sync_single_inode) to check if ->writepages has been successful in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  622) writing out the whole address_space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  623) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  624) The Writeback tag is used by filemap*wait* and sync_page* functions, via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  625) filemap_fdatawait_range, to wait for all writeback to complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  626) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  627) An address_space handler may attach extra information to a page,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  628) typically using the 'private' field in the 'struct page'.  If such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  629) information is attached, the PG_Private flag should be set.  This will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  630) cause various VM routines to make extra calls into the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  631) handler to deal with that data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  632) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  633) An address space acts as an intermediary between storage and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  634) application.  Data is read into the address space a whole page at a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  635) time, and provided to the application either by copying of the page, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  636) by memory-mapping the page.  Data is written into the address space by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  637) the application, and then written back to storage, typically in whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  638) pages; however, the address_space has finer control of write sizes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  639) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  640) The read process essentially only requires 'readpage'.  The write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  641) process is more complicated and uses write_begin/write_end or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  642) set_page_dirty to write data into the address_space, and writepage and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  643) writepages to writeback data to storage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  644) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  645) Adding and removing pages to/from an address_space is protected by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  646) inode's i_mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  647) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  648) When data is written to a page, the PG_Dirty flag should be set.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  649) typically remains set until writepage asks for it to be written.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  650) should clear PG_Dirty and set PG_Writeback.  The data can actually be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  651) written at any point after PG_Dirty is clear.  Once it is known to be safe,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  652) PG_Writeback is cleared.
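
The flag life cycle in the paragraph above can be modelled with two
booleans.  The sketch below is a userspace illustration, not kernel code:
``struct demo_page`` and the ``demo_*()`` helpers are hypothetical, and the
real flags live in ``struct page`` and are changed with atomic helpers.

.. code-block:: c

	#include <stdbool.h>

	/* Hypothetical model of the two page flags discussed above. */
	struct demo_page {
		bool dirty;	/* stands in for PG_Dirty */
		bool writeback;	/* stands in for PG_Writeback */
	};

	/* Data was written into the page: it now differs from storage. */
	void demo_write_to_page(struct demo_page *p)
	{
		p->dirty = true;
	}

	/*
	 * Writeback is requested for the page: clear the dirty flag and
	 * set the writeback flag.  Returns false if the page was clean
	 * and there is nothing to write.
	 */
	bool demo_start_writeback(struct demo_page *p)
	{
		if (!p->dirty)
			return false;
		p->dirty = false;
		p->writeback = true;
		return true;
	}

	/* The data is known to be safe on storage. */
	void demo_end_writeback(struct demo_page *p)
	{
		p->writeback = false;
	}

In this model a page that is written to again while writeback is still in
flight has both flags set, and it stays dirty after the first writeback
completes.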
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  653) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  654) Writeback makes use of a writeback_control structure to direct the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  655) operations.  This gives the writepage and writepages operations some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  656) information about the nature of and reason for the writeback request,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  657) and the constraints under which it is being done.  It is also used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  658) return information back to the caller about the result of a writepage or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  659) writepages request.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  660) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  661) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  662) Handling errors during writeback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  663) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  664) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  665) Most applications that do buffered I/O will periodically call a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  666) synchronization call (fsync, fdatasync, msync or sync_file_range) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  667) ensure that data written has made it to the backing store.  When there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  668) is an error during writeback, they expect that error to be reported when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  669) a file sync request is made.  After an error has been reported on one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  670) request, subsequent requests on the same file descriptor should return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  671) 0, unless further writeback errors have occurred since the previous file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  672) synchronization.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  673) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  674) Ideally, the kernel would report errors only on file descriptions on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  675) which writes were done that subsequently failed to be written back.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  676) generic pagecache infrastructure does not track the file descriptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  677) that have dirtied each individual page however, so determining which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  678) file descriptors should get back an error is not possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  679) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  680) Instead, the generic writeback error tracking infrastructure in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  681) kernel settles for reporting errors to fsync on all file descriptions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  682) that were open at the time that the error occurred.  In a situation with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  683) multiple writers, all of them will get back an error on a subsequent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  684) fsync, even if all of the writes done through that particular file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  685) descriptor succeeded (or even if there were no writes on that file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  686) descriptor at all).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  687) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  688) Filesystems that wish to use this infrastructure should call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  689) mapping_set_error to record the error in the address_space when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  690) occurs.  Then, after writing back data from the pagecache in their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  691) file->fsync operation, they should call file_check_and_advance_wb_err to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  692) ensure that the struct file's error cursor has advanced to the correct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  693) point in the stream of errors emitted by the backing device(s).
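
The "error cursor" idea can be illustrated with a deliberately simplified
userspace model.  Everything below is hypothetical (``struct demo_mapping``,
``struct demo_file`` and the ``demo_*()`` functions); the kernel itself uses
the more elaborate errseq_t type, whose details are not reproduced here.

.. code-block:: c

	#include <errno.h>

	/* Hypothetical per-address_space error record. */
	struct demo_mapping {
		unsigned int wb_seq;	/* bumped for every recorded error */
		int wb_err;		/* most recent error, negative errno */
	};

	/* Hypothetical per-open-file state holding the error cursor. */
	struct demo_file {
		struct demo_mapping *mapping;
		unsigned int seen_seq;	/* value of wb_seq last observed */
	};

	/* Sample the current position in the error stream at open time. */
	void demo_open(struct demo_file *f, struct demo_mapping *m)
	{
		f->mapping = m;
		f->seen_seq = m->wb_seq;
	}

	/* Plays the role of mapping_set_error(): record a failure. */
	void demo_mapping_set_error(struct demo_mapping *m, int err)
	{
		m->wb_err = err;
		m->wb_seq++;
	}

	/*
	 * Plays the role of file_check_and_advance_wb_err(): report an
	 * error this file has not seen yet, then advance its cursor so
	 * the same error is not reported twice.
	 */
	int demo_check_and_advance(struct demo_file *f)
	{
		if (f->seen_seq == f->mapping->wb_seq)
			return 0;
		f->seen_seq = f->mapping->wb_seq;
		return f->mapping->wb_err;
	}

With two files open on the same mapping, a single recorded -EIO is
returned once to each of them; a second check on the same file returns 0
until a further error is recorded.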
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  694) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  695) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  696) struct address_space_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  697) -------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  698) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  699) This describes how the VFS can manipulate mapping of a file to page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  700) cache in your filesystem.  The following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  701) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  702) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  703) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  704) 	struct address_space_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  705) 		int (*writepage)(struct page *page, struct writeback_control *wbc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  706) 		int (*readpage)(struct file *, struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  707) 		int (*writepages)(struct address_space *, struct writeback_control *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  708) 		int (*set_page_dirty)(struct page *page);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  709) 		void (*readahead)(struct readahead_control *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  710) 		int (*readpages)(struct file *filp, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  711) 				 struct list_head *pages, unsigned nr_pages);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  712) 		int (*write_begin)(struct file *, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  713) 				   loff_t pos, unsigned len, unsigned flags,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  714) 				struct page **pagep, void **fsdata);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  715) 		int (*write_end)(struct file *, struct address_space *mapping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  716) 				 loff_t pos, unsigned len, unsigned copied,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  717) 				 struct page *page, void *fsdata);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  718) 		sector_t (*bmap)(struct address_space *, sector_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  719) 		void (*invalidatepage) (struct page *, unsigned int, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  720) 		int (*releasepage) (struct page *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  721) 		void (*freepage)(struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  722) 		ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  723) 		/* isolate a page for migration */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  724) 		bool (*isolate_page) (struct page *, isolate_mode_t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  725) 		/* migrate the contents of a page to the specified target */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  726) 		int (*migratepage) (struct page *, struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  727) 		/* put migration-failed page back to right list */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  728) 		void (*putback_page) (struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  729) 		int (*launder_page) (struct page *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  730) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  731) 		int (*is_partially_uptodate) (struct page *, unsigned long,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  732) 					      unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  733) 		void (*is_dirty_writeback) (struct page *, bool *, bool *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  734) 		int (*error_remove_page) (struct address_space *mapping, struct page *page);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  735) 		int (*swap_activate)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  736) 		int (*swap_deactivate)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  737) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  738) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  739) ``writepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  740) 	called by the VM to write a dirty page to backing store.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  741) 	may happen for data integrity reasons (i.e. 'sync'), or to free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  742) 	up memory (flush).  The difference can be seen in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  743) 	wbc->sync_mode.  The PG_Dirty flag has been cleared and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  744) 	PageLocked is true.  writepage should start writeout, should set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  745) 	PG_Writeback, and should make sure the page is unlocked, either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  746) 	synchronously or asynchronously when the write operation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  747) 	completes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  748) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  749) 	If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  750) 	try too hard if there are problems, and may choose to write out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  751) 	other pages from the mapping if that is easier (e.g. due to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  752) 	internal dependencies).  If it chooses not to start writeout, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  753) 	should return AOP_WRITEPAGE_ACTIVATE so that the VM will not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  754) 	keep calling ->writepage on that page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  755) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  756) 	See Documentation/filesystems/locking.rst for more details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  757) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  758) ``readpage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  759) 	called by the VM to read a page from backing store.  The page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  760) 	will be Locked when readpage is called, and should be unlocked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  761) 	and marked uptodate once the read completes.  If ->readpage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  762) 	discovers that it needs to unlock the page for some reason, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  763) 	can do so, and then return AOP_TRUNCATED_PAGE.  In this case,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  764) 	the page will be looked up again, relocked and, if that all succeeds,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  765) 	->readpage will be called again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  766) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  767) ``writepages``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  768) 	called by the VM to write out pages associated with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  769) 	address_space object.  If wbc->sync_mode is WB_SYNC_ALL, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  770) 	the writeback_control will specify a range of pages that must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  771) 	written out.  If it is WB_SYNC_NONE, then a nr_to_write is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  772) 	given and that many pages should be written if possible.  If no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  773) 	->writepages is given, then generic_writepages is used instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  774) 	This will choose pages from the address space that are tagged as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  775) 	DIRTY and will pass them to ->writepage.
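
The difference between the two sync modes can be pictured with a small userspace model.  This is only a sketch, not kernel code: ``toy_mapping``, ``toy_wbc`` and ``toy_writepages`` are hypothetical stand-ins for the address_space, the writeback_control and the ->writepages method, and clearing a dirty bit stands in for calling ->writepage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

enum toy_sync_mode { TOY_SYNC_NONE, TOY_SYNC_ALL };

/* Hypothetical stand-in for the writeback_control fields used here. */
struct toy_wbc {
	enum toy_sync_mode sync_mode;
	long nr_to_write;       /* page budget, honoured for TOY_SYNC_NONE */
	size_t range_start;     /* first page index, inclusive */
	size_t range_end;       /* last page index, inclusive */
};

/* Hypothetical stand-in for an address_space: one dirty bit per page. */
struct toy_mapping {
	bool dirty[TOY_NPAGES];
};

/* Returns the number of pages "written" (their dirty bit cleared). */
static int toy_writepages(struct toy_mapping *m, struct toy_wbc *wbc)
{
	int written = 0;
	size_t i;

	assert(wbc->range_end < TOY_NPAGES);
	for (i = wbc->range_start; i <= wbc->range_end; i++) {
		if (!m->dirty[i])
			continue;
		/* Best-effort mode stops once the budget is used up. */
		if (wbc->sync_mode == TOY_SYNC_NONE && wbc->nr_to_write <= 0)
			break;
		m->dirty[i] = false;    /* stands in for ->writepage */
		wbc->nr_to_write--;
		written++;
	}
	return written;
}
```

In this model a TOY_SYNC_ALL request always cleans every dirty page in its range, while a TOY_SYNC_NONE request may leave dirty pages behind once nr_to_write is exhausted.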
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  776) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  777) ``set_page_dirty``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  778) 	called by the VM to set a page dirty.  This is particularly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  779) 	needed if an address space attaches private data to a page, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  780) 	that data needs to be updated when a page is dirtied.  This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  781) 	called, for example, when a memory mapped page gets modified.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  782) 	If defined, it should set the PageDirty flag, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  783) 	PAGECACHE_TAG_DIRTY tag in the radix tree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  784) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  785) ``readahead``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  786) 	Called by the VM to read pages associated with the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  787) 	object.  The pages are consecutive in the page cache and are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  788) 	locked.  The implementation should decrement the page refcount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  789) 	after starting I/O on each page.  Usually the page will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  790) 	unlocked by the I/O completion handler.  If the filesystem decides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  791) 	to stop attempting I/O before reaching the end of the readahead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  792) 	window, it can simply return.  The caller will decrement the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  793) 	refcount and unlock the remaining pages for you.  Set PageUptodate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  794) 	if the I/O completes successfully.  Setting PageError on any page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  795) 	will be ignored; simply unlock the page if an I/O error occurs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  796) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  797) ``readpages``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  798) 	called by the VM to read pages associated with the address_space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  799) 	object.  This is essentially just a vector version of readpage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  800) 	Instead of just one page, several pages are requested.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  801) 	readpages is only used for read-ahead, so read errors are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  802) 	ignored.  If anything goes wrong, feel free to give up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  803) 	This interface is deprecated and will be removed by the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  804) 	2020; implement readahead instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  805) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  806) ``write_begin``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  807) 	Called by the generic buffered write code to ask the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  808) 	to prepare to write len bytes at the given offset in the file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  809) 	The address_space should check that the write will be able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  810) 	complete, by allocating space if necessary and doing any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  811) 	internal housekeeping.  If the write will update parts of any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  812) 	basic-blocks on storage, then those blocks should be pre-read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  813) 	(if they haven't been read already) so that the updated blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  814) 	can be written out properly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  815) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  816) 	The filesystem must return the locked pagecache page for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  817) 	specified offset, in ``*pagep``, for the caller to write into.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  818) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  819) 	It must be able to cope with short writes (where the length
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  820) 	passed to write_begin is greater than the number of bytes copied
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  821) 	into the page).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  822) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  823) 	flags is a field for AOP_FLAG_xxx flags, described in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  824) 	include/linux/fs.h.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  825) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  826) 	A void * may be returned in fsdata, which then gets passed into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  827) 	write_end.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  828) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  829) 	Returns 0 on success; < 0 on failure (which is the error code),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  830) 	in which case write_end is not called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  831) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  832) ``write_end``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  833) 	After a successful write_begin, and data copy, write_end must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  834) 	called.  len is the original len passed to write_begin, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  835) 	copied is the amount that was able to be copied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  836) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  837) 	The filesystem must take care of unlocking the page and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  838) 	releasing its refcount, and updating i_size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  839) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  840) 	Returns < 0 on failure, otherwise the number of bytes (<=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  841) 	'copied') that were able to be copied into pagecache.
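
The write_begin/write_end contract, including the short-write case, can be sketched in userspace as follows.  Everything here is an assumption made for illustration: ``toy_file`` holds a single cached page, and the ``toy_*`` functions only mimic the calling order and return conventions described above; locking and refcounting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical file with exactly one cached page. */
struct toy_file {
	size_t i_size;                  /* file size in bytes */
	char page[TOY_PAGE_SIZE];       /* the cached page */
};

/* Prepare to write len bytes at pos; 0 on success, -1 on failure. */
static int toy_write_begin(struct toy_file *f, size_t pos, size_t len,
			   char **pagep)
{
	if (pos > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - pos)
		return -1;              /* write_end must not be called */
	*pagep = f->page;
	return 0;
}

/* Commit the bytes actually copied, which may be fewer than len. */
static long toy_write_end(struct toy_file *f, size_t pos, size_t len,
			  size_t copied)
{
	assert(copied <= len);
	if (pos + copied > f->i_size)
		f->i_size = pos + copied;
	return (long)copied;
}

/* Caller side: begin, copy (possibly short), end. */
static long toy_buffered_write(struct toy_file *f, size_t pos,
			       const char *src, size_t len, size_t can_copy)
{
	char *page;
	size_t copied = can_copy < len ? can_copy : len;

	if (toy_write_begin(f, pos, len, &page) < 0)
		return -1;
	memcpy(page + pos, src, copied);
	return toy_write_end(f, pos, len, copied);
}
```

The ``can_copy`` parameter simulates a copy that stops early; i_size then advances only by the bytes that really reached the page.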
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  842) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  843) ``bmap``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  844) 	called by the VFS to map a logical block offset within the object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  845) 	to a physical block number.  This method is used by the FIBMAP ioctl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  846) 	and for working with swap-files.  To be able to swap to a file,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  847) 	the file must have a stable mapping to a block device.  The swap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  848) 	system does not go through the filesystem but instead uses bmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  849) 	to find out where the blocks in the file are and uses those
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  850) 	addresses directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  851) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  852) ``invalidatepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  853) 	If a page has PagePrivate set, then invalidatepage will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  854) 	called when part or all of the page is to be removed from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  855) 	address space.  This generally corresponds to either a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  856) 	truncation, punch hole or a complete invalidation of the address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  857) 	space (in the latter case 'offset' will always be 0 and 'length'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  858) 	will be PAGE_SIZE).  Any private data associated with the page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  859) 	should be updated to reflect this truncation.  If offset is 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  860) 	and length is PAGE_SIZE, then the private data should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  861) 	released, because the page must be able to be completely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  862) 	discarded.  This may be done by calling the ->releasepage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  863) 	function, but in this case the release MUST succeed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  864) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  865) ``releasepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  866) 	releasepage is called on PagePrivate pages to indicate that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  867) 	page should be freed if possible.  ->releasepage should remove
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  868) 	any private data from the page and clear the PagePrivate flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  869) 	If releasepage() fails for some reason, it must indicate failure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  870) 	with a 0 return value.  releasepage() is used in two distinct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  871) 	though related cases.  The first is when the VM finds a clean
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  872) 	page with no active users and wants to make it a free page.  If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  873) 	->releasepage succeeds, the page will be removed from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  874) 	address_space and become free.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  875) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  876) 	The second case is when a request has been made to invalidate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  877) 	some or all pages in an address_space.  This can happen through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  878) 	the fadvise(POSIX_FADV_DONTNEED) system call or by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  879) 	filesystem explicitly requesting it as nfs and 9fs do (when they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  880) 	believe the cache may be out of date with storage) by calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  881) 	invalidate_inode_pages2().  If the filesystem makes such a call,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  882) 	and needs to be certain that all pages are invalidated, then its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  883) 	releasepage will need to ensure this.  Possibly it can clear the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  884) 	PageUptodate bit if it cannot free private data yet.
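
From userspace, the invalidation request mentioned above is issued with posix_fadvise(2).  The following sketch assumes a writable /tmp; the function name ``toy_drop_cached_pages`` and the scratch file are hypothetical.  The call is only advice, so a return value of 0 means the request was accepted, not that every page was actually dropped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write some data to a scratch file, flush it so the pages are clean,
 * then ask the kernel to drop them from the page cache.
 * Returns 0 on success, -1 on any failure.
 */
static int toy_drop_cached_pages(void)
{
	char path[] = "/tmp/vfs-fadvise-XXXXXX";
	const char buf[] = "cached data";
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
	    fsync(fd) == 0 &&
	    /* offset 0 with len 0 covers the file up to its end */
	    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0)
		ret = 0;
	close(fd);
	return ret;
}
```

Note that posix_fadvise() returns an error number directly instead of setting errno, which is why the result is compared against 0.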
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  885) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  886) ``freepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  887) 	freepage is called once the page is no longer visible in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  888) 	page cache in order to allow the cleanup of any private data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  889) 	Since it may be called by the memory reclaimer, it should not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  890) 	assume that the original address_space mapping still exists, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  891) 	it should not block.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  892) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  893) ``direct_IO``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  894) 	called by the generic read/write routines to perform direct_IO -
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  895) 	that is IO requests which bypass the page cache and transfer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  896) 	data directly between the storage and the application's address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  897) 	space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  898) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  899) ``isolate_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  900) 	Called by the VM when isolating a movable non-lru page.  If page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  901) 	is successfully isolated, VM marks the page as PG_isolated via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  902) 	__SetPageIsolated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  903) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  904) ``migratepage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  905) 	This is used to compact the physical memory usage.  If the VM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  906) 	wants to relocate a page (maybe off a memory card that is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  907) 	signalling imminent failure) it will pass a new page and an old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  908) 	page to this function.  migratepage should transfer any private
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  909) 	data across and update any references that it has to the page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  910) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  911) ``putback_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  912) 	Called by the VM when an isolated page's migration fails.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  913) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  914) ``launder_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  915) 	Called before freeing a page - it writes back the dirty page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  916) 	To prevent redirtying the page, it is kept locked during the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  917) 	whole operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  918) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  919) ``is_partially_uptodate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  920) 	Called by the VM when reading a file through the pagecache when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  921) 	the underlying blocksize != pagesize.  If the required block is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  922) 	up to date then the read can complete without needing the IO to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  923) 	bring the whole page up to date.
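
A minimal model of the check, assuming a 4096-byte page made of 1024-byte blocks: ``toy_page`` and ``toy_is_partially_uptodate`` are hypothetical names, and the per-block flag array stands in for whatever private state a real filesystem keeps.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE  4096u
#define TOY_BLOCK_SIZE 1024u
#define TOY_BLOCKS     (TOY_PAGE_SIZE / TOY_BLOCK_SIZE)

/* Hypothetical per-page state: one up-to-date flag per block. */
struct toy_page {
	bool block_uptodate[TOY_BLOCKS];
};

/* True if every block overlapping [from, from + count) is up to date. */
static bool toy_is_partially_uptodate(const struct toy_page *p,
				      unsigned long from, unsigned long count)
{
	unsigned long first, last, i;

	/* Reject empty ranges and ranges that leave the page. */
	if (count == 0 || from >= TOY_PAGE_SIZE ||
	    count > TOY_PAGE_SIZE - from)
		return false;
	first = from / TOY_BLOCK_SIZE;
	last = (from + count - 1) / TOY_BLOCK_SIZE;
	for (i = first; i <= last; i++)
		if (!p->block_uptodate[i])
			return false;
	return true;
}
```

With blocks 0, 1 and 3 up to date, a read of bytes 0..2047 can be served immediately, while any read that touches block 2 still needs I/O.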
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  924) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  925) ``is_dirty_writeback``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  926) 	Called by the VM when attempting to reclaim a page.  The VM uses
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  927) 	dirty and writeback information to determine if it needs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  928) 	stall to allow flushers a chance to complete some IO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  929) 	Ordinarily it can use PageDirty and PageWriteback but some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  930) 	filesystems have more complex state (unstable pages in NFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  931) 	prevent reclaim) or do not set those flags due to locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  932) 	problems.  This callback allows a filesystem to indicate to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  933) 	VM if a page should be treated as dirty or writeback for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  934) 	purposes of stalling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  935) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  936) ``error_remove_page``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  937) 	normally set to generic_error_remove_page if truncation is ok
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  938) 	for this address space.  Used for memory failure handling.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  939) 	Setting this implies you deal with pages going away under you,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  940) 	unless you have them locked or reference counts increased.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  941) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  942) ``swap_activate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  943) 	Called when swapon is used on a file to allocate space if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  944) 	necessary and pin the block lookup information in memory.  A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  945) 	return value of zero indicates success, in which case this file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  946) 	can be used to back swapspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  947) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  948) ``swap_deactivate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  949) 	Called during swapoff on files where swap_activate was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  950) 	successful.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  951) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  952) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  953) The File Object
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  954) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  955) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  956) A file object represents a file opened by a process.  This is also known
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  957) as an "open file description" in POSIX parlance.
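
The distinction between a file descriptor and an open file description is visible from userspace: descriptors created with dup(2) share one description, and therefore one file position, while a second open(2) of the same path gets its own.  The sketch below assumes a writable /tmp; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 0 if dup() shares the file position and a second open()
 * does not, -1 otherwise.
 */
static int toy_open_file_description_demo(void)
{
	char path[] = "/tmp/vfs-ofd-XXXXXX";
	int fd = mkstemp(path);
	int fd_dup, fd_new, ret = -1;

	if (fd < 0)
		return -1;
	fd_dup = dup(fd);               /* same open file description */
	fd_new = open(path, O_RDONLY);  /* a new open file description */
	unlink(path);
	if (fd_dup >= 0 && fd_new >= 0 &&
	    write(fd, "abcdef", 6) == 6 &&
	    lseek(fd_dup, 0, SEEK_CUR) == 6 &&  /* moved together with fd */
	    lseek(fd_new, 0, SEEK_CUR) == 0)    /* independent position */
		ret = 0;
	if (fd_dup >= 0)
		close(fd_dup);
	if (fd_new >= 0)
		close(fd_new);
	close(fd);
	return ret;
}
```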
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  958) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  959) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  960) struct file_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  961) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  962) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  963) This describes how the VFS can manipulate an open file.  As of kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  964) 4.18, the following members are defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  965) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  966) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  967) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  968) 	struct file_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  969) 		struct module *owner;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  970) 		loff_t (*llseek) (struct file *, loff_t, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  971) 		ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  972) 		ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  973) 		ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  974) 		ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  975) 		int (*iopoll)(struct kiocb *kiocb, bool spin);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  976) 		int (*iterate) (struct file *, struct dir_context *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  977) 		int (*iterate_shared) (struct file *, struct dir_context *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  978) 		__poll_t (*poll) (struct file *, struct poll_table_struct *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  979) 		long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  980) 		long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  981) 		int (*mmap) (struct file *, struct vm_area_struct *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  982) 		int (*open) (struct inode *, struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  983) 		int (*flush) (struct file *, fl_owner_t id);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  984) 		int (*release) (struct inode *, struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  985) 		int (*fsync) (struct file *, loff_t, loff_t, int datasync);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  986) 		int (*fasync) (int, struct file *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  987) 		int (*lock) (struct file *, int, struct file_lock *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  988) 		ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  989) 		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  990) 		int (*check_flags)(int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  991) 		int (*flock) (struct file *, int, struct file_lock *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  992) 		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  993) 		ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  994) 		int (*setlease)(struct file *, long, struct file_lock **, void **);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  995) 		long (*fallocate)(struct file *file, int mode, loff_t offset,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  996) 				  loff_t len);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  997) 		void (*show_fdinfo)(struct seq_file *m, struct file *f);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  998) 	#ifndef CONFIG_MMU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  999) 		unsigned (*mmap_capabilities)(struct file *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) 	#endif
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) 		ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002) 		loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) 					   struct file *file_out, loff_t pos_out,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) 					   loff_t len, unsigned int remap_flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) 		int (*fadvise)(struct file *, loff_t, loff_t, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) Again, all methods are called without any locks being held, unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) otherwise noted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) ``llseek``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) 	called when the VFS needs to move the file position index
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) ``read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) 	called by read(2) and related system calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) ``read_iter``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) 	possibly asynchronous read with iov_iter as destination
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) ``write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) 	called by write(2) and related system calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) ``write_iter``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) 	possibly asynchronous write with iov_iter as source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026) ``iopoll``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) 	called when aio wants to poll for completions on HIPRI iocbs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) ``iterate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) 	called when the VFS needs to read the directory contents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) ``iterate_shared``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) 	called when the VFS needs to read the directory contents when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) 	filesystem supports concurrent dir iterators
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) ``poll``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) 	called by the VFS when a process wants to check if there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) 	activity on this file and (optionally) go to sleep until there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) 	is activity.  Called by the select(2) and poll(2) system calls
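
Seen from userspace, this is the path taken by a plain poll(2) call.  The sketch below uses a pipe so that it needs no files; ``toy_wait_readable`` and ``toy_poll_pipe_demo`` are hypothetical helper names.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable within timeout_ms, 0 on timeout, -1 on error. */
static int toy_wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* A pipe is idle at first and becomes readable after a write. */
static int toy_poll_pipe_demo(void)
{
	int fds[2];
	int before, after;

	if (pipe(fds) < 0)
		return -1;
	before = toy_wait_readable(fds[0], 0);  /* no data yet */
	if (write(fds[1], "x", 1) != 1)
		before = -1;
	after = toy_wait_readable(fds[0], 1000);
	close(fds[0]);
	close(fds[1]);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

A timeout of 0 only checks for activity, while a positive timeout lets the caller sleep until there is some.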
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) ``unlocked_ioctl``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) 	called by the ioctl(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) ``compat_ioctl``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) 	called by the ioctl(2) system call when 32 bit system calls are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) 	used on 64 bit kernels.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) ``mmap``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) 	called by the mmap(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) ``open``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) 	called by the VFS when an inode should be opened.  When the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) 	opens a file, it creates a new "struct file".  It then calls the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) 	open method for the newly allocated file structure.  You might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) 	think that the open method really belongs in "struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) 	inode_operations", and you may be right.  I think it's done the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) 	way it is because it makes filesystems simpler to implement.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) 	The open() method is a good place to initialize the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) 	"private_data" member in the file structure if you want to point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) 	to a device structure
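
The private_data idiom can be modelled in plain C.  The types below are hypothetical stand-ins, not the kernel's "struct inode" and "struct file"; the point is only that open() records the device once, and every later method recovers it from the file alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device structure and a struct file. */
struct toy_device {
	int id;
	int open_count;
};

struct toy_file {
	void *private_data;
};

/* "open": remember which device this open file refers to. */
static int toy_open(struct toy_device *dev, struct toy_file *filp)
{
	if (dev == NULL)
		return -1;
	filp->private_data = dev;
	dev->open_count++;
	return 0;
}

/* Later methods recover the device from the file alone. */
static int toy_device_id(const struct toy_file *filp)
{
	const struct toy_device *dev = filp->private_data;

	return dev->id;
}

/* "release": undo what open did once the last reference is gone. */
static void toy_release(struct toy_file *filp)
{
	struct toy_device *dev = filp->private_data;

	dev->open_count--;
	filp->private_data = NULL;
}
```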
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) ``flush``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) 	called by the close(2) system call to flush a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) ``release``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) 	called when the last reference to an open file is closed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) ``fsync``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) 	called by the fsync(2) system call.  Also see the section above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) 	entitled "Handling errors during writeback".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) ``fasync``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) 	called by the fcntl(2) system call when asynchronous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) 	(non-blocking) mode is enabled for a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) ``lock``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) 	called by the fcntl(2) system call for F_GETLK, F_SETLK, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) 	F_SETLKW commands
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) ``get_unmapped_area``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081) 	called by the mmap(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) ``check_flags``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) 	called by the fcntl(2) system call for F_SETFL command
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) ``flock``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) 	called by the flock(2) system call
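
A userspace sketch of flock(2), assuming a writable /tmp on a local Linux filesystem; the function name is hypothetical.  flock locks belong to the open file description, so two separate open() calls on the same file conflict with each other even inside one process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 0 if an exclusive lock held through one open file
 * description blocks a second one until it is released.
 */
static int toy_flock_demo(void)
{
	char path[] = "/tmp/vfs-flock-XXXXXX";
	int fd1 = mkstemp(path);
	int fd2, ret = -1;

	if (fd1 < 0)
		return -1;
	fd2 = open(path, O_RDWR);       /* second open file description */
	unlink(path);
	if (fd2 < 0) {
		close(fd1);
		return -1;
	}
	if (flock(fd1, LOCK_EX) == 0 &&
	    /* non-blocking attempt must fail while fd1 holds the lock */
	    flock(fd2, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK &&
	    flock(fd1, LOCK_UN) == 0 &&
	    flock(fd2, LOCK_EX | LOCK_NB) == 0)
		ret = 0;
	close(fd1);
	close(fd2);
	return ret;
}
```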
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) ``splice_write``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) 	called by the VFS to splice data from a pipe to a file.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) 	method is used by the splice(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) ``splice_read``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) 	called by the VFS to splice data from file to a pipe.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) 	method is used by the splice(2) system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) ``setlease``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) 	called by the VFS to set or release a file lock lease.  setlease
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) 	implementations should call generic_setlease to record or remove
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) 	the lease in the inode after setting it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) ``fallocate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) 	called by the VFS to preallocate blocks or punch a hole.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) ``copy_file_range``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) 	called by the copy_file_range(2) system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) ``remap_file_range``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) 	called by the ioctl(2) system call for FICLONERANGE and FICLONE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) 	and FIDEDUPERANGE commands to remap file ranges.  An
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) 	implementation should remap len bytes at pos_in of the source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) 	file into the dest file at pos_out.  Implementations must handle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) 	callers passing in len == 0; this means "remap to the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) 	source file".  The return value should be the number of bytes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) 	remapped, or the usual negative error code if errors occurred
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) 	before any bytes were remapped.  The remap_flags parameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117) 	accepts REMAP_FILE_* flags.  If REMAP_FILE_DEDUP is set then the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) 	implementation must only remap if the requested file ranges have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) 	identical contents.  If REMAP_FILE_CAN_SHORTEN is set, the caller is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) 	ok with the implementation shortening the request length to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) 	satisfy alignment or EOF requirements (or any other reason).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) ``fadvise``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) 	possibly called by the fadvise64() system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) Note that the file operations are implemented by the specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) filesystem in which the inode resides.  When opening a device node
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) (character or block special) most filesystems will call special
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) support routines in the VFS which will locate the required device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) driver information.  These support routines replace the filesystem file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) operations with those for the device driver, and then proceed to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) the new open() method for the file.  This is how opening a device file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) in the filesystem eventually ends up calling the device driver open()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) method.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) Directory Entry Cache (dcache)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) ==============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) struct dentry_operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) This describes how a filesystem can overload the standard dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145) operations.  Dentries and the dcache are the domain of the VFS and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) individual filesystem implementations.  Device drivers have no business
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) here.  These methods may be set to NULL, as they are either optional or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) the VFS uses a default.  As of kernel 2.6.22, the following members are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) defined:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) 	struct dentry_operations {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) 		int (*d_revalidate)(struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155) 		int (*d_weak_revalidate)(struct dentry *, unsigned int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156) 		int (*d_hash)(const struct dentry *, struct qstr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) 		int (*d_compare)(const struct dentry *,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158) 				 unsigned int, const char *, const struct qstr *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159) 		int (*d_delete)(const struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160) 		int (*d_init)(struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) 		void (*d_release)(struct dentry *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162) 		void (*d_iput)(struct dentry *, struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) 		char *(*d_dname)(struct dentry *, char *, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) 		struct vfsmount *(*d_automount)(struct path *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) 		int (*d_manage)(const struct path *, bool);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166) 		struct dentry *(*d_real)(struct dentry *, const struct inode *);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) ``d_revalidate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170) 	called when the VFS needs to revalidate a dentry.  This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171) 	called whenever a name look-up finds a dentry in the dcache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172) 	Most local filesystems leave this as NULL, because all their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) 	dentries in the dcache are valid.  Network filesystems are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) 	different since things can change on the server without the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175) 	client necessarily being aware of it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) 	This function should return a positive value if the dentry is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) 	still valid, and zero or a negative error code if it isn't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) 	d_revalidate may be called in rcu-walk mode (flags &
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) 	LOOKUP_RCU).  If in rcu-walk mode, the filesystem must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) 	revalidate the dentry without blocking or storing to the dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) 	d_parent and d_inode should not be used without care (because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) 	they can change and, in the d_inode case, even become NULL under
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) 	us).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) 	If a situation is encountered that rcu-walk cannot handle, return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) 	-ECHILD; d_revalidate will then be called again in ref-walk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189) 	mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) ``d_weak_revalidate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) 	called when the VFS needs to revalidate a "jumped" dentry.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) 	is called when a path-walk ends at a dentry that was not acquired
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) 	by doing a lookup in the parent directory.  This includes "/",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) 	"." and "..", as well as procfs-style symlinks and mountpoint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) 	traversal.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) 	In this case, we are less concerned with whether the dentry is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) 	still fully correct, but rather that the inode is still valid.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200) 	As with d_revalidate, most local filesystems will set this to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) 	NULL since their dcache entries are always valid.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) 	This function has the same return code semantics as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) 	d_revalidate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206) 	d_weak_revalidate is only called after leaving rcu-walk mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) ``d_hash``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) 	called when the VFS adds a dentry to the hash table.  The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210) 	dentry passed to d_hash is the parent directory that the name is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) 	to be hashed into.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213) 	Same locking and synchronisation rules as d_compare regarding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) 	what is safe to dereference etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216) ``d_compare``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) 	called to compare a dentry name with a given name.  The first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218) 	dentry is the parent of the dentry to be compared, the second is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) 	the child dentry.  len and name string are properties of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) 	dentry to be compared.  qstr is the name to compare it with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222) 	Must be constant and idempotent, and should not take locks if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223) 	possible, and should not store into the dentry.  Should not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) 	dereference pointers outside the dentry without lots of care
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) 	(e.g. d_parent, d_inode, d_name should not be used).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) 	However, our vfsmount is pinned, and RCU held, so the dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228) 	and inodes won't disappear, neither will our sb or filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) 	module.  ->d_sb may be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) 	It is a tricky calling convention because it needs to be called
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232) 	under "rcu-walk", i.e. without any locks or references on things.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) ``d_delete``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) 	called when the last reference to a dentry is dropped and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236) 	dcache is deciding whether or not to cache it.  Return 1 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) 	delete immediately, or 0 to cache the dentry.  Default is NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) 	which means to always cache a reachable dentry.  d_delete must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) 	be constant and idempotent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241) ``d_init``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242) 	called when a dentry is allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244) ``d_release``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) 	called when a dentry is really deallocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) ``d_iput``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) 	called when a dentry loses its inode (just prior to its being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249) 	deallocated).  The default when this is NULL is that the VFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) 	calls iput().  If you define this method, you must call iput()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) 	yourself
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253) ``d_dname``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) 	called when the pathname of a dentry should be generated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255) 	Useful for some pseudo filesystems (sockfs, pipefs, ...) to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) 	delay pathname generation.  (Instead of doing it when the dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) 	is created, it's done only when the path is needed.)  Real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) 	filesystems probably don't want to use it, because their dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) 	are present in the global dcache hash, so their hash should be an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) 	invariant.  As no lock is held, d_dname() should not try to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) 	modify the dentry itself, unless appropriate SMP safety is used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262) 	CAUTION: d_path() logic is quite tricky.  The correct way to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) 	return for example "Hello" is to put it at the end of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) 	buffer, and return a pointer to the first char.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265) 	dynamic_dname() helper function is provided to take care of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) 	this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) 	Example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) 	static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274) 		return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) 				dentry->d_inode->i_ino);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277) 
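The "put it at the end of the buffer" convention described for d_dname can be tried out in userspace.  The following is a minimal sketch, not kernel code: name_at_end() is a hypothetical helper, and returning NULL on overflow is an assumption of this example rather than the kernel's error convention.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only.  Format a name, place it at the END of
 * buf so that its terminating NUL lands on the last byte, and return a
 * pointer to the first character of the name.  Returns NULL if the
 * name (plus NUL) does not fit.
 */
static char *name_at_end(char *buf, int buflen, unsigned long ino)
{
	char tmp[64];
	int len;

	/* Format into a scratch buffer first to learn the length. */
	len = snprintf(tmp, sizeof(tmp), "pipe:[%lu]", ino);
	if (len < 0 || len >= (int)sizeof(tmp))
		return NULL;
	len++;			/* account for the trailing NUL */
	if (len > buflen)
		return NULL;

	/* Copy to the tail of buf and hand back the start of the name. */
	return memcpy(buf + buflen - len, tmp, len);
}
```

With a 32-byte buffer and inode number 42, the returned pointer refers to the string "pipe:[42]" stored in the last ten bytes of the buffer; the bytes before it are left untouched.
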
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278) ``d_automount``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279) 	called when an automount dentry is to be traversed (optional).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280) 	This should create a new VFS mount record and return the record
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) 	to the caller.  The caller is supplied with a path parameter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) 	giving the automount directory to describe the automount target
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283) 	and the parent VFS mount record to provide inheritable mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) 	parameters.  NULL should be returned if someone else managed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) 	make the automount first.  If the vfsmount creation failed, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286) 	an error code should be returned.  If -EISDIR is returned, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287) 	the directory will be treated as an ordinary directory and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) 	returned to pathwalk to continue walking.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) 	If a vfsmount is returned, the caller will attempt to mount it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) 	on the mountpoint and will remove the vfsmount from its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) 	expiration list in the case of failure.  The vfsmount should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) 	returned with 2 refs on it to prevent automatic expiration - the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294) 	caller will clean up the additional ref.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296) 	This function is only used if DCACHE_NEED_AUTOMOUNT is set on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297) 	the dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) 	set on the inode being added.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300) ``d_manage``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) 	called to allow the filesystem to manage the transition from a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) 	dentry (optional).  This allows autofs, for example, to hold up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) 	clients waiting to explore behind a 'mountpoint' while letting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304) 	the daemon go past and construct the subtree there.  0 should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) 	returned to let the calling process continue.  -EISDIR can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306) 	returned to tell pathwalk to use this directory as an ordinary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) 	directory and to ignore anything mounted on it and not to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308) 	the automount flag.  Any other error code will abort pathwalk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) 	completely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311) 	If the 'rcu_walk' parameter is true, then the caller is doing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) 	pathwalk in RCU-walk mode.  Sleeping is not permitted in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) 	mode, and the caller can be asked to leave it and call again by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314) 	returning -ECHILD.  -EISDIR may also be returned to tell
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) 	pathwalk to ignore d_automount or any mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317) 	This function is only used if DCACHE_MANAGE_TRANSIT is set on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) 	the dentry being transited from.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) ``d_real``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321) 	overlay/union type filesystems implement this method to return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) 	one of the underlying dentries hidden by the overlay.  It is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323) 	used in two different modes:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325) 	Called from file_dentry() it returns the real dentry matching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326) 	the inode argument.  The real dentry may be from a lower layer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) 	already copied up, but still referenced from the file.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) 	mode is selected with a non-NULL inode argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) 	With NULL inode the topmost real underlying dentry is returned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) Each dentry has a pointer to its parent dentry, as well as a hash list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) of child dentries.  Child dentries are basically like files in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) Directory Entry Cache API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) There are a number of functions defined which permit a filesystem to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341) manipulate dentries:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343) ``dget``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) 	open a new handle for an existing dentry (this just increments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) 	the usage count)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347) ``dput``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348) 	close a handle for a dentry (decrements the usage count).  If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349) 	the usage count drops to 0, and the dentry is still in its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) 	parent's hash, the "d_delete" method is called to check whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351) 	it should be cached.  If it should not be cached, or if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352) 	dentry is not hashed, it is deleted.  Otherwise cached dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) 	are put into an LRU list to be reclaimed on memory shortage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) ``d_drop``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) 	this unhashes a dentry from its parent's hash list.  A subsequent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) 	call to dput() will deallocate the dentry if its usage count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358) 	drops to 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360) ``d_delete``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) 	delete a dentry.  If there are no other open references to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362) 	dentry then the dentry is turned into a negative dentry (the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) 	d_iput() method is called).  If there are other references, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364) 	d_drop() is called instead
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366) ``d_add``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) 	add a dentry to its parent's hash list and then call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) 	d_instantiate()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) ``d_instantiate``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371) 	add a dentry to the alias hash list for the inode and update
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) 	the "d_inode" member.  The "i_count" member in the inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) 	structure should be set/incremented.  If the inode pointer is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) 	NULL, the dentry is called a "negative dentry".  This function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) 	is commonly called when an inode is created for an existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376) 	negative dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) ``d_lookup``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379) 	look up a dentry given its parent and path name component.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380) 	looks up the child of that given name from the dcache hash
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) 	table.  If it is found, the reference count is incremented and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) 	the dentry is returned.  The caller must use dput() to free the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) 	dentry when it finishes using it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384) 
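The interaction between dget, dput, d_drop and the d_delete method described above can be modelled in a few lines of userspace C.  This is a toy model under stated assumptions, not the kernel implementation: struct toy_dentry and every toy_* function are hypothetical, locking is ignored, and "freeing" is recorded in a flag so the outcome can be inspected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; all names here are hypothetical. */
struct toy_dentry {
	int count;		/* usage count */
	bool hashed;		/* still in the parent's hash? */
	bool on_lru;		/* cached, reclaimable on memory shortage */
	bool freed;		/* set instead of really freeing memory */
	int (*d_delete)(const struct toy_dentry *);	/* may be NULL */
};

/* Open a new handle: just bump the usage count. */
static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->count++;
	return d;
}

/* Unhash the dentry; the next final toy_dput() will delete it. */
static void toy_d_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

/* Close a handle; on the last one, either cache or delete. */
static void toy_dput(struct toy_dentry *d)
{
	if (--d->count > 0)
		return;
	/* Hashed and d_delete (if any) says "keep": park it on the LRU. */
	if (d->hashed && !(d->d_delete && d->d_delete(d))) {
		d->on_lru = true;
		return;
	}
	d->hashed = false;
	d->freed = true;
}

/* Example d_delete policy: never cache. */
static int toy_always_delete(const struct toy_dentry *d)
{
	(void)d;
	return 1;
}
```

A hashed dentry with a NULL d_delete ends up on the LRU after its last toy_dput(); the same dentry is deleted instead if toy_d_drop() was called first or if d_delete returns 1.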
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) Mount Options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) Parsing options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393) On mount and remount the filesystem is passed a string containing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) comma-separated list of mount options.  The options can have either of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) these forms::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397)   option
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398)   option=value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) The <linux/parser.h> header defines an API that helps parse these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1401) options.  There are plenty of examples on how to use it in existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1402) filesystems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1403) 
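To make the two option forms concrete, here is a minimal userspace sketch of splitting such a string.  It deliberately does not use the <linux/parser.h> API, which is only available inside the kernel; struct toy_opts, toy_parse_options() and the option names "quiet" and "uid" are assumptions made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set for an imaginary filesystem. */
struct toy_opts {
	int quiet;		/* "quiet"   (bare option form)  */
	unsigned int uid;	/* "uid=<n>" (option=value form) */
};

/*
 * Parse a comma-separated option string, modifying it in place.
 * Returns 0 on success, -1 on an unknown or empty option, a missing
 * or unexpected value, or trailing garbage after the number.
 */
static int toy_parse_options(char *data, struct toy_opts *o)
{
	char *p = data;

	while (p && *p) {
		char *next = strchr(p, ',');
		char *val, *end;

		if (next)
			*next++ = '\0';	/* terminate this item */
		val = strchr(p, '=');
		if (val)
			*val++ = '\0';	/* split name from value */

		if (!strcmp(p, "quiet") && !val) {
			o->quiet = 1;
		} else if (!strcmp(p, "uid") && val && *val) {
			o->uid = (unsigned int)strtoul(val, &end, 10);
			if (*end)
				return -1;
		} else {
			return -1;
		}
		p = next;
	}
	return 0;
}
```

Given the string "quiet,uid=1000" this sets quiet to 1 and uid to 1000; "bogus", "quiet=1" and "uid=12x" are all rejected.  Value validation is intentionally minimal here.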
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1404) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1405) Showing options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1406) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1407) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1408) If a filesystem accepts mount options, it must define show_options() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1409) show all the currently active options.  The rules are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1410) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1411)   - options which are not default, or whose values differ from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1412)     default, MUST be shown
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1413) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1414)   - options which are enabled by default, or which have their default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1415)     value, MAY be shown
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1416) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1417) Options used only internally between a mount helper and the kernel (such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1418) as file descriptors), or which only have an effect during mounting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1419) (such as ones controlling the creation of a journal) are exempt from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1420) above rules.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1421) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1422) The underlying reason for the above rules is to make sure that a mount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1423) can be accurately replicated (e.g. umounting and mounting again) based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1424) on the information found in /proc/mounts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1425) 
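The MUST rule above amounts to "print every option that differs from its default".  The following userspace sketch illustrates that rule only; it is not a real show_options() implementation.  struct toy_mount, TOY_DEFAULT_UID and toy_show_options() are hypothetical, and the leading comma before each option is an assumption of this example (chosen so the text can be appended after other options).

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_DEFAULT_UID 0u	/* hypothetical default value */

/* Hypothetical per-mount state for an imaginary filesystem. */
struct toy_mount {
	int quiet;		/* default: off */
	unsigned int uid;	/* default: TOY_DEFAULT_UID */
};

/*
 * Write all non-default options into buf (size must be at least 1),
 * each preceded by a comma.  Returns the number of characters that
 * were needed, like snprintf().
 */
static int toy_show_options(char *buf, size_t size,
			    const struct toy_mount *m)
{
	int n = 0;

	buf[0] = '\0';
	if (m->quiet)			/* differs from default: show */
		n += snprintf(buf + n, size - n, ",quiet");
	if (m->uid != TOY_DEFAULT_UID && (size_t)n < size)
		n += snprintf(buf + n, size - n, ",uid=%u", m->uid);
	return n;
}
```

A mount left at its defaults produces an empty string, while one with quiet enabled and uid 1000 produces ",quiet,uid=1000", which is enough to replicate the mount later.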
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1426) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1427) Resources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1428) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1430) (Note some of these resources are not up-to-date with the latest kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1431)  version.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1432) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1433) Creating Linux virtual filesystems. 2002
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1434)     <https://lwn.net/Articles/13325/>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1435) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1436) The Linux Virtual File-system Layer by Neil Brown. 1999
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1437)     <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1438) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1439) A tour of the Linux VFS by Michael K. Johnson. 1996
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1440)     <https://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1441) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1442) A small trail through the Linux kernel by Andries Brouwer. 2001
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1443)     <https://www.win.tue.nl/~aeb/linux/vfs/trail.html>