^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) :orphan:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) Making Filesystems Exportable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) All filesystem operations require a dentry (or two) as a starting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) point. Local applications have a reference-counted hold on suitable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) dentries via open file descriptors or cwd/root. However, remote
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) applications that access a filesystem via a remote filesystem protocol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) such as NFS may not be able to hold such a reference, and so need a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) different way to refer to a particular dentry. As the alternative
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) form of reference needs to be stable across renames, truncates, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) server-reboot (among other things, though these tend to be the most
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) problematic), there is no simple answer like 'filename'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) The mechanism discussed here allows each filesystem implementation to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) specify how to generate an opaque (outside of the filesystem) byte
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) string for any dentry, and how to find an appropriate dentry for any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) given opaque byte string.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) This byte string will be called a "filehandle fragment" as it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) corresponds to part of an NFS filehandle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) A filesystem which supports the mapping between filehandle fragments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) and dentries will be termed "exportable".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) Dcache Issues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) The dcache normally contains a proper prefix of any given filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) tree. This means that if any filesystem object is in the dcache, then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) all of the ancestors of that filesystem object are also in the dcache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) As normal access is by filename this prefix is created naturally and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) maintained easily (by each object maintaining a reference count on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) its parent).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) However when objects are included into the dcache by interpreting a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) filehandle fragment, there is no automatic creation of a path prefix
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) for the object. This leads to two related but distinct features of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) the dcache that are not needed for normal filesystem access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) 1. The dcache must sometimes contain objects that are not part of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) proper prefix, i.e. that are not connected to the root.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) 2. The dcache must be prepared for a newly found (via ->lookup) directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) to already have a (non-connected) dentry, and must be able to move
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) that dentry into place (based on the parent and name in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) ->lookup). This is particularly needed for directories as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) it is a dcache invariant that directories only have one dentry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) To implement these features, the dcache has:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) a. A dentry flag DCACHE_DISCONNECTED which is set on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) any dentry that might not be part of the proper prefix.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) This is set when anonymous dentries are created, and cleared when a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) dentry is noticed to be a child of a dentry which is in the proper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) prefix. If the refcount on a dentry with this flag set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) becomes zero, the dentry is immediately discarded, rather than being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) kept in the dcache. If a dentry that is not already in the dcache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) is repeatedly accessed by filehandle (as NFSD might do), a new dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) will be allocated for each access, and discarded at the end of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) the access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) Note that such a dentry can acquire children, name, ancestors, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) without losing DCACHE_DISCONNECTED - that flag is only cleared when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) the subtree is successfully reconnected to root. Until then, dentries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) in such a subtree are retained only as long as there are references;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) refcount reaching zero means immediate eviction, same as for unhashed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) dentries. That guarantees that we won't need to hunt them down upon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) umount.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) b. A primitive for creation of secondary roots - d_obtain_root(inode).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) per-superblock list (->s_roots), so they can be located at umount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) time for eviction purposes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) c. Helper routines to allocate anonymous dentries, and to help attach
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) loose directory dentries at lookup time. They are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) d_obtain_alias(inode) will return a dentry for the given inode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) If the inode already has a dentry, one of those is returned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) If it doesn't, a new anonymous (IS_ROOT and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) DCACHE_DISCONNECTED) dentry is allocated and attached.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) In the case of a directory, care is taken that only one dentry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) can ever be attached.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) either the passed-in dentry or a preexisting alias for the given inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) (such as an anonymous one created by d_obtain_alias), if appropriate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) It returns NULL when the passed-in dentry is used, following the calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) convention of ->lookup.
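
The interplay of the two helpers can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the toy_inode and
toy_dentry types and the toy_* functions are hypothetical stand-ins, not the
kernel's structures or the real helpers, and the model ignores locking,
reference counts, hashing, and inodes with more than one alias. The
disconnected flag is deliberately left untouched by the splice step, since
the flag is cleared only once the subtree is known to be connected to
root::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical userspace stand-ins; not the kernel structures. */
    struct toy_dentry;

    struct toy_inode {
        struct toy_dentry *alias;   /* at most one alias in this model */
    };

    struct toy_dentry {
        struct toy_inode *inode;
        struct toy_dentry *parent;  /* points to itself when IS_ROOT */
        const char *name;
        bool disconnected;          /* models DCACHE_DISCONNECTED */
    };

    /* Models d_obtain_alias(): reuse an alias or make an anonymous one. */
    static struct toy_dentry *toy_obtain_alias(struct toy_inode *inode)
    {
        struct toy_dentry *d;

        assert(inode != NULL);
        if (inode->alias)
            return inode->alias;
        d = calloc(1, sizeof(*d));
        if (!d)
            return NULL;
        d->inode = inode;
        d->parent = d;              /* anonymous: it is its own root */
        d->name = "";
        d->disconnected = true;
        inode->alias = d;
        return d;
    }

    /*
     * Models d_splice_alias() at the end of ->lookup. "dentry" is the new
     * dentry carrying the parent and name being looked up. Returns NULL
     * when "dentry" itself is used, otherwise the preexisting alias.
     */
    static struct toy_dentry *toy_splice_alias(struct toy_inode *inode,
                                               struct toy_dentry *dentry)
    {
        struct toy_dentry *alias;

        if (!inode)
            return NULL;            /* negative lookup: keep "dentry" */
        alias = inode->alias;
        if (alias) {
            /* Move the loose dentry into place under parent and name. */
            alias->parent = dentry->parent;
            alias->name = dentry->name;
            return alias;
        }
        dentry->inode = inode;
        inode->alias = dentry;
        return NULL;
    }

In this model, a filehandle lookup would call toy_obtain_alias() first; a
later name-based lookup that finds the same inode gets the anonymous dentry
back from toy_splice_alias(), now attached under the looked-up parent and
name, rather than a second dentry for the same directory.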
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) Filesystem Issues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) For a filesystem to be exportable it must:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 1. provide the filehandle fragment routines described below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 2. make sure that d_splice_alias is used rather than d_add
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) when ->lookup finds an inode for a given parent and name.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) If inode is NULL, d_splice_alias(inode, dentry) is equivalent to::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) d_add(dentry, inode), NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) Typically the ->lookup routine will simply end with a::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) return d_splice_alias(inode, dentry);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) A filesystem implementation declares that instances of the filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) are exportable by setting the s_export_op field in the struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) super_block. This field must point to a "struct export_operations",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) which has the following members:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) encode_fh (optional)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) Takes a dentry and creates a filehandle fragment which can later be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) to find or create a dentry for the same object. The default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) implementation creates a filehandle fragment that encodes a 32-bit inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) number and generation number for the object and, if necessary, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) same information for the parent.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) fh_to_dentry (mandatory)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) Given a filehandle fragment, this should find the implied object and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) create a dentry for it (possibly with d_obtain_alias).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) fh_to_parent (optional but strongly recommended)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) Given a filehandle fragment, this should find the parent of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) implied object and create a dentry for it (possibly with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) d_obtain_alias). May fail if the filehandle fragment is too small.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) get_parent (optional but strongly recommended)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) When given a dentry for a directory, this should return a dentry for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) the parent. Quite possibly the parent dentry will have been allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) by d_obtain_alias. The default get_parent function just returns an error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) so any filehandle lookup that requires finding a parent will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) ->lookup("..") is *not* used as a default as it can leave ".." entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) in the dcache which are too messy to work with.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) get_name (optional)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) When given a parent dentry and a child dentry, this should find a name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) in the directory identified by the parent dentry, which leads to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) object identified by the child dentry. If no get_name function is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) supplied, a default implementation is provided which uses iterate_dir
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) to find potential names, and matches inode numbers to find the correct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) match.
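
The matching strategy of the default get_name can be sketched in userspace
as a scan over directory entries that compares inode numbers. The toy_dirent
type and the toy_get_name function are hypothetical and exist only for
illustration; the real default reads the directory through the VFS rather
than walking an in-memory array::

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical model of a directory entry: name plus inode number. */
    struct toy_dirent {
        const char *name;
        unsigned long ino;
    };

    /*
     * Scan "entries" for one whose inode number equals "child_ino" and
     * copy its name into "buf". Returns 0 on success, or -1 if no entry
     * matches or the name (with its terminating nul) does not fit.
     */
    static int toy_get_name(const struct toy_dirent *entries, size_t count,
                            unsigned long child_ino,
                            char *buf, size_t buflen)
    {
        size_t i;

        assert(entries != NULL && buf != NULL);
        for (i = 0; i < count; i++) {
            size_t len;

            if (entries[i].ino != child_ino)
                continue;
            len = strlen(entries[i].name);
            if (len + 1 > buflen)
                return -1;
            memcpy(buf, entries[i].name, len + 1);
            return 0;
        }
        return -1;
    }

A linear scan like this costs time proportional to the directory size,
which is one reason a filesystem that can find the name more directly may
prefer to supply its own get_name.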
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) A filehandle fragment consists of an array of 1 or more 4-byte words,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) together with a one-byte "type".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) The decoding routines (fh_to_dentry and fh_to_parent) should not depend
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) on the stated size that is passed to them. This size may be larger than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) the original filehandle generated by encode_fh, in which case it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) have been padded with nuls. Rather, the encode_fh routine should choose
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) a "type" which indicates to the decoding routines how much of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) filehandle is valid, and how it should be interpreted.
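
A userspace sketch of this convention follows. It mirrors the description of
the default encoding given earlier (a 32-bit inode number and generation
number, optionally followed by the same for the parent), but the type values
TOY_FH_INO and TOY_FH_INO_PARENT, the toy_* names, and the word layout are
illustrative assumptions rather than the kernel's definitions. The decoder
takes the number of valid words from the type and treats the stated size
only as an upper bound::

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical type codes; each implies how many words are valid. */
    enum { TOY_FH_INO = 1, TOY_FH_INO_PARENT = 2 };

    struct toy_object {
        uint32_t ino, gen;
        uint32_t parent_ino, parent_gen;
        int has_parent;             /* filled in by the decoder */
    };

    /*
     * Encode "obj" into "fh", which has room for "max_words" 4-byte
     * words. Returns the type, or -1 if the buffer is too small.
     * *words receives the number of words actually written.
     */
    static int toy_encode_fh(const struct toy_object *obj, int want_parent,
                             uint32_t *fh, size_t max_words, size_t *words)
    {
        size_t need = want_parent ? 4 : 2;

        assert(obj != NULL && fh != NULL && words != NULL);
        if (max_words < need)
            return -1;
        fh[0] = obj->ino;
        fh[1] = obj->gen;
        if (want_parent) {
            fh[2] = obj->parent_ino;
            fh[3] = obj->parent_gen;
        }
        *words = need;
        return want_parent ? TOY_FH_INO_PARENT : TOY_FH_INO;
    }

    /*
     * Decode "fh". "stated_words" may exceed what was encoded because of
     * nul padding, so the type decides how many words are interpreted.
     * Returns 0 on success, -1 on an unknown type or a short fragment.
     */
    static int toy_decode_fh(const uint32_t *fh, size_t stated_words,
                             int type, struct toy_object *out)
    {
        size_t need;

        assert(fh != NULL && out != NULL);
        switch (type) {
        case TOY_FH_INO:
            need = 2;
            break;
        case TOY_FH_INO_PARENT:
            need = 4;
            break;
        default:
            return -1;
        }
        if (stated_words < need)
            return -1;
        out->ino = fh[0];
        out->gen = fh[1];
        out->has_parent = (type == TOY_FH_INO_PARENT);
        out->parent_ino = out->has_parent ? fh[2] : 0;
        out->parent_gen = out->has_parent ? fh[3] : 0;
        return 0;
    }

Because the decoder never infers the layout from the stated size, a two-word
fragment that arrives padded out to eight words decodes exactly as it would
unpadded.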