Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) ORANGEFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) for large storage problems faced by HPC, BigData, Streaming Video,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Genomics, and Bioinformatics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) Orangefs, originally called PVFS, was first developed in 1993 by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) Walt Ligon and Eric Blumer as a parallel file system for Parallel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) of parallel programs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) Orangefs features include:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18)   * Distributes file data among multiple file servers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19)   * Supports simultaneous access by multiple clients
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20)   * Stores file data and metadata on servers using local file system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21)     and access methods
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22)   * Userspace implementation is easy to install and maintain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23)   * Direct MPI support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24)   * Stateless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) Mailing List Archives
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) Mailing List Submissions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) devel@lists.orangefs.org
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Documentation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) http://www.orangefs.org/documentation/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) Running ORANGEFS On a Single Server
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) ===================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) OrangeFS is usually run in large installations with multiple servers and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) clients, but a complete filesystem can be run on a single machine for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) development and testing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) On Fedora, install orangefs and orangefs-server::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53)     dnf -y install orangefs orangefs-server
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) There is an example server configuration file in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) /etc/orangefs/orangefs.conf.  Change localhost to your hostname if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) necessary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) To generate a filesystem to run xfstests against, see below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) There is an example client configuration file in /etc/pvfs2tab.  It is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) single line.  Uncomment it and change the hostname if necessary.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) controls clients which use libpvfs2.  This does not control the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) pvfs2-client-core.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) Create the filesystem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)     pvfs2-server -f /etc/orangefs/orangefs.conf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) Start the server::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)     systemctl start orangefs-server
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) Test the server::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76)     pvfs2-ping -m /pvfsmnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) Start the client.  The module must be compiled in or loaded before this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) point::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81)     systemctl start orangefs-client
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) Mount the filesystem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85)     mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) Userspace Filesystem Source
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) http://www.orangefs.org/download
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) Orangefs versions prior to 2.9.3 are not compatible with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) upstream version of the kernel client.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) Building ORANGEFS on a Single Server
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) ====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) Where OrangeFS cannot be installed from distribution packages, it may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) built from source.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) You can omit --prefix if you don't care that things are sprinkled around
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) default; we will probably be changing the default to LMDB soon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)     ./configure --prefix=/opt/ofs --with-db-backend=lmdb --disable-usrint
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)     make
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)     make install
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Create an orangefs config file by running pvfs2-genconfig and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) specifying a target config file. Pvfs2-genconfig will prompt you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) for each setting. Generally it works fine to take the defaults, but you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) should use your server's hostname, rather than "localhost" when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) it comes to that question::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)     /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) Create an /etc/pvfs2tab file (localhost is fine)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)     echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 	/etc/pvfs2tab
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) Create the mount point you specified in the tab file if needed::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)     mkdir /pvfsmnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) Bootstrap the server::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)     /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) Start the server::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)     /opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Now the server should be running. Pvfs2-ls is a simple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) test to verify that the server is running::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)     /opt/ofs/bin/pvfs2-ls /pvfsmnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) If stuff seems to be working, load the kernel module and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) turn on the client core::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)     /opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) Mount your filesystem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)     mount -t pvfs2 tcp://`hostname`:3334/orangefs /pvfsmnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
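The device string given to mount above packs the transport, server host, port and filesystem name into one URL-like value. The following is a minimal Python sketch of how that string breaks down, using only the standard library. The helper name ``split_orangefs_device`` is made up for illustration and is not part of OrangeFS.

```python
from urllib.parse import urlsplit


def split_orangefs_device(device):
    """Split a device string such as tcp://localhost:3334/orangefs.

    Hypothetical helper for illustration only; not an OrangeFS API.
    """
    parts = urlsplit(device)
    if not parts.scheme or parts.hostname is None or parts.port is None:
        raise ValueError(f"not a transport://host:port/name string: {device!r}")
    return {
        "transport": parts.scheme,           # e.g. tcp
        "host": parts.hostname,              # e.g. localhost
        "port": parts.port,                  # e.g. 3334
        "fs_name": parts.path.lstrip("/"),   # e.g. orangefs
    }


info = split_orangefs_device("tcp://localhost:3334/orangefs")
print(info)
```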
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) Running xfstests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) It is useful to use a scratch filesystem with xfstests.  This can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) done with only one server.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) Make a second copy of the FileSystem section in the server configuration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) Change the ID to something other than the ID of the first FileSystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) section (2 is usually a good choice).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) Then there are two FileSystem sections: orangefs and scratch.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) This change should be made before creating the filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)     pvfs2-server -f /etc/orangefs/orangefs.conf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
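The copy-and-rename step described above can also be scripted. The sketch below is only an illustration: it assumes the FileSystem section is an Apache-style ``<FileSystem> ... </FileSystem>`` block that contains ``Name`` and ``ID`` lines, and the sample text is a trimmed stand-in rather than a real configuration file. Check the result against your own orangefs.conf before using it.

```python
import re


def add_scratch_section(conf_text, name="scratch", fs_id=2):
    """Append a copy of the first FileSystem section with a new Name and ID.

    Assumption: sections look like <FileSystem> ... </FileSystem> and hold
    "Name <value>" and "ID <value>" lines. Hypothetical helper, not a tool
    shipped with OrangeFS.
    """
    match = re.search(r"<FileSystem>.*?</FileSystem>", conf_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <FileSystem> section found")
    section = match.group(0)
    # Rewrite only the first Name line and the first ID line of the copy.
    section = re.sub(r"(?m)^([ \t]*Name[ \t]+)\S+",
                     lambda m: m.group(1) + name, section, count=1)
    section = re.sub(r"(?m)^([ \t]*ID[ \t]+)\S+",
                     lambda m: m.group(1) + str(fs_id), section, count=1)
    return conf_text.rstrip("\n") + "\n\n" + section + "\n"


# Trimmed, made-up sample; a real section has more settings than this.
sample = """<FileSystem>
    Name orangefs
    ID 1
</FileSystem>
"""
updated = add_scratch_section(sample)
print(updated)
```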
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) To run xfstests, create /etc/xfsqa.config::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)     TEST_DIR=/orangefs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)     TEST_DEV=tcp://localhost:3334/orangefs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)     SCRATCH_MNT=/scratch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)     SCRATCH_DEV=tcp://localhost:3334/scratch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) Then xfstests can be run::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)     ./check -pvfs2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
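The four settings in the xfstests configuration shown above all derive from one server address and two filesystem names, so the file can be generated rather than typed. This is a small sketch under that assumption; the function name ``xfsqa_config`` and its parameters are hypothetical, and the defaults simply repeat the values used in this document.

```python
def xfsqa_config(host="localhost", port=3334, test_fs="orangefs",
                 scratch_fs="scratch", test_dir="/orangefs",
                 scratch_mnt="/scratch"):
    """Return the text of an xfstests config for a single-server setup."""
    base = f"tcp://{host}:{port}"
    lines = [
        f"TEST_DIR={test_dir}",
        f"TEST_DEV={base}/{test_fs}",
        f"SCRATCH_MNT={scratch_mnt}",
        f"SCRATCH_DEV={base}/{scratch_fs}",
    ]
    return "\n".join(lines) + "\n"


# Print the text; redirect it to the config file as a privileged user.
print(xfsqa_config(), end="")
```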
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) Options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) =======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) The following mount options are accepted:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)   acl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)     Allow the use of Access Control Lists on files and directories.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)   intr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)     Some operations between the kernel client and the user space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)     filesystem can be interruptible, such as changes in debug levels
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)     and the setting of tunable parameters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)   local_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)     Enable POSIX locking from the perspective of "this" kernel. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)     default file_operations lock action is to return ENOSYS. POSIX
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)     locking kicks in if the filesystem is mounted with -o local_lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)     Distributed locking is being worked on for the future.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) Debugging
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) To send the debug (GOSSIP) statements from a particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) source file (inode.c, for example) to syslog::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)   echo inode > /sys/kernel/debug/orangefs/kernel-debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) No debugging (the default)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)   echo none > /sys/kernel/debug/orangefs/kernel-debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) Debugging from several source files::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)   echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) All debugging::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)   echo all > /sys/kernel/debug/orangefs/kernel-debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) Get a list of all debugging keywords::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)   cat /sys/kernel/debug/orangefs/debug-help
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
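The echo commands above all write a comma-separated keyword string to the same debugfs file. A script that toggles debugging could do the same thing as in the sketch below. The function names are hypothetical, only the keywords shown in this document are used, and writing to the real file needs root and a mounted debugfs.

```python
from pathlib import Path

KERNEL_DEBUG = Path("/sys/kernel/debug/orangefs/kernel-debug")


def debug_string(keywords):
    """Join keywords as the echo examples do: comma separated, no spaces.

    An empty list maps to "none", which the text gives as the default.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    return ",".join(cleaned) if cleaned else "none"


def set_kernel_debug(keywords, path=KERNEL_DEBUG):
    """Write the keyword string plus a newline, as echo would."""
    Path(path).write_text(debug_string(keywords) + "\n")


print(debug_string(["inode", "dir"]))
```

Calling ``set_kernel_debug(["inode", "dir"])`` as root would then have the same effect as the corresponding echo line.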
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) Protocol between Kernel Module and Userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) ============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) Orangefs is a user space filesystem and an associated kernel module.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) We'll just refer to the user space part of Orangefs as "userspace"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) from here on out. Orangefs descends from PVFS, and userspace code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) still uses PVFS for function and variable names. Userspace typedefs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) many of the important structures. Function and variable names in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) the kernel module have been transitioned to "orangefs", and The Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) Coding Style avoids typedefs, so kernel module structures that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) correspond to userspace structures are not typedefed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) The kernel module implements a pseudo device that userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) can read from and write to. Userspace can also manipulate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) kernel module through the pseudo device with ioctl.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) The Bufmap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) At startup userspace allocates two page-size-aligned (posix_memalign)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) mlocked memory buffers: one is used for IO and one is used for readdir
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) operations. The IO buffer is 41943040 bytes and the readdir buffer is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 4194304 bytes. Each buffer contains logical chunks, or partitions, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) a pointer to each buffer is added to its own PVFS_dev_map_desc structure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) which also describes its total size, as well as the size and number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) the partitions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) mapping routine in the kernel module with an ioctl. The structure is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) copied from user space to kernel space with copy_from_user and is used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) then contains:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)   * refcnt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)     - a reference counter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)   * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)     partition size, which represents the filesystem's block size and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)     is used for s_blocksize in super blocks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)   * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)     partitions in the IO buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)   * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)   * total_size - the total size of the IO buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)   * page_count - the number of 4096 byte pages in the IO buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)   * page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)     of kcalloced memory. This memory is used as an array of pointers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)     to each of the pages in the IO buffer through a call to get_user_pages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)   * desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)     bytes of kcalloced memory. This memory is further initialized:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)       user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)       structure. user_desc->ptr points to the IO buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)       ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 	pages_per_desc = bufmap->desc_size / PAGE_SIZE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 	offset = 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)         bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)         bufmap->desc_array[0].array_count = pages_per_desc = 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)         bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)         offset += 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)                            .
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)                            .
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)                            .
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)         bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)         bufmap->desc_array[9].array_count = pages_per_desc = 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)         bufmap->desc_array[9].uaddr = (user_desc->ptr) +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)                                                (9 * 1024 * 4096)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)         offset += 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)   * buffer_index_array - a desc_count sized array of ints, used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)     indicate which of the IO buffer's partitions are available to use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)   * buffer_index_lock - a spinlock to protect buffer_index_array during update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)   * readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)     int array used to indicate which of the readdir buffer's partitions are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)     available to use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)   * readdir_index_lock - a spinlock to protect readdir_index_array during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)     update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 
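The arithmetic behind the bufmap fields listed above can be checked with a short model. This is a Python sketch of the numbers only, not kernel code: the ``Desc`` class and ``layout`` function are made-up names, and the constants are the default values quoted in the text.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096      # page size used in the text
DESC_SIZE = 4194304   # PVFS2_BUFMAP_DEFAULT_DESC_SIZE
DESC_COUNT = 10       # PVFS2_BUFMAP_DEFAULT_DESC_COUNT


@dataclass
class Desc:
    page_offset: int   # index into page_array where this partition starts
    array_count: int   # number of pages in this partition
    uaddr_offset: int  # byte offset from user_desc->ptr


def layout(desc_size=DESC_SIZE, desc_count=DESC_COUNT, page_size=PAGE_SIZE):
    """Compute the derived bufmap fields for the IO buffer."""
    pages_per_desc = desc_size // page_size
    total_size = desc_size * desc_count
    return {
        "total_size": total_size,
        "page_count": total_size // page_size,
        # log2(desc_size); valid because desc_size is a power of two
        "desc_shift": desc_size.bit_length() - 1,
        "desc_array": [
            Desc(i * pages_per_desc, pages_per_desc,
                 i * pages_per_desc * page_size)
            for i in range(desc_count)
        ],
    }


bufmap = layout()
print(bufmap["total_size"], bufmap["page_count"], bufmap["desc_shift"])
print(bufmap["desc_array"][9])
```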
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) Operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) needs to communicate with userspace. Part of the op contains the "upcall"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) which expresses the request to userspace. Part of the op eventually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) contains the "downcall" which expresses the results of the request.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) The slab allocator is used to keep a cache of op structures handy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) At init time the kernel module defines and initializes a request list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) and an in_progress hash table to keep track of all the ops that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) in flight at any given time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) Ops are stateful:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)  * unknown
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 	    - op was just initialized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)  * waiting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) 	    - op is on request_list (upward bound)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)  * inprogr
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 	    - op is in progress (waiting for downcall)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)  * serviced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) 	    - op has matching downcall; ok
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)  * purged
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) 	    - op has to start a timer since client-core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)               exited uncleanly before servicing op
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336)  * given up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 	    - submitter has given up waiting for it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) When some arbitrary userspace program needs to perform a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) filesystem operation on Orangefs (readdir, I/O, create, whatever)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) an op structure is initialized and tagged with a distinguishing ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) number. The upcall part of the op is filled out, and the op is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) passed to the "service_operation" function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) Service_operation changes the op's state to "waiting", puts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) it on the request list, and signals the Orangefs file_operations.poll
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) function through a wait queue. Userspace is polling the pseudo-device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) and thus becomes aware of the upcall request that needs to be read.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) When the Orangefs file_operations.read function is triggered, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) request list is searched for an op that seems ready-to-process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) The op is removed from the request list. The tag from the op and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) the filled-out upcall struct are copy_to_user'ed back to userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) If any of these (and some additional protocol) copy_to_users fail,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) the op's state is set to "waiting" and the op is added back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) the request list. Otherwise, the op's state is changed to "in progress",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) and the op is hashed on its tag and put onto the end of a list in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) in_progress hash table at the index the tag hashed to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) When userspace has assembled the response to the upcall, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) writes the response, which includes the distinguishing tag, back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) the pseudo device in a series of io_vecs. This triggers the Orangefs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) file_operations.write_iter function to find the op with the associated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) tag and remove it from the in_progress hash table. As long as the op's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) state is not "canceled" or "given up", its state is set to "serviced".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) The file_operations.write_iter function returns to the waiting vfs,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) and back to service_operation through wait_for_matching_downcall.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) Service_operation returns to its caller with the op's downcall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) part (the response to the upcall) filled out.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 
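The successful path through the request list and the in_progress hash table described above can be modelled in a few lines. This is a toy Python sketch, not the kernel implementation. The class names, the modulo hash, the bucket count and the string payloads are all assumptions made for illustration, and the copy_to_user failure path, the wait queue and timeouts are left out.

```python
from collections import deque
from enum import Enum
from itertools import count


class OpState(Enum):
    UNKNOWN = "unknown"
    WAITING = "waiting"
    INPROGR = "inprogr"
    SERVICED = "serviced"
    PURGED = "purged"
    GIVEN_UP = "given up"


class Op:
    def __init__(self, tag, upcall):
        self.tag = tag            # distinguishing ID number
        self.upcall = upcall      # request to userspace
        self.downcall = None      # filled in by the reply
        self.state = OpState.UNKNOWN


class Device:
    """Toy model of the request list and in_progress hash table."""

    def __init__(self, buckets=8):
        self.tags = count(1)
        self.request_list = deque()
        self.in_progress = [[] for _ in range(buckets)]

    def submit(self, upcall):
        # service_operation: mark the op waiting and queue it.
        op = Op(next(self.tags), upcall)
        op.state = OpState.WAITING
        self.request_list.append(op)
        return op

    def read(self):
        # file_operations.read: hand the next op to userspace and
        # file it under its tag in the in_progress table.
        op = self.request_list.popleft()
        op.state = OpState.INPROGR
        self.in_progress[op.tag % len(self.in_progress)].append(op)
        return op.tag, op.upcall

    def write(self, tag, downcall):
        # file_operations.write_iter: match the reply to its op by tag.
        # An unknown tag raises StopIteration in this sketch.
        bucket = self.in_progress[tag % len(self.in_progress)]
        op = next(o for o in bucket if o.tag == tag)
        bucket.remove(op)
        if op.state is not OpState.GIVEN_UP:
            op.downcall = downcall
            op.state = OpState.SERVICED
        return op


dev = Device()
op = dev.submit("readdir")
tag, upcall = dev.read()              # userspace reads the upcall
dev.write(tag, "directory entries")   # userspace writes the downcall
print(op.state, op.downcall)
```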
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) The "client-core" is the bridge between the kernel module and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) userspace. The client-core is a daemon. The client-core has an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) associated watchdog daemon. If the client-core is ever signaled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) to die, the watchdog daemon restarts the client-core. Even though
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) the client-core is restarted "right away", there is a period of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) time during such an event that the client-core is dead. A dead client-core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) can't be triggered by the Orangefs file_operations.poll function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) Ops that pass through service_operation during a "dead spell" can time out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) on the wait queue and one attempt is made to recycle them. Obviously,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) if the client-core stays dead too long, the arbitrary userspace processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) trying to use Orangefs will be negatively affected. Waiting ops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) that can't be serviced will be removed from the request list and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) have their states set to "given up". In-progress ops that can't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) be serviced will be removed from the in_progress hash table and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) have their states set to "given up".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) Readdir and I/O ops are atypical with respect to their payloads.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)   - readdir ops use the smaller of the two pre-allocated pre-partitioned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)     memory buffers. The readdir buffer is only available to userspace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)     The kernel module obtains an index to a free partition before launching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)     a readdir op. Userspace deposits the results into the indexed partition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)     and then writes them back to the pvfs device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)   - io (read and write) ops use the larger of the two pre-allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398)     pre-partitioned memory buffers. The IO buffer is accessible from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399)     both userspace and the kernel module. The kernel module obtains an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400)     index to a free partition before launching an io op. The kernel module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401)     deposits write data into the indexed partition, to be consumed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402)     directly by userspace. Userspace deposits the results of read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403)     requests into the indexed partition, to be consumed directly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404)     by the kernel module.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) 
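Both buffers are pre-partitioned, and an index to a free partition is obtained
before an op is launched. A minimal sketch of that bookkeeping, assuming a
bitmap tracks busy partitions, might look like this. ``SLOT_COUNT``,
``slot_map``, ``slot_get`` and ``slot_put`` are hypothetical names, and the real
partition count is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition count; the real buffer geometry is not given above. */
#define SLOT_COUNT 8

struct slot_map {
    uint32_t in_use;   /* bit i set means partition i is busy */
};

/* Claim the lowest free partition; return its index, or -1 if all are busy. */
static int slot_get(struct slot_map *m)
{
    assert(m != NULL);
    for (int i = 0; i < SLOT_COUNT; i++) {
        uint32_t bit = UINT32_C(1) << i;

        if (!(m->in_use & bit)) {
            m->in_use |= bit;
            return i;
        }
    }
    return -1;
}

/* Release a partition once the op that used it has completed. */
static void slot_put(struct slot_map *m, int idx)
{
    assert(m != NULL && idx >= 0 && idx < SLOT_COUNT);
    m->in_use &= ~(UINT32_C(1) << idx);
}
```

The index returned by ``slot_get`` is what would travel with the request so that
both sides agree on which partition holds the payload.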
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) Responses to kernel requests are all packaged in pvfs2_downcall_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) structs. Besides a few other members, pvfs2_downcall_t contains a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) union of structs, each of which is associated with a particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) response type.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) The several members outside of the union are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413)  ``int32_t type``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414)     - type of operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415)  ``int32_t status``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416)     - return code for the operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417)  ``int64_t trailer_size``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418)     - 0 unless readdir operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419)  ``char *trailer_buf``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420)     - initialized to NULL, used during readdir operations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) The appropriate member inside the union is filled out for any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) particular response.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425)   PVFS2_VFS_OP_FILE_IO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426)     fill a pvfs2_io_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428)   PVFS2_VFS_OP_LOOKUP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429)     fill a PVFS_object_kref
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431)   PVFS2_VFS_OP_CREATE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432)     fill a PVFS_object_kref
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434)   PVFS2_VFS_OP_SYMLINK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435)     fill a PVFS_object_kref
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437)   PVFS2_VFS_OP_GETATTR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438)     fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439)     fill in a string with the link target when the object is a symlink.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441)   PVFS2_VFS_OP_MKDIR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442)     fill a PVFS_object_kref
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444)   PVFS2_VFS_OP_STATFS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445)     fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446)     us to know, in a timely fashion, these statistics about our
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447)     distributed network filesystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449)   PVFS2_VFS_OP_FS_MOUNT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450)     fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451)     except its members are in a different order and "__pad1" is replaced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452)     with "id".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454)   PVFS2_VFS_OP_GETXATTR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455)     fill a pvfs2_getxattr_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457)   PVFS2_VFS_OP_LISTXATTR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458)     fill a pvfs2_listxattr_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460)   PVFS2_VFS_OP_PARAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461)     fill a pvfs2_param_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463)   PVFS2_VFS_OP_PERF_COUNT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464)     fill a pvfs2_perf_count_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466)   PVFS2_VFS_OP_FSKEY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467)     fill a pvfs2_fs_key_response_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469)   PVFS2_VFS_OP_READDIR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)     pack everything needed to represent a pvfs2_readdir_response_t into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471)     the readdir buffer descriptor specified in the upcall.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) 
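The shape of a downcall, a few common members plus a union selected by the
response type, can be sketched as follows. This is not the real
pvfs2_downcall_t layout: the ``demo_*`` types, their members and the
``DEMO_RESP_*`` values are hypothetical stand-ins. Only ``type``, ``status``,
``trailer_size`` and ``trailer_buf`` are taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative response types; the real PVFS2_VFS_OP_* values are not shown here. */
enum demo_resp_type { DEMO_RESP_LOOKUP = 1, DEMO_RESP_FSKEY = 2 };

/* Hypothetical stand-ins for two of the per-response structs. */
struct demo_kref   { int64_t handle; int32_t fs_id; int32_t pad; };
struct demo_fs_key { int32_t keylen; char key[32]; };

struct demo_downcall {
    int32_t type;          /* type of operation */
    int32_t status;        /* return code for the operation */
    int64_t trailer_size;  /* 0 unless readdir operation */
    char *trailer_buf;     /* NULL unless readdir operation */
    union {
        struct demo_kref lookup;
        struct demo_fs_key fs_key;
    } resp;
};

/* Reset a downcall and set the members that live outside the union. */
static void demo_downcall_init(struct demo_downcall *d, int32_t type, int32_t status)
{
    assert(d != NULL);
    memset(d, 0, sizeof(*d));
    d->type = type;
    d->status = status;
    d->trailer_buf = NULL;
}

/* Fill only the union member that matches a lookup response. */
static void demo_fill_lookup(struct demo_downcall *d, int64_t handle, int32_t fs_id)
{
    demo_downcall_init(d, DEMO_RESP_LOOKUP, 0);
    d->resp.lookup.handle = handle;
    d->resp.lookup.fs_id = fs_id;
}
```

A reader of the struct would switch on ``type`` to decide which union member is
valid.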
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) made by the kernel side.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) A buffer_list containing:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478)   - a pointer to the prepared response to the request from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479)     kernel (struct pvfs2_downcall_t).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480)   - and also, in the case of a readdir request, a pointer to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481)     buffer containing descriptors for the objects in the target
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482)     directory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) ... is sent to the function (PINT_dev_write_list) which performs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) the writev.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) The first four elements of io_array are initialized like this for all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) responses::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492)   io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493)   io_array[0].iov_len = sizeof(int32_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495)   io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496)   io_array[1].iov_len = sizeof(int32_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498)   io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499)   io_array[2].iov_len = sizeof(int64_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501)   io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502)                          of global variable vfs_request (vfs_request_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503)   io_array[3].iov_len = sizeof(pvfs2_downcall_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) Readdir responses initialize the fifth element of io_array like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507)   io_array[4].iov_base = contents of member trailer_buf (char *)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508)                          from out_downcall member of global variable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509)                          vfs_request
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510)   io_array[4].iov_len = contents of member trailer_size (PVFS_size)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511)                         from out_downcall member of global variable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512)                         vfs_request
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) 
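The io_array setup above maps directly onto the POSIX writev() interface. The
sketch below is not PINT_dev_write_list itself: ``wire_header``,
``wire_downcall``, ``demo_build_iov`` and ``demo_write_response`` are
hypothetical, the header values are passed in rather than read from globals,
and the downcall struct is reduced to the members named in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical grouping of the three fixed-size header values. */
struct wire_header {
    int32_t proto_ver;
    int32_t magic;
    int64_t tag;
};

/* Reduced stand-in for pvfs2_downcall_t (union omitted). */
struct wire_downcall {
    int32_t type;
    int32_t status;
    int64_t trailer_size;
    char *trailer_buf;
};

/* Fill io_array and return how many elements are in use (4, or 5 with a trailer). */
static int demo_build_iov(struct iovec io_array[10], struct wire_header *h,
                          struct wire_downcall *dc)
{
    int count = 4;

    assert(io_array != NULL && h != NULL && dc != NULL);
    io_array[0].iov_base = &h->proto_ver;
    io_array[0].iov_len = sizeof(h->proto_ver);
    io_array[1].iov_base = &h->magic;
    io_array[1].iov_len = sizeof(h->magic);
    io_array[2].iov_base = &h->tag;
    io_array[2].iov_len = sizeof(h->tag);
    io_array[3].iov_base = dc;
    io_array[3].iov_len = sizeof(*dc);

    /* Only readdir responses carry a trailer. */
    if (dc->trailer_size > 0 && dc->trailer_buf != NULL) {
        io_array[4].iov_base = dc->trailer_buf;
        io_array[4].iov_len = (size_t)dc->trailer_size;
        count = 5;
    }
    return count;
}

/* Send the whole response with a single writev() call. */
static ssize_t demo_write_response(int fd, struct wire_header *h,
                                   struct wire_downcall *dc)
{
    struct iovec io_array[10];
    int count = demo_build_iov(io_array, h, dc);

    return writev(fd, io_array, count);
}
```

Passing all pieces in one writev() lets the receiving side see the header,
the downcall and any trailer as a single write.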
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) Orangefs exploits the dcache in order to avoid sending redundant
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) requests to userspace. We keep object inode attributes up-to-date with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) help it decide whether or not to update an inode: "new" and "bypass".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) Orangefs keeps private data in an object's inode that includes a short
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) timeout value, getattr_time, which allows any iteration of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) orangefs_inode_getattr to know how long it has been since the inode was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) updated. When the object is not new (new == 0) and the bypass flag is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) set (bypass == 0), orangefs_inode_getattr returns without updating the inode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) if getattr_time has not timed out. Getattr_time is updated each time the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) inode is updated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) Creation of a new object (file, dir, sym-link) includes the evaluation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) its pathname, resulting in a negative directory entry for the object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) A new inode is allocated and associated with the dentry, turning it from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) a negative dentry into a "productive full member of society". Orangefs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) obtains the new inode from Linux with new_inode() and associates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) the inode with the dentry by sending the pair back to Linux with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) d_instantiate().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) The evaluation of a pathname for an object resolves to its corresponding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) dentry. If there is no corresponding dentry, one is created for it in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) the dcache. Whenever a dentry is modified or verified, Orangefs stores a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) short timeout value in the dentry's d_time, and the dentry will be trusted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) for that amount of time. Orangefs is a network filesystem, and objects
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) can potentially change out-of-band with any particular Orangefs kernel module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) instance, so trusting a dentry is risky. The alternative to trusting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) dentries is to always obtain the needed information from userspace - at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) least a trip to the client-core, maybe to the servers. Obtaining information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) from a dentry is cheap; obtaining it from userspace is relatively expensive,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) hence the motivation to use the dentry when possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) The timeout values d_time and getattr_time are jiffy based, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) code is designed to avoid the jiffy-wrap problem::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549)     "In general, if the clock may have wrapped around more than once, there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550)     is no way to tell how much time has elapsed. However, if the times t1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551)     and t2 are known to be fairly close, we can reliably compute the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552)     difference in a way that takes into account the possibility that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553)     clock may have wrapped between times."
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) from course notes by instructor Andy Wang
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556)
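The quoted idea can be demonstrated with a 32-bit tick counter. The sketch
below is illustrative only: ``demo_ticks_elapsed`` and ``demo_tick_after`` are
hypothetical helpers, and a fixed-width ``uint32_t`` is used instead of the
kernel's jiffies so that the wrap is easy to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Ticks elapsed from t1 to t2. Unsigned subtraction is modulo 2^32, so the
 * result is still correct if the counter wrapped once between t1 and t2.
 */
static uint32_t demo_ticks_elapsed(uint32_t t1, uint32_t t2)
{
    return (uint32_t)(t2 - t1);
}

/*
 * True when a is later than b, provided the two values are less than 2^31
 * ticks apart. The signed view of the difference carries the ordering.
 */
static bool demo_tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}
```

For example, with ``t1 = UINT32_MAX - 5`` and ``t2 = 10`` the counter has
wrapped, so a naive ``t2 > t1`` is false, yet ``demo_ticks_elapsed(t1, t2)``
yields 16 and ``demo_tick_after(t2, t1)`` is true.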