Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

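The "last put frees" behaviour described above can be illustrated with a minimal userspace sketch. It is an analogy built on C11 atomics, not kernel code: struct fd_table_model, table_alloc(), table_get(), table_put(), table_demo() and the tables_freed counter are hypothetical names introduced only for this illustration, and the demo is single-threaded. A caller would expect table_demo() to report exactly one free, performed by the second put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared fd table; not the kernel's files_struct. */
struct fd_table_model {
	atomic_int count;	/* number of tasks sharing the table */
};

static int tables_freed;	/* bumped on free so the effect is observable */

/* Allocate a table owned by a single task. */
static struct fd_table_model *table_alloc(void)
{
	struct fd_table_model *t = malloc(sizeof(*t));

	if (t)
		atomic_init(&t->count, 1);
	return t;
}

/* Models a CLONE_FILES clone: the new task shares the same table. */
static struct fd_table_model *table_get(struct fd_table_model *t)
{
	atomic_fetch_add(&t->count, 1);
	return t;
}

/* Models the role of put_files_struct(): the last reference frees. */
static void table_put(struct fd_table_model *t)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&t->count, 1) == 1) {
		free(t);
		tables_freed++;
	}
}

/* Two sharers drop their references; returns how many frees occurred. */
static int table_demo(void)
{
	struct fd_table_model *t = table_alloc();
	int before = tables_freed;

	if (!t)
		return -1;
	table_get(t);		/* a second task now shares t */
	table_put(t);		/* first task exits: table survives */
	assert(tables_freed == before);
	table_put(t);		/* last task exits: table is freed */
	return tables_freed - before;
}
```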
In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n < fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the fcheck() or the fcheck_files() API. These
   take care of barrier requirements due to lock-free lookup.

   An example::

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().

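Rules 1, 5 and 6 can be tied together in a small userspace sketch. It is only an analogy under stated assumptions: C11 release/acquire atomics stand in for rcu_assign_pointer() and rcu_dereference(), a compare-and-swap loop stands in for atomic_long_inc_not_zero(), and every type and function name below (file_model, fdtable_model, files_model, inc_not_zero, publish_fdtable, lookup_get, lookup_demo) is hypothetical. Read-side critical sections, grace periods and the deferred freeing of the old table are not modelled, and the demo runs on a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; the kernel's struct file and struct fdtable differ. */
struct file_model {
	atomic_long f_count;
};

struct fdtable_model {
	unsigned int max_fds;
	struct file_model **fd;
};

struct files_model {
	_Atomic(struct fdtable_model *) fdt;	/* swapped on expansion */
};

/* Take a reference unless the count has already reached zero. */
static bool inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Writer side: publish a fully initialized table with release ordering. */
static void publish_fdtable(struct files_model *files,
			    struct fdtable_model *new_fdt)
{
	atomic_store_explicit(&files->fdt, new_fdt, memory_order_release);
}

/* Reader side: load the table once, bounds-check, then try to pin the file. */
static struct file_model *lookup_get(struct files_model *files, unsigned int n)
{
	struct fdtable_model *fdt =
		atomic_load_explicit(&files->fdt, memory_order_acquire);
	struct file_model *file = NULL;

	assert(fdt != NULL);
	if (n < fdt->max_fds)
		file = fdt->fd[n];
	if (file && !inc_not_zero(&file->f_count))
		file = NULL;	/* lost the race with the last put */
	return file;
}

/* Returns 0 if the model behaves as the rules describe. */
static int lookup_demo(void)
{
	struct file_model live, dying;
	struct file_model *small_slots[1] = { &dying };
	struct file_model *big_slots[2] = { &dying, &live };
	struct fdtable_model small = { 1, small_slots };
	struct fdtable_model big = { 2, big_slots };
	struct files_model files;

	atomic_init(&live.f_count, 1);
	atomic_init(&dying.f_count, 0);	/* last put already happened */
	atomic_init(&files.fdt, &small);

	if (lookup_get(&files, 1) != NULL)	/* beyond the old table */
		return 1;
	publish_fdtable(&files, &big);		/* models an expansion */
	if (lookup_get(&files, 1) != &live)	/* seen through the new table */
		return 2;
	if (atomic_load(&live.f_count) != 2)	/* a reference was taken */
		return 3;
	if (lookup_get(&files, 0) != NULL)	/* zero count: lookup fails */
		return 4;
	return 0;
}
```

In this model a reader sees either the small table or the big one, never a mixture, because the pointer is loaded exactly once per lookup. The kernel additionally relies on RCU to keep the old table alive until all readers that might still hold it have finished, which this sketch leaves out.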