Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) unshare system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) This document describes the new system call, unshare(). The document
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) provides an overview of the feature, why it is needed, how it can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) be used, its interface specification, design, implementation and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) how it can be tested.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Change Log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) version 0.1  Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) Contents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 	1) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 	2) Benefits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 	3) Cost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) 	4) Requirements
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) 	5) Functional Specification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 	6) High Level Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 	7) Low Level Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 	8) Test Specification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 	9) Future Work
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 1) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) Most legacy operating system kernels support an abstraction of threads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) as multiple execution contexts within a process. These kernels provide
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) special resources and mechanisms to maintain these "threads". The Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) kernel, in a clever and simple manner, makes no distinction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) between processes and "threads". The kernel allows processes to share
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) resources and thus they can achieve legacy "threads" behavior without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) requiring additional data structures and mechanisms in the kernel. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) power of implementing threads in this manner comes not only from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) its simplicity but also from allowing application programmers to work
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) outside the confinement of all-or-nothing shared resources of legacy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) threads. On Linux, at the time of thread creation using the clone system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) call, applications can selectively choose which resources to share
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) between threads.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) The unshare() system call adds a primitive to the Linux thread model that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) allows threads to selectively 'unshare' any resources that were being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) shared at the time of their creation. unshare() was conceptualized by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) Al Viro in August of 2000, on the Linux-Kernel mailing list, as part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) of the discussion on POSIX threads on Linux.  unshare() augments the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) usefulness of Linux threads for applications that would like to control
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) shared resources without creating a new process. unshare() is a natural
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) addition to the set of available primitives on Linux that implement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) the concept of process/thread as a virtual machine.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 2) Benefits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) unshare() would be useful to large application frameworks such as PAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) where creating a new process to control sharing/unsharing of process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) resources is not possible. Since namespaces are shared by default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) when creating a new process using fork or clone, unshare() can benefit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) even non-threaded applications if they have a need to disassociate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) from the default shared namespace. The following lists two use-cases
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) where unshare() can be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 2.1 Per-security context namespaces
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) unshare() can be used to implement polyinstantiated directories using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) the kernel's per-process namespace mechanism. Polyinstantiated directories,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) such as per-user and/or per-security context instance of /tmp, /var/tmp or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) per-security context instance of a user's home directory, isolate user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) processes when working with these directories. Using unshare(), a PAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) module can easily set up a private namespace for a user at login.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) Polyinstantiated directories are required for Common Criteria certification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) with Labeled System Protection Profile; however, with the availability
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) of the shared-tree feature in the Linux kernel, even regular Linux systems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) can benefit from setting up private namespaces at login and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) polyinstantiating /tmp, /var/tmp and other directories deemed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) appropriate by system administrators.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 
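As a rough illustration of the per-user /tmp case, the sketch below shows
what a login-time helper might do. The function name setup_private_tmp()
and its instance_dir argument are hypothetical and are not taken from PAM
or from the kernel; the extra mount() call that marks the tree private is
an assumption about how one would keep the bind mount from propagating
back on a system that uses shared subtrees.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: give the caller a private mount namespace and
 * bind-mount instance_dir over /tmp. Returns 0 on success or a negative
 * errno value. Requires CAP_SYS_ADMIN.
 */
int setup_private_tmp(const char *instance_dir)
{
	assert(instance_dir != NULL);

	/* Detach from the namespace inherited at fork/clone time. */
	if (unshare(CLONE_NEWNS) != 0)
		return -errno;

	/* Assumption: keep later mounts from propagating to other namespaces. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
		return -errno;

	/* Overlay the per-user instance directory onto /tmp. */
	if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) != 0)
		return -errno;

	return 0;
}
```

A caller without the required privilege would typically see -EPERM from
the first step, which matches the ERRORS entry in the specification.
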
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 2.2 Unsharing of virtual memory and/or open files
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) Consider a client/server application where the server is processing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) client requests by creating processes that share resources such as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) virtual memory and open files. Without unshare(), the server has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) decide what needs to be shared at the time of creating the process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) which services the request. unshare() gives the server the ability to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) disassociate parts of the context during the servicing of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) request. For large and complex middleware application frameworks, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) ability to unshare() after the process was created can be very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) useful.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 3) Cost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) In order to not duplicate code and to handle the fact that unshare()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) works on an active task (as opposed to clone/fork working on a newly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) allocated inactive task), unshare() had to make minor reorganizational
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) changes to the copy_* functions used by the clone/fork system calls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) There is a cost associated with altering existing, well tested and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) stable code to implement a new feature that may not get exercised
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) extensively in the beginning. However, with proper design and code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) review of the changes and creation of an unshare() test for the LTP,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) the benefits of this new feature can exceed its cost.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 4) Requirements
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) unshare() reverses sharing that was done using the clone(2) system call,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) so unshare() should have an interface similar to clone(2). That is,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) since flags in clone(int flags, void \*stack) specify what should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) be shared, similar flags in unshare(int flags) should specify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) what should be unshared. Unfortunately, this may appear to invert
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) the meaning of the flags from the way they are used in clone(2).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) However, there was no easy solution that was less confusing and that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) allowed incremental context unsharing in the future without an ABI change.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) The unshare() interface should accommodate possible future addition of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) new context flags without requiring a rebuild of old applications.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) If and when new context flags are added, the unshare() design should allow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) incremental unsharing of those resources on an as-needed basis.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 5) Functional Specification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) ---------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) NAME
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 	unshare - disassociate parts of the process execution context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) SYNOPSIS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 	#include <sched.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 	int unshare(int flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) DESCRIPTION
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 	unshare() allows a process to disassociate parts of its execution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 	context that are currently being shared with other processes. Part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 	of the execution context, such as the namespace, is shared by default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 	when a new process is created using fork(2), while other parts,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 	such as the virtual memory, open file descriptors, etc., may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 	shared by explicit request to share them when creating a process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 	using clone(2).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 	The main use of unshare() is to allow a process to control its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 	shared execution context without creating a new process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	The flags argument specifies one of, or the bitwise OR of several of,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 	the following constants.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 	CLONE_FS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 		If CLONE_FS is set, file system information of the caller
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 		is disassociated from the shared file system information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 	CLONE_FILES
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 		If CLONE_FILES is set, the file descriptor table of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 		caller is disassociated from the shared file descriptor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 		table.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 	CLONE_NEWNS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 		If CLONE_NEWNS is set, the namespace of the caller is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 		disassociated from the shared namespace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 	CLONE_VM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 		If CLONE_VM is set, the virtual memory of the caller is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 		disassociated from the shared virtual memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) RETURN VALUE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 	On success, zero is returned. On failure, -1 is returned and errno is set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) ERRORS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 	EPERM	CLONE_NEWNS was specified by a non-root process (process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 		without CAP_SYS_ADMIN).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 	ENOMEM	Cannot allocate sufficient memory to copy parts of caller's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 		context that need to be unshared.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 	EINVAL	Invalid flag was specified as an argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) CONFORMING TO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 	The unshare() call is Linux-specific and should not be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 	in programs intended to be portable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) SEE ALSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) 	clone(2), fork(2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
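A minimal usage sketch for the interface specified above, assuming a caller
that wants to change its working directory without affecting any task it
currently shares filesystem information with. The helper name
chdir_privately() is hypothetical; unshare(), chdir() and CLONE_FS are the
standard interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: stop sharing filesystem information (root
 * directory, working directory, umask) and then change directory.
 * Returns 0 on success or a negative errno value.
 */
int chdir_privately(const char *dir)
{
	assert(dir != NULL);

	/* On failure unshare() returns -1 and sets errno (see ERRORS). */
	if (unshare(CLONE_FS) != 0)
		return -errno;

	/* From here on, the new working directory is private to the caller. */
	if (chdir(dir) != 0)
		return -errno;

	return 0;
}
```

In a single-threaded process that shares nothing, the unshare() step is
expected to succeed without any visible effect.
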
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 6) High Level Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) Depending on the flags argument, the unshare() system call allocates
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) appropriate process context structures, populates them with values from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) the current shared version, associates newly duplicated structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) with the current task structure and releases corresponding shared
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) versions. Helper functions of clone (copy_*) could not be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) directly by unshare() for the following two reasons.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)   1) clone operates on a newly allocated not-yet-active task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)      structure, whereas unshare() operates on the current active
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)      task. Therefore unshare() has to take the appropriate task_lock()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)      before associating newly duplicated context structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)   2) unshare() has to allocate and duplicate all context structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)      that are being unshared, before associating them with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)      current task and releasing older shared structures. Failure to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)      do so will create race conditions and/or an oops when trying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)      to back out due to an error. Consider the case of unsharing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)      both virtual memory and namespace. After successfully unsharing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)      vm, if the system call encounters an error while allocating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)      new namespace structure, the error return code will have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)      reverse the unsharing of vm. As part of the reversal the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)      system call will have to go back to older, shared, vm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)      structure, which may not exist anymore.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) Therefore, code from the copy_* functions that allocated and duplicated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) the current context structure was moved into new dup_* functions. Now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) copy_* functions call dup_* functions to allocate and duplicate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) appropriate context structures and then associate them with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) task structure that is being constructed. The unshare() system call,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) on the other hand, performs the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)   1) Check flags to force missing, but implied, flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)   2) For each context structure, call the corresponding unshare()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)      helper function to allocate and duplicate a new context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)      structure, if the appropriate bit is set in the flags argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)   3) If there is no error in allocation and duplication and there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)      are new context structures then lock the current task structure,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)      associate new context structures with the current task structure,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)      and release the lock on the current task structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)   4) Appropriately release older, shared, context structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 
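The ordering in the steps above (duplicate everything, then commit under
the lock, then release the old structures) can be sketched as a small
user-space model. Every name below (ctx_model, task_model, MODEL_VM,
MODEL_NS, unshare_model) is invented for illustration, the locked field
only stands in for task_lock(), and the fail_ns argument exists solely to
simulate an allocation failure. This is not the kernel implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins; none of these names exist in the kernel. */
struct ctx_model {
	int refcount;
	int payload;
};

struct task_model {
	struct ctx_model *vm;
	struct ctx_model *ns;
	int locked;		/* stand-in for task_lock() */
};

#define MODEL_VM 0x1
#define MODEL_NS 0x2

/* Allocate a private copy; fail on demand to simulate -ENOMEM. */
static struct ctx_model *dup_ctx_model(const struct ctx_model *old, int fail)
{
	struct ctx_model *copy = fail ? NULL : malloc(sizeof(*copy));

	if (copy) {
		copy->refcount = 1;
		copy->payload = old->payload;
	}
	return copy;
}

/* Drop one reference; free the structure when the last one goes away. */
static void put_ctx_model(struct ctx_model *c)
{
	if (c && --c->refcount == 0)
		free(c);
}

/* Returns 0 on success, -1 on failure; on failure the task is untouched. */
int unshare_model(struct task_model *t, int flags, int fail_ns)
{
	struct ctx_model *new_vm = NULL, *new_ns = NULL, *old;

	assert(t != NULL && t->vm != NULL && t->ns != NULL);

	/* Phase 1: duplicate everything that was requested. */
	if (flags & MODEL_VM) {
		new_vm = dup_ctx_model(t->vm, 0);
		if (!new_vm)
			return -1;
	}
	if (flags & MODEL_NS) {
		new_ns = dup_ctx_model(t->ns, fail_ns);
		if (!new_ns) {
			/* t->vm was never replaced, so only the copy is dropped. */
			put_ctx_model(new_vm);
			return -1;
		}
	}

	/* Phase 2: commit under the lock by swapping pointers. */
	t->locked = 1;
	if (new_vm) { old = t->vm; t->vm = new_vm; new_vm = old; }
	if (new_ns) { old = t->ns; t->ns = new_ns; new_ns = old; }
	t->locked = 0;

	/* Phase 3: release this task's references to the old structures. */
	put_ctx_model(new_vm);
	put_ctx_model(new_ns);
	return 0;
}
```

Because nothing is attached to the task until every duplication has
succeeded, the error path never needs to return to an older shared
structure.
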
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 7) Low Level Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) Implementation of unshare() can be grouped into the following four
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) items:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)   a) Reorganization of existing copy_* functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)   b) unshare() system call service function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)   c) unshare() helper functions for each different process context
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)   d) Registration of system call number for different architectures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 7.1) Reorganization of copy_* functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) Each copy function, such as copy_mm, copy_namespace, copy_files,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) etc., had roughly two components. The first component allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) and duplicated the appropriate structure and the second component
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) linked it to the task structure passed in as an argument to the copy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) function. The first component was split into its own function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) These dup_* functions allocated and duplicated the appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) context structure. The reorganized copy_* functions invoked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) their corresponding dup_* functions and then linked the newly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) duplicated structures to the task structure with which the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) copy function was called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) 
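The split described above can be pictured with a simplified user-space
model of a descriptor table. The structures and the names files_model,
files_task_model, dup_files_model() and copy_files_model() are assumptions
made for this sketch; the real kernel structures and signatures differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model of a shared, reference-counted context structure. */
struct files_model {
	int refcount;
	int fds[4];
};

struct files_task_model {
	struct files_model *files;
};

/* Component 1: allocate and duplicate, with no knowledge of any task. */
struct files_model *dup_files_model(const struct files_model *old)
{
	struct files_model *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	memcpy(copy->fds, old->fds, sizeof(copy->fds));
	copy->refcount = 1;
	return copy;
}

/* Component 2: share or duplicate, then link the result to the new task. */
int copy_files_model(struct files_task_model *child,
		     struct files_task_model *parent, int share)
{
	assert(child != NULL && parent != NULL && parent->files != NULL);

	if (share) {
		parent->files->refcount++;
		child->files = parent->files;
		return 0;
	}

	child->files = dup_files_model(parent->files);
	return child->files ? 0 : -1;
}
```

With the duplication step isolated in this way, a caller that operates on
an already running task can reuse it without going through the linking
step.
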
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 7.2) unshare() system call service function
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)        * Check flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 	 Force implied flags. If CLONE_THREAD is set, force CLONE_VM.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) 	 If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 	 set and signals are also being shared, force CLONE_THREAD. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) 	 CLONE_NEWNS is set, force CLONE_FS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)        * For each context flag, invoke the corresponding unshare_*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) 	 helper routine with the flags passed into the system call and a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) 	 reference to a pointer to the new unshared structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)        * If any new structures are created by unshare_* helper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 	 functions, take the task_lock() on the current task,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 	 modify appropriate context pointers, and release the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)          task lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)        * For all newly unshared structures, release the corresponding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)          older, shared, structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 
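The implied-flag rules from the first bullet above can be written down as
a pure function. This is a sketch that applies the rules once, in the
order the text lists them. The function name force_implied_flags() and the
signals_shared parameter are hypothetical; the latter stands in for
whatever check the kernel performs on the caller's signal state.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Return flags with every implied flag added, following the rules in
 * the order given in the text.
 */
unsigned long force_implied_flags(unsigned long flags, bool signals_shared)
{
	/* Unsharing the thread group implies unsharing virtual memory. */
	if (flags & CLONE_THREAD)
		flags |= CLONE_VM;

	/* Unsharing virtual memory implies unsharing signal handlers. */
	if (flags & CLONE_VM)
		flags |= CLONE_SIGHAND;

	/* Unsharing handlers while signals are shared implies CLONE_THREAD. */
	if ((flags & CLONE_SIGHAND) && signals_shared)
		flags |= CLONE_THREAD;

	/* Unsharing the namespace implies unsharing filesystem information. */
	if (flags & CLONE_NEWNS)
		flags |= CLONE_FS;

	/* Postcondition: a namespace is never unshared without CLONE_FS. */
	assert(!(flags & CLONE_NEWNS) || (flags & CLONE_FS));
	return flags;
}
```

For example, a request for CLONE_NEWNS alone comes back as
CLONE_NEWNS | CLONE_FS, which is the case the test specification calls
"missing/implied flags".
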
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 7.3) unshare_* helper functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) and CLONE_THREAD, return -EINVAL since they are not implemented yet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) For others, check the flag value to see if the unsharing is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) required for that structure. If it is, invoke the corresponding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) dup_* function to allocate and duplicate the structure and return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) a pointer to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) 
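A user-space sketch of the helper pattern described above. The names
fs_model, unshare_unsupported_model() and unshare_fs_model() are invented
for this example, malloc() plays the role of the dup_* function, and the
check that skips the copy when the structure has a single user is an
assumption of the model rather than something stated in the text.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

/* Illustrative model of a reference-counted context structure. */
struct fs_model {
	int refcount;
	int umask_bits;
};

/* Flags the text lists as not implemented yet. */
#define UNSUPPORTED_MODEL (CLONE_SYSVSEM | CLONE_SIGHAND | CLONE_THREAD)

/* Helpers for unimplemented contexts only reject the request. */
int unshare_unsupported_model(unsigned long flags)
{
	return (flags & UNSUPPORTED_MODEL) ? -EINVAL : 0;
}

/*
 * Returns 0 on success or a negative errno value. On success *new_fsp
 * is a private copy if unsharing was required, and NULL otherwise.
 */
int unshare_fs_model(unsigned long flags, const struct fs_model *cur,
		     struct fs_model **new_fsp)
{
	assert(cur != NULL && new_fsp != NULL);
	*new_fsp = NULL;

	/* Nothing to do unless requested; assumed: skip if not shared. */
	if (!(flags & CLONE_FS) || cur->refcount <= 1)
		return 0;

	/* Allocate and duplicate; the caller links the copy in later. */
	*new_fsp = malloc(sizeof(**new_fsp));
	if (!*new_fsp)
		return -ENOMEM;
	(*new_fsp)->refcount = 1;
	(*new_fsp)->umask_bits = cur->umask_bits;
	return 0;
}
```

The helper never touches the task itself. It only hands back the new
structure, so the system call service function can commit all of them
together under the task lock.
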
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 7.4) Finally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) ~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) Appropriately modify architecture specific code to register the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) new system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 8) Test Specification
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) The test for unshare() should test the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)   1) Valid flags: Test to check that clone flags for signals and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)      signal handlers, for which unsharing is not implemented
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)      yet, return -EINVAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)   2) Missing/implied flags: Test to make sure that unsharing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)      namespace without specifying unsharing of the filesystem correctly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)      unshares both namespace and filesystem information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)   3) For each of the four supported kinds of unsharing (namespace,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)      filesystem, files and vm), verify that the system call correctly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311)      unshares the appropriate structure. Verify that unsharing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)      them individually as well as in combination with each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)      other works as expected.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)   4) Concurrent execution: Use shared memory segments and futex on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)      an address in the shm segment to synchronize execution of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)      about 10 threads. Have a couple of threads execute execve,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)      a couple execute _exit and the rest unshare with different combinations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)      of flags. Verify that unsharing is performed as expected and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)      that there are no oops or hangs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) 
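One way the per-structure check in item 3 might look for the file
descriptor table is sketched below. It is not taken from the LTP; the
names check_unshare_files() and child_unshare_and_close() and the stack
size are choices made for this example. The child is created with
clone(CLONE_FILES) so that it starts out sharing the parent's descriptor
table, then unshares it and closes a descriptor that the parent expects
to keep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the child, which starts out sharing the parent's fd table. */
static int child_unshare_and_close(void *arg)
{
	int fd;

	assert(arg != NULL);
	fd = *(int *)arg;

	if (unshare(CLONE_FILES) != 0)
		return 2;	/* could not unshare: leave fd alone */
	close(fd);		/* should affect only the private table */
	return 0;
}

/*
 * Returns 1 if the parent's descriptor survived the child's close(),
 * 0 if it did not, and -1 if the check could not be carried out.
 */
int check_unshare_files(void)
{
	int fd, status, result = -1;
	char *stack;
	pid_t pid;

	fd = open("/dev/null", O_RDONLY);
	stack = malloc(CHILD_STACK_SIZE);
	if (fd < 0 || !stack)
		goto out;

	/* Share only the descriptor table; memory is copied as with fork. */
	pid = clone(child_unshare_and_close, stack + CHILD_STACK_SIZE,
		    CLONE_FILES | SIGCHLD, &fd);
	if (pid < 0)
		goto out;
	if (waitpid(pid, &status, 0) != pid)
		goto out;

	/* Only a clean child exit means the close() really happened. */
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		result = (fcntl(fd, F_GETFD) != -1);
out:
	if (fd >= 0)
		close(fd);
	free(stack);
	return result;
}
```

A result of -1 covers environments where clone() or unshare() is not
permitted, so a harness can report the case as untested rather than
failed.
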
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) 9) Future Work
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) The current implementation of unshare() does not allow unsharing of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) signals and signal handlers. Signals are complex to begin with and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) to unshare signals and/or signal handlers of a currently running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) process is even more complex. If in the future there is a specific
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) need to allow unsharing of signals and/or signal handlers, it can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) be incrementally added to unshare() without affecting legacy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) applications using unshare().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)