^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Generic Mutex Subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) started by Ingo Molnar <mingo@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) updated by Davidlohr Bueso <davidlohr@hp.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) What are mutexes?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) In the Linux kernel, mutexes refer to a particular locking primitive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) that enforces serialization on shared memory systems, and not only to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) the generic term referring to 'mutual exclusion' found in academia
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) or similar theoretical text books. Mutexes are sleeping locks which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) behave similarly to binary semaphores, and were introduced in 2006[1]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) as an alternative to these. This new data structure provided a number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) of advantages, including simpler interfaces, and at that time smaller
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) code (see Disadvantages).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) [1] https://lwn.net/Articles/164802/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) Implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) and implemented in kernel/locking/mutex.c. These locks use an atomic variable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) (->owner) to keep track of the lock state during its lifetime. Field owner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) actually contains `struct task_struct *` to the current lock owner and it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) therefore NULL if not currently owned. Since task_struct pointers are aligned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) if waiter list is non-empty). In its most basic form it also includes a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) wait-queue and a spinlock that serializes access to it. Furthermore,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) below in (ii).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) When acquiring a mutex, there are three possible paths that can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) taken, depending on the state of the lock:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) (i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) the current task. This only works in the uncontended case (cmpxchg() checks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) against 0UL, so all 3 state bits above have to be 0). If the lock is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) contended it goes to the next possible path.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) (ii) midpath: aka optimistic spinning, tries to spin for acquisition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) while the lock owner is running and there are no other tasks ready
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) to run that have higher priority (need_resched). The rationale is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) that if the lock owner is running, it is likely to release the lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) soon. The mutex spinners are queued up using MCS lock so that only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) one spinner can compete for the mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) with the desirable properties of being fair and with each cpu trying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) to acquire the lock spinning on a local variable. It avoids expensive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) cacheline bouncing that common test-and-set spinlock implementations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) incur. An MCS-like lock is specially tailored for optimistic spinning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) for sleeping lock implementation. An important feature of the customized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) MCS lock is that it has the extra property that spinners are able to exit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) the MCS spinlock queue when they need to reschedule. This further helps
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) avoid situations where MCS spinners that need to reschedule would continue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) waiting to spin on mutex owner, only to go directly to slowpath upon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) obtaining the MCS lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) (iii) slowpath: last resort, if the lock is still unable to be acquired,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) the task is added to the wait-queue and sleeps until woken up by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) While formally kernel mutexes are sleepable locks, it is path (ii) that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) makes them more practically a hybrid type. By simply not interrupting a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) task and busy-waiting for a few cycles instead of immediately sleeping,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) the performance of this lock has been seen to significantly improve a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) number of workloads. Note that this technique is also used for rw-semaphores.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) Semantics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) The mutex subsystem checks and enforces the following rules:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) - Only one task can hold the mutex at a time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) - Only the owner can unlock the mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) - Multiple unlocks are not permitted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) - Recursive locking/unlocking is not permitted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) - A mutex must only be initialized via the API (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) - A task may not exit with a mutex held.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) - Memory areas where held locks reside must not be freed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) - Held mutexes must not be reinitialized.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) - Mutexes may not be used in hardware or software interrupt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) contexts such as tasklets and timers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) In addition, the mutex debugging code also implements a number of other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) features that make lock debugging easier and faster:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) - Uses symbolic names of mutexes, whenever they are printed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) in debug output.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) - Point-of-acquire tracking, symbolic lookup of function names,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) list of all locks held in the system, printout of them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) - Owner tracking.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) - Detects self-recursing locks and prints out all relevant info.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) - Detects multi-task circular deadlocks and prints out all affected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) locks and tasks (and only those tasks).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) Interfaces
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Statically define the mutex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) DEFINE_MUTEX(name);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) Dynamically initialize the mutex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) mutex_init(mutex);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) Acquire the mutex, uninterruptible::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) void mutex_lock(struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) int mutex_trylock(struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Acquire the mutex, interruptible::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) int mutex_lock_interruptible_nested(struct mutex *lock,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) unsigned int subclass);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) int mutex_lock_interruptible(struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) Acquire the mutex, interruptible, if dec to 0::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) Unlock the mutex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) void mutex_unlock(struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) Test if the mutex is taken::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) int mutex_is_locked(struct mutex *lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Disadvantages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) Unlike its original design and purpose, 'struct mutex' is among the largest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) locks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) cache and memory footprint.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) When to use mutexes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) Unless the strict semantics of mutexes are unsuitable and/or the critical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) region prevents the lock from being shared, always prefer them to any other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) locking primitive.