Orange Pi5 kernel

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Wound/Wait Deadlock-Proof Mutex Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) Motivation for WW-Mutexes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) GPU's do operations that commonly involve many buffers.  Those buffers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) can be shared across contexts/processes, exist in different memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) domains (for example VRAM vs system memory), and so on.  And with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) PRIME / dmabuf, they can even be shared across devices.  So there are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) a handful of situations where the driver needs to wait for buffers to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) become ready.  If you think about this in terms of waiting on a buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) mutex for it to become available, this presents a problem because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) there is no way to guarantee that buffers appear in a execbuf/batch in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) the same order in all contexts.  That is directly under control of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) userspace, and a result of the sequence of GL calls that an application
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) makes.	Which results in the potential for deadlock.  The problem gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) more complex when you consider that the kernel may need to migrate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) buffer(s) into VRAM before the GPU operates on the buffer(s), which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) may in turn require evicting some other buffers (and you don't want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) evict other buffers which are already queued up to the GPU), but for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) simplified understanding of the problem you can ignore this.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) The algorithm that the TTM graphics subsystem came up with for dealing with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) this problem is quite simple.  For each group of buffers (execbuf) that need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) to be locked, the caller would be assigned a unique reservation id/ticket,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) from a global counter.  In case of deadlock while locking all the buffers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) associated with a execbuf, the one with the lowest reservation ticket (i.e.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) the oldest task) wins, and the one with the higher reservation id (i.e. the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) younger task) unlocks all of the buffers that it has already locked, and then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) tries again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) In the RDBMS literature, a reservation ticket is associated with a transaction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) and the deadlock handling approach is called Wait-Die. The name is based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) the actions of a locking thread when it encounters an already locked mutex.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) If the transaction holding the lock is younger, the locking transaction waits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) If the transaction holding the lock is older, the locking transaction backs off
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) and dies. Hence Wait-Die.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) There is also another algorithm called Wound-Wait:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) If the transaction holding the lock is younger, the locking transaction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) wounds the transaction holding the lock, requesting it to die.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) If the transaction holding the lock is older, it waits for the other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) transaction. Hence Wound-Wait.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) The two algorithms are both fair in that a transaction will eventually succeed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) compared to Wait-Die, but is, on the other hand, associated with more work than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) algorithm in that transactions are wounded by other transactions, and that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) requires a reliable way to pick up the wounded condition and preempt the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) running transaction. Note that this is not the same as process preemption. A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) Wound-Wait transaction is considered preempted when it dies (returning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) -EDEADLK) following a wound.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) Concepts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) Compared to normal mutexes two additional concepts/objects show up in the lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) interface for w/w mutexes:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) Acquire context: To ensure eventual forward progress it is important the a task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) trying to acquire locks doesn't grab a new reservation id, but keeps the one it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) acquired when starting the lock acquisition. This ticket is stored in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) acquire context. Furthermore the acquire context keeps track of debugging state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) to catch w/w mutex interface abuse. An acquire context is representing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) transaction.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) W/w class: In contrast to normal mutexes the lock class needs to be explicit for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) w/w mutexes, since it is required to initialize the acquire context. The lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) class also specifies what algorithm to use, Wound-Wait or Wait-Die.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) Furthermore there are three different class of w/w lock acquire functions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) * Normal lock acquisition with a context, using ww_mutex_lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) * Slowpath lock acquisition on the contending lock, used by the task that just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79)   killed its transaction after having dropped all already acquired locks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80)   These functions have the _slow postfix.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82)   From a simple semantics point-of-view the _slow functions are not strictly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83)   required, since simply calling the normal ww_mutex_lock functions on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84)   contending lock (after having dropped all other already acquired locks) will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85)   work correctly. After all if no other ww mutex has been acquired yet there's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86)   no deadlock potential and hence the ww_mutex_lock call will block and not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)   prematurely return -EDEADLK. The advantage of the _slow functions is in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)   interface safety:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)   - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91)     has a void return type. Note that since ww mutex code needs loops/retries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)     anyway the __must_check doesn't result in spurious warnings, even though the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93)     very first lock operation can never fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94)   - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95)     ww mutex have been released (preventing deadlocks) and makes sure that we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96)     block on the contending lock (preventing spinning through the -EDEADLK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97)     slowpath until the contended lock can be acquired).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) * Functions to only acquire a single w/w mutex, which results in the exact same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)   semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)   context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)   Again this is not strictly required. But often you only want to acquire a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)   single lock in which case it's pointless to set up an acquire context (and so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)   better to avoid grabbing a deadlock avoidance ticket).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Of course, all the usual variants for handling wake-ups due to signals are also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) As a rough rule of thumb, use Wound-Wait iff you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) expect the number of simultaneous competing transactions to be typically small,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) and you want to reduce the number of rollbacks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Three different ways to acquire locks within the same w/w class. Common
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) definitions for methods #1 and #2::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)   static DEFINE_WW_CLASS(ww_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)   struct obj {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 	struct ww_mutex lock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 	/* obj data */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)   };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)   struct obj_entry {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 	struct list_head head;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 	struct obj *obj;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)   };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) This is useful if a list of required objects is already tracked somewhere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) Furthermore the lock helper can use propagate the -EALREADY return code back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) the caller as a signal that an object is twice on the list. This is useful if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) the list is constructed from userspace input and the ABI requires userspace to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)   int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 	struct obj *res_obj = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 	struct obj_entry *contended_entry = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) 	struct obj_entry *entry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 	ww_acquire_init(ctx, &ww_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)   retry:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 	list_for_each_entry (entry, list, head) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 		if (entry->obj == res_obj) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 			res_obj = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 		ret = ww_mutex_lock(&entry->obj->lock, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 		if (ret < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 			contended_entry = entry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 			goto err;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 	ww_acquire_done(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)   err:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 	list_for_each_entry_continue_reverse (entry, list, head)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 		ww_mutex_unlock(&entry->obj->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 	if (res_obj)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 		ww_mutex_unlock(&res_obj->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 	if (ret == -EDEADLK) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 		/* we lost out in a seqno race, lock and retry.. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 		ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 		res_obj = contended_entry->obj;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 		goto retry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 	ww_acquire_fini(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 	return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) Method 2, using a list in execbuf->buffers that can be reordered. Same semantics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) of duplicate entry detection using -EALREADY as method 1 above. But the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) list-reordering allows for a bit more idiomatic code::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)   int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 	struct obj_entry *entry, *entry2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 	ww_acquire_init(ctx, &ww_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 	list_for_each_entry (entry, list, head) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 		ret = ww_mutex_lock(&entry->obj->lock, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 		if (ret < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 			entry2 = entry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) 			list_for_each_entry_continue_reverse (entry2, list, head)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) 				ww_mutex_unlock(&entry2->obj->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) 			if (ret != -EDEADLK) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 				ww_acquire_fini(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) 				return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) 			/* we lost out in a seqno race, lock and retry.. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 			ww_mutex_lock_slow(&entry->obj->lock, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 			 * Move buf to head of the list, this will point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) 			 * buf->next to the first unlocked entry,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) 			 * restarting the for loop.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) 			list_del(&entry->head);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) 			list_add(&entry->head, list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 	ww_acquire_done(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) Unlocking works the same way for both methods #1 and #2::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)   void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 	struct obj_entry *entry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 	list_for_each_entry (entry, list, head)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 		ww_mutex_unlock(&entry->obj->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 	ww_acquire_fini(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) and edges can only be changed when holding the locks of all involved nodes. w/w
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) mutexes are a natural fit for such a case for two reasons:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) - They can handle lock-acquisition in any order which allows us to start walking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)   a graph from a starting point and then iteratively discovering new edges and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)   locking down the nodes those edges connect to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) - Due to the -EALREADY return code signalling that a given objects is already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)   held there's no need for additional book-keeping to break cycles in the graph
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)   or keep track off which looks are already held (when using more than one node
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)   as a starting point).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) Note that this approach differs in two important ways from the above methods:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) - Since the list of objects is dynamically constructed (and might very well be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)   different when retrying due to hitting the -EDEADLK die condition) there's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)   no need to keep any object on a persistent list when it's not locked. We can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)   therefore move the list_head into the object itself.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) - On the other hand the dynamic object list construction also means that the -EALREADY return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)   code can't be propagated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) Note also that methods #1 and #2 and method #3 can be combined, e.g. to first lock a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) list of starting nodes (passed in from userspace) using one of the above
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) methods. And then lock any additional objects affected by the operations using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) method #3 below. The backoff/retry procedure will be a bit more involved, since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) when the dynamic locking step hits -EDEADLK we also need to unlock all the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) objects acquired with the fixed list. But the w/w mutex debug checks will catch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) any interface misuse for these cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) Also, method 3 can't fail the lock acquisition step since it doesn't return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) -EALREADY. Of course this would be different when using the _interruptible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) variants, but that's outside of the scope of these examples here::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)   struct obj {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) 	struct ww_mutex ww_mutex;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 	struct list_head locked_list;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)   };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)   static DEFINE_WW_CLASS(ww_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)   void __unlock_objs(struct list_head *list)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 	struct obj *entry, *temp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) 	list_for_each_entry_safe (entry, temp, list, locked_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 		/* need to do that before unlocking, since only the current lock holder is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 		allowed to use object */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) 		list_del(&entry->locked_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 		ww_mutex_unlock(entry->ww_mutex)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)   void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 	struct obj *obj;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) 	ww_acquire_init(ctx, &ww_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)   retry:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) 	/* re-init loop start state */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) 	loop {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) 		/* magic code which walks over a graph and decides which objects
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 		 * to lock */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) 		ret = ww_mutex_lock(obj->ww_mutex, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) 		if (ret == -EALREADY) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) 			/* we have that one already, get to the next object */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) 		if (ret == -EDEADLK) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) 			__unlock_objs(list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) 			ww_mutex_lock_slow(obj, ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 			list_add(&entry->locked_list, list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) 			goto retry;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 		/* locked a new object, add it to the list */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) 		list_add_tail(&entry->locked_list, list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) 	ww_acquire_done(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)   void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) 	__unlock_objs(list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 	ww_acquire_fini(ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) Method 4: Only lock one single objects. In that case deadlock detection and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) prevention is obviously overkill, since with grabbing just one lock you can't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) produce a deadlock within just one class. To simplify this case the w/w mutex
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) api can be used with a NULL context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) Implementation Details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) Design:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) ^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338)   ww_mutex currently encapsulates a struct mutex, this means no extra overhead for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339)   normal mutex locks, which are far more common. As such there is only a small
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340)   increase in code size if wait/wound mutexes are not used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342)   We maintain the following invariants for the wait list:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344)   (1) Waiters with an acquire context are sorted by stamp order; waiters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345)       without an acquire context are interspersed in FIFO order.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346)   (2) For Wait-Die, among waiters with contexts, only the first one can have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347)       other locks acquired already (ctx->acquired > 0). Note that this waiter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348)       may come after other waiters without contexts in the list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350)   The Wound-Wait preemption is implemented with a lazy-preemption scheme:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351)   The wounded status of the transaction is checked only when there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352)   contention for a new lock and hence a true chance of deadlock. In that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)   situation, if the transaction is wounded, it backs off, clears the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354)   wounded status and retries. A great benefit of implementing preemption in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355)   this way is that the wounded transaction can identify a contending lock to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)   wait for before restarting the transaction. Just blindly restarting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357)   transaction would likely make the transaction end up in a situation where
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358)   it would have to back off again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)   In general, not much contention is expected. The locks are typically used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361)   serialize access to resources for devices, and optimization focus should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)   therefore be directed towards the uncontended cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) Lockdep:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) ^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367)   Special care has been taken to warn for as many cases of api abuse
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368)   as possible. Some common api abuses will be caught with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369)   CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)   Some of the errors which will be warned about:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372)    - Forgetting to call ww_acquire_fini or ww_acquire_init.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373)    - Attempting to lock more mutexes after ww_acquire_done.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374)    - Attempting to lock the wrong mutex after -EDEADLK and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375)      unlocking all mutexes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376)    - Attempting to lock the right mutex after -EDEADLK,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377)      before unlocking all mutexes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)    - Calling ww_mutex_lock_slow before -EDEADLK was returned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)    - Unlocking mutexes with the wrong unlock function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)    - Calling one of the ww_acquire_* twice on the same context.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)    - Using a different ww_class for the mutex than for the ww_acquire_ctx.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)    - Normal lockdep errors that can result in deadlocks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)   Some of the lockdep errors that can result in deadlocks:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)    - Calling ww_acquire_init to initialize a second ww_acquire_ctx before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388)      having called ww_acquire_fini on the first.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)    - 'normal' deadlocks that can occur.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) FIXME:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)   Update this section once we have the TASK_DEADLOCK task state flag magic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)   implemented.