Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

3 Commits   0 Branches   0 Tags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) Lockless Ring Buffer Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) Copyright 2009 Red Hat Inc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) :Author:   Steven Rostedt <srostedt@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) :License:  The GNU Free Documentation License, Version 1.2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11)            (dual licensed under the GPL v2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) :Reviewers:  Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 	     and Frederic Weisbecker.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) Written for: 2.6.31
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) Terminology used in this Document
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) tail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 	- where new writes happen in the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) head
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 	- where new reads happen in the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 	- the task that writes into the ring buffer (same as writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 	- same as producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) consumer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 	- the task that reads from the buffer (same as reader)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 	- same as consumer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) reader_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 	- A page outside the ring buffer used solely (for the most part)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 	  by the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) head_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 	- a pointer to the page that the reader will use next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) tail_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 	- a pointer to the page that will be written to next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) commit_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 	- a pointer to the page with the last finished non-nested write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) cmpxchg
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 	- hardware-assisted atomic transaction that performs the following::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 	    A = B if previous A == C
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 	    R = cmpxchg(A, C, B) is saying that we replace A with B if and only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 		if current A is equal to C, and we put the old (current)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 		A into R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 	    R gets the previous A regardless if A is updated with B or not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 	  To see if the update was successful a compare of ``R == C``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 	  may be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) The Generic Ring Buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) The ring buffer can be used in either an overwrite mode or in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) producer/consumer mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) Producer/consumer mode is where if the producer were to fill up the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) buffer before the consumer could free up anything, the producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) will stop writing to the buffer. This will lose most recent events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) Overwrite mode is where if the producer were to fill up the buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) before the consumer could free up anything, the producer will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) overwrite the older data. This will lose the oldest events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) No two writers can write at the same time (on the same per-cpu buffer),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) but a writer may interrupt another writer, but it must finish writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) before the previous writer may continue. This is very important to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) algorithm. The writers act like a "stack". The way interrupts works
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) enforces this behavior::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87)   writer1 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88)      <preempted> writer2 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89)          <preempted> writer3 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90)                      writer3 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91)                  writer2 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92)   writer1 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) This is very much like a writer being preempted by an interrupt and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) the interrupt doing a write as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) Readers can happen at any time. But no two readers may run at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) same time, nor can a reader preempt/interrupt another reader. A reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) cannot preempt/interrupt a writer, but it may read/consume from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) buffer at the same time as a writer is writing, but the reader must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) on another processor to do so. A reader may read on its own processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) and can be preempted by a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) A writer can preempt a reader, but a reader cannot preempt a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) But a reader can read the buffer at the same time (on another processor)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) as a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) The ring buffer is made up of a list of pages held together by a linked list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) At initialization a reader page is allocated for the reader that is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) part of the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) The head_page, tail_page and commit_page are all initialized to point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) to the same page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) The reader page is initialized to have its next pointer pointing to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) the head page, and its previous pointer pointing to a page before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) the head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) The reader has its own page to use. At start up time, this page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) allocated but is not attached to the list. When the reader wants
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) to read from the buffer, if its page is empty (like it is on start-up),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) it will swap its page with the head_page. The old reader page will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) become part of the ring buffer and the head_page will be removed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) The page after the inserted page (old reader_page) will become the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) new head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) Once the new page is given to the reader, the reader could do what
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) it wants with it, as long as a writer has left that page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) A sample of how the reader page is swapped: Note this does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) show the head page in the buffer, it is for demonstrating a swap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)   |page  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)                   +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)                   |   |-->|   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)                   |   |<--|   |<--|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)                   +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)                    ^ |             ^ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)                    | +-------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)                    +-----------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)   |page  |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)   +------+                   v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)     |             +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)     |             |   |-->|   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)     |             |   |<--|   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)     |             +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)     |              ^ |             ^ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)     |              | +-------------+ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)     |              +-----------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)   |page  |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)   +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)     |  ^          +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)     |  |          |   |-->|   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)     |  |          |   |   |   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)     |  |          +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)     |  |             |             ^ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)     |  |             +-------------+ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)     |  +-----------------------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)   |buffer|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)   |page  |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)   +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)     |  ^          +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)     |  |          |   |   |   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)     |  |  New     |   |   |   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)     |  | Reader   +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)     |  |  page ----^                 |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)     |  |                             |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)     |  +-----------------------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) It is possible that the page swapped is the commit page and the tail page,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) if what is in the ring buffer is less than what is held in a buffer page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)             reader page    commit page   tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)                 |              |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)                 v              |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)                +---+           |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)                |   |<----------+             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)                |   |<------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)                |   |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)                +---+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) This case is still valid for this algorithm.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) When the writer leaves the page, it simply goes into the ring buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) since the reader page still points to the next location in the ring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) The main pointers:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)   reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 	    - The page used solely by the reader and is not part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)               of the ring buffer (may be swapped in)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)   head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 	    - the next page in the ring buffer that will be swapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)               with the reader page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)   tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 	    - the page where the next write will take place.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)   commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 	    - the page that last finished a write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) The commit page only is updated by the outermost writer in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) writer stack. A writer that preempts another writer will not move the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) When data is written into the ring buffer, a position is reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) in the ring buffer and passed back to the writer. When the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) is finished writing data into that position, it commits the write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) Another write (or a read) may take place at anytime during this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) transaction. If another write happens it must finish before continuing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) with the previous write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)    Write reserve::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)        Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)       +---------+  <--- given back to writer (current commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)       |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)       +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)       | empty   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)    Write commit::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)        Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)       +---------+  <--- next position for write (current commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)       | empty   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)  If a write happens after the first reserve::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)        Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)       +---------+  <-- current commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)       |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)       +---------+  <--- given back to second writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)       |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)       +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)   After second writer commits::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)        Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)       +---------+  <--(last full commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)       |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)       |pending  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)       |commit   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)       +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)   When the first writer commits::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)        Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)       +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)       |written  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)       +---------+  <--(last full commit and tail pointer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) The commit pointer points to the last write location that was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) committed without preempting another write. When a write that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) preempted another write is committed, it only becomes a pending commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) and will not be a full commit until all writes have been committed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) The commit page points to the page that has the last full commit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) The tail page points to the page with the last write (before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) committing).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) The tail page is always equal to or after the commit page. It may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) be several pages ahead. If the tail page catches up to the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) page then no more writes may take place (regardless of the mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) of the ring buffer: overwrite and produce/consumer).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) The order of pages is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)  head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)  commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)  tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) Possible scenario::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)                                tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)     head page         commit page  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329)         |                 |        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)         v                 v        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) There is a special case that the head page is after either the commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) and possibly the tail page. That is when the commit (and tail) page has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) swapped with the reader page. This is because the head page is always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) part of the ring buffer, but the reader page is not. Whenever there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) has been less than a full page that has been committed inside the ring buffer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) and a reader swaps out a page, it will be swapping out the commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345)             reader page    commit page   tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346)                 |              |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347)                 v              |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348)                +---+           |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349)                |   |<----------+             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350)                |   |<------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351)                |   |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352)                +---+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359)                           ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361)                       head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) In this case, the head page will not move when the tail and commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) move back into the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) The reader cannot swap a page into the ring buffer if the commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) is still on that page. If the read meets the last commit (real commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) not pending or reserved), then there is nothing more to read.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) The buffer is considered empty until another full commit finishes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) When the tail meets the head page, if the buffer is in overwrite mode,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) the head page will be pushed ahead one. If the buffer is in producer/consumer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) mode, the write will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) Overwrite mode::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385)                           ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387)                       head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397)                                    ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398)                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399)                                head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409)                                    ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410)                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411)                                head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) Note, the reader page will still point to the previous head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) But when a swap takes place, it will use the most recent head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) Making the Ring Buffer Lockless:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) The main idea behind the lockless algorithm is to combine the moving
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) of the head_page pointer with the swapping of pages with the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) State flags are placed inside the pointer to the page. To do this,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) each page must be aligned in memory by 4 bytes. This will allow the 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) least significant bits of the address to be used as flags, since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) they will always be zero for the address. To get the address,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) simply mask out the flags::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428)   MASK = ~3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430)   address & MASK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) Two flags will be kept by these two bits:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434)    HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) 	- the page being pointed to is a head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437)    UPDATE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) 	- the page being pointed to is being updated by a writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439)           and was or is about to be a head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 	      reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) 		  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) 		  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) 		+---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) 		|   |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) 		+---+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) 			    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) 			    v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) 	+---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452)     <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453)     --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) 	+---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) The above pointer "-H->" would have the HEADER flag set. That is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) the next page is the next page to be swapped out by the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) This pointer means the next page is the head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) When the tail page meets the head pointer, it will use cmpxchg to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) change the pointer to the UPDATE state::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469)   <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477)   <---|   |--->|   |-U->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) "-U->" represents a pointer in the UPDATE state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) Any access to the reader will need to take some sort of lock to serialize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) the readers. But the writers will never take a lock to write to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) ring buffer. This means we only need to worry about a single reader,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) and writes only preempt in "stack" formation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) When the reader tries to swap the page with the ring buffer, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) will also use cmpxchg. If the flag bit in the pointer to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) head page does not have the HEADER flag set, the compare will fail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) and the reader will need to look for the new head page and try again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) Note, the flags UPDATE and HEADER are never set at the same time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) The reader swaps the reader page as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498)   |page  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500)                   +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501)                   |   |--->|   |--->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502)                   |   |<---|   |<---|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503)                   +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504)                    ^ |               ^ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505)                    | +---------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506)                    +-----H-------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) The reader sets the reader page next pointer as HEADER to the page after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) the head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514)   |page  |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515)   +------+                   v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516)     |             +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517)     |             |   |--->|   |--->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518)     |             |   |<---|   |<---|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519)     |             +---+    +---+    +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520)     |              ^ |               ^ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521)     |              | +---------------+ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522)     |              +-----H-------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523)     +--------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) It does a cmpxchg with the pointer to the previous head page to make it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) point to the reader page. Note that the new pointer does not have the HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) flag set.  This action atomically moves the head page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531)   |page  |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532)   +------+                   v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533)     |  ^          +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534)     |  |          |   |-->|   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535)     |  |          |   |<--|   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536)     |  |          +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537)     |  |             |             ^ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538)     |  |             +-------------+ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539)     |  +-----------------------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) After the new head page is set, the previous pointer of the head page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) updated to the reader page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546)   |reader|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547)   |page  |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548)   +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549)     |  ^          +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550)     |  |          |   |-->|   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551)     |  |          |   |   |   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552)     |  |          +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553)     |  |             |             ^ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554)     |  |             +-------------+ |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555)     |  +-----------------------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558)   +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559)   |buffer|          RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560)   |page  |-------H-----------+  <--- New head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561)   +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562)     |  ^          +---+   +---+   +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563)     |  |          |   |   |   |-->|   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564)     |  |  New     |   |   |   |<--|   |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565)     |  | Reader   +---+   +---+   +---+  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566)     |  |  page ----^                 |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567)     |  |                             |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568)     |  +-----------------------------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569)     +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) Another important point: The page that the reader page points back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) by its previous pointer (the one that now points to the new head page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) never points back to the reader page. That is because the reader page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) not part of the ring buffer. Traversing the ring buffer via the next pointers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) will always stay in the ring buffer. Traversing the ring buffer via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) prev pointers may not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) Note, the way to determine a reader page is simply by examining the previous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) pointer of the page. If the next pointer of the previous page does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) point back to the original page, then the original page is a reader page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583)              +--------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584)              | reader |  next   +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585)              |  page  |-------->|    |<====== (buffer page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586)              +--------+         +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587)                  |                | ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588)                  |                v | next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589)             prev |              +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590)                  +------------->|    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591)                                 +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) The way the head page moves forward:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) When the tail page meets the head page and the buffer is in overwrite mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) and more writes take place, the head page must be moved forward before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) writer may move the tail page. The way this is done is that the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) performs a cmpxchg to convert the pointer to the head page from the HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) flag to have the UPDATE flag set. Once this is done, the reader will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) not be able to swap the head page from the buffer, nor will it be able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) move the head page, until the writer is finished with the move.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) This eliminates any races that the reader can have on the writer. The reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) must spin, and this is why the reader cannot preempt the writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610)   <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618)   <---|   |--->|   |-U->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) The following page will be made into the new head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624)              tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628)   <---|   |--->|   |-U->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) After the new head page has been set, we can set the old head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) pointer back to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635)              tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639)   <---|   |--->|   |--->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) After the head page has been moved, the tail page may now move forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649)   <---|   |--->|   |--->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) The above are the trivial updates. Now for the more complex scenarios.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) As stated before, if enough writes preempt the first write, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) tail page may make it all the way around the buffer and meet the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) page. At this time, we must start dropping writes (usually with some kind
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) of warning to the user). But what happens if the commit was still on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) reader page? The commit page is not part of the ring buffer. The tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) must account for this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665)             reader page    commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666)                 |              |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667)                 v              |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668)                +---+           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669)                |   |<----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670)                |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671)                |   |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672)                +---+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676)   <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679)                  ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681)              tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) If the tail page were to simply push the head page forward, the commit when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) leaving the reader page would not be pointing to the correct page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) The solution to this is to test if the commit page is on the reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) before pushing the head page. If it is, then it can be assumed that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) tail page wrapped the buffer, and we must drop new writes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) This is not a race condition, because the commit page can only be moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) by the outermost writer (the writer that was preempted).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) This means that the commit will not move while a writer is moving the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693) tail page. The reader cannot swap the reader page if it is also being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) used as the commit page. The reader can simply check that the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695) is off the reader page. Once the commit page leaves the reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) it will never go back on it unless a reader does another swap with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) buffer page that is also the commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) Nested writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) In the pushing forward of the tail page we must first push forward
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) the head page if the head page is the next page. If the head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) is not the next page, the tail page is simply updated with a cmpxchg.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) Only writers move the tail page. This must be done atomically to protect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) against nested writers::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710)   temp_page = tail_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711)   next_page = temp_page->next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712)   cmpxchg(tail_page, temp_page, next_page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) The above will update the tail page if it is still pointing to the expected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) page. If this fails, a nested write pushed it forward, the current write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) does not need to push it::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719)              temp page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) Nested write comes in and moves the tail page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732)                       tail page (moved by nested writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733)               temp page   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734)                  |        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)                  v        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737)   <---|   |--->|   |--->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) The above would fail the cmpxchg, but since the tail page has already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) been moved forward, the writer will just try again to reserve storage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) on the new tail page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) But the moving of the head page is a bit more complex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751)   <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) The write converts the head page pointer to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761)   <---|   |--->|   |-U->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) But if a nested writer preempts here, it will see that the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) page is a head page, but it is also nested. It will detect that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) it is nested and will save that information. The detection is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) fact that it sees the UPDATE flag instead of a HEADER or NORMAL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) pointer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) The nested writer will set the new head page pointer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773)              tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777)   <---|   |--->|   |-U->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) But it will not reset the update back to normal. Only the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) that converted a pointer from HEAD to UPDATE will convert it back
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789)   <---|   |--->|   |-U->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) After the nested writer finishes, the outermost writer will convert
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) the UPDATE pointer to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801)   <---|   |--->|   |--->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) It can be even more complex if several nested writes came in and moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) the tail page ahead several pages::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816)   <---|   |--->|   |-H->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) The write converts the head page pointer to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822)               tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826)   <---|   |--->|   |-U->|   |--->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) Next writer comes in, and sees the update and sets up the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831) head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833)   (second writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835)              tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836)                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837)                  v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839)   <---|   |--->|   |-U->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) The nested writer moves the tail page forward. But does not set the old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) update page to NORMAL because it is not the outermost writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850)   <---|   |--->|   |-U->|   |-H->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) Another writer preempts and sees the page after the tail page is a head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) It changes it from HEAD to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857)   (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863)   <---|   |--->|   |-U->|   |-U->|   |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) The writer will move the head page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870)   (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876)   <---|   |--->|   |-U->|   |-U->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) But now that the third writer did change the HEAD flag to UPDATE it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) will convert it to normal::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884)   (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886)                       tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887)                           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888)                           v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890)   <---|   |--->|   |-U->|   |--->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) Then it will move the tail page, and return back to the second writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898)   (second writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900)                                tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901)                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902)                                    v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904)   <---|   |--->|   |-U->|   |--->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) The second writer will fail to move the tail page because it was already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910) moved, so it will try again and add its data to the new tail page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) It will return to the first writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916)                                tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917)                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918)                                    v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920)   <---|   |--->|   |-U->|   |--->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924) The first writer cannot know atomically if the tail page moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) while it updates the HEAD page. It will then update the head page to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) what it thinks is the new head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931)                                tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932)                                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933)                                    v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935)   <---|   |--->|   |-U->|   |-H->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939) Since the cmpxchg returns the old value of the pointer the first writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) will see it succeeded in updating the pointer from NORMAL to HEAD.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) But as we can see, this is not good enough. It must also check to see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) if the tail page is either where it use to be or on the next page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947)                  A        B    tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948)                  |        |        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949)                  v        v        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951)   <---|   |--->|   |-U->|   |-H->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955) If tail page != A and tail page != B, then it must reset the pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) back to NORMAL. The fact that it only needs to worry about nested
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957) writers means that it only needs to check this after setting the HEAD page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962)                  A        B    tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963)                  |        |        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964)                  v        v        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966)   <---|   |--->|   |-U->|   |--->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) Now the writer can update the head page. This is also why the head page must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) remain in UPDATE and only reset by the outermost writer. This prevents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) the reader from seeing the incorrect head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975)   (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977)                  A        B    tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978)                  |        |        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979)                  v        v        v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980)       +---+    +---+    +---+    +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)   <---|   |--->|   |--->|   |--->|   |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982)   --->|   |<---|   |<---|   |<---|   |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983)       +---+    +---+    +---+    +---+