^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Lockless Ring Buffer Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Copyright 2009 Red Hat Inc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) :Author: Steven Rostedt <srostedt@redhat.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) :License: The GNU Free Documentation License, Version 1.2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) (dual licensed under the GPL v2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) :Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) and Frederic Weisbecker.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) Written for: 2.6.31
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) Terminology used in this Document
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) tail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) - where new writes happen in the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) head
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) - where new reads happen in the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) - the task that writes into the ring buffer (same as writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) - same as producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) consumer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) - the task that reads from the buffer (same as reader)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) - same as consumer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) reader_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) - A page outside the ring buffer used solely (for the most part)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) by the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) head_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) - a pointer to the page that the reader will use next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) tail_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) - a pointer to the page that will be written to next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) commit_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) - a pointer to the page with the last finished non-nested write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) cmpxchg
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) - hardware-assisted atomic transaction that performs the following::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) A = B if previous A == C
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) R = cmpxchg(A, C, B) is saying that we replace A with B if and only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) if current A is equal to C, and we put the old (current)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) A into R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) R gets the previous A regardless if A is updated with B or not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) To see if the update was successful a compare of ``R == C``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) may be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) The Generic Ring Buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) The ring buffer can be used in either an overwrite mode or in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) producer/consumer mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) Producer/consumer mode is where if the producer were to fill up the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) buffer before the consumer could free up anything, the producer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) will stop writing to the buffer. This will lose most recent events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) Overwrite mode is where if the producer were to fill up the buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) before the consumer could free up anything, the producer will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) overwrite the older data. This will lose the oldest events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) No two writers can write at the same time (on the same per-cpu buffer),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) but a writer may interrupt another writer, but it must finish writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) before the previous writer may continue. This is very important to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) algorithm. The writers act like a "stack". The way interrupts works
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) enforces this behavior::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) writer1 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) <preempted> writer2 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) <preempted> writer3 start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) writer3 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) writer2 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) writer1 finishes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) This is very much like a writer being preempted by an interrupt and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) the interrupt doing a write as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) Readers can happen at any time. But no two readers may run at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) same time, nor can a reader preempt/interrupt another reader. A reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) cannot preempt/interrupt a writer, but it may read/consume from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) buffer at the same time as a writer is writing, but the reader must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) on another processor to do so. A reader may read on its own processor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) and can be preempted by a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) A writer can preempt a reader, but a reader cannot preempt a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) But a reader can read the buffer at the same time (on another processor)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) as a writer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) The ring buffer is made up of a list of pages held together by a linked list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) At initialization a reader page is allocated for the reader that is not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) part of the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) The head_page, tail_page and commit_page are all initialized to point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) to the same page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) The reader page is initialized to have its next pointer pointing to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) the head page, and its previous pointer pointing to a page before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) the head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) The reader has its own page to use. At start up time, this page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) allocated but is not attached to the list. When the reader wants
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) to read from the buffer, if its page is empty (like it is on start-up),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) it will swap its page with the head_page. The old reader page will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) become part of the ring buffer and the head_page will be removed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) The page after the inserted page (old reader_page) will become the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) new head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) Once the new page is given to the reader, the reader could do what
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) it wants with it, as long as a writer has left that page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) A sample of how the reader page is swapped: Note this does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) show the head page in the buffer, it is for demonstrating a swap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) |page |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) | |-->| |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) | |<--| |<--| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) ^ | ^ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) | +-------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) +-----------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) |page |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) +------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) | +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) | | |-->| |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) | | |<--| |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) | +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) | ^ | ^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) | | +-------------+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) | +-----------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) |page |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) | ^ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) | | | |-->| |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) | | | | | |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) | | +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) | | | ^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) | | +-------------+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) | +-----------------------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) |buffer| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) |page |-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) | ^ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) | | | | | |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) | | New | | | |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) | | Reader +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) | | page ----^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) | | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) | +-----------------------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) It is possible that the page swapped is the commit page and the tail page,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) if what is in the ring buffer is less than what is held in a buffer page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) reader page commit page tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) v | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) +---+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) | |<----------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) | |<------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) | |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) This case is still valid for this algorithm.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) When the writer leaves the page, it simply goes into the ring buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) since the reader page still points to the next location in the ring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) The main pointers:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) - The page used solely by the reader and is not part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) of the ring buffer (may be swapped in)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) - the next page in the ring buffer that will be swapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) with the reader page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) - the page where the next write will take place.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) - the page that last finished a write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) The commit page only is updated by the outermost writer in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) writer stack. A writer that preempts another writer will not move the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) When data is written into the ring buffer, a position is reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) in the ring buffer and passed back to the writer. When the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) is finished writing data into that position, it commits the write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) Another write (or a read) may take place at anytime during this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) transaction. If another write happens it must finish before continuing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) with the previous write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) Write reserve::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) +---------+ <--- given back to writer (current commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) | empty |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) Write commit::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) +---------+ <--- next position for write (current commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) | empty |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) If a write happens after the first reserve::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) +---------+ <-- current commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) +---------+ <--- given back to second writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) After second writer commits::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) +---------+ <--(last full commit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) |reserved |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) |pending |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) |commit |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) +---------+ <--- tail pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) When the first writer commits::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) Buffer page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) +---------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) |written |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) +---------+ <--(last full commit and tail pointer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) The commit pointer points to the last write location that was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) committed without preempting another write. When a write that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) preempted another write is committed, it only becomes a pending commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) and will not be a full commit until all writes have been committed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) The commit page points to the page that has the last full commit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) The tail page points to the page with the last write (before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) committing).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) The tail page is always equal to or after the commit page. It may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) be several pages ahead. If the tail page catches up to the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) page then no more writes may take place (regardless of the mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) of the ring buffer: overwrite and produce/consumer).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) The order of pages is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) Possible scenario::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) head page commit page |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) v v v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) There is a special case that the head page is after either the commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) and possibly the tail page. That is when the commit (and tail) page has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) swapped with the reader page. This is because the head page is always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) part of the ring buffer, but the reader page is not. Whenever there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) has been less than a full page that has been committed inside the ring buffer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) and a reader swaps out a page, it will be swapping out the commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) reader page commit page tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) v | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) +---+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) | |<----------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) | |<------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) | |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) In this case, the head page will not move when the tail and commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) move back into the ring buffer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) The reader cannot swap a page into the ring buffer if the commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) is still on that page. If the read meets the last commit (real commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) not pending or reserved), then there is nothing more to read.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) The buffer is considered empty until another full commit finishes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) When the tail meets the head page, if the buffer is in overwrite mode,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) the head page will be pushed ahead one. If the buffer is in producer/consumer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) mode, the write will fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) Overwrite mode::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) Note, the reader page will still point to the previous head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) But when a swap takes place, it will use the most recent head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) Making the Ring Buffer Lockless:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) --------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) The main idea behind the lockless algorithm is to combine the moving
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) of the head_page pointer with the swapping of pages with the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) State flags are placed inside the pointer to the page. To do this,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) each page must be aligned in memory by 4 bytes. This will allow the 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) least significant bits of the address to be used as flags, since
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) they will always be zero for the address. To get the address,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) simply mask out the flags::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) MASK = ~3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) address & MASK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) Two flags will be kept by these two bits:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) - the page being pointed to is a head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) UPDATE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) - the page being pointed to is being updated by a writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) and was or is about to be a head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) | |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448) +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) The above pointer "-H->" would have the HEADER flag set. That is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) the next page is the next page to be swapped out by the reader.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) This pointer means the next page is the head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) When the tail page meets the head pointer, it will use cmpxchg to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) change the pointer to the UPDATE state::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477) <---| |--->| |-U->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) "-U->" represents a pointer in the UPDATE state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) Any access to the reader will need to take some sort of lock to serialize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) the readers. But the writers will never take a lock to write to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485) ring buffer. This means we only need to worry about a single reader,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486) and writes only preempt in "stack" formation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488) When the reader tries to swap the page with the ring buffer, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489) will also use cmpxchg. If the flag bit in the pointer to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) head page does not have the HEADER flag set, the compare will fail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491) and the reader will need to look for the new head page and try again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492) Note, the flags UPDATE and HEADER are never set at the same time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494) The reader swaps the reader page as follows::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498) |page |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500) +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501) | |--->| |--->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) | |<---| |<---| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503) +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504) ^ | ^ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505) | +---------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) +-----H-------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508) The reader sets the reader page next pointer as HEADER to the page after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) the head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514) |page |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515) +------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) | +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517) | | |--->| |--->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518) | | |<---| |<---| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519) | +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) | ^ | ^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521) | | +---------------+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522) | +-----H-------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) +--------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) It does a cmpxchg with the pointer to the previous head page to make it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526) point to the reader page. Note that the new pointer does not have the HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) flag set. This action atomically moves the head page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) |page |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) +------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) | ^ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) | | | |-->| |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) | | | |<--| |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) | | +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) | | | ^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) | | +-------------+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) | +-----------------------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) After the new head page is set, the previous pointer of the head page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543) updated to the reader page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) |reader| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) |page |-------H-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) | ^ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550) | | | |-->| |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) | | | | | |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) | | +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) | | | ^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) | | +-------------+ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) | +-----------------------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) +------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) |buffer| RING BUFFER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) |page |-------H-----------+ <--- New head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) +------+ <---------------+ v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) | ^ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) | | | | | |-->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) | | New | | | |<--| |<-+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) | | Reader +---+ +---+ +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566) | | page ----^ | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567) | | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568) | +-----------------------------+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569) +------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571) Another important point: The page that the reader page points back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572) by its previous pointer (the one that now points to the new head page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573) never points back to the reader page. That is because the reader page is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574) not part of the ring buffer. Traversing the ring buffer via the next pointers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) will always stay in the ring buffer. Traversing the ring buffer via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576) prev pointers may not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578) Note, the way to determine a reader page is simply by examining the previous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579) pointer of the page. If the next pointer of the previous page does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) point back to the original page, then the original page is a reader page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583) +--------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584) | reader | next +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585) | page |-------->| |<====== (buffer page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586) +--------+ +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) | | ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588) | v | next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589) prev | +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590) +------------->| |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) +----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593) The way the head page moves forward:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) When the tail page meets the head page and the buffer is in overwrite mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596) and more writes take place, the head page must be moved forward before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) writer may move the tail page. The way this is done is that the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) performs a cmpxchg to convert the pointer to the head page from the HEADER
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599) flag to have the UPDATE flag set. Once this is done, the reader will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600) not be able to swap the head page from the buffer, nor will it be able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601) move the head page, until the writer is finished with the move.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603) This eliminates any races that the reader can have on the writer. The reader
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604) must spin, and this is why the reader cannot preempt the writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) <---| |--->| |-U->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) The following page will be made into the new head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) <---| |--->| |-U->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 632) After the new head page has been set, we can set the old head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 633) pointer back to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 634)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 635) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 636) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 637) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 638) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 639) <---| |--->| |--->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 640) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 641) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 642)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 643) After the head page has been moved, the tail page may now move forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 644)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 645) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 646) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 647) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 648) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 649) <---| |--->| |--->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 650) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 651) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 652)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 653)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 654) The above are the trivial updates. Now for the more complex scenarios.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 655)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 656)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 657) As stated before, if enough writes preempt the first write, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 658) tail page may make it all the way around the buffer and meet the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 659) page. At this time, we must start dropping writes (usually with some kind
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 660) of warning to the user). But what happens if the commit was still on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 661) reader page? The commit page is not part of the ring buffer. The tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 662) must account for this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 663)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 664)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 665) reader page commit page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 666) | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 667) v |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 668) +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 669) | |<----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 670) | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 671) | |------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 672) +---+ |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 673) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 674) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 675) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 676) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 677) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 678) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 679) ^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 680) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 681) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 682)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 683) If the tail page were to simply push the head page forward, the commit when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 684) leaving the reader page would not be pointing to the correct page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 685)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 686) The solution to this is to test if the commit page is on the reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 687) before pushing the head page. If it is, then it can be assumed that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 688) tail page wrapped the buffer, and we must drop new writes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 689)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 690) This is not a race condition, because the commit page can only be moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 691) by the outermost writer (the writer that was preempted).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 692) This means that the commit will not move while a writer is moving the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 693) tail page. The reader cannot swap the reader page if it is also being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 694) used as the commit page. The reader can simply check that the commit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 695) is off the reader page. Once the commit page leaves the reader page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 696) it will never go back on it unless a reader does another swap with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) buffer page that is also the commit page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) Nested writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) In the pushing forward of the tail page we must first push forward
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) the head page if the head page is the next page. If the head page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) is not the next page, the tail page is simply updated with a cmpxchg.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) Only writers move the tail page. This must be done atomically to protect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) against nested writers::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) temp_page = tail_page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) next_page = temp_page->next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712) cmpxchg(tail_page, temp_page, next_page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) The above will update the tail page if it is still pointing to the expected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) page. If this fails, a nested write pushed it forward, the current write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) does not need to push it::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) temp page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) Nested write comes in and moves the tail page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) tail page (moved by nested writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) temp page |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735) v v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) <---| |--->| |--->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) The above would fail the cmpxchg, but since the tail page has already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742) been moved forward, the writer will just try again to reserve storage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) on the new tail page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) But the moving of the head page is a bit more complex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) The write converts the head page pointer to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) <---| |--->| |-U->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765) But if a nested writer preempts here, it will see that the next
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) page is a head page, but it is also nested. It will detect that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) it is nested and will save that information. The detection is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) fact that it sees the UPDATE flag instead of a HEADER or NORMAL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) pointer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) The nested writer will set the new head page pointer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) <---| |--->| |-U->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) But it will not reset the update back to normal. Only the writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) that converted a pointer from HEAD to UPDATE will convert it back
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783) to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789) <---| |--->| |-U->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) After the nested writer finishes, the outermost writer will convert
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) the UPDATE pointer to NORMAL::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) <---| |--->| |--->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) It can be even more complex if several nested writes came in and moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) the tail page ahead several pages::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) <---| |--->| |-H->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) The write converts the head page pointer to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) <---| |--->| |-U->| |--->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) Next writer comes in, and sees the update and sets up the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831) head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) (second writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839) <---| |--->| |-U->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) The nested writer moves the tail page forward. But does not set the old
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) update page to NORMAL because it is not the outermost writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850) <---| |--->| |-U->| |-H->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) Another writer preempts and sees the page after the tail page is a head page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) It changes it from HEAD to UPDATE::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) <---| |--->| |-U->| |-U->| |--->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) The writer will move the head page forward::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) <---| |--->| |-U->| |-U->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880) But now that the third writer did change the HEAD flag to UPDATE it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) will convert it to normal::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) (third writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890) <---| |--->| |-U->| |--->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) Then it will move the tail page, and return back to the second writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) (second writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) <---| |--->| |-U->| |--->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909) The second writer will fail to move the tail page because it was already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910) moved, so it will try again and add its data to the new tail page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) It will return to the first writer::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920) <---| |--->| |-U->| |--->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924) The first writer cannot know atomically if the tail page moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) while it updates the HEAD page. It will then update the head page to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) what it thinks is the new head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932) |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935) <---| |--->| |-U->| |-H->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939) Since the cmpxchg returns the old value of the pointer the first writer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) will see it succeeded in updating the pointer from NORMAL to HEAD.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) But as we can see, this is not good enough. It must also check to see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) if the tail page is either where it use to be or on the next page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) A B tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949) v v v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951) <---| |--->| |-U->| |-H->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955) If tail page != A and tail page != B, then it must reset the pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) back to NORMAL. The fact that it only needs to worry about nested
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957) writers means that it only needs to check this after setting the HEAD page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) A B tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) v v v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966) <---| |--->| |-U->| |--->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) Now the writer can update the head page. This is also why the head page must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) remain in UPDATE and only reset by the outermost writer. This prevents
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) the reader from seeing the incorrect head page::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975) (first writer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) A B tail page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978) | | |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) v v v
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) +---+ +---+ +---+ +---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981) <---| |--->| |--->| |--->| |-H->
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) --->| |<---| |<---| |<---| |<---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983) +---+ +---+ +---+ +---+