Orange Pi5 kernel

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) dm-log-writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) This target takes 2 devices, one to pass all IO to normally, and one to log all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) of the write operations to.  This is intended for file system developers wishing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) to verify the integrity of metadata or data as the file system is written to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) There is a log_write_entry written for every WRITE request and the target is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) able to take arbitrary data from userspace to insert into the log.  The data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) that is in the WRITE requests is copied into the log to make the replay happen
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) exactly as it happened originally.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) Log Ordering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) We log things in order of completion once we are sure the write is no longer in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) cache.  This means that normal WRITE requests are not actually logged until the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) the log in a way that correlates to what is on disk and not what is in cache,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) to make it easier to detect improper waiting/flushing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) This works by attaching all WRITE requests to a list once the write completes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) Once we see a REQ_PREFLUSH request we splice this list onto the request and once
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) simulate the worst case scenario with regard to power failures.  Consider the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) following example (W means write, C means complete):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 	W1,W2,W3,C3,C2,Wflush,C1,Cflush
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) The log would show the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 	W3,W2,flush,W1....
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) Again this is to simulate what is actually on disk, this allows us to detect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) cases where a power failure at a particular point in time would create an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) inconsistent file system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) they complete as those requests will obviously bypass the device cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) have all the DISCARD requests, and then the WRITE requests and then the FLUSH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) request.  Consider the following example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 	WRITE block 1, DISCARD block 1, FLUSH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) If we logged DISCARD when it completed, the replay would look like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 	DISCARD 1, WRITE 1, FLUSH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) which isn't quite what happened and wouldn't be caught during the log replay.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) Target interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) i) Constructor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59)    log-writes <dev_path> <log_dev_path>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61)    ============= ==============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)    dev_path	 Device that all of the IO will go to normally.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63)    log_dev_path  Device where the log entries are written to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64)    ============= ==============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) ii) Status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68)     <#logged entries> <highest allocated sector>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)     =========================== ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)     #logged entries	        Number of logged entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)     highest allocated sector    Highest allocated sector
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73)     =========================== ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) iii) Messages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77)     mark <description>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 	You can use a dmsetup message to set an arbitrary mark in a log.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 	For example say you want to fsck a file system after every
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 	write, but first you need to replay up to the mkfs to make sure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 	we're fsck'ing something reasonable, you would do something like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 	this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 	  mkfs.btrfs -f /dev/mapper/log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 	  dmsetup message log 0 mark mkfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 	  <run test>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 	This would allow you to replay the log up to the mkfs mark and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 	then replay from that point on doing the fsck check in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 	interval that you want.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 	Every log has a mark at the end labeled "dm-log-writes-end".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) Userspace component
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) There is a userspace tool that will replay the log for you in various ways.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) It can be found here: https://github.com/josefbacik/log-writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) Example usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) Say you want to test fsync on your file system.  You would do something like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)   TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)   dmsetup create log --table "$TABLE"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)   mkfs.btrfs -f /dev/mapper/log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)   dmsetup message log 0 mark mkfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)   mount /dev/mapper/log /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)   <some test that does fsync at the end>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)   dmsetup message log 0 mark fsync
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)   md5sum /mnt/btrfs-test/foo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)   umount /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)   dmsetup remove log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)   replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)   mount /dev/sdb /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)   md5sum /mnt/btrfs-test/foo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)   <verify md5sum's are correct>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)   Another option is to do a complicated file system operation and verify the file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)   system is consistent during the entire operation.  You could do this with:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127)   TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)   dmsetup create log --table "$TABLE"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)   mkfs.btrfs -f /dev/mapper/log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)   dmsetup message log 0 mark mkfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)   mount /dev/mapper/log /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)   <fsstress to dirty the fs>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)   btrfs filesystem balance /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)   umount /mnt/btrfs-test
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)   dmsetup remove log
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)   replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)   btrfsck /dev/sdb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)   replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 	--fsck "btrfsck /dev/sdb" --check fua
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) And that will replay the log until it sees a FUA request, run the fsck command
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) and if the fsck passes it will replay to the next FUA, until it is completed or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) the fsck command exists abnormally.