Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) dm-zoned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) The dm-zoned device mapper target exposes a zoned block device (ZBC and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) ZAC compliant devices) as a regular block device without any write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) pattern constraints. In effect, it implements a drive-managed zoned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) block device which hides from the user (a file system or an application
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) doing raw block device accesses) the sequential write constraints of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) host-managed zoned block devices and can mitigate the potential
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) device-side performance degradation due to excessive random writes on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) host-aware zoned block devices.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) For a more detailed description of the zoned block device models and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) their constraints see (for SCSI devices):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) https://www.t10.org/drafts.htm#ZBC_Family
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) and (for ATA devices):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) The dm-zoned implementation is simple and minimizes system overhead (CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) and memory usage as well as storage capacity loss). For a 10TB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) host-managed disk with 256 MB zones, dm-zoned memory usage per disk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) instance is at most 4.5 MB and as little as 5 zones will be used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) internally for storing metadata and performing reclaim operations.
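
The storage overhead quoted above can be checked with a rough calculation. The sketch below is only an estimate under stated assumptions: the 8-byte mapping entry and the one-validity-bit-per-4096-byte-block bitmap are assumptions made for illustration, not figures given in this document.

```python
# Back-of-the-envelope metadata sizing for the 10 TB / 256 MB-zone example.
# Assumptions (not stated in this document): 8 bytes per chunk mapping
# entry, one validity bit per 4096-byte block, one super block per set.
BLOCK_SIZE = 4096
ZONE_SIZE = 256 * 2**20      # 256 MiB zones
DISK_SIZE = 10 * 10**12      # 10 TB (decimal)
MAP_ENTRY_SIZE = 8           # assumed size of one mapping entry


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def metadata_zones_per_set(disk_size=DISK_SIZE, zone_size=ZONE_SIZE):
    """Return (zone count, metadata blocks per set, zones per set)."""
    nr_zones = disk_size // zone_size
    blocks_per_zone = zone_size // BLOCK_SIZE
    map_blocks = ceil_div(nr_zones * MAP_ENTRY_SIZE, BLOCK_SIZE)
    # Each zone needs a whole number of bitmap blocks.
    bitmap_blocks = nr_zones * ceil_div(blocks_per_zone, 8 * BLOCK_SIZE)
    total_blocks = 1 + map_blocks + bitmap_blocks   # 1 = super block
    return nr_zones, total_blocks, ceil_div(total_blocks, blocks_per_zone)


if __name__ == "__main__":
    nr_zones, blocks, zones = metadata_zones_per_set()
    print(f"{nr_zones} zones, {blocks} metadata blocks, {zones} zones per set")
```

Under these assumptions one metadata set fits in two zones. With two metadata sets and at least one zone kept aside for reclaim, this would be consistent with the figure of five zones quoted above, although that breakdown is an inference and not something the text states.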
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) dm-zoned target devices are formatted and checked using the dmzadm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) utility available at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) https://github.com/hgst/dm-zoned-tools
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) Algorithm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) dm-zoned implements an on-disk buffering scheme to handle non-sequential
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) write accesses to the sequential zones of a zoned block device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Conventional zones are used for caching as well as for storing internal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) metadata. It can also use a regular block device together with the zoned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) block device; in that case the regular block device will be split logically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) in zones with the same size as the zoned block device. These zones will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) placed in front of the zones from the zoned block device and will be handled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) just like conventional zones.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) The zones of the device(s) are separated into 2 types:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 1) Metadata zones: these are conventional zones used to store metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) Metadata zones are not reported as useable capacity to the user.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 2) Data zones: all remaining zones, the vast majority of which will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) sequential zones used exclusively to store user data. The conventional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) zones of the device may be used also for buffering user random writes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) Data in these zones may be directly mapped to the conventional zone, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) later moved to a sequential zone so that the conventional zone can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) reused for buffering incoming random writes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) dm-zoned exposes a logical device with a sector size of 4096 bytes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) irrespective of the physical sector size of the backend zoned block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) device being used. This allows reducing the amount of metadata needed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) manage valid blocks (blocks written).
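
To illustrate why a larger logical sector size reduces metadata, the following sketch compares the size of a per-zone validity bitmap for 512-byte and 4096-byte blocks. It assumes one bit per block and a 256 MiB zone; both are illustrative choices.

```python
def bitmap_bytes_per_zone(zone_size, block_size):
    """Bytes needed for a one-bit-per-block validity bitmap of one zone."""
    nr_blocks = zone_size // block_size
    return (nr_blocks + 7) // 8


ZONE_SIZE = 256 * 2**20  # 256 MiB, assumed for illustration

if __name__ == "__main__":
    print(bitmap_bytes_per_zone(ZONE_SIZE, 512))    # 65536
    print(bitmap_bytes_per_zone(ZONE_SIZE, 4096))   # 8192
```

Tracking validity at 4096-byte granularity needs one eighth of the bitmap space that 512-byte granularity would.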
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) The on-disk metadata format is as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 1) The first block of the first conventional zone found contains the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) super block which describes the amount and on-disk position of metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) blocks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 2) Following the super block, a set of blocks is used to describe the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) mapping of the logical device blocks. The mapping is done per chunk of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) blocks, with the chunk size equal to the zoned block device zone size. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) mapping table is indexed by chunk number and each mapping entry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) indicates the zone number of the device storing the chunk of data. Each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) mapping entry may also indicate the zone number of a conventional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) zone used to buffer random modifications to the data zone.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 3) A set of blocks used to store bitmaps indicating the validity of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) blocks in the data zones follows the mapping table. A valid block is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) defined as a block that was written and not discarded. For a buffered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) data chunk, a block is valid in at most one zone: either the data zone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) mapping the chunk or the buffer zone of the chunk.
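
The chunk mapping described in item 2 can be pictured as a table indexed by chunk number. The sketch below is an in-memory illustration only, assuming that one chunk spans exactly one zone's worth of blocks; the names MapEntry, locate and zones_for_block are hypothetical and do not reflect the on-disk layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MapEntry:
    """One mapping table entry (hypothetical in-memory form)."""
    data_zone: Optional[int] = None     # zone storing the chunk, if mapped
    buffer_zone: Optional[int] = None   # conventional zone buffering it, if any


def locate(block: int, blocks_per_zone: int) -> Tuple[int, int]:
    """Split a logical block number into (chunk number, offset in chunk)."""
    return divmod(block, blocks_per_zone)


def zones_for_block(table: List[MapEntry], block: int, blocks_per_zone: int):
    """Return (data zone, buffer zone, offset) for a logical block."""
    chunk, offset = locate(block, blocks_per_zone)
    entry = table[chunk]
    return entry.data_zone, entry.buffer_zone, offset


if __name__ == "__main__":
    table = [MapEntry(), MapEntry(data_zone=12, buffer_zone=3)]
    print(zones_for_block(table, 70000, 65536))   # (12, 3, 4464)
```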
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) For a logical chunk mapped to a conventional zone, all write operations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) are processed by directly writing to the zone. If the mapping zone is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) sequential zone, the write operation is processed directly only if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) write offset within the logical chunk is equal to the write pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) offset within the sequential data zone (i.e. the write operation is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) aligned on the zone write pointer). Otherwise, write operations are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) processed indirectly using a buffer zone. In that case, an unused
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) conventional zone is allocated and assigned to the chunk being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) accessed. Writing a block to the buffer zone of a chunk will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) automatically invalidate the same block in the sequential zone mapping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) the chunk. If all blocks of the sequential zone become invalid, the zone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) is freed and the chunk buffer zone becomes the primary zone mapping the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) chunk, resulting in native random write performance similar to a regular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) block device.
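
The write handling above can be sketched as a small decision function. This is a simplified single-block model, not the kernel implementation: the Zone and Chunk classes, the alloc_conventional callback and the returned labels are all hypothetical, and invalidating a buffered copy on an aligned sequential write is an assumption added to keep each block valid in only one zone.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class Zone:
    sequential: bool
    write_pointer: int = 0                         # next writable block offset
    valid: Set[int] = field(default_factory=set)   # offsets of valid blocks


@dataclass
class Chunk:
    data_zone: Zone
    buffer_zone: Optional[Zone] = None


def write_block(chunk: Chunk, offset: int,
                alloc_conventional: Callable[[], Zone]) -> str:
    """Write one block of a chunk and report which path was taken."""
    dz = chunk.data_zone
    if not dz.sequential:
        # Chunk mapped to a conventional zone: write in place.
        dz.valid.add(offset)
        return "direct-conventional"
    if offset == dz.write_pointer:
        # Aligned on the write pointer: write to the sequential zone.
        dz.valid.add(offset)
        dz.write_pointer += 1
        if chunk.buffer_zone is not None:
            chunk.buffer_zone.valid.discard(offset)   # assumption, see lead-in
        return "direct-sequential"
    # Unaligned write: go through a buffer zone, allocating one if needed.
    if chunk.buffer_zone is None:
        chunk.buffer_zone = alloc_conventional()
    chunk.buffer_zone.valid.add(offset)
    dz.valid.discard(offset)          # same block is now invalid in data zone
    if not dz.valid:
        # Sequential zone fully invalid: the buffer zone becomes the mapping.
        chunk.data_zone, chunk.buffer_zone = chunk.buffer_zone, None
    return "buffered"
```

For example, writing offset 0, then offset 5, then offset 0 again to a chunk mapped to an empty sequential zone takes the direct, buffered and buffered paths in turn, after which the chunk is mapped to the former buffer zone only.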
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) Read operations are processed according to the block validity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) information provided by the bitmaps. Valid blocks are read either from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) the sequential zone mapping a chunk, or if the chunk is buffered, from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) the buffer zone assigned. If the accessed chunk has no mapping, or the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) accessed blocks are invalid, the read buffer is zeroed and the read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) operation terminated.
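
A minimal sketch of this read resolution follows. Zones are modelled here as plain dictionaries holding only their valid blocks, which stands in for the validity bitmaps; this representation and the function name read_block are illustrative assumptions.

```python
BLOCK_SIZE = 4096


def read_block(offset, data_zone, buffer_zone):
    """Return the contents of one block of a chunk.

    Each zone is a dict mapping valid block offsets to their bytes, or
    None when the chunk has no such zone. A block is expected to be
    valid in at most one of the two zones.
    """
    for zone in (buffer_zone, data_zone):
        if zone is not None and offset in zone:
            return zone[offset]
    # Unmapped chunk or invalid block: the read returns zeroes.
    return bytes(BLOCK_SIZE)


if __name__ == "__main__":
    data = {0: b"\x01" * BLOCK_SIZE}
    buf = {5: b"\x02" * BLOCK_SIZE}
    print(read_block(0, data, buf)[:1])     # b'\x01'
    print(read_block(5, data, buf)[:1])     # b'\x02'
    print(read_block(9, data, buf)[:1])     # b'\x00'
```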
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) After some time, the limited number of conventional zones available may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) be exhausted (all used to map chunks or buffer sequential zones) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) unaligned writes to unbuffered chunks become impossible. To avoid this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) situation, a reclaim process regularly scans used conventional zones and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) tries to reclaim the least recently used zones by copying the valid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) blocks of the buffer zone to a free sequential zone. Once the copy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) completes, the chunk mapping is updated to point to the sequential zone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) and the buffer zone freed for reuse.
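
The reclaim step can be sketched as follows. This is a deliberately simplified model: zones are dictionaries, "atime" is a hypothetical last-access counter used to pick the least recently used zone, and the copy ignores any valid blocks that may still live in a sequential zone of the same chunk.

```python
def reclaim_lru(used_conventional, free_sequential, chunk_map):
    """Move the least recently used conventional zone to a sequential zone.

    used_conventional: list of dicts {"id", "chunk", "atime", "blocks"}
    free_sequential:   list of dicts {"id", "blocks"}
    chunk_map:         dict mapping chunk number to zone id
    Returns the id of the freed conventional zone, or None if nothing
    could be reclaimed.
    """
    if not used_conventional or not free_sequential:
        return None
    victim = min(used_conventional, key=lambda z: z["atime"])
    target = free_sequential.pop()
    # Copy valid blocks in ascending offset order, as a sequential zone
    # has to be written front to back.
    target["blocks"] = {off: victim["blocks"][off]
                        for off in sorted(victim["blocks"])}
    chunk_map[victim["chunk"]] = target["id"]   # remap chunk after the copy
    used_conventional.remove(victim)
    victim["blocks"] = {}                       # zone is free for reuse
    return victim["id"]
```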
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Metadata Protection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) To protect metadata against corruption in case of sudden power loss or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) system crash, 2 sets of metadata zones are used. One set, the primary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) set, is used as the main metadata region, while the secondary set is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) used as a staging area. Modified metadata is first written to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) secondary set and validated by updating the super block in the secondary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) set; a generation counter is used to indicate that this set contains the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) newest metadata. Once this operation completes, in-place metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) block updates can be done in the primary metadata set. This ensures that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) one of the sets is always consistent (all modifications committed or none
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) at all). Flush operations are used as a commit point. Upon reception of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) a flush request, metadata modification activity is temporarily blocked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) (for both incoming BIO processing and reclaim process) and all dirty
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) metadata blocks are staged and updated. Normal operation is then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) resumed. Flushing metadata thus only temporarily delays write and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) discard requests. Read requests can be processed concurrently while
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) metadata flush is being executed.
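
The staging scheme can be illustrated with a toy model. The class below is a sketch under simplifying assumptions: metadata blocks are dictionary entries, the super block is reduced to a generation number, and recovery simply prefers the set with the highest generation. The class and the crash_before_primary flag are hypothetical.

```python
class MetadataSets:
    """Toy model of a primary and a secondary (staging) metadata set."""

    def __init__(self):
        self.primary = {"gen": 0, "blocks": {}}
        self.secondary = {"gen": 0, "blocks": {}}

    def flush(self, dirty, crash_before_primary=False):
        """Commit dirty blocks; optionally simulate a crash mid-commit."""
        new_gen = max(self.primary["gen"], self.secondary["gen"]) + 1
        # Step 1: stage the dirty blocks, then validate the staged set by
        # updating its generation (standing in for the super block write).
        self.secondary["blocks"].update(dirty)
        self.secondary["gen"] = new_gen
        if crash_before_primary:
            return
        # Step 2: update the primary set in place.
        self.primary["blocks"].update(dirty)
        self.primary["gen"] = new_gen

    def recover(self):
        """Pick the set holding the newest committed metadata."""
        return max((self.primary, self.secondary), key=lambda s: s["gen"])
```

If a crash interrupts the in-place update of the primary set, the secondary set already carries the higher generation and is used on recovery; if both generations are equal, the primary set is taken.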
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) If a regular device is used in conjunction with the zoned block device,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) a third set of metadata (without the zone bitmaps) is written to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) start of the zoned block device. This metadata has a generation counter of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) '0' and will never be updated during normal operation; it just serves for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) identification purposes. The first and second copy of the metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) are located at the start of the regular block device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) Usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) A zoned block device must first be formatted using the dmzadm tool. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) will analyze the device zone configuration, determine where to place the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) metadata sets on the device and initialize the metadata sets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) Ex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 	dmzadm --format /dev/sdxx
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) If two drives are to be used, both devices must be specified, with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) regular block device as the first device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Ex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) 	dmzadm --format /dev/sdxx /dev/sdyy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) Formatted device(s) can also be started with the dmzadm utility:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) Ex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 	dmzadm --start /dev/sdxx /dev/sdyy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) Information about the internal layout and current usage of the zones can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) be obtained with the 'status' callback from dmsetup:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) Ex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 	dmsetup status /dev/dm-X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) will return a line
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 	0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) of unmapped (i.e. free) random zones, <nr_rnd> the total number of random
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) the total number of sequential zones.
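
A status line in the format shown above could be parsed along these lines. The sketch handles only that format, and the sample line uses made-up numbers; the function name parse_status is hypothetical.

```python
import re

# Matches the status line format described above, and nothing else.
STATUS_RE = re.compile(
    r"^(?P<start>\d+) (?P<size>\d+) zoned (?P<nr_zones>\d+) zones "
    r"(?P<nr_unmap_rnd>\d+)/(?P<nr_rnd>\d+) random "
    r"(?P<nr_unmap_seq>\d+)/(?P<nr_seq>\d+) sequential$"
)


def parse_status(line):
    """Parse one dm-zoned status line into a dict of integers."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = "0 19530252288 zoned 37252 zones 120/524 random 30000/36724 sequential"
    print(parse_status(sample))
```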
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) Normally the reclaim process will be started once fewer than 50 percent of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) the random zones are free. In order to start the reclaim process manually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) even before reaching this threshold the 'dmsetup message' function can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) used:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) Ex::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 	dmsetup message /dev/dm-X 0 reclaim
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) will start the reclaim process and random zones will be moved to sequential
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) zones.
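
Given the random zone counts from the status line, the 50 percent threshold mentioned above can be checked as in the sketch below. The function name and the handling of a device without random zones are assumptions; the actual decision is made inside the kernel.

```python
def reclaim_expected(nr_unmap_rnd, nr_rnd, threshold_percent=50):
    """Return True if the share of free random zones is below the threshold."""
    if nr_rnd == 0:
        return False   # assumption: nothing to reclaim without random zones
    # Integer comparison avoids floating-point rounding at the boundary.
    return nr_unmap_rnd * 100 < threshold_percent * nr_rnd


if __name__ == "__main__":
    print(reclaim_expected(120, 524))   # True: about 23 percent free
    print(reclaim_expected(262, 524))   # False: exactly 50 percent free
```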