^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0-only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) dm-clone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) dm-clone is a device mapper target which produces a one-to-one copy of an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) existing, read-only source device into a writable destination device. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) presents a virtual block device which makes all data appear immediately, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) redirects reads and writes accordingly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) The main use case of dm-clone is to clone a potentially remote, high-latency,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) read-only, archival-type block device into a writable, fast, primary-type device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) for fast, low-latency I/O. The cloned device is visible/mountable immediately
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) and the copy of the source device to the destination device happens in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) background, in parallel with user I/O.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) For example, one could restore an application backup from a read-only copy,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) etc.), into a local SSD or NVMe device, and start using the device immediately,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) without waiting for the restore to complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) When the cloning completes, the dm-clone table can be removed altogether and be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) replaced, e.g., by a linear table, mapping directly to the destination device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) The dm-clone target reuses the metadata library used by the thin-provisioning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) target.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) Glossary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) Hydration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)     The process of filling a region of the destination device with data from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37)     the same region of the source device, i.e., copying the region from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)     source to the destination device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) Once a region gets hydrated we redirect all I/O regarding it to the destination
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) ======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) Sub-devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) The target is constructed by passing three devices to it (along with other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) parameters detailed later):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) 1. A source device - the read-only device that gets cloned and the source of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) hydration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) 2. A destination device - the destination of the hydration, which will become a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) clone of the source device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) 3. A small metadata device - it records which regions are already valid in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) destination device, i.e., which regions have already been hydrated, or have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) been written to directly, via user I/O.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) The size of the destination device must be at least equal to the size of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) source device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) Regions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) dm-clone divides the source and destination devices into fixed-size regions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) Regions are the unit of hydration, i.e., the minimum amount of data copied from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) the source to the destination device.
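
The region arithmetic described above can be sketched in shell. This is only an
illustration: the helper names are hypothetical, all values are in 512-byte
sectors, and rounding the region count up to cover a trailing partial region is
an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helpers; all values are in 512-byte sectors.

# Index of the region that contains a given sector.
region_of_sector() {
    sector=$1
    region_size=$2
    echo $(( sector / region_size ))
}

# Number of regions needed to cover a device. The count is rounded up so
# that a trailing partial region is still counted (assumption of this sketch).
region_count() {
    dev_sectors=$1
    region_size=$2
    echo $(( (dev_sectors + region_size - 1) / region_size ))
}
```

For instance, `region_count 1048576000 8` reports how many 8-sector regions a
1048576000-sector device would be split into.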
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) The region size is configurable when you first create the dm-clone device. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) recommended region size is the same as the file system block size, which usually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) is 4KB. The region size must be a power of two between 8 sectors (4KB) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) 2097152 sectors (1GB).
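
A script that creates dm-clone devices may want to check the region size
before building the table. The following is a minimal sketch of such a check.
The function name is hypothetical and the limits are the ones stated above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the region size, given in
# 512-byte sectors, is a power of two between 8 and 2097152.
is_valid_region_size() {
    size=$1
    # Reject empty input, leading zeros and anything that is not a number.
    case $size in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ "$size" -ge 8 ] && [ "$size" -le 2097152 ] || return 1
    # A power of two has exactly one bit set, so size & (size - 1) is 0.
    [ $(( size & (size - 1) )) -eq 0 ]
}
```

It could be used as `is_valid_region_size 8 || echo "bad region size"`.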
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Reads and writes from/to hydrated regions are serviced from the destination
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) A read to a not yet hydrated region is serviced directly from the source device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) A write to a not yet hydrated region is delayed until the region has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) hydrated. The hydration of that region starts immediately.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) Note that a write request with size equal to region size will skip copying of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) the corresponding region from the source device and overwrite the region of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) destination device directly.
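
The routing rules above can be summarized as a small decision function. This is
a simplified model written for illustration, not the kernel implementation. The
function name and its output strings are hypothetical.

```shell
#!/bin/sh
# route_io OP HYDRATED IO_SECTORS REGION_SECTORS
#   OP is "read" or "write"; HYDRATED is 1 if the region is already hydrated.
# Prints where this simplified model sends the request.
route_io() {
    op=$1 hydrated=$2 io_sectors=$3 region_sectors=$4
    if [ "$hydrated" -eq 1 ]; then
        # Hydrated regions are always served by the destination device.
        echo "destination"
    elif [ "$op" = "read" ]; then
        # Reads of not yet hydrated regions go straight to the source.
        echo "source"
    elif [ "$io_sectors" -eq "$region_sectors" ]; then
        # A write as large as the region replaces it, so no copy is needed.
        echo "destination (overwrite, no copy)"
    else
        # Smaller writes wait until the region has been copied.
        echo "hydrate region, then destination"
    fi
}
```

For example, `route_io write 0 4 8` models a 4-sector write to a region of 8
sectors that has not been hydrated yet.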
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) Discards
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) dm-clone interprets a discard request to a range that hasn't been hydrated yet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) as a hint to skip hydration of the regions covered by the request, i.e., it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) skips copying the region's data from the source to the destination device, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) only updates its metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) If the destination device supports discards, then by default dm-clone will pass
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) down discard requests to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) Background Hydration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) dm-clone copies continuously from the source to the destination device, until
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) all of the device has been copied.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) Copying data from the source to the destination device uses bandwidth. The user
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) can set a throttle to prevent more than a certain amount of copying occurring at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) any one time. Moreover, dm-clone takes into account user I/O traffic going to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) the devices and pauses the background hydration when there is I/O in-flight.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) A message `hydration_threshold <#regions>` can be used to set the maximum number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) of regions being copied, the default being 1 region.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) dm-clone employs dm-kcopyd for copying portions of the source device to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) destination device. By default, we issue copy requests of size equal to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) region size. A message `hydration_batch_size <#regions>` can be used to tune the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) size of these copy requests. Increasing the hydration batch size results in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) dm-clone trying to batch together contiguous regions, so we copy the data in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) batches of this many regions.
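
As an illustration, both tunables can be set with `dmsetup message`, following
the same form as the `enable_hydration` command in the Examples section. The
helper below is a hypothetical sketch that only prints the commands, so they
can be reviewed before being run. The device name and the values used with it
are placeholders.

```shell
#!/bin/sh
# Print (rather than run) the dmsetup commands that set both tunables.
# Arguments: device name, hydration threshold, hydration batch size.
print_hydration_tuning() {
    dev=$1 threshold=$2 batch=$3
    echo "dmsetup message $dev 0 hydration_threshold $threshold"
    echo "dmsetup message $dev 0 hydration_batch_size $batch"
}
```

Running `print_hydration_tuning clone 256 64` prints one `dmsetup message`
command per tunable. The values 256 and 64 are arbitrary examples, not
recommendations.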
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) When the hydration of the destination device finishes, a dm event will be sent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) to user space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) Updating on-disk metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) such requests are made then commits will occur every second. This means the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) dm-clone device behaves like a physical disk that has a volatile write cache. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) power is lost you may lose some recent writes. The metadata should always be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) consistent in spite of any crash.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) Target Interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) Constructor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)    clone <metadata dev> <destination dev> <source dev> <region size>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)          [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) ================ ==============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) metadata dev     Fast device holding the persistent metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) destination dev  The destination device, where the source will be cloned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) source dev       Read only device containing the data that gets cloned
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) region size      The size of a region in sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) #feature args    Number of feature arguments passed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) feature args     no_hydration or no_discard_passdown
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) #core args       An even number of arguments corresponding to key/value pairs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)                  passed to dm-clone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) core args        Key/value pairs passed to dm-clone, e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)                  `hydration_threshold 256`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) ================ ==============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) Optional feature arguments are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) ==================== =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) no_hydration         Create a dm-clone instance with background hydration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)                      disabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) no_discard_passdown  Disable passing down discards to the destination device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) ==================== =========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) Optional core arguments are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) ================================ ==============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) hydration_threshold <#regions>   Maximum number of regions being copied from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)                                  the source to the destination device at any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)                                  one time, during background hydration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) hydration_batch_size <#regions>  During background hydration, try to batch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)                                  together contiguous regions, so we copy data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)                                  from the source to the destination device in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)                                  batches of this many regions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) ================================ ==============================================
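
To show how the constructor arguments fit together, here is a minimal sketch of
a helper that assembles a table line for `dmsetup create --table`. The function
name and the device paths passed to it are hypothetical. The sketch always
emits both argument counts, even when they are zero, which is an assumption
about what is convenient rather than a requirement stated above.

```shell
#!/bin/sh
# build_clone_table LENGTH METADATA DEST SOURCE REGION_SIZE "FEATURES" "CORE"
#   FEATURES: space-separated feature args, e.g. "no_hydration"
#   CORE:     space-separated key/value pairs, e.g. "hydration_threshold 256"
build_clone_table() {
    length=$1 metadata=$2 dest=$3 source_dev=$4 region_size=$5
    features=$6 core=$7
    # Count the words in each list; unquoted expansion splits on whitespace.
    set -- $features; nr_features=$#
    set -- $core;     nr_core=$#
    table="0 $length clone $metadata $dest $source_dev $region_size"
    # Append each count, followed by its arguments when there are any.
    table="$table $nr_features${features:+ $features}"
    table="$table $nr_core${core:+ $core}"
    echo "$table"
}
```

A call such as `build_clone_table 1048576000 /dev/meta /dev/dest /dev/src 8
"no_hydration" "hydration_threshold 256"` yields a line that starts with
`0 1048576000 clone` and ends with the counted feature and core arguments.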
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) Status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) ------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)    <metadata block size> <#used metadata blocks>/<#total metadata blocks>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)    <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)    <#feature args> <feature args>* <#core args> <core args>*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)    <clone metadata mode>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) ======================= =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) metadata block size     Fixed block size for each metadata block in sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) #used metadata blocks   Number of metadata blocks used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) #total metadata blocks  Total number of metadata blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) region size             Configurable region size for the device in sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) #hydrated regions       Number of regions that have finished hydrating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) #total regions          Total number of regions to hydrate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) #hydrating regions      Number of regions currently hydrating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) #feature args           Number of feature arguments to follow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) feature args            Feature arguments, e.g. `no_hydration`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) #core args              Even number of core arguments to follow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) core args               Key/value pairs for tuning the core, e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)                         `hydration_threshold 256`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) clone metadata mode     ro if read-only, rw if read-write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)                         In serious cases where even a read-only mode is deemed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)                         unsafe no further I/O will be permitted and the status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)                         will just contain the string 'Fail'. If the metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)                         mode changes, a dm event will be sent to user space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) ======================= =======================================================
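
As a rough illustration of how the status fields might be consumed, the sketch
below computes the hydration progress as an integer percentage. The function
name is hypothetical. It expects only the dm-clone fields in the order listed
above, so the start, length and target name that `dmsetup status` prints in
front of them must be removed first.

```shell
#!/bin/sh
# hydration_percent STATUS
#   STATUS holds the dm-clone status fields described above, without the
#   "<start> <length> clone" prefix printed by dmsetup status.
hydration_percent() {
    status=$1
    # A failed target reports only the string "Fail".
    if [ "$status" = "Fail" ]; then
        echo "failed"
        return 1
    fi
    # Split into fields; field 4 is "<#hydrated regions>/<#total regions>".
    set -- $status
    hydrated=${4%/*}
    total=${4#*/}
    # Integer percentage, rounded down.
    echo $(( hydrated * 100 / total ))
}
```

A made-up status string such as `8 23/512 8 32768/131072 1 0 4
hydration_threshold 1 hydration_batch_size 1 rw` can be passed as a single
quoted argument. Once the function reports 100, all regions are hydrated.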
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) Messages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) `disable_hydration`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)       Disable the background hydration of the destination device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) `enable_hydration`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)       Enable the background hydration of the destination device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) `hydration_threshold <#regions>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)       Set background hydration threshold.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) `hydration_batch_size <#regions>`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)       Set background hydration batch size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) Examples
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) Clone a device containing a file system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 1. Create the dm-clone device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)       $source_dev 8 1 no_hydration"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) 2. Mount the device and trim the file system. dm-clone interprets the discards
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) sent by the file system and it will not hydrate the unused space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)    mount /dev/mapper/clone /mnt/cloned-fs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)    fstrim /mnt/cloned-fs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) 3. Enable background hydration of the destination device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)    dmsetup message clone 0 enable_hydration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 4. When the hydration finishes, we can replace the dm-clone table with a linear
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) table.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)    dmsetup suspend clone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)    dmsetup resume clone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) The metadata device is no longer needed and can be safely discarded or reused
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) for other purposes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) Known issues
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) 1. Reads to not-yet-hydrated regions are redirected to the source device. If
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) reading the source device has high latency and the user repeatedly reads from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) the same regions, this behaviour could degrade performance. We should use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) these reads as hints to hydrate the relevant regions sooner. Currently, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) rely on the page cache to cache these regions, so we hopefully don't end up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) reading them multiple times from the source device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) 2. In-core resources, i.e., the bitmaps tracking which regions are hydrated,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) are not released after the hydration has finished. We should release them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 3. During background hydration, if we fail to read the source or write to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) destination device, we print an error message, but the hydration process
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) continues indefinitely, until it succeeds. We should stop the background
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) hydration after a number of failures and emit a dm event for user space to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) notice.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) Why not...?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) We explored the following alternatives before implementing dm-clone:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) 1. Use dm-cache with cache size equal to the source device and implement a new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) cloning policy:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) * The resulting cache device is not a one-to-one mirror of the source device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) and thus we cannot remove the cache device once cloning completes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) * dm-cache writes to the source device, which violates our requirement that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) the source device must be treated as read-only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) * Caching is semantically different from cloning.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) 2. Use dm-snapshot with a COW device equal in size to the source device:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) * dm-snapshot stores its metadata in the COW device, so the resulting device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) is not a one-to-one mirror of the source device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) * No background copying mechanism.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) * dm-snapshot needs to commit its metadata whenever a pending exception
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) completes, to ensure snapshot consistency. In the case of cloning, we don't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) need to be so strict and can rely on committing metadata every time a FLUSH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) improves the performance significantly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) 3. Use dm-mirror: The mirror target has a background copying/mirroring
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) mechanism, but it writes to all mirrors, thus violating our requirement that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) the source device must be treated as read-only.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 4. Use dm-thin's external snapshot functionality. This approach is the most
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) promising among all alternatives, as the thinly-provisioned volume is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) one-to-one mirror of the source device and handles reads and writes to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) un-provisioned/not-yet-cloned areas the same way as dm-clone does.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) Still:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) * There is no background copying mechanism, though one could be implemented.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) * Most importantly, we want to support arbitrary block devices as the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) destination of the cloning process and not restrict ourselves to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) thinly-provisioned volumes. Thin-provisioning has an inherent metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) overhead, for maintaining the thin volume mappings, which significantly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) degrades performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) Moreover, cloning a device shouldn't force the use of thin-provisioning. On
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) the other hand, if we wish to use thin provisioning, we can just use a thin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) LV as dm-clone's destination device.