^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) dm-cache is a device mapper target written by Joe Thornber, Heinz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) Mauelshagen, and Mike Snitzer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) It aims to improve performance of a block device (e.g. a spindle) by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) dynamically migrating some of its data to a faster, smaller device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) (e.g. an SSD).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) This device-mapper solution allows us to insert this caching at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) different levels of the dm stack, for instance above the data device for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) a thin-provisioning pool. Caching solutions that are integrated more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) closely with the virtual memory system should give better performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) The target reuses the metadata library used by the thin-provisioning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) target.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) The decision as to what data to migrate and when is left to a plug-in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) policy module. Several of these have been written as we experiment,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) and we hope other people will contribute others for specific I/O
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) scenarios (e.g. a VM image server).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) Glossary
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) Migration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)     Movement of the primary copy of a logical block from one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)     device to the other.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) Promotion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)     Migration from slow device to fast device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) Demotion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37)     Migration from fast device to slow device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) The origin device always contains a copy of the logical block, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) may be out of date or kept in sync with the copy on the cache device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) (depending on policy).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) ======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) Sub-devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) The target is constructed by passing three devices to it (along with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) other parameters detailed later):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) 1. An origin device - the big, slow one.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) 2. A cache device - the small, fast one.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) 3. A small metadata device - records which blocks are in the cache,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)    which are dirty, and extra hints for use by the policy object.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)    This information could be put on the cache device, but having it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)    separate allows the volume manager to configure it differently,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)    e.g. as a mirror for extra robustness. This metadata device may only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)    be used by a single cache device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) Fixed block size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) The origin is divided up into blocks of a fixed size. This block size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) is configurable when you first create the cache. Typically we've been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) using block sizes of 256KB - 1024KB. The block size must be between 64
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
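
The size constraints above can be checked before a table is built. The
following is a minimal sketch; valid_block_size is a hypothetical helper
written for this illustration, not part of dmsetup or the kernel.

```shell
# Check a candidate dm-cache block size, given in 512-byte sectors.
# valid_block_size is a hypothetical helper, not part of dmsetup.
valid_block_size() {
    size=$1
    # Must lie in [64, 2097152] sectors and be a multiple of 64 sectors.
    [ "$size" -ge 64 ] && [ "$size" -le 2097152 ] && [ $((size % 64)) -eq 0 ]
}

for s in 512 2048 100; do
    if valid_block_size "$s"; then
        echo "$s sectors ($((s / 2))KB): ok"
    else
        echo "$s sectors: rejected"
    fi
done
```

Here 512 and 2048 sectors (256KB and 1024KB) pass, while 100 sectors is
rejected because it is not a multiple of 64.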
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) Having a fixed block size simplifies the target a lot. But it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) something of a compromise. For instance, a small part of a block may be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) getting hit a lot, yet the whole block will be promoted to the cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) So large block sizes are bad because they waste cache space. And small
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) block sizes are bad because they increase the amount of metadata (both
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) in core and on disk).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) Cache operating modes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) The cache has three operating modes: writeback, writethrough and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) passthrough.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) If writeback, the default, is selected then a write to a block that is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) cached will go only to the cache and the block will be marked dirty in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) the metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) If writethrough is selected then a write to a cached block will not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) complete until it has hit both the origin and cache devices. Clean
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) blocks should remain clean.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) If passthrough is selected, useful when the cache contents are not known
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) to be coherent with the origin device, then all reads are served from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) the origin device (all reads miss the cache) and all writes are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) forwarded to the origin device; additionally, write hits cause cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) block invalidates. To enable passthrough mode the cache must be clean.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) Passthrough mode allows a cache device to be activated without having to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) worry about coherency. Coherency that exists is maintained, although
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) the cache will gradually cool as writes take place. If the coherency of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) the cache can later be verified, or established through use of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) "invalidate_cblocks" message, the cache device can be transitioned to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) writethrough or writeback mode while still warm. Otherwise, the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) contents can be discarded prior to transitioning to the desired
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) operating mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) A simple cleaner policy is provided, which will clean (write back) all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) dirty blocks in a cache. This is useful when decommissioning or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) shrinking a cache. Shrinking the cache's fast device requires all cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) blocks, in the area of the cache being removed, to be clean. If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) area being removed from the cache still contains dirty blocks the resize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) will fail. Care must be taken to never reduce the volume used for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) cache's fast device until the cache is clean. This is of particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) importance if writeback mode is used. Writethrough and passthrough
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) modes already maintain a clean cache. Future support to partially clean
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) the cache, above a specified threshold, will allow for keeping the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) warm and in writeback mode during resize.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) Migration throttling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Migrating data between the origin and cache device uses bandwidth.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) The user can set a throttle to prevent more than a certain amount of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) migration occurring at any one time. Currently we take no account
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) of normal I/O traffic going to the devices. More work is needed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) here to avoid migrating during peak I/O periods.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) For the time being, a message "migration_threshold <#sectors>"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) can be used to set the maximum number of sectors being migrated,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) the default being 2048 sectors (1MB).
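
As a small sketch of how the threshold might be set, the snippet below
converts a budget in MB to sectors and prints the resulting message command
instead of running it. The device name my_cache and the helper
mb_to_sectors are assumptions made for this example.

```shell
# Convert a migration budget in MB to 512-byte sectors (1MB = 2048 sectors).
# mb_to_sectors is a hypothetical helper for this example.
mb_to_sectors() {
    echo $(( $1 * 2048 ))
}

# Print, rather than run, the message that would raise the threshold to 4MB.
# "my_cache" is an assumed device name; drop the echo to send it for real.
echo dmsetup message my_cache 0 migration_threshold "$(mb_to_sectors 4)"
```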
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) Updating on-disk metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) On-disk metadata is committed every time a FLUSH or FUA bio is written.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) If no such requests are made then commits will occur every second. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) means the cache behaves like a physical disk that has a volatile write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) cache. If power is lost you may lose some recent writes. The metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) should always be consistent in spite of any crash.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) The 'dirty' state for a cache block changes far too frequently for us
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) to keep updating it on the fly. So we treat it as a hint. In normal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) operation it will be written when the dm device is suspended. If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) system crashes all cache blocks will be assumed dirty when restarted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) Per-block policy hints
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) Policy plug-ins can store a chunk of data per cache block. It's up to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) the policy how big this chunk is, but it should be kept small. Like the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) dirty flags, this data is lost if there's a crash, so a safe fallback
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) value should always be possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) Policy hints affect performance, not correctness.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) Policy messaging
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) Policies will have different tunables, specific to each one, so we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) need a generic way of getting and setting these. Device-mapper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) messages are used. Refer to cache-policies.txt.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) Discard bitset resolution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) We can avoid copying data during migration if we know the block has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) been discarded. A prime example of this is when mkfs discards the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) whole block device. We store a bitset tracking the discard state of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) blocks. However, we allow this bitset to have a different block size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) from the cache blocks. This is because we need to track the discard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) state for all of the origin device (compare with the dirty bitset
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) which is just for the smaller cache device).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) Target interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) Constructor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)    cache <metadata dev> <cache dev> <origin dev> <block size>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)          <#feature args> [<feature arg>]*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)          <policy> <#policy args> [policy args]*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) ================ =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) metadata dev     fast device holding the persistent metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) cache dev        fast device holding cached data blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) origin dev       slow device holding original data blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) block size       cache unit size in sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) #feature args    number of feature arguments passed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) feature args     writethrough or passthrough (The default is writeback.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) policy           the replacement policy to use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) #policy args     an even number of arguments corresponding to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)                  key/value pairs passed to the policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) policy args      key/value pairs passed to the policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)                  E.g. 'sequential_threshold 1024'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)                  See cache-policies.txt for details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) ================ =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) Optional feature arguments are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) ==================== ========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) writethrough         write through caching that prohibits cache block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)                      content from being different from origin block content.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)                      Without this argument, the default behaviour is to write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)                      back cache block contents later for performance reasons,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)                      so they may differ from the corresponding origin blocks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) passthrough          a degraded mode useful for various cache coherency
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)                      situations (e.g., rolling back snapshots of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)                      underlying storage). Reads and writes always go to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)                      the origin. If a write goes to a cached origin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)                      block, then the cache block is invalidated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)                      To enable passthrough mode the cache must be clean.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) metadata2            use version 2 of the metadata. This stores the dirty
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)                      bits in a separate btree, which improves speed of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)                      shutting down the cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) no_discard_passdown  disable passing down discards from the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)                      to the origin's data device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) ==================== ========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) A policy called 'default' is always registered. This is an alias for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) the policy we currently think is giving the best all-round performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) As the default policy could vary between kernels, if you are relying on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) the characteristics of a specific policy, always request it by name.
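
To illustrate the constructor format, the sketch below assembles a table
line from its parts. cache_table is a hypothetical helper, and the device
paths and length are placeholder values; it always passes one feature
argument (the operating mode) and no policy arguments.

```shell
# cache_table is a hypothetical helper that prints a dm-cache table line.
# Arguments: <length in sectors> <metadata dev> <cache dev> <origin dev>
#            <block size in sectors> <mode> <policy>
cache_table() {
    length=$1 meta=$2 fast=$3 origin=$4 block=$5 mode=$6 policy=$7
    # One feature argument (the mode) and zero policy arguments.
    echo "0 $length cache $meta $fast $origin $block 1 $mode $policy 0"
}

# Placeholder devices; prints the table without touching device-mapper.
cache_table 41943040 /dev/mapper/metadata /dev/mapper/ssd \
    /dev/mapper/origin 512 writethrough default
```

The printed line could then be handed to dmsetup, for example with
dmsetup create my_cache --table "$(cache_table ...)".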
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) Status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) ------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)    <metadata block size> <#used metadata blocks>/<#total metadata blocks>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)    <cache block size> <#used cache blocks>/<#total cache blocks>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)    <#read hits> <#read misses> <#write hits> <#write misses>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)    <#demotions> <#promotions> <#dirty> <#feature args> <feature args>*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)    <#core args> <core args>* <policy name> <#policy args> <policy args>*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)    <cache metadata mode> <needs_check>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) ========================= =====================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) metadata block size       Fixed block size for each metadata block in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)                           sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) #used metadata blocks     Number of metadata blocks used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) #total metadata blocks    Total number of metadata blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) cache block size          Configurable block size for the cache device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)                           in sectors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) #used cache blocks        Number of blocks resident in the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) #total cache blocks       Total number of cache blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) #read hits                Number of times a READ bio has been mapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)                           to the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) #read misses              Number of times a READ bio has been mapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)                           to the origin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) #write hits               Number of times a WRITE bio has been mapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)                           to the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) #write misses             Number of times a WRITE bio has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)                           mapped to the origin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) #demotions                Number of times a block has been removed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)                           from the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) #promotions               Number of times a block has been moved to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)                           the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) #dirty                    Number of blocks in the cache that differ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)                           from the origin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) #feature args             Number of feature args to follow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) feature args              'writethrough' (optional)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) #core args                Number of core arguments (must be even)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) core args                 Key/value pairs for tuning the core
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)                           e.g. migration_threshold
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) policy name               Name of the policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) #policy args              Number of policy arguments to follow (must be even)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) policy args               Key/value pairs e.g. sequential_threshold
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) cache metadata mode       ro if read-only, rw if read-write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)                           In serious cases where even a read-only mode is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)                           deemed unsafe no further I/O will be permitted and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281)                           the status will just contain the string 'Fail'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)                           The userspace recovery tools should then be used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) needs_check               'needs_check' if set, '-' if not set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)                           A metadata operation has failed, resulting in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)                           needs_check flag being set in the metadata's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)                           superblock. The metadata device must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)                           deactivated and checked/repaired before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)                           cache can be made fully operational again.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)                           '-' indicates needs_check is not set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) ========================= =====================================================
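
As a sketch of how the status fields might be consumed, the function below
picks the read hit and dirty counters out of a status line by position. It
assumes the line begins with the usual start sector, length and target name
before the fields listed above. cache_summary is a hypothetical helper, and
the sample line is made up for illustration, not captured from a real
device.

```shell
# Print the read hit percentage and dirty block count from a status line.
# Assumes the line has the form: <start> <length> cache <fields...>
# cache_summary is a hypothetical helper for this example.
cache_summary() {
    set -- $1                      # split the line into positional fields
    read_hits=$8 read_misses=$9 dirty=${14}
    total=$((read_hits + read_misses))
    if [ "$total" -gt 0 ]; then
        echo "read hits: $((read_hits * 100 / total))% dirty: $dirty"
    else
        echo "read hits: n/a dirty: $dirty"
    fi
}

# Made-up sample line for illustration only.
sample='0 41943040 cache 8 72/4096 512 1000/20480 900 100 400 50 3 1003 12 1 writeback 2 migration_threshold 2048 mq 4 sequential_threshold 1024 random_threshold 8 rw -'
cache_summary "$sample"
```

With the sample above (900 read hits, 100 read misses, 12 dirty blocks) this
prints "read hits: 90% dirty: 12".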
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) Messages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) --------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) Policies will have different tunables, specific to each one, so we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) need a generic way of getting and setting these. Device-mapper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) messages are used. (A sysfs interface would also be possible.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) The message format is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301)    <key> <value>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) E.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)    dmsetup message my_cache 0 sequential_threshold 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) Invalidation is removing an entry from the cache without writing it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) back. Cache blocks can be invalidated via the invalidate_cblocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) message, which takes an arbitrary number of cblock ranges. Each cblock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) range's end value is "one past the end", meaning 5-10 expresses a range
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) of values from 5 to 9. Each cblock must be expressed as a decimal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) value; in the future a variant message that takes cblock ranges
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) expressed in hexadecimal may be needed to better support efficient
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) invalidation of larger caches. The cache must be in passthrough mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) when invalidate_cblocks is used::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)    invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) E.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)    dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
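
The "one past the end" convention is easy to get wrong, so here is a small
sketch that counts how many cblocks a set of arguments covers. count_cblocks
is a hypothetical helper written for this example. It only does arithmetic
on the arguments and does not talk to device-mapper.

```shell
# Count the cblocks covered by invalidate_cblocks arguments.
# A range's end is exclusive, so 5-10 covers 5, 6, 7, 8 and 9.
# count_cblocks is a hypothetical helper for this example.
count_cblocks() {
    total=0
    for arg in "$@"; do
        case $arg in
            *-*) total=$((total + ${arg#*-} - ${arg%-*})) ;;  # end - begin
            *)   total=$((total + 1)) ;;                      # single cblock
        esac
    done
    echo "$total"
}

count_cblocks 5-10
count_cblocks 2345 3456-4567 5678-6789
```

The first call prints 5. The second prints 2223, that is 1 + 1111 + 1111
cblocks for the arguments used in the example message above.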
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) Examples
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) The test suite can be found here:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) https://github.com/jthornber/device-mapper-test-suite
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)    dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334)        /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335)    dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336)        /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337)        mq 4 sequential_threshold 1024 random_threshold 8"