Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

=====
Cache
=====

Introduction
============

dm-cache is a device mapper target written by Joe Thornber, Heinz
Mauelshagen, and Mike Snitzer.

It aims to improve the performance of a block device (e.g., a spindle) by
dynamically migrating some of its data to a faster, smaller device
(e.g., an SSD).

This device-mapper solution allows us to insert this caching at
different levels of the dm stack, for instance above the data device for
a thin-provisioning pool.  Caching solutions that are integrated more
closely with the virtual memory system should give better performance.

The target reuses the metadata library used by the thin-provisioning
target.

The decision as to what data to migrate and when is left to a plug-in
policy module.  Several of these have been written as we experiment,
and we hope other people will contribute others for specific I/O
scenarios (e.g., a VM image server).

Glossary
========

  Migration
	       Movement of the primary copy of a logical block from one
	       device to the other.
  Promotion
	       Migration from slow device to fast device.
  Demotion
	       Migration from fast device to slow device.

The origin device always contains a copy of the logical block, which
may be out of date or kept in sync with the copy on the cache device
(depending on the cache operating mode).

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with
other parameters detailed later):

1. An origin device - the big, slow one.

2. A cache device - the small, fast one.

3. A small metadata device - records which blocks are in the cache,
   which are dirty, and extra hints for use by the policy object.
   This information could be put on the cache device, but having it
   separate allows the volume manager to configure it differently,
   e.g. as a mirror for extra robustness.  A metadata device may be
   used by only one cache target.

Fixed block size
----------------

The origin is divided up into blocks of a fixed size.  This block size
is configurable when you first create the cache.  Typically we've been
using block sizes of 256KB to 1024KB.  The block size must be between 64
sectors (32KB) and 2097152 sectors (1GB), and a multiple of 64 sectors (32KB).
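The size rules above can be checked mechanically.  The following is a minimal Python sketch, not kernel code; `check_block_size` and `kib_to_sectors` are hypothetical helper names, and 512-byte sectors are assumed, as elsewhere in this document.

```python
SECTOR_BYTES = 512              # sector size assumed throughout this document
MIN_BLOCK_SECTORS = 64          # 32KB
MAX_BLOCK_SECTORS = 2097152     # 1GB
BLOCK_MULTIPLE_SECTORS = 64     # block size must be a multiple of 32KB


def kib_to_sectors(kib: int) -> int:
    """Convert a size in KiB to 512-byte sectors."""
    return kib * 1024 // SECTOR_BYTES


def check_block_size(sectors: int) -> None:
    """Raise ValueError unless `sectors` is an acceptable cache block size."""
    if not MIN_BLOCK_SECTORS <= sectors <= MAX_BLOCK_SECTORS:
        raise ValueError(f"{sectors} sectors is outside "
                         f"[{MIN_BLOCK_SECTORS}, {MAX_BLOCK_SECTORS}]")
    if sectors % BLOCK_MULTIPLE_SECTORS:
        raise ValueError(f"{sectors} sectors is not a multiple of "
                         f"{BLOCK_MULTIPLE_SECTORS}")


# The typical 256KB to 1024KB range corresponds to 512 to 2048 sectors.
for kib in (256, 512, 1024):
    check_block_size(kib_to_sectors(kib))
```

For example, 96 sectors (48KB) is rejected because it is not a multiple of 64, even though it lies inside the permitted range.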

Having a fixed block size simplifies the target a lot, but it is
something of a compromise.  For instance, a small part of a block may be
getting hit a lot, yet the whole block will be promoted to the cache.
So large block sizes are bad because they waste cache space, while small
block sizes are bad because they increase the amount of metadata (both
in core and on disk).
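To put numbers on the trade-off, the sketch below counts how many cache blocks a cache device holds at different block sizes; each of those blocks needs its own metadata (mapping, dirty flag and policy hint).  It is plain Python arithmetic for a hypothetical 64GB fast device and does not model the real metadata format; `nr_cache_blocks` is a hypothetical name.

```python
KIB = 1024
GIB = 1024 ** 3


def nr_cache_blocks(cache_dev_bytes: int, block_bytes: int) -> int:
    """Whole cache blocks that fit on the cache device (one metadata entry each)."""
    return cache_dev_bytes // block_bytes


CACHE_DEV_BYTES = 64 * GIB  # hypothetical 64GB fast device

for block_bytes in (32 * KIB, 256 * KIB, 1024 * KIB, GIB):
    print(f"{block_bytes // KIB:>8} KiB blocks -> "
          f"{nr_cache_blocks(CACHE_DEV_BYTES, block_bytes):>8} cache blocks")
```

At 32KB the cache has 2097152 blocks to track; at 1GB it has only 64, but a hot region of a few kilobytes would then pull a full 1GB block into the cache.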

Cache operating modes
---------------------

The cache has three operating modes: writeback, writethrough and
passthrough.

If writeback, the default, is selected, then a write to a block that is
cached will go only to the cache and the block will be marked dirty in
the metadata.

If writethrough is selected, then a write to a cached block will not
complete until it has hit both the origin and cache devices.  Clean
blocks should remain clean.

Passthrough mode is useful when the cache contents are not known to be
coherent with the origin device.  If it is selected, all reads are
served from the origin device (all reads miss the cache) and all writes
are forwarded to the origin device; additionally, a write hit
invalidates the cached block.  To enable passthrough mode the cache must
be clean.  Passthrough mode allows a cache device to be activated
without having to worry about coherency.  Coherency that exists is
maintained, although the cache will gradually cool as writes take place.
If the coherency of the cache can later be verified, or established
through use of the "invalidate_cblocks" message, the cache device can be
transitioned to writethrough or writeback mode while still warm.
Otherwise, the cache contents can be discarded prior to transitioning to
the desired operating mode.
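The write handling of the three modes can be summarised as a toy model.  This Python sketch is illustrative only: `ToyCache` and `Mode` are hypothetical names, write misses simply go to the origin (promotion decisions are left out), and reads are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Mode(Enum):
    WRITEBACK = "writeback"
    WRITETHROUGH = "writethrough"
    PASSTHROUGH = "passthrough"


@dataclass
class ToyCache:
    """Toy model of how a write is routed; not the dm-cache implementation."""
    mode: Mode
    cached: Set[int] = field(default_factory=set)  # origin blocks held in the cache
    dirty: Set[int] = field(default_factory=set)   # cached blocks newer than the origin

    def write(self, block: int) -> Set[str]:
        """Apply a write to `block` and return the devices it is sent to."""
        if block not in self.cached:
            return {"origin"}                # write miss; promotion not modelled
        if self.mode is Mode.WRITEBACK:
            self.dirty.add(block)            # cache only, origin copy now stale
            return {"cache"}
        if self.mode is Mode.WRITETHROUGH:
            return {"cache", "origin"}       # completes once both are written
        self.cached.discard(block)           # passthrough: a write hit invalidates
        return {"origin"}


wb = ToyCache(Mode.WRITEBACK, cached={7})
print(wb.write(7), wb.dirty)
```

Only the writeback branch ever marks a block dirty; in the other two modes the origin stays up to date.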

A simple cleaner policy is provided, which will clean (write back) all
dirty blocks in a cache.  This is useful when decommissioning a cache or
shrinking it.  Shrinking the cache's fast device requires all cache
blocks in the area of the cache being removed to be clean.  If the area
being removed from the cache still contains dirty blocks, the resize
will fail.  Care must be taken never to reduce the volume used for the
cache's fast device until the cache is clean.  This is of particular
importance if writeback mode is used.  Writethrough and passthrough
modes already maintain a clean cache.  Future support to partially clean
the cache, above a specified threshold, will allow for keeping the cache
warm and in writeback mode during resize.
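The resize rule can be expressed as a simple check.  This Python sketch assumes, without reproducing the kernel's code, that shrinking the fast device removes the highest-numbered cache blocks, so the area being removed is the half-open range `[new_nr, old_nr)`; `can_shrink` is a hypothetical name.

```python
from typing import Set


def can_shrink(dirty_cblocks: Set[int], old_nr: int, new_nr: int) -> bool:
    """True if no dirty cache block falls in the removed range [new_nr, old_nr)."""
    if new_nr >= old_nr:
        return True  # growing or unchanged: nothing is removed
    return not any(new_nr <= cb < old_nr for cb in dirty_cblocks)


# In writeback mode, let the cleaner policy bring #dirty to 0 before shrinking.
print(can_shrink({5, 900}, old_nr=1000, new_nr=800))  # False: block 900 is dirty
print(can_shrink({5}, old_nr=1000, new_nr=800))       # True
```

Dirty blocks below the new size do not block the resize in this model, which is why a partial clean of only the tail could, in principle, keep the rest of the cache warm.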

Migration throttling
--------------------

Migrating data between the origin and cache device uses bandwidth.
The user can set a throttle to prevent more than a certain amount of
migration occurring at any one time.  Currently we're not taking any
account of normal I/O traffic going to the devices.  More work is needed
here to avoid migrating during peak I/O periods.

For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 2048 sectors (1MB).
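As a rough guide to what the threshold means in practice, the sketch below assumes that a new migration is admitted only while the total size of in-flight migrations, including the new one, stays within `migration_threshold`.  That admission rule is a simplifying assumption for illustration, and `max_concurrent_migrations` is a hypothetical helper.

```python
DEFAULT_MIGRATION_THRESHOLD = 2048  # sectors (1MB), the documented default


def max_concurrent_migrations(block_sectors: int,
                              threshold: int = DEFAULT_MIGRATION_THRESHOLD) -> int:
    """Blocks that may be in flight at once if their total must stay <= threshold."""
    return threshold // block_sectors


for block_sectors in (64, 512, 2048, 4096):
    print(f"{block_sectors:>5}-sector blocks: "
          f"{max_concurrent_migrations(block_sectors)} in flight")
```

With the default threshold and 256KB (512-sector) blocks, about four blocks could be migrating at once.  Under this model a cache block larger than the threshold could never be migrated, so the threshold should be at least one cache block.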

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written.
If no such requests are made, commits will occur every second.  This
means the cache behaves like a physical disk that has a volatile write
cache.  If power is lost, you may lose some recent writes.  The metadata
should always be consistent in spite of any crash.

The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly, so we treat it as a hint.  In normal
operation it will be written when the dm device is suspended.  If the
system crashes, all cache blocks will be assumed dirty when restarted.

Per-block policy hints
----------------------

Policy plug-ins can store a chunk of data per cache block.  It's up to
the policy how big this chunk is, but it should be kept small.  Like the
dirty flags, this data is lost if there's a crash, so a safe fallback
value should always be possible.

Policy hints affect performance, not correctness.

Policy messaging
----------------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  Refer to cache-policies.rst.

Discard bitset resolution
-------------------------

We can avoid copying data during migration if we know the block has
been discarded.  A prime example of this is when mkfs discards the
whole block device.  We store a bitset tracking the discard state of
blocks.  However, we allow this bitset to have a different block size
from the cache blocks.  This is because we need to track the discard
state for all of the origin device (compare with the dirty bitset,
which covers only the smaller cache device).
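The following arithmetic shows why a separate resolution helps.  It is a Python sketch with hypothetical sizes (a 4TB origin, a 128GB cache device, 256KB cache blocks and 16MB discard blocks); the kernel chooses the discard block size itself, and this sketch does not reproduce how.  `bitset_bytes` is a hypothetical helper.

```python
SECTORS_PER_GIB = 2 * 1024 * 1024  # 1GB in 512-byte sectors


def bitset_bytes(device_sectors: int, block_sectors: int) -> int:
    """Bytes for a bitset holding one bit per block (partial blocks round up)."""
    nr_blocks = -(-device_sectors // block_sectors)  # ceiling division
    return -(-nr_blocks // 8)


ORIGIN = 4096 * SECTORS_PER_GIB   # hypothetical 4TB origin
CACHE = 128 * SECTORS_PER_GIB     # hypothetical 128GB cache device
CACHE_BLOCK = 512                 # 256KB cache blocks
DISCARD_BLOCK = 32768             # hypothetical 16MB discard blocks

dirty_bits = bitset_bytes(CACHE, CACHE_BLOCK)           # covers the cache only
discard_fine = bitset_bytes(ORIGIN, CACHE_BLOCK)        # origin at cache resolution
discard_coarse = bitset_bytes(ORIGIN, DISCARD_BLOCK)    # origin at coarser resolution
print(dirty_bits, discard_fine, discard_coarse)
```

Here the dirty bitset needs 64KB, tracking the whole origin at cache-block resolution would need 2MB (32 times as much), and 16MB discard blocks bring that down to 32KB.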

Target interface
================

Constructor
-----------

  ::

   cache <metadata dev> <cache dev> <origin dev> <block size>
         <#feature args> [<feature arg>]*
         <policy> <#policy args> [policy args]*

 ================ =======================================================
 metadata dev     fast device holding the persistent metadata
 cache dev	  fast device holding cached data blocks
 origin dev	  slow device holding original data blocks
 block size       cache unit size in sectors

 #feature args    number of feature arguments passed
 feature args     at most one io mode (writeback, writethrough or
                  passthrough; the default is writeback), plus
                  metadata2 and/or no_discard_passdown (see below)

 policy           the replacement policy to use
 #policy args     an even number of arguments corresponding to
                  key/value pairs passed to the policy
 policy args      key/value pairs passed to the policy
		  E.g. 'sequential_threshold 1024'
		  See cache-policies.rst for details.
 ================ =======================================================
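Because each count must match the arguments that follow it, it can help to assemble the table line programmatically.  This Python sketch (`cache_table` is a hypothetical helper, not part of any tool) builds the string described above for a target starting at sector 0; the result is what would be passed to `dmsetup create <name> --table`.

```python
from typing import Dict, List, Optional


def cache_table(length: int, metadata_dev: str, cache_dev: str,
                origin_dev: str, block_sectors: int,
                features: Optional[List[str]] = None,
                policy: str = "default",
                policy_args: Optional[Dict[str, str]] = None) -> str:
    """Return a single-line cache target table covering sectors [0, length)."""
    features = features or []
    policy_args = policy_args or {}
    # Flatten key/value pairs so that #policy args is always even.
    flat = [word for pair in policy_args.items() for word in pair]
    words = ["0", str(length), "cache", metadata_dev, cache_dev, origin_dev,
             str(block_sectors), str(len(features)), *features,
             policy, str(len(flat)), *flat]
    return " ".join(words)


print(cache_table(41943040, "/dev/mapper/metadata", "/dev/mapper/ssd",
                  "/dev/mapper/origin", 512, ["writeback"]))
```

The call above reproduces the first table used in the Examples section.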

Optional feature arguments are:

   ==================== ========================================================
   writeback		write back caching (the default): a write to a cached
			block goes only to the cache and the block is marked
			dirty in the metadata.

   writethrough		write through caching that prohibits cache block
			content from being different from origin block content.
			Without this argument, the default behaviour is to write
			back cache block contents later for performance reasons,
			so they may differ from the corresponding origin blocks.

   passthrough		a degraded mode useful for various cache coherency
			situations (e.g., rolling back snapshots of
			underlying storage).  Reads and writes always go to
			the origin.  If a write goes to a cached origin
			block, then the cache block is invalidated.
			To enable passthrough mode the cache must be clean.

   metadata2		use version 2 of the metadata.  This stores the dirty
			bits in a separate btree, which speeds up shutting
			down the cache.

   no_discard_passdown	disable passing down discards from the cache
			to the origin's data device.
   ==================== ========================================================
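The sketch below shows one way to interpret the feature words.  It is illustrative Python, not the kernel's parser; it assumes that at most one io mode may be given and that any word not listed in the table above is an error, and `parse_features` is a hypothetical name.

```python
from typing import Dict, List, Union

IO_MODES = ("writeback", "writethrough", "passthrough")
OTHER_FEATURES = ("metadata2", "no_discard_passdown")


def parse_features(args: List[str]) -> Dict[str, Union[str, int, bool]]:
    """Interpret the <feature arg> words of a cache table line (sketch)."""
    unknown = [a for a in args if a not in IO_MODES + OTHER_FEATURES]
    if unknown:
        raise ValueError(f"unrecognised feature(s): {unknown}")
    modes = [a for a in args if a in IO_MODES]
    if len(modes) > 1:
        raise ValueError(f"more than one io mode requested: {modes}")
    return {
        "io_mode": modes[0] if modes else "writeback",  # writeback is the default
        "metadata_version": 2 if "metadata2" in args else 1,
        "discard_passdown": "no_discard_passdown" not in args,
    }


print(parse_features(["metadata2", "writethrough"]))
```

Treating the io mode as a single choice mirrors the table: writethrough and passthrough each describe the whole cache, so combining them has no meaning.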

A policy called 'default' is always registered.  This is an alias for
the policy we currently think gives the best all-round performance.

As the default policy could vary between kernels, if you are relying on
the characteristics of a specific policy, always request it by name.

Status
------

::

  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
  <cache block size> <#used cache blocks>/<#total cache blocks>
  <#read hits> <#read misses> <#write hits> <#write misses>
  <#demotions> <#promotions> <#dirty> <#features> <features>*
  <#core args> <core args>* <policy name> <#policy args> <policy args>*
  <cache metadata mode> <needs_check>

========================= =====================================================
metadata block size	  Fixed block size for each metadata block in
			  sectors
#used metadata blocks	  Number of metadata blocks used
#total metadata blocks	  Total number of metadata blocks
cache block size	  Configurable block size for the cache device
			  in sectors
#used cache blocks	  Number of blocks resident in the cache
#total cache blocks	  Total number of cache blocks
#read hits		  Number of times a READ bio has been mapped
			  to the cache
#read misses		  Number of times a READ bio has been mapped
			  to the origin
#write hits		  Number of times a WRITE bio has been mapped
			  to the cache
#write misses		  Number of times a WRITE bio has been
			  mapped to the origin
#demotions		  Number of times a block has been removed
			  from the cache
#promotions		  Number of times a block has been moved to
			  the cache
#dirty			  Number of blocks in the cache that differ
			  from the origin
#features		  Number of feature args to follow
features		  The io mode ('writeback', 'writethrough' or
			  'passthrough'), plus 'metadata2' and
			  'no_discard_passdown' when enabled
#core args		  Number of core arguments (must be even)
core args		  Key/value pairs for tuning the core
			  e.g. migration_threshold
policy name		  Name of the policy
#policy args		  Number of policy arguments to follow (must be even)
policy args		  Key/value pairs e.g. sequential_threshold
cache metadata mode       ro if read-only, rw if read-write

			  In serious cases where even a read-only mode is
			  deemed unsafe, no further I/O will be permitted and
			  the status will just contain the string 'Fail'.
			  The userspace recovery tools should then be used.
needs_check		  'needs_check' if set, '-' if not set
			  A metadata operation has failed, resulting in the
			  needs_check flag being set in the metadata's
			  superblock.  The metadata device must be
			  deactivated and checked/repaired before the
			  cache can be made fully operational again.
			  '-' indicates that needs_check is not set.
========================= =====================================================
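Since several fields are variable-length lists prefixed by a count, the status line has to be parsed field by field.  The following Python sketch follows the order given above; `parse_cache_status` is hypothetical, the sample line is made up for illustration, and the parser expects only the target's own fields, so the `<start> <length> cache` words that `dmsetup status` prints first must be dropped.

```python
from typing import Dict, Iterator, List


def parse_cache_status(line: str) -> Dict[str, object]:
    """Parse the cache target's status fields, in the order documented above."""
    words = line.split()
    if words == ["Fail"]:
        return {"failed": True}
    it: Iterator[str] = iter(words)

    def take(n: int) -> List[str]:
        return [next(it) for _ in range(n)]

    def pairs(n: int) -> Dict[str, str]:
        kv = take(n)
        return dict(zip(kv[::2], kv[1::2]))

    st: Dict[str, object] = {"failed": False}
    st["metadata_block_size"] = int(next(it))
    used, total = next(it).split("/")
    st["metadata_blocks"] = (int(used), int(total))
    st["cache_block_size"] = int(next(it))
    used, total = next(it).split("/")
    st["cache_blocks"] = (int(used), int(total))
    for key in ("read_hits", "read_misses", "write_hits", "write_misses",
                "demotions", "promotions", "dirty"):
        st[key] = int(next(it))
    st["features"] = take(int(next(it)))
    st["core_args"] = pairs(int(next(it)))
    st["policy"] = next(it)
    st["policy_args"] = pairs(int(next(it)))
    st["metadata_mode"] = next(it)                       # "ro" or "rw"
    st["needs_check"] = next(it, "-") == "needs_check"   # "-" when not set
    return st


# Made-up sample, already stripped of the "<start> <length> cache" prefix.
SAMPLE = ("8 72/1024 512 10/40960 120 30 45 5 0 10 3 "
          "1 writeback 2 migration_threshold 2048 smq 0 rw -")
print(parse_cache_status(SAMPLE))
```

Reading each count before slicing off its arguments keeps the parser correct however many features or policy arguments a given kernel reports.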

Messages
--------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  (A sysfs interface would also be possible.)

The message format is::

   <key> <value>

E.g.::

   dmsetup message my_cache 0 sequential_threshold 1024

Invalidation means removing an entry from the cache without writing it
back.  Cache blocks can be invalidated via the invalidate_cblocks
message, which takes an arbitrary number of cblock ranges.  Each cblock
range's end value is "one past the end", meaning 5-10 expresses a range
of values from 5 to 9.  Each cblock must be expressed as a decimal
value; in the future a variant message that takes cblock ranges
expressed in hexadecimal may be needed to better support efficient
invalidation of larger caches.  The cache must be in passthrough mode
when invalidate_cblocks is used::

   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*

E.g.::

   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
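The range syntax above can be parsed as follows.  This Python sketch (`parse_cblock_ranges` and `expand` are hypothetical helpers) treats a lone `<cblock>` as the one-block range `[n, n+1)`, uses half-open ranges as described, and rejects empty or inverted ranges; it does not check the values against the size of the cache.  `int(s, 10)` accepts only decimal digits, so hexadecimal input such as `0x10` is refused, in line with the decimal-only rule.

```python
from typing import List, Tuple


def parse_cblock_ranges(args: List[str]) -> List[Tuple[int, int]]:
    """Convert invalidate_cblocks arguments into half-open (begin, end) pairs."""
    ranges: List[Tuple[int, int]] = []
    for arg in args:
        if "-" in arg:
            begin_text, end_text = arg.split("-", 1)
            begin, end = int(begin_text, 10), int(end_text, 10)
        else:
            begin = int(arg, 10)
            end = begin + 1                 # a single cblock n is the range [n, n+1)
        if end <= begin:
            raise ValueError(f"empty or inverted cblock range: {arg!r}")
        ranges.append((begin, end))
    return ranges


def expand(ranges: List[Tuple[int, int]]) -> List[int]:
    """List every cblock covered; each end value is one past the last block."""
    return [cb for begin, end in ranges for cb in range(begin, end)]


print(expand(parse_cblock_ranges(["2", "5-10"])))
```

The final line lists cblocks 2 and 5 to 9; block 10 itself is not included.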

Examples
========

The test suite can be found here:

https://github.com/jthornber/device-mapper-test-suite

::

  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
	  /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0"
  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
	  /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
	  mq 4 sequential_threshold 1024 random_threshold 8"