Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Guidance for writing policies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) Try to keep transactionality out of it.  The core is careful to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) avoid asking about anything that is migrating.  This is a pain, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) makes it easier to write the policies.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) Mappings are loaded into the policy at construction time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) Every bio that is mapped by the target is referred to the policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) The policy can return a simple HIT or MISS or issue a migration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) Currently there's no way for the policy to issue background work,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) e.g. to start writing back dirty blocks that are going to be evicted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) soon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) Because we map bios rather than requests, it's easy for the policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) to get fooled by many small bios.  For this reason the core target
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) issues periodic ticks to the policy.  It's suggested that the policy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) doesn't update states (e.g. hit counts) for a block more than once
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) for each tick.  The core ticks by watching bios complete, thereby
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) trying to see when the I/O scheduler has let the I/Os run.
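
The once-per-tick suggestion can be illustrated with a small user-space
model.  This is only a sketch, not kernel code: the class name
TickLimitedCounter and its methods on_tick() and on_bio() are hypothetical
names chosen for the illustration and do not correspond to the kernel's
policy interface.

```python
from collections import defaultdict


class TickLimitedCounter:
    """Hypothetical helper: counts at most one hit per block per tick."""

    def __init__(self):
        self.tick = 0
        self.hits = defaultdict(int)   # block -> hit count
        self.last_tick = {}            # block -> tick of the last counted hit

    def on_tick(self):
        # Stands in for the periodic tick issued by the core target.
        self.tick += 1

    def on_bio(self, block):
        # Many small bios to the same block within one tick count only once.
        if self.last_tick.get(block) != self.tick:
            self.hits[block] += 1
            self.last_tick[block] = self.tick
        return self.hits[block]


counter = TickLimitedCounter()
for _ in range(8):        # eight small bios hitting block 42 in one tick
    counter.on_bio(42)
counter.on_tick()
counter.on_bio(42)        # the first bio of the next tick is counted again
print(counter.hits[42])
```

In this model the eight small bios contribute a single hit, so a burst of
small I/O to one block does not make it look hotter than it is.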
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) Overview of supplied cache replacement policies
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) ===============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) multiqueue (mq)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) This policy is now an alias for smq (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) The following tunables are accepted, but have no effect::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 	'sequential_threshold <#nr_sequential_ios>'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 	'random_threshold <#nr_random_ios>'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 	'read_promote_adjustment <value>'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 	'write_promote_adjustment <value>'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 	'discard_promote_adjustment <value>'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) Stochastic multiqueue (smq)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) ---------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) This policy is the default.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) The stochastic multi-queue (smq) policy addresses some of the problems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) with the multiqueue (mq) policy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) The smq policy (vs mq) offers the promise of lower memory utilization,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) improved performance and increased adaptability in the face of changing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) workloads.  smq also does not have any cumbersome tuning knobs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) Users may switch from "mq" to "smq" simply by appropriately reloading a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) DM table that is using the cache target.  Doing so will cause all of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) mq policy's hints to be dropped.  Also, performance of the cache may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) degrade slightly until smq recalculates the origin device's hotspots
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) that should be cached.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) Memory usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) ^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) The mq policy used a lot of memory: 88 bytes per cache block on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 64-bit machine.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) smq uses 28-bit indexes to implement its data structures rather than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) pointers.  It avoids storing an explicit hit count for each block.  It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) the entries (each hotspot block covers a larger area than a single
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) cache block).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) All this means smq uses ~25 bytes per cache block.  Still a lot of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) memory, but a substantial improvement nonetheless.
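
To put the per-block figures in perspective, the arithmetic below estimates
policy memory for a hypothetical cache of 1,048,576 cache blocks.  The block
count is an assumption made for the example; the per-block costs are the
ones quoted in this section, and the smq figure is approximate.

```python
MQ_BYTES_PER_BLOCK = 88    # from the text, 64-bit machine
SMQ_BYTES_PER_BLOCK = 25   # approximate figure from the text


def policy_memory_bytes(nr_cache_blocks, bytes_per_block):
    """Estimated policy memory for a cache with the given number of blocks."""
    return nr_cache_blocks * bytes_per_block


nr_cache_blocks = 1 << 20  # hypothetical cache with 1,048,576 blocks
mq_mib = policy_memory_bytes(nr_cache_blocks, MQ_BYTES_PER_BLOCK) / (1 << 20)
smq_mib = policy_memory_bytes(nr_cache_blocks, SMQ_BYTES_PER_BLOCK) / (1 << 20)
print(f"mq: {mq_mib:.0f} MiB, smq: ~{smq_mib:.0f} MiB")
```

For this block count the estimate is 88 MiB for mq against roughly 25 MiB
for smq.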
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) Level balancing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) ^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) mq placed entries in different levels of the multiqueue structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) based on their hit count (~ln(hit count)).  This meant the bottom
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) levels generally had the most entries, and the top ones had very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) few.  Having unbalanced levels like this reduced the efficacy of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) multiqueue.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) smq does not maintain a hit count; instead it swaps hit entries with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) the least recently used entry from the level above.  The overall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) ordering is a side effect of this stochastic process.  With this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) scheme we can decide how many entries occupy each multiqueue level,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) resulting in better promotion/demotion decisions.
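
The swap-on-hit idea can be modelled in a few lines.  This is a simplified
sketch and not the kernel's data structure: the function on_hit(), the
list-of-lists representation, and the choice to place the displaced entry
at the most recently used end of the lower level are all assumptions made
for the illustration.

```python
def on_hit(levels, entry):
    """Promote `entry` one level by swapping it with the LRU entry above.

    `levels` is a list of lists; levels[0] is the bottom level and each
    list is ordered from least to most recently used.
    """
    idx = next(i for i, level in enumerate(levels) if entry in level)
    levels[idx].remove(entry)
    if idx + 1 < len(levels) and levels[idx + 1]:
        displaced = levels[idx + 1].pop(0)   # LRU entry of the level above
        levels[idx].append(displaced)        # it drops down one level
        levels[idx + 1].append(entry)        # the hit entry moves up
    else:
        levels[idx].append(entry)            # top level: only refresh recency


levels = [["a", "b"], ["c", "d"], ["e", "f"]]
sizes_before = [len(level) for level in levels]
on_hit(levels, "a")
on_hit(levels, "a")
sizes_after = [len(level) for level in levels]
print(levels, sizes_before == sizes_after)
```

Because every promotion is paired with a demotion, the number of entries
per level stays fixed, which is the property that lets the level sizes be
chosen in advance.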
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) Adaptability:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) The mq policy maintained a hit count for each cache block.  For a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) different block to get promoted to the cache, its hit count had to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) exceed the lowest hit count in the cache.  This meant it could take a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) long time for the cache to adapt between varying IO patterns.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) smq doesn't maintain hit counts, so a lot of this problem just goes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) away.  In addition, it tracks performance of the hotspot queue, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) is used to decide which blocks to promote.  If the hotspot queue is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) performing badly then it starts moving entries more quickly between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) levels.  This lets it adapt to new IO patterns very quickly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) Performance
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) ^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) Testing smq shows substantially better performance than mq.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) cleaner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) The cleaner writes back all dirty blocks in a cache to decommission it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) Examples
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) The syntax for a table is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 	cache <metadata dev> <cache dev> <origin dev> <block size>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 	<#feature_args> [<feature arg>]*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 	<policy> <#policy_args> [<policy arg>]*
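
The argument counts in this syntax are easy to get wrong by hand, so a
small helper can derive them.  The sketch below only builds the table
string; it does not invoke dmsetup.  The function name cache_table() and
its parameters are hypothetical, and the leading start and length fields
are the usual prefix of a device-mapper table line.

```python
def cache_table(start, length, metadata_dev, cache_dev, origin_dev,
                block_size, feature_args, policy, policy_args):
    """Build a cache target table line, deriving both argument counts."""
    fields = [
        start, length, "cache",
        metadata_dev, cache_dev, origin_dev, block_size,
        len(feature_args), *feature_args,
        policy, len(policy_args), *policy_args,
    ]
    return " ".join(str(field) for field in fields)


table = cache_table(
    0, 268435456, "/dev/sdb", "/dev/sdc", "/dev/sdd", 512,
    feature_args=[],
    policy="mq",
    policy_args=["sequential_threshold", 1024, "random_threshold", 8],
)
print(table)
```

Note that each policy argument word counts separately, so two key/value
pairs give a policy argument count of 4.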
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) The syntax to send a message using the dmsetup command is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 	dmsetup message <mapped device> 0 sequential_threshold 1024
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 	dmsetup message <mapped device> 0 random_threshold 8
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) Using dmsetup::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) This creates a 128GB mapped device named 'blah' with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) sequential_threshold set to 1024 and the random_threshold set to 8.
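
The 128GB figure follows from the table length, which device-mapper counts
in 512-byte sectors.  The short check below redoes that arithmetic; it
assumes the block size field is also given in sectors, as in the dm-cache
target documentation.

```python
SECTOR_BYTES = 512            # device-mapper tables count 512-byte sectors

length_sectors = 268435456    # table length from the dmsetup example
block_sectors = 512           # <block size> from the dmsetup example

size_gib = length_sectors * SECTOR_BYTES / 2**30
block_kib = block_sectors * SECTOR_BYTES / 2**10
origin_blocks = length_sectors // block_sectors
print(size_gib, block_kib, origin_blocks)
```

Under these assumptions the mapped device is 128 GiB, each cache block is
256 KiB, and the origin spans 524288 blocks of that size.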