^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Persistent data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) The more-sophisticated device-mapper targets require complex metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) that is managed in the kernel. In late 2010 we were seeing that various
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) different targets were rolling their own data structures, for example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) - Mikulas Patocka's multisnap implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) - Heinz Mauelshagen's thin provisioning target
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) - Another btree-based caching target posted to dm-devel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) - Another multi-snapshot target based on a design of Daniel Phillips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) Maintaining these data structures takes a lot of work, so if possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) we'd like to reduce the number.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) The persistent-data library is an attempt to provide a re-usable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) framework for people who want to store metadata in device-mapper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) targets. It's currently used by the thin-provisioning target and an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) upcoming hierarchical storage target.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) The main documentation is in the header files which can all be found
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) under drivers/md/persistent-data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) The block manager
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) dm-block-manager.[hc]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) This provides access to the data on disk in fixed-sized blocks. There
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) is a read/write locking interface to prevent concurrent accesses, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) to keep data that is being used in the cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) Clients of persistent-data are unlikely to use this directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) The transaction manager
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) dm-transaction-manager.[hc]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) This restricts access to blocks and enforces copy-on-write semantics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) The only way you can get hold of a writable block through the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) transaction manager is by shadowing an existing block (i.e. doing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) copy-on-write) or allocating a fresh one. Shadowing is elided within
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) the same transaction so performance is reasonable. The commit method
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) ensures that all data is flushed before it writes the superblock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) On power failure your metadata will be as it was when last committed.
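
The shadowing rule can be illustrated with a small userspace model. This is
only a sketch: the ``toy_tm`` structure and its functions are hypothetical
names invented for this example and are not the interface declared in
dm-transaction-manager.h. It assumes an in-memory array standing in for the
metadata device and a single integer standing in for the superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BLOCKS 16
#define BLOCK_SIZE 8

/* Hypothetical in-memory model, not the kernel's transaction manager. */
struct toy_tm {
	char data[NR_BLOCKS][BLOCK_SIZE]; /* stands in for the metadata device */
	bool in_use[NR_BLOCKS];           /* trivial allocator state */
	bool fresh[NR_BLOCKS];            /* allocated or shadowed in this transaction */
	int committed_root;               /* stands in for the superblock */
};

static void toy_tm_init(struct toy_tm *tm)
{
	memset(tm, 0, sizeof(*tm));
	tm->committed_root = -1;
}

/* Allocate a fresh, zeroed block. Returns its index, or -1 if none is free. */
static int toy_tm_new_block(struct toy_tm *tm)
{
	for (int b = 0; b < NR_BLOCKS; b++) {
		if (!tm->in_use[b]) {
			tm->in_use[b] = true;
			tm->fresh[b] = true;
			memset(tm->data[b], 0, BLOCK_SIZE);
			return b;
		}
	}
	return -1;
}

/*
 * Return a writable block holding the contents of 'orig'.
 * If 'orig' was already allocated or shadowed in this transaction the
 * copy is elided and 'orig' itself is returned.
 * The old block is not freed here; a real implementation would leave
 * that to reference counting.
 */
static int toy_tm_shadow(struct toy_tm *tm, int orig)
{
	if (tm->fresh[orig])
		return orig;

	int copy = toy_tm_new_block(tm);
	if (copy >= 0)
		memcpy(tm->data[copy], tm->data[orig], BLOCK_SIZE);
	return copy;
}

/*
 * Commit: every block written so far is considered flushed, and only
 * then is the new root recorded in the "superblock".
 */
static void toy_tm_commit(struct toy_tm *tm, int root)
{
	memset(tm->fresh, 0, sizeof(tm->fresh));
	tm->committed_root = root;
}
```

In this model, shadowing a committed block always yields a different block,
shadowing that result again returns the same block, and ``committed_root``
keeps pointing at the old contents until ``toy_tm_commit()`` is called.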
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) The Space Maps
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) dm-space-map.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) dm-space-map-metadata.[hc]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) dm-space-map-disk.[hc]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) These are on-disk data structures that keep track of the reference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) counts of blocks. They also act as the allocator of new blocks. There
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) are currently two implementations: a simpler one for managing blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) on a different device (e.g. thinly-provisioned data blocks); and one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) for managing the metadata space. The latter is complicated by the need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) to store its own data within the space it's managing.
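
The dual role of a space map, reference counter and allocator, can be
sketched in a few lines of userspace C. The ``toy_sm`` names below are
hypothetical and exist only for this example; the real interface is in
dm-space-map.h. The sketch assumes an in-memory array of counts and a
first-fit allocation policy, neither of which is claimed to match the
on-disk implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS 8

/* Hypothetical in-memory space map: one reference count per block. */
struct toy_sm {
	uint32_t ref_count[NR_BLOCKS];
};

static void toy_sm_init(struct toy_sm *sm)
{
	memset(sm, 0, sizeof(*sm));
}

/*
 * Allocate the first block whose count is zero and give it a count of 1.
 * Returns 0 on success and stores the block index in *b, or -1 if the
 * map is full.
 */
static int toy_sm_new_block(struct toy_sm *sm, unsigned *b)
{
	for (unsigned i = 0; i < NR_BLOCKS; i++) {
		if (sm->ref_count[i] == 0) {
			sm->ref_count[i] = 1;
			*b = i;
			return 0;
		}
	}
	return -1;
}

/* Take another reference, e.g. when a second tree starts sharing the block. */
static void toy_sm_inc(struct toy_sm *sm, unsigned b)
{
	assert(b < NR_BLOCKS);
	sm->ref_count[b]++;
}

/* Drop a reference; the block becomes allocatable again at zero. */
static void toy_sm_dec(struct toy_sm *sm, unsigned b)
{
	assert(b < NR_BLOCKS);
	assert(sm->ref_count[b] > 0);
	sm->ref_count[b]--;
}

/* Count the blocks that are currently free. */
static unsigned toy_sm_nr_free(const struct toy_sm *sm)
{
	unsigned n = 0;

	for (unsigned i = 0; i < NR_BLOCKS; i++)
		if (sm->ref_count[i] == 0)
			n++;
	return n;
}
```

A block that is shared by two owners stays allocated until both have dropped
their reference; only then can ``toy_sm_new_block()`` hand it out again.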
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) The data structures
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) dm-btree.[hc]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) dm-btree-remove.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) dm-btree-spine.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) dm-btree-internal.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Currently there is only one data structure, a hierarchical btree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) There are plans to add more. For example, something with an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) array-like interface would see a lot of use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) The btree is 'hierarchical' in that you can define it to be composed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) of nested btrees, and take multiple keys. For example, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) thin-provisioning target uses a btree with two levels of nesting.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) The first maps a device id to a mapping tree, and that in turn maps a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) virtual block to a physical block.
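
The two-level key structure described above can be sketched as follows. To
keep the example short it uses flat arrays with linear search in place of
btrees, so it shows only how the keys nest, not how the btree itself works.
All ``toy_`` names and the fixed capacities are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 4
#define MAX_ENTRIES 8

/* Bottom level: virtual block -> physical block for one device. */
struct toy_mapping_tree {
	size_t nr;
	uint64_t keys[MAX_ENTRIES];
	uint64_t values[MAX_ENTRIES];
};

/* Top level: device id -> mapping tree. */
struct toy_top_level {
	size_t nr;
	uint64_t keys[MAX_DEVS];
	struct toy_mapping_tree trees[MAX_DEVS];
};

/* Find the mapping tree for a device id, or NULL if there is none. */
static struct toy_mapping_tree *toy_find_tree(struct toy_top_level *top,
					      uint64_t dev)
{
	for (size_t i = 0; i < top->nr; i++)
		if (top->keys[i] == dev)
			return &top->trees[i];
	return NULL;
}

/*
 * Insert using one key per level: keys[0] is the device id and
 * keys[1] is the virtual block. Returns 0 on success, -1 if full.
 */
static int toy_insert(struct toy_top_level *top, const uint64_t keys[2],
		      uint64_t value)
{
	struct toy_mapping_tree *t = toy_find_tree(top, keys[0]);

	if (!t) {
		if (top->nr == MAX_DEVS)
			return -1;
		top->keys[top->nr] = keys[0];
		t = &top->trees[top->nr++];
		t->nr = 0;
	}

	/* Overwrite an existing mapping for this virtual block, if any. */
	for (size_t i = 0; i < t->nr; i++) {
		if (t->keys[i] == keys[1]) {
			t->values[i] = value;
			return 0;
		}
	}

	if (t->nr == MAX_ENTRIES)
		return -1;
	t->keys[t->nr] = keys[1];
	t->values[t->nr] = value;
	t->nr++;
	return 0;
}

/* Look up with the same pair of keys. Returns 0 and fills *value if found. */
static int toy_lookup(struct toy_top_level *top, const uint64_t keys[2],
		      uint64_t *value)
{
	struct toy_mapping_tree *t = toy_find_tree(top, keys[0]);

	if (!t)
		return -1;
	for (size_t i = 0; i < t->nr; i++) {
		if (t->keys[i] == keys[1]) {
			*value = t->values[i];
			return 0;
		}
	}
	return -1;
}
```

With this layout the same virtual block number can map to different physical
blocks on different devices, because the device id selects the mapping tree
before the virtual block is looked up.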
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) Values stored in the btrees can have arbitrary size. Keys are always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) 64 bits, although nesting allows you to use multiple keys.