================================
Device-mapper "unstriped" target
================================

Introduction
============

The device-mapper "unstriped" target provides a transparent mechanism to
unstripe a device-mapper "striped" target to access the underlying disks
without having to touch the true backing block-device. It can also be
used to unstripe a hardware RAID-0 to access backing disks.

Parameters:
<number of stripes> <chunk size> <stripe #> <dev_path> <offset>

<number of stripes>
        The number of stripes in the RAID 0.

<chunk size>
        The number of 512-byte sectors in each chunk of the striping.

<stripe #>
        The stripe number within the device that corresponds to the
        physical drive you wish to unstripe. Stripe numbers are
        0-indexed.

<dev_path>
        The block device you wish to unstripe.

<offset>
        The starting sector within <dev_path>.
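As a sanity check on the parameters above, recall that the first two
fields of any dm table line are the start and length in 512-byte
sectors, and that an unstriped view exposes an equal share of the
striped device. A small sketch with hypothetical sizes and a
hypothetical device path:

```shell
# Hypothetical: a striped device of 1048576 sectors (512 MiB) over 4 stripes.
STRIPED_SECTORS=1048576
NUM_STRIPES=4
CHUNK=256            # 128K chunks, in 512-byte sectors

# Each unstriped view exposes an equal share of the striped device.
UNSTRIPED_SECTORS=$((STRIPED_SECTORS / NUM_STRIPES))

# Table line for stripe 0 (device path is hypothetical):
echo "0 ${UNSTRIPED_SECTORS} unstriped ${NUM_STRIPES} ${CHUNK} 0 /dev/mapper/striped 0"
```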


Why use this module?
====================

An example of undoing an existing dm-stripe
-------------------------------------------

This small bash script will set up 4 loop devices and use the existing
striped target to combine the 4 devices into one. It will then use the
unstriped target on top of the striped device to access the individual
backing loop devices. We write data to the newly exposed unstriped
devices and verify that the data written matches the correct underlying
device on the striped array::

  #!/bin/bash

  MEMBER_SIZE=$((128 * 1024 * 1024))   # size of each backing file, in bytes
  NUM=4                                # number of stripes
  SEQ_END=$((${NUM}-1))
  CHUNK=256                            # chunk size, in 512-byte sectors
  BS=4096

  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))   # striped device size, in sectors
  MEMBER_SECTORS=$((${MEMBER_SIZE}/512))     # each unstriped view, in sectors
  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
  COUNT=$((${MEMBER_SIZE} / ${BS}))

  # Create the backing files and attach them to loop devices.
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
    losetup /dev/loop${i} member-${i}
    DM_PARMS+=" /dev/loop${i} 0"
  done

  # Combine the loop devices into one striped device ...
  echo "$DM_PARMS" | dmsetup create raid0
  # ... then expose each member again through the unstriped target.
  for i in $(seq 0 ${SEQ_END}); do
    echo "0 ${MEMBER_SECTORS} unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
  done

  # Fill each unstriped view and verify it matches its backing member.
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
    diff /dev/mapper/set-${i} member-${i}
  done

  for i in $(seq 0 ${SEQ_END}); do
    dmsetup remove set-${i}
  done

  dmsetup remove raid0

  for i in $(seq 0 ${SEQ_END}); do
    losetup -d /dev/loop${i}
    rm -f member-${i}
  done

Another example
---------------

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k chunk on each core, resulting
in a 256k stripe across the two cores::

   Core 0:       Core 1:
  __________    __________
  | LBA 512|    | LBA 768|
  | LBA 0  |    | LBA 256|
  ----------    ----------
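The chunk arithmetic behind that layout can be sketched in shell: with a
128k chunk (256 sectors of 512B) and two cores, the core that owns a
given sector of the aggregate device is ``(sector / chunk) % cores``
(the numbers here are illustrative):

```shell
CHUNK=256   # 128k chunk, expressed in 512-byte sectors
CORES=2

# Which core serves a given sector of the striped (aggregate) device?
core_of() { echo $(( ($1 / CHUNK) % CORES )); }

core_of 0     # first chunk  -> core 0
core_of 256   # second chunk -> core 1
core_of 512   # third chunk  -> core 0
```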

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. This is because the partitions
are striped across the two cores. When we unstripe this hardware RAID 0
and create partitions on each newly exposed device, the two partitions
are physically separated.

With the dm-unstriped target we are able to segregate a fio workload
into read and write jobs that are independent of each other. Compared
to running the same test on a combined drive with partitions, we saw a
92% reduction in read latency using this device-mapper target.
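The exact fio job file used was not published; a minimal sketch of such
a split workload, assuming the unstriped devices are named as in the
dmsetup example below, might look like:

```ini
; Hypothetical fio job file: reads pinned to one core's device,
; writes pinned to the other.
[global]
direct=1
ioengine=libaio
runtime=30
time_based

[reader]
filename=/dev/mapper/nvmset0
rw=randread
bs=4k
iodepth=32

[writer]
filename=/dev/mapper/nvmset1
rw=randwrite
bs=128k
iodepth=32
```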


Example dmsetup usage
=====================

unstriped on top of Intel NVMe device that has 2 cores
------------------------------------------------------

::

  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'

There will now be two devices that expose Intel NVMe cores 0 and 1,
respectively::

  /dev/mapper/nvmset0
  /dev/mapper/nvmset1

unstriped on top of striped with 4 drives using 128K chunk size
---------------------------------------------------------------

::

  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'