==================
Partial Parity Log
==================

Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
addressed by PPL is that after a dirty shutdown, the parity of a particular
stripe may become inconsistent with the data on the other member disks. If the
array is also in a degraded state, there is no way to recalculate parity,
because one of the disks is missing. This can lead to silent data corruption
when rebuilding the array or using it in degraded mode: data calculated from
parity for array blocks that were not touched by a write request during the
unclean shutdown can be incorrect. Such a condition is known as the RAID5 Write
Hole. Because of this, md by default does not allow starting a dirty degraded
array.
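The toy model below is a sketch of the write hole, not md code. It assumes a
stripe of two data chunks and one parity chunk, each only one byte long; the
names ``toy_stripe``, ``rebuild_d1()`` and ``write_hole_demo()`` are
hypothetical. A write to ``d0`` reaches the disk but the matching parity update
is lost, and then the disk holding ``d1``, which the write never touched,
fails.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical toy stripe: two data chunks and one parity chunk, one byte each. */
    struct toy_stripe {
        uint8_t d0, d1, p;
    };

    /* Degraded read: rebuild the missing d1 from the surviving d0 and parity. */
    uint8_t rebuild_d1(const struct toy_stripe *s)
    {
        return s->p ^ s->d0;
    }

    /*
     * Interrupted write to d0: the new data is on disk, the new parity is not.
     * Returns true if the untouched chunk d1 is still rebuilt correctly.
     */
    bool write_hole_demo(uint8_t old_d0, uint8_t new_d0, uint8_t d1)
    {
        struct toy_stripe s = { .d0 = old_d0, .d1 = d1, .p = (uint8_t)(old_d0 ^ d1) };

        s.d0 = new_d0;  /* data write completed, parity write lost */
        return rebuild_d1(&s) == d1;
    }

Whenever ``new_d0`` differs from ``old_d0``, the rebuilt ``d1`` is wrong even
though nothing ever wrote to it, which is exactly the silent corruption
described above.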

Partial parity for a write operation is the XOR of the stripe data chunks not
modified by this write. It is just enough data to recover from the write hole.
XORing the partial parity with the modified chunks produces parity for the
stripe that is consistent with the chunks as they are on disk, regardless of
which chunk writes completed before the shutdown. If one of the unmodified data
disks of this stripe is missing, this updated parity can be used to recover its
contents. PPL recovery is also performed when starting an array after an
unclean shutdown with all disks available, eliminating the need to resync the
array. Because of this, using a write-intent bitmap and PPL together is not
supported.
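The partial parity arithmetic can be sketched in C as follows. This is a
minimal illustration assuming tiny fixed-size chunks held in memory, not md's
implementation; ``CHUNK_BYTES``, ``compute_partial_parity()`` and
``recover_parity()`` are hypothetical names.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define CHUNK_BYTES 4  /* hypothetical tiny chunk size, for illustration only */

    /* XOR src into dst, byte by byte. */
    static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            dst[i] ^= src[i];
    }

    /* Partial parity: XOR of the data chunks that the write does NOT modify. */
    void compute_partial_parity(uint8_t pp[CHUNK_BYTES], uint8_t data[][CHUNK_BYTES],
                                const bool *modified, size_t ndata)
    {
        memset(pp, 0, CHUNK_BYTES);
        for (size_t i = 0; i < ndata; i++)
            if (!modified[i])
                xor_into(pp, data[i], CHUNK_BYTES);
    }

    /*
     * Recovery: XOR the partial parity with the modified chunks as they are on
     * disk now (old data, new data, or a mix) to get parity that matches the
     * current stripe contents.
     */
    void recover_parity(uint8_t parity[CHUNK_BYTES], const uint8_t pp[CHUNK_BYTES],
                        uint8_t data[][CHUNK_BYTES], const bool *modified, size_t ndata)
    {
        memcpy(parity, pp, CHUNK_BYTES);
        for (size_t i = 0; i < ndata; i++)
            if (modified[i])
                xor_into(parity, data[i], CHUNK_BYTES);
    }

Rebuilding a missing unmodified chunk is then ordinary RAID5 reconstruction:
XOR the recovered parity with all remaining data chunks.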

When handling a write request, PPL writes the partial parity before the new
data and parity are dispatched to disks. PPL is a distributed log: it is stored
in the metadata area of the array member drives, on the parity drive of each
particular stripe. It does not require a dedicated journaling drive. Write
performance is reduced by up to 30%-40%, but it scales with the number of
drives in the array, and there is no single journaling drive that could become
a bottleneck or a single point of failure.
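The placement and ordering described above can be sketched in C. The sketch
assumes the left-symmetric layout (mdadm's default for RAID5) to decide which
member holds a stripe's parity and therefore its PPL entry. The names
``parity_disk()``, ``handle_stripe_write()`` and the ``io_log`` trace are
hypothetical; a real driver submits and waits for actual I/O instead of
recording events.

.. code-block:: c

    #include <stddef.h>

    /* Kinds of I/O issued for one stripe write in this sketch. */
    enum io_kind { IO_PPL_ENTRY, IO_DATA_AND_PARITY };

    struct io_event {
        enum io_kind kind;
        unsigned int disk;  /* member disk holding this stripe's parity */
    };

    /* Hypothetical I/O trace, so the issue order can be inspected. */
    static struct io_event io_log[16];
    static size_t io_count;

    static void issue(enum io_kind kind, unsigned int disk)
    {
        if (io_count < sizeof(io_log) / sizeof(io_log[0]))
            io_log[io_count++] = (struct io_event){ kind, disk };
    }

    /*
     * Parity disk of a stripe, assuming the left-symmetric layout:
     * parity starts on the last disk and rotates towards the first.
     */
    unsigned int parity_disk(unsigned long long stripe, unsigned int raid_disks)
    {
        return (raid_disks - 1) - (unsigned int)(stripe % raid_disks);
    }

    void handle_stripe_write(unsigned long long stripe, unsigned int raid_disks)
    {
        unsigned int pd = parity_disk(stripe, raid_disks);

        /* 1. Log the partial parity in the metadata area of the parity disk. */
        issue(IO_PPL_ENTRY, pd);
        /* A real driver waits for the log write to complete at this point. */

        /* 2. Only then dispatch the new data chunks and the new parity. */
        issue(IO_DATA_AND_PARITY, pd);
    }

Because the log entry follows the rotating parity, log writes are spread over
all members, which is why the overhead scales with the number of drives.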

Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
not a true journal. It does not protect against losing in-flight data, only
against silent data corruption. If a dirty disk of a stripe (one that was being
written during the unclean shutdown) is lost, no PPL recovery is performed for
this stripe and its parity is not updated. So it is possible to have arbitrary
data in the written part of a stripe if that disk is lost. In such a case the
behavior is the same as in plain raid5.
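The recovery rule can be expressed as a small decision function. This sketch
assumes that a PPL entry records which member disks the interrupted write was
modifying as a 64-bit mask; ``ppl_entry_sketch``, ``ppl_recovery_action()`` and
the enum values are hypothetical and do not describe md's on-disk format. It
also ignores the case where the parity disk itself, which holds the log, is
missing.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical PPL entry: bit i set if the write was modifying disk i. */
    struct ppl_entry_sketch {
        uint64_t dirty_disks;
    };

    enum ppl_action {
        PPL_RECOVER_PARITY,  /* recompute parity from the partial parity */
        PPL_SKIP,            /* a dirty disk is gone: behave as plain raid5 */
    };

    /* missing_disks: bit i set if member disk i is not present. */
    enum ppl_action ppl_recovery_action(const struct ppl_entry_sketch *e,
                                        uint64_t missing_disks)
    {
        if (e->dirty_disks & missing_disks)
            return PPL_SKIP;
        return PPL_RECOVER_PARITY;
    }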

PPL is available for md version-1 metadata and external (specifically IMSM)
metadata arrays. It can be enabled using the mdadm option
--consistency-policy=ppl.

PPL is limited to a maximum of 64 disks in the array. This keeps the data
structures and implementation simple. RAID5 arrays with so many disks are
unlikely because of the high risk of multiple disk failures, so this
restriction should not be a limitation in practice.
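One way to see why a fixed limit keeps things simple: if per-disk state fits
in a single 64-bit word, sets of member disks can be tracked with plain bit
operations and no dynamic allocation. The sketch below assumes exactly that;
``PPL_SKETCH_MAX_DISKS``, ``ppl_sketch_supported()`` and ``mark_disk()`` are
hypothetical and only illustrate the idea, not md's actual data structures.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define PPL_SKETCH_MAX_DISKS 64  /* one bit per member disk in a uint64_t */

    /* Hypothetical check when enabling PPL on an array. */
    bool ppl_sketch_supported(unsigned int raid_disks)
    {
        return raid_disks <= PPL_SKETCH_MAX_DISKS;
    }

    /* Add a disk to a per-array set of disks stored as a 64-bit mask. */
    uint64_t mark_disk(uint64_t mask, unsigned int disk)
    {
        if (disk >= PPL_SKETCH_MAX_DISKS)
            return mask;  /* cannot be represented; shifting would be undefined */
        return mask | (UINT64_C(1) << disk);
    }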