^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ==========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Explicit volatile write back cache control
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ==========================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) Many storage devices, especially in the consumer market, come with volatile
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) write back caches. That means the devices signal I/O completion to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) operating system before the data has actually reached non-volatile storage. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) behavior obviously speeds up various workloads, but it means the operating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) system needs to force data out to the non-volatile storage when it performs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) a data integrity operation like fsync, sync or an unmount.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) The Linux block layer provides two simple mechanisms that let filesystems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) control the caching behavior of the storage device. These mechanisms are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) a forced cache flush, and the Force Unit Access (FUA) flag for requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) Explicit cache flushes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) the filesystem and will make sure the volatile cache of the storage device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) has been flushed before the actual I/O operation is started. This explicitly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) guarantees that previously completed write requests are on non-volatile
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) storage before the flagged bio starts. In addition, the REQ_PREFLUSH flag can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) set on an otherwise empty bio structure, which causes only an explicit cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) flush without any dependent I/O. It is recommended to use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) the blkdev_issue_flush() helper for a pure cache flush.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) Forced Unit Access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) filesystem and will make sure that I/O completion for this request is only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) signaled after the data has been committed to non-volatile storage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) Implementation details for filesystems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) worry about whether the underlying devices need explicit cache flushing or how
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) may both be set on a single bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
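The semantics of the two flags described above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: the ``model_*`` functions, the ``MODEL_*`` flag values and the
``struct model_disk`` layout are hypothetical names invented for the example,
and a negative block number stands in for an empty bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_PREFLUSH (1u << 0)
#define MODEL_FUA      (1u << 1)

#define NR_BLOCKS 8

struct model_disk {
	int  media[NR_BLOCKS];	/* non-volatile storage */
	int  cache[NR_BLOCKS];	/* volatile write back cache */
	bool dirty[NR_BLOCKS];	/* cache slot holds data not yet on media */
};

/* Write every dirty cache slot out to the media. */
static void model_flush(struct model_disk *d)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (d->dirty[i]) {
			d->media[i] = d->cache[i];
			d->dirty[i] = false;
		}
	}
}

/* Submit one write; block < 0 models an empty bio without payload. */
static void model_submit_write(struct model_disk *d, int block, int value,
			       unsigned int flags)
{
	assert(block < NR_BLOCKS);

	/* Previously completed writes become durable before this one starts. */
	if (flags & MODEL_PREFLUSH)
		model_flush(d);

	/* Empty bio: a pure cache flush with no dependent I/O. */
	if (block < 0)
		return;

	if (flags & MODEL_FUA) {
		/* Data is on the media before completion is signaled. */
		d->media[block] = value;
		d->dirty[block] = false;
	} else {
		/* Completion is signaled while the data is only cached. */
		d->cache[block] = value;
		d->dirty[block] = true;
	}
}

/* Power loss: anything that lives only in the cache is gone. */
static void model_power_loss(struct model_disk *d)
{
	memset(d->cache, 0, sizeof(d->cache));
	memset(d->dirty, 0, sizeof(d->dirty));
}
```

In this model, writing block 0 with no flags, then block 1 with both flags
set, then block 2 with no flags, and finally calling ``model_power_loss()``
leaves blocks 0 and 1 on the media and loses block 2.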
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) Implementation details for bio based block drivers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) --------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) directly below the submit_bio interface. For remapping drivers, the REQ_FUA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) bit needs to be propagated to underlying devices, and a global flush needs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) to be implemented for bios with the REQ_PREFLUSH bit set. For real device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) drivers that do not have a volatile cache, the REQ_PREFLUSH and REQ_FUA bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) data can be completed successfully without doing any work. Drivers for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) devices with volatile caches must implement support for these flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) themselves, without any help from the block layer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) Implementation details for request_fn based block drivers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) ---------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) For devices that do not support volatile write caches, no driver support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) is required: the block layer completes empty REQ_PREFLUSH requests before
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) requests that have a payload. For devices with volatile write caches, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) driver needs to tell the block layer that it supports flushing caches by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) doing::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)     blk_queue_write_cache(sdkp->disk->queue, true, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) the block layer automatically turns REQ_PREFLUSH requests with a payload into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) a sequence of an empty REQ_OP_FLUSH request followed by the actual write.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) For devices that also support the FUA bit, the block layer needs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) to be told to pass through the REQ_FUA bit using::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)     blk_queue_write_cache(sdkp->disk->queue, true, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) and the driver must handle write requests that have the REQ_FUA bit set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) in prep_fn/request_fn. If the FUA bit is not natively supported, the block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) layer replaces it with an empty REQ_OP_FLUSH request issued after the actual write.
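
The way the block layer decomposes a flagged request, as described in this
section, can be summarized as a small decision function. The following is a
userspace sketch written for illustration, not the kernel's implementation:
``model_flush_steps()``, the ``MODEL_*`` flags and the ``STEP_*`` values are
hypothetical names, and the two boolean arguments stand in for what the driver
advertised through blk_queue_write_cache().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; they are not kernel identifiers. */
#define MODEL_PREFLUSH (1u << 0)
#define MODEL_FUA      (1u << 1)

enum model_step {
	STEP_PREFLUSH  = 1u << 0,	/* empty flush issued before the data */
	STEP_DATA      = 1u << 1,	/* the write itself */
	STEP_POSTFLUSH = 1u << 2,	/* empty flush issued after the data */
	STEP_PASS_FUA  = 1u << 3,	/* write reaches the driver with FUA set */
};

/*
 * Return the set of steps the driver will see for one request.
 * A result of 0 means the request completes without entering the driver.
 */
static unsigned int model_flush_steps(bool write_cache, bool native_fua,
				      bool has_payload, unsigned int flags)
{
	unsigned int steps = 0;

	/* Only the two flags modeled here are accepted. */
	assert((flags & ~(MODEL_PREFLUSH | MODEL_FUA)) == 0);

	if (has_payload)
		steps |= STEP_DATA;

	/* No volatile write cache: both flags are stripped. */
	if (!write_cache)
		return steps;

	if (flags & MODEL_PREFLUSH)
		steps |= STEP_PREFLUSH;

	/* FUA is passed through if supported, otherwise emulated by a flush. */
	if (has_payload && (flags & MODEL_FUA))
		steps |= native_fua ? STEP_PASS_FUA : STEP_POSTFLUSH;

	return steps;
}
```

For a device that advertises a write cache but no native FUA, a write with
both flags set yields ``STEP_PREFLUSH | STEP_DATA | STEP_POSTFLUSH`` in this
model, while a device without a volatile write cache sees only ``STEP_DATA``.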