^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Data Integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) 1. Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) Modern filesystems feature checksumming of data and metadata to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) protect against data corruption. However, the detection of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) corruption is done at read time, which could potentially be months
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) after the data was written. At that point the original data that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) application tried to write is most likely lost.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) The solution is to ensure that the disk is actually storing what the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) application meant it to. Recent additions to both the SCSI family
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) protocols (SBC Data Integrity Field, SCC protection proposal) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) SATA/T13 (External Path Protection) try to remedy this by adding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) support for appending integrity metadata to an I/O. The integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) metadata (or protection information in SCSI terminology) includes a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) checksum for each sector as well as an incrementing counter that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) ensures the individual sectors are written in the right order and,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) for some protection schemes, that the I/O is written to the right
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) place on disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) Current storage controllers and devices implement various protective
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) measures, for instance checksumming and scrubbing. However, these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) technologies work in their own isolated domains or, at best,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) between adjacent nodes in the I/O path. The interesting thing about
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) DIF and the other integrity extensions is that the protection format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) is well defined and every node in the I/O path can verify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) integrity of the I/O and reject it if corruption is detected. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) allows not only corruption prevention but also isolation of the point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) of failure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) 2. The Data Integrity Extensions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) As written, the protocol extensions only protect the path between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) controller and storage device. However, many controllers actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) allow the operating system to interact with the integrity metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) (IMD). We have been working with several FC/SAS HBA vendors to enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) the protection information to be transferred to and from their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) controllers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) The SCSI Data Integrity Field works by appending 8 bytes of protection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) information to each sector. The data + integrity metadata is stored
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) in 520 byte sectors on disk. Data + IMD are interleaved when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) transferred between the controller and target. The T13 proposal is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) similar.
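
As a rough illustration of the layout described above, the following
userspace sketch models a 512-byte sector followed by its 8 bytes of
protection information. The guard/application/reference field order
and big-endian encoding follow the T10 DIF convention; the names
struct dif_tuple, struct dif_sector, dif_tuple_pack() and
dif_tuple_ref() are made up for this example and are not kernel
interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* One protection tuple; every field is stored big-endian on the wire. */
struct dif_tuple {
	uint8_t guard_tag[2];	/* checksum of the sector's data */
	uint8_t app_tag[2];	/* opaque application tag */
	uint8_t ref_tag[4];	/* incrementing reference counter */
};

/* A formatted sector: data immediately followed by its tuple. */
struct dif_sector {
	uint8_t data[SECTOR_SIZE];
	struct dif_tuple pi;
};

/* Byte arrays need no padding, so the sizes match the 520-byte format. */
_Static_assert(sizeof(struct dif_tuple) == 8, "tuple must be 8 bytes");
_Static_assert(sizeof(struct dif_sector) == 520, "sector must be 520 bytes");

/* Store host-order values into a tuple in big-endian byte order. */
void dif_tuple_pack(struct dif_tuple *t, uint16_t guard, uint16_t app,
		    uint32_t ref)
{
	assert(t != NULL);
	t->guard_tag[0] = (uint8_t)(guard >> 8);
	t->guard_tag[1] = (uint8_t)guard;
	t->app_tag[0] = (uint8_t)(app >> 8);
	t->app_tag[1] = (uint8_t)app;
	t->ref_tag[0] = (uint8_t)(ref >> 24);
	t->ref_tag[1] = (uint8_t)(ref >> 16);
	t->ref_tag[2] = (uint8_t)(ref >> 8);
	t->ref_tag[3] = (uint8_t)ref;
}

/* Read the reference tag back as a host-order integer. */
uint32_t dif_tuple_ref(const struct dif_tuple *t)
{
	assert(t != NULL);
	return (uint32_t)t->ref_tag[0] << 24 | (uint32_t)t->ref_tag[1] << 16 |
	       (uint32_t)t->ref_tag[2] << 8 | t->ref_tag[3];
}
```

Using byte arrays instead of integer fields keeps the sketch
independent of host endianness and structure padding.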
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) Because it is highly inconvenient for operating systems to deal with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) 520 (and 4104) byte sectors, we approached several HBA vendors and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) encouraged them to allow separation of the data and integrity metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) scatter-gather lists.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) The controller will interleave the buffers on write and split them on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) read. This means that Linux can DMA the data buffers to and from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) host memory without changes to the page cache.
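
The interleave-on-write and split-on-read behaviour can be sketched
in software as follows. This is only a model of what the controller
does in hardware, assuming 512-byte sectors and 8-byte tuples;
dix_interleave() and dix_split() are hypothetical names chosen for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define TUPLE_SIZE  8u

/* Write path: merge separate data and IMD buffers into 520-byte sectors. */
void dix_interleave(uint8_t *wire, const uint8_t *data, const uint8_t *imd,
		    size_t nr_sectors)
{
	assert(wire != NULL && data != NULL && imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++) {
		memcpy(wire, data + i * SECTOR_SIZE, SECTOR_SIZE);
		memcpy(wire + SECTOR_SIZE, imd + i * TUPLE_SIZE, TUPLE_SIZE);
		wire += SECTOR_SIZE + TUPLE_SIZE;
	}
}

/* Read path: split 520-byte sectors back into data and IMD buffers. */
void dix_split(const uint8_t *wire, uint8_t *data, uint8_t *imd,
	       size_t nr_sectors)
{
	assert(wire != NULL && data != NULL && imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++) {
		memcpy(data + i * SECTOR_SIZE, wire, SECTOR_SIZE);
		memcpy(imd + i * TUPLE_SIZE, wire + SECTOR_SIZE, TUPLE_SIZE);
		wire += SECTOR_SIZE + TUPLE_SIZE;
	}
}
```

Because the host only ever sees the two separate buffers, the data
buffer keeps its power-of-two sector size.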
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) is somewhat heavy to compute in software. Benchmarks found that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) calculating this checksum had a significant impact on system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) performance for a number of workloads. Some controllers allow a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) lighter-weight checksum to be used when interfacing with the operating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) system. Emulex, for instance, supports the TCP/IP checksum instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) The IP checksum received from the OS is converted to the 16-bit CRC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) when writing and vice versa. This allows the integrity metadata to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) generated by Linux or the application at very low cost (comparable to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) software RAID5).
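
To give a feel for why the CRC is comparatively expensive, here is a
minimal bit-at-a-time sketch of a 16-bit CRC using the T10 DIF
polynomial 0x8BB7 with an initial value of zero. It is an
illustration only; crc16_t10dif() is a name chosen for this example,
and a production implementation would normally use lookup tables or
hardware assistance rather than this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10_DIF_POLY 0x8BB7u

/* Bitwise CRC16: MSB first, initial value 0, no final XOR. */
uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		/* Eight shift/XOR steps per input byte. */
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ T10_DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

With these parameters the commonly catalogued check value for the
ASCII string "123456789" is 0xD0DB, which is a convenient self-test.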
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) The IP checksum is weaker than the CRC in terms of detecting bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) errors. However, the strength is really in the separation of the data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) buffers and the integrity metadata. These two distinct buffers must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) match up for an I/O to complete.
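
The weakness of the IP checksum can be seen from a small sketch of
the RFC 1071 ones' complement sum. Because the sum is commutative,
swapping two 16-bit words in the buffer goes undetected. The
function name ip_checksum() is an assumption made for this example,
and the exact form in which a given controller expects the checksum
is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over an even number of bytes.
 * The 32-bit accumulator is sufficient for sector-sized buffers.
 */
uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(buf != NULL && len % 2 == 0);
	/* Add the buffer as a sequence of big-endian 16-bit words. */
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7 used as the worked
example in RFC 1071, the folded sum is 0xddf2 and the checksum is
therefore 0x220d. Reordering those words yields the same value.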
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) The separation of the data and integrity metadata buffers as well as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) the choice in checksums is referred to as the Data Integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) Extensions. As these extensions are outside the scope of the protocol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) bodies (T10, T13), Oracle and its partners are trying to standardize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) them within the Storage Networking Industry Association.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) 3. Kernel Changes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) The data integrity framework in Linux enables protection information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) to be pinned to I/Os and sent to/received from controllers that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) support it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) The advantage of the integrity extensions in SCSI and SATA is that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) they enable us to protect the entire path from application to storage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) device. However, at the same time this is also the biggest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) disadvantage. It means that the protection information must be in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) format that can be understood by the disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) Generally Linux/POSIX applications are agnostic to the intricacies of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) the storage devices they are accessing. The virtual filesystem switch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) and the block layer make things like hardware sector size and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) transport protocols completely transparent to the application.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) However, this level of detail is required when preparing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) protection information to send to a disk. Consequently, the very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) concept of an end-to-end protection scheme is a layering violation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) It is completely unreasonable for an application to be aware of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) whether it is accessing a SCSI or SATA disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) The data integrity support implemented in Linux attempts to hide this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) from the application. As far as the application (and to some extent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) the kernel) is concerned, the integrity metadata is opaque information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) that's attached to the I/O.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) The current implementation allows the block layer to automatically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) generate the protection information for any I/O. Eventually the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) intent is to move the integrity metadata calculation to userspace for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) user data. Metadata and other I/O that originates within the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) will still use the automatic generation interface.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) Some storage devices allow each hardware sector to be tagged with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 16-bit value. The owner of this tag space is the owner of the block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) device, i.e. the filesystem in most cases. The filesystem can use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) this extra space to tag sectors as it sees fit. Because the tag
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) space is limited, the block interface allows tagging bigger chunks by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) way of interleaving. This way, 8*16 bits of information can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) attached to a typical 4KB filesystem block.
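
The arithmetic behind the 8*16 bits is that a 4KB block spans eight
512-byte sectors, each contributing a 2-byte application tag. A
minimal sketch of the interleaving, assuming the T10 DIF tuple layout
(application tag at byte offset 2 of each 8-byte tuple), could look
like this. block_tag_set() and block_tag_get() are hypothetical
helpers, not the block layer interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTORS_PER_BLOCK 8u	/* 4096-byte block / 512-byte sectors */
#define TUPLE_SIZE	  8u
#define APP_TAG_OFFSET	  2u	/* application tag follows the 2-byte guard */
#define TAG_SIZE	  2u	/* 16 bits of tag space per sector */

/* Scatter a 16-byte block tag over the application tags of 8 tuples. */
void block_tag_set(uint8_t *imd, const uint8_t *tag)
{
	assert(imd != NULL && tag != NULL);
	for (size_t i = 0; i < SECTORS_PER_BLOCK; i++)
		memcpy(imd + i * TUPLE_SIZE + APP_TAG_OFFSET,
		       tag + i * TAG_SIZE, TAG_SIZE);
}

/* Gather the 16-byte block tag back out of the 8 tuples. */
void block_tag_get(const uint8_t *imd, uint8_t *tag)
{
	assert(imd != NULL && tag != NULL);
	for (size_t i = 0; i < SECTORS_PER_BLOCK; i++)
		memcpy(tag + i * TAG_SIZE,
		       imd + i * TUPLE_SIZE + APP_TAG_OFFSET, TAG_SIZE);
}
```

The guard and reference tag bytes of each tuple are left untouched,
so tagging does not interfere with the checksum or sector counter.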
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) This also means that applications such as fsck and mkfs will need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) access to manipulate the tags from user space. A passthrough
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) interface for this is being worked on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 4. Block Layer Implementation Details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 4.1 Bio
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) The data integrity patches add a new field to struct bio when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) CONFIG_BLK_DEV_INTEGRITY is enabled. bio_integrity(bio) returns a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) pointer to a struct bip which contains the bio integrity payload.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Essentially a bip is a trimmed down struct bio which holds a bio_vec
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) containing the integrity metadata and the required housekeeping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) information (bvec pool, vector count, etc.).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) A kernel subsystem can enable data integrity protection on a bio by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) calling bio_integrity_alloc(bio). This will allocate and attach the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) bip to the bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) Individual pages containing integrity metadata can subsequently be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) attached using bio_integrity_add_page().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) bio_free() will automatically free the bip.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 4.2 Block Device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Because the format of the protection data is tied to the physical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) disk, each block device has been extended with a block integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) profile (struct blk_integrity). This optional profile is registered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) with the block layer using blk_integrity_register().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) The profile contains callback functions for generating and verifying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) the protection data, as well as getting and setting application tags.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) The profile also contains a few constants to aid in completing,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) merging and splitting the integrity metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) Layered block devices will need to pick a profile that's appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) for all subdevices. blk_integrity_compare() can help with that. DM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) will require extra work due to the application tag.
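
The kind of check a layered device needs can be sketched in
userspace as below. This is a hypothetical model: struct
integrity_profile and profile_compare() are invented for the
example, and the criteria shown (equal tuple size, tag size and
name) are a plausible assumption rather than a description of what
blk_integrity_compare() actually tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of the fields a profile exposes. */
struct integrity_profile {
	const char *name;		/* e.g. "T10-DIF-TYPE1-IP" */
	unsigned int tuple_size;	/* IMD bytes per sector */
	unsigned int tag_size;		/* application tag bytes per sector */
};

/* Return 0 if both subdevices can share one profile, -1 otherwise. */
int profile_compare(const struct integrity_profile *a,
		    const struct integrity_profile *b)
{
	if (a == NULL && b == NULL)
		return 0;	/* neither subdevice supports integrity */
	if (a == NULL || b == NULL)
		return -1;	/* only one of them does */
	assert(a->name != NULL && b->name != NULL);
	if (a->tuple_size != b->tuple_size || a->tag_size != b->tag_size)
		return -1;
	return strcmp(a->name, b->name) == 0 ? 0 : -1;
}
```

A stacking driver would run such a comparison for every pair of
subdevices before advertising a profile of its own.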
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 5. Block Layer Integrity API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 5.1 Normal Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) The normal filesystem is unaware that the underlying block device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) is capable of sending/receiving integrity metadata. The IMD will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) be automatically generated by the block layer at submit_bio() time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) in case of a WRITE. A READ request will cause the I/O integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) to be verified upon completion.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) IMD generation and verification can be toggled using the::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)     /sys/block/<bdev>/integrity/write_generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) and::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)     /sys/block/<bdev>/integrity/read_verify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) flags.
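
A small userspace helper for flipping one of these flags might look
like the sketch below. It simply writes "1" or "0" to the given
path; integrity_flag_set() is a hypothetical name, and the caller is
assumed to have permission to write to sysfs.

```c
#include <assert.h>
#include <stdio.h>

/* Write "1" or "0" to a sysfs-style flag file; returns 0 on success. */
int integrity_flag_set(const char *path, int enable)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* The write may only be reported as failed when the file is closed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, integrity_flag_set("/sys/block/sda/integrity/read_verify", 0)
would ask the block layer to stop verifying reads on sda, where sda
is only a placeholder device name.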
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 5.2 Integrity-Aware Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) A filesystem that is integrity-aware can prepare I/Os with IMD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) attached. It can also use the application tag space if this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) supported by the block device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) `bool bio_integrity_prep(bio);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) To generate IMD for WRITE and to set up buffers for READ, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) filesystem must call bio_integrity_prep(bio).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) Prior to calling this function, the bio data direction and start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) sector must be set, and the bio should have all data pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) added. It is up to the caller to ensure that the bio does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) change while I/O is in progress.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) The bio is completed with an error if preparation fails for some reason.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) 5.3 Passing Existing Integrity Metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) Filesystems that either generate their own integrity metadata or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) are capable of transferring IMD from user space can use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) following calls:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) `struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) Allocates the bio integrity payload and hangs it off of the bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) nr_pages indicates how many pages of protection data need to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) stored in the integrity bio_vec list (similar to bio_alloc()).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) The integrity payload will be freed at bio_free() time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) `int bio_integrity_add_page(bio, page, len, offset);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) Attaches a page containing integrity metadata to an existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) bio. The bio must have an existing bip,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) i.e. bio_integrity_alloc() must have been called. For a WRITE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) the integrity metadata in the pages must be in a format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) understood by the target device with the notable exception that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) the sector numbers will be remapped as the request traverses the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) I/O stack. This implies that the pages added using this call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) will be modified during I/O! The first reference tag in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) integrity metadata must have a value of bip->bip_sector.
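
The reference tag handling described above can be modelled in
userspace as follows. The sketch assumes 8-byte tuples with a
big-endian 32-bit reference tag at byte offset 4, seeds the tags
from a starting sector and then rewrites them in place for a
different physical start sector. All function names are
hypothetical, and the real remapping is performed by the kernel, not
by the submitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TUPLE_SIZE     8u
#define REF_TAG_OFFSET 4u

uint32_t be32_get(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

void be32_put(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Submitter side: tuple i carries reference tag seed + i. */
void ref_tags_seed(uint8_t *imd, size_t nr_sectors, uint32_t seed)
{
	assert(imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++)
		be32_put(imd + i * TUPLE_SIZE + REF_TAG_OFFSET,
			 seed + (uint32_t)i);
}

/* I/O stack side: rewrite tags that match the virtual sector numbering. */
void ref_tags_remap(uint8_t *imd, size_t nr_sectors, uint32_t virt,
		    uint32_t phys)
{
	assert(imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++) {
		uint8_t *ref = imd + i * TUPLE_SIZE + REF_TAG_OFFSET;

		/* Only touch tags that carry the expected virtual value. */
		if (be32_get(ref) == virt)
			be32_put(ref, phys);
		virt++;
		phys++;
	}
}
```

Since the remap writes into the caller's buffer, the submitter
cannot assume the pages still hold the original values afterwards.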
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) Pages can be added using bio_integrity_add_page() as long as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) there is room in the bip bio_vec array (nr_pages).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) Upon completion of a READ operation, the attached pages will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) contain the integrity metadata received from the storage device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) It is up to the receiver to process them and verify data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) integrity upon completion.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) 5.4 Registering A Block Device As Capable Of Exchanging Integrity Metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) --------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) To enable integrity exchange on a block device the gendisk must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) registered as capable:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) `int blk_integrity_register(gendisk, blk_integrity);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) The blk_integrity struct is a template and should contain the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) following::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)     static struct blk_integrity my_profile = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)         .name        = "STANDARDSBODY-TYPE-VARIANT-CSUM",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)         .generate_fn = my_generate_fn,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)         .verify_fn   = my_verify_fn,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)         .tuple_size  = sizeof(struct my_tuple_size),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)         .tag_size    = <tag bytes per hw sector>,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) 'name' is a text string which will be visible in sysfs. This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) part of the userland API so choose it carefully and never change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) it. The format is standards body-type-variant-checksum,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) e.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) 'generate_fn' generates appropriate integrity metadata (for WRITE).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) 'verify_fn' verifies that the data buffer matches the integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 'tuple_size' must be set to match the size of the integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) metadata per sector, i.e. 8 for DIF and EPP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 'tag_size' must be set to identify how many bytes of tag space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) are available per hardware sector. For DIF this is either 2 or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) 0 depending on the value of the Control Mode Page ATO bit.
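
To make the roles of the two callbacks concrete, the following
userspace sketch shows what a generate/verify pair conceptually does
for a DIF-like format with tuple_size 8 and a CRC guard. The
flat-buffer signatures, the names my_generate() and my_verify(), and
the choice to leave the application tag zero are assumptions made
for illustration; the in-kernel callbacks have different signatures
and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
#define TUPLE_SIZE  8u

/* Bitwise CRC16 with the T10 DIF polynomial 0x8BB7, initial value 0. */
uint16_t guard_crc(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (uint16_t)((crc & 0x8000) ? (crc << 1) ^ 0x8BB7
							: crc << 1);
	}
	return crc;
}

/* Model of generate_fn: fill one 8-byte tuple per 512-byte sector. */
void my_generate(const uint8_t *data, uint8_t *imd, size_t nr_sectors,
		 uint32_t seed)
{
	assert(data != NULL && imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++) {
		uint16_t guard = guard_crc(data + i * SECTOR_SIZE, SECTOR_SIZE);
		uint32_t ref = seed + (uint32_t)i;
		uint8_t *t = imd + i * TUPLE_SIZE;

		t[0] = (uint8_t)(guard >> 8);	/* guard tag, big-endian */
		t[1] = (uint8_t)guard;
		t[2] = 0;			/* application tag unused */
		t[3] = 0;
		t[4] = (uint8_t)(ref >> 24);	/* reference tag, big-endian */
		t[5] = (uint8_t)(ref >> 16);
		t[6] = (uint8_t)(ref >> 8);
		t[7] = (uint8_t)ref;
	}
}

/* Model of verify_fn: 0 if every tuple matches its sector, else -1. */
int my_verify(const uint8_t *data, const uint8_t *imd, size_t nr_sectors,
	      uint32_t seed)
{
	assert(data != NULL && imd != NULL);
	for (size_t i = 0; i < nr_sectors; i++) {
		const uint8_t *t = imd + i * TUPLE_SIZE;
		uint16_t guard = guard_crc(data + i * SECTOR_SIZE, SECTOR_SIZE);
		uint32_t ref = (uint32_t)t[4] << 24 | (uint32_t)t[5] << 16 |
			       (uint32_t)t[6] << 8 | t[7];

		if (t[0] != (uint8_t)(guard >> 8) || t[1] != (uint8_t)guard)
			return -1;	/* data does not match the guard tag */
		if (ref != seed + (uint32_t)i)
			return -1;	/* sector out of order or misplaced */
	}
	return 0;
}
```

A buffer that passes my_verify() right after my_generate() fails it
as soon as a data bit is flipped or a different starting sector is
expected.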
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) ----------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>