Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) Data Integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) 1. Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) Modern filesystems feature checksumming of data and metadata to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) protect against data corruption.  However, the detection of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) corruption is done at read time which could potentially be months
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) after the data was written.  At that point the original data that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) application tried to write is most likely lost.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) The solution is to ensure that the disk is actually storing what the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) application meant it to.  Recent additions to both the SCSI family
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) protocols (SBC Data Integrity Field, SCC protection proposal) as well
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) as SATA/T13 (External Path Protection) try to remedy this by adding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) support for appending integrity metadata to an I/O.  The integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) metadata (or protection information in SCSI terminology) includes a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) checksum for each sector as well as an incrementing counter that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) ensures the individual sectors are written in the right order and,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) for some protection schemes, that the I/O is written to the right
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) place on disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) Current storage controllers and devices implement various protective
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) measures, for instance checksumming and scrubbing.  But these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) technologies are working in their own isolated domains or at best
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) between adjacent nodes in the I/O path.  The interesting thing about
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) DIF and the other integrity extensions is that the protection format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) is well defined and every node in the I/O path can verify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) integrity of the I/O and reject it if corruption is detected.  This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) allows not only corruption prevention but also isolation of the point
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) of failure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 2. The Data Integrity Extensions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) As written, the protocol extensions only protect the path between
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) controller and storage device.  However, many controllers actually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) allow the operating system to interact with the integrity metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) (IMD).  We have been working with several FC/SAS HBA vendors to enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) the protection information to be transferred to and from their
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) controllers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) The SCSI Data Integrity Field works by appending 8 bytes of protection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) information to each sector.  The data + integrity metadata is stored
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) in 520 byte sectors on disk.  Data + IMD are interleaved when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) transferred between the controller and target.  The T13 proposal is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) similar.
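
The 520-byte layout described above can be sketched in user-space C.
This is only an illustration, not kernel code.  The helper names
(crc16_t10dif, dif_interleave) are hypothetical, and the field order
(16-bit guard tag, 16-bit application tag, 32-bit reference tag, each
big-endian) is an assumption based on the common DIF Type 1 layout;
this document itself does not spell it out.  The guard is computed
with the CRC-16 polynomial 0x8BB7.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define DIF_SIZE    8

/* CRC-16 with polynomial 0x8BB7, initial value 0, no bit reflection. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/*
 * Build one 520-byte sector: 512 data bytes followed by the guard tag
 * (2 bytes), application tag (2 bytes) and reference tag (4 bytes).
 * The tag layout is an assumption, see the text above.
 */
static void dif_interleave(uint8_t out[SECTOR_SIZE + DIF_SIZE],
                           const uint8_t data[SECTOR_SIZE],
                           uint16_t app_tag, uint32_t ref_tag)
{
    uint16_t guard = crc16_t10dif(data, SECTOR_SIZE);
    uint8_t *pi = out + SECTOR_SIZE;

    memcpy(out, data, SECTOR_SIZE);
    pi[0] = (uint8_t)(guard >> 8);
    pi[1] = (uint8_t)(guard & 0xff);
    pi[2] = (uint8_t)(app_tag >> 8);
    pi[3] = (uint8_t)(app_tag & 0xff);
    pi[4] = (uint8_t)(ref_tag >> 24);
    pi[5] = (uint8_t)(ref_tag >> 16);
    pi[6] = (uint8_t)(ref_tag >> 8);
    pi[7] = (uint8_t)(ref_tag & 0xff);
}
```

A controller that accepts separate data and metadata buffers would
perform the equivalent of dif_interleave() on write and the reverse
split on read.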
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) Because it is highly inconvenient for operating systems to deal with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 520 (and 4104) byte sectors, we approached several HBA vendors and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) encouraged them to allow separation of the data and integrity metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) scatter-gather lists.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) The controller will interleave the buffers on write and split them on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) read.  This means that Linux can DMA the data buffers to and from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) host memory without changes to the page cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) is somewhat heavy to compute in software.  Benchmarks found that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) calculating this checksum had a significant impact on system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) performance for a number of workloads.  Some controllers allow a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) lighter-weight checksum to be used when interfacing with the operating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) system.  Emulex, for instance, supports the TCP/IP checksum instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) The IP checksum received from the OS is converted to the 16-bit CRC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) when writing and vice versa.  This allows the integrity metadata to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) generated by Linux or the application at very low cost (comparable to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) software RAID5).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) The IP checksum is weaker than the CRC in terms of detecting bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) errors.  However, the strength is really in the separation of the data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) buffers and the integrity metadata.  These two distinct buffers must
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) match up for an I/O to complete.
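
To illustrate why the IP checksum is cheap, here is a minimal
user-space sketch of the 16-bit one's-complement Internet checksum in
the style of RFC 1071.  The function name ip_checksum is hypothetical,
and the sketch makes no claim about the byte order or placement a
particular controller expects for the resulting value.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement sum over the buffer, then complemented. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    /* Add up the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero byte. */
    if (len)
        sum += (uint32_t)buf[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The whole computation is additions and one final fold, whereas a
bitwise CRC needs a shift and conditional XOR per bit (or a lookup
table) for the same data.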
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) The separation of the data and integrity metadata buffers as well as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) the choice in checksums is referred to as the Data Integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) Extensions.  As these extensions are outside the scope of the protocol
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) bodies (T10, T13), Oracle and its partners are trying to standardize
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) them within the Storage Networking Industry Association.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 3. Kernel Changes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) The data integrity framework in Linux enables protection information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) to be pinned to I/Os and sent to/received from controllers that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) support it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) The advantage to the integrity extensions in SCSI and SATA is that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) they enable us to protect the entire path from application to storage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) device.  However, at the same time this is also the biggest
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) disadvantage. It means that the protection information must be in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) format that can be understood by the disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) Generally Linux/POSIX applications are agnostic to the intricacies of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) the storage devices they are accessing.  The virtual filesystem switch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) and the block layer make things like hardware sector size and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) transport protocols completely transparent to the application.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) However, this level of detail is required when preparing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) protection information to send to a disk.  Consequently, the very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) concept of an end-to-end protection scheme is a layering violation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) It is completely unreasonable for an application to be aware whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) it is accessing a SCSI or SATA disk.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) The data integrity support implemented in Linux attempts to hide this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) from the application.  As far as the application (and to some extent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) the kernel) is concerned, the integrity metadata is opaque information
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) that's attached to the I/O.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) The current implementation allows the block layer to automatically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) generate the protection information for any I/O.  Eventually the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) intent is to move the integrity metadata calculation to userspace for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) user data.  Metadata and other I/O that originates within the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) will still use the automatic generation interface.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) Some storage devices allow each hardware sector to be tagged with a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 16-bit value.  The owner of this tag space is the owner of the block
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) device, i.e. the filesystem in most cases.  The filesystem can use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) this extra space to tag sectors as it sees fit.  Because the tag
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) space is limited, the block interface allows tagging bigger chunks by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) way of interleaving.  This way, 8*16 bits of information can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) attached to a typical 4KB filesystem block.
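
The interleaving arithmetic works out as follows: a 4KB block spans
eight 512-byte sectors, each contributing 16 bits of tag space, so 16
bytes in total.  The user-space sketch below spreads such a tag over
the per-sector tuples and collects it again.  The function names and
the position of the tag inside the tuple (directly after a 2-byte
guard) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TUPLE_SIZE 8   /* bytes of protection information per sector */
#define TAG_OFFSET 2   /* assumed: tag follows the 2-byte guard */
#define TAG_SIZE   2   /* 16 bits of tag space per sector */

/* Spread tag bytes over the tag fields of nr_sectors tuples. */
static void tags_scatter(uint8_t *pi, size_t nr_sectors, const uint8_t *tag)
{
    for (size_t i = 0; i < nr_sectors; i++) {
        pi[i * TUPLE_SIZE + TAG_OFFSET]     = tag[i * TAG_SIZE];
        pi[i * TUPLE_SIZE + TAG_OFFSET + 1] = tag[i * TAG_SIZE + 1];
    }
}

/* Collect the tag bytes back out of the tuples. */
static void tags_gather(const uint8_t *pi, size_t nr_sectors, uint8_t *tag)
{
    for (size_t i = 0; i < nr_sectors; i++) {
        tag[i * TAG_SIZE]     = pi[i * TUPLE_SIZE + TAG_OFFSET];
        tag[i * TAG_SIZE + 1] = pi[i * TUPLE_SIZE + TAG_OFFSET + 1];
    }
}
```

With nr_sectors set to 8, the tag buffer is exactly the 8*16 bits
mentioned above.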
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) This also means that applications such as fsck and mkfs will need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) access to manipulate the tags from user space.  A passthrough
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) interface for this is being worked on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 4. Block Layer Implementation Details
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 4.1 Bio
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) The data integrity patches add a new field to struct bio when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) CONFIG_BLK_DEV_INTEGRITY is enabled.  bio_integrity(bio) returns a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) pointer to a struct bip which contains the bio integrity payload.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) Essentially a bip is a trimmed down struct bio which holds a bio_vec
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) containing the integrity metadata and the required housekeeping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) information (bvec pool, vector count, etc.).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) A kernel subsystem can enable data integrity protection on a bio by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) calling bio_integrity_alloc(bio).  This will allocate and attach the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) bip to the bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) Individual pages containing integrity metadata can subsequently be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) attached using bio_integrity_add_page().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) bio_free() will automatically free the bip.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) 4.2 Block Device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Because the format of the protection data is tied to the physical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) disk, each block device has been extended with a block integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) profile (struct blk_integrity).  This optional profile is registered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) with the block layer using blk_integrity_register().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) The profile contains callback functions for generating and verifying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) the protection data, as well as getting and setting application tags.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) The profile also contains a few constants to aid in completing,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) merging and splitting the integrity metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) Layered block devices will need to pick a profile that's appropriate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) for all subdevices.  blk_integrity_compare() can help with that.  DM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) will require extra work due to the application tag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 5.0 Block Layer Integrity API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 5.1 Normal Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)     The normal filesystem is unaware that the underlying block device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)     is capable of sending/receiving integrity metadata.  The IMD will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)     be automatically generated by the block layer at submit_bio() time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)     in case of a WRITE.  A READ request will cause the I/O integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)     to be verified upon completion.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)     IMD generation and verification can be toggled using the::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)       /sys/block/<bdev>/integrity/write_generate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)     and::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)       /sys/block/<bdev>/integrity/read_verify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)     flags.
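
    A small user-space helper for toggling these flags might look like
    the sketch below.  The function names are hypothetical, and the
    sketch assumes that writing "0" disables and "1" enables a flag;
    writing normally requires root privileges.

```c
#include <stdio.h>

/* Build "/sys/block/<bdev>/integrity/<attr>"; returns 0 on success. */
static int integrity_attr_path(char *buf, size_t size,
                               const char *bdev, const char *attr)
{
    int n = snprintf(buf, size, "/sys/block/%s/integrity/%s", bdev, attr);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write 0 or 1 to the flag; returns 0 on success, -1 on any failure. */
static int integrity_set_flag(const char *bdev, const char *attr, int on)
{
    char path[256];
    FILE *f;
    int ret;

    if (integrity_attr_path(path, sizeof(path), bdev, attr) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;   /* no such device, no integrity profile, or no permission */

    ret = (fputs(on ? "1\n" : "0\n", f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)
        ret = -1;
    return ret;
}
```

    For example, integrity_set_flag("sda", "read_verify", 0) would turn
    off verification on reads for a disk named sda, assuming that disk
    exposes an integrity directory.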
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) 5.2 Integrity-Aware Filesystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)     A filesystem that is integrity-aware can prepare I/Os with IMD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)     attached.  It can also use the application tag space if this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)     supported by the block device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)     `bool bio_integrity_prep(bio);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)       To generate IMD for WRITE and to set up buffers for READ, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)       filesystem must call bio_integrity_prep(bio).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)       Prior to calling this function, the bio data direction and start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)       sector must be set, and the bio should have all data pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)       added.  It is up to the caller to ensure that the bio does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)       change while I/O is in progress.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)       The bio is completed with an error if preparation fails for some reason.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) 5.3 Passing Existing Integrity Metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) ---------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)     Filesystems that either generate their own integrity metadata or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)     are capable of transferring IMD from user space can use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)     following calls:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)     `struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)       Allocates the bio integrity payload and hangs it off of the bio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)       nr_pages indicates how many pages of protection data need to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)       stored in the integrity bio_vec list (similar to bio_alloc()).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)       The integrity payload will be freed at bio_free() time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)     `int bio_integrity_add_page(bio, page, len, offset);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)       Attaches a page containing integrity metadata to an existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)       bio.  The bio must have an existing bip,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)       i.e. bio_integrity_alloc() must have been called.  For a WRITE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)       the integrity metadata in the pages must be in a format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)       understood by the target device with the notable exception that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)       the sector numbers will be remapped as the request traverses the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)       I/O stack.  This implies that the pages added using this call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)       will be modified during I/O!  The first reference tag in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)       integrity metadata must have a value of bip->bip_sector.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)       Pages can be added using bio_integrity_add_page() as long as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)       there is room in the bip bio_vec array (nr_pages).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)       Upon completion of a READ operation, the attached pages will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)       contain the integrity metadata received from the storage device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)       It is up to the receiver to process them and verify data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)       integrity upon completion.
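
      The remapping of reference tags mentioned above can be modelled
      in user space as follows.  This is a sketch of the idea only, not
      the code the kernel or any driver runs: the function names are
      hypothetical, and the tuple layout (8 bytes per sector with a
      32-bit big-endian reference tag at offset 4) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define TUPLE_SIZE 8
#define REF_OFFSET 4   /* assumed position of the 32-bit reference tag */

/* Read the big-endian reference tag of one tuple. */
static uint32_t ref_get(const uint8_t *tuple)
{
    const uint8_t *p = tuple + REF_OFFSET;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store the reference tag of one tuple in big-endian order. */
static void ref_set(uint8_t *tuple, uint32_t v)
{
    uint8_t *p = tuple + REF_OFFSET;

    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Rewrite tags that follow the submitter's numbering (virt, virt + 1,
 * ...) to the device's numbering (phys, phys + 1, ...).  Tuples whose
 * tag does not match the expected value are left untouched.  Returns
 * the number of tuples that were rewritten.
 */
static size_t ref_remap(uint8_t *pi, size_t nr, uint32_t virt, uint32_t phys)
{
    size_t remapped = 0;

    for (size_t i = 0; i < nr; i++, virt++, phys++) {
        uint8_t *tuple = pi + i * TUPLE_SIZE;

        if (ref_get(tuple) == virt) {
            ref_set(tuple, phys);
            remapped++;
        }
    }
    return remapped;
}
```

      Because the rewrite happens in place, the submitter's copy of the
      metadata no longer carries its original tags afterwards, which is
      why the pages must be treated as modified during I/O.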
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) 5.4 Registering A Block Device As Capable Of Exchanging Integrity Metadata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) --------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)     To enable integrity exchange on a block device the gendisk must be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)     registered as capable:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)     `int blk_integrity_register(gendisk, blk_integrity);`
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)       The blk_integrity struct is a template and should contain the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)       following::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)         static struct blk_integrity my_profile = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)             .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)             .generate_fn            = my_generate_fn,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)             .verify_fn              = my_verify_fn,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)             .tuple_size             = sizeof(struct my_tuple_size),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)             .tag_size               = <tag bytes per hw sector>,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)         };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)       'name' is a text string which will be visible in sysfs.  This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)       part of the userland API so choose it carefully and never change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)       it.  The format is standards body-type-variant.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)       E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)       'generate_fn' generates appropriate integrity metadata (for WRITE).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)       'verify_fn' verifies that the data buffer matches the integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)       metadata.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)       'tuple_size' must be set to match the size of the integrity
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)       metadata per sector.  I.e. 8 for DIF and EPP.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)       'tag_size' must be set to identify how many bytes of tag space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)       are available per hardware sector.  For DIF this is either 2 or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)       0 depending on the value of the Control Mode Page ATO bit.
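
      The division of labour between the profile fields can be modelled
      in user space as below.  Everything here is hypothetical: struct
      model_profile, the callback signatures and the name string are
      stand-ins chosen for the sketch and do not match the kernel's
      real types, and the checksum is a placeholder byte sum, not the
      CRC or IP checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical user-space stand-in for an integrity profile. */
struct model_profile {
    const char *name;
    void (*generate_fn)(const uint8_t *data, uint8_t *tuple);
    int  (*verify_fn)(const uint8_t *data, const uint8_t *tuple);
    unsigned char tuple_size;   /* metadata bytes per sector */
    unsigned char tag_size;     /* tag bytes per sector */
};

/* Placeholder checksum: 16-bit sum of all bytes in one sector. */
static uint16_t toy_sum(const uint8_t *data)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < SECTOR_SIZE; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Store the checksum in the first two bytes of the tuple. */
static void toy_generate(const uint8_t *data, uint8_t *tuple)
{
    uint16_t sum = toy_sum(data);

    tuple[0] = (uint8_t)(sum >> 8);
    tuple[1] = (uint8_t)(sum & 0xff);
}

/* Return 0 if the data matches the stored checksum, -1 otherwise. */
static int toy_verify(const uint8_t *data, const uint8_t *tuple)
{
    uint16_t sum = toy_sum(data);

    return (tuple[0] == (uint8_t)(sum >> 8) &&
            tuple[1] == (uint8_t)(sum & 0xff)) ? 0 : -1;
}

static const struct model_profile toy_profile = {
    .name        = "MODEL-TOY-0-SUM",   /* made-up name for the sketch */
    .generate_fn = toy_generate,
    .verify_fn   = toy_verify,
    .tuple_size  = 8,
    .tag_size    = 0,
};

/* WRITE side: fill one tuple per sector. */
static void model_generate(const struct model_profile *p,
                           const uint8_t *data, uint8_t *pi, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        p->generate_fn(data + i * SECTOR_SIZE, pi + i * p->tuple_size);
}

/* READ side: return the index of the first bad sector, or -1 if all match. */
static long model_verify(const struct model_profile *p,
                         const uint8_t *data, const uint8_t *pi, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (p->verify_fn(data + i * SECTOR_SIZE, pi + i * p->tuple_size) != 0)
            return (long)i;
    return -1;
}
```

      The tuple_size field is what lets the generic loops step through
      the metadata buffer without knowing anything about the format the
      callbacks produce.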
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) ----------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) 2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>