^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Linux IOMMU Support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) The architecture specification can be obtained from the location below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) This guide gives a quick cheat sheet for some basic understanding.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) Some Keywords
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) - DMAR - DMA Remapping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) - DRHD - DMA Remapping Hardware Unit Definition
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) - RMRR - Reserved Memory Region Reporting Structure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) - ZLR - Zero Length Reads from PCI devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) - IOVA - I/O Virtual Address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) Basic stuff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) ACPI enumerates the DMA engines in the platform and describes the device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) scope of each engine, that is, which PCI devices each DMA engine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) controls.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) What is RMRR?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) There are some devices the BIOS controls, for example USB devices used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) perform PS/2 emulation. The regions of memory used for these devices are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) marked reserved in the e820 map. When we turn on DMA translation, DMA to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) those regions will fail. Hence the BIOS uses RMRR to specify these regions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) along with the devices that need to access them. The OS is expected to set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) up unity (1:1) mappings for these regions so that the devices can access them.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) How is IOVA generated?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) Well-behaved drivers call dma_map_*() (historically pci_map_*()) before sending
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) a command to a device that needs to perform DMA. Once DMA is completed and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) mapping is no longer required, the driver calls dma_unmap_*() to unmap the region.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) The Intel IOMMU driver allocates a virtual address space per domain. Each PCIe
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) device has its own domain (hence protection). Devices under a p2p bridge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) share one virtual address space with all other devices under that bridge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) because of transaction ID aliasing for p2p bridges.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) IOVA generation is pretty generic. We use the same technique as vmalloc(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) but the address spaces are not global; each domain has its own.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) Different DMA engines may support different numbers of domains.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) We also allocate guard pages with each mapping, so we can attempt to catch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) any overflow that might happen.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
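The per-domain address spaces and guard pages described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the allocator the driver uses: ``ToyDomain``, ``map_pages``, ``is_mapped``, the bump-pointer strategy and the 4 KiB page size are all hypothetical names and choices made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class ToyDomain:
    """Hypothetical model of one IOMMU domain with a private IOVA space."""

    def __init__(self, limit_pfn):
        self.next_pfn = 1           # bump pointer; page frame 0 is never handed out
        self.limit_pfn = limit_pfn  # first page frame beyond the usable space
        self.mappings = {}          # IOVA -> number of mapped pages

    def map_pages(self, npages):
        """Reserve npages plus one unmapped guard page and return the IOVA."""
        if npages <= 0:
            raise ValueError("npages must be positive")
        end_pfn = self.next_pfn + npages + 1  # +1 for the trailing guard page
        if end_pfn > self.limit_pfn:
            raise MemoryError("IOVA space exhausted")
        iova = self.next_pfn * PAGE_SIZE
        self.mappings[iova] = npages
        self.next_pfn = end_pfn
        return iova

    def is_mapped(self, addr):
        """True if addr lies inside a mapping; guard pages are never mapped."""
        return any(iova <= addr < iova + n * PAGE_SIZE
                   for iova, n in self.mappings.items())
```

In this model two domains can hand out the same IOVA value without conflict, because each domain keeps its own address space, and an access that runs one byte past the end of a mapping lands in an unmapped guard page instead of in the next mapping.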
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) Graphics Problems?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) If you encounter issues with graphics devices, you can try adding the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) option intel_iommu=igfx_off to bypass DMA remapping for the integrated graphics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) engine. If this fixes anything, please file a bug reporting the problem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) Some exceptions to IOVA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) Interrupt ranges (0xfee00000 - 0xfeefffff) are not address translated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) The same is true for peer-to-peer transactions. Hence we reserve the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) PCI MMIO address ranges so that they are not allocated as IOVAs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
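As a rough illustration of the reservation described above, an allocator could reject any candidate IOVA range that overlaps a reserved range. The sketch below is an assumption-laden model, not the driver's code: ``overlaps_reserved`` is a hypothetical helper, and only the interrupt range quoted above is taken from this document. PCI MMIO ranges are platform specific and would have to be appended to the list by the caller.

```python
# Interrupt address range quoted in the text above (inclusive bounds).
INTR_START = 0xFEE00000
INTR_END = 0xFEEFFFFF


def overlaps_reserved(start, size, reserved):
    """Return True if [start, start + size) overlaps any inclusive (lo, hi) range."""
    if size <= 0:
        raise ValueError("size must be positive")
    end = start + size - 1  # last byte of the candidate range
    return any(start <= hi and lo <= end for lo, hi in reserved)


# Hypothetical reserved list holding only the interrupt range.
RESERVED = [(INTR_START, INTR_END)]
```

For example, ``overlaps_reserved(0xFEDFF000, 0x2000, RESERVED)`` is true because the second page of that candidate falls inside the interrupt range, while the same call with a size of ``0x1000`` is false.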
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) Fault reporting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) When errors are reported, the DMA engine signals via an interrupt. The fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) reason and the device that caused the fault are printed on the console.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) See below for a sample.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Boot Message Sample
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) -------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) Something like this gets printed, indicating the presence of DMAR tables
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) in ACPI.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) When DMAR is being processed and initialized by ACPI, the kernel prints the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) DMAR locations and any RMRRs processed::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) ACPI DMAR:Host address width 36
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) When DMAR is enabled for use, you will notice::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) PCI-DMA: Using DMAR IOMMU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) Fault reporting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) ^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) DMAR:[fault reason 05] PTE Write access is not set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) DMAR:[fault reason 05] PTE Write access is not set
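
When triaging many such messages it can help to pair each request line with its reason line. The following is a minimal sketch that assumes the exact two-line format shown in the sample above; other kernel versions may print faults differently. ``parse_faults`` and the regular expressions are hypothetical helpers, the ``Read`` alternative is an assumption (the sample only shows ``Write``), and the reason code is kept as a string because the sample does not say whether it is decimal or hexadecimal.

```python
import re

# Matches e.g. "DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000"
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<device>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)
# Matches e.g. "DMAR:[fault reason 05] PTE Write access is not set"
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\w+)\] (?P<text>.+)")


def parse_faults(lines):
    """Pair each request line with the reason line that follows it."""
    faults = []
    current = None
    for line in lines:
        line = line.strip()
        m = REQUEST_RE.search(line)
        if m:
            current = {
                "op": m.group("op"),
                "device": m.group("device"),
                "addr": int(m.group("addr"), 16),  # fault address printed in hex
            }
            faults.append(current)
            continue
        m = REASON_RE.search(line)
        if m and current is not None:
            current["reason_code"] = m.group("code")
            current["reason"] = m.group("text")
            current = None
    return faults
```

Feeding the four sample lines above to ``parse_faults`` yields two records, both for device ``00:02.0`` with a write fault at address ``0x6df084000``.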
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) TBD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) ----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) - For compatibility testing, we could use a unity-mapped domain for all devices:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) just provide a 1:1 mapping of all useful memory under a single shared domain.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) - API for paravirt ops for abstracting functionality for VMM folks.