^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) .. iommu:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) IOMMU Userspace API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) =====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) The IOMMU UAPI is used in virtualization cases where communication is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) needed between physical and virtual IOMMU drivers. For bare-metal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) usage, the IOMMU is a system device that does not need to communicate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) with userspace directly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) The primary use cases are guest Shared Virtual Addressing (SVA) and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) guest IO virtual address (IOVA), where the vIOMMU implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) relies on the physical IOMMU and therefore requires interaction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) with the host driver.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) .. contents:: :local:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) Functionalities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) Communication between user and kernel flows in both directions. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) supported user-kernel APIs are as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) 1. Bind/Unbind guest PASID (e.g. Intel VT-d)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) 2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) 3. Invalidate IOMMU caches upon guest requests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) 4. Report errors to the guest and serve page requests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) Requirements
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) The IOMMU UAPIs are generic and extensible to meet the following
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) requirements:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) 1. Emulated and para-virtualised vIOMMUs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) 2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) 3. Extensions to the UAPI shall not break existing userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) Interfaces
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) ==========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) Although the data structures defined in the IOMMU UAPI are self-contained,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) no user API functions are introduced. Instead, the IOMMU UAPI is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) designed to work with existing user driver frameworks such as VFIO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) Extension Rules & Precautions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) -----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) When IOMMU UAPI gets extended, the data structures can *only* be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) modified in two ways:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) 1. Adding new fields by re-purposing the padding[] field. No size change.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) 2. Adding new union members at the end. May increase the structure sizes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) No new fields can be added *after* the variable-sized union, because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) doing so would move field offsets and break backward compatibility. A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) new flag must be introduced whenever a change affects the structure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) using either method. The IOMMU driver processes the data based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) flags, which ensures backward compatibility.
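
As a hedged illustration of the first method, the sketch below uses two
hypothetical layouts, ``demo_info_v1`` and ``demo_info_v2`` (neither exists
in the kernel headers, and ``DEMO_FLAG_HAS_HINT`` is likewise invented). It
shows a field carved out of padding[] and guarded by a new flag, so that the
structure size and the union offset stay unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical original layout; not a real kernel structure. */
struct demo_info_v1 {
	uint32_t argsz;
	uint32_t version;
	uint32_t flags;
	uint8_t  padding[4];
	union {
		uint64_t pasid;
		uint64_t addr;
	} u;
};

/* Hypothetical flag announcing that 'hint' carries valid data. */
#define DEMO_FLAG_HAS_HINT (1u << 0)

/* Extended layout: two bytes of padding re-purposed as 'hint'. */
struct demo_info_v2 {
	uint32_t argsz;
	uint32_t version;
	uint32_t flags;
	uint16_t hint;
	uint8_t  padding[2];
	union {
		uint64_t pasid;
		uint64_t addr;
	} u;
};

/* The extension must neither change the size nor move the union. */
_Static_assert(sizeof(struct demo_info_v1) == sizeof(struct demo_info_v2),
	       "structure size changed");
_Static_assert(offsetof(struct demo_info_v1, u) ==
	       offsetof(struct demo_info_v2, u), "union offset moved");

/* Consume 'hint' only when the flag says it is valid. */
static uint16_t demo_get_hint(const struct demo_info_v2 *info)
{
	return (info->flags & DEMO_FLAG_HAS_HINT) ? info->hint : 0;
}
```

An older user that never sets the flag leaves the re-purposed bytes at zero,
and the consumer ignores them.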
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) The version field is reserved only for the unlikely event of a UAPI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) upgrade in its entirety.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) It is *always* the caller's responsibility to indicate the size of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) structure passed by setting argsz appropriately.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) At the same time, argsz is user-provided data and is not trusted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) The argsz field allows the user app to indicate how much data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) it is providing; it is still the kernel's responsibility to validate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) whether it is correct and sufficient for the requested operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) Compatibility Checking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) When an IOMMU UAPI extension increases the size of a structure, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) IOMMU UAPI code shall handle the following cases:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) 1. User and kernel have an exact size match
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) 2. An older user with an older kernel header (smaller UAPI size) running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)    on a newer kernel (larger UAPI size)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) 3. A newer user with a newer kernel header (larger UAPI size) running
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)    on an older kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) 4. A malicious/misbehaving user passing an illegal/invalid size that is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)    still within range. The data may contain garbage.
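
The size cases above can be sketched in plain C. This is a minimal userspace
simulation, not the kernel implementation: ``struct demo_uapi``,
``DEMO_MINSZ`` and ``demo_copy_checked()`` are hypothetical names, and
memcpy() stands in for copy_from_user(). The caller is assumed to provide a
buffer of at least argsz bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UAPI structure as known to the running kernel. */
struct demo_uapi {
	uint32_t argsz;
	uint32_t flags;
	uint64_t base;   /* present since the first revision */
	uint64_t extra;  /* added by a later extension */
};

/* Smallest size any user could legitimately pass (first revision). */
#define DEMO_MINSZ offsetof(struct demo_uapi, extra)

/*
 * Copy at most what this "kernel" understands from 'src' (standing in
 * for user memory) and zero whatever the user did not provide.
 */
static int demo_copy_checked(struct demo_uapi *dst, const void *src)
{
	uint32_t argsz;
	size_t n;

	memcpy(&argsz, src, sizeof(argsz));
	if (argsz < DEMO_MINSZ)
		return -EINVAL;	/* too small to be a valid request */

	/* A newer user may pass more than we know; ignore the tail. */
	n = argsz < sizeof(*dst) ? argsz : sizeof(*dst);

	/* An older user passes less; the missing fields read as zero. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, n);
	return 0;
}
```

A size that is in range but does not match the data (case 4) cannot be
detected by the size check alone; that is left to content checks on flags
and reserved fields.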
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) Feature Checking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) When launching a guest with a vIOMMU, it is strongly advised to check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) compatibility upfront, because some errors that occur later during
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) vIOMMU operation, such as cache invalidation failures, cannot be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) cleanly escalated to the guest due to IOMMU specifications. This can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) lead to catastrophic failures for the users.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) User applications such as QEMU are expected to import kernel UAPI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) headers. Backward compatibility is supported per feature flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) For example, an older QEMU (with an older kernel header) can run on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) newer kernel. A newer QEMU (with a newer kernel header) may refuse to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) initialize on an older kernel if that kernel does not support the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) feature flags. Simply recompiling existing code with a newer kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) header should not be an issue, because only existing flags are used.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) The IOMMU vendor driver should report the features below to IOMMU UAPI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) consumers (e.g. via VFIO).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 2. IOMMU_NESTING_FEAT_BIND_PGTBL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 4. IOMMU_NESTING_FEAT_CACHE_INVLD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 5. IOMMU_NESTING_FEAT_PAGE_REQUEST
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Taking VFIO as an example: upon request from VFIO userspace (e.g. QEMU),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) the VFIO kernel code shall query the IOMMU vendor driver for support of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) the above features. The query result can then be reported back to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) userspace caller. Details can be found in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) Documentation/driver-api/vfio.rst.
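
The upfront check described above might look like the following on the
userspace side. This is only a sketch: the ``DEMO_FEAT_*`` names mirror the
feature list, but their bit positions are assumptions made for illustration,
and ``demo_missing_features()`` is a hypothetical helper. How the reported
mask is obtained from VFIO is not shown.

```c
#include <stdint.h>

/*
 * Assumed bit positions, for illustration only; the real values would
 * come from the kernel UAPI header.
 */
#define DEMO_FEAT_SYSWIDE_PASID     (1u << 0)
#define DEMO_FEAT_BIND_PGTBL        (1u << 1)
#define DEMO_FEAT_BIND_PASID_TABLE  (1u << 2)
#define DEMO_FEAT_CACHE_INVLD       (1u << 3)
#define DEMO_FEAT_PAGE_REQUEST      (1u << 4)

/* Return the required features the kernel did not report (0 if none). */
static uint32_t demo_missing_features(uint32_t reported, uint32_t required)
{
	return required & ~reported;
}
```

A user application would refuse to launch the guest when the result is
non-zero, rather than discover the gap during vIOMMU operation.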
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Data Passing Example with VFIO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) ------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) As the ubiquitous userspace driver framework, VFIO is already IOMMU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) aware and shares many key concepts such as device model, group, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) protection domain. Other user driver frameworks can also be extended
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) to support IOMMU UAPI but it is outside the scope of this document.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) transport, capability checking, security, and life cycle management of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) process address space ID (PASID).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) The VFIO layer conveys the data structures down to the IOMMU driver. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) follows the pattern below::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)     struct {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)         __u32 argsz;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)         __u32 flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)         __u8  data[];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) Here data[] contains the IOMMU UAPI data structures. VFIO is free to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) bundle the data and to parse the data size based on its own flags.
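
To make the wrapper pattern concrete, here is a minimal userspace sketch of
bundling a payload into data[]. ``struct demo_wrapper`` and ``demo_bundle()``
are hypothetical names, and setting argsz to the total size of header plus
payload is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transport wrapper following the pattern above. */
struct demo_wrapper {
	uint32_t argsz;
	uint32_t flags;
	uint8_t  data[];
};

/*
 * Allocate a wrapper and copy 'payload' (an IOMMU UAPI structure in
 * practice) into data[]. The caller frees the result with free().
 */
static struct demo_wrapper *demo_bundle(uint32_t flags, const void *payload,
					uint32_t payload_sz)
{
	struct demo_wrapper *w = malloc(sizeof(*w) + payload_sz);

	if (!w)
		return NULL;
	w->argsz = (uint32_t)sizeof(*w) + payload_sz;
	w->flags = flags;
	memcpy(w->data, payload, payload_sz);
	return w;
}
```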
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) In order to determine the size and feature set of the user data, argsz
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) and flags (or the equivalent) are also embedded in the IOMMU UAPI data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) structures.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) A "__u32 argsz" field is *always* at the beginning of each structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)     struct iommu_cache_invalidate_info {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)         __u32 argsz;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)     #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)         __u32 version;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)         /* IOMMU paging structure cache */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)     #define IOMMU_CACHE_INV_TYPE_IOTLB      (1 << 0) /* IOMMU IOTLB */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)     #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB  (1 << 1) /* Device IOTLB */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)     #define IOMMU_CACHE_INV_TYPE_PASID      (1 << 2) /* PASID cache */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)     #define IOMMU_CACHE_INV_TYPE_NR         (3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)         __u8 cache;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)         __u8 granularity;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)         __u8 padding[6];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)         union {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)             struct iommu_inv_pasid_info pasid_info;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)             struct iommu_inv_addr_info addr_info;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)         } granu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)     };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) VFIO is responsible for checking its own argsz and flags. It then
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) invokes the appropriate IOMMU UAPI functions. The user pointers are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) passed to the IOMMU layer for further processing. The responsibilities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) are divided as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) - The generic IOMMU layer checks the argsz range based on the UAPI data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)   in the current kernel version.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) - The generic IOMMU layer checks the content of the UAPI data for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)   non-zero reserved bits in flags, non-zero padding fields, and an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)   unsupported version. This ensures that userspace is not broken in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)   the future when these fields or flags are put to use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) - The vendor IOMMU driver checks argsz based on vendor flags. UAPI data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)   is consumed based on flags. The vendor driver has access to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)   unmodified argsz value in case of vendor-specific future
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)   extensions. Currently, it does not perform the copy_from_user()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)   itself. A __user pointer can be provided in some future scenarios
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183)   where there is vendor data outside of the structure definition.
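
The content checks assigned to the generic layer could be sketched as below.
The header is modeled on the fixed part of the example structure above, but
``struct demo_inv_hdr``, the ``DEMO_*`` constants and ``demo_check_hdr()``
are hypothetical names, and the sketch makes no claim to match the kernel's
actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INFO_VERSION_1  1
#define DEMO_CACHE_TYPE_NR   3	/* three cache type bits are defined */

/* Fixed header modeled on the example structure above. */
struct demo_inv_hdr {
	uint32_t argsz;
	uint32_t version;
	uint8_t  cache;
	uint8_t  granularity;
	uint8_t  padding[6];
};

/* Reject unknown versions, undefined cache bits and non-zero padding. */
static int demo_check_hdr(const struct demo_inv_hdr *hdr)
{
	size_t i;

	if (hdr->version != DEMO_INFO_VERSION_1)
		return -EINVAL;
	if (hdr->cache & ~((1u << DEMO_CACHE_TYPE_NR) - 1))
		return -EINVAL;
	for (i = 0; i < sizeof(hdr->padding); i++)
		if (hdr->padding[i])
			return -EINVAL;
	return 0;
}
```

Rejecting non-zero padding today is what lets those bytes be re-purposed
later without misreading garbage from existing users.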
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) The IOMMU code divides UAPI data into two categories:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) - Structures that contain vendor data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)   (Example: iommu_uapi_cache_invalidate())
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) - Structures that contain only generic data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191)   (Example: iommu_uapi_sva_bind_gpasid())
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) Sharing UAPI with in-kernel users
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) For UAPIs that are shared with in-kernel users, a wrapper function is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) provided to distinguish the callers. For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) Userspace caller ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)     int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)                                      struct device *dev,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)                                      void __user *udata);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) In-kernel caller ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)     int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)                                 struct device *dev, ioasid_t ioasid);
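
The split between the two entry points can be illustrated with a small
userspace simulation. All names here (``struct demo_bind_data``,
``demo_unbind()``, ``demo_uapi_unbind()``) are hypothetical, memcpy() stands
in for copy_from_user(), and the domain and device arguments are omitted for
brevity.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bind data; only the fields needed for this sketch. */
struct demo_bind_data {
	uint32_t argsz;
	uint32_t flags;
	uint64_t hpasid;
};

static uint64_t demo_last_unbound;	/* records the call for illustration */

/* In-kernel style entry point: takes an already trusted PASID. */
static int demo_unbind(uint64_t pasid)
{
	demo_last_unbound = pasid;
	return 0;
}

/* Userspace-facing wrapper: validate the untrusted buffer, then delegate. */
static int demo_uapi_unbind(const void *udata)
{
	struct demo_bind_data data;
	uint32_t argsz;

	memcpy(&argsz, udata, sizeof(argsz));
	if (argsz < sizeof(data))
		return -EINVAL;
	memcpy(&data, udata, sizeof(data));
	if (data.flags)
		return -EINVAL;	/* no flags are defined in this sketch */
	return demo_unbind(data.hpasid);
}
```

Keeping the validation in the wrapper means in-kernel callers do not pay for
checks that only make sense for untrusted input.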