.. SPDX-License-Identifier: GPL-2.0

===========================================
Shared Virtual Addressing (SVA) with ENQCMD
===========================================

Background
==========

Shared Virtual Addressing (SVA) allows the processor and device to use the
same virtual addresses, avoiding the need for software to translate virtual
addresses to physical addresses. SVA is what PCIe calls Shared Virtual
Memory (SVM).

In addition to letting the device use application virtual addresses
directly, SVA also removes the need to pin pages for DMA.
PCIe Address Translation Services (ATS), together with the Page Request
Interface (PRI), allow devices to handle page faults in much the same way as
the CPU handles application page faults. For more information, refer to
Chapter 10 (ATS Specification) of the PCIe specification.

Use of SVA requires IOMMU support in the platform. An IOMMU is also
required to support the PCIe features ATS and PRI. ATS allows devices
to cache translations for virtual addresses. The IOMMU driver uses the
mmu_notifier() support to keep the device TLB in sync with the CPU page
tables. When an ATS lookup fails for a virtual address, the device should
use PRI to request that the virtual address be paged into the CPU page
tables. The device must then use ATS again to fetch the translation before
use.
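To make the ATS/PRI sequence concrete, here is a minimal software model of it. It is a toy, not driver code. The structures cpu_mm and dev_tlb, the 16-page address space and the helper names are all hypothetical, and a real IOMMU walks multi-level page tables in hardware. The model covers the three cases described above: a device TLB hit, an ATS miss resolved by a PRI page request, and the second ATS lookup that must follow the page request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 16 /* toy address space of 16 virtual pages (hypothetical) */

/* CPU page table owned by the OS: which virtual pages are present. */
struct cpu_mm {
    bool present[NPAGES];
    uint64_t pfn[NPAGES];
};

/* Device TLB caching translations obtained through ATS. */
struct dev_tlb {
    bool valid[NPAGES];
    uint64_t pfn[NPAGES];
};

/* ATS translation request: the IOMMU consults the CPU page table. */
static bool ats_translate(const struct cpu_mm *mm, size_t vpn, uint64_t *pfn)
{
    if (!mm->present[vpn])
        return false; /* not present: the device has to use PRI */
    *pfn = mm->pfn[vpn];
    return true;
}

/* PRI page request: the OS pages the page in, as for a CPU page fault. */
static void pri_page_request(struct cpu_mm *mm, size_t vpn, uint64_t *next_free_pfn)
{
    mm->pfn[vpn] = (*next_free_pfn)++;
    mm->present[vpn] = true;
}

/* One device access to virtual page vpn. Returns the PFN used for the DMA,
 * or UINT64_MAX if vpn is out of range or the translation still fails. */
static uint64_t device_access(struct dev_tlb *tlb, struct cpu_mm *mm, size_t vpn,
                              uint64_t *next_free_pfn, unsigned *page_requests)
{
    uint64_t pfn;

    if (vpn >= NPAGES)
        return UINT64_MAX;
    if (tlb->valid[vpn])
        return tlb->pfn[vpn]; /* device TLB hit: no IOMMU round trip */

    if (!ats_translate(mm, vpn, &pfn)) {
        pri_page_request(mm, vpn, next_free_pfn);
        (*page_requests)++;
        /* After the page request completes, ATS must be used again. */
        if (!ats_translate(mm, vpn, &pfn))
            return UINT64_MAX;
    }
    tlb->valid[vpn] = true; /* cache the translation in the device TLB */
    tlb->pfn[vpn] = pfn;
    return pfn;
}
```

In this model, the first access to a page triggers exactly one page request. A repeated access is then served from the device TLB.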

Shared Hardware Workqueues
==========================

Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
the use of Shared Work Queues (SWQ) by both applications and Virtual
Machines (VMs). This allows better hardware utilization than hard
partitioning of resources, which can result in underutilization. To let the
hardware distinguish the context on whose behalf work submitted through the
SWQ interface is executed, SIOV uses the Process Address Space ID (PASID),
a 20-bit number defined by the PCIe SIG.

The PASID value is encoded in all transactions from the device. This allows
the IOMMU to track I/O at per-PASID granularity in addition to using the
PCIe Requester ID (RID), which is the Bus/Device/Function.
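As a small illustration, the sketch below packs a RID and a 20-bit PASID into a single lookup key. The RID uses the standard PCIe Requester ID layout: bus in bits 15:8, device in bits 7:3, function in bits 2:0. The key format and the names pcie_rid() and iommu_ctx_key() are hypothetical. Real IOMMUs such as Intel VT-d instead walk a hierarchy of tables indexed by bus, device/function and PASID.

```c
#include <stdint.h>

#define PASID_BITS 20
#define PASID_MASK ((1u << PASID_BITS) - 1) /* 0xFFFFF */

/* PCIe Requester ID: bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
static uint16_t pcie_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Hypothetical per-(RID, PASID) key: RID in bits 35:20, PASID in bits 19:0. */
static uint64_t iommu_ctx_key(uint16_t rid, uint32_t pasid)
{
    return ((uint64_t)rid << PASID_BITS) | (pasid & PASID_MASK);
}
```

Two processes using the same device (same RID) therefore get distinct keys as long as their PASIDs differ.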

ENQCMD
======

ENQCMD is a new instruction on Intel platforms that atomically submits a
work descriptor to a device. The descriptor includes the operation to be
performed, the virtual addresses of all parameters, the virtual address of a
completion record, and the PASID (process address space ID) of the current
process.
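The descriptor contents listed above can be pictured as a 64-byte C structure, since ENQCMD always transfers 64 bytes. The layout below is a hypothetical illustration, as are the names struct work_desc and make_desc(). The authoritative format for a given device is in its specification, such as the DSA spec listed under References. Software leaves the PASID field at zero because ENQCMD fills it in from the IA32_PASID MSR.

```c
#include <stdint.h>

/* Hypothetical 64-byte work descriptor; the field layout is illustrative only. */
struct work_desc {
    _Alignas(64) uint32_t pasid_priv; /* bits 19:0 PASID (supplied by ENQCMD) */
    uint32_t opcode_flags;            /* bits 31:24 operation, bits 23:0 flags */
    uint64_t completion_addr;         /* virtual address of the completion record */
    uint64_t src_addr;                /* virtual address of the source buffer */
    uint64_t dst_addr;                /* virtual address of the destination buffer */
    uint32_t xfer_size;               /* number of bytes to process */
    uint8_t rsvd[28];                 /* pad to the 64 bytes ENQCMD transfers */
};

_Static_assert(sizeof(struct work_desc) == 64, "ENQCMD payload is 64 bytes");

/* Builds a descriptor from application virtual addresses; no translation or
 * pinning is needed because the device shares the process address space. */
static struct work_desc make_desc(uint8_t opcode, uint32_t flags, const void *src,
                                  void *dst, uint32_t size, void *completion)
{
    struct work_desc d = {0};

    d.opcode_flags = ((uint32_t)opcode << 24) | (flags & 0xffffffu);
    d.completion_addr = (uint64_t)(uintptr_t)completion;
    d.src_addr = (uint64_t)(uintptr_t)src;
    d.dst_addr = (uint64_t)(uintptr_t)dst;
    d.xfer_size = size;
    return d; /* pasid_priv stays 0: ENQCMD fills in the PASID */
}
```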

ENQCMD works with non-posted semantics and returns a status indicating
whether the command was accepted by hardware. This allows the submitter to
know whether the submission needs to be retried. Other device-specific
mechanisms may need to be provided to implement fairness or ensure forward
progress.
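A submitter typically wraps the non-posted submission in a bounded retry loop. The sketch below is hypothetical: submit_with_retry(), the submit_fn callback type and the fake_portal stand-in are illustrative names. On real hardware, the callback would wrap the ENQCMD instruction. One way to do that is the _enqcmd() intrinsic that GCC and Clang provide in <immintrin.h> when compiling with -menqcmd. It returns 0 when the device accepted the command and nonzero when the submission should be retried. The fake portal lets the loop be exercised without a device.

```c
#include <stdbool.h>

/* Returns 0 if the device accepted the 64-byte descriptor, nonzero if the
 * submitter should retry (the same convention as the _enqcmd() intrinsic). */
typedef int (*submit_fn)(void *portal, const void *desc);

/* Try up to max_tries times; reports how many attempts were made. */
static bool submit_with_retry(submit_fn submit, void *portal, const void *desc,
                              unsigned max_tries, unsigned *tries_used)
{
    for (unsigned i = 0; i < max_tries; i++) {
        if (submit(portal, desc) == 0) {
            *tries_used = i + 1;
            return true;
        }
        /* Not accepted (e.g. queue full). A real caller might back off here. */
    }
    *tries_used = max_tries;
    return false; /* caller falls back to a device-specific mechanism */
}

/* Stand-in for a device portal that rejects the first busy_left submissions. */
struct fake_portal {
    unsigned busy_left;
};

static int fake_submit(void *portal, const void *desc)
{
    struct fake_portal *p = portal;

    (void)desc;
    if (p->busy_left > 0) {
        p->busy_left--;
        return 1; /* "retry" */
    }
    return 0; /* accepted */
}
```

The loop is bounded because retries alone do not guarantee fairness or forward progress.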

ENQCMD is the glue that lets applications submit commands directly to the
hardware while also making the hardware aware of the application context,
via the PASID, for the I/O operations it performs.

Process Address Space Tagging
=============================

A new thread-scoped MSR (IA32_PASID) provides the connection between
user processes and the rest of the hardware. When an application first
accesses an SVA-capable device, this MSR is initialized with a newly
allocated PASID. The driver for the device calls an IOMMU-specific API
that sets up the routing for DMA and page requests.

For example, the Intel Data Streaming Accelerator (DSA) uses
iommu_sva_bind_device(), which does the following:

- Allocates the PASID and programs the process page-table pointer (%cr3
  register) into the PASID context entries.
- Registers an mmu_notifier() to track any page-table invalidations, keeping
  the device TLB in sync. For example, when a page-table entry is invalidated,
  the IOMMU propagates the invalidation to the device TLB. This forces any
  future access by the device to this virtual address to go through ATS. If
  the IOMMU responds that the page is not present, the device requests that
  the page be paged in via the PCIe PRI protocol before performing I/O.
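The invalidation path in the second item can be modeled as a list of callbacks that the address space runs before tearing down a mapping. This is a hypothetical sketch, not the kernel's mmu_notifier API. The names toy_mm, toy_register_notifier() and the others are invented for illustration. The model only shows the ordering: the device TLB entry is dropped first, so the device's next access to that address must go back through ATS.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8
#define MAX_NOTIFIERS 4

typedef void (*invalidate_fn)(void *priv, size_t vpn);

/* Toy address space with invalidation callbacks (not the kernel API). */
struct toy_mm {
    bool present[NPAGES];
    invalidate_fn invalidate[MAX_NOTIFIERS];
    void *priv[MAX_NOTIFIERS];
    size_t nr_notifiers;
};

/* Toy device TLB that the IOMMU keeps in sync. */
struct toy_dev_tlb {
    bool valid[NPAGES];
};

/* Callback registered at bind time: forward the invalidation to the device TLB. */
static void dev_tlb_invalidate(void *priv, size_t vpn)
{
    struct toy_dev_tlb *tlb = priv;

    tlb->valid[vpn] = false;
}

static bool toy_register_notifier(struct toy_mm *mm, invalidate_fn cb, void *priv)
{
    if (mm->nr_notifiers == MAX_NOTIFIERS)
        return false;
    mm->invalidate[mm->nr_notifiers] = cb;
    mm->priv[mm->nr_notifiers] = priv;
    mm->nr_notifiers++;
    return true;
}

/* Unmap a page: notify every listener before clearing the page-table entry. */
static void toy_unmap(struct toy_mm *mm, size_t vpn)
{
    if (vpn >= NPAGES)
        return;
    for (size_t i = 0; i < mm->nr_notifiers; i++)
        mm->invalidate[i](mm->priv[i], vpn);
    mm->present[vpn] = false;
}
```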

This MSR is managed with the XSAVE feature set as "supervisor state" to
ensure that it is saved and restored on context switch.

PASID Management
================

The kernel must allocate a PASID on behalf of each process that will use
ENQCMD and program it into the new MSR to communicate the process identity to
platform hardware. ENQCMD uses the PASID stored in this MSR to tag requests
from this process. When a user submits a work descriptor to a device using the
ENQCMD instruction, the PASID field in the descriptor is auto-filled with the
value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
with the same PASID. The platform IOMMU uses the PASID in the transaction to
perform address translation. The IOMMU APIs set up the corresponding PASID
entry in the IOMMU with the page-table root of the process address space used
by the CPU (e.g. the %cr3 register on x86).
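The auto-fill step can be modeled in software. The sketch assumes the IA32_PASID layout Linux uses: the PASID in bits 19:0 and a valid bit in bit 31. It also follows our reading of the ENQCMD description in the ISE reference, which should be treated as an assumption. Under that reading, the PASID replaces bits 19:0 of the descriptor's first doubleword, bits 31:20 are cleared, and the remaining 60 bytes are copied unchanged. enqcmd_format() is a hypothetical name. Returning false stands in for the #GP the instruction raises when the MSR has not been populated.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IA32_PASID_VALID (1ull << 31) /* bit 31: MSR holds a valid PASID */
#define PASID_MASK 0xFFFFFull         /* bits 19:0: the PASID itself */

/* Software model of how ENQCMD builds its 64-byte payload from the
 * user-supplied descriptor. Returns false where the CPU would raise #GP. */
static bool enqcmd_format(uint64_t ia32_pasid_msr, const uint8_t src[64], uint8_t out[64])
{
    uint32_t dw0;

    if (!(ia32_pasid_msr & IA32_PASID_VALID))
        return false; /* MSR not populated yet: #GP */

    memcpy(out, src, 64);
    /* First doubleword: PASID in bits 19:0; bits 31:20 (incl. privilege) zero. */
    dw0 = (uint32_t)(ia32_pasid_msr & PASID_MASK);
    memcpy(out, &dw0, sizeof(dw0));
    return true;
}
```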

The MSR must be configured on each logical CPU before any application
thread can interact with a device. Threads that belong to the same
process share the same page tables and thus the same MSR value.

The PASID is cleared when a process is created. PASID allocation and MSR
programming may occur long after a process and its threads have been created.
One thread must call iommu_sva_bind_device() to allocate the PASID for the
process. If a thread uses ENQCMD before the MSR has been populated, a #GP
is raised. The kernel updates the PASID MSR with the PASID for all
threads in the process. A single process PASID can be used simultaneously
with multiple devices, since they all share the same address space.

One thread can call iommu_sva_unbind_device() to free the allocated PASID.
The kernel clears the PASID MSR for all threads belonging to the process.

New threads inherit the MSR value from the parent.
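The lifecycle rules above can be summarized in a small model. Each process has one PASID shared by all its threads. Binding sets the MSR in every thread, unbinding clears it, and new threads inherit it from their parent. Everything here is hypothetical. toy_process, toy_bind(), toy_unbind() and toy_clone() are invented names, and the PASID value is passed in rather than allocated. The bind count that keeps the PASID alive while more than one device is bound is also an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 8
#define PASID_VALID (1ull << 31) /* valid bit in the IA32_PASID MSR */

/* Toy process: one PASID for the whole process, one MSR copy per thread
 * (the per-thread value the XSAVE supervisor state preserves). */
struct toy_process {
    uint32_t pasid;     /* 0 = no PASID allocated (assumption) */
    unsigned binds;     /* devices currently bound (assumption) */
    size_t nr_threads;
    uint64_t msr[MAX_THREADS];
};

/* Bind a device: the first bind installs the PASID; later binds reuse it. */
static void toy_bind(struct toy_process *p, uint32_t fresh_pasid)
{
    if (p->binds++ == 0)
        p->pasid = fresh_pasid;
    for (size_t i = 0; i < p->nr_threads; i++)
        p->msr[i] = PASID_VALID | p->pasid;
}

/* Unbind a device: the last unbind frees the PASID and clears every MSR. */
static void toy_unbind(struct toy_process *p)
{
    if (p->binds == 0 || --p->binds > 0)
        return;
    p->pasid = 0;
    for (size_t i = 0; i < p->nr_threads; i++)
        p->msr[i] = 0;
}

/* Create a thread: it inherits the parent's MSR value. Returns its index or -1. */
static int toy_clone(struct toy_process *p, size_t parent)
{
    if (p->nr_threads == MAX_THREADS || parent >= p->nr_threads)
        return -1;
    p->msr[p->nr_threads] = p->msr[parent];
    return (int)p->nr_threads++;
}
```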

Relationships
=============

 * Each process has many threads, but only one PASID.
 * Devices have a limited number (tens to thousands) of hardware workqueues.
   The device driver manages the allocation of hardware workqueues.
 * A single mmap() maps a single hardware workqueue as a "portal" and
   each portal maps down to a single workqueue.
 * For each device with which a process interacts, there must be
   one or more mmap()'d portals.
 * Many threads within a process can share a single portal to access
   a single device.
 * Multiple processes can separately mmap() the same portal, in
   which case they still share one device hardware workqueue.
 * The single process-wide PASID is used by all threads to interact
   with all devices. There is not, for instance, a PASID for each
   thread or each thread<->device pair.

FAQ
===

* What is SVA/SVM?

Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
work in the same address space, i.e., to share it. Some call it Shared
Virtual Memory (SVM), but the Linux community wanted to avoid confusing it
with POSIX Shared Memory and Secure Virtual Machines, which were terms
already in circulation.

* What is a PASID?

A Process Address Space ID (PASID) is a PCIe-defined Transaction Layer Packet
(TLP) prefix. A PASID is a 20-bit number allocated and managed by the OS.
The PASID is included in all transactions between the platform and the device.
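Because a PASID is a 20-bit number, the OS allocates it from a space of 2^20 = 1,048,576 values. A minimal bitmap allocator over that space might look like the following. It is a sketch, not the kernel's allocator. The names pasid_alloc() and pasid_free() are hypothetical, and treating PASID 0 as reserved is an assumption made here for illustration.

```c
#include <stdint.h>

#define PASID_BITS 20
#define PASID_COUNT (1u << PASID_BITS) /* 1,048,576 possible PASIDs */
#define PASID_RESERVED 0u              /* never handed out (assumption) */

/* One bit per PASID: 2^20 bits = 128 KiB. */
static uint64_t pasid_bitmap[PASID_COUNT / 64];

/* Returns the lowest free PASID, or -1 if the space is exhausted. */
static int32_t pasid_alloc(void)
{
    for (uint32_t w = 0; w < PASID_COUNT / 64; w++) {
        if (pasid_bitmap[w] == UINT64_MAX)
            continue; /* all 64 PASIDs in this word are in use */
        for (uint32_t b = 0; b < 64; b++) {
            uint32_t id = w * 64 + b;

            if (id == PASID_RESERVED || (pasid_bitmap[w] & (1ull << b)))
                continue;
            pasid_bitmap[w] |= 1ull << b;
            return (int32_t)id;
        }
    }
    return -1;
}

static void pasid_free(uint32_t id)
{
    if (id != PASID_RESERVED && id < PASID_COUNT)
        pasid_bitmap[id / 64] &= ~(1ull << (id % 64));
}
```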

* How are shared workqueues different?

Traditionally, in order for userspace applications to interact with hardware,
a separate hardware instance is required per process. For example,
consider doorbells as a mechanism for informing hardware about work to process.
Each doorbell must be spaced 4k (or page-size) apart for process
isolation. This requires hardware to provision that space and reserve it in
MMIO, which doesn't scale as the number of threads becomes large. The
hardware also manages the queue depth for Shared Work Queues (SWQ), so
consumers don't need to track queue depth. If there is no space to accept
a command, the device returns an error indicating that the submitter should
retry.

A user should check the Deferrable Memory Write (DMWr) capability on the
device and only submit ENQCMD when the device supports it. In the new DMWr
PCIe terminology, devices need to support the DMWr completer capability. In
addition, all switch ports must support DMWr routing, and it must be enabled
by the PCIe subsystem, much like how PCIe atomic operations are managed.

SWQ allows hardware to provision just a single address in the device. When
used with ENQCMD to submit work, the device can distinguish the process
submitting the work, since the submission includes the PASID assigned to that
process. This helps the device scale to a large number of processes.
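To put the doorbell cost in numbers: with one 4 KiB doorbell page per process, 1,024 processes would need 4 MiB of MMIO, whereas a SWQ needs a single portal address. The toy model below shows two further properties of a SWQ described above. First, the device rather than the submitters tracks queue occupancy, and it answers "retry" when the queue is full. Second, each accepted descriptor carries the submitter's PASID, so the device knows whose address space to use. The names struct toy_swq, swq_enqueue() and swq_dequeue() and the depth of 4 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SWQ_DEPTH 4 /* toy queue depth (hypothetical) */

/* Toy shared work queue behind a single portal; the device tracks occupancy. */
struct toy_swq {
    uint32_t pasid[SWQ_DEPTH]; /* PASID that tagged each accepted descriptor */
    uint32_t opcode[SWQ_DEPTH];
    size_t head, count;
};

/* Submission through the portal: 0 = accepted, 1 = retry (queue full),
 * mirroring the status ENQCMD reports back to the submitter. */
static int swq_enqueue(struct toy_swq *q, uint32_t pasid, uint32_t opcode)
{
    size_t tail;

    if (q->count == SWQ_DEPTH)
        return 1;
    tail = (q->head + q->count) % SWQ_DEPTH;
    q->pasid[tail] = pasid;
    q->opcode[tail] = opcode;
    q->count++;
    return 0;
}

/* Device side: take the oldest descriptor; its PASID selects the address space. */
static int swq_dequeue(struct toy_swq *q, uint32_t *pasid, uint32_t *opcode)
{
    if (q->count == 0)
        return -1;
    *pasid = q->pasid[q->head];
    *opcode = q->opcode[q->head];
    q->head = (q->head + 1) % SWQ_DEPTH;
    q->count--;
    return 0;
}
```

Two processes (PASIDs 10 and 20) submitting through the same portal interleave in one queue. The device can still tell their descriptors apart.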

* Is this the same as a user space device driver?

Communicating with the device via the shared workqueue is much simpler
than a full-blown user space driver. The kernel driver does all the
initialization of the hardware. User space only needs to worry about
submitting work and processing completions.

* Is this the same as SR-IOV?

Single Root I/O Virtualization (SR-IOV) focuses on providing independent
hardware interfaces for virtualizing hardware. Hence, each interface must be
an almost fully functional interface to software, supporting the traditional
BARs, space for interrupts via MSI-X, and its own register layout.
Virtual Functions (VFs) are assisted by the Physical Function (PF)
driver.

Scalable I/O Virtualization builds on the PASID concept to create device
instances for virtualization. SIOV requires host software to assist in
creating virtual devices; each virtual device is represented by a PASID
along with the bus/device/function of the device. This allows device
hardware to optimize device resource creation, and resources can grow
dynamically on demand. SR-IOV creation and management, by contrast, is very
static in nature. Consult the references below for more details.

* Why not just create a virtual function for each app?

Creating PCIe SR-IOV type Virtual Functions (VF) is expensive. VFs require
duplicated hardware for PCI config space and interrupts such as MSI-X.
Resources such as interrupts have to be hard partitioned between VFs at
creation time, and cannot scale dynamically on demand. The VFs are not
completely independent from the Physical Function (PF). Most VFs require
some communication and assistance from the PF driver. SIOV, in contrast,
creates a software-defined device where all the configuration and control
aspects are mediated via the slow path. The work submission and completion
happen without any mediation.

* Does this support virtualization?

ENQCMD can be used from within a guest VM. In these cases, the VMM helps
set up a translation table to translate from Guest PASID to Host
PASID. Please consult the ENQCMD instruction set reference for more
details.
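Conceptually, the VMM-maintained table is an indirection from guest PASID to host PASID, consulted when a guest's ENQCMD is tagged. The sketch below models only that indirection. The table size, the names vmm_map_pasid() and xlate_guest_pasid(), and the choice to report a missing entry as a failure are all assumptions. The real table format and fault behavior are defined in the ENQCMD instruction set reference.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PASIDS 64 /* toy table size; real PASIDs are 20 bits wide */

/* Toy translation table set up by the VMM: guest PASID -> host PASID. */
struct pasid_xlate {
    bool valid[GUEST_PASIDS];
    uint32_t host[GUEST_PASIDS];
};

/* VMM side: record the host PASID backing a guest PASID. */
static bool vmm_map_pasid(struct pasid_xlate *t, uint32_t guest, uint32_t host)
{
    if (guest >= GUEST_PASIDS || host > 0xFFFFFu)
        return false; /* out of table range or not a 20-bit PASID */
    t->valid[guest] = true;
    t->host[guest] = host;
    return true;
}

/* Submission side: the guest's PASID is replaced by the host PASID. */
static bool xlate_guest_pasid(const struct pasid_xlate *t, uint32_t guest, uint32_t *host)
{
    if (guest >= GUEST_PASIDS || !t->valid[guest])
        return false; /* no mapping set up by the VMM */
    *host = t->host[guest];
    return true;
}
```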

* Does memory need to be pinned?

When devices support SVA and the platform hardware, such as the IOMMU,
supports such devices, there is no need to pin memory for DMA purposes.
Devices that support SVA also support other PCIe features that remove the
pinning requirement for memory.

Device TLB support - The device asks the IOMMU to look up an address before
use via Address Translation Services (ATS) requests. If the mapping exists
but there is no page allocated by the OS, the IOMMU hardware returns that no
mapping exists.

The device requests that the virtual address be mapped via the Page Request
Interface (PRI). Once the OS has successfully completed the mapping, it
returns the response back to the device. The device then requests the
translation again and continues.

The IOMMU works with the OS in managing the consistency of page tables with
the device. When removing pages, it interacts with the device to remove any
device TLB entry that might have been cached before removing the mappings
from the OS.

References
==========

VT-d:
https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d

SIOV:
https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux

ENQCMD in ISE:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

DSA spec:
https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) ENQCMD in ISE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) DSA spec:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf