^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) feature supports Ethernet functionality over the Omni-Path fabric by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) encapsulating Ethernet packets exchanged between Host Fabric Interface (HFI) nodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) Architecture
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) =============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) The exchange of Omni-Path encapsulated Ethernet packets involves
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) one or more virtual Ethernet switches overlaid on the Omni-Path
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) fabric topology. A subset of HFI nodes on the Omni-Path fabric are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) permitted to exchange encapsulated Ethernet packets across a particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) virtual Ethernet switch. The virtual Ethernet switches are logical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) abstractions achieved by configuring the HFI nodes on the fabric for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) header generation and processing. In the simplest configuration, all HFI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) nodes across the fabric exchange encapsulated Ethernet packets over a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) single virtual Ethernet switch. A virtual Ethernet switch is effectively
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) an independent Ethernet network. The configuration is performed by an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) Ethernet Manager (EM), which is part of the trusted Fabric Manager (FM)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) application. HFI nodes can have multiple VNICs, each connected to a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) different virtual Ethernet switch. The diagram below presents a case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) of two virtual Ethernet switches with two HFI nodes::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)                                +-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)                                |   Subnet/         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)                                |   Ethernet        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)                                |   Manager         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)                                +-------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)                                     /          /
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)                                   /           /
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)                                 /            /
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34)                               /             /
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)   +-----------------------------+  +------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)   |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37)   |  +---------+    +---------+ |  | +---------+    +---------+   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38)   |  | VPORT   |    | VPORT   | |  | | VPORT   |    | VPORT   |   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)   +--+---------+----+---------+-+  +-+---------+----+---------+---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)            |                 \        /                 |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41)            |                   \    /                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)            |                     \/                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)            |                    /  \                    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)            |                  /      \                  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)        +-----------+------------+  +-----------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)        |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)        +-----------+------------+  +-----------+------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)        |          HFI           |  |          HFI           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)        +------------------------+  +------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) The Omni-Path encapsulated Ethernet packet format is as described below.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) ==================== ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) Bits                 Field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) ==================== ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) Quad Word 0:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) 0-19                 SLID (lower 20 bits)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) 20-30                Length (in Quad Words)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) 31                   BECN bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) 32-51                DLID (lower 20 bits)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) 52-56                SC (Service Class)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) 57-59                RC (Routing Control)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) 60                   FECN bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) 61-62                L2 (=10, 16B format)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) 63                   LT (=1, Link Transfer Head Flit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) Quad Word 1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) 0-7                  L4 type (=0x78 ETHERNET)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) 8-11                 SLID[23:20]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) 12-15                DLID[23:20]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) 16-31                PKEY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) 32-47                Entropy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) 48-63                Reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) Quad Word 2:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) 0-15                 Reserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) 16-31                L4 header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) 32-63                Ethernet packet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) Quad Words 3 to N-1:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) 0-63                 Ethernet packet (pad extended)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) Quad Word N (last):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) 0-23                 Ethernet packet (pad extended)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) 24-55                ICRC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) 56-61                Tail
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) 62-63                LT (=01, Link Transfer Tail Flit)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) ==================== ================================
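
As a rough illustration of the layout in the table above, the following Python
sketch packs and unpacks Quad Words 0 and 1. It assumes that bit 0 is the least
significant bit of each quad word, which the table does not state. The function
and parameter names are hypothetical and are not taken from the driver.

```python
# Sketch only. Assumption: bit 0 is the least significant bit of a quad word.
L2_16B = 0b10        # L2 field value for the 16B format (from the table)
L4_ETHERNET = 0x78   # L4 type value for Ethernet (from the table)


def pack_header(slid, dlid, length_qw, pkey, entropy, sc=0, rc=0, becn=0, fecn=0):
    """Return (qw0, qw1) as integers for 24-bit SLID and DLID values."""
    qw0 = (
        (slid & 0xFFFFF)                # bits 0-19: SLID, lower 20 bits
        | (length_qw & 0x7FF) << 20     # bits 20-30: length in quad words
        | (becn & 0x1) << 31            # bit 31: BECN
        | (dlid & 0xFFFFF) << 32        # bits 32-51: DLID, lower 20 bits
        | (sc & 0x1F) << 52             # bits 52-56: service class
        | (rc & 0x7) << 57              # bits 57-59: routing control
        | (fecn & 0x1) << 60            # bit 60: FECN
        | L2_16B << 61                  # bits 61-62: L2 format
        | 1 << 63                       # bit 63: LT, head flit
    )
    qw1 = (
        L4_ETHERNET                     # bits 0-7: L4 type
        | ((slid >> 20) & 0xF) << 8     # bits 8-11: SLID[23:20]
        | ((dlid >> 20) & 0xF) << 12    # bits 12-15: DLID[23:20]
        | (pkey & 0xFFFF) << 16         # bits 16-31: PKEY
        | (entropy & 0xFFFF) << 32      # bits 32-47: entropy
    )                                   # bits 48-63 stay zero (reserved)
    return qw0, qw1


def unpack_header(qw0, qw1):
    """Recover the fields written by pack_header()."""
    return {
        "slid": (qw0 & 0xFFFFF) | ((qw1 >> 8) & 0xF) << 20,
        "dlid": ((qw0 >> 32) & 0xFFFFF) | ((qw1 >> 12) & 0xF) << 20,
        "length_qw": (qw0 >> 20) & 0x7FF,
        "pkey": (qw1 >> 16) & 0xFFFF,
        "entropy": (qw1 >> 32) & 0xFFFF,
        "l4_type": qw1 & 0xFF,
    }
```

Splitting each LID into a 20-bit part in Quad Word 0 and a 4-bit part in Quad
Word 1 mirrors the table; the sketch simply reassembles the two parts on unpack.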
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) The Ethernet packet is padded on the transmit side to ensure that the VNIC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) OPA packet is quad word aligned. The 'Tail' field contains the number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) bytes padded. On the receive side, the 'Tail' field is read and the padding
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) is removed (along with the ICRC, Tail and OPA header) before passing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) packet up the network stack.
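
The padding arithmetic can be sketched as follows. The byte counts are derived
from the packet format table (20 bytes before the Ethernet packet, a 4-byte
ICRC, and one byte holding Tail and LT); the constant and function names are
hypothetical, and the driver's own accounting may differ.

```python
# Sketch only. Lengths are in bytes and are derived from the format table.
QW = 8          # bytes per quad word
HDR_LEN = 20    # Quad Words 0 and 1, plus Reserved and L4 header in Quad Word 2
ICRC_LEN = 4    # bits 24-55 of the last quad word
TAIL_LEN = 1    # 6-bit Tail plus 2-bit LT


def pad_len(eth_len):
    """Pad bytes needed so that the whole packet ends on a quad word boundary."""
    return -(HDR_LEN + eth_len + ICRC_LEN + TAIL_LEN) % QW


def total_len(eth_len):
    """Length of the encapsulated packet on the wire."""
    return HDR_LEN + eth_len + pad_len(eth_len) + ICRC_LEN + TAIL_LEN


def rx_eth_len(packet_len, tail):
    """Receive side: Ethernet length after removing header, pad, ICRC and Tail."""
    return packet_len - HDR_LEN - tail - ICRC_LEN - TAIL_LEN
```

For example, under these assumptions a 60-byte Ethernet packet needs 3 pad
bytes, giving an 88-byte (11 quad word) packet.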
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) The L4 header field contains the ID of the virtual Ethernet switch to which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) the VNIC port belongs. On the receive side, this field is used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) de-multiplex the received VNIC packets to different VNIC ports.
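
A minimal sketch of that de-multiplexing step is shown below. It assumes that
bit 0 is the least significant bit of Quad Word 2 and models the set of VNIC
ports as a plain dictionary keyed by virtual Ethernet switch ID; both the
helper names and the dictionary are hypothetical.

```python
# Sketch only. Assumption: bit 0 is the least significant bit of Quad Word 2.
def vesw_id_from_qw2(qw2):
    """Extract the L4 header (bits 16-31), which carries the switch ID."""
    return (qw2 >> 16) & 0xFFFF


def demux(qw2, ports):
    """Return the port registered for the packet's switch ID, or None."""
    return ports.get(vesw_id_from_qw2(qw2))
```

A packet whose switch ID has no registered port yields `None` here; what a real
receiver does with such a packet is outside the scope of this sketch.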
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) Driver Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) ==============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) The Intel OPA VNIC software design is presented in the diagram at the end
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) of this section. OPA VNIC functionality has a HW-dependent component and a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) HW-independent component.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Support has been added for an IB device to allocate and free RDMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) netdev devices. The RDMA netdev interfaces with the network stack,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) thus creating standard network interfaces. OPA_VNIC is an RDMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) netdev device type.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) The HW-dependent VNIC functionality is part of the HFI1 driver. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) It handles HW resource allocation and management for VNIC functionality.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) It interfaces with the network stack and implements the required
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) net_device_ops functions. It expects Omni-Path encapsulated Ethernet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) packets in the transmit path and provides HW access to them. It strips
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) the Omni-Path header from the received packets before passing them up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) the network stack. It also implements the RDMA netdev control operations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) The OPA VNIC module implements the HW-independent VNIC functionality.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) registers itself with IB core as an IB client and interfaces with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) IB MAD stack. It exchanges management information with the Ethernet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) set by the HW-dependent VNIC driver where required to accommodate any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) control operation. It also handles the encapsulation of Ethernet packets with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) an Omni-Path header in the transmit path. For each VNIC interface, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) information required for encapsulation is configured by the EM via the VEMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) MAD interface. It also passes any control information to the HW-dependent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) driver by invoking the RDMA netdev control operations::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)         +-------------------+ +----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)         |                   | |       Linux          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)         |     IB MAD        | |      Network         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)         |                   | |       Stack          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138)         +-------------------+ +----------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)                  |               |          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)                  |               |          |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)         +----------------------------+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)         |                            |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)         |      OPA VNIC Module       |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)         |  (OPA VNIC RDMA Netdev     |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)         |     & EMA functions)       |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)         |                            |      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)         +----------------------------+      |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)                     |                       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)                     |                       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)            +------------------+             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)            |     IB core      |             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)            +------------------+             |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)                     |                       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154)                     |                       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)         +--------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)         |                                            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)         |      HFI1 Driver with VNIC support         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)         |                                            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)         +--------------------------------------------+