Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

======
AF_XDP
======

Overview
========

AF_XDP is an address family that is optimized for high performance
packet processing.

This document assumes that the reader is familiar with BPF and XDP. If
not, the Cilium project has an excellent reference guide at
http://cilium.readthedocs.io/en/latest/bpf/.

Using the XDP_REDIRECT action from an XDP program, the program can
redirect ingress frames to other XDP-enabled netdevs, using the
bpf_redirect_map() function. AF_XDP sockets enable the possibility for
XDP programs to redirect frames to a memory buffer in a user-space
application.

An AF_XDP socket (XSK) is created with the normal socket()
syscall. Associated with each XSK are two rings: the RX ring and the
TX ring. A socket can receive packets on the RX ring and it can send
packets on the TX ring. These rings are registered and sized with the
setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is mandatory
to have at least one of these rings for each socket. An RX or TX
descriptor ring points to a data buffer in a memory area called a
UMEM. RX and TX can share the same UMEM so that a packet does not have
to be copied between RX and TX. Moreover, if a packet needs to be kept
for a while due to a possible retransmit, the descriptor that points
to that packet can be changed to point to another and reused right
away. This again avoids copying data.

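As a minimal sketch of this first step, the snippet below creates an XSK and sizes its two rings. The constants are reproduced from the kernel UAPI (linux/if_xdp.h and the socket headers) so the sketch is self-contained; the helper names (``xsk_create``, ``valid_ring_size``) are made up for illustration, and creating an AF_XDP socket requires CAP_NET_RAW in practice.

```c
#include <sys/socket.h>

/* Values from the kernel UAPI, defined here as a fallback so the
 * sketch is self-contained. */
#ifndef AF_XDP
#define AF_XDP 44
#endif
#ifndef SOL_XDP
#define SOL_XDP 283
#endif
#ifndef XDP_RX_RING
#define XDP_RX_RING 2
#define XDP_TX_RING 3
#endif

/* Ring sizes must be a power of two. */
static int valid_ring_size(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Create an XSK and size its RX and TX rings. Returns the socket fd,
 * or -1 on failure (AF_XDP sockets require CAP_NET_RAW). */
static int xsk_create(unsigned int rx_size, unsigned int tx_size)
{
	int fd;

	if (!valid_ring_size(rx_size) || !valid_ring_size(tx_size))
		return -1;

	fd = socket(AF_XDP, SOCK_RAW, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_XDP, XDP_RX_RING, &rx_size, sizeof(rx_size)) ||
	    setsockopt(fd, SOL_XDP, XDP_TX_RING, &tx_size, sizeof(tx_size)))
		return -1;

	return fd;
}
```

A socket created this way is not usable until it is also bound to a device and queue, as described below.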
The UMEM consists of a number of equally sized chunks. A descriptor in
one of the rings references a frame by referencing its addr. The addr
is simply an offset within the entire UMEM region. The user space
allocates memory for this UMEM using whatever means it feels is most
appropriate (malloc, mmap, huge pages, etc). This memory area is then
registered with the kernel using the new setsockopt XDP_UMEM_REG. The
UMEM also has two rings: the FILL ring and the COMPLETION ring. The
FILL ring is used by the application to send down addrs for the kernel
to fill in with RX packet data. References to these frames will then
appear in the RX ring once each packet has been received. The
COMPLETION ring, on the other hand, contains frame addrs that the
kernel has transmitted completely and can now be used again by user
space, for either TX or RX. Thus, the frame addrs appearing in the
COMPLETION ring are addrs that were previously transmitted using the
TX ring. In summary, the RX and FILL rings are used for the RX path
and the TX and COMPLETION rings are used for the TX path.

The socket is then finally bound with a bind() call to a device and a
specific queue id on that device, and it is not until bind is
completed that traffic starts to flow.

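The bind step can be sketched as follows. The struct below is a local mirror of the kernel's struct sockaddr_xdp (linux/if_xdp.h), reproduced so the sketch is self-contained; the helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef AF_XDP
#define AF_XDP 44
#endif

/* Local mirror of the kernel's struct sockaddr_xdp, reproduced here
 * for illustration. */
struct sockaddr_xdp_sketch {
	uint16_t sxdp_family;
	uint16_t sxdp_flags;           /* e.g. XDP_SHARED_UMEM */
	uint32_t sxdp_ifindex;
	uint32_t sxdp_queue_id;
	uint32_t sxdp_shared_umem_fd;  /* fd of the XSK owning the UMEM */
};

/* Fill the address for binding an XSK to a netdev/queue pair. */
static void xsk_fill_sockaddr(struct sockaddr_xdp_sketch *sxdp,
			      uint32_t ifindex, uint32_t queue_id)
{
	memset(sxdp, 0, sizeof(*sxdp));
	sxdp->sxdp_family = AF_XDP;
	sxdp->sxdp_ifindex = ifindex;  /* e.g. from if_nametoindex("eth0") */
	sxdp->sxdp_queue_id = queue_id;
}

/* Traffic only starts to flow once this bind() completes. */
static int xsk_bind(int fd, uint32_t ifindex, uint32_t queue_id)
{
	struct sockaddr_xdp_sketch sxdp;

	xsk_fill_sockaddr(&sxdp, ifindex, queue_id);
	return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}
```

For the shared-UMEM case described next, a second socket would additionally set XDP_SHARED_UMEM in sxdp_flags and the first socket's fd in sxdp_shared_umem_fd before calling bind().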
The UMEM can be shared between processes, if desired. If a process
wants to do this, it simply skips the registration of the UMEM and its
corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
call and submits the XSK of the process it would like to share UMEM
with as well as its own newly created XSK socket. The new process will
then receive frame addr references in its own RX ring that point to
this shared UMEM. Note that since the ring structures are
single-consumer / single-producer (for performance reasons), the new
process has to create its own socket with associated RX and TX rings,
since it cannot share this with the other process. This is also the
reason that there is only one set of FILL and COMPLETION rings per
UMEM. It is the responsibility of a single process to handle the UMEM.

How are packets then distributed from an XDP program to the XSKs? There
is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
user-space application can place an XSK at an arbitrary place in this
map. The XDP program can then redirect a packet to a specific index in
this map and at this point XDP validates that the XSK in that map was
indeed bound to that device and ring number. If not, the packet is
dropped. If the map is empty at that index, the packet is also
dropped. This also means that it is currently mandatory to have an XDP
program loaded (and one XSK in the XSKMAP) to be able to get any
traffic to user space through the XSK.

AF_XDP can operate in two different modes: XDP_SKB and XDP_DRV. If the
driver does not have support for XDP, or XDP_SKB is explicitly chosen
when loading the XDP program, XDP_SKB mode is employed that uses SKBs
together with the generic XDP support and copies out the data to user
space. This is a fallback mode that works for any network device. On
the other hand, if the driver has support for XDP, it will be used by
the AF_XDP code to provide better performance, but there is still a
copy of the data into user space.

Concepts
========

In order to use an AF_XDP socket, a number of associated objects need
to be setup. These objects and their options are explained in the
following sections.

For an overview on how AF_XDP works, you can also take a look at the
Linux Plumbers paper from 2018 on the subject:
http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf. Do
NOT consult the paper from 2017 on "AF_PACKET v4", the first attempt
at AF_XDP. Nearly everything has changed since then. Jonathan Corbet has
also written an excellent article on LWN, "Accelerating networking
with AF_XDP". It can be found at https://lwn.net/Articles/750845/.

UMEM
----

UMEM is a region of virtually contiguous memory, divided into
equal-sized frames. A UMEM is associated with a netdev and a specific
queue id of that netdev. It is created and configured (chunk size,
headroom, start address and size) by using the XDP_UMEM_REG setsockopt
system call. A UMEM is bound to a netdev and queue id via the bind()
system call.

An AF_XDP socket is linked to a single UMEM, but one UMEM can have
multiple AF_XDP sockets. To share a UMEM created via one socket A,
socket B sets the XDP_SHARED_UMEM flag in the sxdp_flags member of
struct sockaddr_xdp, and passes the file descriptor of A in the
sxdp_shared_umem_fd member.

The UMEM has two single-producer/single-consumer rings that are used
to transfer ownership of UMEM frames between the kernel and the
user-space application.

Rings
-----

There are four different kinds of rings: FILL, COMPLETION, RX and
TX. All rings are single-producer/single-consumer, so the user-space
application needs explicit synchronization if multiple
processes/threads are reading from or writing to them.

The UMEM uses two rings: FILL and COMPLETION. Each socket associated
with the UMEM must have an RX queue, a TX queue or both. Say that there
is a setup with four sockets (all doing TX and RX). Then there will be
one FILL ring, one COMPLETION ring, four TX rings and four RX rings.

The rings are head(producer)/tail(consumer) based rings. A producer
writes the data ring at the index pointed out by the producer member
of struct xdp_ring, and then increases the producer index. A consumer
reads the data ring at the index pointed out by the consumer member of
struct xdp_ring, and then increases the consumer index.

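The index scheme above can be modeled with a toy single-producer/single-consumer ring. Real XSK rings live in memory shared with the kernel; this stand-alone model (all names made up) only shows the index arithmetic: indices grow monotonically and are masked by the power-of-two ring size on access.

```c
#include <stdint.h>

#define RING_SIZE 8   /* must be a power of two */

struct toy_ring {
	uint32_t producer;
	uint32_t consumer;
	uint64_t data[RING_SIZE];
};

/* Producer side: write at the producer index, then advance it. */
static int ring_produce(struct toy_ring *r, uint64_t val)
{
	if (r->producer - r->consumer == RING_SIZE)
		return -1;                       /* ring full */
	r->data[r->producer & (RING_SIZE - 1)] = val;
	r->producer++;                           /* publish the entry */
	return 0;
}

/* Consumer side: read at the consumer index, then advance it. */
static int ring_consume(struct toy_ring *r, uint64_t *val)
{
	if (r->consumer == r->producer)
		return -1;                       /* ring empty */
	*val = r->data[r->consumer & (RING_SIZE - 1)];
	r->consumer++;                           /* release the entry */
	return 0;
}
```

The real rings additionally need memory barriers between writing an entry and publishing the new producer index, which libbpf's ring accessors take care of.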
The rings are configured and created via the _RING setsockopt system
calls and mmapped to user-space using the appropriate offset to mmap()
(XDP_PGOFF_RX_RING, XDP_PGOFF_TX_RING, XDP_UMEM_PGOFF_FILL_RING and
XDP_UMEM_PGOFF_COMPLETION_RING).

The size of the rings must be a power of two.

UMEM Fill Ring
~~~~~~~~~~~~~~

The FILL ring is used to transfer ownership of UMEM frames from
user-space to kernel-space. The UMEM addrs are passed in the ring. As
an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
16 chunks and can pass addrs between 0 and 64k.

Frames passed to the kernel are used for the ingress path (RX rings).

The user application produces UMEM addrs to this ring. Note that, if
running the application in aligned chunk mode, the kernel will mask
the incoming addr. E.g. for a chunk size of 2k, the log2(2048) LSBs of
the addr will be masked off, meaning that 2048, 2050 and 3000 refer
to the same chunk. If the user application is run in unaligned
chunks mode, the incoming addr will be left untouched.

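The masking described above is plain bit arithmetic, sketched below (the helper name is made up; chunk_size must be a power of two):

```c
#include <stdint.h>

/* In aligned chunk mode the kernel masks off the low log2(chunk_size)
 * bits of each addr produced to the FILL ring. */
static uint64_t chunk_align(uint64_t addr, uint64_t chunk_size)
{
	return addr & ~(chunk_size - 1);
}
```

With a 2k chunk size, 2048, 2050 and 3000 all align to 2048, i.e. they refer to the same chunk.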

UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~

The COMPLETION ring is used to transfer ownership of UMEM frames from
kernel-space to user-space. Just like the FILL ring, UMEM addrs are
used.

Frames passed from the kernel to user-space are frames that have been
sent (TX ring) and can be used by user-space again.

The user application consumes UMEM addrs from this ring.


RX Ring
~~~~~~~

The RX ring is the receiving side of a socket. Each entry in the ring
is a struct xdp_desc descriptor. The descriptor contains the UMEM
offset (addr) and the length of the data (len).

If no frames have been passed to the kernel via the FILL ring, no
descriptors will (or can) appear on the RX ring.

The user application consumes struct xdp_desc descriptors from this
ring.

TX Ring
~~~~~~~

The TX ring is used to send frames. The struct xdp_desc descriptor is
filled in with the UMEM offset (addr) and length (len) of the frame
and passed into the ring.

To start the transfer a sendmsg() system call is required. This might
be relaxed in the future.

The user application produces struct xdp_desc descriptors to this
ring.

Libbpf
======

Libbpf is a helper library for eBPF and XDP that makes using these
technologies a lot simpler. It also contains specific helper functions
in tools/lib/bpf/xsk.h for facilitating the use of AF_XDP. It
contains two types of functions: those that can be used to make the
setup of AF_XDP sockets easier and those that can be used in the data
plane to access the rings safely and quickly. To see an example of how
to use this API, please take a look at the sample application in
samples/bpf/xdpsock_user.c which uses libbpf for both setup and data
plane operations.

We recommend that you use this library unless you have become a power
user. It will make your program a lot simpler.

XSKMAP / BPF_MAP_TYPE_XSKMAP
============================

On the XDP side there is a BPF map type BPF_MAP_TYPE_XSKMAP (XSKMAP)
that is used in conjunction with bpf_redirect_map() to pass the
ingress frame to a socket.

The user application inserts the socket into the map, via the bpf()
system call.

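A sketch of that bpf() call follows. The attribute struct below is a minimal local stand-in for the map-update member of union bpf_attr (linux/bpf.h), reproduced so the snippet is self-contained; the helper names are made up, and in a real application libbpf's bpf_map_update_elem() wraps exactly this syscall.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Minimal stand-ins for the bpf() UAPI, reproduced here. */
#define BPF_MAP_UPDATE_ELEM 2
#define BPF_ANY 0
#ifndef SYS_bpf
#define SYS_bpf 321   /* x86-64 syscall number; arch-specific */
#endif

struct bpf_map_update_attr {   /* mirrors the map-update fields of
				* union bpf_attr */
	uint32_t map_fd;
	uint32_t pad;
	uint64_t key;          /* user pointer, cast to u64 */
	uint64_t value;        /* user pointer, cast to u64 */
	uint64_t flags;
};

/* Prepare the attributes for inserting XSK fd '*xsk_fd' at '*index'
 * in an XSKMAP whose fd is 'map_fd'. */
static void xskmap_update_prep(struct bpf_map_update_attr *a, int map_fd,
			       const uint32_t *index, const int *xsk_fd)
{
	memset(a, 0, sizeof(*a));
	a->map_fd = map_fd;
	a->key = (uint64_t)(uintptr_t)index;
	a->value = (uint64_t)(uintptr_t)xsk_fd;
	a->flags = BPF_ANY;
}

/* Perform the update via the raw bpf() syscall. */
static long xskmap_insert(int map_fd, uint32_t index, int xsk_fd)
{
	struct bpf_map_update_attr a;

	xskmap_update_prep(&a, map_fd, &index, &xsk_fd);
	return syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &a, sizeof(a));
}
```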
Note that if an XDP program tries to redirect to a socket that does
not match the queue configuration and netdev, the frame will be
dropped. E.g. an AF_XDP socket is bound to netdev eth0 and
queue 17. Only the XDP program executing for eth0 and queue 17 will
successfully pass data to the socket. Please refer to the sample
application (samples/bpf/) for an example.

Configuration Flags and Socket Options
======================================

These are the various configuration flags that can be used to control
and monitor the behavior of AF_XDP sockets.

XDP_COPY and XDP_ZERO_COPY bind flags
-------------------------------------

When you bind to a socket, the kernel will first try to use zero-copy
mode. If zero-copy is not supported, it will fall back on using copy
mode, i.e. copying all packets out to user space. But if you would
like to force a certain mode, you can use the following flags. If you
pass the XDP_COPY flag to the bind call, the kernel will force the
socket into copy mode. If it cannot use copy mode, the bind call will
fail with an error. Conversely, the XDP_ZERO_COPY flag will force the
socket into zero-copy mode or fail.

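The default try-zero-copy-then-fall-back policy can be sketched as below. The flag values are reproduced from linux/if_xdp.h; the bind operation is injected as a function pointer (a made-up device for this sketch) so the policy can be exercised without a real NIC, and the two fake binders model a driver with and without zero-copy support.

```c
/* Bind-flag values from linux/if_xdp.h, reproduced here. */
#define XDP_SHARED_UMEM (1 << 0)
#define XDP_COPY        (1 << 1)
#define XDP_ZERO_COPY   (1 << 2)

/* Try zero-copy first and fall back to copy mode, mimicking what the
 * kernel does when neither flag is given. Returns the mode used, or
 * -1 if both fail. */
static int bind_prefer_zerocopy(int fd, int (*do_bind)(int fd, int flags))
{
	if (do_bind(fd, XDP_ZERO_COPY) == 0)
		return XDP_ZERO_COPY;
	if (do_bind(fd, XDP_COPY) == 0)
		return XDP_COPY;
	return -1;
}

/* Example stand-ins: a driver with zero-copy support... */
static int fake_bind_zc_ok(int fd, int flags)
{
	(void)fd; (void)flags;
	return 0;
}

/* ...and one that only supports copy mode. */
static int fake_bind_copy_only(int fd, int flags)
{
	(void)fd;
	return (flags & XDP_ZERO_COPY) ? -1 : 0;
}
```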
XDP_SHARED_UMEM bind flag
-------------------------

This flag enables you to bind multiple sockets to the same UMEM. It
works on the same queue id, between queue ids and between
netdevs/devices. In this mode, each socket has its own RX and TX
rings as usual, but you are going to have one or more FILL and
COMPLETION ring pairs. You have to create one of these pairs per
unique netdev and queue id tuple that you bind to.

Starting with the case where we would like to share a UMEM between
sockets bound to the same netdev and queue id. The UMEM (tied to the
first socket created) will only have a single FILL ring and a single
COMPLETION ring as there is only one unique netdev,queue_id tuple that
we have bound to. To use this mode, create the first socket and bind
it in the normal way. Create a second socket and create an RX and a TX
ring, or at least one of them, but no FILL or COMPLETION rings as the
ones from the first socket will be used. In the bind call, set the
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
sockets this way.

Which socket will a packet then arrive on? This is decided by the XDP
program. Put all the sockets in the XSK_MAP and just indicate which
index in the array you would like to send each packet to. A simple
round-robin example of distributing packets is shown below:

.. code-block:: c

   #include <linux/bpf.h>
   #include "bpf_helpers.h"

   #define MAX_SOCKS 16

   struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, MAX_SOCKS);
        __uint(key_size, sizeof(int));
        __uint(value_size, sizeof(int));
   } xsks_map SEC(".maps");

   static unsigned int rr;

   SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
   {
        rr = (rr + 1) & (MAX_SOCKS - 1);

        return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
   }

Note that since there is only a single set of FILL and COMPLETION
rings, and they are single-producer, single-consumer rings, you need
to make sure that multiple processes or threads do not use these rings
concurrently. There are no synchronization primitives in the
libbpf code that protect multiple users at this point in time.

Libbpf uses this mode if you create more than one socket tied to the
same UMEM. However, note that you need to supply the
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
xsk_socket__create calls and load your own XDP program as there is no
built-in one in libbpf that will route the traffic for you.

The second case is when you share a UMEM between sockets that are
bound to different queue ids and/or netdevs. In this case you have to
create one FILL ring and one COMPLETION ring for each unique
netdev,queue_id pair. Let us say you want to create two sockets bound
to two different queue ids on the same netdev. Create the first socket
and bind it in the normal way. Create a second socket and create an RX
and a TX ring, or at least one of them, and then one FILL and one
COMPLETION ring for this socket. Then in the bind call, set the
XDP_SHARED_UMEM option and provide the initial socket's fd in the
sxdp_shared_umem_fd field as you registered the UMEM on that
socket. These two sockets will now share one and the same UMEM.

There is no need to supply an XDP program like the one in the previous
case where sockets were bound to the same queue id and
device. Instead, use the NIC's packet steering capabilities to steer
the packets to the right queue. In the previous example, there is only
one queue shared among sockets, so the NIC cannot do this steering. It
can only steer between queues.

In libbpf, you need to use the xsk_socket__create_shared() API as it
takes a reference to a FILL ring and a COMPLETION ring that will be
created for you and bound to the shared UMEM. You can use this
function for all the sockets you create, or you can use it for the
second and following ones and use xsk_socket__create() for the first
one. Both methods yield the same result.

Note that a UMEM can be shared between sockets on the same queue id
and device, as well as between queues on the same device and between
devices at the same time.

XDP_USE_NEED_WAKEUP bind flag
-----------------------------

This option adds support for a new flag called need_wakeup that is
present in the FILL ring and the TX ring, the rings for which user
space is a producer. When this option is set in the bind call, the
need_wakeup flag will be set if the kernel needs to be explicitly
woken up by a syscall to continue processing packets. If the flag is
zero, no syscall is needed.

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) If the flag is set on the FILL ring, the application needs to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) poll() to be able to continue to receive packets on the RX ring. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) can happen, for example, when the kernel has detected that there are no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) more buffers on the FILL ring and no buffers left on the RX HW ring of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) the NIC. In this case, interrupts are turned off as the NIC cannot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) receive any packets (as there are no buffers to put them in), and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) need_wakeup flag is set so that user space can put buffers on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) FILL ring and then call poll() so that the kernel driver can put these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) buffers on the HW ring and start to receive packets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) If the flag is set for the TX ring, it means that the application
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) needs to explicitly notify the kernel to send any packets put on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) TX ring. This can be accomplished either by a poll() call, as in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) RX path, or by calling sendto().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) An example of how to use this flag can be found in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) samples/bpf/xdpsock_user.c. An example with the use of libbpf helpers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) would look like this for the TX path:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381)    if (xsk_ring_prod__needs_wakeup(&my_tx_ring))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382)       sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) I.e., only use the syscall if the flag is set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) We recommend that you always enable this mode as it usually leads to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) better performance especially if you run the application and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 388) driver on the same core, but also if you use different cores for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 389) application and the kernel driver, as it reduces the number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 390) syscalls needed for the TX path.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 391) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 392) XDP_{RX|TX|UMEM_FILL|UMEM_COMPLETION}_RING setsockopts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 393) ------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 394) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 395) These setsockopts sets the number of descriptors that the RX, TX,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 396) FILL, and COMPLETION rings respectively should have. It is mandatory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 397) to set the size of at least one of the RX and TX rings. If you set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 398) both, you will be able to both receive and send traffic from your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 399) application, but if you only want to do one of them, you can save
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 400) resources by only setting up one of them. Both the FILL ring and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 401) COMPLETION ring are mandatory as you need to have a UMEM tied to your
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 402) socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 403) first one does not have a UMEM and should in that case not have any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 404) FILL or COMPLETION rings created as the ones from the shared UMEM will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 405) be used. Note, that the rings are single-producer single-consumer, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 406) do not try to access them from multiple processes at the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 407) time. See the XDP_SHARED_UMEM section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 408) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 409) In libbpf, you can create Rx-only and Tx-only sockets by supplying
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 410) NULL to the rx and tx arguments, respectively, to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 411) xsk_socket__create function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 412) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 413) If you create a Tx-only socket, we recommend that you do not put any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 414) packets on the fill ring. If you do this, drivers might think you are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 415) going to receive something when you in fact will not, and this can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 416) negatively impact performance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 417) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 418) XDP_UMEM_REG setsockopt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 419) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 420) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 421) This setsockopt registers a UMEM to a socket. This is the area that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 422) contain all the buffers that packet can recide in. The call takes a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 423) pointer to the beginning of this area and the size of it. Moreover, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 424) also has parameter called chunk_size that is the size that the UMEM is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 425) divided into. It can only be 2K or 4K at the moment. If you have an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 426) UMEM area that is 128K and a chunk size of 2K, this means that you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 427) will be able to hold a maximum of 128K / 2K = 64 packets in your UMEM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 428) area and that your largest packet size can be 2K.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 430) There is also an option to set the headroom of each single buffer in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 431) the UMEM. If you set this to N bytes, it means that the packet will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 432) start N bytes into the buffer leaving the first N bytes for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 433) application to use. The final option is the flags field, but it will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 434) be dealt with in separate sections for each UMEM flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 435) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 436) XDP_STATISTICS getsockopt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 437) -------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 438) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 439) Gets drop statistics of a socket that can be useful for debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 440) purposes. The supported statistics are shown below:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 441) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 442) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 444)    struct xdp_statistics {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 445) 	  __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 446) 	  __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 447) 	  __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 448)    };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 449) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 450) XDP_OPTIONS getsockopt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 451) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 452) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 453) Gets options from an XDP socket. The only one supported so far is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 454) XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 455) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 456) Usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 457) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 458) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 459) In order to use AF_XDP sockets two parts are needed. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 460) user-space application and the XDP program. For a complete setup and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 461) usage example, please refer to the sample application. The user-space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 462) side is xdpsock_user.c and the XDP side is part of libbpf.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 463) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 464) The XDP code sample included in tools/lib/bpf/xsk.c is the following:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 465) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 466) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 467) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 468)    SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 469)    {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 470)        int index = ctx->rx_queue_index;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 471) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 472)        // A set entry here means that the corresponding queue_id
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 473)        // has an active AF_XDP socket bound to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 474)        if (bpf_map_lookup_elem(&xsks_map, &index))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 475)            return bpf_redirect_map(&xsks_map, index, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 476) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 477)        return XDP_PASS;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 478)    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 479) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 480) A simple but not so performance ring dequeue and enqueue could look
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 481) like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 482) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 483) .. code-block:: c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 484) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 485)     // struct xdp_rxtx_ring {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 486)     // 	__u32 *producer;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 487)     // 	__u32 *consumer;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 488)     // 	struct xdp_desc *desc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 489)     // };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 490) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 491)     // struct xdp_umem_ring {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 492)     // 	__u32 *producer;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 493)     // 	__u32 *consumer;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 494)     // 	__u64 *desc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 495)     // };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 496) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 497)     // typedef struct xdp_rxtx_ring RING;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 498)     // typedef struct xdp_umem_ring RING;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 499) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 500)     // typedef struct xdp_desc RING_TYPE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 501)     // typedef __u64 RING_TYPE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 502) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 503)     int dequeue_one(RING *ring, RING_TYPE *item)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 504)     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 505)         __u32 entries = *ring->producer - *ring->consumer;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 506) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 507)         if (entries == 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 508)             return -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 509) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 510)         // read-barrier!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 511) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 512)         *item = ring->desc[*ring->consumer & (RING_SIZE - 1)];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 513)         (*ring->consumer)++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 514)         return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 515)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 516) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 517)     int enqueue_one(RING *ring, const RING_TYPE *item)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 518)     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 519)         u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 520) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 521)         if (free_entries == 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 522)             return -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 523) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 524)         ring->desc[*ring->producer & (RING_SIZE - 1)] = *item;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 526)         // write-barrier!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 527) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 528)         (*ring->producer)++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 529)         return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 530)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 531) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 532) But please use the libbpf functions as they are optimized and ready to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 533) use. Will make your life easier.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 534) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 535) Sample application
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 536) ==================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 537) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 538) There is a xdpsock benchmarking/test application included that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 539) demonstrates how to use AF_XDP sockets with private UMEMs. Say that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 540) you would like your UDP traffic from port 4242 to end up in queue 16,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 541) that we will enable AF_XDP on. Here, we use ethtool for this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 542) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 543)       ethtool -N p3p2 rx-flow-hash udp4 fn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 544)       ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 545)           action 16
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 546) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 547) Running the rxdrop benchmark in XDP_DRV mode can then be done
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 548) using::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 549) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 550)       samples/bpf/xdpsock -i p3p2 -q 16 -r -N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 551) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 552) For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 553) can be displayed with "-h", as usual.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 555) This sample application uses libbpf to make the setup and usage of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 556) AF_XDP simpler. If you want to know how the raw uapi of AF_XDP is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 557) really used to make something more advanced, take a look at the libbpf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 558) code in tools/lib/bpf/xsk.[ch].
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 559) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 560) FAQ
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 561) =======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 562) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 563) Q: I am not seeing any traffic on the socket. What am I doing wrong?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 564) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 565) A: When a netdev of a physical NIC is initialized, Linux usually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 566)    allocates one RX and TX queue pair per core. So on a 8 core system,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 567)    queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 568)    bind call or the xsk_socket__create libbpf function call, you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 569)    specify a specific queue id to bind to and it is only the traffic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 570)    towards that queue you are going to get on you socket. So in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 571)    example above, if you bind to queue 0, you are NOT going to get any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 572)    traffic that is distributed to queues 1 through 7. If you are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 573)    lucky, you will see the traffic, but usually it will end up on one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 574)    of the queues you have not bound to.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 575) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 576)    There are a number of ways to solve the problem of getting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 577)    traffic you want to the queue id you bound to. If you want to see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 578)    all the traffic, you can force the netdev to only have 1 queue, queue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 579)    id 0, and then bind to queue 0. You can use ethtool to do this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 580) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 581)      sudo ethtool -L <interface> combined 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 582) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 583)    If you want to only see part of the traffic, you can program the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 584)    NIC through ethtool to filter out your traffic to a single queue id
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 585)    that you can bind your XDP socket to. Here is one example in which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 586)    UDP traffic to and from port 4242 are sent to queue 2::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 587) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 588)      sudo ethtool -N <interface> rx-flow-hash udp4 fn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 589)      sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 590)      4242 action 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 591) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 592)    A number of other ways are possible all up to the capabilities of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 593)    the NIC you have.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 594) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 595) Q: Can I use the XSKMAP to implement a switch betwen different umems
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 596)    in copy mode?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 597) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 598) A: The short answer is no, that is not supported at the moment. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 599)    XSKMAP can only be used to switch traffic coming in on queue id X
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 600)    to sockets bound to the same queue id X. The XSKMAP can contain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 601)    sockets bound to different queue ids, for example X and Y, but only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 602)    traffic goming in from queue id Y can be directed to sockets bound
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 603)    to the same queue id Y. In zero-copy mode, you should use the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 604)    switch, or other distribution mechanism, in your NIC to direct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 605)    traffic to the correct queue id and socket.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 606) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 607) Q: My packets are sometimes corrupted. What is wrong?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 608) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 609) A: Care has to be taken not to feed the same buffer in the UMEM into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 610)    more than one ring at the same time. If you for example feed the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 611)    same buffer into the FILL ring and the TX ring at the same time, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 612)    NIC might receive data into the buffer at the same time it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 613)    sending it. This will cause some packets to become corrupted. Same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 614)    thing goes for feeding the same buffer into the FILL rings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 615)    belonging to different queue ids or netdevs bound with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 616)    XDP_SHARED_UMEM flag.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 617) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 618) Credits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 619) =======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 620) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 621) - Björn Töpel (AF_XDP core)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 622) - Magnus Karlsson (AF_XDP core)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 623) - Alexander Duyck
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 624) - Alexei Starovoitov
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 625) - Daniel Borkmann
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 626) - Jesper Dangaard Brouer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 627) - John Fastabend
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 628) - Jonathan Corbet (LWN coverage)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 629) - Michael S. Tsirkin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 630) - Qi Z Zhang
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 631) - Willem de Bruijn