^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Segmentation Offloads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) This document describes a set of techniques in the Linux networking stack
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) to take advantage of segmentation offload capabilities of various NICs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) The following technologies are described:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) * TCP Segmentation Offload - TSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) * UDP Fragmentation Offload - UFO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) * IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) * Generic Segmentation Offload - GSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) * Generic Receive Offload - GRO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) * Partial Generic Segmentation Offload - GSO_PARTIAL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) * SCTP acceleration with GSO - GSO_BY_FRAGS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) TCP Segmentation Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) ========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) TCP segmentation allows a device to segment a single frame into multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) frames with a data payload size specified in skb_shinfo()->gso_size.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) skb_shinfo()->gso_size should be set to a non-zero value.
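
As a rough illustration of how gso_size and gso_type relate to the number of
frames produced, the following user-space sketch may help. The struct, the
EX_GSO_* bit values, and tso_segment_count() are hypothetical stand-ins
invented for this example; they are not kernel definitions.

```c
#include <stddef.h>

/* Illustrative bit values only; NOT the kernel's SKB_GSO_* definitions. */
#define EX_GSO_TCPV4 (1u << 0)
#define EX_GSO_TCPV6 (1u << 1)

/* Hypothetical stand-in for the two skb_shinfo() fields discussed above. */
struct tso_request {
	unsigned int gso_size;	/* payload bytes per segment */
	unsigned int gso_type;	/* must carry a TCPv4 or TCPv6 bit */
};

/*
 * Return how many frames would carry payload_len bytes of TCP payload,
 * or 0 if the request is not a valid TCP segmentation request.
 */
static unsigned int tso_segment_count(const struct tso_request *req,
				      size_t payload_len)
{
	if (req->gso_size == 0)
		return 0;	/* gso_size must be non-zero */
	if (!(req->gso_type & (EX_GSO_TCPV4 | EX_GSO_TCPV6)))
		return 0;	/* neither TCP bit is set */

	/* Round up: the last segment may be shorter than gso_size. */
	return (unsigned int)((payload_len + req->gso_size - 1) /
			      req->gso_size);
}
```

With gso_size set to 1448, a 65160 byte payload divides into exactly 45
segments, and one extra byte of payload adds a 46th, short segment.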
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) TCP segmentation is dependent on support for the use of partial checksum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) offload. For this reason TSO is normally disabled if the Tx checksum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) offload for a given device is disabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) In order to support TCP segmentation offload it is necessary to populate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) the network and transport header offsets of the skbuff so that the device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) drivers will be able to determine the offsets of the IP or IPv6 header and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) also point to the TCP header of the packet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) For IPv4 segmentation we support one of two types in terms of the IP ID.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) The default behavior is to increment the IP ID with every segment. If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) ID and all segments will use the same IP ID. If a device has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) and we will either increment the IP ID for all frames, or leave it at a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) static value based on driver preference.
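
The two IP ID behaviors can be sketched as a small helper. This is only an
illustration of the rule described above; segment_ip_id() and its parameters
are hypothetical names, and the fixed_id flag merely stands in for the
SKB_GSO_TCP_FIXEDID case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the IPv4 ID for segment number seg_index (0-based) of a frame
 * whose first segment uses first_id.
 *
 * fixed_id == false: default behavior, the ID increments per segment.
 * fixed_id == true:  every segment reuses the same ID.
 */
static uint16_t segment_ip_id(uint16_t first_id, unsigned int seg_index,
			      bool fixed_id)
{
	if (fixed_id)
		return first_id;

	/* The IP ID is a 16-bit field, so the increment wraps modulo 2^16. */
	return (uint16_t)(first_id + seg_index);
}
```

A device with NETIF_F_TSO_MANGLEID set is free to behave like either branch,
which is why the stack must not depend on the resulting values in that case.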
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) UDP Fragmentation Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) =========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) UDP fragmentation offload allows a device to fragment an oversized UDP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) datagram into multiple IPv4 fragments. Many of the requirements for UDP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) fragmentation offload are the same as TSO. However, the IPv4 ID for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) fragments should not increment as a single IPv4 datagram is fragmented.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) still receive them from tuntap and similar devices. Offload of UDP-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) tunnel protocols is still supported.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) ========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) In addition to the offloads described above it is possible for a frame to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) contain additional headers such as an outer tunnel. In order to account
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) for such instances an additional set of segmentation offload types was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) cases where there is more than one set of headers. For example, in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) case of IPIP and SIT we should have the network and transport headers moved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) from the standard list of headers to "inner" header offsets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Currently only two levels of headers are supported. The convention is to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) refer to the tunnel headers as the outer headers, while the encapsulated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) data is normally referred to as the inner headers. Below is the list of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) calls to access the given headers:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) IPIP/SIT Tunnel::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)              Outer                  Inner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)   MAC        skb_mac_header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)   Network    skb_network_header     skb_inner_network_header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)   Transport  skb_transport_header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) UDP/GRE Tunnel::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)              Outer                  Inner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)   MAC        skb_mac_header         skb_inner_mac_header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)   Network    skb_network_header     skb_inner_network_header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)   Transport  skb_transport_header   skb_inner_transport_header
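
To make the outer/inner split concrete, the sketch below computes where each
of the six headers would start for one assumed UDP tunnel layout: Ethernet,
IPv4 without options, UDP, and an 8 byte VXLAN header, followed by an inner
Ethernet/IPv4/TCP frame. The struct and function names are hypothetical and
only mirror the accessor names listed above.

```c
#include <stddef.h>

/* Header sizes for the assumed layout (no VLAN tags, no IP options). */
enum {
	EX_ETH_HLEN   = 14,
	EX_IPV4_HLEN  = 20,
	EX_UDP_HLEN   = 8,
	EX_VXLAN_HLEN = 8,
};

/* Byte offsets from the start of the frame, one per accessor above. */
struct ex_hdr_offsets {
	size_t mac, network, transport;			    /* outer */
	size_t inner_mac, inner_network, inner_transport;  /* inner */
};

static struct ex_hdr_offsets ex_vxlan_ipv4_offsets(void)
{
	struct ex_hdr_offsets o;

	o.mac = 0;
	o.network = o.mac + EX_ETH_HLEN;
	o.transport = o.network + EX_IPV4_HLEN;
	/* The inner frame starts after the outer UDP and VXLAN headers. */
	o.inner_mac = o.transport + EX_UDP_HLEN + EX_VXLAN_HLEN;
	o.inner_network = o.inner_mac + EX_ETH_HLEN;
	o.inner_transport = o.inner_network + EX_IPV4_HLEN;
	return o;
}
```

For this layout the inner TCP header begins 84 bytes into the frame, which is
the offset a device needs in order to segment the inner payload.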
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) indicate that a non-zero checksum is also requested in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) outer header.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) header has requested a remote checksum offload. In this case the inner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) headers will be left with a partial checksum and only the outer header
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) checksum will be computed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Generic Segmentation Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Generic segmentation offload is a pure software offload that is meant to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) deal with cases where device drivers cannot perform the offloads described
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) above. What occurs in GSO is that a given skbuff will have its data broken
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) out over multiple skbuffs that have been resized to match the MSS provided
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) via skb_shinfo()->gso_size.
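
The splitting step can be pictured with a user-space sketch that cuts a
payload into MSS-sized pieces. It deliberately ignores headers, checksums and
real skbuff handling; struct segment and gso_split() are hypothetical names
used only for this illustration.

```c
#include <stddef.h>

/* One output segment, described as a slice of the original payload. */
struct segment {
	size_t offset;	/* where the slice starts in the payload */
	size_t len;	/* slice length, at most mss */
};

/*
 * Split payload_len bytes into slices of at most mss bytes.
 * Writes up to max_out entries into out[] and returns how many were
 * written; returns 0 if mss is 0.
 */
static size_t gso_split(size_t payload_len, size_t mss,
			struct segment *out, size_t max_out)
{
	size_t n = 0, off = 0;

	if (mss == 0)
		return 0;

	while (off < payload_len && n < max_out) {
		size_t len = payload_len - off;

		if (len > mss)
			len = mss;	/* every slice but the last is full */
		out[n].offset = off;
		out[n].len = len;
		off += len;
		n++;
	}
	return n;
}
```

A 3000 byte payload with an MSS of 1448 yields two full slices and a final
slice of 104 bytes.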
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) Before enabling any hardware segmentation offload a corresponding software
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) offload is required in GSO. Otherwise it becomes possible for a frame to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) be re-routed between devices and end up being unable to be transmitted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) Generic Receive Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) Generic receive offload is the complement to GSO. Ideally any frame
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) assembled by GRO should be segmented to create an identical sequence of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) frames using GSO, and any sequence of frames segmented by GSO should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) able to be reassembled back to the original by GRO. The only exception to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) this is the IPv4 ID when the DF bit is set for a given IP header. If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) IPv4 ID values are not sequentially incrementing, they will be altered
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) to be so when a frame assembled via GRO is segmented via GSO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) Partial Generic Segmentation Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) ====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) Partial generic segmentation offload is a hybrid between TSO and GSO. What
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) it effectively does is take advantage of certain traits of TCP and tunnels
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) so that instead of having to rewrite the packet headers for each segment
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) only the inner-most transport header and possibly the outer-most network
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) header need to be updated. This allows devices that do not support tunnel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) offloads or tunnel offloads with checksum to still make use of segmentation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) With the partial offload, all headers excluding the inner transport
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) header are updated so that they contain the values that would be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) correct if the header were simply duplicated. The one exception to this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) is the outer IPv4 ID field. It is up to the device drivers to guarantee
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) that the IPv4 ID field is incremented in the case that a given header does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) not have the DF bit set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) SCTP acceleration with GSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) SCTP - despite the lack of hardware support - can still take advantage of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) GSO to pass one large packet through the network stack, rather than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) multiple small packets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) This requires a different approach from other offloads, as SCTP packets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) IP segments, padding respected. So unlike regular GSO, SCTP can't just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) generate a big skb, set gso_size to the fragmentation point and deliver it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) to IP layer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) Instead, the SCTP protocol layer builds an skb with the segments correctly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) padded and stored as chained skbs, and skb_segment() splits based on those.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) To signal this, gso_size is set to the special value GSO_BY_FRAGS.
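
The constraint that chunks stay whole and padded can be sketched as a greedy
packer. This is not how the kernel's SCTP output path is written; it only
illustrates why a single fixed gso_size cannot describe the result. It
assumes the SCTP rule that chunks are padded to a 4 byte boundary, and
sctp_pad4() and pack_chunks() are hypothetical names.

```c
#include <stddef.h>

/* Round a chunk length up to the next multiple of 4 bytes. */
static size_t sctp_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Pack whole, padded chunks into segments of at most max_seg bytes.
 * seg_len[] receives the size of each segment. Returns the number of
 * segments, or 0 if a chunk cannot fit or seg_len[] is too small.
 */
static size_t pack_chunks(const size_t *chunk_len, size_t n_chunks,
			  size_t max_seg, size_t *seg_len, size_t max_segs)
{
	size_t n = 0, cur = 0, i;

	for (i = 0; i < n_chunks; i++) {
		size_t padded = sctp_pad4(chunk_len[i]);

		if (padded > max_seg)
			return 0;	/* a chunk is never split */
		if (cur + padded > max_seg) {
			if (n == max_segs)
				return 0;
			seg_len[n++] = cur;	/* close current segment */
			cur = 0;
		}
		cur += padded;
	}
	if (cur > 0) {
		if (n == max_segs)
			return 0;
		seg_len[n++] = cur;
	}
	return n;
}
```

Chunks of 700, 701 and 300 bytes with a 1452 byte limit end up as segments of
1404 and 300 bytes, so the segment sizes differ and depend on chunk
boundaries rather than on one MSS value.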
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) Therefore, any code in the core networking stack must be aware of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) possibility that gso_size will be GSO_BY_FRAGS and handle that case
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) appropriately.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) There are some helpers to make this easier:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)   an skb is an SCTP GSO skb.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) - For size checks, the skb_gso_validate_*_len family of helpers correctly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)   considers GSO_BY_FRAGS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
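
The kind of special-casing these helpers perform can be sketched in user
space as follows. The code is loosely modeled on the idea of a size check
that walks the chained segments when gso_size is the sentinel. All names are
hypothetical, and the sentinel value 0xFFFF is an assumption made for this
illustration; real code should use GSO_BY_FRAGS and the kernel helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sentinel for illustration; use GSO_BY_FRAGS in kernel code. */
#define EX_GSO_BY_FRAGS 0xFFFFu

/* Hypothetical stand-in for one chained skb in the frag list. */
struct ex_frag {
	size_t len;			/* bytes carried by this segment */
	const struct ex_frag *next;
};

/* Hypothetical stand-in for the GSO-related state of an skb. */
struct ex_gso_pkt {
	unsigned int gso_size;
	size_t hdr_len;			/* headers repeated on each segment */
	const struct ex_frag *frag_list;
};

/* Return true if every resulting segment fits within max_len bytes. */
static bool ex_gso_fits(const struct ex_gso_pkt *pkt, size_t max_len)
{
	const struct ex_frag *f;

	/* Regular GSO: all segments share one size, so one test suffices. */
	if (pkt->gso_size != EX_GSO_BY_FRAGS)
		return pkt->hdr_len + pkt->gso_size <= max_len;

	/* GSO_BY_FRAGS: gso_size is not a length, so check each segment. */
	for (f = pkt->frag_list; f; f = f->next)
		if (pkt->hdr_len + f->len > max_len)
			return false;
	return true;
}
```

Treating the sentinel as a real length would make every SCTP GSO packet look
far larger than any MTU, which is the mistake the helpers above guard
against.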
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.