Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) Checksum Offloads
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) This document describes a set of techniques in the Linux networking stack to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) take advantage of checksum offload capabilities of various NICs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) The following technologies are described:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) * TX Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) * LCO: Local Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) * RCO: Remote Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) Things that should be documented here but aren't yet:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) * RX Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) * CHECKSUM_UNNECESSARY conversion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) TX Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) The interface for offloading a transmit checksum to a device is explained in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) detail in comments near the top of include/linux/skbuff.h.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) In brief, it lets the stack ask the device to fill in a single ones-complement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) The device should compute the 16-bit ones-complement checksum (i.e. the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 'IP-style' checksum) from csum_start to the end of the packet, and fill in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) result at (csum_start + csum_offset).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) Because csum_offset cannot be negative, the previous value of the checksum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) field is always included in the checksum computation, so it can be used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) supply any needed corrections to the checksum (such as the sum of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) pseudo-header for UDP or TCP).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 
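The behaviour described above can be illustrated with a small user-space
sketch.  It is not kernel or driver code: csum16(), csum_add(),
emulate_tx_csum() and tx_csum_demo() are hypothetical names, the packet bytes
are made up, and the sketch assumes an even csum_offset and a packet small
enough that a 32-bit accumulator cannot overflow.  It seeds the checksum field
with the pseudo-header sum, lets the emulated device sum from csum_start to the
end of the packet, and then checks the result the way a receiver would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of len bytes, read as big-endian 16-bit words and
 * folded to 16 bits.  An odd trailing byte is padded with zero. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement addition of two folded sums. */
static uint16_t csum_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Hypothetical stand-in for what a device does with csum_start/csum_offset. */
static void emulate_tx_csum(uint8_t *pkt, size_t len,
			    size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t csum;

	assert((csum_offset & 1) == 0);	/* simplification made by this sketch */
	assert(field + 2 <= len);
	/* The field's current contents (the seed) are part of the sum. */
	csum = (uint16_t)~csum16(pkt + csum_start, len - csum_start);
	pkt[field] = csum >> 8;
	pkt[field + 1] = csum & 0xff;
}

/* Returns 1 if the filled-in checksum verifies against the pseudo-header. */
static int tx_csum_demo(void)
{
	/* IPv4 pseudo-header: src, dst, zero, protocol 17 (UDP), UDP length 12 */
	static const uint8_t pseudo[12] = {
		192, 168, 0, 1, 192, 168, 0, 199, 0, 17, 0, 12,
	};
	uint8_t pkt[16] = {
		0xde, 0xad, 0xbe, 0xef,	/* stand-in for the earlier headers */
		0x12, 0x34, 0x00, 0x35, 0x00, 0x0c, 0x00, 0x00,	/* UDP header */
		'a', 'b', 'c', 'd',	/* payload */
	};
	uint16_t seed = csum16(pseudo, sizeof(pseudo));

	/* Seed the checksum field with the (uncomplemented) pseudo-header sum. */
	pkt[10] = seed >> 8;
	pkt[11] = seed & 0xff;
	emulate_tx_csum(pkt, sizeof(pkt), 4, 6);

	/* Receiver-side check: pseudo-header plus datagram must sum to 0xffff. */
	return csum_add(seed, csum16(pkt + 4, sizeof(pkt) - 4)) == 0xffff;
}
```

Calling tx_csum_demo() from a test program is expected to return 1, showing
that the correction placed in the checksum field beforehand ends up folded into
the final checksum.
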
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) This interface only allows a single checksum to be offloaded.  Where
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) encapsulation is used, the packet may have multiple checksum fields in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) different header layers, and the rest will have to be handled by another
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) mechanism such as LCO or RCO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) CRC32c can also be offloaded using this interface, by means of filling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) skb->csum_start and skb->csum_offset as described above, and setting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) No offloading of the IP header checksum is performed; it is always done in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) software.  This is OK because when we build the IP header, we obviously have it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) in cache, so summing it isn't expensive.  It's also rather short.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) The requirements for GSO are more complicated, because when segmenting an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) encapsulated packet both the inner and outer checksums may need to be edited or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) for more details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) A driver declares its offload capabilities in netdev->hw_features; see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) Documentation/networking/netdev-features.rst for more.  Note that a device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) csum_offset given in the SKB; if it tries to deduce these itself in hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) (as some NICs do) the driver should check that the values in the SKB match
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) those which the hardware will deduce, and if not, fall back to checksumming in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) software instead (with skb_csum_hwoffload_help() or one of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) include/linux/skbuff.h).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) 
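The check a driver has to make for such hardware can be sketched as follows.
This is a user-space illustration under stated assumptions, not driver code:
struct tx_pkt_info and hw_deduction_matches() are hypothetical, and the
hardware is assumed to parse only plain TCP and UDP and to place the checksum
at the standard field offset within the transport header (16 for TCP, 6 for
UDP).  When the function returns 0, a real driver would fall back to one of
the software helpers named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-packet summary, not a kernel structure. */
struct tx_pkt_info {
	size_t csum_start;	/* as requested by the stack */
	size_t csum_offset;	/* as requested by the stack */
	size_t l4_offset;	/* where the hardware would find the L4 header */
	int l4_proto;		/* 6 = TCP, 17 = UDP */
};

/* Returns 1 if hardware that derives the checksum location from the headers
 * would pick the same location the stack asked for, 0 otherwise. */
static int hw_deduction_matches(const struct tx_pkt_info *p)
{
	size_t hw_offset;

	assert(p != NULL);
	switch (p->l4_proto) {
	case 6:
		hw_offset = 16;	/* checksum field in the TCP header */
		break;
	case 17:
		hw_offset = 6;	/* checksum field in the UDP header */
		break;
	default:
		return 0;	/* assumed hardware cannot parse this protocol */
	}
	return p->csum_start == p->l4_offset && p->csum_offset == hw_offset;
}
```

For example, a plain TCP packet with csum_start equal to the transport header
offset and csum_offset 16 matches, whereas an encapsulated packet whose
csum_start points at the inner transport header does not, because the assumed
hardware would checksum the outer header instead.
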
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) The stack should, for the most part, assume that checksum offload is supported
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) by the underlying device.  The only place that should check is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) validate_xmit_skb(), and the functions it calls directly or indirectly.  That
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) function compares the offload features requested by the SKB (which may include
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) other offloads besides TX Checksum Offload) and, if they are not supported or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) enabled on the device (determined by netdev->features), performs the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) corresponding offload in software.  In the case of TX Checksum Offload, that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) means calling skb_csum_hwoffload_help(skb, features).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) LCO: Local Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) ===========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) LCO is a technique for efficiently computing the outer checksum of an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) encapsulated datagram when the inner checksum is due to be offloaded.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) to the complement of the sum of the pseudo-header, because everything else gets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 'cancelled out' by the checksum field.  This is because the sum was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) complemented before being written to the checksum field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) More generally, this holds in any case where the 'IP-style' ones-complement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) checksum is used, and thus any checksum that TX Checksum Offload supports.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) That is, if we have set up TX Checksum Offload with a start/offset pair, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) know that after the device has filled in that checksum, the ones-complement sum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) from csum_start to the end of the packet will be equal to the complement of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) whatever value we put in the checksum field beforehand.  This allows us to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) compute the outer checksum without looking at the payload: we simply stop
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) summing when we get to csum_start, then add the complement of the 16-bit word
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) at (csum_start + csum_offset).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) Then, when the true inner checksum is filled in (either by hardware or by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) skb_checksum_help()), the outer checksum will become correct by virtue of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) arithmetic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 
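The arithmetic can be checked with a small user-space sketch.  It only mirrors
the idea behind lco_csum() and is not the kernel implementation: csum16(),
lco_sum() and lco_demo() are hypothetical names, the packet bytes are made up,
and the outer pseudo-header is left out for brevity (it would simply be added
to the sum in the same way).  The outer sum is computed before the inner
checksum exists, and is then compared with a sum over the whole packet after
the inner checksum has been filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit ones-complement sum of len bytes (len must be even here),
 * continuing from an initial value. */
static uint16_t csum16(const uint8_t *p, size_t len, uint16_t init)
{
	uint32_t sum = init;
	size_t i;

	assert((len & 1) == 0);
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum only [l4_hdr, csum_start), then account for everything from csum_start
 * onwards by adding the complement of the word currently stored at
 * csum_start + csum_offset. */
static uint16_t lco_sum(const uint8_t *pkt, size_t l4_hdr,
			size_t csum_start, size_t csum_offset)
{
	const uint8_t *field = pkt + csum_start + csum_offset;
	uint16_t seed = (uint16_t)((field[0] << 8) | field[1]);

	return csum16(pkt + l4_hdr, csum_start - l4_hdr, (uint16_t)~seed);
}

/* Returns 1 if the sum predicted by LCO equals the sum over the whole packet
 * once the inner checksum has been filled in. */
static int lco_demo(void)
{
	uint8_t pkt[] = {
		/* outer header, 8 bytes, checksum field (bytes 6-7) still zero */
		0x30, 0x39, 0x12, 0xb5, 0x00, 0x14, 0x00, 0x00,
		/* inner datagram: csum_start = 8, csum_offset = 6, seed 0x8236 */
		0x12, 0x34, 0x00, 0x35, 0x00, 0x0c, 0x82, 0x36,
		'a', 'b', 'c', 'd',
	};
	size_t len = sizeof(pkt), csum_start = 8, csum_offset = 6;
	uint16_t predicted, inner;

	/* Outer sum, computed without looking at the payload. */
	predicted = lco_sum(pkt, 0, csum_start, csum_offset);

	/* Later, the "device" fills in the true inner checksum. */
	inner = (uint16_t)~csum16(pkt + csum_start, len - csum_start, 0);
	pkt[csum_start + csum_offset] = inner >> 8;
	pkt[csum_start + csum_offset + 1] = inner & 0xff;

	/* Summing the whole packet now gives the value LCO predicted. */
	return predicted == csum16(pkt, len, 0);
}
```

Calling lco_demo() from a test program is expected to return 1.  The outer
checksum written to the packet would be the complement of the predicted sum.
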
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) LCO is performed by the stack when constructing an outer UDP header for an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) IPv6 equivalents, in udp6_set_csum().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) It is also performed when constructing an IPv4 GRE header, in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) constructing an IPv6 GRE header; the GRE checksum is computed over the whole
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) LCO here as IPv6 GRE still uses an IP-style checksum.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) All of the LCO implementations use a helper function lco_csum(), in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) include/linux/skbuff.h.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) LCO can safely be used for nested encapsulations; in this case, the outer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) encapsulation layer will sum over both its own header and the 'middle' header.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) This does mean that the 'middle' header will get summed multiple times, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) there doesn't seem to be a way to avoid that without incurring bigger costs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) (e.g. in SKB bloat).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) RCO: Remote Checksum Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) RCO is a technique for eliding the inner checksum of an encapsulated datagram,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) allowing the outer checksum to be offloaded.  It does, however, involve a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) change to the encapsulation protocols, which the receiver must also support.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) For this reason, it is disabled by default.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) RCO is detailed in the following Internet-Drafts:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) * https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) * https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) In Linux, RCO is implemented individually in each encapsulation protocol, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) most tunnel types have flags controlling its use.  For instance, VXLAN has the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) used when transmitting to a given remote destination.