^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Open vSwitch datapath developer documentation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) The Open vSwitch kernel module allows flexible userspace control over
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) flow-level packet processing on selected network devices. It can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) used to implement a plain Ethernet switch, network device bonding,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) VLAN processing, network access control, flow-based network control,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) and so on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) The kernel module implements multiple "datapaths" (analogous to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) bridges), each of which can have multiple "vports" (analogous to ports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) within a bridge). Each datapath also has associated with it a "flow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) table" that userspace populates with "flows" that map from keys based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) on packet headers and metadata to sets of actions. The most common
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) action forwards the packet to another vport; other actions are also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) implemented.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) When a packet arrives on a vport, the kernel module processes it by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) extracting its flow key and looking it up in the flow table. If there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) is a matching flow, it executes the associated actions. If there is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) no match, it queues the packet to userspace for processing (as part of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) its processing, userspace will likely set up a flow to handle further
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) packets of the same type entirely in-kernel).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) Flow key compatibility
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) Network protocols evolve over time. New protocols become important
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) and existing protocols lose their prominence. For the Open vSwitch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) kernel module to remain relevant, it must be possible for newer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) versions to parse additional protocols as part of the flow key. It
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) might even be desirable, someday, to drop support for parsing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) protocols that have become obsolete. Therefore, the Netlink interface
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) to Open vSwitch is designed to allow carefully written userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) applications to work with any version of the flow key, past or future.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) To support this forward and backward compatibility, whenever the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) kernel module passes a packet to userspace, it also passes along the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) flow key that it parsed from the packet. Userspace then extracts its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) own notion of a flow key from the packet and compares it against the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) kernel-provided version:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) - If userspace's notion of the flow key for the packet matches the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) kernel's, then nothing special is necessary.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) - If the kernel's flow key includes more fields than the userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) version of the flow key, for example if the kernel decoded IPv6
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) headers but userspace stopped at the Ethernet type (because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) does not understand IPv6), then again nothing special is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) necessary. Userspace can still set up a flow in the usual way,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) as long as it uses the kernel-provided flow key to do it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) - If the userspace flow key includes more fields than the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) kernel's, for example if userspace decoded an IPv6 header but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) the kernel stopped at the Ethernet type, then userspace can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) forward the packet manually, without setting up a flow in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) kernel. This case is bad for performance because every packet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) that the kernel considers part of the flow must go to userspace,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) but the forwarding behavior is correct. (If userspace can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) determine that the values of the extra fields would not affect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) forwarding behavior, then it could set up a flow anyway.)
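
The three cases above can be sketched as a small decision function. This is
only an illustration, assuming flow keys are modeled as Python dicts that map
attribute names to values; the name ``decide`` and the returned strings are
hypothetical and not part of any Open vSwitch interface. The optimization
mentioned in the last case (installing a flow anyway when the extra fields do
not affect forwarding) is not modeled.

```python
def decide(kernel_key, user_key):
    """Pick a handling strategy by comparing the two parsed flow keys.

    Both arguments are hypothetical dicts: attribute name -> value.
    """
    kernel_attrs = set(kernel_key)
    user_attrs = set(user_key)
    if user_attrs <= kernel_attrs:
        # Same fields, or the kernel parsed further than userspace did:
        # install a flow, always using the kernel-provided key.
        return "install flow with kernel key"
    # Userspace parsed fields the kernel did not see, so a kernel flow
    # would be too coarse: forward this packet by hand instead.
    return "forward manually"


kernel = {"in_port": 1, "eth_type": 0x86DD, "ipv6": "..."}
user = {"in_port": 1, "eth_type": 0x86DD}
print(decide(kernel, user))  # install flow with kernel key
print(decide(user, kernel))  # forward manually
```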
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) How flow keys evolve over time is important to making this work, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) the following sections go into detail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) Flow key format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) A flow key is passed over a Netlink socket as a sequence of Netlink
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) attributes. Some attributes represent packet metadata, defined as any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) information about a packet that cannot be extracted from the packet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) itself, e.g. the vport on which the packet was received. Most
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) attributes, however, are extracted from headers within the packet,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) e.g. source and destination addresses from Ethernet, IP, or TCP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) headers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) The <linux/openvswitch.h> header file defines the exact format of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) flow key attributes. For informal explanatory purposes here, we write
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) them as comma-separated strings, with parentheses indicating arguments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) and nesting. For example, the following could represent a flow key
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) corresponding to a TCP packet that arrived on vport 1::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) frag=no), tcp(src=49163, dst=80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) Often we ellipsize arguments not important to the discussion, e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) Wildcarded flow key format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) A wildcarded flow is described with two sequences of Netlink attributes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) passed over the Netlink socket: a flow key, exactly as described above, and an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) optional corresponding flow mask.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) A wildcarded flow can represent a group of exact match flows. Each '1' bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) in the mask specifies an exact match with the corresponding bit in the flow key.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) of an incoming packet. Using wildcarded flows can improve the flow setup rate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) by reducing the number of new flows that the user space program must process.
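
The mask semantics can be illustrated with plain integers. This is a minimal
sketch, assuming a single field (here a 32-bit IPv4 address) stands in for the
whole flow key; the function name ``matches`` is hypothetical.

```python
def matches(packet_bits, key_bits, mask_bits):
    """Return True if the packet agrees with the key on every '1' mask bit."""
    # XOR marks the bits that differ; the mask keeps only the bits we care about.
    return ((packet_bits ^ key_bits) & mask_bits) == 0


key = 0xC0A80100   # 192.168.1.0
mask = 0xFFFFFF00  # exact match on the top 24 bits, don't care on the rest

print(matches(0xC0A80142, key, mask))  # True:  192.168.1.66
print(matches(0xC0A80242, key, mask))  # False: 192.168.2.66
```

With an all-ones mask the same function degenerates to an exact match, which
is the behavior of a flow installed without a mask.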
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Support for the mask Netlink attribute is optional for both the kernel and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) user space program. The kernel can ignore the mask attribute, installing an exact
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) match flow, or reduce the number of don't care bits to fewer than what was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) specified by the user space program. In this case, variations in bits that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) kernel does not wildcard will simply result in additional flow setups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) The kernel module will also work with user space programs that neither support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) nor supply flow mask attributes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) Since the kernel may ignore or modify wildcard bits, it can be difficult for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) the userspace program to know exactly what matches are installed. There are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) two possible approaches: reactively install flows as they miss the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) flow table (and therefore not attempt to determine wildcard changes at all)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) or use the kernel's response messages to determine the installed wildcards.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) When interacting with userspace, the kernel should maintain the match portion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) of the key exactly as originally installed. This provides a handle to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) identify the flow for all future operations. However, when reporting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) mask of an installed flow, the mask should include any restrictions imposed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) by the kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) The behavior when using overlapping wildcarded flows is undefined. It is the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) responsibility of the user space program to ensure that any incoming packet
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) can match at most one flow, wildcarded or not. The current implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) performs best-effort detection of overlapping wildcarded flows and may reject
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) some but not all of them. However, this behavior may change in future versions.
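
A user space program can check for overlap itself before installing a flow.
The sketch below assumes each flow is reduced to one integer key and one
integer mask; ``may_overlap`` is a hypothetical helper and does not describe
the kernel's own best-effort detection. Two flows can match the same packet
exactly when their keys agree on every bit that both masks require.

```python
def may_overlap(key_a, mask_a, key_b, mask_b):
    """Return True if some packet could match both wildcarded flows."""
    common = mask_a & mask_b  # bits that both flows match exactly
    return ((key_a ^ key_b) & common) == 0


flow_a = (0x0A000000, 0xFF000000)  # 10.0.0.0/8
flow_b = (0x0A010000, 0xFFFF0000)  # 10.1.0.0/16, contained in flow_a
flow_c = (0x0B000000, 0xFF000000)  # 11.0.0.0/8

print(may_overlap(*flow_a, *flow_b))  # True
print(may_overlap(*flow_a, *flow_c))  # False
```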
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) Unique flow identifiers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) An alternative to using the original match portion of a key as the handle for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) flow identification is a unique flow identifier, or "UFID". UFIDs are optional
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) for both the kernel and user space program.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) User space programs that support UFID are expected to provide it during flow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) setup in addition to the flow, then refer to the flow using the UFID for all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) future operations. The kernel is not required to index flows by the original
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) flow key if a UFID is specified.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) Basic rule for evolving flow keys
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) ---------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) Some care is needed to really maintain forward and backward
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) compatibility for applications that follow the rules listed under
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) "Flow key compatibility" above.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) The basic rule is obvious::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) New network protocol support must only supplement existing flow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) key attributes. It must not change the meaning of already defined
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) flow key attributes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) ==================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) This rule does have less-obvious consequences so it is worth working
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) through a few examples. Suppose, for example, that the kernel module
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) did not already implement VLAN parsing. Instead, it just interpreted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) packet. The flow key for any packet with an 802.1Q header would look
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) essentially like this, ignoring metadata::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) eth(...), eth_type(0x8100)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) Naively, to add VLAN support, it makes sense to add a new "vlan" flow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) key attribute to contain the VLAN tag, then continue to decode the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) encapsulated headers beyond the VLAN tag using the existing field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) definitions. With this change, a TCP packet in VLAN 10 would have a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) flow key much like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) But this change would negatively affect a userspace application that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) has not been updated to understand the new "vlan" flow key attribute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) The application could, following the flow compatibility rules above,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) ignore the "vlan" attribute that it does not understand and therefore
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) assume that the flow contained IP packets. This is a bad assumption
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) (the flow only contains IP packets if one parses and skips over the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 802.1Q header) and it could cause the application's behavior to change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) across kernel versions even though it follows the compatibility rules.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) The solution is to use a set of nested attributes. This is, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) example, why 802.1Q support uses nested attributes. A TCP packet in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) VLAN 10 is actually expressed as::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) ip(proto=6, ...), tcp(...))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) Notice how the "eth_type", "ip", and "tcp" flow key attributes are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) nested inside the "encap" attribute. Thus, an application that does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) not understand the "vlan" key will not see any of those attributes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) and therefore will not misinterpret them. (Also, the outer eth_type
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) is still 0x8100, not changed to 0x0800.)
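
The difference between the naive and the nested layout can be sketched by
modeling a flow key as a list of (name, value) pairs and an old application
as a filter that skips attribute names it does not know. The data model, the
``KNOWN`` set, and the function ``visible`` are assumptions made for
illustration only.

```python
# Attributes understood by a hypothetical application that predates VLAN support.
KNOWN = {"eth", "eth_type", "ip", "tcp"}


def visible(key, known=KNOWN):
    """Return the top-level attributes the application sees, skipping unknown ones."""
    return [(name, value) for name, value in key if name in known]


naive = [("eth", "..."), ("vlan", {"vid": 10, "pcp": 0}),
         ("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]

nested = [("eth", "..."), ("eth_type", 0x8100), ("vlan", {"vid": 10, "pcp": 0}),
          ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")])]

# Naive layout: the VLAN tag vanishes and the flow looks like untagged IP.
print([name for name, _ in visible(naive)])   # ['eth', 'eth_type', 'ip', 'tcp']
# Nested layout: only the outer Ethernet attributes remain visible.
print([name for name, _ in visible(nested)])  # ['eth', 'eth_type']
```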
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) Handling malformed packets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) Don't drop packets in the kernel for malformed protocol headers, bad
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) checksums, etc. This would prevent userspace from implementing a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) simple Ethernet switch that forwards every packet.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) Instead, in such a case, include an attribute with "empty" content.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) It doesn't matter if the empty content could be valid protocol values,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) as long as those values are rarely seen in practice, because userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) can always have all packets with those values sent to userspace and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) handle them individually.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) For example, consider a packet that contains an IP header that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) indicates protocol 6 for TCP, but which is truncated just after the IP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) header, so that the TCP header is missing. The flow key for this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) packet would include a tcp attribute with all-zero src and dst, like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
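
The truncated-TCP case can be sketched with a toy key extractor. This is not
the kernel's parser: ``extract_key`` is a hypothetical function that handles
only untagged Ethernet, IPv4 and TCP ports, does not validate the IP header
length, and returns a plain dict instead of Netlink attributes.

```python
import struct


def extract_key(frame: bytes) -> dict:
    """Build a simplified flow key from an Ethernet/IPv4/TCP frame."""
    key = {}
    if len(frame) < 14:
        return key
    key["eth"] = {"dst": frame[0:6], "src": frame[6:12]}
    (eth_type,) = struct.unpack("!H", frame[12:14])
    key["eth_type"] = eth_type
    if eth_type != 0x0800 or len(frame) < 14 + 20:
        return key
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    proto = frame[14 + 9]
    key["ip"] = {"proto": proto}
    if proto == 6:
        ports = frame[14 + ihl:14 + ihl + 4]
        if len(ports) == 4:
            src, dst = struct.unpack("!HH", ports)
        else:
            src, dst = 0, 0  # TCP header missing: report "empty" content
        key["tcp"] = {"src": src, "dst": dst}
    return key


eth = bytes(12) + b"\x08\x00"
ip = bytearray(20)
ip[0] = 0x45  # version 4, header length 5 words
ip[9] = 6     # protocol: TCP
truncated = eth + bytes(ip)
print(extract_key(truncated)["tcp"])  # {'src': 0, 'dst': 0}
```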
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) As another example, consider a packet with an Ethernet type of 0x8100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) indicating that a VLAN TCI should follow, but which is truncated just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) after the Ethernet type. The flow key for this packet would include
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) an all-zero-bits vlan and an empty encap attribute, like this::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) eth(...), eth_type(0x8100), vlan(0), encap()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) Unlike a TCP packet with source and destination ports 0, an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) attribute expressly to allow this situation to be distinguished.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) Thus, the flow key in this second example unambiguously indicates a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) missing or malformed VLAN TCI.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) Other rules
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) The other rules for flow keys are much less subtle:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) - Duplicate attributes are not allowed at a given nesting level.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) - Ordering of attributes is not significant.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) - When the kernel sends a given flow key to userspace, it always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) composes it the same way. This allows userspace to hash and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) compare entire flow keys that it may not be able to fully
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) interpret.
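
Because the kernel always composes a given flow key the same way, userspace
can treat the serialized key as an opaque byte string. A minimal sketch,
assuming the raw attribute bytes are already available; the byte values and
the action string below are made up, and ``remember`` and ``lookup`` are
hypothetical helpers.

```python
flow_actions = {}  # raw flow key bytes -> actions chosen by userspace


def remember(raw_key: bytes, actions: str) -> None:
    """Record actions under the key's exact byte representation."""
    flow_actions[bytes(raw_key)] = actions


def lookup(raw_key: bytes):
    """Find actions by byte-for-byte equality, without parsing the key."""
    return flow_actions.get(bytes(raw_key))


raw = bytes.fromhex("0800030001000000")  # placeholder for a serialized flow key
remember(raw, "output:2")
print(lookup(raw))          # output:2
print(lookup(b"\x00" * 8))  # None
```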