Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

====================================
Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to the conntrack semantics (i.e. traffic has been
seen in both directions), you can decide to offload the flow to the flowtable
from the forward chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit() and therefore bypass the classic
forwarding path. The visible effect is that you do not see these packets from
any of the Netfilter hooks coming after ingress. In case of a flowtable miss,
the packet follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in case
there are several conntrack zones in place).

Flowtables are populated via the 'flow offload' nftables action, so the user can
selectively specify which flows are placed into the flowtable. Hence, packets
follow the classic forwarding path unless the user explicitly instructs packets
to use this alternative forwarding path via nftables policy.
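As a minimal sketch of such a selective policy, the ruleset below offloads only
TCP and UDP flows whose packets come from one subnet and leaves everything else
on the classic forwarding path. The names 'x', 'y' and 'f', the eth0/eth1
devices and the 192.168.1.0/24 subnet are placeholders chosen for illustration.

```nft
table inet x {
	flowtable f {
		hook ingress priority 0; devices = { eth0, eth1 };
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		# Hypothetical LAN subnet: only its TCP and UDP flows are offloaded.
		ip saddr 192.168.1.0/24 meta l4proto { tcp, udp } flow offload @f
		# Anything that does not match the rule above is simply accepted
		# and keeps following the classic forwarding path.
	}
}
```

Any other nftables match can be placed in front of the 'flow offload' action in
the same way, so the selection criteria are entirely up to the ruleset author.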

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

::

					 userspace process
					  ^              |
					  |              |
				     _____|____     ____\/___
				    /          \   /         \
				    |   input   |  |  output  |
				    \__________/   \_________/
					 ^               |
					 |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

	       Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path because the transport selectors are missing, so a flowtable lookup is not
possible.
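To illustrate the NAT case, the following sketch combines a flowtable with a
source NAT rule. It assumes that eth1 is the upstream interface and that a
separate 'ip nat' table with a masquerade rule suits the setup. Both are
illustrative choices, not requirements of the flowtable.

```nft
table ip nat {
	chain postrouting {
		type nat hook postrouting priority 100; policy accept;
		# Assumption: eth1 faces the upstream network.
		oifname "eth1" masquerade
	}
}
table inet x {
	flowtable f {
		hook ingress priority 0; devices = { eth0, eth1 };
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		ip protocol tcp flow offload @f
	}
}
```

In such a setup the initial packets of a flow traverse the postrouting chain and
obtain their NAT mapping there. Packets that later hit the flowtable are
rewritten with the stored mapping without visiting that chain again.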

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			ip protocol tcp flow offload @f
			counter packets 0 bytes 0
		}
	}

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient in case you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that of
the nftables ingress chain, so that the flowtable runs first in the pipeline.
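The ordering can be sketched as follows, assuming an existing netdev ingress
chain on eth0. The table name 'ingress_filter', the priority value 10 and the
dropped 198.51.100.0/24 prefix are hypothetical and only serve to show the
relative priorities.

```nft
table netdev ingress_filter {
	chain ingress {
		# Priority 10 is larger than the flowtable priority (0),
		# so this chain runs after the flowtable lookup.
		type filter hook ingress device eth0 priority 10; policy accept;
		ip saddr 198.51.100.0/24 drop
	}
}
table inet x {
	flowtable f {
		# Smaller priority value: the flowtable hook runs first.
		hook ingress priority 0; devices = { eth0, eth1 };
	}
}
```

With this ordering, packets that hit the flowtable should take the fastpath
before the ingress chain is evaluated, so rules in that chain mainly see the
packets that miss the flowtable.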

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP SYN-ACK packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for packets that are forwarded through the fastpath
bypass.
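If accounting for offloaded packets is needed, one option is to let the
flowtable keep the conntrack counters in sync. The sketch below assumes a
kernel (5.7 or later) and an nft release that support the 'counter' statement
inside a flowtable definition. Conntrack accounting may also need to be enabled
for the counters to show up. Verify these assumptions on the target system.

```nft
table inet x {
	flowtable f {
		hook ingress priority 0; devices = { eth0, eth1 };
		# Assumption: flowtable counter support in both kernel and nft.
		counter
	}
	chain y {
		type filter hook forward priority 0; policy accept;
		ip protocol tcp flow offload @f
		# Still only counts packets that take the classic forwarding path.
		counter
	}
}
```

The rule counter in chain 'y' behaves as before. The packet and byte counts of
offloaded flows are reflected in the corresponding connection tracking entries
instead.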

More reading
------------

This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
also wrote a comprehensive summary called "A state of network acceleration"
that describes how things were before this infrastructure was mainlined [3]_
and also gives a rough summary of this work [4]_.

.. [1] https://lwn.net/Articles/738214/
.. [2] https://lwn.net/Articles/742164/
.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html