Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) .. include:: <isonum.txt>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) ===============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) Ethernet switch device driver model (switchdev)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) ===============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) The Ethernet switch device driver model (switchdev) is an in-kernel driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) model for switch devices which offload the forwarding (data) plane from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) kernel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) Figure 1 is a block diagram showing the components of the switchdev model for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) an example setup using a data-center-class switch ASIC chip.  Other setups
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) with SR-IOV or soft switches, such as OVS, are possible.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 			     User-space tools
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26)        user space                   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27)       +-------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28)        kernel                       | Netlink
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 				    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 		     +--------------+-------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 		     |         Network stack                        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 		     |           (Linux)                            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 		     |                                              |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 		     +----------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 			   sw1p2     sw1p4     sw1p6
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 		      sw1p1  +  sw1p3  +  sw1p5  +          eth1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 			+    |    +    |    +    |            +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 			|    |    |    |    |    |            |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 		     +--+----+----+----+----+----+---+  +-----+-----+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 		     |         Switch driver         |  |    mgmt   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 		     |        (this document)        |  |   driver  |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 		     |                               |  |           |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 		     +--------------+----------------+  +-----------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 				    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46)        kernel                       | HW bus (eg PCI)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47)       +-------------------------------------------------------------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48)        hardware                     |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 		     +--------------+----------------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 		     |         Switch device (sw1)   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 		     |  +----+                       +--------+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 		     |  |    v offloaded data path   | mgmt port
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 		     |  |    |                       |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 		     +--|----|----+----+----+----+---+
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 			|    |    |    |    |    |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 			+    +    +    +    +    +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 		       p1   p2   p3   p4   p5   p6
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 			     front-panel ports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 				    Fig 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) Include Files
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)     #include <linux/netdevice.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71)     #include <net/switchdev.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) Configuration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) Use "depends on NET_SWITCHDEV" in the driver's Kconfig to ensure switchdev model
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) support is built for the driver.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) Switch Ports
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) On switchdev driver initialization, the driver will allocate and register a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) struct net_device (using register_netdev()) for each enumerated physical switch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) port, called the port netdev.  A port netdev is the software representation of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) the physical port and provides a conduit for control traffic to/from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) controller (the kernel) and the network, as well as an anchor point for higher
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers.  Using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) standard netdev tools (iproute2, ethtool, etc.), the port netdev also gives
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) the user access to the physical properties of the switch port, such as PHY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) link state and I/O statistics.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) There is (currently) no higher-level kernel object for the switch beyond the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) port netdevs.  All of the switchdev driver ops are netdev ops or switchdev ops.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) A switch management port is outside the scope of the switchdev driver model.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) Typically, the management port does not participate in the offloaded data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) plane and is driven by a different driver, such as a NIC driver, bound to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) management port device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) Switch ID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) ^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) The switchdev driver must implement the net_device operation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) ndo_get_port_parent_id for each port netdev, returning the same physical ID for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) each port of a switch. The ID must be unique between switches on the same
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) system. The ID does not need to be unique between switches on different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) systems.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) The switch ID is used to locate ports on a switch and to know if aggregated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) ports belong to the same switch.
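
The same-switch check described above can be modelled in plain userspace C.
This is only a sketch, not kernel code: ``struct port_parent_id``,
``PPID_MAX_LEN``, ``ppid_set()`` and ``ppid_same()`` are hypothetical names
standing in for whatever ID bytes a driver reports through
ndo_get_port_parent_id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PPID_MAX_LEN 32	/* assumed upper bound for this model */

/* Userspace stand-in for the ID a port reports as its parent (switch) ID. */
struct port_parent_id {
	unsigned char id[PPID_MAX_LEN];
	unsigned char id_len;
};

/* Fill in an ID from raw bytes; returns false if the length is unusable. */
bool ppid_set(struct port_parent_id *ppid, const void *bytes, size_t len)
{
	if (len == 0 || len > PPID_MAX_LEN)
		return false;
	memset(ppid->id, 0, sizeof(ppid->id));
	memcpy(ppid->id, bytes, len);
	ppid->id_len = (unsigned char)len;
	return true;
}

/* Two ports sit on the same switch when length and bytes both match. */
bool ppid_same(const struct port_parent_id *a, const struct port_parent_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```

In this model, every port of one switch is given the same bytes, so
``ppid_same()`` answers the "do these aggregated ports belong to the same
switch" question by a simple comparison.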
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) Port Netdev Naming
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) ^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) Udev rules should be used for port netdev naming, using some unique attribute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) of the port as a key, for example the port MAC address or the port PHYS name.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Hard-coding of kernel netdev names within the driver is discouraged; let the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) kernel pick the default netdev name, and let udev set the final name based on a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) port attribute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) useful for dynamically-named ports where the device names its ports based on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) external configuration.  For example, if a physical 40G port is split logically
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) name for each port using port PHYS name.  The udev rule would be::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)     SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 	    ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) would be sub-port 0 on port 1 on switch 1.
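
To make the "swXpYsZ" layout concrete, here is a small userspace sketch that
composes such a name.  It only illustrates the string format; in practice udev
builds the final name from the port attributes, and ``format_port_name()`` is a
hypothetical helper, not a kernel or udev function.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "swXpY" or "swXpYsZ" into buf.  Pass a negative subport when the
 * port is not split.  Returns 0 on success, -1 if buf is too small.
 */
int format_port_name(char *buf, size_t len, int sw, int port, int subport)
{
	int n;

	if (subport < 0)
		n = snprintf(buf, len, "sw%dp%d", sw, port);
	else
		n = snprintf(buf, len, "sw%dp%ds%d", sw, port, subport);

	/* snprintf reports the length it wanted; reject truncated names. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with a 16-byte buffer (the size of IFNAMSIZ on Linux) and the
arguments (1, 1, 0) yields the sw1p1s0 name used in the example above.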
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) Port Features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) ^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) NETIF_F_NETNS_LOCAL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) If the switchdev driver (and device) only supports offloading of the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) network namespace (netns), the driver should set this feature flag to prevent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) the port netdev from being moved out of the default netns.  A netns-aware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) driver/device would not set this flag and would be responsible for partitioning
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) hardware to preserve netns containment.  This means hardware cannot forward
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) traffic from a port in one namespace to another port in another namespace.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) Port Topology
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) ^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) The port netdevs representing the physical switch ports can be organized into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) higher-level switching constructs.  The default construct is a standalone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) router port, used to offload L3 forwarding.  Two or more ports can be bonded
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) together to form a LAG.  Two or more ports (or LAGs) can be bridged to bridge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) L2 networks.  VLANs can be applied to sub-divide L2 networks.  L2-over-L3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) tunnels can be built on ports.  These constructs are built using standard Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) tools such as the bridge driver, the bonding/team drivers, and netlink-based
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) tools such as iproute2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) The switchdev driver can know a particular port's position in the topology by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) monitoring NETDEV_CHANGEUPPER notifications.  For example, a port moved into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) bond will see its upper master change.  If that bond is moved into a bridge,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) the bond's upper master will change, and so on.  The driver tracks such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) movements to know a port's position in the overall topology by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) registering for netdevice events and acting on NETDEV_CHANGEUPPER.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) L2 Forwarding Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) The idea is to offload the L2 data forwarding (switching) path from the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) to the switchdev device by mirroring bridge FDB entries down to the device.  An
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) FDB entry is the {port, MAC, VLAN} tuple forwarding destination.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) To offload L2 bridging, the switchdev driver/device should support:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 	- Static FDB entries installed on a bridge port
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 	- Notification of learned/forgotten src mac/vlans from device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) 	- STP state changes on the port
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) 	- VLAN flooding of multicast/broadcast and unknown unicast packets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) Static FDB Entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) ^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) to support static FDB entries installed to the device.  Static bridge FDB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) entries are installed, for example, using the iproute2 bridge command::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 	bridge fdb add ADDR dev DEV [vlan VID] [self]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) ops, and handle add/delete/dump of SWITCHDEV_OBJ_ID_PORT_FDB object using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) switchdev_port_obj_xxx ops.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) XXX: what should be done if offloading this rule to hardware fails (for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) example, due to full capacity in hardware tables) ?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) Note: by default, the bridge does not filter on VLAN and only bridges untagged
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) traffic.  To enable VLAN support, turn on VLAN filtering::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) 	echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) Notification of Learned/Forgotten Source MAC/VLANs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) The switch device will learn/forget source MAC address/VLAN on ingress packets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) and notify the switch driver of the mac/vlan/port tuples.  The switch driver,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) in turn, will notify the bridge driver using the switchdev notifier call::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) 	err = call_switchdev_notifiers(val, dev, info, extack);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) forgetting, and info points to a struct switchdev_notifier_fdb_info.  On
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) bridge's FDB and mark the entry as NTF_EXT_LEARNED.  The iproute2 bridge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) command will label these entries "offload"::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 	$ bridge fdb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) 	52:54:00:12:35:01 dev sw1p1 master br0 permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) 	00:02:00:00:02:00 dev sw1p1 master br0 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) 	00:02:00:00:02:00 dev sw1p1 self
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) 	52:54:00:12:35:02 dev sw1p2 master br0 permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) 	00:02:00:00:03:00 dev sw1p2 master br0 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) 	00:02:00:00:03:00 dev sw1p2 self
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) 	33:33:00:00:00:01 dev eth0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) 	01:00:5e:00:00:01 dev eth0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) 	33:33:ff:00:00:00 dev eth0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) 	01:80:c2:00:00:0e dev eth0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) 	33:33:00:00:00:01 dev br0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) 	01:00:5e:00:00:01 dev br0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) 	33:33:ff:12:35:01 dev br0 self permanent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) Learning on the port should be disabled on the bridge using the bridge command::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) 	bridge link set dev DEV learning off
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) Learning on the device port should be enabled, as well as learning_sync::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) 	bridge link set dev DEV learning on self
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) 	bridge link set dev DEV learning_sync on self
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) The learning_sync attribute enables syncing of the learned/forgotten FDB entry to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) the bridge's FDB.  It's possible, but not optimal, to enable learning on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) device port and on the bridge port, and disable learning_sync.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) To support learning, the driver implements switchdev op
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_{PRE_}BRIDGE_FLAGS.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) FDB Ageing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) ^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251) The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) the responsibility of the port driver/device to age out these entries.  If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) port device supports ageing, when the FDB entry expires, it will notify the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254) driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL.  If the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) device does not support ageing, the driver can simulate ageing using a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) garbage collection timer to monitor FDB entries.  Expired entries will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) notified to the bridge using SWITCHDEV_FDB_DEL.  See the rocker driver for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) an example of a driver running an ageing timer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the FDB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261) entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...).  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) notification will reset the FDB entry's last-used time to now.  The driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263) should rate limit refresh notifications, for example, no more than once a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) second.  (The last-used time is visible using the bridge -s fdb option).
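
The ageing and refresh behaviour described above can be sketched as a small
userspace model.  Nothing here is kernel API: ``struct fdb_table``,
``fdb_seen()`` and ``fdb_gc()`` are hypothetical names, time is a plain counter
in seconds, and a ``true`` return from ``fdb_seen()`` stands in for sending
SWITCHDEV_FDB_ADD, while each entry counted by ``fdb_gc()`` stands in for one
SWITCHDEV_FDB_DEL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FDB_SIZE 8	/* hypothetical table size for this model */

struct fdb_entry {
	bool used;
	unsigned char mac[6];
	unsigned short vid;
	unsigned long last_seen;	/* seconds: last hit reported by hardware */
	unsigned long last_refresh;	/* seconds: last refresh sent to the bridge */
};

struct fdb_table {
	struct fdb_entry e[FDB_SIZE];
	unsigned long ageing_time;	/* idle seconds before an entry expires */
	unsigned long refresh_interval;	/* minimum seconds between refreshes */
};

/* Learn or touch an entry; returns true when the bridge should be notified. */
bool fdb_seen(struct fdb_table *t, const unsigned char *mac,
	      unsigned short vid, unsigned long now)
{
	struct fdb_entry *slot = NULL;

	for (size_t i = 0; i < FDB_SIZE; i++) {
		struct fdb_entry *e = &t->e[i];

		if (!e->used) {
			if (!slot)
				slot = e;	/* remember first free slot */
			continue;
		}
		if (e->vid != vid || memcmp(e->mac, mac, 6) != 0)
			continue;
		e->last_seen = now;
		if (now - e->last_refresh < t->refresh_interval)
			return false;	/* rate limited: skip this refresh */
		e->last_refresh = now;
		return true;
	}
	if (!slot)
		return false;	/* table full: nothing learned */
	slot->used = true;
	memcpy(slot->mac, mac, 6);
	slot->vid = vid;
	slot->last_seen = now;
	slot->last_refresh = now;
	return true;
}

/* Garbage-collect idle entries; returns how many expired in this pass. */
size_t fdb_gc(struct fdb_table *t, unsigned long now)
{
	size_t expired = 0;

	for (size_t i = 0; i < FDB_SIZE; i++) {
		struct fdb_entry *e = &t->e[i];

		if (e->used && now - e->last_seen >= t->ageing_time) {
			e->used = false;
			expired++;
		}
	}
	return expired;
}
```

With ``refresh_interval`` set to 1, repeated hits within the same second
produce a single notification, which matches the "no more than once a second"
suggestion above.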
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) STP State Change on Port
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) ^^^^^^^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) Internally or with a third-party STP protocol implementation (e.g. mstpd), the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) bridge driver maintains the STP state for ports, and will notify the switch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271) driver of STP state change on a port using the switchdev op
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_STP_STATE.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) State is one of BR_STATE_*.  The switch driver can use STP state updates to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) update the ingress packet filter list for the port.  For example, if the port
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276) is DISABLED, no packets should pass, but if the port moves to BLOCKED, then STP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) BPDUs and other IEEE 01:80:c2:xx:xx:xx link-local multicast packets can pass.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) Note that STP BPDUs are untagged and STP state applies to all VLANs on the port,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) so packet filters should be applied consistently across untagged and tagged
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) VLANs on the port.
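
A minimal userspace sketch of the ingress filter decision described above.
The enum and function names are hypothetical and local to this model (the
kernel's own constants are BR_STATE_*), and only the three states discussed in
the text are modelled; listening and learning are deliberately left out.

```c
#include <stdbool.h>

/* Model-local state names, not the kernel's BR_STATE_* values. */
enum port_stp_state {
	PORT_DISABLED,
	PORT_BLOCKING,
	PORT_FORWARDING,
};

/* IEEE link-local multicast, matched on the 01:80:c2 prefix as in the text. */
bool is_link_local(const unsigned char *dmac)
{
	return dmac[0] == 0x01 && dmac[1] == 0x80 && dmac[2] == 0xc2;
}

/* Decide whether a frame with destination MAC dmac may pass on ingress. */
bool stp_ingress_allowed(enum port_stp_state state, const unsigned char *dmac)
{
	switch (state) {
	case PORT_DISABLED:
		return false;			/* nothing passes */
	case PORT_BLOCKING:
		return is_link_local(dmac);	/* BPDUs and friends only */
	case PORT_FORWARDING:
		return true;
	}
	return false;				/* unknown state: be conservative */
}
```

Because the decision takes no VLAN argument, the same answer applies to tagged
and untagged traffic on the port, in line with the note above.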
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) Flooding L2 domain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284) ^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) For a given L2 VLAN domain, the switch device should flood multicast/broadcast
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) and unknown unicast packets to all ports in the domain, if allowed by the port's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288) current STP state.  The switch driver, knowing which ports are within which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) VLAN L2 domain, can program the switch device for flooding.  The packet may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290) be sent to the port netdev for processing by the bridge driver.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) bridge should not reflood the packet to the same ports the device flooded,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) otherwise there will be duplicate packets on the wire.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) To avoid duplicate packets, the switch driver should mark a packet as already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296) the skb using the ingress bridge port's mark and prevent it from being forwarded
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) through any bridge port with the same mark.
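
The duplicate-suppression rule above can be modelled in a few lines of
userspace C.  The structures and function names are hypothetical stand-ins for
the skb and bridge port state, and the model assumes that all ports of one
switch share the same mark value.

```c
#include <stdbool.h>

/* Per-bridge-port mark; assumed equal for all ports of one switch. */
struct model_port {
	int offload_fwd_mark;
};

struct model_skb {
	bool offload_fwd_mark;	/* set by the switch driver: hardware forwarded it */
	int ingress_mark;	/* mark copied from the ingress bridge port */
};

/* On ingress, remember which hardware domain already forwarded the packet. */
void bridge_ingress(struct model_skb *skb, const struct model_port *in)
{
	if (skb->offload_fwd_mark)
		skb->ingress_mark = in->offload_fwd_mark;
}

/* Software may forward unless hardware already covered this egress port. */
bool bridge_egress_allowed(const struct model_skb *skb,
			   const struct model_port *out)
{
	return !skb->offload_fwd_mark ||
	       skb->ingress_mark != out->offload_fwd_mark;
}
```

A packet flooded by the device and received on sw1p1 is therefore not
re-flooded to sw1p2, but can still be forwarded in software to a port with a
different mark, such as a NIC that is also a member of the bridge.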
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) It is possible for the switch device to not handle flooding and push the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) packets up to the bridge driver for flooding.  This is not ideal as the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) of ports in the L2 domain scales, because the device is much more efficient
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) at flooding packets than software.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) If supported by the device, flood control can be offloaded to it, preventing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305) certain netdevs from flooding unicast traffic for which there is no FDB entry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307) IGMP Snooping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308) ^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310) In order to support IGMP snooping, the port netdevs should trap to the bridge
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) driver all IGMP join and leave messages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312) The bridge multicast module will notify port netdevs on every multicast group
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313) change, whether the group is statically configured or dynamically joined/left.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314) The hardware implementation should forward all registered multicast
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315) traffic groups only to the configured ports.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317) L3 Routing Offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320) Offloading L3 routing requires that the device be programmed with FIB entries from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321) the kernel, with the device doing the FIB lookup and forwarding.  The device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322) does a longest prefix match (LPM) on FIB entries matching route prefix and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323) forwards the packet to the matching FIB entry's nexthop(s) egress ports.
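
The longest prefix match performed by the device can be illustrated with a
linear-scan model in userspace C.  Real hardware typically uses dedicated
lookup structures; ``struct route``, ``IPV4()`` and ``lpm_lookup()`` are
hypothetical names used only for this sketch, and each route carries a single
egress port rather than a full nexthop list.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four octets into a host-order IPv4 address. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
	uint32_t dst;		/* prefix, host byte order */
	int dst_len;		/* prefix length, 0..32 */
	int nexthop_port;	/* egress port index in this model */
};

/* Network mask for a prefix length; /0 is handled to avoid a 32-bit shift. */
static uint32_t prefix_mask(int len)
{
	return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* Return the matching route with the longest prefix, or NULL if none match. */
const struct route *lpm_lookup(const struct route *tbl, size_t n, uint32_t addr)
{
	const struct route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct route *r = &tbl[i];
		uint32_t mask;

		if (r->dst_len < 0 || r->dst_len > 32)
			continue;	/* ignore malformed entries */
		mask = prefix_mask(r->dst_len);
		if ((addr & mask) != (r->dst & mask))
			continue;
		if (!best || r->dst_len > best->dst_len)
			best = r;
	}
	return best;
}
```

With a table holding 0.0.0.0/0, 11.0.0.0/8 and 11.0.0.4/30, a lookup for
11.0.0.5 selects the /30 entry even though all three prefixes match.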
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325) To program the device, the driver has to register a FIB notifier handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326) using register_fib_notifier. The following events are available:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328) ===================  ===================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) FIB_EVENT_ENTRY_ADD  used for both adding a new FIB entry to the device,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330) 		     or modifying an existing entry on the device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331) FIB_EVENT_ENTRY_DEL  used for removing a FIB entry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332) FIB_EVENT_RULE_ADD,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333) FIB_EVENT_RULE_DEL   used to propagate FIB rule changes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) ===================  ===================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 338) 	struct fib_entry_notifier_info {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 339) 		struct fib_notifier_info info; /* must be first */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 340) 		u32 dst;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 341) 		int dst_len;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 342) 		struct fib_info *fi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 343) 		u8 tos;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 344) 		u8 type;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 345) 		u32 tb_id;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 346) 		u32 nlflags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 347) 	};
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 348) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 349) to add/modify/delete IPv4 dst/dst_len prefix on table tb_id.  The ``*fi``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 350) structure holds details on the route and route's nexthops.  ``*dev`` is one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 351) of the port netdevs mentioned in the route's next hop list.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 352) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 353) Routes offloaded to the device are labeled with "offload" in the ip route
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 354) listing::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 355) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 356) 	$ ip route show
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 357) 	default via 192.168.0.2 dev eth0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 358) 	11.0.0.0/30 dev sw1p1  proto kernel  scope link  src 11.0.0.2 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 359) 	11.0.0.4/30 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 360) 	11.0.0.8/30 dev sw1p2  proto kernel  scope link  src 11.0.0.10 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 361) 	11.0.0.12/30 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 362) 	12.0.0.2  proto zebra  metric 30 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 363) 		nexthop via 11.0.0.1  dev sw1p1 weight 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 364) 		nexthop via 11.0.0.9  dev sw1p2 weight 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 365) 	12.0.0.3 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 366) 	12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 367) 	192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 368) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 369) The "offload" flag is set if at least one device offloads the FIB entry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 370) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 371) XXX: add/mod/del IPv6 FIB API
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 372) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 373) Nexthop Resolution
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 374) ^^^^^^^^^^^^^^^^^^
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 376) The FIB entry's nexthop list contains the nexthop tuple (gateway, dev), but for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 377) the switch device to forward the packet with the correct dst mac address, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 378) nexthop gateways must be resolved to the neighbor's mac address.  Neighbor mac
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 379) address discovery comes via the ARP (or ND) process and is available via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 380) arp_tbl neighbor table.  To resolve the route's nexthop gateways, the driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 381) should trigger the kernel's neighbor resolution process.  See the rocker
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 382) driver's rocker_port_ipv4_resolve() for an example.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 383) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 384) The driver can monitor for updates to arp_tbl using the netevent notifier
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 385) NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 386) for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 387) to know when arp_tbl neighbor entries are purged from the port.