Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

===========
Packet MMAP
===========

Abstract
========

This file documents the mmap() facility available with the PACKET
socket interface on 2.4/2.6/3.x kernels. This type of socket is used to

i) capture network traffic with utilities like tcpdump,
ii) transmit network traffic, or do anything else that needs raw
    access to a network interface.

A howto can be found at:

    https://sites.google.com/site/packetmmap/

Please send your comments to
    - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
    - Johann Baudy

Why use PACKET_MMAP
===================

In Linux 2.4/2.6/3.x, if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet, or two if you also want the packet's timestamp
(as libpcap always does).

On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a
size-configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way, reading packets only requires waiting for them;
most of the time there is no need to issue a single system call. For
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. Using a buffer shared between the kernel and user space
also has the benefit of minimizing packet copies.

It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. If you are capturing
at high speeds (relative to the CPU speed), you should at least check whether the
device driver of your network interface card supports some sort of interrupt
load mitigation or (even better) NAPI, and make sure it is
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
supported by the devices on your network. Pinning the IRQs of your network interface
card to specific CPUs can also be an advantage.

How to use mmap() to improve capture process
============================================

From the user standpoint, you should use the higher-level libpcap library, which
is a de facto standard, portable across nearly all operating systems
including Win32.

Packet MMAP support was integrated into libpcap around the time of version 1.3.0;
TPACKET_V3 support was added in version 1.5.0.

How to use mmap() directly to improve capture process
=====================================================

From the system call standpoint, the use of PACKET_MMAP involves
the following process::

    [setup]     socket() -------> creation of the capture socket
                setsockopt() ---> allocation of the circular buffer (ring)
                                  option: PACKET_RX_RING
                mmap() ---------> mapping of the allocated buffer to the
                                  user process

    [capture]   poll() ---------> to wait for incoming packets

    [shutdown]  close() --------> destruction of the capture socket and
                                  deallocation of all associated
                                  resources.
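
As a rough illustration of the [capture] step, the sketch below waits on a single TPACKET_V1 frame header. If the kernel has already marked the frame with TP_STATUS_USER, no system call is made; otherwise it blocks in poll(). The helpers wait_for_frame() and release_frame() are hypothetical, and locating each header inside the mapped ring is covered in "Mapping and use of the circular buffer (ring)".

```c
#include <poll.h>
#include <stdatomic.h>
#include <linux/if_packet.h> /* struct tpacket_hdr, TP_STATUS_* */

/* Hypothetical helper: wait until the kernel hands `hdr` to user space.
 * Returns 1 if the frame is ready, 0 on timeout, -1 if poll() fails. */
int wait_for_frame(int fd, volatile struct tpacket_hdr *hdr, int timeout_ms)
{
    if (!(hdr->tp_status & TP_STATUS_USER)) {
        /* Nothing ready yet: sleep until the socket becomes readable. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLERR };
        if (poll(&pfd, 1, timeout_ms) < 0)
            return -1;
        if (!(hdr->tp_status & TP_STATUS_USER))
            return 0;
    }
    /* Do not read packet data before observing the status change. */
    atomic_thread_fence(memory_order_acquire);
    return 1;
}

/* Hypothetical helper: give the frame back to the kernel once the
 * packet has been processed. */
void release_frame(volatile struct tpacket_hdr *hdr)
{
    /* Finish all reads of the frame before handing it back. */
    atomic_thread_fence(memory_order_release);
    hdr->tp_status = TP_STATUS_KERNEL;
}
```

A capture loop would call wait_for_frame() on the current frame, process the packet, call release_frame(), and then move on to the next frame index.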

Socket creation and destruction are straightforward, and are done
the same way with or without PACKET_MMAP::

 int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));

where mode is SOCK_RAW for the raw interface, where link-level
information can be captured, or SOCK_DGRAM for the cooked
interface, where link-level information capture is not
supported and a link-level pseudo-header is provided
by the kernel.

The socket and all associated resources are destroyed
by a simple call to close(fd).
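
A minimal sketch of the socket creation and destruction just described, assuming a Linux system with glibc headers. open_capture_socket() and close_capture_socket() are hypothetical helper names; the only addition over the text is an explicit check of mode. Creating a PF_PACKET socket normally requires CAP_NET_RAW, so the call may fail with EPERM when run unprivileged.

```c
#include <errno.h>
#include <arpa/inet.h>      /* htons() */
#include <linux/if_ether.h> /* ETH_P_ALL */
#include <sys/socket.h>     /* socket(), PF_PACKET, SOCK_RAW, SOCK_DGRAM */
#include <unistd.h>         /* close() */

/* Hypothetical helper: open a capture socket.
 * SOCK_RAW   -> frames include the link-level header.
 * SOCK_DGRAM -> "cooked" mode: the link-level header is removed.
 * Returns a descriptor, or -1 with errno set. */
int open_capture_socket(int mode)
{
    if (mode != SOCK_RAW && mode != SOCK_DGRAM) {
        errno = EINVAL;
        return -1;
    }
    /* ETH_P_ALL: receive packets of every protocol. */
    return socket(PF_PACKET, mode, htons(ETH_P_ALL));
}

/* Closing the descriptor also frees the ring and every other
 * resource attached to the socket. */
int close_capture_socket(int fd)
{
    return close(fd);
}
```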

As without PACKET_MMAP, it is possible to use one socket
for both capture and transmission. This can be done by mapping the
allocated RX and TX buffer rings with a single mmap() call.
See "Mapping and use of the circular buffer (ring)".

Next, I will describe the PACKET_MMAP settings and their constraints,
as well as the mapping of the circular buffer in the user process and
the use of this buffer.

How to use mmap() directly to improve transmission process
==========================================================

The transmission process is similar to capture, as shown below::

    [setup]         socket() -------> creation of the transmission socket
                    setsockopt() ---> allocation of the circular buffer (ring)
                                      option: PACKET_TX_RING
                    bind() ---------> bind transmission socket with a network interface
                    mmap() ---------> mapping of the allocated buffer to the
                                      user process

    [transmission]  poll() ---------> wait for free packets (optional)
                    send() ---------> send all packets that are set as ready in
                                      the ring.
                                      The flag MSG_DONTWAIT can be used to return
                                      before the end of the transfer.

    [shutdown]      close() --------> destruction of the transmission socket and
                                      deallocation of all associated resources.
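
To make the [transmission] step concrete, here is a sketch of how an application might find a frame it may fill and then ask the kernel to transmit everything marked as ready. It assumes the TPACKET_V1 header (struct tpacket_hdr) and that tp_block_size is a multiple of tp_frame_size, so frame i starts at ring + i * tp_frame_size. find_available_frame() and flush_tx_ring() are hypothetical names.

```c
#include <stddef.h>
#include <sys/socket.h>      /* send(), MSG_DONTWAIT, ssize_t */
#include <linux/if_packet.h> /* struct tpacket_hdr, TP_STATUS_AVAILABLE */

/* Hypothetical helper: starting at `start` and wrapping around, return the
 * index of the first frame whose status is TP_STATUS_AVAILABLE (free for
 * user space to fill), or -1 if the kernel still owns every frame.
 * Assumes frames are back to back (block size % frame size == 0). */
long find_available_frame(unsigned char *ring, unsigned int frame_nr,
                          unsigned int frame_size, unsigned int start)
{
    for (unsigned int n = 0; n < frame_nr; n++) {
        unsigned int i = (start + n) % frame_nr;
        volatile struct tpacket_hdr *hdr =
            (volatile struct tpacket_hdr *)(ring + (size_t)i * frame_size);
        if (hdr->tp_status == TP_STATUS_AVAILABLE)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: ask the kernel to send every frame marked
 * TP_STATUS_SEND_REQUEST. With dont_wait set, send() may return
 * before the transfer has finished. */
ssize_t flush_tx_ring(int fd, int dont_wait)
{
    return send(fd, NULL, 0, dont_wait ? MSG_DONTWAIT : 0);
}
```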

Socket creation and destruction are also straightforward, and are done
the same way as for capture, described in the previous section::

 int fd = socket(PF_PACKET, mode, 0);

The protocol can optionally be 0 if we only want to transmit
via this socket, which avoids an expensive call to packet_rcv().
In this case, you also need to bind(2) the TX_RING with sll_protocol = 0
set. Otherwise, use a protocol such as htons(ETH_P_ALL).
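
Putting both requirements of the previous paragraph together, a transmit-only socket could be opened as sketched below. This version looks the interface up with if_nametoindex() from <net/if.h> instead of the SIOCGIFINDEX ioctl used in the initialization example; the helper name open_tx_only_socket() is hypothetical, and CAP_NET_RAW is normally required.

```c
#include <errno.h>
#include <string.h>          /* memset() */
#include <net/if.h>          /* if_nametoindex() */
#include <sys/socket.h>      /* socket(), bind() */
#include <unistd.h>          /* close() */
#include <linux/if_packet.h> /* struct sockaddr_ll */

/* Hypothetical helper: open a packet socket used only for transmission.
 * Protocol 0 in both socket() and sll_protocol keeps the socket
 * transmit-only. Returns a bound descriptor, or -1 with errno set. */
int open_tx_only_socket(const char *ifname, int mode)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        errno = ENODEV;          /* no such interface */
        return -1;
    }

    int fd = socket(PF_PACKET, mode, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = 0;       /* transmit only */
    addr.sll_ifindex = (int)ifindex;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```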

Binding the socket to your network interface is mandatory (with zero copy) so
that the header size of the frames used in the circular buffer is known.

As with capture, each frame contains two parts::

    ----------------------
    | struct tpacket_hdr | Header. It contains the status of
    |                    | this frame.
    |--------------------|
    | data buffer        |
    .                    .  Data that will be sent over the network interface.
    .                    .
    ----------------------

 bind() associates the socket with your network interface through the
 sll_ifindex field of struct sockaddr_ll.

 Initialization example::

    struct sockaddr_ll my_addr;
    struct ifreq s_ifr;
    ...

    /* copy the interface name and make sure it is NUL-terminated */
    strncpy(s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name) - 1);
    s_ifr.ifr_name[sizeof(s_ifr.ifr_name) - 1] = '\0';

    /* get interface index of eth0 */
    ioctl(fd, SIOCGIFINDEX, &s_ifr);

    /* fill sockaddr_ll struct to prepare binding */
    memset(&my_addr, 0, sizeof(my_addr));
    my_addr.sll_family = AF_PACKET;
    my_addr.sll_protocol = htons(ETH_P_ALL);
    my_addr.sll_ifindex = s_ifr.ifr_ifindex;

    /* bind socket to eth0 */
    bind(fd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));

 A complete tutorial is available at: https://sites.google.com/site/packetmmap/

By default, the user should put data at::

 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)

So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
the beginning of the user data will be at::

 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

If you wish to put user data at a custom offset from the beginning of
the frame (for payload alignment with SOCK_RAW mode, for instance), you
can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). For this
to work, the feature must first be enabled with setsockopt()
and the PACKET_TX_HAS_OFF option.
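
The default data offset and the hand-over of a frame to the kernel can be sketched as follows for a TPACKET_V1 TX ring. tx_data_offset() and fill_tx_frame() are hypothetical helpers; they ignore PACKET_TX_HAS_OFF and always use the default offset given above, and the C11 release fence stands in for whatever barrier your code base normally uses.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>          /* memcpy() */
#include <linux/if_packet.h> /* struct tpacket_hdr, TPACKET_HDRLEN, TP_STATUS_* */

/* Default offset of user data from the frame base. By the definition of
 * TPACKET_HDRLEN this equals TPACKET_ALIGN(sizeof(struct tpacket_hdr)). */
size_t tx_data_offset(void)
{
    return TPACKET_HDRLEN - sizeof(struct sockaddr_ll);
}

/* Hypothetical helper: copy `len` bytes of packet data into a free frame
 * and mark it ready for send(). Returns 0, or -1 if the data does not fit. */
int fill_tx_frame(unsigned char *frame, size_t frame_size,
                  const void *data, unsigned int len)
{
    size_t off = tx_data_offset();
    if (off > frame_size || len > frame_size - off)
        return -1;

    volatile struct tpacket_hdr *hdr = (volatile struct tpacket_hdr *)frame;
    memcpy(frame + off, data, len);
    hdr->tp_len = len;
    /* Make the data visible before the kernel can see the new status. */
    atomic_thread_fence(memory_order_release);
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

Setting tp_status last is the important design point: until it changes, the kernel ignores the frame, so the payload and tp_len must be complete before that store.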

PACKET_MMAP settings
====================

Setting up PACKET_MMAP from user-level code is done with a call like

 - Capture process::

     setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req));

 - Transmission process::

     setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req));

The most significant argument in the previous call is the req parameter;
this parameter must have the following structure::

    struct tpacket_req
    {
        unsigned int    tp_block_size;  /* Minimal size of contiguous block */
        unsigned int    tp_block_nr;    /* Number of blocks */
        unsigned int    tp_frame_size;  /* Size of frame */
        unsigned int    tp_frame_nr;    /* Total number of frames */
    };

This structure is defined in /usr/include/linux/if_packet.h and establishes a
circular buffer (ring) of unswappable memory.
Because the ring is mapped into the capture process, the captured frames and
related meta-information such as timestamps can be read without a system call.
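
Combining the capture setup steps (socket(), setsockopt() with PACKET_RX_RING, mmap()), a minimal sketch might look like this. struct rx_ring, rx_ring_open() and rx_ring_close() are hypothetical; the mapping length is assumed to be tp_block_size * tp_block_nr, and tp_frame_nr is derived so that the consistency check described next holds.

```c
#include <errno.h>
#include <stddef.h>
#include <arpa/inet.h>       /* htons() */
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct tpacket_req, PACKET_RX_RING */
#include <sys/mman.h>        /* mmap(), munmap() */
#include <sys/socket.h>      /* socket(), setsockopt(), SOL_PACKET */
#include <unistd.h>          /* close() */

struct rx_ring {             /* hypothetical bookkeeping structure */
    int fd;
    unsigned char *map;      /* start of the mapped ring */
    size_t len;              /* tp_block_size * tp_block_nr */
    struct tpacket_req req;
};

/* Returns 0 on success, or -1 with errno set. */
int rx_ring_open(struct rx_ring *r, unsigned int block_size,
                 unsigned int block_nr, unsigned int frame_size)
{
    if (frame_size == 0) {
        errno = EINVAL;
        return -1;
    }
    r->req.tp_block_size = block_size;
    r->req.tp_block_nr = block_nr;
    r->req.tp_frame_size = frame_size;
    r->req.tp_frame_nr = (block_size / frame_size) * block_nr;
    r->len = (size_t)block_size * block_nr;

    r->fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (r->fd < 0)
        return -1;

    /* Allocate the ring in the kernel, then map it into this process. */
    if (setsockopt(r->fd, SOL_PACKET, PACKET_RX_RING,
                   &r->req, sizeof(r->req)) == 0) {
        r->map = mmap(NULL, r->len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      r->fd, 0);
        if (r->map != MAP_FAILED)
            return 0;
    }
    int saved = errno;
    close(r->fd);
    errno = saved;
    return -1;
}

void rx_ring_close(struct rx_ring *r)
{
    munmap(r->map, r->len);
    close(r->fd);            /* also releases the kernel-side ring */
}
```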

Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because::

    frames_per_block = tp_block_size/tp_frame_size

Indeed, packet_set_ring() checks that the following condition is true::

    frames_per_block * tp_block_nr == tp_frame_nr

Let's see an example, with the following values::

     tp_block_size = 4096
     tp_frame_size = 2048
     tp_block_nr   = 4
     tp_frame_nr   = 8

We get the following buffer structure::

            block #1                 block #2
    +---------+---------+    +---------+---------+
    | frame 1 | frame 2 |    | frame 3 | frame 4 |
    +---------+---------+    +---------+---------+

            block #3                 block #4
    +---------+---------+    +---------+---------+
    | frame 5 | frame 6 |    | frame 7 | frame 8 |
    +---------+---------+    +---------+---------+
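
The layout in this example can be expressed as a small offset calculation. The sketch below is a hypothetical helper, not a kernel API; it numbers frames from 0 (so "frame 1" in the diagram is index 0) and locates the block first, since a frame always lies entirely within one block.

```c
#include <stddef.h>
#include <linux/if_packet.h> /* struct tpacket_req */

/* Hypothetical helper: byte offset of frame `i` (0-based) from the start
 * of the mapped ring. Any space left at the end of a block is skipped. */
size_t frame_offset(const struct tpacket_req *req, unsigned int i)
{
    unsigned int per_block = req->tp_block_size / req->tp_frame_size;
    return (size_t)(i / per_block) * req->tp_block_size   /* whole blocks */
         + (size_t)(i % per_block) * req->tp_frame_size;  /* within block */
}
```

With the example values, index 7 ("frame 8" in the diagram) starts at 3 * 4096 + 2048 = 14336 bytes from the start of the ring.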

A frame can be of any size, provided that it fits in a block. A block
can only hold an integer number of frames; in other words, a frame cannot
span two blocks, so there are some details you have to take into
account when choosing tp_frame_size. See "Mapping and use of the circular
buffer (ring)".

PACKET_MMAP setting constraints
===============================

In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
the PACKET_MMAP buffer could hold only 32768 frames on a 32-bit architecture or
16384 on a 64-bit architecture. For information on these kernel versions,
see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt

Block size limit
----------------

As stated earlier, each block is a contiguous physical region of memory. These
memory regions are allocated with calls to the __get_free_pages() function. As
the name indicates, this function allocates pages of memory, and the second
argument is the "order", the base-2 logarithm of the number of pages; that is,
(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
order=2 ==> 16384 bytes, etc. The maximum size of a
region allocated by __get_free_pages() is determined by the MAX_ORDER macro. More
precisely, the limit can be calculated as::

   PAGE_SIZE << MAX_ORDER

   On the i386 architecture, PAGE_SIZE is 4096 bytes
   In a 2.4/i386 kernel, MAX_ORDER is 10
   In a 2.6/i386 kernel, MAX_ORDER is 11

So __get_free_pages() can allocate as much as 4MB or 8MB in a 2.4 or 2.6 kernel,
respectively, on the i386 architecture.

User space programs can include /usr/include/sys/user.h and
/usr/include/linux/mmzone.h to get the PAGE_SIZE and MAX_ORDER declarations.

The page size can also be determined dynamically with the getpagesize(2)
system call.
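
As an illustration of the "order" arithmetic, the hypothetical helper below picks the smallest block size of the form page_size << order that can hold one frame, giving up beyond a caller-supplied maximum order. It reads the page size with the POSIX sysconf(_SC_PAGESIZE), which reports the same value as getpagesize().

```c
#include <unistd.h> /* sysconf() */

/* Page size of the running system, equivalent to getpagesize(). */
unsigned long runtime_page_size(void)
{
    return (unsigned long)sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: smallest page_size << order (order <= max_order)
 * that is at least frame_size bytes, or 0 if no such order exists. */
unsigned long block_size_for(unsigned long frame_size, unsigned long page_size,
                             unsigned int max_order)
{
    for (unsigned int order = 0; order <= max_order; order++) {
        unsigned long size = page_size << order;
        if (size >= frame_size)
            return size;
    }
    return 0;
}
```

For instance, with 4096-byte pages a 5000-byte frame needs order 1, so block_size_for(5000, 4096, 11) returns 8192.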

Block number limit
------------------

To understand the constraints of PACKET_MMAP, we have to see the structure
used to hold the pointers to each block.

Currently, this structure is a vector called pg_vec, dynamically allocated with
kmalloc; its size limits the number of blocks that can be allocated::

    +---+---+---+---+
    | x | x | x | x |
    +---+---+---+---+
      |   |   |   |
      |   |   |   v
      |   |   v  block #4
      |   v  block #3
      v  block #2
     block #1

kmalloc allocates any number of bytes of physically contiguous memory from
a pool of pre-determined sizes. This pool of memory is maintained by the slab
allocator, which is ultimately responsible for doing the allocation and
hence imposes the maximum memory that kmalloc can allocate.

In a 2.4/2.6 kernel on the i386 architecture, the limit is 131072 bytes. The
predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
entries of /proc/slabinfo.

On a 32-bit architecture, pointers are 4 bytes long, so the total number of
pointers to blocks is::

     131072/4 = 32768 blocks

PACKET_MMAP buffer size calculator
==================================

Definitions:

==============  ================================================================
<size-max>      is the maximum size allocatable with kmalloc
                (see /proc/slabinfo)
<pointer size>  depends on the architecture -- ``sizeof(void *)``
<page size>     depends on the architecture -- PAGE_SIZE or getpagesize(2)
<max-order>     is the value defined with MAX_ORDER
<frame size>    is an upper bound on a frame's capture size (more on this later)
==============  ================================================================

From these definitions we derive::

        <block number> = <size-max>/<pointer size>
        <block size> = <page size> << <max-order>

so the maximum buffer size is::

        <block number> * <block size>

and the number of frames is::

        <block number> * <block size> / <frame size>

Suppose the following parameters, which apply to a 2.6 kernel and an
i386 architecture::

        <size-max> = 131072 bytes
        <pointer size> = 4 bytes
        <page size> = 4096 bytes
        <max-order> = 11

and a value for <frame size> of 2048 bytes. These parameters will yield::

        <block number> = 131072/4 = 32768 blocks
        <block size> = 4096 << 11 = 8 MiB.

and hence the buffer will have a size of 262144 MiB. So it can hold
262144 MiB / 2048 bytes = 134217728 frames.
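
The calculator can be written out directly. The sketch below uses 64-bit arithmetic so the theoretical result does not overflow; compute_mmap_limits() and struct mmap_limits are hypothetical names, and the inputs are the 2.6/i386 values from the text, not values queried from a running kernel.

```c
#include <stdint.h>

struct mmap_limits {
    uint64_t block_number; /* <size-max> / <pointer size> */
    uint64_t block_size;   /* <page size> << <max-order> */
    uint64_t buffer_size;  /* <block number> * <block size> */
    uint64_t frame_number; /* <buffer size> / <frame size> */
};

struct mmap_limits compute_mmap_limits(uint64_t size_max, uint64_t pointer_size,
                                       uint64_t page_size, unsigned int max_order,
                                       uint64_t frame_size)
{
    struct mmap_limits l;
    l.block_number = size_max / pointer_size;
    l.block_size = page_size << max_order;
    l.buffer_size = l.block_number * l.block_size;
    l.frame_number = l.buffer_size / frame_size;
    return l;
}
```

compute_mmap_limits(131072, 4, 4096, 11, 2048) gives 32768 blocks of 8 MiB, a 274877906944-byte (262144 MiB) buffer and 134217728 frames, matching the figures above.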
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  361) Actually, this buffer size is not possible with an i386 architecture.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  362) Remember that the memory is allocated in kernel space, in the case of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  363) an i386 kernel's memory size is limited to 1GiB.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  364) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  365) All memory allocations are not freed until the socket is closed. The memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  366) allocations are done with GFP_KERNEL priority, this basically means that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  367) the allocation can wait and swap other process' memory in order to allocate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  368) the necessary memory, so normally limits can be reached.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  369) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  370) Other constraints
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  371) -----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  372) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  373) If you check the source code you will see that what I draw here as a frame
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  374) is not only the link level frame. At the beginning of each frame there is a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  375) header called struct tpacket_hdr used in PACKET_MMAP to hold link level's frame
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  376) meta information like timestamp. So what we draw here a frame it's really
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  377) the following (from include/linux/if_packet.h)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  378) 
 /*
   Frame structure:

   - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
   - struct tpacket_hdr
   - pad to TPACKET_ALIGNMENT=16
   - struct sockaddr_ll
   - Gap, chosen so that packet data (Start+tp_net) aligns to
     TPACKET_ALIGNMENT=16
   - Start+tp_mac: [ Optional MAC header ]
   - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
   - Pad to align to TPACKET_ALIGNMENT=16
 */

The following conditions are checked in packet_set_ring():

   - tp_block_size must be a multiple of PAGE_SIZE (1)
   - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
   - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
   - tp_frame_nr   must be exactly frames_per_block*tp_block_nr, where
     frames_per_block is tp_block_size/tp_frame_size

Note that tp_block_size should be chosen to be a power of two or there will
be a waste of memory.
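
The checks above can be mirrored in user space before calling setsockopt(), which gives a clearer error than a bare EINVAL. The following is a minimal sketch, not the kernel's code. validate_ring_req() is a hypothetical helper that takes the page size as a parameter (for example from sysconf(_SC_PAGESIZE)), and it follows the list above rather than every check packet_set_ring() actually performs.

```c
#include <linux/if_packet.h>

/*
 * Hypothetical helper mirroring the tpacket_req constraints listed above.
 * Returns 0 if the request looks valid, -1 otherwise.
 */
static int validate_ring_req(const struct tpacket_req *req, unsigned int page_size)
{
    unsigned int frames_per_block;

    /* tp_block_size must be a non-zero multiple of the page size. */
    if (req->tp_block_size == 0 || req->tp_block_size % page_size != 0)
        return -1;

    /* Each frame must hold more than the header plus sockaddr_ll. */
    if (req->tp_frame_size <= TPACKET_HDRLEN)
        return -1;

    /* Frames must keep TPACKET_ALIGNMENT (16-byte) alignment. */
    if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
        return -1;

    /* A frame cannot span two blocks, so at least one must fit. */
    frames_per_block = req->tp_block_size / req->tp_frame_size;
    if (frames_per_block == 0)
        return -1;

    /* tp_frame_nr must match the total number of frames exactly. */
    if (req->tp_frame_nr != frames_per_block * req->tp_block_nr)
        return -1;

    return 0;
}
```

A tp_frame_size that does not divide tp_block_size still passes these checks. It only leaves unused space at the end of every block.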

Mapping and use of the circular buffer (ring)
---------------------------------------------

The mapping of the buffer in the user process is done with the conventional
mmap function. Even though the circular buffer is composed of several
physically discontiguous blocks of memory, they are contiguous in user space,
hence just one call to mmap is needed, with size equal to
tp_block_size * tp_block_nr::

    mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

If tp_frame_size is a divisor of tp_block_size, frames will be contiguously
spaced tp_frame_size bytes apart. If not, there will be a gap after every
tp_block_size/tp_frame_size frames, because a frame cannot span two blocks.
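
To locate a frame inside the mapping, the frame index has to be split into a block index and a position within that block. The sketch below assumes the ring was mapped with a single mmap() as described above. frame_offset() is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte offset of frame 'idx' from the start of the
 * mapped ring. Frames never span two blocks, so when tp_frame_size does not
 * divide tp_block_size the unused tail of each block is skipped.
 */
static size_t frame_offset(unsigned int idx, unsigned int block_size,
                           unsigned int frame_size)
{
    unsigned int frames_per_block = block_size / frame_size;
    size_t block = idx / frames_per_block;
    size_t slot = idx % frames_per_block;

    return block * block_size + slot * frame_size;
}
```

For example, with tp_block_size = 4096 and tp_frame_size = 1536, frame 2 starts at offset 4096 rather than 3072, leaving a 1024-byte gap at the end of the first block.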

To use one socket for both capture and transmission, the RX and TX rings
must be mapped with a single call to mmap::

    ...
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
    ...
    rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    tx_ring = rx_ring + size;

RX must come first, as the kernel maps the TX ring memory right after the
RX one. The example assumes both rings have the same size; otherwise the
mmap length is the sum of the two ring sizes.
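
Putting these steps together, a helper that configures both rings and splits the single mapping could look like the sketch below. It assumes TPACKET_V1 (the default), identical RX and TX ring geometry, and a socket that already has the required privileges. map_rx_tx() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: sets up RX and TX rings with the same geometry on
 * an AF_PACKET socket and maps both with one mmap() call.
 * Returns 0 on success and -1 on failure (errno is set by the failing call).
 */
static int map_rx_tx(int fd, const struct tpacket_req *req,
                     uint8_t **rx_ring, uint8_t **tx_ring)
{
    size_t ring_size = (size_t)req->tp_block_size * req->tp_block_nr;
    void *map;

    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return -1;
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req)) < 0)
        return -1;

    /* One mapping covers both rings: RX first, TX right after it. */
    map = mmap(NULL, 2 * ring_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    *rx_ring = map;
    *tx_ring = (uint8_t *)map + ring_size;
    return 0;
}
```

Cleanup on failure is left out of this sketch; closing the socket releases any rings that were already set up.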

At the beginning of each frame there is a status field (see
struct tpacket_hdr). If this field is 0, the frame is ready to be
used by the kernel. If not, there is a frame the user can read,
and the following flags apply:

Capture process
^^^^^^^^^^^^^^^

From include/linux/if_packet.h::

     #define TP_STATUS_COPY          (1 << 1)
     #define TP_STATUS_LOSING        (1 << 2)
     #define TP_STATUS_CSUMNOTREADY  (1 << 3)
     #define TP_STATUS_CSUM_VALID    (1 << 7)

======================  =======================================================
TP_STATUS_COPY		This flag indicates that the frame (and associated
			meta information) has been truncated because it is
			larger than tp_frame_size. This packet can be
			read entirely with recvfrom().

			In order to make this work it must be
			enabled previously with setsockopt() and
			the PACKET_COPY_THRESH option.

			The number of frames that can be buffered to
			be read with recvfrom is limited like a normal socket.
			See the SO_RCVBUF option in the socket(7) man page.

TP_STATUS_LOSING	Indicates there were packet drops since the last time
			statistics were checked with getsockopt() and
			the PACKET_STATISTICS option.

TP_STATUS_CSUMNOTREADY	Currently used for outgoing IP packets whose
			checksum will be computed in hardware. So while
			reading the packet we should not try to check the
			checksum.

TP_STATUS_CSUM_VALID	This flag indicates that at least the transport
			header checksum of the packet has already been
			validated on the kernel side. If the flag is not set
			then we are free to check the checksum by ourselves
			provided that TP_STATUS_CSUMNOTREADY is also not set.
======================  =======================================================

For convenience, there are also the following defines::

     #define TP_STATUS_KERNEL        0
     #define TP_STATUS_USER          1

The kernel initializes all frames to TP_STATUS_KERNEL. When the kernel
receives a packet, it puts it in the buffer and updates the status with
at least the TP_STATUS_USER flag. The user can then read the packet;
once it has been read, the user must zero the status field so the kernel
can use that frame buffer again.
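
The hand-off described above can be sketched as two small helpers for a TPACKET_V1 frame. The helper names are hypothetical, and the use of the GCC/Clang __atomic builtins with acquire/release ordering is an assumption of this sketch. The ordering keeps packet data from being read before TP_STATUS_USER is observed, and keeps the frame from being handed back before those reads are done.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: if the frame belongs to user space, return a
 * pointer to its captured data and store the captured length in *len.
 * Returns NULL while the frame is still owned by the kernel.
 */
static const uint8_t *next_rx_packet(struct tpacket_hdr *hdr, unsigned int *len)
{
    unsigned long status = __atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE);

    if (!(status & TP_STATUS_USER))
        return NULL;

    /* tp_mac is the offset of the captured data from the frame start;
     * tp_snaplen is how many bytes were actually stored in the frame. */
    *len = hdr->tp_snaplen;
    return (const uint8_t *)hdr + hdr->tp_mac;
}

/* Hypothetical helper: give the frame back to the kernel by zeroing its
 * status, after all reads of the packet data are complete. */
static void release_rx_frame(struct tpacket_hdr *hdr)
{
    __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
}
```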

The user can use poll() (any other variant should apply too) to check
whether new packets are in the ring; here status is the tp_status field of
the next frame the user expects to read::

    struct pollfd pfd;

    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLIN|POLLRDNORM|POLLERR;

    if (status == TP_STATUS_KERNEL)
	retval = poll(&pfd, 1, timeout);

Checking the status value first and then polling for frames does not
introduce a race condition.

Transmission process
^^^^^^^^^^^^^^^^^^^^

The following defines are used for transmission::

     #define TP_STATUS_AVAILABLE        0 // Frame is available
     #define TP_STATUS_SEND_REQUEST     1 // Frame will be sent on next send()
     #define TP_STATUS_SENDING          2 // Frame is currently in transmission
     #define TP_STATUS_WRONG_FORMAT     4 // Frame format is not correct

First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
packet, the user fills the data buffer of an available frame, sets tp_len to
the current data buffer size and sets its status field to
TP_STATUS_SEND_REQUEST. This can be done on multiple frames. Once the user is
ready to transmit, it calls send(). Then all buffers with status equal to
TP_STATUS_SEND_REQUEST are forwarded to the network device. The kernel
updates the status of each sent frame to TP_STATUS_SENDING until the end of
the transfer.

At the end of each transfer, the buffer status returns to TP_STATUS_AVAILABLE.

::

    header->tp_len = in_i_size;
    header->tp_status = TP_STATUS_SEND_REQUEST;
    retval = send(fd, NULL, 0, 0);
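
Filling a frame before that send() call can be sketched as follows. This assumes TPACKET_V1 and the default TX frame layout, in which the kernel expects the packet data to start TPACKET_HDRLEN - sizeof(struct sockaddr_ll) bytes into the frame, right after the aligned struct tpacket_hdr. queue_tx_frame() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: copy 'len' bytes of packet data into an available
 * TX frame of 'frame_size' bytes and mark it for sending.
 * Returns 0 on success, -1 if the frame is busy or the packet is too big.
 */
static int queue_tx_frame(struct tpacket_hdr *hdr, size_t frame_size,
                          const void *pkt, unsigned int len)
{
    /* Assumed default TX layout: data follows the aligned tpacket_hdr. */
    size_t data_off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) != TP_STATUS_AVAILABLE)
        return -1;
    if (frame_size < data_off || len > frame_size - data_off)
        return -1;

    memcpy((uint8_t *)hdr + data_off, pkt, len);
    hdr->tp_len = len;

    /* Publish the frame only after data and length are written. */
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST, __ATOMIC_RELEASE);
    return 0;
}
```

After one or more frames have been queued this way, a single send(fd, NULL, 0, 0) asks the kernel to transmit all of them.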

The user can also use poll() to check whether a buffer is available, for
example while its status is still TP_STATUS_SENDING::

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLOUT;
    retval = poll(&pfd, 1, timeout);

What TPACKET versions are available and when to use them?
=========================================================

::

 int val = tpacket_version;
 socklen_t len = sizeof(val);
 setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
 getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, &len);

where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
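
A sketch of selecting a version and reading it back follows. set_tpacket_version() is a hypothetical helper. The version has to be chosen before a ring is created with PACKET_RX_RING or PACKET_TX_RING, since the kernel refuses to change it (with EBUSY) once a ring exists.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: select a TPACKET version on an AF_PACKET socket and
 * read it back. Returns 0 if the socket now uses 'version', -1 otherwise.
 */
static int set_tpacket_version(int fd, int version)
{
    int val = version;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val)) < 0)
        return -1;

    /* getsockopt() takes a pointer to the length, not the length itself. */
    if (getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, &len) < 0)
        return -1;

    return val == version ? 0 : -1;
}
```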

TPACKET_V1:
	- Default if not otherwise specified by setsockopt(2)
	- RX_RING, TX_RING available

TPACKET_V1 --> TPACKET_V2:
	- Made 64-bit clean due to unsigned long usage in TPACKET_V1
	  structures, thus this also works on a 64-bit kernel with 32-bit
	  userspace and the like
	- Timestamp resolution in nanoseconds instead of microseconds
	- RX_RING, TX_RING available
	- VLAN metadata information available for packets
	  (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
	  in the tpacket2_hdr structure:

		- TP_STATUS_VLAN_VALID bit being set in the tp_status field indicates
		  that the tp_vlan_tci field has a valid VLAN TCI value
		- TP_STATUS_VLAN_TPID_VALID bit being set in the tp_status field
		  indicates that the tp_vlan_tpid field has a valid VLAN TPID value

	- How to switch to TPACKET_V2:

		1. Replace struct tpacket_hdr by struct tpacket2_hdr
		2. Query the header length (PACKET_HDRLEN) and save it
		3. Set protocol version to 2, set up ring as usual
		4. For getting the sockaddr_ll,
		   use ``(void *)hdr + TPACKET_ALIGN(hdrlen)`` instead of
		   ``(void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))``
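
Steps 2 and 4 can be sketched as below. query_hdrlen() and v2_sockaddr_ll() are hypothetical helper names. The sketch assumes that getsockopt() with PACKET_HDRLEN takes the version number as input in the option buffer and overwrites it with the header length.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: ask the kernel for the frame header length of a
 * given TPACKET version. Returns the length, or -1 on failure.
 */
static int query_hdrlen(int fd, int version)
{
    int val = version;   /* input: version, output: header length */
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_PACKET, PACKET_HDRLEN, &val, &len) < 0)
        return -1;
    return val;
}

/* Hypothetical helper: locate the sockaddr_ll that follows a TPACKET_V2
 * frame header, using the header length saved from query_hdrlen(). */
static struct sockaddr_ll *v2_sockaddr_ll(struct tpacket2_hdr *hdr, unsigned int hdrlen)
{
    return (struct sockaddr_ll *)((uint8_t *)hdr + TPACKET_ALIGN(hdrlen));
}
```

The sketch uses uint8_t * arithmetic because pointer arithmetic on void * is a GCC extension.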

TPACKET_V2 --> TPACKET_V3:
	- Flexible buffer implementation for RX_RING:
		1. Blocks can be configured with non-static frame-size
		2. Read/poll is at a block-level (as opposed to packet-level)
		3. Added poll timeout to avoid indefinite user-space wait
		   on idle links
		4. Added user-configurable knobs:

			4.1 block::timeout
			4.2 tpkt_hdr::sk_rxhash

	- RX Hash data available in user space
	- TX_RING semantics are conceptually similar to TPACKET_V2;
	  use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
	  instead of TPACKET2_HDRLEN. In the current implementation,
	  the tp_next_offset field in the tpacket3_hdr MUST be set to
	  zero, indicating that the ring does not hold variable-sized frames.
	  Packets with non-zero values of tp_next_offset will be dropped.

AF_PACKET fanout mode
=====================

In the AF_PACKET fanout mode, packet reception can be load balanced among
processes. This also works in combination with mmap(2) on packet sockets.

Currently implemented fanout policies are:

  - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
  - PACKET_FANOUT_LB: schedule to socket by round-robin
  - PACKET_FANOUT_CPU: schedule to socket by the CPU the packet arrives on
  - PACKET_FANOUT_RND: schedule to socket by random selection
  - PACKET_FANOUT_ROLLOVER: if one socket is full, roll over to another
  - PACKET_FANOUT_QM: schedule to socket by the skb's recorded queue_mapping
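
The PACKET_FANOUT socket option takes a single int that packs the group ID into the low 16 bits and the policy into the high 16 bits, the same encoding used by the example below. make_fanout_arg() is a hypothetical helper that just spells this out.

```c
#include <stdint.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: build the PACKET_FANOUT argument. Sockets that pass
 * the same group ID and policy join one fanout group.
 */
static int make_fanout_arg(uint16_t group_id, uint16_t policy)
{
    return (int)((uint32_t)group_id | ((uint32_t)policy << 16));
}
```

Each socket in the group would then pass the result to setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)).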

Minimal example code by David S. Miller (try things like "./test eth0 hash",
"./test eth0 lb", etc.)::

    #include <stddef.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>

    #include <unistd.h>
    #include <arpa/inet.h>

    #include <linux/if_ether.h>
    #include <linux/if_packet.h>

    #include <net/if.h>

    static const char *device_name;
    static int fanout_type;
    static int fanout_id;

    #ifndef PACKET_FANOUT
    # define PACKET_FANOUT			18
    # define PACKET_FANOUT_HASH		0
    # define PACKET_FANOUT_LB		1
    #endif

    /* Returns a bound socket that joined the fanout group, or -1 on error. */
    static int setup_socket(void)
    {
	    int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
	    struct sockaddr_ll ll;
	    struct ifreq ifr;
	    int fanout_arg;

	    if (fd < 0) {
		    perror("socket");
		    return -1;
	    }

	    memset(&ifr, 0, sizeof(ifr));
	    strncpy(ifr.ifr_name, device_name, IFNAMSIZ - 1);
	    err = ioctl(fd, SIOCGIFINDEX, &ifr);
	    if (err < 0) {
		    perror("SIOCGIFINDEX");
		    close(fd);
		    return -1;
	    }

	    memset(&ll, 0, sizeof(ll));
	    ll.sll_family = AF_PACKET;
	    ll.sll_ifindex = ifr.ifr_ifindex;
	    err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
	    if (err < 0) {
		    perror("bind");
		    close(fd);
		    return -1;
	    }

	    /* Group ID in the low 16 bits, fanout policy in the high 16 bits. */
	    fanout_arg = (fanout_id | (fanout_type << 16));
	    err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
			    &fanout_arg, sizeof(fanout_arg));
	    if (err) {
		    perror("setsockopt");
		    close(fd);
		    return -1;
	    }

	    return fd;
    }

    static void fanout_thread(void)
    {
	    int fd = setup_socket();
	    int limit = 10000;

	    if (fd < 0)
		    exit(EXIT_FAILURE);

	    while (limit-- > 0) {
		    char buf[1600];
		    int err;

		    err = read(fd, buf, sizeof(buf));
		    if (err < 0) {
			    perror("read");
			    exit(EXIT_FAILURE);
		    }
		    if ((limit % 10) == 0)
			    fprintf(stdout, "(%d) \n", getpid());
	    }

	    fprintf(stdout, "%d: Received 10000 packets\n", getpid());

	    close(fd);
	    exit(0);
    }

    int main(int argc, char **argp)
    {
	    int i;

	    if (argc != 3) {
		    fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
		    return EXIT_FAILURE;
	    }

	    if (!strcmp(argp[2], "hash"))
		    fanout_type = PACKET_FANOUT_HASH;
	    else if (!strcmp(argp[2], "lb"))
		    fanout_type = PACKET_FANOUT_LB;
	    else {
		    fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
		    exit(EXIT_FAILURE);
	    }

	    device_name = argp[1];
	    fanout_id = getpid() & 0xffff;

	    /* Fork four workers; each one joins the same fanout group. */
	    for (i = 0; i < 4; i++) {
		    pid_t pid = fork();

		    switch (pid) {
		    case 0:
			    fanout_thread();	/* never returns */

		    case -1:
			    perror("fork");
			    exit(EXIT_FAILURE);
		    }
	    }

	    for (i = 0; i < 4; i++) {
		    int status;

		    wait(&status);
	    }

	    return 0;
    }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  754) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  755) AF_PACKET TPACKET_V3 example
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  756) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  757) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  758) AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  759) sizes by doing it's own memory management. It is based on blocks where polling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  760) works on a per block basis instead of per ring as in TPACKET_V2 and predecessor.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  761) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  762) It is said that TPACKET_V3 brings the following benefits:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  763) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  764)  * ~15% - 20% reduction in CPU-usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  765)  * ~20% increase in packet capture rate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  766)  * ~2x increase in packet density
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  767)  * Port aggregation analysis
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  768)  * Non static frame size to capture entire packet payload
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  769) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  770) So it seems to be a good candidate to be used with packet fanout.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  771) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  772) Minimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  773) it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::

    /* Written from scratch, but kernel-to-user space API usage
     * dissected from lolpcap:
     *  Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
     *  License: GPL, version 2.0
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <assert.h>
    #include <net/if.h>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <poll.h>
    #include <unistd.h>
    #include <signal.h>
    #include <inttypes.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <sys/mman.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>

    #ifndef likely
    # define likely(x)		__builtin_expect(!!(x), 1)
    #endif
    #ifndef unlikely
    # define unlikely(x)		__builtin_expect(!!(x), 0)
    #endif

    /* Header at the start of every TPACKET_V3 block
     * (cf. struct tpacket_block_desc in <linux/if_packet.h>). */
    struct block_desc {
	    uint32_t version;
	    uint32_t offset_to_priv;
	    struct tpacket_hdr_v1 h1;
    };

    struct ring {
	    struct iovec *rd;	/* one entry per block of the mapping */
	    uint8_t *map;
	    struct tpacket_req3 req;
    };

    static unsigned long packets_total = 0, bytes_total = 0;
    static volatile sig_atomic_t sigint = 0;

    static void sighandler(int num)
    {
	    sigint = 1;
    }

    static int setup_socket(struct ring *ring, char *netdev)
    {
	    int err, i, fd, v = TPACKET_V3;
	    struct sockaddr_ll ll;
	    /* 64 blocks of 4 MiB (256 MiB in total), 2 KiB nominal frames */
	    unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
	    unsigned int blocknum = 64;

	    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	    if (fd < 0) {
		    perror("socket");
		    exit(1);
	    }

	    /* The ring version must be selected before the ring is set up. */
	    err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
	    if (err < 0) {
		    perror("setsockopt");
		    exit(1);
	    }

	    memset(&ring->req, 0, sizeof(ring->req));
	    ring->req.tp_block_size = blocksiz;
	    ring->req.tp_frame_size = framesiz;
	    ring->req.tp_block_nr = blocknum;
	    ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
	    /* Retire a partially filled block to user space after 60 ms. */
	    ring->req.tp_retire_blk_tov = 60;
	    /* Ask the kernel to fill in tp_rxhash for every packet. */
	    ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;

	    err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
			    sizeof(ring->req));
	    if (err < 0) {
		    perror("setsockopt");
		    exit(1);
	    }

	    ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
			    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
	    if (ring->map == MAP_FAILED) {
		    perror("mmap");
		    exit(1);
	    }

	    ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
	    assert(ring->rd);
	    for (i = 0; i < ring->req.tp_block_nr; ++i) {
		    ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
		    ring->rd[i].iov_len = ring->req.tp_block_size;
	    }

	    memset(&ll, 0, sizeof(ll));
	    ll.sll_family = PF_PACKET;
	    ll.sll_protocol = htons(ETH_P_ALL);
	    ll.sll_ifindex = if_nametoindex(netdev);
	    ll.sll_hatype = 0;
	    ll.sll_pkttype = 0;
	    ll.sll_halen = 0;

	    /* An ifindex of 0 would silently bind to all interfaces. */
	    if (ll.sll_ifindex == 0) {
		    perror("if_nametoindex");
		    exit(1);
	    }

	    err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
	    if (err < 0) {
		    perror("bind");
		    exit(1);
	    }

	    return fd;
    }

    /* Print the IPv4 addresses (if any) and the RX hash of one packet. */
    static void display(struct tpacket3_hdr *ppd)
    {
	    struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
	    struct iphdr *ip = (struct iphdr *) ((uint8_t *) eth + ETH_HLEN);

	    if (eth->h_proto == htons(ETH_P_IP)) {
		    struct sockaddr_in ss, sd;
		    char sbuff[NI_MAXHOST], dbuff[NI_MAXHOST];

		    memset(&ss, 0, sizeof(ss));
		    ss.sin_family = PF_INET;
		    ss.sin_addr.s_addr = ip->saddr;
		    getnameinfo((struct sockaddr *) &ss, sizeof(ss),
				sbuff, sizeof(sbuff), NULL, 0, NI_NUMERICHOST);

		    memset(&sd, 0, sizeof(sd));
		    sd.sin_family = PF_INET;
		    sd.sin_addr.s_addr = ip->daddr;
		    getnameinfo((struct sockaddr *) &sd, sizeof(sd),
				dbuff, sizeof(dbuff), NULL, 0, NI_NUMERICHOST);

		    printf("%s -> %s, ", sbuff, dbuff);
	    }

	    printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
    }

    /* Visit every packet of a block handed over by the kernel. */
    static void walk_block(struct block_desc *pbd, const int block_num)
    {
	    int num_pkts = pbd->h1.num_pkts, i;
	    unsigned long bytes = 0;
	    struct tpacket3_hdr *ppd;

	    ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
					pbd->h1.offset_to_first_pkt);
	    for (i = 0; i < num_pkts; ++i) {
		    bytes += ppd->tp_snaplen;
		    display(ppd);

		    ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
						ppd->tp_next_offset);
	    }

	    packets_total += num_pkts;
	    bytes_total += bytes;
    }

    static void flush_block(struct block_desc *pbd)
    {
	    /* Release: give the block back only after all reads are done. */
	    __atomic_store_n(&pbd->h1.block_status, TP_STATUS_KERNEL,
			     __ATOMIC_RELEASE);
    }

    static void teardown_socket(struct ring *ring, int fd)
    {
	    munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
	    free(ring->rd);
	    close(fd);
    }

    int main(int argc, char **argp)
    {
	    int fd, err;
	    socklen_t len;
	    struct ring ring;
	    struct pollfd pfd;
	    unsigned int block_num = 0, blocks;
	    struct block_desc *pbd;
	    struct tpacket_stats_v3 stats;

	    if (argc != 2) {
		    fprintf(stderr, "Usage: %s INTERFACE\n", argp[0]);
		    return EXIT_FAILURE;
	    }

	    signal(SIGINT, sighandler);

	    memset(&ring, 0, sizeof(ring));
	    fd = setup_socket(&ring, argp[argc - 1]);
	    assert(fd >= 0);
	    blocks = ring.req.tp_block_nr;

	    memset(&pfd, 0, sizeof(pfd));
	    pfd.fd = fd;
	    pfd.events = POLLIN | POLLERR;
	    pfd.revents = 0;

	    while (likely(!sigint)) {
		    pbd = (struct block_desc *) ring.rd[block_num].iov_base;

		    /* Acquire: do not read packet data before the status. */
		    if ((__atomic_load_n(&pbd->h1.block_status, __ATOMIC_ACQUIRE) &
			 TP_STATUS_USER) == 0) {
			    poll(&pfd, 1, -1);
			    continue;
		    }

		    walk_block(pbd, block_num);
		    flush_block(pbd);
		    /* Blocks are handed to user space in ring order. */
		    block_num = (block_num + 1) % blocks;
	    }

	    len = sizeof(stats);
	    err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
	    if (err < 0) {
		    perror("getsockopt");
		    exit(1);
	    }

	    fflush(stdout);
	    printf("\nReceived %u packets, %lu bytes, %u dropped, freeze_q_cnt: %u\n",
		stats.tp_packets, bytes_total, stats.tp_drops,
		stats.tp_freeze_q_cnt);

	    teardown_socket(&ring, fd);
	    return 0;
    }
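
The ring parameters in setup_socket() above are not arbitrary: packet_set_ring()
in net/packet/af_packet.c rejects, among other things, a tp_block_size that is
not a multiple of the page size, a tp_frame_size that is not a multiple of
TPACKET_ALIGNMENT, and a tp_frame_nr that differs from
(tp_block_size / tp_frame_size) * tp_block_nr. The hypothetical helper
``ring_map_len()`` below mirrors only that subset of checks, so treat it as a
sketch for catching obvious mistakes before calling setsockopt(), not as a
replacement for the kernel's validation. It returns the length to pass to
mmap(), or 0::

    #include <assert.h>
    #include <stddef.h>
    #include <unistd.h>
    #include <linux/if_packet.h>

    /* Hypothetical helper: mmap() length for @req, or 0 if @req fails
     * the basic ring geometry checks listed above. */
    static size_t ring_map_len(const struct tpacket_req3 *req)
    {
        size_t page = (size_t) sysconf(_SC_PAGESIZE);
        unsigned int frames_per_block;

        /* Block size: non-zero multiple of the page size. */
        if (req->tp_block_size == 0 || req->tp_block_size % page != 0)
            return 0;
        /* Frame size: non-zero multiple of TPACKET_ALIGNMENT. */
        if (req->tp_frame_size == 0 ||
            req->tp_frame_size % TPACKET_ALIGNMENT != 0)
            return 0;
        /* Frame count must match the block geometry exactly. */
        frames_per_block = req->tp_block_size / req->tp_frame_size;
        if (frames_per_block == 0 ||
            frames_per_block * req->tp_block_nr != req->tp_frame_nr)
            return 0;
        return (size_t) req->tp_block_size * req->tp_block_nr;
    }

With the example's values (4 MiB blocks, 2 KiB frames, 64 blocks,
131072 frames), the helper yields a 256 MiB mapping.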

PACKET_QDISC_BYPASS
===================

If there is a requirement to load the network with many packets in a similar
fashion to pktgen, you might set the following option after socket
creation::

    int one = 1;
    setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));

This has the side effect that packets sent through PF_PACKET bypass the
kernel's qdisc layer and are pushed directly to the driver. This means that
packets are not buffered, tc disciplines are ignored, increased loss can occur,
and such packets are no longer visible to other PF_PACKET sockets. So, you have
been warned; generally, this can be useful for stress testing various
components of a system.

By default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
on PF_PACKET sockets.
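
Since the option is off by default, a sender may want to confirm that it
actually took effect. A minimal sketch, using the hypothetical helper name
``enable_qdisc_bypass()``, that sets the option and reads it back with
getsockopt()::

    #include <assert.h>
    #include <errno.h>
    #include <sys/socket.h>
    #include <linux/if_packet.h>

    /* Hypothetical helper: enable PACKET_QDISC_BYPASS on packet socket
     * @fd. Returns 1 if bypass is active afterwards, 0 if it is not,
     * and -1 with errno set if either socket option call fails. */
    static int enable_qdisc_bypass(int fd)
    {
        int one = 1, val = 0;
        socklen_t len = sizeof(val);

        if (setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
                       &one, sizeof(one)) < 0)
            return -1;
        if (getsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &val, &len) < 0)
            return -1;
        return val != 0;
    }

An error with errno set to ENOPROTOOPT would indicate a kernel that does not
know this option; a sender could then simply keep using the regular qdisc path.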

PACKET_TIMESTAMP
================

The PACKET_TIMESTAMP setting determines the source of the timestamp in
the packet meta information for mmap(2)ed RX_RING and TX_RING buffers. If your
NIC is capable of timestamping packets in hardware, you can request that those
hardware timestamps be used. Note: you may need to enable the generation
of hardware timestamps with SIOCSHWTSTAMP (see
Documentation/networking/timestamping.rst for details).

PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING::

    int req = SOF_TIMESTAMPING_RAW_HARDWARE;
    setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req));
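
Combining the SIOCSHWTSTAMP step mentioned above with the PACKET_TIMESTAMP
request might look like the following minimal sketch. It assumes that the
driver of ``ifname`` supports hardware RX timestamping of all packets
(``HWTSTAMP_FILTER_ALL``) and that the caller has CAP_NET_ADMIN;
``enable_hw_rx_timestamps()`` is a hypothetical helper name::

    #include <assert.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/if_packet.h>
    #include <linux/net_tstamp.h>
    #include <linux/sockios.h>

    /* Hypothetical helper: ask the driver of @ifname to timestamp all
     * received packets in hardware, then make packet socket @fd report
     * those raw hardware timestamps in its ring frames.
     * Returns 0 on success, -1 with errno set on failure. */
    static int enable_hw_rx_timestamps(int fd, const char *ifname)
    {
        struct hwtstamp_config cfg;
        struct ifreq ifr;
        int req = SOF_TIMESTAMPING_RAW_HARDWARE;

        memset(&cfg, 0, sizeof(cfg));
        cfg.tx_type = HWTSTAMP_TX_OFF;
        cfg.rx_filter = HWTSTAMP_FILTER_ALL;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (void *) &cfg;

        if (ioctl(fd, SIOCSHWTSTAMP, &ifr) < 0)
            return -1;

        return setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, &req, sizeof(req));
    }

The driver may write back a different rx_filter into the config; a fuller
version would inspect ``cfg.rx_filter`` after the ioctl. Either way, the
TP_STATUS_TS_* bits in each frame show which timestamp was actually reported.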

For the mmap(2)ed ring buffers, such timestamps are stored in the
``tpacket{,2,3}_hdr`` structure's ``tp_sec`` and ``tp_{n,u}sec`` members.
To determine what kind of timestamp has been reported, the ``tp_status`` field
is binary OR'ed with the following possible bits ...

::

    TP_STATUS_TS_RAW_HARDWARE
    TP_STATUS_TS_SOFTWARE

... which are equivalent to their ``SOF_TIMESTAMPING_*`` counterparts. For the
RX_RING, if neither bit is set (i.e. PACKET_TIMESTAMP is not set), then a
software fallback was invoked *within* PF_PACKET's processing code (less
precise).
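
A minimal sketch of decoding these bits on the receive side, assuming
TPACKET_V3 frames (``struct tpacket3_hdr``, whose timestamp is split into
``tp_sec`` and ``tp_nsec``). The enum and both helpers are hypothetical names
for illustration::

    #include <assert.h>
    #include <stdint.h>
    #include <time.h>
    #include <linux/if_packet.h>

    /* Hypothetical classification of an RX frame's timestamp source. */
    enum ts_source { TS_FALLBACK, TS_SOFTWARE, TS_RAW_HARDWARE };

    static enum ts_source rx_ts_source(uint32_t tp_status)
    {
        if (tp_status & TP_STATUS_TS_RAW_HARDWARE)
            return TS_RAW_HARDWARE;
        if (tp_status & TP_STATUS_TS_SOFTWARE)
            return TS_SOFTWARE;
        /* Neither bit set: PF_PACKET's own (less precise) fallback. */
        return TS_FALLBACK;
    }

    /* Hypothetical helper: TPACKET_V3 timestamp as a struct timespec. */
    static struct timespec rx_ts(const struct tpacket3_hdr *ppd)
    {
        struct timespec ts;

        ts.tv_sec = ppd->tp_sec;
        ts.tv_nsec = ppd->tp_nsec;
        return ts;
    }

In the capture example, such a check could be called from display() with
``ppd->tp_status`` before printing the timestamp.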

Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
ii) call sendto(), e.g. in blocking mode, iii) wait for the status of the
relevant frames to be updated, i.e. for the frames to be handed back to the
application, and iv) walk through the frames to pick up the individual hw/sw
timestamps.

Only (!) if transmit timestamping is enabled are these bits combined (binary
OR) with TP_STATUS_AVAILABLE, so your application must check for that: first
test whether the frame belongs to the application (e.g.
``!(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))``), and then
extract the type of timestamp from tp_status in a second step!
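
The two-step check and step iv) can be sketched as follows, assuming a
TPACKET_V2 TX_RING whose tp_block_size is a multiple of tp_frame_size, so that
frames are laid out back to back. ``enum tx_state``, ``tx_frame_state()`` and
``collect_tx_timestamps()`` are hypothetical names; only the ``TP_STATUS_*``
constants and ``struct tpacket2_hdr`` come from ``<linux/if_packet.h>``::

    #include <assert.h>
    #include <stdint.h>
    #include <time.h>
    #include <linux/if_packet.h>

    enum tx_state { TX_PENDING, TX_FREE, TX_WRONG_FORMAT };

    /* Step 1 (hypothetical helper): who owns the frame right now? */
    static enum tx_state tx_frame_state(uint32_t tp_status)
    {
        if (tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))
            return TX_PENDING;          /* still owned by the kernel */
        if (tp_status & TP_STATUS_WRONG_FORMAT)
            return TX_WRONG_FORMAT;     /* the kernel rejected the frame */
        return TX_FREE;                 /* back with the application */
    }

    /* Step iv (hypothetical helper): walk @frame_nr frames of the ring
     * mapped at @ring and copy up to @max hw/sw timestamps into @out.
     * Returns the number of timestamps copied. */
    static unsigned int collect_tx_timestamps(uint8_t *ring,
                                              unsigned int frame_nr,
                                              unsigned int frame_size,
                                              struct timespec *out,
                                              unsigned int max)
    {
        const uint32_t ts_bits = TP_STATUS_TS_SOFTWARE |
                                 TP_STATUS_TS_RAW_HARDWARE;
        unsigned int i, n = 0;

        for (i = 0; i < frame_nr && n < max; i++) {
            struct tpacket2_hdr *hdr =
                (struct tpacket2_hdr *) (ring + (size_t) i * frame_size);
            /* Acquire: read tp_sec/tp_nsec only after tp_status. */
            uint32_t status = __atomic_load_n(&hdr->tp_status,
                                              __ATOMIC_ACQUIRE);

            if (tx_frame_state(status) != TX_FREE)
                continue;
            /* Step 2: no TS bit means tp_sec/tp_nsec are not valid. */
            if (!(status & ts_bits))
                continue;
            out[n].tv_sec = hdr->tp_sec;
            out[n].tv_nsec = hdr->tp_nsec;
            n++;
        }
        return n;
    }

Frames that are still pending are simply skipped here; a real sender would
typically wait again and re-walk the ring.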

If you don't care about TX timestamps and therefore leave them disabled,
checking for TP_STATUS_AVAILABLE or TP_STATUS_WRONG_FORMAT is sufficient. If
only TP_STATUS_AVAILABLE is set for a TX_RING frame, then the tp_sec and
tp_{n,u}sec members do not contain a valid value. For TX_RINGs, no timestamp
is generated by default!

See include/linux/net_tstamp.h and Documentation/networking/timestamping.rst
for more information on hardware timestamps.

Miscellaneous bits
==================

- Packet sockets work well together with Linux socket filters, so you might
  also want to have a look at Documentation/networking/filter.rst
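
As an illustration, the following minimal sketch attaches a classic BPF filter
that keeps only IPv4 frames. It assumes an Ethernet device and a SOCK_RAW
packet socket, where the EtherType sits at byte offset 12;
``ipv4_only_code`` and ``attach_ipv4_only_filter()`` are hypothetical names,
while SO_ATTACH_FILTER and the ``BPF_*`` macros are described in filter.rst::

    #include <assert.h>
    #include <sys/socket.h>
    #include <linux/filter.h>
    #include <linux/if_ether.h>

    /* Hypothetical program: accept IPv4 frames, drop everything else. */
    static struct sock_filter ipv4_only_code[] = {
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),              /* A = EtherType */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1), /* IPv4? */
        BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF),               /* keep it all */
        BPF_STMT(BPF_RET | BPF_K, 0),                        /* drop */
    };

    /* Hypothetical helper: attach the program to socket @fd.
     * Returns 0 on success, -1 with errno set on failure. */
    static int attach_ipv4_only_filter(int fd)
    {
        struct sock_fprog prog = {
            .len = sizeof(ipv4_only_code) / sizeof(ipv4_only_code[0]),
            .filter = ipv4_only_code,
        };

        return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                          &prog, sizeof(prog));
    }

A packet socket created with a non-zero protocol such as htons(ETH_P_ALL) can
start receiving frames before bind(), so attaching the filter right after
socket() avoids queueing unfiltered packets in the ring.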

THANKS
======

   Jesse Brandeburg, for fixing my grammatical/spelling errors