Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

=============================
Kernel Connection Multiplexor
=============================

Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
interface over TCP for generic application protocols. With KCM an application
can efficiently send and receive application protocol messages over TCP using
datagram sockets.

KCM implements an NxM multiplexor in the kernel as diagrammed below::

    +------------+   +------------+   +------------+   +------------+
    | KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
    +------------+   +------------+   +------------+   +------------+
        |                 |               |                |
        +-----------+     |               |     +----------+
                    |     |               |     |
                +----------------------------------+
                |           Multiplexor            |
                +----------------------------------+
                    |   |           |           |  |
        +---------+   |           |           |  ------------+
        |             |           |           |              |
    +----------+  +----------+  +----------+  +----------+ +----------+
    |  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
    +----------+  +----------+  +----------+  +----------+ +----------+
        |              |           |            |             |
    +----------+  +----------+  +----------+  +----------+ +----------+
    | TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
    +----------+  +----------+  +----------+  +----------+ +----------+

KCM sockets
===========

The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
bound to a multiplexor are considered to have equivalent function, and I/O
operations in different sockets may be done in parallel without the need for
synchronization between threads in userspace.

Multiplexor
===========

The multiplexor provides the message steering. In the transmit path, messages
written on a KCM socket are sent atomically on an appropriate TCP socket.
Similarly, in the receive path, messages are constructed on each TCP socket
(Psock) and complete messages are steered to a KCM socket.

TCP sockets & Psocks
====================

TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
for each bound TCP socket; this structure holds the state for constructing
messages on receive as well as other connection-specific information for KCM.

Connected mode semantics
========================

Each multiplexor assumes that all attached TCP connections are to the same
destination and can use the different connections for load balancing when
transmitting. The normal send and recv calls (including sendmmsg and recvmmsg)
can be used to send and receive messages on the KCM socket.
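
As a rough sketch of what message-oriented I/O looks like from userspace, the
helpers below send and receive exactly one message per call. The names
send_one_message and recv_one_message are hypothetical, and nothing in them is
KCM specific, so they can be tried on any datagram-style socket (for example an
AF_UNIX socketpair) as a stand-in for a KCM socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Send one complete message. A short count is treated as an error because
 * the caller expects the message to go out as a single unit.
 * Returns 0 on success, -1 on failure with errno set.
 */
int send_one_message(int fd, const void *msg, size_t len)
{
    ssize_t n = send(fd, msg, len, 0);

    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        errno = EIO;
        return -1;
    }
    return 0;
}

/*
 * Receive one message into buf. If the socket reports that the message was
 * truncated (MSG_TRUNC in msg_flags), fail with EMSGSIZE instead of handing
 * back a partial message. Returns the message length or -1.
 */
ssize_t recv_one_message(int fd, void *buf, size_t cap)
{
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n = recvmsg(fd, &mh, 0);

    if (n >= 0 && (mh.msg_flags & MSG_TRUNC)) {
        errno = EMSGSIZE;
        return -1;
    }
    return n;
}
```

Checking for truncation is a design choice of this sketch: it keeps the
"one call, one whole message" contract visible to the caller.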

Socket types
============

KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.

Message delineation
-------------------

Messages are sent over a TCP stream with some application protocol message
format that typically includes a header which frames the messages. The length
of a received message can be deduced from the application protocol header
(often just a simple length field).

A TCP stream must be parsed to determine message boundaries. Berkeley Packet
Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor, a
BPF program must be specified. The program is called at the start of receiving
a new message and is given an skbuff that contains the bytes received so far.
It parses the message header and returns the length of the message. Given this
information, KCM will construct the message of the stated length and deliver it
to a KCM socket.
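
The framing logic of such a parser can be prototyped in plain C before it is
ported to BPF. The sketch below assumes a hypothetical wire format in which
every message starts with a 4-byte big-endian length field that counts only the
payload. The name frame_length and its return convention are illustrative and
are not a kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 4   /* assumed header: 4-byte big-endian payload length */

/*
 * Return the total length (header plus payload) of the message that starts
 * at buf, 0 if fewer than HDR_LEN bytes are available so far, or -1 if the
 * total would not fit in a positive int.
 */
int frame_length(const unsigned char *buf, size_t avail)
{
    uint32_t payload;

    if (avail < HDR_LEN)
        return 0;

    payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
              ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    if (payload > INT32_MAX - HDR_LEN)
        return -1;

    return (int)(payload + HDR_LEN);
}
```

For a buffer holding the bytes 00 00 00 05 followed by "hello", the function
reports a total message length of 9.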

TCP socket management
---------------------

When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and
write space available (POLLOUT) events are handled by the multiplexor. If there
is a state change (disconnection) or other error on a TCP socket, an error is
posted on the TCP socket so that a POLLERR event happens and KCM discontinues
using the socket. When the application gets the error notification for a
TCP socket, it should unattach the socket from KCM and then handle the error
condition (the typical response is to close the socket and create a new
connection if necessary).

KCM limits the maximum receive message size to the size of the receive
socket buffer on the attached TCP socket (the socket buffer size can be set by
SO_RCVBUF). If the length of a new message reported by the BPF program is
greater than this limit, a corresponding error (EMSGSIZE) is posted on the TCP
socket. The BPF program may also enforce a maximum message size and report an
error when it is exceeded.

A timeout may be set for assembling messages on a receive socket. The timeout
value is taken from the receive timeout of the attached TCP socket (this is set
by SO_RCVTIMEO). If the timer expires before assembly is complete, an error
(ETIMEDOUT) is posted on the socket.
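
Both the size limit and the assembly timeout come from ordinary socket options
on the TCP socket. The sketch below sets them in one place, on the assumption
that the application configures the socket before attaching it. The helper
name prepare_transport and its parameters are hypothetical.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Configure a TCP socket that is about to be attached to a multiplexor.
 * rcvbuf_bytes is the requested receive buffer size (SO_RCVBUF) and
 * timeout_sec is the receive timeout in seconds (SO_RCVTIMEO).
 * Returns 0 on success, -1 on failure with errno set.
 */
int prepare_transport(int tcpfd, int rcvbuf_bytes, long timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    if (setsockopt(tcpfd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
        return -1;

    return setsockopt(tcpfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

Linux may adjust the requested buffer size, so an application that depends on
the exact limit can read the effective value back with getsockopt.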

User interface
==============

Creating a multiplexor
----------------------

A new multiplexor and initial KCM socket are created by a socket call::

  socket(AF_KCM, type, protocol)

- type is either SOCK_DGRAM or SOCK_SEQPACKET
- protocol is KCMPROTO_CONNECTED

Cloning KCM sockets
-------------------

After the first KCM socket is created using the socket call as described
above, additional sockets for the multiplexor can be created by cloning
a KCM socket. This is accomplished by an ioctl on a KCM socket::

  /* From linux/kcm.h */
  struct kcm_clone {
        int fd;
  };

  struct kcm_clone info;
  int err, newkcmfd = -1;

  memset(&info, 0, sizeof(info));

  /* On success, info.fd holds the descriptor of the new KCM socket */
  err = ioctl(kcmfd, SIOCKCMCLONE, &info);

  if (!err)
    newkcmfd = info.fd;

Attach transport sockets
------------------------

Attaching of transport sockets to a multiplexor is performed by calling an
ioctl on a KCM socket for the multiplexor, e.g.::

  /* From linux/kcm.h */
  struct kcm_attach {
        int fd;
        int bpf_fd;
  };

  struct kcm_attach info;
  int err;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;            /* TCP socket to attach */
  info.bpf_fd = bpf_prog_fd;  /* BPF program that delineates messages */

  err = ioctl(kcmfd, SIOCKCMATTACH, &info);

The kcm_attach structure contains:

  - fd: file descriptor for the TCP socket being attached
  - bpf_fd: file descriptor for the compiled BPF program that has been loaded

Unattach transport sockets
--------------------------

Unattaching a transport socket from a multiplexor is straightforward. An
"unattach" ioctl is done with the kcm_unattach structure as the argument::

  /* From linux/kcm.h */
  struct kcm_unattach {
        int fd;
  };

  struct kcm_unattach info;
  int err;

  memset(&info, 0, sizeof(info));

  info.fd = tcpfd;    /* TCP socket to unattach */

  /* kcmfd is a KCM socket on the multiplexor the TCP socket is attached to */
  err = ioctl(kcmfd, SIOCKCMUNATTACH, &info);

Disabling receive on KCM socket
-------------------------------

A setsockopt is used to disable or enable receiving on a KCM socket.
When receive is disabled, any pending messages in the socket's
receive buffer are moved to other sockets. This feature is useful
if an application thread knows that it will be doing a lot of
work on a request and won't be able to service new messages for a
while. Example use::

  int val = 1;    /* non-zero disables receive */

  setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val));

BPF programs for message delineation
------------------------------------

BPF programs can be compiled using the BPF LLVM backend. For example,
the BPF program for parsing Thrift is::

  #include "bpf.h" /* for __sk_buff */
  #include "bpf_helpers.h" /* for load_word intrinsic */

  SEC("socket_kcm")
  int bpf_prog1(struct __sk_buff *skb)
  {
       /* 32-bit length field at offset 0, plus the 4 bytes of the field */
       return load_word(skb, 0) + 4;
  }

  char _license[] SEC("license") = "GPL";

Use in applications
===================

KCM accelerates application layer protocols. Specifically, it allows
applications to use a message-based interface for sending and receiving
messages. The kernel provides necessary assurances that messages are sent
and received atomically. This relieves much of the burden applications have
in mapping a message-based protocol onto the TCP stream. KCM also makes
application layer messages a unit of work in the kernel for the purposes of
steering and scheduling, which in turn allows a simpler networking model in
multithreaded applications.

Configurations
--------------

In an Nx1 configuration, KCM logically provides multiple socket handles
to the same TCP connection. This allows parallelism between I/O
operations on the TCP socket (for instance copyin and copyout of data is
parallelized). In an application, a KCM socket can be opened for each
processing thread and added to an epoll set (similar to how SO_REUSEPORT
is used to allow multiple listener sockets on the same port).

In an MxN configuration, multiple connections are established to the
same destination. These are used for simple load balancing.

Message batching
----------------

The primary purpose of KCM is load balancing between KCM sockets and hence
threads in a nominal use case. Perfect load balancing, that is, steering
each received message to a different KCM socket or steering each sent
message to a different TCP socket, can negatively impact performance
since this doesn't allow for affinities to be established. Balancing
based on groups, or batches of messages, can be beneficial for performance.

On transmit, there are three ways an application can batch (pipeline)
messages on a KCM socket.

  1) Send multiple messages in a single sendmmsg.
  2) Send a group of messages each with a sendmsg call, where all messages
     except the last have MSG_BATCH in the flags of the sendmsg call.
  3) Create a "super message" composed of multiple messages and send this
     with a single sendmsg.
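
The second method in the list above can be sketched as a loop that sets
MSG_BATCH on every message except the last. The helper name send_batch is
hypothetical, and the fallback definition of MSG_BATCH is an assumption for C
libraries that do not expose the flag.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef MSG_BATCH
#define MSG_BATCH 0x40000   /* value used by the Linux socket headers */
#endif

/*
 * Send count messages with one sendmsg() call each. Every message except
 * the last carries MSG_BATCH to signal that more messages follow.
 * Returns 0 on success, or -1 on the first failed send with errno set.
 */
int send_batch(int fd, const struct iovec *msgs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        struct msghdr mh = {
            .msg_iov = (struct iovec *)&msgs[i],
            .msg_iovlen = 1,
        };
        int flags = (i + 1 < count) ? MSG_BATCH : 0;

        if (sendmsg(fd, &mh, flags) < 0)
            return -1;
    }
    return 0;
}
```

Each iovec entry describes one complete application message, so the receiver
still sees the same message boundaries as with unbatched sends.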

On receive, the KCM module attempts to queue messages received on the
same KCM socket during each TCP ready callback. The targeted KCM socket
changes at each receive ready callback on the KCM socket. The application
does not need to configure this.

Error handling
--------------

An application should include a thread to monitor errors raised on
the TCP connection. Normally, this will be done by placing each
TCP socket attached to a KCM multiplexor in an epoll set for the POLLERR
event. If an error occurs on an attached TCP socket, KCM sets EPIPE
on the socket, thus waking up the application thread. When the application
sees the error (which may just be a disconnect) it should unattach the
socket from KCM and then close it. It is assumed that once an error is
posted on the TCP socket the data stream is unrecoverable (i.e. an error
may have occurred in the middle of receiving a message).

TCP connection monitoring
-------------------------

In KCM there is no means to correlate a message to the TCP socket that
was used to send or receive the message (except in the case there is
only one attached TCP socket). However, the application does retain
an open file descriptor to the socket so it will be able to get statistics
from the socket which can be used in detecting issues (such as high
retransmissions on the socket).