.. SPDX-License-Identifier: GPL-2.0

====
XFRM
====

The sync patches are based on initial patches from
Krisztian <hidden@balabit.hu> and others, and on additional patches
from Jamal <hadi@cyberus.ca>.

The end goal of syncing is to be able to insert attributes and generate
events so that an SA can be safely moved from one machine to another
for high-availability (HA) purposes.
The idea is to synchronize the SA so that the takeover machine, if it
has access to it, can process the SA as accurately as possible.

We already have the ability to generate SA add/del/upd events.
These patches add the ability to sync and keep accurate lifetime byte
counters (to ensure proper decay of SAs) and replay counters (to avoid
replay attacks), with as little loss as possible at failover time.
This way a backup stays as up to date as an active member.

Because the above items change for every packet the SA receives,
a large number of events could be generated.
For this reason, we also add a Nagle-like algorithm to restrict
the events, i.e. we set thresholds that say "let me know if the
replay sequence threshold is reached or 10 secs have passed".
These thresholds are set system-wide via sysctls or can be updated
per SA.

The identified items that need to be synchronized are:

- the lifetime byte counter

  Note that the lifetime time limit is not important if you assume the
  failover machine is known ahead of time, since the decay of the time
  countdown is not driven by packet arrival.

- the replay sequence for both inbound and outbound

1) Message Structure
----------------------

A message is laid out as ``nlmsghdr:aevent_id:optional-TLVs``, i.e. a
netlink header, followed by the aevent_id structure, followed by
optional TLVs.

The netlink message types are XFRM_MSG_NEWAE and XFRM_MSG_GETAE.

A XFRM_MSG_GETAE does not have TLVs.

A XFRM_MSG_NEWAE will have at least two TLVs (as discussed further
below).

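To make the ``nlmsghdr:aevent_id:optional-TLVs`` layout concrete, here is a minimal sketch that computes the size of a XFRM_MSG_NEWAE carrying only the two value TLVs, using the alignment macros from the Linux UAPI headers. It assumes that XFRMA_LTIME_VAL carries a struct xfrm_lifetime_cur, that XFRMA_REPLAY_VAL carries a struct xfrm_replay_state, and that XFRM TLVs share the struct rtattr layout; newae_min_len() is a hypothetical name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>

/*
 * Hypothetical helper: size of a XFRM_MSG_NEWAE laid out as
 * nlmsghdr : xfrm_aevent_id : XFRMA_LTIME_VAL : XFRMA_REPLAY_VAL.
 * NLMSG_SPACE() and RTA_SPACE() include the 4-byte netlink padding.
 */
size_t newae_min_len(void)
{
	size_t len = NLMSG_SPACE(sizeof(struct xfrm_aevent_id));

	len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur)); /* XFRMA_LTIME_VAL */
	len += RTA_SPACE(sizeof(struct xfrm_replay_state)); /* XFRMA_REPLAY_VAL */
	return len;
}
```

Treat the result as a lower bound when sizing a receive buffer: optional TLVs such as the two thresholds each add RTA_SPACE(sizeof(__u32)), and the kernel may append further attributes.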
The aevent_id structure looks like::

   struct xfrm_aevent_id {
           struct xfrm_usersa_id           sa_id;
           xfrm_address_t                  saddr;
           __u32                           flags;
           __u32                           reqid;
   };

An SA is uniquely identified by the combination of xfrm_usersa_id,
reqid and saddr.

The flags field is used to indicate different things. The possible
flags are::

   XFRM_AE_RTHR=1, /* replay threshold */
   XFRM_AE_RVAL=2, /* replay value */
   XFRM_AE_LVAL=4, /* lifetime value */
   XFRM_AE_ETHR=8, /* expiry timer threshold */
   XFRM_AE_CR=16, /* Event cause is replay update */
   XFRM_AE_CE=32, /* Event cause is timer expiry */
   XFRM_AE_CU=64, /* Event cause is policy update */

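Because a kernel-generated event is expected to carry a single cause bit, a listener can classify it by masking the three cause flags. The sketch below takes the XFRM_AE_* values from <linux/xfrm.h>; aevent_cause() is a hypothetical helper, not part of any library.

```c
#include <linux/xfrm.h>

/*
 * Hypothetical helper: return the cause bit (XFRM_AE_CR, XFRM_AE_CE or
 * XFRM_AE_CU) found in xfrm_aevent_id.flags, 0 if no cause bit is set,
 * or -1 if more than one is set.
 */
int aevent_cause(__u32 flags)
{
	__u32 cause = flags & (XFRM_AE_CR | XFRM_AE_CE | XFRM_AE_CU);

	if (cause == 0)
		return 0;
	if (cause & (cause - 1)) /* more than one bit set */
		return -1;
	return (int)cause;
}
```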
How these flags are used depends on the direction of the
message (kernel<->user) as well as the cause (config, query or event).
This is described below for the different messages.

The pid will be set appropriately in netlink to indicate the direction
(0 towards the kernel, and pid = the ID of the process that created
the event when going from kernel to user space).

A program needs to subscribe to the multicast group XFRMNLGRP_AEVENTS
to get notified of these events.

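A minimal sketch of such a subscription, assuming a Linux system with the UAPI headers installed: it opens a NETLINK_XFRM socket and joins XFRMNLGRP_AEVENTS through the NETLINK_ADD_MEMBERSHIP socket option. aevent_listen() is a hypothetical name; joining the group may require CAP_NET_ADMIN, so the function reports failure instead of aborting.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/xfrm.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* fallback for C libraries that lack it */
#endif

/* Returns a NETLINK_XFRM socket joined to XFRMNLGRP_AEVENTS, or -1. */
int aevent_listen(void)
{
	struct sockaddr_nl local = { .nl_family = AF_NETLINK };
	int group = XFRMNLGRP_AEVENTS;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);

	if (fd < 0)
		return -1;
	/* nl_pid 0: let the kernel assign this socket's port ID. */
	if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
	    setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) < 0) {
		close(fd); /* e.g. EPERM without sufficient privileges */
		return -1;
	}
	return fd;
}
```

Events would then be read with recv() on the returned descriptor.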
2) TLVs reflect the different parameters:
-----------------------------------------

a) byte value (XFRMA_LTIME_VAL)

This TLV carries the running/current counter for byte lifetime since
the last event.

b) replay value (XFRMA_REPLAY_VAL)

This TLV carries the running/current counter for the replay sequence
since the last event.

c) replay threshold (XFRMA_REPLAY_THRESH)

This TLV carries the threshold being used by the kernel to trigger events
when the replay sequence threshold is exceeded.

d) expiry timer (XFRMA_ETIMER_THRESH)

This is a timer value in milliseconds which is used as the Nagle
value to rate-limit the events.

3) Default configurations for the parameters:
---------------------------------------------

By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
group XFRMNLGRP_AEVENTS.

Programs installing SAs will need to specify the two thresholds. However,
in order not to change existing applications such as racoon, we also
provide default threshold values for these parameters in case they are
not specified.

The two sysctls/proc entries are:

a) /proc/sys/net/core/xfrm_aevent_etime
used to provide the default value for XFRMA_ETIMER_THRESH, in units
of 100ms. The default is 10 (1 second).

b) /proc/sys/net/core/xfrm_aevent_rseqth
used to provide the default value for the XFRMA_REPLAY_THRESH parameter,
as a packet count. The default is two packets.

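The sketch below reads these defaults from user space and converts the etime value to milliseconds. It assumes the entries are exposed as /proc/sys/net/core/xfrm_aevent_etime and /proc/sys/net/core/xfrm_aevent_rseqth (net.core.xfrm_aevent_etime and net.core.xfrm_aevent_rseqth); read_sysctl_uint() and etime_to_ms() are hypothetical helpers.

```c
#include <stdio.h>

/* Assumed locations of the two sysctls (net.core.xfrm_aevent_*). */
#define AEVENT_ETIME_PATH  "/proc/sys/net/core/xfrm_aevent_etime"
#define AEVENT_RSEQTH_PATH "/proc/sys/net/core/xfrm_aevent_rseqth"

/* Read one unsigned value from a /proc/sys file; 0 on success, -1 on error. */
int read_sysctl_uint(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* The etime sysctl counts 100 ms units, so the default of 10 is 1000 ms. */
unsigned int etime_to_ms(unsigned int etime)
{
	return etime * 100;
}
```

On a system still using the defaults, reading AEVENT_RSEQTH_PATH this way would be expected to yield 2.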
4) Message types
----------------

a) XFRM_MSG_GETAE is issued by user-->kernel.
XFRM_MSG_GETAE does not carry any TLVs.

The response is a XFRM_MSG_NEWAE which is formatted based on what
XFRM_MSG_GETAE queried for.

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs:

* if the XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
* if the XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved

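A sketch of such a query, assuming an IPv4 ESP SA and the structures from <linux/xfrm.h>. struct getae_req and build_getae() are hypothetical; setting XFRM_AE_RTHR and XFRM_AE_ETHR in flags requests both threshold TLVs in the reply.

```c
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/xfrm.h>

/* Hypothetical request layout: XFRM_MSG_GETAE carries no TLVs. */
struct getae_req {
	struct nlmsghdr nlh;
	struct xfrm_aevent_id id;
};

/* Query an IPv4 ESP SA; spi and addresses are in network byte order. */
void build_getae(struct getae_req *req, __be32 spi, __be32 daddr,
		 __be32 saddr, __u32 reqid)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->id));
	req->nlh.nlmsg_type = XFRM_MSG_GETAE;
	req->nlh.nlmsg_flags = NLM_F_REQUEST;
	req->id.sa_id.daddr.a4 = daddr;
	req->id.sa_id.spi = spi;
	req->id.sa_id.family = AF_INET;
	req->id.sa_id.proto = IPPROTO_ESP;
	req->id.saddr.a4 = saddr;
	req->id.reqid = reqid;
	/* Ask for XFRMA_REPLAY_THRESH and XFRMA_ETIMER_THRESH as well. */
	req->id.flags = XFRM_AE_RTHR | XFRM_AE_ETHR;
}
```

The request would be sent with sendto() on a NETLINK_XFRM socket to the kernel (a struct sockaddr_nl with nl_pid 0); the answer arrives as a XFRM_MSG_NEWAE.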
b) XFRM_MSG_NEWAE is issued either by user space to configure an SA,
or by the kernel to announce events or respond to a XFRM_MSG_GETAE.

i) user --> kernel to configure a specific SA.

Any of the values or threshold parameters can be updated by passing the
appropriate TLV.

A response is issued back to the sender in user space to indicate success
or failure.

In the case of success, an event with XFRM_MSG_NEWAE is additionally
issued to any listeners, as described in iii).

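A sketch of this configuration case, building a XFRM_MSG_NEWAE that sets both thresholds on an existing SA. nl_add_attr() and build_newae_thresholds() are hypothetical helpers. The sketch assumes that XFRM TLVs use the struct rtattr layout and, based on our reading of the kernel's xfrm_user code, that the kernel insists on NLM_F_REPLACE for this message; NLM_F_ACK asks for the success/failure response mentioned above.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>

/* Append one TLV after the current end of the message; -1 if it won't fit. */
int nl_add_attr(struct nlmsghdr *nlh, size_t cap, unsigned short type,
		const void *data, unsigned short len)
{
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	struct rtattr *rta;

	if (off + RTA_SPACE(len) > cap)
		return -1;
	rta = (struct rtattr *)((char *)nlh + off);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = off + RTA_SPACE(len);
	return 0;
}

/*
 * Build a XFRM_MSG_NEWAE setting both thresholds on the SA named by *sa.
 * buf must be suitably aligned for struct nlmsghdr; it is zeroed here.
 */
int build_newae_thresholds(void *buf, size_t cap,
			   const struct xfrm_usersa_id *sa,
			   __u32 rseq_thresh, __u32 etimer)
{
	struct nlmsghdr *nlh = buf;
	struct xfrm_aevent_id *id;

	if (cap < NLMSG_SPACE(sizeof(*id)))
		return -1;
	memset(buf, 0, cap);
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*id));
	nlh->nlmsg_type = XFRM_MSG_NEWAE;
	/* Assumption: the kernel rejects NEWAE updates without NLM_F_REPLACE. */
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_ACK;
	id = NLMSG_DATA(nlh);
	id->sa_id = *sa;

	if (nl_add_attr(nlh, cap, XFRMA_REPLAY_THRESH, &rseq_thresh, sizeof(rseq_thresh)) ||
	    nl_add_attr(nlh, cap, XFRMA_ETIMER_THRESH, &etimer, sizeof(etimer)))
		return -1;
	return 0;
}
```

The etimer argument is copied verbatim into XFRMA_ETIMER_THRESH; only the TLVs present are meant to change the SA.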
ii) kernel->user direction as a response to XFRM_MSG_GETAE

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

The threshold TLVs will be included if explicitly requested in
the XFRM_MSG_GETAE message.

iii) kernel->user to report an event if someone sets any values or
thresholds for an SA using XFRM_MSG_NEWAE (as described in i) above).
In such a case the XFRM_AE_CU flag is set to inform the user that
the change happened as a result of an update.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

iv) kernel->user to report an event when the replay threshold or a
timeout is exceeded.

In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note that the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

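A sketch of how a listener might extract the two mandatory TLVs from such a XFRM_MSG_NEWAE, assuming the struct rtattr TLV layout and the RTA_* macros from <linux/rtnetlink.h>. struct aevent_info and parse_newae() are hypothetical; SAs using extended sequence numbers may report XFRMA_REPLAY_ESN_VAL instead of XFRMA_REPLAY_VAL, which this sketch does not handle.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/xfrm.h>

/* Hypothetical summary of one XFRM_MSG_NEWAE event. */
struct aevent_info {
	__u32 flags;                     /* from xfrm_aevent_id */
	struct xfrm_lifetime_cur ltime;  /* XFRMA_LTIME_VAL */
	struct xfrm_replay_state replay; /* XFRMA_REPLAY_VAL */
	int have_ltime, have_replay;
};

/* Returns 0 if nlh is a NEWAE carrying both value TLVs, -1 otherwise. */
int parse_newae(const struct nlmsghdr *nlh, struct aevent_info *out)
{
	const struct xfrm_aevent_id *id;
	const struct rtattr *rta;
	int len;

	if (nlh->nlmsg_type != XFRM_MSG_NEWAE ||
	    nlh->nlmsg_len < NLMSG_SPACE(sizeof(*id)))
		return -1;
	memset(out, 0, sizeof(*out));
	id = NLMSG_DATA(nlh);
	out->flags = id->flags;

	/* The TLVs start right after the 4-byte aligned xfrm_aevent_id. */
	rta = (const struct rtattr *)((const char *)id + NLMSG_ALIGN(sizeof(*id)));
	len = (int)(nlh->nlmsg_len - NLMSG_SPACE(sizeof(*id)));
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		size_t plen = RTA_PAYLOAD(rta);

		if (rta->rta_type == XFRMA_LTIME_VAL && plen >= sizeof(out->ltime)) {
			memcpy(&out->ltime, RTA_DATA(rta), sizeof(out->ltime));
			out->have_ltime = 1;
		} else if (rta->rta_type == XFRMA_REPLAY_VAL &&
			   plen >= sizeof(out->replay)) {
			memcpy(&out->replay, RTA_DATA(rta), sizeof(out->replay));
			out->have_replay = 1;
		}
		/* Other TLVs (thresholds, marks, ...) are skipped. */
	}
	return (out->have_ltime && out->have_replay) ? 0 : -1;
}
```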
Exceptions to threshold settings
--------------------------------

If an SA receives traffic in bursts, such that the timer threshold
expires during a period in which no packets are seen, the following odd
behavior occurs: the first packet to arrive after the timer expiry
triggers a timeout event, i.e. we don't wait for a timeout period or for
the packet threshold to be reached. This is done for simplicity and
efficiency reasons.

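The throttling and this exception can be illustrated with a small user-space model of one SA. This is a simplified sketch, not the kernel's code: all names are hypothetical, timer re-arming is left out, and the return values merely reuse the numeric values of XFRM_AE_CR (16) and XFRM_AE_CE (32).

```c
#include <stdbool.h>
#include <stdint.h>

/* Return values reuse the numeric values of XFRM_AE_CR and XFRM_AE_CE. */
enum { EV_NONE = 0, EV_REPLAY = 16, EV_TIMEOUT = 32 };

struct ae_model {
	uint32_t seq;         /* current replay sequence of the SA */
	uint32_t last_seq;    /* sequence reported by the previous event */
	uint32_t rseq_thresh; /* replay threshold, e.g. the rseqth default */
	bool time_defer;      /* timer expired while there was nothing to report */
};

/* Record that an event went out; timer re-arming is not modelled. */
static int ae_emit(struct ae_model *m, int cause)
{
	m->last_seq = m->seq;
	m->time_defer = false;
	return cause;
}

/* One packet advanced the replay sequence. */
int ae_on_packet(struct ae_model *m)
{
	m->seq++;
	if (m->seq - m->last_seq >= m->rseq_thresh)
		return ae_emit(m, EV_REPLAY);
	if (m->time_defer) /* first packet after an idle expiry */
		return ae_emit(m, EV_TIMEOUT);
	return EV_NONE;
}

/* The expiry timer (XFRMA_ETIMER_THRESH) fired. */
int ae_on_timer(struct ae_model *m)
{
	if (m->seq == m->last_seq) { /* no packets since the last event */
		m->time_defer = true;
		return EV_NONE;
	}
	return ae_emit(m, EV_TIMEOUT);
}
```

With rseq_thresh = 2, a timer expiry on an idle SA followed by a single packet yields EV_TIMEOUT immediately; the next two packets then yield EV_NONE and EV_REPLAY.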
-JHS