Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

3 Commits   0 Branches   0 Tags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. SPDX-License-Identifier: GPL-2.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) LC-trie implementation notes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) Node types
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) ----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) leaf
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 	An end node with data. This has a copy of the relevant key, along
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) 	with 'hlist' with routing table entries sorted by prefix length.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) 	See struct leaf and struct leaf_info.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) trie node or tnode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 	An internal node, holding an array of child (leaf or tnode) pointers,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) 	indexed	through a subset of the key. See Level Compression.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) A few concepts explained
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) Bits (tnode)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) 	The number of bits in the key segment used for indexing into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 	child array - the "child index". See Level Compression.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) Pos (tnode)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) 	The position (in the key) of the key segment used for indexing into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) 	the child array. See Path Compression.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) Path Compression / skipped bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 	Any given tnode is linked to from the child array of its parent, using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) 	a segment of the key specified by the parent's "pos" and "bits"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 	In certain cases, this tnode's own "pos" will not be immediately
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 	adjacent to the parent (pos+bits), but there will be some bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 	in the key skipped over because they represent a single path with no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 	deviations. These "skipped bits" constitute Path Compression.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 	Note that the search algorithm will simply skip over these bits when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 	searching, making it necessary to save the keys in the leaves to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 	verify that they actually do match the key we are searching for.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) Level Compression / child arrays
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 	the trie is kept level balanced moving, under certain conditions, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) 	children of a full child (see "full_children") up one level, so that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 	instead of a pure binary tree, each internal node ("tnode") may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) 	contain an arbitrarily large array of links to several children.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 	Conversely, a tnode with a mostly empty	child array (see empty_children)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) 	may be "halved", having some of its children moved downwards one level,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) 	in order to avoid ever-increasing child arrays.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) empty_children
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 	the number of positions in the child array of a given tnode that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) 	NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) full_children
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 	the number of children of a given tnode that aren't path compressed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 	(in other words, they aren't NULL or leaves and their "pos" is equal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) 	to this	tnode's "pos"+"bits").
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) 	(The word "full" here is used more in the sense of "complete" than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 	as the opposite of "empty", which might be a tad confusing.)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) Comments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) We have tried to keep the structure of the code as close to fib_hash as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) possible to allow verification and help up reviewing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) fib_find_node()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) 	A good start for understanding this code. This function implements a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 	straightforward trie lookup.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) fib_insert_node()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 	Inserts a new leaf node in the trie. This is bit more complicated than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) 	fib_find_node(). Inserting a new node means we might have to run the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 	level compression algorithm on part of the trie.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) trie_leaf_remove()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 	Looks up a key, deletes it and runs the level compression algorithm.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) trie_rebalance()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) 	The key function for the dynamic trie after any change in the trie
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 	it is run to optimize and reorganize. It will walk the trie upwards
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) 	towards the root from a given tnode, doing a resize() at each step
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 	to implement level compression.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) resize()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 	Analyzes a tnode and optimizes the child array size by either inflating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 	or shrinking it repeatedly until it fulfills the criteria for optimal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 	level compression. This part follows the original paper pretty closely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 	and there may be some room for experimentation here.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) inflate()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 	Doubles the size of the child array within a tnode. Used by resize().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) halve()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 	Halves the size of the child array within a tnode - the inverse of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) 	inflate(). Used by resize();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 	The route manipulation functions. Should conform pretty closely to the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) 	corresponding functions in fib_hash.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) fn_trie_flush()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	This walks the full trie (using nextleaf()) and searches for empty
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 	leaves which have to be removed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) fn_trie_dump()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 	Dumps the routing table ordered by prefix length. This is somewhat
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 	slower than the corresponding fib_hash function, as we have to walk the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 	entire trie for each prefix length. In comparison, fib_hash is organized
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 	as one "zone"/hash per prefix length.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) Locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) -------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) fib_lock is used for an RW-lock in the same way that this is done in fib_hash.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) However, the functions are somewhat separated for other possible locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) scenarios. It might conceivably be possible to run trie_rebalance via RCU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) to avoid read_lock in the fn_trie_lookup() function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Main lookup mechanism
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) fn_trie_lookup() is the main lookup function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) The lookup is in its simplest form just like fib_find_node(). We descend the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) trie, key segment by key segment, until we find a leaf. check_leaf() does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) the fib_semantic_match in the leaf's sorted prefix hlist.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) If we find a match, we are done.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) If we don't find a match, we enter prefix matching mode. The prefix length,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) starting out at the same as the key length, is reduced one step at a time,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) and we backtrack upwards through the trie trying to find a longest matching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) prefix. The goal is always to reach a leaf and get a positive result from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) fib_semantic_match mechanism.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) Inside each tnode, the search for longest matching prefix consists of searching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) through the child array, chopping off (zeroing) the least significant "1" of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) the child index until we find a match or the child index consists of nothing but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) zeros.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) chop off part of the key in order to find the longest matching prefix.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) At this point we will repeatedly descend subtries to look for a match, and there
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) are some optimizations available that can provide us with "shortcuts" to avoid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) To alleviate any doubts about the correctness of the route selection process,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) gives userland access to fib_lookup().