===========================
SipHash - a short input PRF
===========================

:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>

SipHash is a cryptographically secure PRF -- a keyed hash function -- that
performs very well for short inputs, hence the name. It was designed by
cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
as a replacement for some uses of `jhash`, `md5_transform`, `sha1_transform`,
and so forth.

SipHash takes a secret key filled with randomly generated numbers and either
an input buffer or several input integers. It spits out a 64-bit integer that
is indistinguishable from random to anyone who does not know the key. You may
then use that integer as part of secure sequence numbers, secure cookies, or
mask it off for use in a hash table.
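
As a rough userspace illustration (plain C, not kernel code), masking works as a bucket selector only when the table size is a power of two. In that case `h & (n - 1)` keeps the low bits of the output and equals `h % n`. The helper name `bucket_index()` is hypothetical, and the value used below is an arbitrary stand-in rather than a real SipHash output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: reduce a 64-bit hash output to a bucket index.
 * Masking is only a valid reduction when nbuckets is a power of two.
 */
static inline uint64_t bucket_index(uint64_t h, uint64_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	/* Keep the low log2(nbuckets) bits; identical to h % nbuckets here. */
	return h & (nbuckets - 1);
}
```

For example, `bucket_index(0x0123456789abcdefULL, 256)` returns `0xef` (239), which is the same as `0x0123456789abcdefULL % 256`.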

Generating a key
================

Keys should always be generated from a cryptographically secure source of
random numbers, either using get_random_bytes or get_random_once::

	siphash_key_t key;
	get_random_bytes(&key, sizeof(key));

If you're not deriving your key from here, you're doing it wrong.
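
Userspace has no get_random_bytes(). A minimal analog, assuming Linux with glibc 2.25 or later, fills the key from the kernel's CSPRNG via getrandom(2). The `demo_siphash_key_t` type and `demo_generate_key()` are hypothetical names. The two-u64 layout is assumed to mirror the kernel's siphash_key_t.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Assumed layout, mirroring the kernel's siphash_key_t: two 64-bit words. */
typedef struct {
	uint64_t key[2];
} demo_siphash_key_t;

_Static_assert(sizeof(demo_siphash_key_t) == 16, "SipHash keys are 128 bits");

/*
 * Userspace stand-in for get_random_bytes(&key, sizeof(key)): fill the key
 * from the kernel CSPRNG with getrandom(2). Returns 0 on success, -1 on error.
 */
static int demo_generate_key(demo_siphash_key_t *key)
{
	unsigned char *p = (unsigned char *)key;
	size_t left = sizeof(*key);

	while (left > 0) {
		ssize_t n = getrandom(p, left, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal: try again */
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	return 0;
}
```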

Using the functions
===================

There are two variants of the function. The first takes a buffer::

	u64 siphash(const void *data, size_t len, const siphash_key_t *key);

The second takes a list of integers::

	u64 siphash_1u64(u64, const siphash_key_t *key);
	u64 siphash_2u64(u64, u64, const siphash_key_t *key);
	u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key);
	u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key);
	u64 siphash_1u32(u32, const siphash_key_t *key);
	u64 siphash_2u32(u32, u32, const siphash_key_t *key);
	u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key);
	u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key);
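
One way to think about the integer variants is that they hash the little-endian encoding of their arguments. This sketch assumes that siphash_2u32(a, b, key) is equivalent to siphash_1u64((u64)b << 32 | a, key), which appears to be how the kernel header defines it. Under that assumption, the packed word consists of exactly the bytes that siphash() would read from an 8-byte buffer holding `a` and then `b` in little-endian order. The helper names are hypothetical. Nothing is hashed here; the code only checks the byte layout.

```c
#include <stdint.h>
#include <string.h>

/* Pack two 32-bit inputs as siphash_2u32() is assumed to: second in the high half. */
static uint64_t pack_2u32(uint32_t first, uint32_t second)
{
	return (uint64_t)second << 32 | first;
}

/* Write a 64-bit word as 8 little-endian bytes (how siphash() decodes each block). */
static void put_le64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* Write a 32-bit word as 4 little-endian bytes. */
static void put_le32(unsigned char out[4], uint32_t v)
{
	for (int i = 0; i < 4; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/* Returns 1 if the packed word and the two encoded inputs are the same 8 bytes. */
static int same_bytes_as_buffer(uint32_t first, uint32_t second)
{
	unsigned char packed[8], buf[8];

	put_le64(packed, pack_2u32(first, second));
	put_le32(buf, first);
	put_le32(buf + 4, second);
	return memcmp(packed, buf, sizeof(buf)) == 0;
}
```

On a little-endian machine, a `struct { u32 a; u32 b; }` has these same bytes in memory. On a big-endian machine it does not, so the struct form and the integer form should not be assumed interchangeable there.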

If you pass the generic siphash function a buffer whose length is a
compile-time constant matching one of the fixed-size variants, the call will
constant fold at compile-time and automatically use the corresponding
optimized function.
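
The folding trick can be sketched in userspace with the GCC/Clang builtin `__builtin_constant_p`. To keep the sketch short, 64-bit FNV-1a stands in for SipHash. FNV-1a is unkeyed and is not a PRF. The names `toy_hash()`, `toy_hash_bytes()` and `toy_hash_u64()` are hypothetical. The wrapper's shape is meant to resemble the kernel's dispatch, not to reproduce it.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Generic path: FNV-1a over an arbitrary byte buffer (stand-in for SipHash). */
static uint64_t toy_hash_bytes(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = FNV64_OFFSET;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV64_PRIME;
	}
	return h;
}

/* "Optimized" path: same result, but takes one word holding 8 little-endian bytes. */
static uint64_t toy_hash_u64(uint64_t w)
{
	uint64_t h = FNV64_OFFSET;

	for (int i = 0; i < 8; i++) {
		h ^= (w >> (8 * i)) & 0xff;
		h *= FNV64_PRIME;
	}
	return h;
}

/*
 * Dispatching wrapper: when len is a compile-time constant 8, the compiler
 * can drop the branch and call the word-sized variant directly.
 */
static inline uint64_t toy_hash(const void *data, size_t len)
{
	if (__builtin_constant_p(len) && len == 8) {
		const unsigned char *p = data;
		uint64_t w = 0;

		for (int i = 7; i >= 0; i--)
			w = w << 8 | p[i]; /* little-endian load */
		return toy_hash_u64(w);
	}
	return toy_hash_bytes(data, len);
}
```

With optimization enabled, a call like `toy_hash(buf, 8)` can compile straight to `toy_hash_u64()`. At `-O0` the builtin usually evaluates to 0 and the generic path runs instead. Both paths return the same value, so the choice only affects speed.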

Hashtable key function usage::

	struct some_hashtable {
		DECLARE_HASHTABLE(hashtable, 8);	/* 2^8 = 256 buckets */
		siphash_key_t key;			/* per-table secret key */
	};

	void init_hashtable(struct some_hashtable *table)
	{
		hash_init(table->hashtable);		/* start with empty buckets */
		get_random_bytes(&table->key, sizeof(table->key));
	}

	static inline struct hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
	{
		/* HASH_SIZE() is a power of two, so masking selects a bucket. */
		return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
	}

You may then iterate over the returned hash bucket as usual.

Security
========

SipHash has a very high security margin, with its 128-bit key. So long as the
key is kept secret, it is computationally infeasible for an attacker to predict
the outputs of the function, even after observing many of them, since
brute-forcing a 2^128 key space is far out of reach.

Linux implements the "2-4" variant of SipHash: two compression rounds per
64-bit message word and four finalization rounds.
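
To make the "2-4" naming concrete, below is a compact userspace reference of SipHash-c-d. It follows the published algorithm and takes the round counts as parameters. The name `siphash_cd()` is hypothetical, and this is not the kernel's implementation, which is unrolled and specialized per input size. With c = 2 and d = 4 it should reproduce the paper's test vector: key bytes 00..0f and the 15-byte message 00..0e give 0xa129ca6149be45e5.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: add-rotate-xor mixing of the four 64-bit state words. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = w << 8 | p[i];
	return w;
}

/* Absorb one 64-bit block m using c compression rounds. */
static void absorb(uint64_t v[4], uint64_t m, int c)
{
	v[3] ^= m;
	for (int r = 0; r < c; r++)
		sipround(v);
	v[0] ^= m;
}

/*
 * SipHash-c-d over data[0..len) with a 128-bit key given as two
 * little-endian 64-bit halves. siphash_cd(..., 2, 4) is SipHash-2-4.
 */
uint64_t siphash_cd(const void *data, size_t len, const uint64_t key[2],
		    int c, int d)
{
	const unsigned char *p = data;
	uint64_t v[4] = {
		key[0] ^ 0x736f6d6570736575ULL, /* ASCII "somepseu" */
		key[1] ^ 0x646f72616e646f6dULL, /* ASCII "dorandom" */
		key[0] ^ 0x6c7967656e657261ULL, /* ASCII "lygenera" */
		key[1] ^ 0x7465646279746573ULL, /* ASCII "tedbytes" */
	};
	uint64_t last = (uint64_t)(len & 0xff) << 56; /* len mod 256 in the top byte */
	size_t i;

	/* Compression: every full 8-byte block. */
	for (i = 0; i + 8 <= len; i += 8)
		absorb(v, load_le64(p + i), c);

	/* Final block: the 0..7 leftover bytes plus the length byte. */
	for (size_t j = 0; i + j < len; j++)
		last |= (uint64_t)p[i + j] << (8 * j);
	absorb(v, last, c);

	/* Finalization: d rounds, then fold the state down to 64 bits. */
	v[2] ^= 0xff;
	for (int r = 0; r < d; r++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```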

Struct-passing Pitfalls
=======================

Often the XuY functions will not take enough input, and instead you'll want to
pass a pre-filled struct to siphash. When doing this, it's important to always
ensure the struct has no padding holes: the contents of padding bytes are
unspecified, so two logically equal inputs could otherwise hash differently.
The easiest way to do this is to simply arrange the members of the struct in
descending order of size, which avoids holes between members, and to use
offsetofend() instead of sizeof() for getting the size, which excludes any
trailing padding. For performance reasons, if possible, it's probably a good
thing to align the struct to SIPHASH_ALIGNMENT. Here's an example::

	const struct {
		struct in6_addr saddr;
		u32 counter;
		u16 dport;
	} __aligned(SIPHASH_ALIGNMENT) combined = {
		.saddr = *(struct in6_addr *)saddr,
		.counter = counter,
		.dport = dport
	};
	u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
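
A small userspace check of this rule, assuming a typical ABI where `uint32_t` is 4-byte aligned. The three members total 22 bytes but are rounded up to a 24-byte struct, and offsetofend() reports the 22 bytes that actually carry data. The macro below is written the way the kernel defines offsetofend(): offsetof() plus the member's size. `struct flow_input` is a hypothetical stand-in that uses a `uint32_t[4]` in place of struct in6_addr.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of MEMBER plus its size, i.e. the first byte past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Members in descending size order, so no holes appear between them. */
struct flow_input {
	uint32_t saddr[4];	/* 16 bytes, stand-in for struct in6_addr */
	uint32_t counter;
	uint16_t dport;
	/* typically 2 bytes of trailing padding here */
};

/* Number of bytes to feed to the hash: everything except trailing padding. */
static size_t flow_input_hash_len(void)
{
	return offsetofend(struct flow_input, dport);
}
```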

Resources
=========

Read the SipHash paper if you're interested in learning more:
https://131002.net/siphash/siphash.pdf

-------------------------------------------------------------------------------

===============================================
HalfSipHash - SipHash's insecure younger cousin
===============================================

:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>

On the off-chance that SipHash is not fast enough for your needs, you might be
able to justify using HalfSipHash, a terrifying but potentially useful
possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and,
even scarier, uses an easily brute-forceable 64-bit key (with a 32-bit output)
instead of SipHash's 128-bit key. However, this may appeal to some
high-performance `jhash` users.

Danger!

Do not ever use HalfSipHash except as a hashtable key function, and only
then when you can be absolutely certain that the outputs will never be
transmitted out of the kernel. This is only remotely useful over `jhash` as a
means of mitigating hashtable flooding denial of service attacks.
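
To see what "hashtable flooding" means, consider an unkeyed, public hash function. An attacker can search for colliding inputs offline and then send only those. The sketch below uses 32-bit FNV-1a as that public hash. It is a stand-in, not `jhash`, and the helper names are hypothetical. The search collects inputs that all land in bucket 0 of a 256-bucket table. With a secret random key, the bucket of each input depends on the key, so such a list cannot be precomputed.

```c
#include <stddef.h>
#include <stdint.h>

/* Public, unkeyed 32-bit FNV-1a over the 4 little-endian bytes of x. */
static uint32_t public_hash(uint32_t x)
{
	uint32_t h = 0x811c9dc5u;

	for (int i = 0; i < 4; i++) {
		h ^= (x >> (8 * i)) & 0xff;
		h *= 0x01000193u;
	}
	return h;
}

/*
 * Offline attacker search: collect up to max inputs that all fall into
 * bucket 0 of a 256-bucket table. Returns how many were found.
 */
static size_t find_colliding_inputs(uint32_t *out, size_t max)
{
	size_t found = 0;

	for (uint32_t x = 0; found < max && x < UINT32_MAX; x++)
		if ((public_hash(x) & 255) == 0)
			out[found++] = x;
	return found;
}
```

Inserting the returned inputs into such a table turns every lookup in bucket 0 into a walk over one long chain.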

Generating a HalfSipHash key
============================

Keys should always be generated from a cryptographically secure source of
random numbers, either using get_random_bytes or get_random_once::

	hsiphash_key_t key;
	get_random_bytes(&key, sizeof(key));

If you're not deriving your key from here, you're doing it wrong.

Using the HalfSipHash functions
===============================

There are two variants of the function. The first takes a buffer::

	u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);

The second takes a list of integers::

	u32 hsiphash_1u32(u32, const hsiphash_key_t *key);
	u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key);
	u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key);
	u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key);

If you pass the generic hsiphash function a buffer whose length is a
compile-time constant matching one of the fixed-size variants, the call will
constant fold at compile-time and automatically use the corresponding
optimized function.

Hashtable key function usage
============================

::

	struct some_hashtable {
		DECLARE_HASHTABLE(hashtable, 8);	/* 2^8 = 256 buckets */
		hsiphash_key_t key;			/* per-table secret key */
	};

	void init_hashtable(struct some_hashtable *table)
	{
		hash_init(table->hashtable);		/* start with empty buckets */
		get_random_bytes(&table->key, sizeof(table->key));
	}

	static inline struct hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
	{
		/* HASH_SIZE() is a power of two, so masking selects a bucket. */
		return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
	}

You may then iterate over the returned hash bucket as usual.

Performance
===========

HalfSipHash is roughly 3 times slower than JenkinsHash (`jhash`). For many
replacements, this will not be a problem, as the hashtable lookup isn't the
bottleneck. In general, this is probably a worthwhile sacrifice to make for
the security and DoS resistance of HalfSipHash.