^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) RPC Cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) This document gives a brief introduction to the caching
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) mechanisms in the sunrpc layer that is used, in particular,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) for NFS authentication.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) Caches
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) ======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) The caching replaces the old exports table and allows for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) a wide variety of values to be cached.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) There are a number of caches that are similar in structure though
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) quite possibly very different in content and use. There is a corpus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) of common code for managing these caches.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) Examples of caches that are likely to be needed are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) - mapping from IP address to client name
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) - mapping from client name and filesystem to export options
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) - mapping from UID to list of GIDs, to work around NFS's limitation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) of 16 gids.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) - mappings between local UID/GID and remote UID/GID for sites that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) do not have uniform uid assignment
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) - mapping from network identity to public key for crypto authentication.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) The common code handles such things as:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) - general cache lookup with correct locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) - supporting 'NEGATIVE' as well as positive entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) - allowing an EXPIRED time on cache items, and removing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) items after they expire, and are no longer in-use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) - making requests to user-space to fill in cache entries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) - allowing user-space to directly set entries in the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) - delaying RPC requests that depend on as-yet incomplete
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) cache entries, and replaying those requests when the cache entry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) is complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) - cleaning out old entries as they expire.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) Creating a Cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) ----------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) - A cache needs a datum to store. This is in the form of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) structure definition that must contain a struct cache_head
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) as an element, usually the first.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) It will also contain a key and some content.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) Each cache element is reference counted and contains
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) expiry and update times for use in cache management.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) - A cache needs a "cache_detail" structure that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) describes the cache. This stores the hash table, some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) parameters for cache management, and some operations detailing how
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) to work with particular cache items.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) The operations are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) struct cache_head \*alloc(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) This simply allocates appropriate memory and returns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) a pointer to the cache_head embedded within the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) structure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) void cache_put(struct kref \*)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) This is called when the last reference to an item is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) dropped. The pointer passed is to the 'ref' field
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) in the cache_head. cache_put should release any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) references created by 'cache_init' and, if CACHE_VALID
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) is set, any references created by cache_update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) It should then release the memory allocated by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) 'alloc'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) int match(struct cache_head \*orig, struct cache_head \*new)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) Test if the keys in the two structures match. Return
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) 1 if they do, 0 if they don't.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) void init(struct cache_head \*orig, struct cache_head \*new)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Set the 'key' fields in 'new' from 'orig'. This may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) include taking references to shared objects.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) void update(struct cache_head \*orig, struct cache_head \*new)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) Set the 'content' fields in 'new' from 'orig'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) int cache_show(struct seq_file \*m, struct cache_detail \*cd, struct cache_head \*h)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) Optional. Used to provide a /proc file that lists the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) contents of a cache. This should show one item,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) usually on just one line.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) int cache_request(struct cache_detail \*cd, struct cache_head \*h, char \*\*bpp, int \*blen)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) Format a request to be sent to user-space for an item
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) to be instantiated. \*bpp is a buffer of size \*blen.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) \*bpp should be moved forward over the encoded message,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) and \*blen should be reduced to show how much free
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) space remains. Return 0 on success or <0 if not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) enough room or other problem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) int cache_parse(struct cache_detail \*cd, char \*buf, int len)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) A message from user space has arrived to fill out a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) cache entry. It is in 'buf' of length 'len'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) cache_parse should parse this, find the item in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) cache with sunrpc_cache_lookup_rcu, and update the item
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) with sunrpc_cache_update.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) - A cache needs to be registered using cache_register(). This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) includes it on a list of caches that will be regularly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) cleaned to discard old data.
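
The relationship between a datum, its embedded cache_head, and the per-cache operations can be illustrated with a small user-space sketch. Everything here is a simplified stand-in rather than kernel code: the reduced ``struct cache_head``, the ``struct ip_map`` datum, and the ``ip_map_*`` helpers are hypothetical names chosen for illustration, and the argument order follows the operation descriptions above (content flows from 'orig' to 'new').

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct cache_head (assumption:
 * only the bookkeeping mentioned in the text is modelled). */
struct cache_head {
	long expiry_time;    /* when the entry stops being valid */
	long last_refresh;   /* when the entry was last updated */
	int refcount;        /* the kernel uses a struct kref here */
	unsigned long flags; /* e.g. a "valid" or "negative" bit */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical datum: maps an IPv4 address (key) to a client name (content). */
struct ip_map {
	struct cache_head h; /* embedded head, conventionally first */
	unsigned int addr;   /* key */
	char client[64];     /* content */
};

/* alloc: allocate a datum and return the embedded cache_head. */
struct cache_head *ip_map_alloc(void)
{
	struct ip_map *im = calloc(1, sizeof(*im));
	return im ? &im->h : NULL;
}

/* match: return 1 if the keys of the two items are equal, 0 otherwise. */
int ip_map_match(struct cache_head *orig, struct cache_head *new)
{
	struct ip_map *a = container_of(orig, struct ip_map, h);
	struct ip_map *b = container_of(new, struct ip_map, h);
	return a->addr == b->addr;
}

/* init: set the key fields in 'new' from 'orig'. */
void ip_map_init(struct cache_head *orig, struct cache_head *new)
{
	container_of(new, struct ip_map, h)->addr =
		container_of(orig, struct ip_map, h)->addr;
}

/* update: set the content fields in 'new' from 'orig'. */
void ip_map_update(struct cache_head *orig, struct cache_head *new)
{
	struct ip_map *src = container_of(orig, struct ip_map, h);
	struct ip_map *dst = container_of(new, struct ip_map, h);
	memcpy(dst->client, src->client, sizeof(dst->client));
}

/* Simplified release: the real cache_put receives a struct kref pointer. */
void ip_map_free(struct cache_head *h)
{
	free(container_of(h, struct ip_map, h));
}
```

Because the head is embedded, the common code only ever handles ``struct cache_head`` pointers, and each cache recovers its own datum type with ``container_of``.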
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) Using a cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) -------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) To find a value in a cache, call sunrpc_cache_lookup_rcu passing a pointer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) to the cache_head in a sample item with the 'key' fields filled in.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) This will be passed to ->match to identify the target entry. If no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) entry is found, a new entry will be created, added to the cache, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) marked as not containing valid data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) The item returned is typically passed to cache_check which will check
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) if the data is valid, and may initiate an up-call to get fresh data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) cache_check will return -ENOENT if the entry is negative or if an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) upcall is needed but not possible, -EAGAIN if an upcall is pending,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) or 0 if the data is valid.
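
The three outcomes described above suggest a simple dispatch in the caller. The following is only a sketch: the ``enum lookup_action`` values and the ``action_for_cache_check`` helper are hypothetical names, not part of the sunrpc API, and what a caller actually does in each case depends on the cache.

```c
#include <errno.h>

/* Hypothetical outcomes a caller might act on; not kernel names. */
enum lookup_action {
	USE_ENTRY,     /* data is valid and can be used */
	DENY_REQUEST,  /* negative entry, or upcall needed but not possible */
	DEFER_REQUEST, /* upcall pending; retry when the entry is filled in */
	FAIL_REQUEST   /* any other error */
};

/* Map a cache_check-style return code onto the action the caller takes. */
enum lookup_action action_for_cache_check(int rc)
{
	switch (rc) {
	case 0:
		return USE_ENTRY;
	case -ENOENT:
		return DENY_REQUEST;
	case -EAGAIN:
		return DEFER_REQUEST;
	default:
		return FAIL_REQUEST;
	}
}
```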
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) cache_check can be passed a "struct cache_req\*". This structure is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) typically embedded in the actual request and can be used to create a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) deferred copy of the request (struct cache_deferred_req). This is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) done when the found cache item is not uptodate, but there is reason to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) believe that userspace might provide information soon. When the cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) item does become valid, the deferred copy of the request will be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) revisited (->revisit). It is expected that this method will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) reschedule the request for processing.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) The value returned by sunrpc_cache_lookup_rcu can also be passed to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) sunrpc_cache_update to set the content for the item. A second item is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) passed which should hold the content. If the item found by _lookup
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) has valid data, then it is discarded and a new item is created. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) saves any user of an item from worrying about content changing while
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) it is being inspected. If the item found by _lookup does not contain
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) valid data, then the content is copied across and CACHE_VALID is set.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) Populating a cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) Each cache has a name, and when the cache is registered, a directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) with that name is created in /proc/net/rpc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) This directory contains a file called 'channel' which is a channel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) for communicating between kernel and user for populating the cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) This directory may later contain other files for interacting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) with the cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) The 'channel' works a bit like a datagram socket. Each 'write' is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) passed as a whole to the cache for parsing and interpretation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) Each cache can treat the write requests differently, but it is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) expected that a message written will contain:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) - a key
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) - an expiry time
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) - a content.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) with the intention that an item in the cache with the given key
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) should be created or updated to have the given content, and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) expiry time should be set on that item.
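
As an illustration of such a message, the sketch below formats a key, an expiry time, and a content field into a single newline-terminated record. The field order, the use of seconds for the expiry, and the ``format_update`` helper are assumptions made for this example; each cache defines its own layout. The fields are assumed to contain no spaces, newlines, or backslashes, so no quoting is applied.

```c
#include <stdio.h>

/*
 * Format "key expiry content\n" into buf.
 * Returns the number of bytes written (including the newline),
 * or -1 if the record does not fit in len bytes.
 */
int format_update(char *buf, size_t len, const char *key,
		  long expiry, const char *content)
{
	int n = snprintf(buf, len, "%s %ld %s\n", key, expiry, content);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The whole record would then be handed to the channel in a single write, since each write is interpreted as one message.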
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) Reading from a channel is a bit more interesting. When a cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) lookup fails, or when it succeeds but finds an entry that may soon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) expire, a request is lodged for that cache item to be updated by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) user-space. These requests appear in the channel file.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) Successive reads will return successive requests.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) If there are no more requests to return, read will return EOF, but a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) select or poll for read will block waiting for another request to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) added.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) Thus a user-space helper is likely to::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) open the channel.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) select for readable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) read a request
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) write a response
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) loop.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) If it dies and needs to be restarted, any requests that have not been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) answered will still appear in the file and will be read by the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) instance of the helper.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) Each cache should define a "cache_parse" method which takes a message
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) written from user-space and processes it. It should return an error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) (which propagates back to the write syscall) or 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) Each cache should also define a "cache_request" method which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) takes a cache item and encodes a request into the buffer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) provided.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) If a cache has no active readers on the channel, and has had no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) active readers for more than 60 seconds, further requests will not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) added to the channel but instead all lookups that do not find a valid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) entry will fail. This is partly for backward compatibility: The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) previous nfs exports table was deemed to be authoritative and a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) failed lookup meant a definite 'no'.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) request/response format
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) While each cache is free to use its own format for requests
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) and responses over the channel, the following is recommended as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) appropriate and support routines are available to help:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) Each request or response record should be printable ASCII
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) with precisely one newline character which should be at the end.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) Fields within the record should be separated by spaces, normally one.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) If spaces, newlines, or nul characters are needed in a field they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) must be quoted. Two mechanisms are available:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) - If a field begins '\x' then it must contain an even number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) hex digits, and pairs of these digits provide the bytes in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) field.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) - otherwise a \ in the field must be followed by 3 octal digits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) which give the code for a byte. Other characters are treated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) as themselves. At the very least, space, newline, nul, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) '\' must be quoted in this way.
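
The two quoting mechanisms can be sketched in user-space C as follows. These are illustrative helpers with hypothetical names (``quote_field``, ``unquote_field``), not the kernel's support routines. The encoder uses only the octal form and escapes exactly the four bytes listed above; the decoder accepts both forms and, as a strictness choice of this sketch, rejects a backslash that is not followed by three octal digits.

```c
#include <stddef.h>

/* Encode src[0..n) with \ooo escapes for space, newline, nul and '\'.
 * Returns the encoded length (excluding the terminating nul) or -1
 * if dst (capacity cap) is too small. */
int quote_field(const unsigned char *src, size_t n, char *dst, size_t cap)
{
	size_t o = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned char c = src[i];

		if (c == ' ' || c == '\n' || c == '\0' || c == '\\') {
			if (o + 4 >= cap)
				return -1;
			dst[o++] = '\\';
			dst[o++] = (char)('0' + ((c >> 6) & 3));
			dst[o++] = (char)('0' + ((c >> 3) & 7));
			dst[o++] = (char)('0' + (c & 7));
		} else {
			if (o + 1 >= cap)
				return -1;
			dst[o++] = (char)c;
		}
	}
	if (o >= cap)
		return -1;
	dst[o] = '\0';
	return (int)o;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode one nul-terminated field into dst; returns the number of
 * decoded bytes, or -1 on malformed input or insufficient space. */
int unquote_field(const char *src, unsigned char *dst, size_t cap)
{
	size_t o = 0;

	if (src[0] == '\\' && src[1] == 'x') {
		/* Hex form: pairs of hex digits give the bytes. */
		for (src += 2; *src; src += 2) {
			int hi = hexval(src[0]);
			int lo = hi < 0 ? -1 : hexval(src[1]);

			if (lo < 0 || o >= cap)
				return -1;
			dst[o++] = (unsigned char)(hi * 16 + lo);
		}
		return (int)o;
	}

	while (*src) {
		unsigned char c = (unsigned char)*src++;

		if (c == '\\') {
			/* Exactly three octal digits, value at most 0377. */
			if (src[0] < '0' || src[0] > '3' ||
			    src[1] < '0' || src[1] > '7' ||
			    src[2] < '0' || src[2] > '7')
				return -1;
			c = (unsigned char)(((src[0] - '0') << 6) |
					    ((src[1] - '0') << 3) |
					    (src[2] - '0'));
			src += 3;
		}
		if (o >= cap)
			return -1;
		dst[o++] = c;
	}
	return (int)o;
}
```

For example, the four bytes ``a``, space, ``b``, backslash encode to ``a\040b\134``, and the hex-form field ``\x6162`` decodes to the two bytes ``ab``.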