.. _zsmalloc:

========
zsmalloc
========

This allocator is designed for use with zram, so it must work well under
low-memory conditions. In particular, it never attempts higher-order page
allocations, which are very likely to fail under memory pressure. On the
other hand, if it used only single (0-order) pages, it would suffer from
very high fragmentation: any object of size PAGE_SIZE/2 or larger would
occupy an entire page. This was one of the major issues with its
predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a set of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page; that is, an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called a zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE,
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). Allocation requests larger than this size fail (see
zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) that encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped, since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before the allocated memory can be used, the object has
to be mapped using zs_map_object() to get a usable pointer, and it must
subsequently be unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT enabled, zsmalloc internal information can be read
from ``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage
    ...
    ...
        9   176           0            1           186        129          8                4
       10   192           1            0          2880       2872        135                3
       11   208           0            1           819        795         42                2
       12   224           0            1           219        159         12                4
    ...
    ...

class
    index of the size class
size
    object size that the zspages of this class store
almost_empty
    the number of ZS_ALMOST_EMPTY zspages (see below)
almost_full
    the number of ZS_ALMOST_FULL zspages (see below)
obj_allocated
    the number of objects allocated (object slots in all zspages of the class)
obj_used
    the number of objects allocated to the user
pages_used
    the number of 0-order pages allocated for the class
pages_per_zspage
    the number of 0-order pages that make up a zspage

A zspage that is neither empty nor full is assigned to the ZS_ALMOST_EMPTY
fullness group when n <= 3 * N / f, where

* n = number of allocated objects
* N = total number of objects the zspage can store
* f = fullness_threshold_frac (currently 4)

Similarly, a zspage is assigned to:

* ZS_ALMOST_FULL when 3 * N / f < n < N
* ZS_EMPTY when n == 0
* ZS_FULL when n == N