^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. _zswap:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) zswap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Overview
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) Zswap is a lightweight compressed cache for swap pages. It takes pages that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) in the process of being swapped out and attempts to compress them into a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) dynamically allocated RAM-based memory pool. Zswap trades CPU cycles
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) for potentially reduced swap I/O. This trade-off can also result in a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) significant performance improvement if reads from the compressed cache are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) faster than reads from a swap device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18)    Zswap is a new feature as of v3.11 and interacts heavily with memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)    reclaim. This interaction has not been fully explored on the large set of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)    potential configurations and workloads that exist. For this reason, zswap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21)    is a work in progress and should be considered experimental.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) Some potential benefits:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) * Desktop/laptop users with limited RAM capacities can mitigate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26)   performance impact of swapping.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) * Overcommitted guests that share a common I/O resource can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)   dramatically reduce their swap I/O pressure, avoiding heavy-handed I/O
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)   throttling by the hypervisor. This allows more work to get done with less
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30)   impact to the guest workload and to guests sharing the I/O subsystem.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) * Users with SSDs as swap devices can extend the life of the device by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)   drastically reducing life-shortening writes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) Zswap evicts pages from compressed cache on an LRU basis to the backing swap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) device when the compressed pool reaches its size limit. This requirement had
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) been identified in prior community discussions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) Whether zswap is enabled at boot time depends on whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) the ``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) This setting can then be overridden by providing the kernel command line
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) ``zswap.enabled=`` option, for example ``zswap.enabled=0``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) Zswap can also be enabled and disabled at runtime using the sysfs interface.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) An example command to enable zswap at runtime, assuming sysfs is mounted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) at ``/sys``, is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)     echo 1 > /sys/module/zswap/parameters/enabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) When zswap is disabled at runtime, it stops storing pages that are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) being swapped out. However, it will *not* immediately write out or fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) back into memory all of the pages stored in the compressed pool. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) pages stored in zswap will remain in the compressed pool until they are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) either invalidated or faulted back into memory. To force all pages out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) of the compressed pool, run swapoff on the swap device(s). This faults
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) back into memory all swapped-out pages, including those in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) compressed pool.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) ======
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) Zswap receives pages for compression through the Frontswap API and is able to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) evict pages from its own compressed pool on an LRU basis and write them back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) the backing swap device in the case that the compressed pool is full.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) Zswap makes use of zpool for managing the compressed memory pool. Each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) allocation in zpool is not directly accessible by address. Rather, a handle is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) returned by the allocation routine and that handle must be mapped before being
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) accessed. The compressed memory pool grows on demand and shrinks as compressed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) pages are freed. The pool is not preallocated. By default, a zpool of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) type selected by the ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) but it can be overridden at boot time by setting the ``zpool`` attribute,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) e.g. ``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) ``zpool`` attribute, e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74)     echo zbud > /sys/module/zswap/parameters/zpool
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) means the compression ratio will always be 2:1 or worse (because of half-full
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) zbud pages). The zsmalloc type zpool has a more complex compressed page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) storage method, and it can achieve greater storage densities. However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) zsmalloc does not implement compressed page eviction, so once zswap fills, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) cannot evict the oldest page; it can only reject new pages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) When a swap page is passed from frontswap to zswap, zswap maintains a mapping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) of the swap entry, a combination of the swap type and swap offset, to the zpool
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) handle that references that compressed swap page. This mapping is achieved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) with a red-black tree per swap type. The swap offset is the search key for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) tree nodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) During a page fault on a PTE that is a swap entry, frontswap calls the zswap
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) load function to decompress the page into the page allocated by the page fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) handler.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) in the swap_map goes to 0) the swap code calls the zswap invalidate function,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) via frontswap, to free the compressed entry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) Zswap seeks to be simple in its policies. Sysfs attributes allow for one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) user-controlled policy:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) * ``max_pool_percent`` - The maximum percentage of memory that the compressed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)   pool can occupy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) The default compressor is selected in ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) Kconfig option, but it can be overridden at boot time by setting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) ``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) It can also be changed at runtime using the sysfs ``compressor``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) attribute, e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)     echo lzo > /sys/module/zswap/parameters/compressor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) When the zpool and/or compressor parameter is changed at runtime, any existing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) compressed pages are not modified; they are left in their own zpool. When a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) request is made for a page in an old zpool, it is uncompressed using its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) original compressor. Once all pages are removed from an old zpool, the zpool
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) and its compressor are freed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) Some of the pages in zswap are same-value filled pages (i.e. the contents of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) the page are a single repeated value or pattern). These pages include
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) zero-filled pages, and they are handled differently. During a store operation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) each page is checked for being same-value filled before it is compressed. If it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) is, the compressed length of the page is set to zero and only the pattern or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) same-filled value is stored.
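
A minimal userspace sketch of such a check follows. It is an illustration
rather than the kernel implementation: it assumes a 4096-byte page
(``SKETCH_PAGE_SIZE``), compares the page one ``unsigned long`` at a time, and
uses the hypothetical names ``page_same_filled()`` and ``page_fill()``. The
second helper shows how a load could rebuild the page from the stored value
alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this sketch; real page sizes vary by architecture. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Store path: return true if every unsigned long in the page equals the
 * first one, and report that repeated value through *value.
 * The page buffer must be suitably aligned for unsigned long access.
 */
static bool page_same_filled(const void *page, unsigned long *value)
{
	const unsigned long *word = page;
	size_t i, n = SKETCH_PAGE_SIZE / sizeof(*word);

	assert(page != NULL && value != NULL);
	for (i = 1; i < n; i++) {
		if (word[i] != word[0])
			return false;
	}
	*value = word[0];
	return true;
}

/* Load path: reconstruct a same-value filled page from the stored value. */
static void page_fill(void *page, unsigned long value)
{
	unsigned long *word = page;
	size_t i, n = SKETCH_PAGE_SIZE / sizeof(*word);

	assert(page != NULL);
	for (i = 0; i < n; i++)
		word[i] = value;
}
```

With this scheme a matching page costs one machine word of storage instead of
a compressed buffer, which is why the check runs before the compressor.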
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) Identification of same-value filled pages is enabled by default and can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) disabled at boot time by setting the ``same_filled_pages_enabled`` attribute
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be enabled and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) disabled at runtime using the sysfs ``same_filled_pages_enabled``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) attribute, e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)     echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) When zswap same-filled page identification is disabled at runtime, it stops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) checking for same-value filled pages during store operations. However, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) existing pages which are marked as same-value filled pages remain stored
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) unchanged in zswap until they are either loaded or invalidated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) When zswap is full and there is high pressure on swap, shrinking the pool
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) results in flipping pages in and out of the zswap pool without any real
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) benefit but with a performance drop for the system. To prevent this, a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) special parameter implements a form of hysteresis: once the limit has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) hit, zswap refuses to take pages into the pool until it has sufficient space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) again. To set the threshold at which zswap would start accepting pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) again after it became full, use the sysfs ``accept_threshold_percent``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) attribute, e.g.::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)     echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) Setting this parameter to 100 will disable the hysteresis.
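
The interaction between ``max_pool_percent`` and ``accept_threshold_percent``
can be sketched as a small state machine. This is a simplified model under
stated assumptions, not the kernel code: ``struct pool_state`` and the helpers
``max_pages()``, ``accept_pages()`` and ``can_store()`` are hypothetical names,
sizes are counted in whole pages, and the acceptance threshold is interpreted
as a percentage of the maximum pool size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the values the policy depends on. */
struct pool_state {
	unsigned long total_ram_pages;		/* pages of RAM in the system */
	unsigned long pool_pages;		/* pages used by the pool */
	unsigned int max_pool_percent;		/* 0..100 */
	unsigned int accept_threshold_percent;	/* 0..100 */
	bool limit_hit;				/* set once the pool was full */
};

/* Largest size the compressed pool may reach, in pages. */
static unsigned long max_pages(const struct pool_state *s)
{
	return s->total_ram_pages * s->max_pool_percent / 100;
}

/* Size the pool must drop below before stores resume, in pages. */
static unsigned long accept_pages(const struct pool_state *s)
{
	return max_pages(s) * s->accept_threshold_percent / 100;
}

/* Decide whether a new page may be stored, updating the hysteresis flag. */
static bool can_store(struct pool_state *s)
{
	assert(s->max_pool_percent <= 100);
	assert(s->accept_threshold_percent <= 100);

	if (s->pool_pages > max_pages(s)) {
		/* Over the limit: reject and remember that it was hit. */
		s->limit_hit = true;
		return false;
	}
	if (s->limit_hit) {
		/* Keep rejecting until the pool has drained far enough. */
		if (s->pool_pages >= accept_pages(s))
			return false;
		s->limit_hit = false;
	}
	return true;
}
```

For example, with 1000 pages of RAM, ``max_pool_percent`` of 20 and
``accept_threshold_percent`` of 80, the pool may grow to 200 pages; once it
exceeds that, stores are refused until usage falls below 160 pages. With a
threshold of 100 the two limits coincide, so stores resume as soon as the pool
is below its maximum.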
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) A debugfs interface provides various statistics about pool size, the number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) of pages stored, same-value filled pages, and counters for the reasons
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) pages are rejected.