.. _mm_concepts:

=================
Concepts overview
=================

Memory management in Linux is a complex system that has evolved over
the years, gaining more and more functionality to support a variety of
systems, from MMU-less microcontrollers to supercomputers. Memory
management for systems without an MMU is called ``nommu`` and it
definitely deserves a dedicated document, which hopefully will
eventually be written. Yet, although some of the concepts are the same,
here we assume that an MMU is available and a CPU can translate a virtual
address to a physical address.

.. contents:: :local:

Virtual Memory Primer
=====================

The physical memory in a computer system is a limited resource, and
even for systems that support memory hotplug there is a hard limit on
the amount of memory that can be installed. The physical memory is not
necessarily contiguous; it might be accessible as a set of distinct
address ranges. Besides, different CPU architectures, and even
different implementations of the same architecture, have different
views of how these address ranges are defined.

All this makes dealing directly with physical memory quite complex, and
to avoid this complexity the concept of virtual memory was developed.

The virtual memory abstracts the details of physical memory from the
application software, allows keeping only the needed information in the
physical memory (demand paging) and provides a mechanism for the
protection and controlled sharing of data between processes.

With virtual memory, each and every memory access uses a virtual
address. When the CPU decodes an instruction that reads (or
writes) from (or to) the system memory, it translates the `virtual`
address encoded in that instruction to a `physical` address that the
memory controller can understand.

The physical system memory is divided into page frames, or pages. The
size of each page is architecture specific. Some architectures allow
selection of the page size from several supported values; this
selection is performed at kernel build time by setting an appropriate
kernel configuration option.

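For example, the page size in effect can be queried from user space
with the portable sysconf(3) interface; a minimal sketch::

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* _SC_PAGESIZE reports the base page size the kernel uses. */
            long page_size = sysconf(_SC_PAGESIZE);

            if (page_size < 0) {
                    perror("sysconf");
                    return 1;
            }
            printf("page size: %ld bytes\n", page_size);
            return 0;
    }
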
Each physical memory page can be mapped as one or more virtual
pages. These mappings are described by page tables that allow
translation from a virtual address used by programs to the physical
memory address. The page tables are organized hierarchically.

The tables at the lowest level of the hierarchy contain physical
addresses of actual pages used by the software. The tables at higher
levels contain physical addresses of the pages belonging to the lower
levels. The pointer to the top level page table resides in a
register. When the CPU performs the address translation, it uses this
register to access the top level page table. The high bits of the
virtual address are used to index an entry in the top level page
table. That entry is then used to access the next level in the
hierarchy, with the next bits of the virtual address as the index to
that level's page table. The lowest bits in the virtual address define
the offset inside the actual page.

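As an illustration, assume the common x86-64 configuration with 4 KiB
pages and four page table levels: bits 47-39 of the virtual address
index the top level table, bits 38-30, 29-21 and 20-12 index the lower
levels, and bits 11-0 are the offset within the page. A minimal sketch
of this split::

    #include <stdio.h>

    int main(void)
    {
            /* Arbitrary canonical user space address, for illustration. */
            unsigned long long va = 0x00007f1234567abcULL;

            /* 9-bit index at each of the four levels, 12-bit page offset. */
            printf("L4 %llu L3 %llu L2 %llu L1 %llu offset 0x%llx\n",
                   (va >> 39) & 0x1ff, (va >> 30) & 0x1ff,
                   (va >> 21) & 0x1ff, (va >> 12) & 0x1ff,
                   va & 0xfff);
            return 0;
    }
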
Huge Pages
==========

The address translation requires several memory accesses, and memory
accesses are slow relative to CPU speed. To avoid spending precious
processor cycles on the address translation, CPUs maintain a cache of
such translations called the Translation Lookaside Buffer (or
TLB). The TLB is usually a scarce resource, and applications with a
large memory working set will experience a performance hit because of
TLB misses.

Many modern CPU architectures allow mapping of memory pages directly
by the higher levels in the page table. For instance, on x86 it is
possible to map 2M and even 1G pages using entries in the second and
third level page tables. In Linux such pages are called `huge`. Usage
of huge pages significantly reduces pressure on the TLB, improves the
TLB hit rate and thus improves overall system performance.

There are two mechanisms in Linux that enable mapping of the physical
memory with huge pages. The first one is the `HugeTLB filesystem`, or
hugetlbfs. It is a pseudo filesystem that uses RAM as its backing
store. For the files created in this filesystem, the data resides in
memory and is mapped using huge pages. The hugetlbfs is described in
:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`.

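Huge pages from the hugetlb pool can also be requested directly with
mmap(2) and the MAP_HUGETLB flag. A minimal sketch, assuming the
administrator has reserved 2M huge pages (for example via the
``vm.nr_hugepages`` sysctl); the call fails otherwise::

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 2UL * 1024 * 1024;         /* one 2M huge page */

            /*
             * MAP_HUGETLB requests a mapping backed by huge pages from
             * the hugetlb pool; it fails if no huge pages are reserved.
             */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            ((char *)p)[0] = 1;                     /* touch the page */
            munmap(p, len);
            return 0;
    }
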
Another, more recent, mechanism that enables use of huge pages is
called `Transparent HugePages`, or THP. Unlike hugetlbfs, which
requires users and/or system administrators to configure what parts of
the system memory should and can be mapped by huge pages, THP manages
such mappings transparently to the user, hence the name. See
:ref:`Documentation/admin-guide/mm/transhuge.rst <admin_guide_transhuge>`
for more details about THP.

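From user space, an application can mark an anonymous mapping as a
candidate for THP with madvise(2); whether huge pages are actually used
depends on the system-wide THP configuration. A minimal sketch::

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 16UL * 1024 * 1024;

            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /*
             * Hint that this range is a good candidate for THP; the
             * kernel may ignore the hint depending on configuration.
             */
            if (madvise(p, len, MADV_HUGEPAGE))
                    perror("madvise");
            munmap(p, len);
            return 0;
    }
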
Zones
=====

Often hardware poses restrictions on how different physical memory
ranges can be accessed. In some cases, devices cannot perform DMA to
all the addressable memory. In other cases, the size of the physical
memory exceeds the maximal addressable size of virtual memory and
special actions are required to access portions of the memory. Linux
groups memory pages into `zones` according to their possible
usage. For example, ZONE_DMA will contain memory that can be used by
devices for DMA, ZONE_HIGHMEM will contain memory that is not
permanently mapped into the kernel's address space and ZONE_NORMAL
will contain normally addressed pages.

The actual layout of the memory zones is hardware dependent, as not
all architectures define all zones and the requirements for DMA differ
between platforms.

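The zones actually present on a running system can be inspected
through ``/proc/zoneinfo``. A minimal sketch that prints the per-node
zone headers::

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            char line[256];
            FILE *f = fopen("/proc/zoneinfo", "r");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            /* Lines such as "Node 0, zone   Normal" name each zone. */
            while (fgets(line, sizeof(line), f))
                    if (!strncmp(line, "Node", 4))
                            fputs(line, stdout);
            fclose(f);
            return 0;
    }
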
Nodes
=====

Many multi-processor machines are NUMA - Non-Uniform Memory Access -
systems. In such systems the memory is arranged into banks that have
different access latency depending on the "distance" from the
processor. Each bank is referred to as a `node`, and for each node
Linux constructs an independent memory management subsystem. A node has
its own set of zones, lists of free and used pages and various
statistics counters. You can find more details about NUMA in
:ref:`Documentation/vm/numa.rst <numa>` and in
:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`.

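On kernels built with NUMA support, the set of online nodes is
exported through sysfs. A minimal sketch (the file is absent on
kernels without NUMA support)::

    #include <stdio.h>

    int main(void)
    {
            char buf[64];
            FILE *f = fopen("/sys/devices/system/node/online", "r");

            if (!f) {
                    perror("fopen");        /* e.g. kernel without NUMA */
                    return 1;
            }
            if (fgets(buf, sizeof(buf), f))
                    printf("online nodes: %s", buf);  /* e.g. "0" or "0-1" */
            fclose(f);
            return 0;
    }
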
Page cache
==========

The physical memory is volatile and the common case for getting data
into the memory is to read it from files. Whenever a file is read, the
data is put into the `page cache` to avoid expensive disk access on
subsequent reads. Similarly, when one writes to a file, the data is
placed in the page cache and eventually gets into the backing storage
device. The written pages are marked as `dirty` and when Linux decides
to reuse them for other purposes, it makes sure to synchronize the
file contents on the device with the updated data.

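Page cache residency of a mapped file can be probed with mincore(2). A
minimal sketch, using ``/etc/hostname`` only as an arbitrary example
file::

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            long page_size = sysconf(_SC_PAGESIZE);
            unsigned char vec;
            void *p;
            int fd;

            fd = open("/etc/hostname", O_RDONLY);   /* arbitrary file */
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            p = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /* mincore() reports which pages are resident in memory. */
            if (mincore(p, page_size, &vec) == 0)
                    printf("first page %s resident\n",
                           (vec & 1) ? "is" : "is not");
            return 0;
    }
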
Anonymous Memory
================

The `anonymous memory` or `anonymous mappings` represent memory that
is not backed by a filesystem. Such mappings are implicitly created
for a program's stack and heap, or by explicit calls to the mmap(2)
system call. Usually, the anonymous mappings only define virtual
memory areas that the program is allowed to access. Read accesses
result in the creation of a page table entry that references a special
physical page filled with zeroes. When the program performs a write, a
regular physical page will be allocated to hold the written data. The
page will be marked dirty and if the kernel decides to repurpose it,
the dirty page will be swapped out.

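This behavior can be observed with a plain anonymous mapping. A
minimal sketch::

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 4096;

            /* MAP_ANONYMOUS memory is not backed by any file. */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /* A read is satisfied from the shared zero page ... */
            printf("initial byte: %d\n", p[0]);
            /* ... while the first write allocates a private page. */
            p[0] = 42;
            munmap(p, len);
            return 0;
    }
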
Reclaim
=======

Throughout the system lifetime, a physical page can be used for storing
different types of data. It can hold kernel internal data structures,
DMA-able buffers for device driver use, data read from a filesystem,
memory allocated by user space processes, etc.

Depending on the page usage, it is treated differently by the Linux
memory management. The pages that can be freed at any time, either
because they cache data available elsewhere, for instance on a hard
disk, or because they can be swapped out, again, to the hard disk, are
called `reclaimable`. The most notable categories of reclaimable pages
are the page cache and anonymous memory.

In most cases, the pages holding internal kernel data and used as DMA
buffers cannot be repurposed, and they remain pinned until freed by
their user. Such pages are called `unreclaimable`. However, in certain
circumstances, even pages occupied with kernel data structures can be
reclaimed. For instance, in-memory caches of filesystem metadata can
be re-read from the storage device and therefore it is possible to
discard them from the main memory when the system is under memory
pressure.

The process of freeing the reclaimable physical memory pages and
repurposing them is called (surprise!) `reclaim`. Linux can reclaim
pages either asynchronously or synchronously, depending on the state
of the system. When the system is not loaded, most of the memory is
free and allocation requests will be satisfied immediately from the
free pages supply. As the load increases, the amount of free pages
goes down and when it drops below a certain threshold (the low
watermark), an allocation request will awaken the ``kswapd`` daemon.
It will asynchronously scan memory pages and either just free them, if
the data they contain is available elsewhere, or evict them to the
backing storage device (remember those dirty pages?). As memory usage
increases even more and reaches another threshold - the min watermark -
an allocation will trigger `direct reclaim`. In this case allocation
is stalled until enough memory pages are reclaimed to satisfy the
request.

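The watermarks of each zone are visible in ``/proc/zoneinfo``. An
illustrative excerpt (the numbers are made up; actual values depend on
the zone size and on the ``vm.min_free_kbytes`` sysctl)::

    Node 0, zone   Normal
      pages free     145998
            min      11666
            low      14582
            high     17498
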
Compaction
==========

As the system runs, tasks allocate and free memory and it becomes
fragmented. Although with virtual memory it is possible to present
scattered physical pages as a virtually contiguous range, sometimes it
is necessary to allocate large physically contiguous memory areas. Such
a need may arise, for instance, when a device driver requires a large
buffer for DMA, or when THP allocates a huge page. Memory `compaction`
addresses the fragmentation issue. This mechanism moves occupied pages
from the lower part of a memory zone to free pages in the upper part
of the zone. When a compaction scan is finished, free pages are grouped
together at the beginning of the zone and allocations of large
physically contiguous areas become possible.

Like reclaim, compaction may happen asynchronously in the ``kcompactd``
daemon or synchronously as a result of a memory allocation request.

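Compaction of all zones can also be requested manually by writing to
``/proc/sys/vm/compact_memory`` (available on kernels built with
``CONFIG_COMPACTION``; requires root). A minimal sketch::

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/sys/vm/compact_memory", "w");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            fprintf(f, "1\n");      /* request full compaction */
            fclose(f);
            return 0;
    }
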
OOM killer
==========

It is possible that on a loaded machine memory will be exhausted and
the kernel will be unable to reclaim enough memory to continue to
operate. In order to save the rest of the system, it invokes the `OOM
killer`.

The `OOM killer` selects a task to sacrifice for the sake of the
overall system health. The selected task is killed in the hope that
after it exits enough memory will be freed to continue normal
operation.
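
The selection can be influenced from user space through
``/proc/<pid>/oom_score_adj``, which ranges from -1000 (never kill) to
1000 (preferred victim). A minimal sketch that volunteers the calling
process::

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            char path[64];
            FILE *f;

            snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", getpid());
            f = fopen(path, "w");
            if (!f) {
                    perror("fopen");
                    return 1;
            }
            /*
             * 1000 makes this task the preferred OOM victim; -1000
             * would exempt it (lowering the value requires privilege).
             */
            fprintf(f, "1000\n");
            fclose(f);
            pause();        /* keep running so the adjustment matters */
            return 0;
    }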