.. _highmem:

====================
High Memory Handling
====================

By: Peter Zijlstra <a.p.zijlstra@chello.nl>

.. contents:: :local:

What Is High Memory?
====================

High memory (highmem) is used when the size of physical memory approaches or
exceeds the maximum size of virtual memory. At that point it becomes
impossible for the kernel to keep all of the available physical memory mapped
at all times. This means the kernel needs to start using temporary mappings of
the pieces of physical memory that it wants to access.

The part of (physical) memory not covered by a permanent mapping is what we
refer to as 'highmem'. There are various architecture-dependent constraints on
where exactly that border lies.

In the i386 arch, for example, we choose to map the kernel into every process's
VM space so that we don't have to pay the full TLB invalidation costs for
kernel entry/exit. This means the available virtual memory space (4GiB on
i386) has to be divided between user and kernel space.

The traditional split for architectures using this approach is 3:1, 3GiB for
userspace and the top 1GiB for kernel space::

	+--------+ 0xffffffff
	| Kernel |
	+--------+ 0xc0000000
	|        |
	| User   |
	|        |
	+--------+ 0x00000000

This means that the kernel can at most map 1GiB of physical memory at any one
time, but because we need virtual address space for other things - including
temporary maps to access the rest of the physical memory - the actual direct
map will typically be less (usually around 896MiB).
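
A rough sketch of where that figure comes from, assuming the default i386 3:1
split and a reservation of roughly 128MiB for the vmalloc/ioremap, kmap and
fixmap areas (both the split and the reservation are configurable, so the
exact numbers vary)::

	  4GiB    total virtual address space
	- 3GiB    user space (PAGE_OFFSET = 0xc0000000)
	= 1GiB    kernel window
	- 128MiB  vmalloc/ioremap, kmap and fixmap areas (approximate)
	= 896MiB  directly mapped low memory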

Other architectures that have mm context tagged TLBs can have separate kernel
and user maps. Some hardware (like some ARMs), however, has limited virtual
space when using mm context tags.


Temporary Virtual Mappings
==========================

The kernel contains several ways of creating temporary mappings:

* vmap(). This can be used to make a long duration mapping of multiple
  physical pages into a contiguous virtual space. It needs global
  synchronization to unmap. A short sketch of vmap() and kmap() usage follows
  this list.

* kmap(). This permits a short duration mapping of a single page. It needs
  global synchronization, but is amortized somewhat. It is also prone to
  deadlocks when used in a nested fashion, and so it is not recommended for
  new code.

* kmap_atomic(). This permits a very short duration mapping of a single
  page. Since the mapping is restricted to the CPU that issued it, it
  performs well, but the issuing task is therefore required to stay on that
  CPU until it has finished, lest some other task displace its mappings.

  kmap_atomic() may also be used by interrupt contexts, since it does not
  sleep and the caller likewise may not sleep until after kunmap_atomic() is
  called.

  It may be assumed that k[un]map_atomic() won't fail.

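As a minimal sketch of the first two interfaces (the example_use_*() wrappers
are made-up names, error handling is reduced to the bare minimum, and this is
illustrative rather than taken from real driver code), vmap() and kmap() might
be used like this::

	#include <linux/errno.h>
	#include <linux/highmem.h>
	#include <linux/mm.h>
	#include <linux/string.h>
	#include <linux/vmalloc.h>

	/*
	 * Map nr_pages (possibly highmem) pages into one contiguous kernel
	 * virtual range, touch them, and tear the mapping down again.
	 * Unmapping is where the global synchronization (TLB flush) happens.
	 */
	static int example_use_vmap(struct page **pages, unsigned int nr_pages)
	{
		void *vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);

		if (!vaddr)
			return -ENOMEM;

		memset(vaddr, 0, nr_pages * PAGE_SIZE);
		vunmap(vaddr);
		return 0;
	}

	/*
	 * Short duration mapping of a single page. kmap() may sleep, so this
	 * must not be called from atomic context.
	 */
	static void example_use_kmap(struct page *page)
	{
		char *vaddr = kmap(page);

		memset(vaddr, 0, PAGE_SIZE);
		/* Note: kunmap() takes the page, not the returned address. */
		kunmap(page);
	}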

Using kmap_atomic
=================

When and where to use kmap_atomic() is straightforward. It is used when code
wants to access the contents of a page that might be allocated from high memory
(see __GFP_HIGHMEM), for example a page in the pagecache. The API has two
functions, and they can be used in a manner similar to the following::

	/* Find the page of interest. */
	struct page *page = find_get_page(mapping, offset);

	/* Gain access to the contents of that page. */
	void *vaddr = kmap_atomic(page);

	/* Do something to the contents of that page. */
	memset(vaddr, 0, PAGE_SIZE);

	/* Unmap that page. */
	kunmap_atomic(vaddr);

Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
not its argument.

If you need to map two pages because you want to copy from one page to
another, you need to keep the kmap_atomic() calls strictly nested, like::

	vaddr1 = kmap_atomic(page1);
	vaddr2 = kmap_atomic(page2);

	memcpy(vaddr1, vaddr2, PAGE_SIZE);

	kunmap_atomic(vaddr2);
	kunmap_atomic(vaddr1);

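Wrapped up as a function, that pattern looks roughly like the sketch below
(example_copy_page() is a made-up name; in real code the ready-made helper
copy_highpage() from <linux/highmem.h> already does this for you)::

	/*
	 * A sketch of the nested kmap_atomic() copy shown above; it copies
	 * the contents of src into dst.
	 */
	static void example_copy_page(struct page *dst, struct page *src)
	{
		char *vto = kmap_atomic(dst);
		char *vfrom = kmap_atomic(src);

		memcpy(vto, vfrom, PAGE_SIZE);

		/* Unmap in reverse order to keep the mappings strictly nested. */
		kunmap_atomic(vfrom);
		kunmap_atomic(vto);
	}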

Cost of Temporary Mappings
==========================

The cost of creating temporary mappings can be quite high. The arch has to
manipulate the kernel's page tables, the data TLB and/or the MMU's registers.

If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
simply with a bit of arithmetic that will convert the page struct address into
a pointer to the page contents rather than juggling mappings about. In such a
case, the unmap operation may be a null operation.
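
A minimal sketch of that arithmetic, mirroring what page_address() boils down
to when CONFIG_HIGHMEM is not set (the exact macro spellings differ between
architectures and kernel versions, and example_page_address() is a made-up
name)::

	/*
	 * With no highmem, "mapping" a page is just arithmetic: convert the
	 * page frame number into an address inside the kernel's direct
	 * (linear) mapping. The matching unmap then has nothing to do.
	 */
	static inline void *example_page_address(const struct page *page)
	{
		return __va(PFN_PHYS(page_to_pfn(page)));
	}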

If CONFIG_MMU is not set, then there can be no temporary mappings and no
highmem. In such a case, the arithmetic approach will also be used.


i386 PAE
========

The i386 arch, under some circumstances, will permit you to stick up to 64GiB
of RAM into your 32-bit machine. This has a number of consequences:

* Linux needs a page-frame structure for each page in the system and the
  pageframes need to live in the permanent mapping, which means:

  * you can have 896M/sizeof(struct page) page-frames at most; with struct
    page being 32 bytes that would end up being something on the order of 112G
    worth of pages (the arithmetic is sketched below this list); the kernel,
    however, needs to store more than just page-frames in that memory...

* PAE makes your page tables larger - which slows the system down as more
  data has to be accessed when traversing page tables for TLB fills and the
  like. One advantage is that PAE has more PTE bits and can provide advanced
  features like NX and PAT.
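
The arithmetic behind that 112G figure, as a rough sketch (it assumes 4KiB
pages and a 32-byte struct page, both of which vary with configuration)::

	896MiB / 32 bytes per struct page = 28M page-frames
	28M page-frames * 4KiB per page   = 112GiB of physical memory described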

The general recommendation is that you don't use more than 8GiB on a 32-bit
machine - although more might work for you and your workload, you're pretty
much on your own - don't expect kernel developers to really care much if things
come apart.