Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) .. _highmem:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) High Memory Handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) By: Peter Zijlstra <a.p.zijlstra@chello.nl>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) .. contents:: :local:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) What Is High Memory?
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) High memory (highmem) is used when the size of physical memory approaches or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) exceeds the maximum size of virtual memory.  At that point it becomes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) impossible for the kernel to keep all of the available physical memory mapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) at all times.  This means the kernel needs to start using temporary mappings of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) the pieces of physical memory that it wants to access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) The part of (physical) memory not covered by a permanent mapping is what we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) refer to as 'highmem'.  There are various architecture-dependent constraints on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) where exactly that border lies.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) In the i386 arch, for example, we choose to map the kernel into every process's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) VM space so that we don't have to pay the full TLB invalidation costs for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) kernel entry/exit.  This means the available virtual memory space (4GiB on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) i386) has to be divided between user and kernel space.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) The traditional split for architectures using this approach is 3:1, 3GiB for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) userspace and the top 1GiB for kernel space::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) 		+--------+ 0xffffffff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) 		| Kernel |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 		+--------+ 0xc0000000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) 		|        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 		| User   |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) 		|        |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) 		+--------+ 0x00000000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) This means that the kernel can at most map 1GiB of physical memory at any one
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) time, but because we need virtual address space for other things - including
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) temporary maps to access the rest of the physical memory - the actual direct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) map will typically be less (usually around 896MiB).
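
The 896MiB figure can be reproduced with a little arithmetic.  The sketch
below is plain userspace C, not kernel code.  It takes the 0xc0000000 boundary
from the diagram and assumes that 128MiB of kernel address space is set aside
for other uses such as vmalloc and temporary maps; that reserve size and all of
the names are illustrative assumptions, and the real value depends on the
configuration.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Boundary between user and kernel space, taken from the diagram. */
static const uint64_t KERNEL_BASE = 0xc0000000ULL;
/* Total 32-bit virtual address space. */
static const uint64_t ADDRESS_SPACE = 4 * GiB;
/* Assumption: address space kept back for vmalloc, temporary maps, etc. */
static const uint64_t ASSUMED_RESERVE = 128 * MiB;

/* Size of the kernel's share of the virtual address space. */
static uint64_t kernel_space_bytes(void)
{
	return ADDRESS_SPACE - KERNEL_BASE;
}

/* What is left for the direct map once the reserve is set aside. */
static uint64_t direct_map_bytes(uint64_t reserve)
{
	assert(reserve <= kernel_space_bytes());
	return kernel_space_bytes() - reserve;
}
```

With these assumptions, kernel_space_bytes() is 1GiB and
direct_map_bytes(ASSUMED_RESERVE) is 896MiB.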
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) Other architectures that have mm context tagged TLBs can have separate kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) and user maps.  Some hardware (like some ARMs), however, has limited virtual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) space when it uses mm context tags.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) Temporary Virtual Mappings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) The kernel contains several ways of creating temporary mappings:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) * vmap().  This can be used to make a long duration mapping of multiple
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56)   physical pages into a contiguous virtual space.  It needs global
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57)   synchronization to unmap.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) * kmap().  This permits a short duration mapping of a single page.  It needs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60)   global synchronization, but is amortized somewhat.  It is also prone to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61)   deadlocks when used in a nested fashion, and so it is not recommended for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62)   new code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) * kmap_atomic().  This permits a very short duration mapping of a single
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65)   page.  Since the mapping is restricted to the CPU that issued it, it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66)   performs well, but the issuing task is therefore required to stay on that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67)   CPU until it has finished, lest some other task displace its mappings.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69)   kmap_atomic() may also be used by interrupt contexts, since it does not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70)   sleep and the caller may not sleep until after kunmap_atomic() is called.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72)   It may be assumed that k[un]map_atomic() won't fail.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) Using kmap_atomic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) =================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) When and where to use kmap_atomic() is straightforward.  It is used when code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) wants to access the contents of a page that might be allocated from high memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) (see __GFP_HIGHMEM), for example a page in the pagecache.  The API has two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) functions, and they can be used in a manner similar to the following::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) 	/* Find the page of interest. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 	struct page *page = find_get_page(mapping, offset);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) 	/* Gain access to the contents of that page. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 	void *vaddr = kmap_atomic(page);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 	/* Do something to the contents of that page. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) 	memset(vaddr, 0, PAGE_SIZE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) 	/* Unmap that page. */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) 	kunmap_atomic(vaddr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) not its argument.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) If you need to map two pages because you want to copy from one page to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) another, you need to keep the kmap_atomic() calls strictly nested, like::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 	vaddr1 = kmap_atomic(page1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) 	vaddr2 = kmap_atomic(page2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) 	memcpy(vaddr1, vaddr2, PAGE_SIZE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) 	kunmap_atomic(vaddr2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) 	kunmap_atomic(vaddr1);
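
One way to picture the strict-nesting rule is to treat the per-CPU mappings as
a small stack of slots.  The following is a userspace toy model under that
assumption, not the kernel's implementation: every name in it (toy_cpu,
toy_map, toy_unmap, TOY_SLOTS) is hypothetical, and the model chooses to reject
an out-of-order unmap so that the rule is visible, which the real API is not
guaranteed to do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 4	/* hypothetical number of per-CPU mapping slots */

struct toy_cpu {
	const void *slot[TOY_SLOTS];	/* page each slot currently maps */
	int depth;			/* number of live mappings */
};

/* Map: take the next free slot.  Returns the slot index, or -1 if full. */
static int toy_map(struct toy_cpu *cpu, const void *page)
{
	assert(cpu != NULL);
	if (cpu->depth == TOY_SLOTS)
		return -1;
	cpu->slot[cpu->depth] = page;
	return cpu->depth++;
}

/*
 * Unmap: only the most recent mapping may be released.
 * Returns 0 on success, -1 if the caller unmaps out of order.
 */
static int toy_unmap(struct toy_cpu *cpu, int idx)
{
	assert(cpu != NULL);
	if (cpu->depth == 0 || idx != cpu->depth - 1)
		return -1;
	cpu->slot[--cpu->depth] = NULL;
	return 0;
}
```

In this model, mapping two pages and then releasing the first one before the
second fails, while releasing them in reverse order of mapping succeeds, which
mirrors the ordering used in the example above.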
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Cost of Temporary Mappings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) The cost of creating temporary mappings can be quite high.  The arch has to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) manipulate the kernel's page tables, the data TLB and/or the MMU's registers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) simply with a bit of arithmetic that will convert the page struct address into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) a pointer to the page contents rather than juggling mappings about.  In such a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) case, the unmap operation may be a null operation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) If CONFIG_MMU is not set, then there can be no temporary mappings and no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) highmem.  In such a case, the arithmetic approach will also be used.
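
The "bit of arithmetic" can be illustrated with a userspace toy model.  It
assumes a flat array of page structures, physical memory starting at address
zero, 4KiB pages and a direct map based at 0xc0000000; all names are
hypothetical and the function returns a number rather than a usable pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4KiB pages */
#define TOY_NPAGES 8		/* tiny machine for illustration */

struct toy_page {
	unsigned long flags;
};

/* Assumption: one page structure per physical page, in a flat array. */
static struct toy_page toy_mem_map[TOY_NPAGES];

/* Assumption: the permanent mapping starts here and covers all memory. */
static const uintptr_t toy_direct_map_base = 0xc0000000u;

/* Convert a page structure address into the address of the page contents. */
static uintptr_t toy_page_address(const struct toy_page *page)
{
	ptrdiff_t pfn = page - toy_mem_map;	/* index == page frame number */

	assert(pfn >= 0 && pfn < TOY_NPAGES);
	return toy_direct_map_base + ((uintptr_t)pfn << TOY_PAGE_SHIFT);
}
```

Because the result depends only on the position of the page structure, nothing
has to be set up beforehand or torn down afterwards, which is why the unmap
side can be a null operation in this case.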
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) i386 PAE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) ========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) The i386 arch, under some circumstances, will permit you to stick up to 64GiB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) of RAM into your 32-bit machine.  This has a number of consequences:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) * Linux needs a page-frame structure for each page in the system and the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)   pageframes need to live in the permanent mapping, which means:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) * you can have 896M/sizeof(struct page) page-frames at most; with struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)   page being 32 bytes that would end up being something on the order of 112G
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)   worth of pages; the kernel, however, needs to store more than just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)   page-frames in that memory...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) * PAE makes your page tables larger - which slows the system down as more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)   data has to be accessed to walk them during TLB fills and the like.  One
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141)   advantage is that PAE has more PTE bits and can provide advanced features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142)   like NX and PAT.
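
The 112G figure in the list above follows from a short calculation, sketched
here in userspace C.  The 896MiB permanent mapping and the 32-byte struct page
come from the text; the 4KiB page size and the helper names are assumptions
made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* How many page structures fit into the permanently mapped memory. */
static uint64_t max_page_frames(uint64_t lowmem_bytes,
				uint64_t page_struct_bytes)
{
	assert(page_struct_bytes != 0);
	return lowmem_bytes / page_struct_bytes;
}

/* How much physical memory that many page frames can describe. */
static uint64_t covered_bytes(uint64_t frames, uint64_t page_bytes)
{
	return frames * page_bytes;
}
```

For example, max_page_frames(896ULL << 20, 32) gives 29360128 page frames, and
covered_bytes() of that count with 4096-byte pages gives 112GiB.  This is an
upper bound only, since it assumes the permanent mapping holds nothing but
page structures.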
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) The general recommendation is that you don't use more than 8GiB on a 32-bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) machine - although more might work for you and your workload, you're pretty
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) much on your own - don't expect kernel developers to really care much if things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) come apart.