.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a set of interfaces in the kernel, available since 2.6.25, that
allow userspace programs to examine the page tables and related information
by reading files in ``/proc``.

There are four components to pagemap:

 * ``/proc/pid/pagemap``.  This file lets a userspace process find out which
   physical frame each virtual page is mapped to.  It contains one 64-bit
   value for each virtual page, containing the following data (from
   ``fs/proc/task_mmu.c``, above pagemap_read):

    * Bits 0-54  page frame number (PFN) if present
    * Bits 0-4   swap type if swapped
    * Bits 5-54  swap offset if swapped
    * Bit  55    pte is soft-dirty (see
      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
    * Bit  56    page exclusively mapped (since 4.2)
    * Bits 57-60 zero
    * Bit  61    page is file-page or shared-anon (since 3.5)
    * Bit  62    page swapped
    * Bit  63    page present

   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
   In 4.0 and 4.1, opens by unprivileged users fail with -EPERM.  Starting
   from 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
   This is because information about PFNs helps in exploiting the Rowhammer
   vulnerability.

   If the page is not present but in swap, then the PFN field contains an
   encoding of the swap file number and the page's offset into the
   swap. Unmapped pages return a null PFN. This allows determining
   precisely which pages are mapped (or in swap) and comparing mapped
   pages between processes.

   Efficient users of this interface will use ``/proc/pid/maps`` to
   determine which areas of memory are actually mapped and llseek to
   skip over unmapped regions.

 * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
   times each page is mapped, indexed by PFN.

The page-types tool in the tools/vm directory can be used to query the
number of times a page is mapped.
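
As a minimal sketch, the 64-bit ``/proc/pid/pagemap`` entry layout listed above
can be decoded in Python as follows. It assumes a Linux system, native byte
order for the u64 entries, and the bit positions given above for this kernel
version. ``PagemapEntry``, ``decode_pagemap_entry``, ``pagemap_offset`` and
``read_pagemap_entry`` are hypothetical helper names, not part of any kernel or
library API.

```python
# Illustrative sketch only: helper names are hypothetical, not a kernel API.
import os
import struct
from dataclasses import dataclass
from typing import Optional

ENTRY_SIZE = 8                      # one u64 per virtual page
PFN_MASK = (1 << 55) - 1            # bits 0-54: PFN if present
SWAP_TYPE_MASK = (1 << 5) - 1       # bits 0-4: swap type if swapped
SWAP_OFFSET_MASK = (1 << 50) - 1    # bits 5-54: swap offset if swapped


@dataclass
class PagemapEntry:
    present: bool                   # bit 63
    swapped: bool                   # bit 62
    file_or_shared_anon: bool       # bit 61
    exclusive: bool                 # bit 56
    soft_dirty: bool                # bit 55
    pfn: Optional[int] = None       # reads as 0 without CAP_SYS_ADMIN (4.2+)
    swap_type: Optional[int] = None
    swap_offset: Optional[int] = None


def decode_pagemap_entry(value: int) -> PagemapEntry:
    """Split one raw pagemap u64 into its documented fields."""
    entry = PagemapEntry(
        present=bool((value >> 63) & 1),
        swapped=bool((value >> 62) & 1),
        file_or_shared_anon=bool((value >> 61) & 1),
        exclusive=bool((value >> 56) & 1),
        soft_dirty=bool((value >> 55) & 1),
    )
    if entry.present:
        entry.pfn = value & PFN_MASK
    elif entry.swapped:
        # For swapped pages the PFN field holds the swap type and offset.
        entry.swap_type = value & SWAP_TYPE_MASK
        entry.swap_offset = (value >> 5) & SWAP_OFFSET_MASK
    return entry


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in /proc/pid/pagemap of the entry for vaddr's page."""
    return (vaddr // page_size) * ENTRY_SIZE


def read_pagemap_entry(pid: str, vaddr: int) -> PagemapEntry:
    """Read and decode the entry for one virtual address of process pid."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    # Unbuffered, so exactly one aligned 8-byte read reaches the kernel.
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (value,) = struct.unpack("=Q", f.read(ENTRY_SIZE))
    return decode_pagemap_entry(value)
```

For example, ``read_pagemap_entry("self", addr)`` inspects the calling
process. It reopens the file on every call for simplicity; without
CAP_SYS_ADMIN the ``pfn`` of a present page reads back as 0 on 4.2 and later,
as described above.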

 * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
   page, indexed by PFN.

   The flags are (from ``fs/proc/page.c``, above kpageflags_read):

    0. LOCKED
    1. ERROR
    2. REFERENCED
    3. UPTODATE
    4. DIRTY
    5. LRU
    6. ACTIVE
    7. SLAB
    8. WRITEBACK
    9. RECLAIM
    10. BUDDY
    11. MMAP
    12. ANON
    13. SWAPCACHE
    14. SWAPBACKED
    15. COMPOUND_HEAD
    16. COMPOUND_TAIL
    17. HUGE
    18. UNEVICTABLE
    19. HWPOISON
    20. NOPAGE
    21. KSM
    22. THP
    23. OFFLINE
    24. ZERO_PAGE
    25. IDLE
    26. PGTABLE

 * ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
   memory cgroup each page is charged to, indexed by PFN. Only available when
   CONFIG_MEMCG is set.
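
Decoding a ``/proc/kpageflags`` value amounts to testing each bit against the
numbered list above. The Python sketch below assumes native byte order and
sufficient privileges (these files are normally readable only by root);
``KPAGEFLAG_NAMES``, ``decode_kpageflags`` and ``read_kpage_entry`` are
hypothetical helper names. Bits above 26 are not described in this document,
so the sketch reports them by number rather than guessing a name. Because
``/proc/kpagecount``, ``/proc/kpageflags`` and ``/proc/kpagecgroup`` all hold
one 64-bit value per PFN, the same reader serves all three.

```python
# Illustrative sketch only: helper names are hypothetical, not a kernel API.
import struct
from typing import List

# Bit numbers 0-26 as listed above for /proc/kpageflags.
KPAGEFLAG_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def decode_kpageflags(value: int) -> List[str]:
    """Return the names of all bits set in one kpageflags value."""
    names = []
    for bit in range(64):
        if (value >> bit) & 1:
            if bit < len(KPAGEFLAG_NAMES):
                names.append(KPAGEFLAG_NAMES[bit])
            else:
                names.append(f"bit{bit}")  # not documented in this file
    return names


def read_kpage_entry(path: str, pfn: int) -> int:
    """Read the u64 for one PFN from kpagecount, kpageflags or kpagecgroup."""
    with open(path, "rb", buffering=0) as f:
        f.seek(pfn * 8)            # entries are indexed by PFN, 8 bytes each
        data = f.read(8)           # exactly one whole entry
    if len(data) != 8:
        raise ValueError(f"no entry for PFN {pfn} in {path}")
    return struct.unpack("=Q", data)[0]
```

For a PFN obtained from pagemap,
``decode_kpageflags(read_kpage_entry("/proc/kpageflags", pfn))`` lists its
flags by name.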

Short descriptions of the page flags
====================================

0 - LOCKED
   page is locked for exclusive access, e.g. while undergoing read/write IO
7 - SLAB
   page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
   When a compound page is used, SLUB/SLQB will only set this flag on the head
   page; SLOB will not flag it at all.
10 - BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and *only* for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB and other memory allocators, and various device drivers.
    However, in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    this is an integral part of a HugeTLB page
19 - HWPOISON
    hardware-detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    no page frame exists at the requested address
21 - KSM
    identical memory pages dynamically shared between one or more processes
22 - THP
    contiguous pages which constitute a transparent hugepage
23 - OFFLINE
    page is logically offline
24 - ZERO_PAGE
    zero page for pfn_zero or huge_zero page
25 - IDLE
    page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale if the page was accessed via
    a PTE. To make sure the flag is up-to-date, one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
    page is in use as a page table
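
The "HTTT" layout described under COMPOUND_HEAD means compound pages can be
recovered from a PFN-ordered run of kpageflags values: a head page followed by
its tail pages. The Python sketch below assumes the caller has already read
consecutive kpageflags entries starting at ``base_pfn``; ``compound_extents``
is a hypothetical helper. Tails at the start of the run whose head lies before
``base_pfn`` are skipped, and the order is reported only when the run length is
a power of two (a run cut off at the end of the array is not).

```python
# Illustrative sketch only: compound_extents is a hypothetical helper.
from typing import Iterator, Optional, Sequence, Tuple

KPF_COMPOUND_HEAD = 1 << 15
KPF_COMPOUND_TAIL = 1 << 16


def compound_extents(
    flags: Sequence[int], base_pfn: int = 0
) -> Iterator[Tuple[int, int, Optional[int]]]:
    """Yield (head_pfn, npages, order) for each head-plus-tails run.

    flags[i] is the kpageflags value of PFN base_pfn + i.
    """
    i = 0
    while i < len(flags):
        if flags[i] & KPF_COMPOUND_HEAD:
            j = i + 1
            # Tail pages immediately follow their head page.
            while j < len(flags) and flags[j] & KPF_COMPOUND_TAIL:
                j += 1
            npages = j - i
            # An order N compound page spans 2^N pages.
            is_pow2 = (npages & (npages - 1)) == 0
            order = npages.bit_length() - 1 if is_pow2 else None
            yield base_pfn + i, npages, order
            i = j
        else:
            i += 1
```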

IO related page flags
---------------------

1 - ERROR
   IO error occurred
3 - UPTODATE
   page has up-to-date data
   i.e. for a file-backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
   page has been written to, hence contains new data
   i.e. for a file-backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
   page is being synced to disk

LRU related page flags
----------------------

5 - LRU
   page is in one of the LRU lists
6 - ACTIVE
   page is in the active LRU list
18 - UNEVICTABLE
   page is in the unevictable (non-)LRU list.  It is somehow pinned and
   not a candidate for LRU page reclaims, e.g. ramfs pages,
   shmctl(SHM_LOCK) and mlock() memory segments
2 - REFERENCED
   page has been referenced since the last LRU list enqueue/requeue
9 - RECLAIM
   page will be reclaimed soon after its pageout IO completes
11 - MMAP
   a memory mapped page
12 - ANON
   a memory mapped page that is not part of a file
13 - SWAPCACHE
   page is mapped to swap space, i.e. has an associated swap entry
14 - SWAPBACKED
   page is backed by swap/RAM

The page-types tool in the tools/vm directory can be used to query the
above flags.

Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
    mapped to what.
 2. Select the maps you are interested in -- all of them, or a particular
    library, or the stack or the heap, etc.
 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
 4. Read a u64 for each page from pagemap.
 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
    just read, seek to that entry in the file, and read the data you want.
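
Steps 1 to 4 can be sketched in Python as below. This is an illustrative
outline that assumes a Linux ``/proc`` and native byte order;
``parse_maps_line``, ``select_maps`` and ``iter_pagemap`` are hypothetical
names, and step 5 (the kpagecount/kpageflags lookup) is left to the caller.
Reading entries in batches rather than one at a time keeps the number of
system calls down, and every offset and length stays a multiple of 8 bytes.

```python
# Illustrative sketch only: helper names are hypothetical, not a kernel API.
import os
import struct
from typing import Iterator, List, Tuple

Mapping = Tuple[int, int, str]  # (start, end, pathname)


def parse_maps_line(line: str) -> Mapping:
    """Step 1: parse one /proc/pid/maps line into (start, end, pathname)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""  # anonymous: no path
    return start, end, path


def select_maps(pid: str, wanted: str) -> List[Mapping]:
    """Steps 1-2: keep the maps whose pathname equals wanted, e.g. "[heap]"."""
    with open(f"/proc/{pid}/maps") as f:
        maps = [parse_maps_line(line) for line in f if line.strip()]
    return [m for m in maps if m[2] == wanted]


def iter_pagemap(pid: str, start: int, end: int,
                 batch: int = 512) -> Iterator[Tuple[int, int]]:
    """Steps 3-4: yield (vaddr, raw pagemap u64) for each page in [start, end)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        vaddr = start
        while vaddr < end:
            count = min(batch, (end - vaddr) // page_size)
            f.seek((vaddr // page_size) * 8)   # seek straight to this page
            data = f.read(count * 8)           # one u64 per page
            if not data:
                break
            for i, (entry,) in enumerate(struct.iter_unpack("=Q", data)):
                yield vaddr + i * page_size, entry
            vaddr += (len(data) // 8) * page_size
```

For example, iterating ``iter_pagemap("self", start, end)`` over each
``(start, end, _)`` returned by ``select_maps("self", "[heap]")`` walks the
heap of the calling process; the PFNs in the yielded entries then feed step 5.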

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
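
A minimal Python sketch of that USS tally follows. It assumes the raw pagemap
entries for the maps of interest have already been read (as in step 4 above)
and that PFN lookups go through a caller-supplied function; ``kpagecount``,
``uss_pages`` and ``uss_bytes`` are hypothetical names. Since PFNs are hidden
from readers without CAP_SYS_ADMIN and ``/proc/kpagecount`` is normally
readable only by root, the result is only meaningful with those privileges.

```python
# Illustrative sketch only: helper names are hypothetical, not a kernel API.
import os
import struct
from typing import BinaryIO, Callable, Iterable

PM_PRESENT = 1 << 63          # pagemap bit 63: page present
PM_PFN_MASK = (1 << 55) - 1   # pagemap bits 0-54: PFN


def kpagecount(f: BinaryIO, pfn: int) -> int:
    """Read the map count of one PFN from an open /proc/kpagecount file."""
    f.seek(pfn * 8)
    return struct.unpack("=Q", f.read(8))[0]


def uss_pages(entries: Iterable[int], count_of: Callable[[int], int]) -> int:
    """Count present pages whose physical frame is mapped exactly once."""
    unique = 0
    for entry in entries:
        if not (entry & PM_PRESENT):
            continue                      # swapped or unmapped: no PFN
        if count_of(entry & PM_PFN_MASK) == 1:
            unique += 1
    return unique


def uss_bytes(entries: Iterable[int]) -> int:
    """USS in bytes, looking PFNs up in the system-wide /proc/kpagecount."""
    with open("/proc/kpagecount", "rb", buffering=0) as f:
        pages = uss_pages(entries, lambda pfn: kpagecount(f, pfn))
    return pages * os.sysconf("SC_PAGE_SIZE")
```

Passing the lookup in as a function keeps the tally independent of where the
counts come from, which also makes it easy to check against a small table of
made-up counts.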

Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.
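
One way to stay within these rules is to address the files by entry index
rather than by byte offset, so every offset and length is a whole multiple of
8. The Python sketch below uses ``os.pread`` on a descriptor opened with
``os.open``; ``is_valid_request``, ``read_raw`` and ``read_entries`` are
hypothetical helpers, and ``read_raw`` simply rejects up front any request the
kernel would fail with -EINVAL.

```python
# Illustrative sketch only: helper names are hypothetical, not a kernel API.
import os
import struct
from typing import List

ENTRY_SIZE = 8  # every pagemap/kpage* entry is one u64


def is_valid_request(offset: int, size: int) -> bool:
    """True if a read of size bytes at offset satisfies the 8-byte rules."""
    return offset % ENTRY_SIZE == 0 and size % ENTRY_SIZE == 0


def read_raw(fd: int, offset: int, size: int) -> bytes:
    """pread() that refuses requests the kernel would fail with -EINVAL."""
    if not is_valid_request(offset, size):
        raise ValueError(f"offset {offset} and size {size} must be multiples of 8")
    return os.pread(fd, size, offset)


def read_entries(fd: int, first: int, count: int) -> List[int]:
    """Read count u64 entries starting at entry index first."""
    data = read_raw(fd, first * ENTRY_SIZE, count * ENTRY_SIZE)
    return [value for (value,) in struct.iter_unpack("=Q", data)]
```

For ``/proc/pid/pagemap`` the entry index of a virtual address is
``vaddr // page_size``; for the ``/proc/kpage*`` files it is the PFN itself.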

Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures). Since Linux 3.11 their meaning changes
after the first clear of soft-dirty bits. Since Linux 4.2 they are used for
flags unconditionally.