^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. _pagemap:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Examining Process Page Tables
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) =============================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) pagemap is a set of interfaces in the kernel (available since 2.6.25) that allow
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) userspace programs to examine the page tables and related information by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) reading files in ``/proc``.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) There are four components to pagemap:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) * ``/proc/pid/pagemap``. This file lets a userspace process find out which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) physical frame each virtual page is mapped to. It contains one 64-bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) value for each virtual page, containing the following data (from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) ``fs/proc/task_mmu.c``, above pagemap_read):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) * Bits 0-54 page frame number (PFN) if present
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) * Bits 0-4 swap type if swapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) * Bits 5-54 swap offset if swapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) * Bit 55 pte is soft-dirty (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) * Bit 56 page exclusively mapped (since 4.2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) * Bits 57-60 zero
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) * Bit 61 page is file-page or shared-anon (since 3.5)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) * Bit 62 page swapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) * Bit 63 page present
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) In 4.0 and 4.1 opens by unprivileged users fail with -EPERM. Starting from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) Reason: information about PFNs helps in exploiting the Rowhammer vulnerability.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) If the page is not present but in swap, then the PFN field contains an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) encoding of the swap file number and the page's offset into the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) swap. Unmapped pages return a null PFN. This allows determining
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) precisely which pages are mapped (or in swap) and comparing mapped
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) pages between processes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) Efficient users of this interface will use ``/proc/pid/maps`` to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) determine which areas of memory are actually mapped and llseek to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) skip over unmapped regions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) * ``/proc/kpagecount``. This file contains a 64-bit count of the number of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) times each page is mapped, indexed by PFN.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) The page-types tool in the tools/vm directory can be used to query the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) number of times a page is mapped.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) * ``/proc/kpageflags``. This file contains a 64-bit set of flags for each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) page, indexed by PFN.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) The flags are (from ``fs/proc/page.c``, above kpageflags_read):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) 0. LOCKED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) 1. ERROR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) 2. REFERENCED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) 3. UPTODATE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) 4. DIRTY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) 5. LRU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) 6. ACTIVE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) 7. SLAB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) 8. WRITEBACK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) 9. RECLAIM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) 10. BUDDY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) 11. MMAP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) 12. ANON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) 13. SWAPCACHE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) 14. SWAPBACKED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) 15. COMPOUND_HEAD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) 16. COMPOUND_TAIL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) 17. HUGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) 18. UNEVICTABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) 19. HWPOISON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) 20. NOPAGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) 21. KSM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) 22. THP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) 23. OFFLINE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) 24. ZERO_PAGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) 25. IDLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) 26. PGTABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) * ``/proc/kpagecgroup``. This file contains a 64-bit inode number of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) memory cgroup each page is charged to, indexed by PFN. Only available when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) CONFIG_MEMCG is set.
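
The bit layout of a ``/proc/pid/pagemap`` entry listed above can be unpacked
with a few shifts and masks. The following is a minimal sketch in Python, not
kernel code; the names ``PagemapEntry`` and ``decode_pagemap_entry`` are
hypothetical and chosen only for illustration. It assumes the post-4.2 layout,
where bits 55 and 56 are always flags.

```python
from typing import NamedTuple, Optional

PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry


class PagemapEntry(NamedTuple):
    """Decoded view of one 64-bit pagemap value (illustrative type)."""
    present: bool               # bit 63
    swapped: bool               # bit 62
    file_or_shared_anon: bool   # bit 61
    exclusive: bool             # bit 56
    soft_dirty: bool            # bit 55
    pfn: Optional[int]          # bits 0-54, only meaningful if present
    swap_type: Optional[int]    # bits 0-4, only meaningful if swapped
    swap_offset: Optional[int]  # bits 5-54, only meaningful if swapped


def decode_pagemap_entry(value: int) -> PagemapEntry:
    """Split a 64-bit pagemap value into its documented fields."""
    present = bool((value >> 63) & 1)
    swapped = bool((value >> 62) & 1)
    low = value & PFN_MASK
    return PagemapEntry(
        present=present,
        swapped=swapped,
        file_or_shared_anon=bool((value >> 61) & 1),
        exclusive=bool((value >> 56) & 1),
        soft_dirty=bool((value >> 55) & 1),
        pfn=low if present else None,
        swap_type=(low & 0x1F) if swapped else None,
        swap_offset=(low >> 5) if swapped else None,
    )
```

Note that without CAP_SYS_ADMIN the PFN field reads as zero on 4.2 and later,
so a present page may decode with ``pfn == 0``.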
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) Short descriptions of the page flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) ====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) 0 - LOCKED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) page is being locked for exclusive access, e.g. by undergoing read/write IO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) 7 - SLAB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) When compound page is used, SLUB/SLQB will only set this flag on the head
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) page; SLOB will not flag it at all.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) 10 - BUDDY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) a free memory block managed by the buddy system allocator
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) The buddy system organizes free memory in blocks of various orders.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) An order N block has 2^N physically contiguous pages, with the BUDDY flag
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) set for and _only_ for the first page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) 15 - COMPOUND_HEAD
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) A compound page with order N consists of 2^N physically contiguous pages.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) A compound page with order 2 takes the form of "HTTT", where H denotes its
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) head page and T denotes its tail page(s). The major consumers of compound
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) pages are hugeTLB pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) the SLUB etc. memory allocators and various device drivers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) However in this interface, only huge/giga pages are made visible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) to end users.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) 16 - COMPOUND_TAIL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) A compound page tail (see description above).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 17 - HUGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) this is an integral part of a HugeTLB page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) 19 - HWPOISON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) hardware detected memory corruption on this page: don't touch the data!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) 20 - NOPAGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) no page frame exists at the requested address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) 21 - KSM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) identical memory pages dynamically shared between one or more processes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) 22 - THP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) contiguous pages which construct transparent hugepages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) 23 - OFFLINE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) page is logically offline
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 24 - ZERO_PAGE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) zero page for pfn_zero or huge_zero page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) 25 - IDLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) page has not been accessed since it was marked idle (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) Note that this flag may be stale in case the page was accessed via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) a PTE. To make sure the flag is up-to-date one has to read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) ``/sys/kernel/mm/page_idle/bitmap`` first.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) 26 - PGTABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) page is in use as a page table
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) IO related page flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) ---------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) 1 - ERROR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) IO error occurred
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) 3 - UPTODATE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) page has up-to-date data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) i.e. for file backed page: (in-memory data revision >= on-disk one)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) 4 - DIRTY
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) page has been written to, hence contains new data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) i.e. for file backed page: (in-memory data revision > on-disk one)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) 8 - WRITEBACK
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) page is being synced to disk
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) LRU related page flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 5 - LRU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) page is in one of the LRU lists
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 6 - ACTIVE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) page is in the active LRU list
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) 18 - UNEVICTABLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) page is in the unevictable (non-)LRU list. It is somehow pinned and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) not a candidate for LRU page reclaims, e.g. ramfs pages,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) shmctl(SHM_LOCK) and mlock() memory segments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 2 - REFERENCED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) page has been referenced since last LRU list enqueue/requeue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 9 - RECLAIM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) page will be reclaimed soon after its pageout IO completes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 11 - MMAP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) a memory mapped page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 12 - ANON
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) a memory mapped page that is not part of a file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 13 - SWAPCACHE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) page is mapped to swap space, i.e. has an associated swap entry
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 14 - SWAPBACKED
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) page is backed by swap/RAM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) The page-types tool in the tools/vm directory can be used to query the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) above flags.
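
For quick experiments without page-types, a ``/proc/kpageflags`` value can
also be turned into flag names directly. The sketch below is illustrative
only: ``KPF_NAMES`` and ``kpageflags_names`` are hypothetical names, and the
table simply mirrors the bit numbers listed in this document.

```python
# Flag names indexed by bit number, as listed in this document (bits 0-26).
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "OFFLINE", "ZERO_PAGE", "IDLE",
    "PGTABLE",
]


def kpageflags_names(value):
    """Return the names of all documented flags set in a kpageflags value."""
    return [name for bit, name in enumerate(KPF_NAMES) if (value >> bit) & 1]
```

Bits above 26 are ignored by this helper, so flags added by kernels newer
than this list would not be reported.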
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) Using pagemap to do something useful
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) ====================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) The general procedure for using pagemap to find out about a process's memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) usage goes like this:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) mapped to what.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) 2. Select the maps you are interested in -- all of them, or a particular
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) library, or the stack or the heap, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) 4. Read a u64 for each page from pagemap.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``. For each PFN you
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) just read, seek to that entry in the file, and read the data you want.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) For example, to find the "unique set size" (USS), which is the amount of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) memory that a process is using that is not shared with any other process,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) you can go through every map in the process, find the PFNs, look those up
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) in kpagecount, and tally up the number of pages that are only referenced
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) once.
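
The USS procedure described above might be sketched as follows. This is an
illustration under stated assumptions rather than a reference implementation:
all function names are hypothetical, entries are assumed to be native-endian
64-bit values, and the caller is assumed to have CAP_SYS_ADMIN (otherwise the
PFN field is zeroed and such pages are skipped). The helpers take open file
objects so that they can be exercised on in-memory data as well.

```python
import os
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry


def parse_maps_range(line):
    """Return (start, end) addresses from one line of /proc/pid/maps."""
    start, end = line.split()[0].split("-")
    return int(start, 16), int(end, 16)


def read_u64s(f, index, count):
    """Read up to `count` 64-bit values starting at entry `index`."""
    f.seek(index * 8)              # offsets are always 8-byte aligned
    data = f.read(count * 8)
    return struct.unpack("=%dQ" % (len(data) // 8), data)


def count_unique_pages(pagemap, kpagecount, start, end, page_size):
    """Count present pages in [start, end) whose map count is exactly 1."""
    first = start // page_size
    npages = (end - start) // page_size
    unique = 0
    for entry in read_u64s(pagemap, first, npages):
        if not (entry >> 63) & 1:  # bit 63: page present
            continue
        pfn = entry & PFN_MASK
        if pfn == 0:               # PFN hidden from unprivileged readers
            continue
        counts = read_u64s(kpagecount, pfn, 1)
        if counts and counts[0] == 1:
            unique += 1
    return unique


def uss_bytes(pid):
    """Sum the unique pages over every mapping of `pid` (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    total = 0
    with open("/proc/%d/maps" % pid) as maps, \
         open("/proc/%d/pagemap" % pid, "rb", buffering=0) as pagemap, \
         open("/proc/kpagecount", "rb", buffering=0) as kpagecount:
        for line in maps:
            start, end = parse_maps_range(line)
            total += count_unique_pages(pagemap, kpagecount,
                                        start, end, page_size)
    return total * page_size
```

On a live system, ``uss_bytes(os.getpid())`` would report the calling
process's own USS in bytes. The sketch reads each mapping's entries in a
single call, which keeps the code short but is wasteful for very large
mappings; a real tool would read in bounded chunks.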
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) Other notes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) Reading from any of the files will return -EINVAL if you are not starting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) into the file), or if the size of the read is not a multiple of 8 bytes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) always 12 on most architectures). Since Linux 3.11 their meaning changes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) after the first clear of soft-dirty bits. Since Linux 4.2 they are used for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) flags unconditionally.