^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) .. _page_owner:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ==================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) page owner: Tracking who allocated each page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) ==================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) page owner tracks who allocated each page.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) It can be used to debug memory leaks or to find a memory hog.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) When an allocation happens, information about it, such as the call stack
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) and the order of the pages, is stored in per-page storage.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) When we need to know the status of all pages, we can retrieve and analyze
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) this information.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) Although we already have tracepoints for tracing page allocation and free,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) using them to analyze who allocated each page is rather complex. We need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) to enlarge the trace buffer to prevent it from being overwritten before the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) userspace program launches. Once launched, the program continually dumps the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) trace buffer for later analysis, which is more likely to change system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) behaviour than just keeping the data in memory, and so is bad for debugging.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) page owner can also be used for other purposes. For example, accurate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) fragmentation statistics can be obtained from the gfp flag information of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) each page. This is already implemented and is activated when page owner is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) enabled. Other uses are more than welcome.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) page owner is disabled by default. So, if you'd like to use it, you need
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) to add "page_owner=on" to your boot cmdline. If the kernel is built
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) with page owner but page owner is disabled at runtime because the boot
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) option was not given, the runtime overhead is marginal. When disabled at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) runtime, it needs no memory to store owner information, so there is no runtime
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) memory overhead. Also, page owner inserts just two unlikely branches into
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) the page allocator hotpath, and if it is not enabled, allocation proceeds
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) as in a kernel built without page owner. These two unlikely branches should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) not affect allocation performance, especially if the static keys jump
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) label patching functionality is available. The following shows the change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) in the kernel's code size due to this facility.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) - Without page owner::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)       text    data     bss     dec     hex filename
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)      48392    2333     644   51369    c8a9 mm/page_alloc.o
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) - With page owner::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)       text    data     bss     dec     hex filename
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49)      48800    2445     644   51889    cab1 mm/page_alloc.o
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50)       6662     108      29    6799    1a8f mm/page_owner.o
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)       1025       8       8    1041     411 mm/page_ext.o
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) Although roughly 8 KB of code is added in total, page_alloc.o grows by only
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) 520 bytes, and less than half of that is in the hotpath. Building the kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) with page owner and turning it on when needed is a good way to debug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) kernel memory problems.
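
The figures quoted above can be re-derived from the "dec" column of the two
size listings. The following is only a small arithmetic sketch; the variable
names are chosen here for illustration and do not come from the kernel tree.

```shell
#!/bin/sh
# "dec" totals copied from the size listings above.
page_alloc_without=51369   # mm/page_alloc.o, built without page owner
page_alloc_with=51889      # mm/page_alloc.o, built with page owner
page_owner=6799            # mm/page_owner.o (new object)
page_ext=1041              # mm/page_ext.o (new object)

# Growth of page_alloc.o alone.
page_alloc_delta=$((page_alloc_with - page_alloc_without))

# Total growth, including the two new object files.
total_delta=$((page_alloc_delta + page_owner + page_ext))

echo "page_alloc.o growth: ${page_alloc_delta} bytes"
echo "total growth: ${total_delta} bytes"
```

This prints 520 bytes for page_alloc.o and 8360 bytes in total, which matches
the "roughly 8 KB" figure.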
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) There is one caveat caused by an implementation detail. page owner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) stores its information in memory from the struct page extension. On sparse
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) memory systems, this memory is initialized some time after the page allocator
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) starts, so until then many pages can be allocated and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) they will have no owner information. To fix this up, these early allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) pages are examined and marked as allocated during the initialization phase.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) Although this doesn't mean that they have the right owner information,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) we can at least tell more accurately whether a page is allocated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) or not. On an x86-64 VM with 2GB of memory, 13343 early allocated pages
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) were caught and marked, although most of them were allocated by the struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) page extension feature itself. After that, no page is left in an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) untracked state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) Usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) =====
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) 1) Build user-space helper::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)    cd tools/vm
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)    make page_owner_sort
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) 2) Enable page owner: add "page_owner=on" to boot cmdline.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) 3) Run the workload that you want to debug.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) 4) Analyze information from page owner::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)    cat /sys/kernel/debug/page_owner > page_owner_full.txt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86)    ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)    See who allocated each page
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)    in ``sorted_page_owner.txt``.
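
Steps 2 and 4 above can be wrapped in a small script that refuses to run when
page owner was not enabled at boot. This is a minimal sketch, not part of the
kernel tree: the function names ``page_owner_requested`` and
``collect_page_owner`` are hypothetical, and it assumes that debugfs is
mounted at ``/sys/kernel/debug`` and that ``page_owner_sort`` has already been
built in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line
# contains the "page_owner=on" boot option as a separate word.
page_owner_requested() {
    case " $1 " in
        *" page_owner=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around step 4: dump and sort the page owner data
# only when the boot option was given and the debugfs file is readable.
collect_page_owner() {
    cmdline=$(cat /proc/cmdline 2>/dev/null)
    if ! page_owner_requested "$cmdline"; then
        echo "page_owner=on is not on the boot cmdline" >&2
        return 1
    fi
    if [ ! -r /sys/kernel/debug/page_owner ]; then
        echo "/sys/kernel/debug/page_owner is not readable" >&2
        return 1
    fi
    cat /sys/kernel/debug/page_owner > page_owner_full.txt &&
        ./page_owner_sort page_owner_full.txt sorted_page_owner.txt
}

# Collect only when explicitly asked to, e.g. "sh collect.sh run".
if [ "${1:-}" = "run" ]; then
    collect_page_owner
fi
```

Saved under a name of your choice (``collect.sh`` is only an example) and run
as root from the directory that contains ``page_owner_sort``, the script
writes the same two files as the commands in step 4.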