.. _overcommit_accounting:

=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
    Heuristic overcommit handling. Obvious overcommits of address
    space are refused. Used for a typical system. It ensures that a
    seriously wild allocation fails while allowing overcommit to
    reduce swap usage. The root user is allowed to allocate slightly
    more memory in this mode. This is the default.

1
    Always overcommit. Appropriate for some scientific
    applications. The classic example is code that uses sparse arrays
    and relies on the virtual memory consisting almost entirely
    of zero pages.

2
    Don't overcommit. The total address space commit for the
    system is not permitted to exceed swap plus a configurable amount
    (default is 50%) of physical RAM. Depending on the amount you
    use, in most situations this means a process will not be
    killed while accessing pages but will receive errors on memory
    allocation as appropriate.

    Useful for applications that want to guarantee their memory
    allocations will be available in the future without having to
    initialize every page.

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.

The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value).
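
The mode 2 limit described above can be reproduced with a little arithmetic.
The sketch below is a simplified model, not the kernel's code: it works in
kilobytes rather than pages, the function name ``commit_limit_kb`` and its
parameters are made up for illustration, and it assumes that a non-zero
``vm.overcommit_kbytes`` takes precedence over ``vm.overcommit_ratio`` and
that memory set aside for huge pages is excluded from the RAM figure.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50,
                    overcommit_kbytes=0, hugetlb_kb=0):
    """Estimate the mode 2 commit limit in kB (illustrative model only)."""
    if overcommit_kbytes:
        # Assumption: an absolute value, when set, replaces the percentage.
        allowed_ram_kb = overcommit_kbytes
    else:
        # Assumption: memory reserved for huge pages is not counted as RAM.
        allowed_ram_kb = (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100
    # Swap is added on top of the permitted share of RAM.
    return allowed_ram_kb + swap_total_kb
```

For example, with 8000000 kB of RAM, 2000000 kB of swap and the default ratio
of 50, ``commit_limit_kb(8000000, 2000000)`` evaluates to 6000000.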

The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as ``CommitLimit`` and ``Committed_AS`` respectively.
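
As a rough illustration of reading those two fields, the sketch below parses
``/proc/meminfo``-style text and reports how much address space could still be
committed. The helper names (``parse_meminfo``, ``commit_headroom_kb``,
``read_meminfo``) are hypothetical, and the result is only an approximation:
it assumes the values are reported in kB and ignores any additional reserves
the kernel may apply when it enforces the limit.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # a trailing "kB" is dropped
    return fields


def commit_headroom_kb(fields):
    """Return CommitLimit minus Committed_AS (negative when over the limit)."""
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; only meaningful on a Linux system."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On a Linux machine, ``commit_headroom_kb(read_meminfo())`` gives the current
headroom in kB. Outside mode 2 the value may well be negative, since the limit
is not enforced there.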

Gotchas
=======

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge, you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case to watch if you really care.
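
One way to act on this from a high-level language is to run the sensitive work
in a thread whose stack size is requested up front. The sketch below uses
Python's ``threading.stack_size``. It assumes the platform's thread library
maps each thread stack at the requested size when the thread is created, which
is typical but not something this document guarantees. The helper name
``run_with_presized_stack`` is hypothetical.

```python
import threading


def run_with_presized_stack(func, stack_bytes):
    """Run func() in a new thread whose stack is requested at stack_bytes."""
    result = {}

    def target():
        result["value"] = func()

    # The new size applies to threads started after this call;
    # the call returns the size that was in effect before.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result.get("value")
```

Calling ``run_with_presized_stack(work, 16 * 1024 * 1024)`` would run a
callable ``work`` on a thread with a 16 MiB stack request. Choose the size
from the deepest call chain you expect, as the paragraph above advises.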

In mode 2 the ``MAP_NORESERVE`` flag is ignored.

How It Works
============

Overcommit accounting is based on the following rules:

For a file-backed map
    | SHARED or READ-only - 0 cost (the file is the map, not swap)
    | PRIVATE WRITABLE - size of mapping per instance

For an anonymous or ``/dev/zero`` map
    | SHARED - size of mapping
    | PRIVATE READ-only - 0 cost (but of little use)
    | PRIVATE WRITABLE - size of mapping per instance

Additional accounting
    | Pages made writable copies by mmap
    | shmfs memory drawn from the same pool
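
The per-mapping rules for file-backed and anonymous maps can be written down
as a small decision function. This is only a restatement of the rules listed
above, not kernel code: the name ``commit_charge`` and its keyword arguments
are invented for illustration, and the "additional accounting" items are left
out.

```python
def commit_charge(size, *, file_backed, shared, writable):
    """Return the address space charged for one mapping of the given size."""
    if file_backed:
        # Shared or read-only file maps are backed by the file, not swap.
        return size if (not shared and writable) else 0
    if shared:
        # Shared anonymous (or /dev/zero) maps are charged the mapping size.
        return size
    # Private anonymous maps are charged only when they are writable.
    return size if writable else 0
```

For instance, a private writable file mapping of 4096 bytes is charged 4096,
while the same mapping made read-only is charged nothing.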

Status
======

* We account mmap memory mappings
* We account mprotect changes in commit
* We account mremap changes in size
* We account brk
* We account munmap
* We report the commit status in ``/proc``
* Account and check on fork
* Review stack handling/building on exec
* SHMfs accounting
* Implement actual limit enforcement

To Do
=====

* Account ptrace pages (this is hard)