Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

==================
HugeTLB Controller
==================

The HugeTLB controller can be created by first mounting the cgroup
filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.
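Continuing the example, a limit can be set for the new group and its usage read back. A minimal sketch, assuming a 2MB hugepage size — the `hugetlb.2MB.*` file names are an assumption, and on a real system they depend on the hugepage sizes the kernel reports:

```shell
# Cap g1 at ten 2MB hugepages. Limits are expressed in bytes, so the value
# is chosen as a multiple of the hugepage size.
limit=$((10 * 2 * 1024 * 1024))                      # 20971520 bytes
echo "$limit" > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes

# Read back the current charged usage for the group.
cat /sys/fs/cgroup/g1/hugetlb.2MB.usage_in_bytes
```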

Brief summary of control files::

 hugetlb.<hugepagesize>.rsvd.limit_in_bytes            # set/show limit of "hugepagesize" hugetlb reservations
 hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes        # show max "hugepagesize" hugetlb reservations and no-reserve faults
 hugetlb.<hugepagesize>.rsvd.usage_in_bytes            # show current reservations and no-reserve faults for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.rsvd.failcnt                   # show the number of allocation failures due to the HugeTLB reservation limit
 hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
 hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
 hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failures due to the HugeTLB usage limit
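All of a group's control files can be inspected together. A sketch, assuming the g1 group created earlier (on a real system the glob expands to one set of files per supported hugepage size):

```shell
# Print every hugetlb control file of g1 with its value, as filename:value.
# grep prints the filename prefix because the glob matches multiple files.
grep . /sys/fs/cgroup/g1/hugetlb.*
```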

For a system supporting three hugepage sizes (64KB, 32MB and 1GB), the
control files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt


1. Page fault accounting

hugetlb.<hugepagesize>.limit_in_bytes
hugetlb.<hugepagesize>.max_usage_in_bytes
hugetlb.<hugepagesize>.usage_in_bytes
hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
control group and enforces the limit during page fault. Since HugeTLB
does not support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. The application therefore needs to know exactly how
many HugeTLB pages it uses beforehand, and the sysadmin needs to make sure
that enough hugepages are available on the machine for all the users, to
avoid processes getting SIGBUS.
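Because an over-limit fault is fatal, a launcher script may want to sanity-check the system-wide pool before starting the workload. A rough sketch — the required page count and the use of a wrapper script are assumptions, not part of the controller interface:

```shell
# Compare the free hugepage count from /proc/meminfo against the
# application's known requirement before starting it under a fault-time
# limit, to reduce the risk of a mid-run SIGBUS.
needed=10                                            # placeholder requirement
free=$(awk '/^HugePages_Free:/ {print $2}' /proc/meminfo)
if [ "${free:-0}" -ge "$needed" ]; then
    echo "ok: $free hugepages free"
else
    echo "refusing to start: only ${free:-0} hugepages free" >&2
fi
```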


2. Reservation accounting

hugetlb.<hugepagesize>.rsvd.limit_in_bytes
hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
hugetlb.<hugepagesize>.rsvd.usage_in_bytes
hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows limiting HugeTLB reservations per control
group and enforces the controller limit at reservation time, and at fault
time for HugeTLB memory for which no reservation exists. Since reservation
limits are enforced at reservation time (on mmap or shmget), they never
cause the application to receive a SIGBUS signal if the memory was reserved
beforehand. For MAP_NORESERVE allocations, the reservation limit behaves the
same as the fault limit, enforcing memory usage at fault time and causing
the application to receive a SIGBUS if it crosses its limit.
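A reservation cap is set the same way as a fault cap; a sketch, again assuming a 2MB hugepage size and the g1 group from the earlier examples:

```shell
# Cap g1's 2MB hugepage reservations at 20 pages. A reserving mmap() that
# would exceed this limit fails with ENOMEM at map time, instead of the
# process being killed by SIGBUS at fault time.
rsvd_limit=$((20 * 2 * 1024 * 1024))                 # 41943040 bytes
echo "$rsvd_limit" > /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.limit_in_bytes
```

Failing early at map time is what makes the fallback path described below practical: the application can catch the mmap() error and retry with ordinary memory.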

Reservation limits are superior to the page fault limits described above,
since reservation limits are enforced at reservation time (on mmap or
shmget) and never cause the application to get a SIGBUS signal if the memory
was reserved beforehand. This allows for easier fallback to alternatives
such as non-HugeTLB memory. In the case of page fault accounting, it is very
hard to avoid processes getting SIGBUS, since the sysadmin would need to
know precisely the HugeTLB usage of all the tasks in the system and make
sure there are enough pages to satisfy all requests. Avoiding tasks getting
SIGBUS on overcommitted systems is practically impossible with page fault
accounting.


3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservations and page faults are
charged to the first task that causes the memory to be reserved or faulted,
and all subsequent uses of this reserved or faulted memory are done without
charging.

Shared HugeTLB memory is only uncharged when it is unreserved or
deallocated. This usually happens when the HugeTLB file is deleted, not when
the task that caused the reservation or fault exits.


4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented),
- the reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup gets offlined while there are still
HugeTLB reservations charged to it, that cgroup persists as a zombie until
all HugeTLB reservations are uncharged. HugeTLB reservations behave this way
to match the memory controller, whose cgroups also persist as zombies until
all charged memory is uncharged. Also, the tracking of HugeTLB reservations
is somewhat more complex than the tracking of HugeTLB faults, so it is
significantly harder to reparent reservations at offline time.