Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

======================
Firmware-Assisted Dump
======================

July 2011

The goal of firmware-assisted dump is to enable the dump of
a crashed system, and to do so from a fully-reset system, and
to minimize the total elapsed time until the system is back
in production use.

- Firmware-Assisted Dump (FADump) infrastructure is intended to replace
  the existing phyp assisted dump.
- FADump uses the same firmware interfaces and memory reservation model
  as phyp assisted dump.
- Unlike phyp dump, FADump exports the memory dump through /proc/vmcore
  in the ELF format in the same way as kdump. This helps us reuse the
  kdump infrastructure for dump capture and filtering.
- Unlike phyp dump, userspace tools do not need to refer to any sysfs
  interface while reading /proc/vmcore.
- Unlike phyp dump, FADump allows the user to release all the memory reserved
  for dump with a single operation: echo 1 > /sys/kernel/fadump_release_mem.
- Once enabled through a kernel boot parameter, FADump can be
  started/stopped through the /sys/kernel/fadump_registered interface (see
  the sysfs files section below) and can be easily integrated with kdump
  service start/stop init scripts.

Compared with kdump or other strategies, firmware-assisted
dump offers several strong, practical advantages:

-  Unlike kdump, the system has been reset, and loaded
   with a fresh copy of the kernel.  In particular,
   PCI and I/O devices have been reinitialized and are
   in a clean, consistent state.
-  Once the dump is copied out, the memory that held the dump
   is immediately available to the running kernel. Therefore,
   unlike kdump, FADump doesn't need a second reboot to get the
   system back to the production configuration.

The above can only be accomplished by coordination with,
and assistance from, the Power firmware. The procedure is
as follows:

-  The first kernel registers the sections of memory with the
   Power firmware for dump preservation during OS initialization.
   These registered sections of memory are reserved by the first
   kernel during early boot.

-  When the system crashes, the Power firmware will copy the registered
   low memory regions (boot memory) from source to destination area.
   It will also save hardware PTEs.

   NOTE:
         The term 'boot memory' means the size of the low memory chunk
         that is required for a kernel to boot successfully when
         booted with restricted memory. By default, the boot memory
         size will be the larger of 5% of system RAM or 256MB.
         Alternatively, the user can also specify the boot memory size
         through the boot parameter 'crashkernel=', which will override
         the default calculated size. Use this option if the default
         boot memory size is not sufficient for the second kernel to
         boot successfully. For the syntax of the crashkernel= parameter,
         refer to Documentation/admin-guide/kdump/kdump.rst. If any
         offset is provided in the crashkernel= parameter, it will be
         ignored, as FADump uses a predefined offset to reserve memory
         for boot memory dump preservation in case of a crash.

-  After the low memory (boot memory) area has been saved, the
   firmware will reset PCI and other hardware state.  It will
   *not* clear the RAM. It will then launch the bootloader, as
   normal.

-  The freshly booted kernel will notice that there is a new node
   (rtas/ibm,kernel-dump on pSeries or ibm,opal/dump/mpipl-boot
   on OPAL platform) in the device tree, indicating that
   there is crash data available from a previous boot. During
   early boot, the OS will reserve the rest of the memory above
   the boot memory size, effectively booting with a restricted memory
   size. This ensures that this kernel (also referred
   to as the second kernel or capture kernel) will not touch any
   of the dump memory area.

-  User-space tools will read /proc/vmcore to obtain the contents
   of memory, which holds the previous crashed kernel dump in ELF
   format. The userspace tools may copy this info to disk, or
   network, nas, san, iscsi, etc. as desired.

-  Once the userspace tool is done saving the dump, it will echo
   '1' to /sys/kernel/fadump_release_mem to release the reserved
   memory back to general use, except the memory required for
   the next firmware-assisted dump registration.

   e.g.::

     # echo 1 > /sys/kernel/fadump_release_mem

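The last two steps of the procedure (save the dump, then release the
reserved memory) can be sketched in a few lines of Python. This is only an
illustration of the sequence, not how any particular kdump service does it.
The function name and the destination path are assumptions; the paths are
passed in as parameters so the sketch can be tried against ordinary files.

```python
import shutil
from pathlib import Path


def save_dump_and_release(vmcore: Path, dest: Path, release_file: Path) -> int:
    """Copy the dump out, then ask the kernel to release the reserved memory.

    Returns the number of bytes written to `dest`.
    """
    # Stream the dump in 1 MiB pieces so a large vmcore is never held in RAM.
    with vmcore.open("rb") as src, dest.open("wb") as out:
        shutil.copyfileobj(src, out, length=1 << 20)

    # Equivalent of: echo 1 > /sys/kernel/fadump_release_mem
    # Done only after the copy has finished, because the dump is no longer
    # available once the memory has been released.
    release_file.write_text("1\n")
    return dest.stat().st_size
```

On a capture kernel this would be called as root with /proc/vmcore, a
destination such as /var/crash/vmcore (hypothetical), and
/sys/kernel/fadump_release_mem.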
Please note that the firmware-assisted dump feature
is only available on POWER6 and above systems on the pSeries
(PowerVM) platform and POWER9 and above systems with OP940
or later firmware versions on the PowerNV (OPAL) platform.
Note that OPAL firmware exports the ibm,opal/dump node when
FADump is supported on the PowerNV platform.

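The device-tree nodes named in this document can be probed from userspace.
The following sketch assumes the device tree is visible as a directory
hierarchy (on a running system that is commonly /proc/device-tree, which is
an assumption here and not stated in this document). The function name and
the returned keys are made up for illustration.

```python
from pathlib import Path

# Node paths named in the text, relative to the device-tree root.
PSERIES_DUMP_NODE = "rtas/ibm,kernel-dump"
OPAL_DUMP_NODE = "ibm,opal/dump"
OPAL_BOOT_NODE = "ibm,opal/dump/mpipl-boot"


def fadump_dt_status(dt_root: Path) -> dict:
    """Report what the device tree says about FADump.

    opal_supported: the ibm,opal/dump node is exported (PowerNV).
    dump_pending:   a node indicating crash data from a previous boot exists.
    """
    return {
        "opal_supported": (dt_root / OPAL_DUMP_NODE).exists(),
        "dump_pending": (dt_root / PSERIES_DUMP_NODE).exists()
        or (dt_root / OPAL_BOOT_NODE).exists(),
    }
```

Taking the root as a parameter keeps the check testable on machines that are
not Power systems.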
On OPAL based machines, the system first boots into an intermediate
kernel (referred to as the petitboot kernel) before booting into the
capture kernel. This kernel would have minimal kernel and/or
userspace support to process crash data. Such a kernel needs to
preserve the previously crashed kernel's memory for the subsequent
capture kernel boot to process this crash data. The kernel config
option CONFIG_PRESERVE_FA_DUMP has to be enabled on such a kernel
to ensure that crash data is preserved to process later.

-- On OPAL based machines (PowerNV), if the kernel is built with
   CONFIG_OPAL_CORE=y, OPAL memory at the time of crash is also
   exported as the /sys/firmware/opal/mpipl/core file. This sysfs file is
   helpful in debugging OPAL crashes with GDB. The kernel memory
   used for exporting this sysfs file can be released by echoing
   '1' to the /sys/firmware/opal/mpipl/release_core node.

   e.g.::

     # echo 1 > /sys/firmware/opal/mpipl/release_core

Implementation details:
-----------------------

During boot, a check is made to see if firmware supports
this feature on that particular machine. If it does, then
we check to see if an active dump is waiting for us. If yes,
then everything but boot memory size of RAM is reserved during
early boot (see Fig. 2). This area is released once we finish
collecting the dump from user land scripts (e.g. kdump scripts)
that are run. If there is dump data, then the
/sys/kernel/fadump_release_mem file is created, and the reserved
memory is held.

If there is no waiting dump data, then only the memory required to
hold CPU state, HPTE region, boot memory dump, FADump header and
elfcore header is usually reserved at an offset greater than boot
memory size (see Fig. 1). This area is *not* released: this region
will be kept permanently reserved, so that it can act as a receptacle
for a copy of the boot memory content in addition to CPU state and
HPTE region, in the case a crash does occur.

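As a rough worked example of the sizes involved, the sketch below applies
the default rule quoted in this document (the larger of 5% of system RAM or
256MB) and derives the range that a second kernel would keep reserved when
a dump is waiting. It follows the wording of this document only; any
rounding or alignment the kernel may apply is not modelled, and the
function names are illustrative.

```python
MB = 1024 * 1024
GB = 1024 * MB


def default_boot_memory(system_ram_bytes: int) -> int:
    """Documented default: the larger of 5% of system RAM or 256MB."""
    return max(system_ram_bytes * 5 // 100, 256 * MB)


def crash_preserved_area(system_ram_bytes: int, boot_mem_bytes=None):
    """Return (start, end) of the area held back while a dump is waiting.

    Everything above the boot memory size is reserved, so the second
    kernel runs only in [0, boot memory size).
    """
    if boot_mem_bytes is None:
        boot_mem_bytes = default_boot_memory(system_ram_bytes)
    return boot_mem_bytes, system_ram_bytes
```

For a 4GB system, 5% is about 205MB, so the 256MB floor applies; for a 64GB
system the 5% term (about 3.2GB) is the larger one.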
Since this reserved memory area is used only after the system crash,
there is no point in blocking this significant chunk of memory from
the production kernel. Hence, the implementation uses the Linux kernel's
Contiguous Memory Allocator (CMA) for memory reservation if CMA is
configured for the kernel. With CMA reservation, this memory will be
available for applications to use, while the kernel is prevented from
using it. With this, FADump will still be able to capture all of the
kernel memory and most of the user space memory, except the user pages
that were present in the CMA region::

  o Memory Reservation during first kernel

  Low memory                                                 Top of memory
  0    boot memory size   |<--- Reserved dump area --->|       |
  |           |           |    Permanent Reservation   |       |
  V           V           |                            |       V
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
  |           |           |///|////|  DUMP | HDR | ELF |////|  |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
        |                   ^    ^     ^      ^           ^
        |                   |    |     |      |           |
        \                  CPU  HPTE   /      |           |
         ------------------------------       |           |
      Boot memory content gets transferred    |           |
      to reserved area by firmware at the     |           |
      time of crash.                          |           |
                                          FADump Header   |
                                           (meta area)    |
                                                          |
                                                          |
                      Metadata: This area holds a metadata structure whose
                      address is registered with f/w and retrieved in the
                      second kernel after crash, on platforms that support
                      tags (OPAL). Having such structure with info needed
                      to process the crashdump eases dump capture process.

                   Fig. 1


  o Memory Reservation during second kernel after crash

  Low memory                                              Top of memory
  0      boot memory size                                      |
  |           |<------------ Crash preserved area ------------>|
  V           V           |<--- Reserved dump area --->|       |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
  |           |           |///|////|  DUMP | HDR | ELF |////|  |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
        |                                           |
        V                                           V
   Used by second                             /proc/vmcore
   kernel to boot

        +---+
        |///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
        +---+    figures are not always present. For example, OPAL platform
                 does not have CPU & HPTE regions while Metadata region is
                 not supported on pSeries currently.

                   Fig. 2


Currently the dump will be copied from /proc/vmcore to a new file upon
user intervention. The dump data available through /proc/vmcore will be
in ELF format. Hence the existing kdump infrastructure (kdump scripts)
to save the dump works fine with minor modifications. Kdump scripts on
major distro releases have already been modified to work seamlessly (no
user intervention in saving the dump) when FADump is used, instead of
kdump, as the dump mechanism.

The tools to examine the dump will be the same as the ones
used for kdump.

How to enable firmware-assisted dump (FADump):
----------------------------------------------

1. Set config option CONFIG_FA_DUMP=y and build the kernel.
2. Boot into the Linux kernel with the 'fadump=on' kernel cmdline option.
   By default, FADump reserved memory will be initialized as a CMA area.
   Alternatively, the user can boot the Linux kernel with 'fadump=nocma' to
   prevent FADump from using CMA.
3. Optionally, the user can also set the 'crashkernel=' kernel cmdline option
   to specify the size of the memory to reserve for boot memory dump
   preservation.

NOTE:
     1. The 'fadump_reserve_mem=' parameter has been deprecated. Instead,
        use 'crashkernel=' to specify the size of the memory to reserve
        for boot memory dump preservation.
     2. If firmware-assisted dump fails to reserve memory, then it
        will fall back to the existing kdump mechanism if the 'crashkernel='
        option is set on the kernel cmdline.
     3. If the user wants to capture all of user space memory and is OK with
        reserved memory not being available to the production system, then
        the 'fadump=nocma' kernel parameter can be used to fall back to the
        old behaviour.

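To check which of these options a system was actually booted with, the
kernel command line can be inspected. The sketch below parses a command
line string (for example the contents of /proc/cmdline) and picks out the
three parameters discussed above. The function names and the returned
structure are illustrative assumptions, and only the 'on' and 'nocma'
values described in this document are interpreted.

```python
def fadump_cmdline_settings(cmdline: str) -> dict:
    """Extract fadump=, crashkernel= and fadump_reserve_mem= from a cmdline."""
    settings = {"fadump": None, "crashkernel": None, "fadump_reserve_mem": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in settings:
            settings[key] = value  # in this sketch the last occurrence wins
    return settings


def uses_cma(settings: dict):
    """True for fadump=on, False for fadump=nocma, None otherwise."""
    if settings["fadump"] == "on":
        return True
    if settings["fadump"] == "nocma":
        return False
    return None
```

A non-None 'fadump_reserve_mem' value in the result is a hint that the
deprecated parameter is still in use and should be replaced by
'crashkernel='.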
Sysfs/debugfs files:
--------------------

The firmware-assisted dump feature uses the sysfs file system to hold
the control files and a debugfs file to display the reserved memory regions.

Here is the list of files under kernel sysfs:

 /sys/kernel/fadump_enabled
    This is used to display the FADump status.

    - 0 = FADump is disabled
    - 1 = FADump is enabled

    This interface can be used by kdump init scripts to identify if
    FADump is enabled in the kernel and act accordingly.

 /sys/kernel/fadump_registered
    This is used to display the FADump registration status as well
    as to control (start/stop) the FADump registration.

    - 0 = FADump is not registered.
    - 1 = FADump is registered and ready to handle system crash.

    To register FADump, echo 1 > /sys/kernel/fadump_registered; to
    un-register and stop FADump, echo 0 > /sys/kernel/fadump_registered.
    Once FADump is un-registered, a system crash will not
    be handled and vmcore will not be captured. This interface can be
    easily integrated with kdump service start/stop.

 /sys/kernel/fadump/mem_reserved
    This is used to display the memory reserved by FADump for saving the
    crash dump.

 /sys/kernel/fadump_release_mem
    This file is available only when FADump is active during
    the second kernel. This is used to release the reserved memory
    regions that are held for saving the crash dump. To release the
    reserved memory, echo 1 to it::

        echo 1 > /sys/kernel/fadump_release_mem

    After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
    file will change to reflect the new memory reservations.

    The existing userspace tools (kdump infrastructure) can be easily
    enhanced to use this interface to release the memory reserved for
    dump and continue without a second reboot.

Note: /sys/kernel/fadump_release_opalcore sysfs has moved to
      /sys/firmware/opal/mpipl/release_core

 /sys/firmware/opal/mpipl/release_core
    This file is available only on OPAL based machines when FADump is
    active during the capture kernel. This is used to release the memory
    used by the kernel to export the /sys/firmware/opal/mpipl/core file. To
    release this memory, echo '1' to it::

        echo 1 > /sys/firmware/opal/mpipl/release_core

Note: The following FADump sysfs files are deprecated.

+----------------------------------+--------------------------------+
| Deprecated                       | Alternative                    |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_enabled       | /sys/kernel/fadump/enabled     |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_registered    | /sys/kernel/fadump/registered  |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_release_mem   | /sys/kernel/fadump/release_mem |
+----------------------------------+--------------------------------+

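A script that has to work on kernels with either naming scheme could try
the alternative path first and fall back to the deprecated one. The helper
below is a minimal sketch of that idea using only the paths from the table
above; the function names are made up, and the /sys/kernel root is a
parameter so the logic can be exercised on a scratch directory.

```python
from pathlib import Path
from typing import Optional

# (preferred path, deprecated path), relative to /sys/kernel.
FADUMP_FILES = {
    "enabled": ("fadump/enabled", "fadump_enabled"),
    "registered": ("fadump/registered", "fadump_registered"),
    "release_mem": ("fadump/release_mem", "fadump_release_mem"),
}


def fadump_file(name: str, sys_kernel: Path = Path("/sys/kernel")) -> Optional[Path]:
    """Return the first existing path for `name`, preferring the new layout."""
    for rel in FADUMP_FILES[name]:
        candidate = sys_kernel / rel
        if candidate.exists():
            return candidate
    return None


def read_flag(name: str, sys_kernel: Path = Path("/sys/kernel")) -> Optional[bool]:
    """True if the file reads 1, False if it reads anything else, None if absent."""
    path = fadump_file(name, sys_kernel)
    if path is None:
        return None
    return path.read_text().strip() == "1"


def set_registered(on: bool, sys_kernel: Path = Path("/sys/kernel")) -> None:
    """Register (1) or un-register (0) FADump, as a kdump service might."""
    path = fadump_file("registered", sys_kernel)
    if path is None:
        raise FileNotFoundError(f"no FADump 'registered' file under {sys_kernel}")
    path.write_text("1\n" if on else "0\n")
```

A service start hook could call read_flag("enabled") and, when it returns
True, set_registered(True); the stop hook would call set_registered(False).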
Here is the list of files under powerpc debugfs:
(Assuming debugfs is mounted on /sys/kernel/debug directory.)

 /sys/kernel/debug/powerpc/fadump_region
    This file shows the reserved memory regions if FADump is
    enabled; otherwise this file is empty. The output format
    is::

      <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>

    and for the kernel DUMP region it is::

      DUMP: Src: <src-addr>, Dest: <dest-addr>, Size: <size>, Dumped: # bytes

    e.g.
    Contents when FADump is registered during first kernel::

      # cat /sys/kernel/debug/powerpc/fadump_region
      CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
      HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
      DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0

    Contents when FADump is active during second kernel::

      # cat /sys/kernel/debug/powerpc/fadump_region
      CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
      HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
      DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
          : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000

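The bracketed lines in the examples above are regular enough to parse with
a small script, for instance to check that each reported size matches its
address range. The following is a sketch that handles only the
``[<start>-<end>]`` form shown in the sample output; lines in any other
form (including the ``Src:``/``Dest:`` form) are skipped. The names
``Region`` and ``parse_fadump_region`` are illustrative.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str    # "CPU", "HPTE", "DUMP", or "" for an unnamed line
    start: int
    end: int     # inclusive end address
    size: int
    dumped: int


_LINE = re.compile(
    r"^\s*(?P<name>\w*)\s*:\s*"
    r"\[(?P<start>0x[0-9a-fA-F]+)-(?P<end>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<size>0x[0-9a-fA-F]+)\s+bytes,\s+Dumped:\s+(?P<dumped>0x[0-9a-fA-F]+)"
)


def parse_fadump_region(text: str) -> List[Region]:
    """Parse the bracketed region lines of a fadump_region dump."""
    regions = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # blank lines, shell prompts, formats not handled here
        numbers = (int(m[k], 16) for k in ("start", "end", "size", "dumped"))
        regions.append(Region(m["name"], *numbers))
    return regions
```

For every line in the sample output, ``end - start + 1`` equals the
reported size, and in the second kernel ``dumped`` equals ``size`` once the
firmware has filled the region.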

NOTE:
      Please refer to Documentation/filesystems/debugfs.rst on
      how to mount the debugfs filesystem.


TODO:
-----
 - Need to come up with a better approach to find a more
   accurate boot memory size that is required for a kernel to
   boot successfully when booted with restricted memory.
 - The FADump implementation introduces a FADump crash info structure
   in the scratch area before the ELF core header. The idea of introducing
   this structure is to pass some important crash info data to the second
   kernel, which will help the second kernel to populate the ELF core
   header with correct data before it gets exported through /proc/vmcore.
   The current design does not address the possibility of introducing
   additional fields (in future) to this structure without affecting
   compatibility. Need to come up with a better approach to address this.

   The possible approaches are:

        1. Introduce a version field for version tracking, and bump up the
           version whenever a new field is added to the structure in future.
           The version field can be used to find out what fields are valid
           for the current version of the structure.
        2. Reserve an area of predefined size (say PAGE_SIZE) for this
           structure and have the unused area as reserved (initialized to
           zero) for future field additions.

   The advantage of approach 1 over 2 is we don't need to reserve extra space.

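To make approach 1 concrete, here is a small model of a versioned
structure written with Python's ``struct`` module. The field names and
layouts are entirely hypothetical and are not the kernel's FADump crash
info structure; the sketch only shows how a reader can use the version
field to decide which fields are valid, assuming newer versions only
append fields.

```python
import struct

# Hypothetical layouts, for illustration only.
# v1: version, crashing_cpu          v2 appends: elfcorehdr_addr
_LAYOUTS = {
    1: ("<II", ("version", "crashing_cpu")),
    2: ("<IIQ", ("version", "crashing_cpu", "elfcorehdr_addr")),
}


def pack_crash_info(version: int, **fields) -> bytes:
    """Serialize the fields that are valid for `version`."""
    fmt, names = _LAYOUTS[version]
    return struct.pack(fmt, version, *(fields[n] for n in names[1:]))


def unpack_crash_info(blob: bytes) -> dict:
    """Decode as many fields as this reader understands."""
    (version,) = struct.unpack_from("<I", blob, 0)
    # A blob from a newer writer is read with the newest layout known here;
    # the fields appended after that layout are simply ignored.
    known = min(version, max(_LAYOUTS))
    if known not in _LAYOUTS:
        raise ValueError(f"unsupported crash info version {version}")
    fmt, names = _LAYOUTS[known]
    return dict(zip(names, struct.unpack_from(fmt, blob, 0)))
```

With this scheme no space is reserved up front, which is the advantage
noted above; the cost is that every reader must carry a table of known
layouts.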
Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This document is based on the original documentation written for phyp
assisted dump by Linas Vepstas and Manish Ahuja.