===========================
HPE iLO NMI Watchdog Driver
===========================

for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018


The HPE iLO NMI Watchdog driver is a kernel module that provides basic
watchdog functionality and a handler for the iLO "Generate NMI to System"
virtual button.

All references to iLO in this document also apply to iLO2 and all
subsequent generations.

Watchdog functionality is enabled like any other common watchdog driver. That
is, an application needs to be started that starts the watchdog timer and
then keeps updating it. A basic application exists in
tools/testing/selftests/watchdog/ named watchdog-test.c. Simply compile the
C file and run it. If the system gets into a bad state and hangs, the HPE
ProLiant iLO timer register will not be updated in a timely fashion and a
hardware system reset, also known as an Automatic Server Recovery (ASR)
event, will occur.
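
As a rough illustration of what such an application does, the sketch below
uses the generic write interface described in
Documentation/watchdog/watchdog-api.rst instead of reproducing
watchdog-test.c. The device path, the helper names, and the choice of feeding
the timer at half the timeout are assumptions made for this example; none of
them come from the hpwdt driver itself.

```python
import os
import time

WATCHDOG_DEV = "/dev/watchdog"  # assumed device node; may differ on your system


def ping_interval(timeout_s: float) -> float:
    """Keepalive period: half the timeout, but never below one second."""
    return max(1.0, timeout_s / 2)


def feed(fd: int) -> None:
    """Writing any byte to the watchdog device counts as a keepalive."""
    os.write(fd, b"\0")


def run(timeout_s: float = 30, pings: int = 10, dev: str = WATCHDOG_DEV) -> None:
    """Start the watchdog, feed it `pings` times, then request a clean stop.

    timeout_s is only used to pace the keepalives; this sketch does not
    program the timeout and assumes the driver default of 30 seconds.
    """
    fd = os.open(dev, os.O_WRONLY)  # opening the device starts the timer
    try:
        for _ in range(pings):
            feed(fd)
            time.sleep(ping_interval(timeout_s))
        # Magic close character: asks the driver to stop the timer on close.
        # It has no effect when nowayout is set.
        os.write(fd, b"V")
    finally:
        os.close(fd)
```

Calling ``run()`` as root on a machine with the driver loaded arms the timer.
If the process is killed or the system hangs before the final write, the
keepalives stop and the reset described above follows.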

The hpwdt driver also has the following module parameters:

============ ================================================================
soft_margin  allows the user to set the watchdog timer value.
             Default value is 30 seconds.
timeout      an alias of soft_margin.
pretimeout   allows the user to set the watchdog pretimeout value.
             This is the number of seconds before timeout when an
             NMI is delivered to the system. Setting the value to
             zero disables the pretimeout NMI.
             Default value is 9 seconds.
nowayout     basic watchdog parameter that prevents the timer from being
             stopped once it has been started.
             Default value is set when compiling the kernel. If it is set
             to "Y", then there is no way of disabling the watchdog once
             it has been started.
kdumptimeout Minimum timeout in seconds to apply upon receipt of an NMI
             before calling panic. A value of -1 disables the watchdog.
             When the value is > 0, the timer is reprogrammed with the
             greater of this value and the current timeout value.
============ ================================================================
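
The kdumptimeout rule in the table can be restated as a small function. This
is an illustrative model only, not the driver's code. The function name is
hypothetical, and treating zero as "leave the timer unchanged" is an
assumption, since the table describes only -1 and positive values.

```python
from typing import Optional


def timeout_after_nmi(kdumptimeout: int, current_timeout: int) -> Optional[int]:
    """Model the watchdog timeout (seconds) in effect once an NMI arrives.

    Returns None when the watchdog is disabled for the duration of the dump.
    """
    if kdumptimeout < 0:
        # -1 disables the watchdog, so a slow crash dump is never interrupted.
        return None
    if kdumptimeout == 0:
        # Assumption: zero leaves the running timer untouched.
        return current_timeout
    # Positive value: the timer is never shortened, only extended.
    return max(kdumptimeout, current_timeout)
```

For example, with a 30 second timeout and kdumptimeout=120, the dump gets 120
seconds before the iLO resets the system; with kdumptimeout=10, the existing
30 seconds still apply.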

NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and
       Documentation/driver-api/ipmi.rst.

Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
can only be set to 9 seconds. Attempts to set pretimeout to other
non-zero values will be rounded, possibly to zero. Users should verify
the pretimeout value after attempting to set pretimeout or timeout.
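
One way to perform that verification without opening /dev/watchdog (which
would start the timer) is to read the values back from sysfs. The sketch
below assumes a kernel built with CONFIG_WATCHDOG_SYSFS and that hpwdt is
registered as watchdog0; both are assumptions about the running system, and
the helper names are made up for this example.

```python
from pathlib import Path

# Assumed location: requires CONFIG_WATCHDOG_SYSFS; the index may not be 0.
SYSFS_DIR = Path("/sys/class/watchdog/watchdog0")


def read_seconds(attr: str, base: Path = SYSFS_DIR) -> int:
    """Read an integer attribute such as 'timeout' or 'pretimeout'."""
    return int((base / attr).read_text().strip())


def pretimeout_status(requested: int, base: Path = SYSFS_DIR) -> str:
    """Compare the pretimeout that was requested with the one in effect."""
    actual = read_seconds("pretimeout", base)
    timeout = read_seconds("timeout", base)
    if actual == requested:
        return f"pretimeout is {actual}s as requested (timeout {timeout}s)"
    if actual == 0:
        return f"pretimeout NMI is disabled, {requested}s was requested (timeout {timeout}s)"
    return f"pretimeout rounded from {requested}s to {actual}s (timeout {timeout}s)"
```

Running ``print(pretimeout_status(5))`` after requesting a 5 second
pretimeout reports whether the value was kept, rounded, or dropped to zero.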

Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
panic. This is to allow for a crash dump to be collected. It is incumbent
upon the user to have properly configured the system for kdump.

The default Linux kernel behavior upon panic is to print a kernel tombstone
and loop forever. This is generally not what a watchdog user wants.

For more information, see:
        Documentation/admin-guide/kdump/kdump.rst
        Documentation/admin-guide/kernel-parameters.txt (panic=)
        Your Linux distribution's documentation.
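
To see how a running system is currently configured, the panic timeout can
be read from the kernel.panic sysctl. The sketch below interprets the value
according to the panic= description in kernel-parameters.txt; the function
names are hypothetical and the path assumes procfs is mounted at /proc.

```python
from pathlib import Path

PANIC_SYSCTL = Path("/proc/sys/kernel/panic")  # runtime view of panic=


def describe_panic_timeout(value: int) -> str:
    """Explain what the kernel does after a panic for a given panic= value."""
    if value > 0:
        return f"reboot {value} seconds after panic"
    if value < 0:
        return "reboot immediately after panic"
    return "wait forever after panic"


def current_panic_timeout(path: Path = PANIC_SYSCTL) -> int:
    """Read the panic timeout currently in effect."""
    return int(path.read_text().strip())
```

``print(describe_panic_timeout(current_panic_timeout()))`` reports "wait
forever after panic" on a system that still uses the default described
above.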

If the hpwdt driver does not receive the NMI associated with an expiring
timer, the iLO will proceed to reset the system at timeout if the timer
hasn't been updated.

--

The HPE iLO NMI Watchdog Driver and documentation were originally developed
by Tom Mingarelli.