^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) PCI Bus EEH Error Recovery
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ==========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Linas Vepstas <linas@austin.ibm.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) 12 January 2005
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) Overview:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) ---------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) The IBM POWER-based pSeries and iSeries computers include PCI bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) controller chips that have extended capabilities for detecting and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) reporting a large variety of PCI bus error conditions. These features
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) go under the name of "EEH", for "Enhanced Error Handling". The EEH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) hardware features allow PCI bus errors to be cleared and a PCI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) card to be "rebooted", without also having to reboot the operating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) system.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) This is in contrast to traditional PCI error handling, where the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) PCI chip is wired directly to the CPU, and an error would cause
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) a CPU machine-check/check-stop condition, halting the CPU entirely.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) Another "traditional" technique is to ignore such errors, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) can lead to corruption of either user data or kernel data,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) hung/unresponsive adapters, or system crashes/lockups. Thus,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) the idea behind EEH is that the operating system can become more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) reliable and robust by protecting it from PCI errors, and giving
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) the OS the ability to "reboot"/recover individual PCI devices.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) Future systems from other vendors, based on the PCI-E specification,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) may contain similar features.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) Causes of EEH Errors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) --------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) EEH was originally designed to guard against hardware failure, such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) as PCI cards dying from heat, humidity, dust, vibration and bad
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) electrical connections. The vast majority of EEH errors seen in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) "real life" are due to either poorly seated PCI cards, or,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) unfortunately quite commonly, due to device driver bugs, device firmware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) bugs, and sometimes PCI card hardware bugs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) The most common software bug is one that causes the device to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) attempt to DMA to a location in system memory that has not been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) reserved for DMA access for that card. Detecting this is a powerful feature,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) as it prevents what would otherwise have been silent memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) corruption caused by the bad DMA. A number of device driver
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) bugs have been found and fixed in this way over the past few
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) years. Other possible causes of EEH errors include data or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) address line parity errors (for example, due to poor electrical
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) connectivity due to a poorly seated card), and PCI-X split-completion
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) errors (due to software, device firmware, or device PCI hardware bugs).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) The vast majority of "true hardware failures" can be cured by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) physically removing and re-seating the PCI card.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) Detection and Recovery
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) ----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) In the following discussion, a generic overview of how to detect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) and recover from EEH errors will be presented. This is followed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) by an overview of how the current implementation in the Linux
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) kernel does it. The actual implementation is subject to change,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) and some of the finer points are still being debated. These
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) may in turn be swayed if or when other architectures implement
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) similar functionality.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) When a PCI Host Bridge (PHB, the bus controller connecting the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) PCI bus to the system CPU electronics complex) detects a PCI error
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) condition, it will "isolate" the affected PCI card. Isolation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) will block all writes (either to the card from the system, or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) from the card to the system), and it will cause all reads to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) return all-ff's (0xff, 0xffff, 0xffffffff for 8/16/32-bit reads).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) This value was chosen because it is the same value you would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) get if the device was physically unplugged from the slot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) This includes access to PCI memory, I/O space, and PCI config
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) space. Interrupts, however, will continue to be delivered.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) Detection and recovery are performed with the aid of ppc64
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) firmware. The programming interfaces in the Linux kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) into the firmware are referred to as RTAS (Run-Time Abstraction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) Services). The Linux kernel does not (should not) access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) the EEH function in the PCI chipsets directly, primarily because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) there are a number of different chipsets out there, each with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) different interfaces and quirks. The firmware provides a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) uniform abstraction layer that will work with all pSeries
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) and iSeries hardware (and be forwards-compatible).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) If the OS or device driver suspects that a PCI slot has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) EEH-isolated, there is a firmware call it can make to determine if
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) this is the case. If so, then the device driver should put itself
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) into a consistent state (given that it won't be able to complete any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) pending work) and start recovery of the card. Recovery normally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) would consist of resetting the PCI device (holding the PCI #RST
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) line high for two seconds), followed by setting up the device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) config space (the base address registers (BARs), latency timer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) cache line size, interrupt line, and so on). This is followed by a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) reinitialization of the device driver. In a worst-case scenario,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) the power to the card can be toggled, at least on hot-plug-capable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) slots. In principle, layers far above the device driver probably
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) do not need to know that the PCI card has been "rebooted" in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) way; ideally, there should be at most a pause in Ethernet/disk/USB
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) I/O while the card is being reset.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) If the card cannot be recovered after three or four resets, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) kernel/device driver should assume the worst-case scenario, that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) card has died completely, and report this error to the sysadmin.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) In addition, error messages are reported through RTAS and also through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) syslogd (/var/log/messages) to alert the sysadmin of PCI resets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) The correct way to deal with failed adapters is to use the standard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) PCI hotplug tools to remove and replace the dead card.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)
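The bounded-retry policy described above can be sketched in plain C. This
is only an illustration and not kernel code: MAX_RESET_ATTEMPTS,
recover_card() and reset_after_n_failures() are hypothetical names, and
the limit of three is an assumption taken from the "three or four resets"
wording:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit; the text only says "three or four resets". */
#define MAX_RESET_ATTEMPTS 3

/* Hypothetical callback: resets the card and reports whether it came back. */
typedef bool (*reset_fn)(void *ctx);

/*
 * Try to recover a card by resetting it up to MAX_RESET_ATTEMPTS times.
 * Returns the number of resets used on success, or -1 if the card should
 * be treated as dead and reported to the sysadmin.
 */
int recover_card(reset_fn reset, void *ctx)
{
    assert(reset);
    for (int attempt = 1; attempt <= MAX_RESET_ATTEMPTS; attempt++) {
        if (reset(ctx))
            return attempt;
    }
    return -1;
}

/* Demo reset routine: fails until the countdown in *ctx reaches zero. */
bool reset_after_n_failures(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

A caller would log the failure and leave the slot isolated when
recover_card() returns -1, so that the card can be replaced with the
hotplug tools.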
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) Current PPC64 Linux EEH Implementation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) At this time, a generic EEH recovery mechanism has been implemented,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) so that individual device drivers do not need to be modified to support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) EEH recovery. This generic mechanism piggy-backs on the PCI hotplug
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) infrastructure, and percolates events up through the userspace/udev
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) infrastructure. Following is a detailed description of how this is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) accomplished.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) EEH must be enabled in the PHBs very early during the boot process,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) and whenever a PCI slot is hot-plugged. The former is performed by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) eeh_init() in arch/powerpc/platforms/pseries/eeh.c, and the latter by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) drivers/pci/hotplug/pSeries_pci.c calling into the eeh.c code.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) EEH must be enabled before a PCI scan of the device can proceed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) Current Power5 hardware will not work unless EEH is enabled,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) although older Power4 can run with it disabled. Effectively,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) EEH can no longer be turned off. PCI devices *must* be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) registered with the EEH code; the EEH code needs to know about
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) the I/O address ranges of the PCI device in order to detect an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) error. Given an arbitrary address, the routine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) pci_get_device_by_addr() will find the PCI device associated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) with that address (if any).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) The default arch/powerpc/include/asm/io.h macros readb(), inb(), insb(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) etc. include a check to see if the I/O read returned all-ff's.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) If so, these make a call to eeh_dn_check_failure(), which in turn
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) asks the firmware if the all-ff's value is the sign of a true EEH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) error. If it is not, processing continues as normal. The grand
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) total number of these false alarms or "false positives" can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) seen in /proc/ppc64/eeh (subject to change). Normally, almost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) all of these occur during boot, when the PCI bus is scanned, where
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) a large number of 0xff reads are part of the bus scan procedure.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145)
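The all-ones check described above can be illustrated with a small
user-space sketch. It is not the kernel implementation: struct fake_slot,
raw_read32(), firmware_says_frozen() and checked_read32() are hypothetical
stand-ins for the real I/O macros and the firmware query, and the
false_positives field only mimics the kind of counter mentioned above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PCI slot; not a kernel data structure. */
struct fake_slot {
    bool     frozen;          /* true once the PHB has isolated the slot */
    uint32_t reg;             /* value a healthy device would return */
    unsigned false_positives; /* all-ones reads that were not EEH errors */
};

/* Simulated MMIO read: an isolated slot returns all ones. */
uint32_t raw_read32(const struct fake_slot *s)
{
    return s->frozen ? 0xffffffffu : s->reg;
}

/* Stand-in for asking firmware whether the slot is really frozen. */
bool firmware_says_frozen(const struct fake_slot *s)
{
    return s->frozen;
}

/*
 * Checked read: only an all-ones value triggers the firmware query, so
 * ordinary reads stay cheap. Sets *eeh_error when the slot is frozen and
 * counts a false positive when all ones was a legitimate value.
 */
uint32_t checked_read32(struct fake_slot *s, bool *eeh_error)
{
    uint32_t val;

    assert(s && eeh_error);
    val = raw_read32(s);
    *eeh_error = false;
    if (val == 0xffffffffu) {
        if (firmware_says_frozen(s))
            *eeh_error = true;
        else
            s->false_positives++;
    }
    return val;
}
```

The point of the design is that the firmware is consulted only on the
rare all-ones result, which keeps the common read path fast.
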
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) If a frozen slot is detected, code in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) arch/powerpc/platforms/pseries/eeh.c will print a stack trace to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) syslog (/var/log/messages). This stack trace has proven to be very
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) useful to device-driver authors for finding out at what point the EEH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) error was detected, as the error itself usually occurs slightly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) beforehand.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) Next, it uses the Linux kernel notifier chain/work queue mechanism to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) allow any interested parties to find out about the failure. Device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) drivers, or other parts of the kernel, can use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) `eeh_register_notifier(struct notifier_block *)` to find out about EEH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) events. The event will include a pointer to the PCI device, the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) device node and some state info. Receivers of the event can "do as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) they wish"; the default handler will be described further in this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) section.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) To assist in the recovery of the device, eeh.c exports the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) following functions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) rtas_set_slot_reset()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166)    assert the PCI #RST line for 1/8th of a second
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) rtas_configure_bridge()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168)    ask firmware to configure any PCI bridges
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169)    located topologically under the PCI slot.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) eeh_save_bars() and eeh_restore_bars():
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171)    save and restore the PCI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)    config-space info for a device and any devices under it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) A handler for the EEH notifier_block events is implemented in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) It saves the device BARs and then calls rpaphp_unconfig_pci_adapter().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) This last call causes the device driver for the card to be stopped,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) which causes uevents to go out to user space. This triggers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) user-space scripts that might issue commands such as "ifdown eth0"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) for Ethernet cards, and so on. This handler then sleeps for 5 seconds,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) hoping to give the user-space scripts enough time to complete.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) It then resets the PCI card and reconfigures the device BARs and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) any bridges underneath. It then calls rpaphp_enable_pci_slot(),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) which restarts the device driver and triggers more user-space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) events (for example, calling "ifup eth0" for Ethernet cards).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)
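The order of operations in the handler described above can be summarized
with a short sketch. The code below does not call any kernel function; it
only records the steps in the order the text gives them. The enum values,
struct step_trace, record() and handle_event_sketch() are hypothetical
names chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* One entry per step of the recovery sequence, in the documented order. */
enum step {
    STEP_SAVE_BARS,        /* save the device BARs */
    STEP_UNCONFIG,         /* stop the driver; uevents go to user space */
    STEP_SLEEP,            /* wait about 5 seconds for user-space scripts */
    STEP_RESET,            /* reset the PCI card */
    STEP_RESTORE_BARS,     /* reconfigure the device BARs */
    STEP_CONFIGURE_BRIDGE, /* reconfigure any bridges underneath */
    STEP_ENABLE_SLOT,      /* restart the driver; more uevents */
    STEP_COUNT
};

struct step_trace {
    enum step steps[STEP_COUNT];
    size_t    count;
};

/* Append one step to the trace. */
void record(struct step_trace *t, enum step s)
{
    assert(t && t->count < STEP_COUNT);
    t->steps[t->count++] = s;
}

/* Walk through the recovery sequence, recording each step as it happens. */
void handle_event_sketch(struct step_trace *t)
{
    record(t, STEP_SAVE_BARS);
    record(t, STEP_UNCONFIG);
    record(t, STEP_SLEEP);
    record(t, STEP_RESET);
    record(t, STEP_RESTORE_BARS);
    record(t, STEP_CONFIGURE_BRIDGE);
    record(t, STEP_ENABLE_SLOT);
}
```

The BARs are saved before the driver is stopped and restored only after
the reset, since the reset clears the device config space.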
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) Device Shutdown and User-Space Events
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) -------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) This section documents what happens when a PCI slot is unconfigured,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) focusing on how the device driver gets shut down, and on how the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193) events get delivered to user-space scripts.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) Following is an example sequence of events that causes a device driver's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) close function to be called during the first phase of an EEH reset.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) The sequence uses the pcnet32 device driver as an example::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199)     rpaphp_unconfig_pci_adapter (struct slot *) // in rpaphp_pci.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200)     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201)       calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202)       pci_remove_bus_device (struct pci_dev *) // in /drivers/pci/remove.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203)       {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204)         calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205)         pci_destroy_dev (struct pci_dev *)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206)         {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207)           calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208)           device_unregister (&dev->dev) // in /drivers/base/core.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209)           {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210)             calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211)             device_del (struct device *)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212)             {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213)               calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214)               bus_remove_device() // in /drivers/base/bus.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215)               {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216)                 calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217)                 device_release_driver()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218)                 {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219)                   calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220)                   struct device_driver->remove() which is just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221)                   pci_device_remove() // in /drivers/pci/pci_driver.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222)                   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223)                     calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224)                     struct pci_driver->remove() which is just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225)                     pcnet32_remove_one() // in /drivers/net/pcnet32.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226)                     {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227)                       calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228)                       unregister_netdev() // in /net/core/dev.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229)                       {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230)                         calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231)                         dev_close() // in /net/core/dev.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232)                         {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233)                           calls dev->stop();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234)                           which is just pcnet32_close() // in pcnet32.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235)                           {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236)                             which does what you wanted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237)                             to stop the device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238)                           }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239)                         }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240)                       }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241)                       which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242)                       frees pcnet32 device driver memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243)                     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244)     }}}}}}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247)     in drivers/pci/pci_driver.c,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248)     struct device_driver->remove() is just pci_device_remove()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249)       which calls struct pci_driver->remove() which is pcnet32_remove_one()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250)       which calls unregister_netdev() (in net/core/dev.c)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)       which calls dev_close() (in net/core/dev.c)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252)       which calls dev->stop() which is pcnet32_close()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253)       which then does the appropriate shutdown.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) ---
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257) Following is the analogous stack trace for events sent to user-space
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) when the PCI device is unconfigured::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260)     rpaphp_unconfig_pci_adapter() { // in rpaphp_pci.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)       calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262)       pci_remove_bus_device (struct pci_dev *) { // in /drivers/pci/remove.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)         calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264)         pci_destroy_dev (struct pci_dev *) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)           calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266)           device_unregister (&dev->dev) { // in /drivers/base/core.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267)             calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268)             device_del(struct device * dev) { // in /drivers/base/core.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269)               calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270)               kobject_del() { // in /lib/kobject.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)                 calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272)                 kobject_uevent() { // in /lib/kobject.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273)                   calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274)                   kset_uevent() { // in /lib/kobject.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275)                     calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)                     kset->uevent_ops->uevent() // which is really just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277)                     a call to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278)                     dev_uevent() { // in /drivers/base/core.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279)                       calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280)                       dev->bus->uevent() which is really just a call to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281)                       pci_uevent () { // in drivers/pci/hotplug.c
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)                         which prints device name, etc....
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283)                       }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)                     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285)                     then kobject_uevent() sends a netlink uevent to userspace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286)                       --> userspace uevent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287)                       (during early boot, nobody listens to netlink events and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)                       kobject_uevent() executes uevent_helper[], which runs the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289)                       event process /sbin/hotplug)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)                   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291)                 }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292)                 kobject_del() then calls sysfs_remove_dir(), which would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)                 be noticed as a delete event by any user-space daemon
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294)                 that was watching sysfs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) Pros and Cons of the Current Design
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) There are several issues with the current EEH software recovery design,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) which may be addressed in future revisions. But first, note that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) big plus of the current design is that no changes need to be made to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) individual device drivers, so that the current design throws a wide net.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) The biggest negative of the design is that it potentially disturbs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 304) network daemons and file systems that didn't need to be disturbed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 305)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 306) - A minor complaint is that resetting the network card causes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 307)   user-space back-to-back ifdown/ifup burps that potentially disturb
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 308)   network daemons that didn't even need to know that the PCI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 309)   card was being rebooted.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 310)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 311) - A more serious concern is that the same reset, for SCSI devices,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 312)   causes havoc to mounted file systems. Scripts cannot post-facto
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 313)   unmount a file system without flushing pending buffers, but this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 314)   is impossible, because I/O has already been stopped. Thus,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 315)   ideally, the reset should happen at or below the block layer,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 316)   so that the file systems are not disturbed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 317)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 318)   Reiserfs does not tolerate errors returned from the block device.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 319)   Ext3fs seems to be tolerant, retrying reads/writes until it does
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 320)   succeed. Both have been only lightly tested in this scenario.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 321)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 322)   The SCSI-generic subsystem already has built-in code for performing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 323)   SCSI device resets, SCSI bus resets, and SCSI host-bus-adapter
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 324)   (HBA) resets. These are cascaded into a chain of attempted
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 325)   resets if a SCSI command fails. These are completely hidden
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 326)   from the block layer. It would be very natural to add an EEH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 327)   reset into this chain of events.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 328)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 329) - If a SCSI error occurs for the root device, all is lost unless
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 330)   the sysadmin had the foresight to run /bin, /sbin, /etc, /var
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 331)   and so on, out of ramdisk/tmpfs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 332)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 333)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 334) Conclusions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 335) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 336) There's forward progress ...