.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================================
The PCI Express Advanced Error Reporting Driver Guide HOWTO
===========================================================

:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
          - Yanmin Zhang <yanmin.zhang@intel.com>

:Copyright: |copy| 2006 Intel Corporation

Overview
========

About this guide
----------------

This guide describes the basics of the PCI Express Advanced Error
Reporting (AER) driver and explains how to use it, as well as how to
enable the drivers of endpoint devices to work with the PCI Express
AER driver.


What is the PCI Express AER Driver?
-----------------------------------

PCI Express error signaling can occur on the PCI Express link itself
or on behalf of transactions initiated on the link. PCI Express
defines two error reporting paradigms: the baseline capability and
the Advanced Error Reporting capability. The baseline capability is
required of all PCI Express components and provides a minimum defined
set of error reporting requirements. The Advanced Error Reporting
capability is implemented with a PCI Express Advanced Error Reporting
extended capability structure and provides more robust error reporting.

The PCI Express AER driver provides the infrastructure to support the
PCI Express Advanced Error Reporting capability. It provides three
basic functions:

- Gathers comprehensive error information when errors occur.
- Reports errors to the user.
- Performs error recovery actions.

The AER driver only attaches to Root Ports that support the PCI Express
AER capability.


User Guide
==========

Include the PCI Express AER Root Driver into the Linux Kernel
-------------------------------------------------------------

The PCI Express AER Root driver is a Root Port service driver attached
to the PCI Express Port Bus driver. To use it, the driver must be
compiled into the kernel. The CONFIG_PCIEAER option enables it. Because
CONFIG_PCIEAER depends on CONFIG_PCIEPORTBUS, set CONFIG_PCIEPORTBUS=y
and CONFIG_PCIEAER=y.

Load PCI Express AER Root Driver
--------------------------------

Some systems have AER support in firmware. Enabling Linux AER support at
the same time the firmware handles AER may result in unpredictable
behavior. Therefore, Linux does not handle AER events unless the firmware
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
Specification for details regarding _OSC usage.

AER error output
----------------

When a PCIe AER error is captured, an error message is output to the
console. A correctable error is output as a warning; any other error
is printed as an error. Users can therefore choose a log level that
filters out correctable error messages.

An example is shown below::

  0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
  0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
  0000:50:00.0: [20] Unsupported Request (First)
  0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100

In the example, 'Requester ID' means the ID of the device that sent
the error message to the Root Port. Refer to the PCI Express
specification for the other fields.
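
As a rough illustration of how two fields of such a message can be read,
the following C sketch splits a 16-bit Requester ID into bus, device, and
function numbers and finds the bit index that is printed in square
brackets. The struct and function names are hypothetical and are not part
of the kernel; the layout assumed here (bus in bits 15:8, device in bits
7:3, function in bits 2:0) is the conventional PCI Express encoding.

```c
#include <stdint.h>

/* Fields of a 16-bit PCIe Requester ID (bus:device.function). */
struct requester_id {
    unsigned bus;   /* bits 15:8 */
    unsigned dev;   /* bits 7:3  */
    unsigned fn;    /* bits 2:0  */
};

/* Hypothetical helper: split a Requester ID into its three fields. */
static struct requester_id decode_requester_id(uint16_t id)
{
    struct requester_id r = {
        .bus = (id >> 8) & 0xff,
        .dev = (id >> 3) & 0x1f,
        .fn  = id & 0x07,
    };
    return r;
}

/* Index of the lowest set bit in a status word, or -1 if none is set. */
static int first_set_bit(uint32_t status)
{
    for (int bit = 0; bit < 32; bit++)
        if (status & (UINT32_C(1) << bit))
            return bit;
    return -1;
}
```

With the values from the example, ``decode_requester_id(0x0500)`` gives
bus 0x05, device 0, function 0, and ``first_set_bit(0x00100000)`` gives 20,
which corresponds to the ``[20] Unsupported Request`` line.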

AER Statistics / Counters
-------------------------

When PCIe AER errors are captured, the counters / statistics are also exposed
in the form of sysfs attributes, which are documented in
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats.
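
A user space program could extract a single counter from the text of one
of these attributes roughly as sketched below. This is only a sketch: it
assumes each line has the form ``<name> <count>``, where the name may
itself contain spaces, and the function name ``aer_counter`` is
hypothetical. Check the ABI document for the exact attribute names and
layout on your kernel.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in the text of an AER statistics attribute.
 * Assumed layout: one "<name> <count>" pair per line.
 * Returns 0 and stores the count on success, -1 if the name is absent.
 */
static int aer_counter(const char *text, const char *name,
                       unsigned long long *count)
{
    size_t len = strlen(name);

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t line_len = eol ? (size_t)(eol - text) : strlen(text);

        /* The line must be "<name> <digits...>" to count as a match. */
        if (line_len > len + 1 &&
            strncmp(text, name, len) == 0 &&
            text[len] == ' ' &&
            isdigit((unsigned char)text[len + 1])) {
            *count = strtoull(text + len + 1, NULL, 10);
            return 0;
        }

        /* Advance to the start of the next line. */
        text += line_len;
        if (*text == '\n')
            text++;
    }
    return -1;
}
```

The caller is expected to read the attribute file into a NUL-terminated
buffer first and then pass that buffer as ``text``.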

Developer Guide
===============

Enabling AER-aware support requires a software driver to configure
the AER capability structure within its device and to provide callbacks.

To support AER well, developers first need to understand how AER works.

PCI Express errors are classified into two types: correctable errors
and uncorrectable errors. This classification is based on the impact
of those errors, which may result in degraded performance or function
failure.

Correctable errors have no impact on the functionality of the
interface. The PCI Express protocol can recover without any software
intervention or any loss of data. These errors are detected and
corrected by hardware. Unlike correctable errors, uncorrectable
errors impact the functionality of the interface. Uncorrectable errors
can cause a particular transaction or a particular PCI Express link
to be unreliable. Depending on those error conditions, uncorrectable
errors are further classified into non-fatal errors and fatal errors.
Non-fatal errors cause the particular transaction to be unreliable,
but the PCI Express link itself is fully functional. Fatal errors, on
the other hand, cause the link to be unreliable.
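
The classification above can be summarized in a small C model. All names
here are hypothetical and exist only for illustration. The one hardware
detail the sketch relies on is that the AER Uncorrectable Error Severity
register marks an error as fatal when its bit is set and as non-fatal when
the bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the three severities described above. */
enum aer_severity {
    AER_SEV_CORRECTABLE,
    AER_SEV_NONFATAL,
    AER_SEV_FATAL,
};

/* Classify one uncorrectable status bit using the severity register. */
static enum aer_severity classify_uncorrectable(uint32_t status_bit,
                                                uint32_t severity_reg)
{
    return (severity_reg & status_bit) ? AER_SEV_FATAL : AER_SEV_NONFATAL;
}

/* Only fatal errors leave the link itself unreliable. */
static bool link_is_reliable(enum aer_severity sev)
{
    return sev != AER_SEV_FATAL;
}

/* Both kinds of uncorrectable error make the transaction unreliable. */
static bool transaction_is_reliable(enum aer_severity sev)
{
    return sev == AER_SEV_CORRECTABLE;
}
```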

When AER is enabled, a PCI Express device automatically sends an
error message to the Root Port above it when the device captures
an error. The Root Port, upon receiving an error reporting message,
internally processes and logs the error message in its AER
capability structure. The logged error information includes the
error reporting agent's requester ID, stored in the Error Source
Identification Registers, and the error bits set accordingly in the
Root Error Status Register. If AER error reporting is enabled in the
Root Error Command Register, the Root Port generates an interrupt when
an error is detected.
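
To make the logged information more concrete, the sketch below decodes a
Root Error Status value and an Error Source Identification value. The bit
values are intended to mirror the ``PCI_ERR_ROOT_*`` definitions in
``include/uapi/linux/pci_regs.h`` and should be verified against that
header; the struct and function names are hypothetical. It assumes the
ERR_COR source ID sits in the low 16 bits of the Error Source
Identification register and the ERR_FATAL/NONFATAL source ID in the high
16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected Root Error Status bits (verify against pci_regs.h). */
#define ROOT_COR_RCV    0x00000001u  /* ERR_COR received */
#define ROOT_UNCOR_RCV  0x00000004u  /* ERR_FATAL/NONFATAL received */
#define ROOT_FATAL_RCV  0x00000040u  /* at least one fatal message */

/* Hypothetical summary of what the Root Port logged. */
struct root_error_info {
    bool correctable;       /* a correctable error was reported */
    bool uncorrectable;     /* an uncorrectable error was reported */
    bool fatal;             /* a fatal error message was received */
    uint16_t cor_source;    /* requester ID of the ERR_COR sender */
    uint16_t uncor_source;  /* requester ID of the ERR_FATAL/NONFATAL sender */
};

static struct root_error_info decode_root_error(uint32_t root_status,
                                                uint32_t source_id)
{
    struct root_error_info info = {
        .correctable   = (root_status & ROOT_COR_RCV) != 0,
        .uncorrectable = (root_status & ROOT_UNCOR_RCV) != 0,
        .fatal         = (root_status & ROOT_FATAL_RCV) != 0,
        .cor_source    = (uint16_t)(source_id & 0xffff),
        .uncor_source  = (uint16_t)(source_id >> 16),
    };
    return info;
}
```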

Note that the errors described above are related to the PCI Express
hierarchy and links. They do not include device-specific errors,
because device-specific errors are still sent directly to the device
driver.

Configure the AER capability structure
--------------------------------------

AER-aware drivers of PCI Express components need to change the device
control registers to enable AER. They may also change AER registers,
including the mask and severity registers. The helper function
pci_enable_pcie_error_reporting can be used to enable AER. See the
"helper functions" section below.

Provide callbacks
-----------------

callback reset_link to reset PCI Express link
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This callback is used to reset the PCI Express physical link when a
fatal error happens. The Root Port AER service driver provides a
default reset_link function, but different upstream ports might
have different specifications for resetting the PCI Express link, so
all upstream ports should provide their own reset_link functions.

The "Non-correctable (non-fatal and fatal) errors" section below
describes in more detail when reset_link is called.

PCI error-recovery callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI Express AER Root driver uses error callbacks to coordinate
with the downstream device drivers associated with the hierarchy in
question when performing error recovery actions.

The pci_driver data structure has a pointer, err_handler, that points
to a pci_error_handlers structure, which consists of several callback
function pointers. The AER driver follows the rules defined in
Documentation/PCI/pci-error-recovery.rst, except for the PCI
Express-specific parts (e.g. reset_link). Refer to
Documentation/PCI/pci-error-recovery.rst for detailed definitions of
the callbacks.

The sections below specify when the error callback functions are called.

Correctable errors
~~~~~~~~~~~~~~~~~~

Correctable errors have no impact on the functionality of
the interface. The PCI Express protocol can recover without any
software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.

Non-correctable (non-fatal and fatal) errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If an error message indicates a non-fatal error, a link reset at the
upstream port is not required. The AER driver calls
error_detected(dev, pci_channel_io_normal) for all drivers associated
with the hierarchy in question. For example::

  EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort

If Upstream port A captures an AER error, the hierarchy consists of
Downstream port B and EndPoint.

A driver may return PCI_ERS_RESULT_CAN_RECOVER,
PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
whether it can recover. If it returns PCI_ERS_RESULT_CAN_RECOVER, the
AER driver calls mmio_enabled next.

If an error message indicates a fatal error, the kernel broadcasts
error_detected(dev, pci_channel_io_frozen) to all drivers within
the hierarchy in question. A link reset at the upstream port is then
necessary. Because different kinds of devices might use different
approaches to reset the link, the AER port service driver is required
to provide the link reset function via the callback parameter of the
pcie_do_recovery() function. If reset_link is not NULL, the recovery
function uses it to reset the link. If error_detected returns
PCI_ERS_RESULT_CAN_RECOVER and reset_link returns
PCI_ERS_RESULT_RECOVERED, error handling proceeds to mmio_enabled.
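
The order of the callbacks described above can be modelled in user space
as follows. This is a simplified sketch for a single driver, not the
kernel implementation: every type and function name is a hypothetical
stand-in (for instance ``ERS_CAN_RECOVER`` for
PCI_ERS_RESULT_CAN_RECOVER), and the real recovery code walks every
device in the affected hierarchy and merges their results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's recovery result codes. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };
enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN };

/* Callback table for one driver, loosely modelled on pci_error_handlers. */
struct handlers {
    enum ers_result (*error_detected)(enum channel_state state);
    enum ers_result (*mmio_enabled)(void);
    enum ers_result (*slot_reset)(void);
    void (*resume)(void);
};

typedef enum ers_result (*reset_link_fn)(void);

/* Returns true if the driver reaches resume(), false if recovery fails. */
static bool do_recovery(const struct handlers *h, enum channel_state state,
                        reset_link_fn reset_link)
{
    enum ers_result status = h->error_detected(state);

    /* Fatal error: the link must be reset before recovery continues. */
    if (state == CHANNEL_FROZEN &&
        (!reset_link || reset_link() != ERS_RECOVERED))
        return false;

    if (status == ERS_CAN_RECOVER)
        status = h->mmio_enabled();
    if (status == ERS_NEED_RESET)
        status = h->slot_reset();
    if (status != ERS_RECOVERED)
        return false;

    h->resume();
    return true;
}

/* Demo driver that always reports it can recover without a reset. */
static int resume_calls;
static enum ers_result demo_error_detected(enum channel_state s) { (void)s; return ERS_CAN_RECOVER; }
static enum ers_result demo_mmio_enabled(void) { return ERS_RECOVERED; }
static enum ers_result demo_slot_reset(void) { return ERS_RECOVERED; }
static void demo_resume(void) { resume_calls++; }
static enum ers_result demo_reset_link(void) { return ERS_RECOVERED; }

static const struct handlers demo_handlers = {
    .error_detected = demo_error_detected,
    .mmio_enabled   = demo_mmio_enabled,
    .slot_reset     = demo_slot_reset,
    .resume         = demo_resume,
};
```

In this model a non-fatal error (``CHANNEL_NORMAL``) never touches
``reset_link``, while a fatal error (``CHANNEL_FROZEN``) fails
immediately when no link reset function is supplied.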

helper functions
----------------

::

  int pci_enable_pcie_error_reporting(struct pci_dev *dev);

pci_enable_pcie_error_reporting enables the device to send error
messages to the Root Port when an error is detected. Note that devices
do not enable error reporting by default, so device drivers need to
call this function to enable it.

::

  int pci_disable_pcie_error_reporting(struct pci_dev *dev);

pci_disable_pcie_error_reporting stops the device from sending error
messages to the Root Port when an error is detected.
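
At the register level, enabling and disabling error reporting comes down
to setting or clearing four enable bits in the PCI Express Device Control
register. The sketch below models that on a plain 16-bit value. The macro
and function names are hypothetical; the bit values are meant to mirror
the ``PCI_EXP_DEVCTL_*`` definitions in ``include/uapi/linux/pci_regs.h``
and should be checked there. How the kernel helpers are implemented may
differ between kernel versions.

```c
#include <stdint.h>

/* Error reporting enable bits of the Device Control register. */
#define DEVCTL_CERE   0x0001u  /* Correctable Error Reporting Enable */
#define DEVCTL_NFERE  0x0002u  /* Non-Fatal Error Reporting Enable */
#define DEVCTL_FERE   0x0004u  /* Fatal Error Reporting Enable */
#define DEVCTL_URRE   0x0008u  /* Unsupported Request Reporting Enable */
#define DEVCTL_AER_FLAGS \
    (DEVCTL_CERE | DEVCTL_NFERE | DEVCTL_FERE | DEVCTL_URRE)

/* Value to write back so that all four kinds of error are reported. */
static uint16_t devctl_enable_reporting(uint16_t devctl)
{
    return (uint16_t)(devctl | DEVCTL_AER_FLAGS);
}

/* Value to write back so that none of them is reported. */
static uint16_t devctl_disable_reporting(uint16_t devctl)
{
    return (uint16_t)(devctl & ~DEVCTL_AER_FLAGS);
}
```

Both functions leave the remaining Device Control bits untouched, which
is why they take the current register value as input.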

::

  int pci_aer_clear_nonfatal_status(struct pci_dev *dev);

pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
error status register.
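
One way to picture clearing only the non-fatal errors is sketched below.
It assumes that the status register is cleared by writing 1 to the bits
to be cleared and that a set bit in the severity register marks the
corresponding error as fatal. The function names are hypothetical, and
the kernel helper may be implemented differently.

```c
#include <stdint.h>

/* Status bits that are set and whose severity is non-fatal. */
static uint32_t nonfatal_bits_to_clear(uint32_t uncor_status,
                                       uint32_t uncor_severity)
{
    return uncor_status & ~uncor_severity;
}

/* Model of a write-1-to-clear register: written 1 bits are cleared. */
static uint32_t rw1c_write(uint32_t reg, uint32_t value)
{
    return reg & ~value;
}
```

Writing the result of ``nonfatal_bits_to_clear()`` back through
``rw1c_write()`` leaves any fatal status bits set for later handling.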

Frequently Asked Questions
--------------------------

Q:
  What happens if a PCI Express device driver does not provide an
  error recovery handler (pci_driver->err_handler is equal to NULL)?

A:
  The devices attached to the driver won't be recovered. If the
  error is fatal, the kernel will print warning messages. Refer to
  the Developer Guide section for more information.

Q:
  What happens if an upstream port service driver does not provide
  the reset_link callback?

A:
  Fatal error recovery will fail if the errors are reported by the
  upstream ports to which the service driver is attached.

Q:
  How does this infrastructure deal with a driver that is not PCI
  Express aware?

A:
  This infrastructure calls the error callback functions of the
  driver when an error happens. But if the driver is not aware of
  PCI Express, the device might not report its own errors to the
  Root Port.

Q:
  What modifications does that driver need to make it compatible
  with the PCI Express AER Root driver?

A:
  It can call the helper functions to enable AER in devices and
  clean up the uncorrectable status register. Refer to the
  "helper functions" section.


Software error injection
========================

Debugging PCIe AER error recovery code is quite difficult because it
is hard to trigger real hardware errors. Software-based error
injection can be used to fake various kinds of PCIe errors.

First, enable PCIe AER software error injection in the kernel
configuration; that is, the following item should be in your .config.

CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m

After rebooting with the new kernel or inserting the module, a device
file named /dev/aer_inject should be created.
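
A test program can check for the device file before attempting any
injection. The following is a minimal sketch that assumes the device node
is a character device; the function name is hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if the path exists and refers to a character device. */
static bool is_char_device(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISCHR(st.st_mode);
}
```

Calling ``is_char_device("/dev/aer_inject")`` then indicates whether the
injection interface appears to be available on the running kernel.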

Then, you need a user space tool named aer-inject, which is available
from:

    https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/

More information about aer-inject can be found in the documentation
that comes with its source code.