^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) PCIe Device AER statistics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) These attributes show up under all the devices that are AER capable. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) statistical counters indicate the errors "as seen/reported by the device".
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) Note that if an endpoint is causing problems, the AER counters may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) increment at its link partner (e.g. root port) instead, because the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) errors may be "seen" and reported by the link partner and not by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) problematic endpoint itself (which may report all counters as 0 because it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) never saw any problems).
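
Because the counters of interest may sit at the link partner, it can help to look up the upstream device of an endpoint. The following is a minimal sketch, assuming the usual sysfs layout in which a device directory is nested inside the directory of its upstream bridge or root port. The helper name link_partner_dir is hypothetical and not part of any kernel or library interface.

```python
import os


def link_partner_dir(dev_dir):
    """Return the sysfs directory of the upstream device, or None.

    Assumption: dev_dir is (or is a symlink to) a device directory such as
    /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0, so that its parent
    directory belongs to the upstream bridge or root port.
    """
    real = os.path.realpath(dev_dir)   # resolve /sys/bus/pci/devices/<dev> symlinks
    parent = os.path.dirname(real)
    # A parent named like "pci0000:00" is the host bridge directory,
    # not a PCI device with AER attributes of its own.
    if os.path.basename(parent).startswith("pci"):
        return None
    return parent
```

For an endpoint whose own counters are all 0, the aer_dev_* files in the returned directory are the next place to look.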
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) What: /sys/bus/pci/devices/<dev>/aer_dev_correctable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) Description: List of correctable errors seen and reported by this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) PCI device using ERR_COR. Note that since multiple errors may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) be reported using a single ERR_COR message,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) TOTAL_ERR_COR at the end of the file may not match the actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) total of all the errors in the file. Sample output::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) Receiver Error 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) Bad TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) Bad DLLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) RELAY_NUM Rollover 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) Replay Timer Timeout 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) Advisory Non-Fatal 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) Corrected Internal Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) Header Log Overflow 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) TOTAL_ERR_COR 2
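
The sample output above can be parsed into a dictionary to compare the per-error counters with the TOTAL_ERR_COR line. This is a minimal sketch that assumes every non-empty line has the form "<error name> <count>", as in the sample. The function names parse_aer_counters and split_total are hypothetical.

```python
def parse_aer_counters(text):
    """Parse '<error name> <count>' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last whitespace-separated field; the name may
        # itself contain spaces (e.g. "Receiver Error").
        name, value = line.rsplit(None, 1)
        counters[name] = int(value)
    return counters


def split_total(counters):
    """Separate the TOTAL_ERR_* entry from the per-error counters."""
    per_error = {k: v for k, v in counters.items()
                 if not k.startswith("TOTAL_ERR_")}
    totals = {k: v for k, v in counters.items()
              if k.startswith("TOTAL_ERR_")}
    return per_error, totals


# Excerpt of the sample output shown above.
SAMPLE = """\
Receiver Error 2
Bad TLP 0
Bad DLLP 0
TOTAL_ERR_COR 2
"""

per_error, totals = split_total(parse_aer_counters(SAMPLE))
# The two values need not be equal: one ERR_COR message may carry
# several errors, so the sum can exceed the message total.
print(sum(per_error.values()), totals["TOTAL_ERR_COR"])
```

The same parser should apply to aer_dev_fatal and aer_dev_nonfatal, since they use the same line format.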
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) What: /sys/bus/pci/devices/<dev>/aer_dev_fatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) Description: List of uncorrectable fatal errors seen and reported by this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) PCI device using ERR_FATAL. Note that since multiple errors may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) be reported using a single ERR_FATAL message,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) TOTAL_ERR_FATAL at the end of the file may not match the actual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) total of all the errors in the file. Sample output::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) Undefined 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) Data Link Protocol 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) Surprise Down Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) Poisoned TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) Flow Control Protocol 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) Completion Timeout 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) Completer Abort 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) Unexpected Completion 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) Receiver Overflow 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) Malformed TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) ECRC 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) Unsupported Request 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) ACS Violation 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) Uncorrectable Internal Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) MC Blocked TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) AtomicOp Egress Blocked 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) TLP Prefix Blocked Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) TOTAL_ERR_FATAL 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) What: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) Description: List of uncorrectable nonfatal errors seen and reported by this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) PCI device using ERR_NONFATAL. Note that since multiple errors
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) may be reported using a single ERR_NONFATAL message,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) TOTAL_ERR_NONFATAL at the end of the file may not match the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) actual total of all the errors in the file. Sample output::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) Undefined 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) Data Link Protocol 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) Surprise Down Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) Poisoned TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) Flow Control Protocol 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) Completion Timeout 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) Completer Abort 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) Unexpected Completion 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) Receiver Overflow 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) Malformed TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) ECRC 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) Unsupported Request 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) ACS Violation 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) Uncorrectable Internal Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) MC Blocked TLP 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) AtomicOp Egress Blocked 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) TLP Prefix Blocked Error 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) TOTAL_ERR_NONFATAL 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) PCIe Rootport AER statistics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) These attributes show up only under the rootports (or root complex event
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) collectors) that are AER capable. They indicate the number of error messages as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) "reported to" the rootport. Please note that the rootports also transmit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) (internally) the ERR_* messages for errors seen by the internal rootport PCI
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) device, so these counters include them and are thus the cumulative count of all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) the error messages on the PCI hierarchy originating at that root port.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Description: Total number of ERR_COR messages reported to rootport.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) Description: Total number of ERR_FATAL messages reported to rootport.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) Date: July 2018
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) KernelVersion: 4.19.0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) Contact: linux-pci@vger.kernel.org, rajatja@google.com
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Description: Total number of ERR_NONFATAL messages reported to rootport.
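
The three rootport totals can be collected in one pass. The sketch below assumes each attribute file holds a single decimal integer and that the caller passes the directory that contains the attributes. The helper name read_rootport_totals is hypothetical.

```python
import os

# Attribute names as listed in this section.
ROOTPORT_FILES = (
    "aer_rootport_total_err_cor",
    "aer_rootport_total_err_fatal",
    "aer_rootport_total_err_nonfatal",
)


def read_rootport_totals(attr_dir):
    """Return {attribute name: count} for the attributes found in attr_dir.

    Assumption: each file contains one decimal integer. Attributes that are
    missing (e.g. the device is not an AER-capable rootport) are skipped.
    """
    totals = {}
    for name in ROOTPORT_FILES:
        path = os.path.join(attr_dir, name)
        try:
            with open(path) as f:
                totals[name] = int(f.read().strip())
        except FileNotFoundError:
            continue
    return totals
```

An empty result suggests that the directory does not belong to an AER-capable rootport or root complex event collector.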