Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

==============
Devlink Health
==============

Background
==========

The ``devlink`` health mechanism is intended for real-time alerting, so that
the user knows when something bad has happened to a PCI device. Its goals are:

  * Provide alert debug information.
  * Self-healing.
  * If a problem needs vendor support, provide a way to gather all needed
    debugging information.

Overview
========

The main idea is to unify and centralize driver health reports in the
generic ``devlink`` instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The ``devlink`` health reporter:
A device driver creates a "health reporter" for each error/health type.
An error/health type can be known and generic (e.g. PCI error, FW error,
RX/TX error) or unknown (driver-specific).
For each registered health reporter, a driver can issue error/health reports
asynchronously. All handling of health reports is done by ``devlink``.
A device driver can provide specific callbacks for each "health reporter", e.g.:

  * Recovery procedures
  * Diagnostics procedures
  * Object dump procedures
  * OOB initial parameters

Different parts of the driver can register different types of health reporters
with different handlers.
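
The relationship between a driver, its reporters and their optional callbacks
can be sketched as a small user-space model. This is only an illustration, not
the kernel API: ``ReporterOps``, ``DevlinkInstance``, ``reporter_create`` and
all field names are hypothetical, and the callbacks return made-up data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ReporterOps:
    """Per-reporter callbacks and initial parameters (hypothetical names)."""
    name: str
    recover: Optional[Callable[[], bool]] = None   # recovery procedure
    diagnose: Optional[Callable[[], dict]] = None  # diagnostics procedure
    dump: Optional[Callable[[], dict]] = None      # object dump procedure
    grace_period_ms: int = 0                       # initial parameter
    auto_recover: bool = True                      # initial parameter


class DevlinkInstance:
    """Toy stand-in for one devlink instance that owns reporters by name."""

    def __init__(self) -> None:
        self.reporters: Dict[str, ReporterOps] = {}

    def reporter_create(self, ops: ReporterOps) -> ReporterOps:
        if ops.name in self.reporters:
            raise ValueError(f"reporter {ops.name!r} already exists")
        self.reporters[ops.name] = ops
        return ops


devlink = DevlinkInstance()

# The TX path of the driver registers a reporter that can recover and diagnose.
devlink.reporter_create(ReporterOps(
    "tx",
    recover=lambda: True,
    diagnose=lambda: {"stuck_queues": 0},
    grace_period_ms=500,
))

# The firmware handling code registers a dump-only reporter.
devlink.reporter_create(ReporterOps("fw", dump=lambda: {"fw_trace": []}))
```

Keeping every callback optional mirrors the point made above: each part of the
driver supplies only the handlers that make sense for its error type.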

Actions
=======

Once an error is reported, devlink health will perform the following actions:

  * A log is sent to the kernel trace events buffer.
  * Health status and statistics are updated for the reporter instance.
  * An object dump is taken and saved at the reporter instance (as long as
    no other dump is already stored).
  * An auto-recovery attempt is made, depending on:

    - Auto-recovery configuration
    - Grace period vs. time passed since the last recovery
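
The sequence above can be modelled in a few lines of user-space code. The
sketch below is a simplified illustration of the four steps, not the kernel
implementation: the ``Reporter`` class, its fields and ``health_report`` are
hypothetical names, the trace buffer is a plain list, and the exact ordering
and corner cases in the kernel may differ.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reporter:
    name: str
    recover: Callable[[], bool]
    dump: Callable[[], dict]
    auto_recover: bool = True
    grace_period_s: float = 0.0            # 0 means no grace period
    error_count: int = 0
    recover_count: int = 0
    healthy: bool = True
    saved_dump: Optional[dict] = None
    last_recovery: Optional[float] = None
    trace: List[str] = field(default_factory=list)  # stand-in for trace events


def health_report(r: Reporter, msg: str, now: Optional[float] = None) -> bool:
    """Process one error report; return True if the reporter ends up healthy."""
    now = time.monotonic() if now is None else now

    r.trace.append(f"{r.name}: {msg}")     # 1. log the report
    r.error_count += 1                     # 2. update status and statistics
    r.healthy = False
    if r.saved_dump is None:               # 3. save a dump unless one is stored
        r.saved_dump = r.dump()

    if not r.auto_recover:                 # 4a. auto-recovery configuration
        return False
    in_grace = (r.last_recovery is not None
                and now - r.last_recovery < r.grace_period_s)
    if in_grace:                           # 4b. too soon after last recovery
        return False
    if r.recover():
        r.healthy = True
        r.recover_count += 1
        r.last_recovery = now
    return r.healthy


r = Reporter("tx", recover=lambda: True, dump=lambda: {"queue": 3},
             grace_period_s=5.0)
first = health_report(r, "tx timeout", now=100.0)   # recovery runs
second = health_report(r, "tx timeout", now=102.0)  # inside the grace period
third = health_report(r, "tx timeout", now=106.0)   # grace period has passed
```

In this model the second report is still logged and counted, but no recovery
is attempted because only two seconds have passed since the previous one.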

User Interface
==============

The user can access or change each reporter's parameters and driver-specific
callbacks via ``devlink``, e.g. per error type (per health reporter):

  * Configure the reporter's generic parameters (e.g. disable/enable auto recovery)
  * Invoke the recovery procedure
  * Run diagnostics
  * Get an object dump

.. list-table:: List of devlink health interfaces
   :widths: 10 90

   * - Name
     - Description
   * - ``DEVLINK_CMD_HEALTH_REPORTER_GET``
     - Retrieves status and configuration info per DEV and reporter.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
     - Allows reporter-related configuration setting.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
     - Triggers a reporter's recovery procedure.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
     - Retrieves diagnostics data from a reporter on a device.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
     - Retrieves the last stored dump. Devlink health
       saves a single dump. If a dump is not already stored by devlink
       for this reporter, devlink generates a new dump.
       Dump output is defined by the reporter.
   * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
     - Clears the last saved dump file for the specified reporter.
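
From user space these commands are normally reached through the ``devlink``
tool from iproute2. The helper below is a hedged sketch that only builds the
argument list for each command in the table; the mapping follows the
``devlink health`` subcommands as documented in the iproute2 man page, and
should be checked against the installed version. ``devlink_health_argv``, the
device handle and the reporter name are made up for the example.

```python
from typing import List, Union

# Assumed mapping from netlink command to "devlink health" subcommand words.
SUBCOMMANDS = {
    "DEVLINK_CMD_HEALTH_REPORTER_GET": ["show"],
    "DEVLINK_CMD_HEALTH_REPORTER_SET": ["set"],
    "DEVLINK_CMD_HEALTH_REPORTER_RECOVER": ["recover"],
    "DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE": ["diagnose"],
    "DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET": ["dump", "show"],
    "DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR": ["dump", "clear"],
}

# Settings this sketch knows how to pass to "devlink health set".
ALLOWED_SETTINGS = {"grace_period", "auto_recover"}


def devlink_health_argv(cmd: str, dev: str, reporter: str,
                        **settings: Union[int, bool]) -> List[str]:
    """Build (but do not run) a devlink health command line."""
    if settings and cmd != "DEVLINK_CMD_HEALTH_REPORTER_SET":
        raise ValueError("settings are only valid for the SET command")
    unknown = set(settings) - ALLOWED_SETTINGS
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")

    argv = ["devlink", "health", *SUBCOMMANDS[cmd], dev, "reporter", reporter]
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += [key, str(value)]
    return argv


DEV = "pci/0000:82:00.0"   # example device handle, not a real system
show_dump = devlink_health_argv("DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET", DEV, "tx")
configure = devlink_health_argv("DEVLINK_CMD_HEALTH_REPORTER_SET", DEV, "tx",
                                grace_period=500, auto_recover=False)
```

The resulting lists can be passed to ``subprocess.run(argv, check=True)`` on a
machine with a devlink-capable device, which typically requires root.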

The following diagram provides a general overview of ``devlink-health``::

                                                   netlink
                                          +--------------------------+
                                          |                          |
                                          |            +             |
                                          |            |             |
                                          +--------------------------+
                                                       |request for ops
                                                       |(diagnose,
     mlx5_core                             devlink     |recover,
                                                       |dump)
    +--------+                            +--------------------------+
    |        |                            |    reporter|             |
    |        |                            |  +---------v----------+  |
    |        |   ops execution            |  |                    |  |
    |     <----------------------------------+                    |  |
    |        |                            |  |                    |  |
    |        |                            |  + ^------------------+  |
    |        |                            |    | request for ops     |
    |        |                            |    | (recover, dump)     |
    |        |                            |    |                     |
    |        |                            |  +-+------------------+  |
    |        |     health report          |  | health handler     |  |
    |        +------------------------------->                    |  |
    |        |                            |  +--------------------+  |
    |        |     health reporter create |                          |
    |        +---------------------------->                          |
    +--------+                            +--------------------------+