Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   1) Error Detection And Correction (EDAC) Devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   2) =============================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   3) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   4) Main Concepts used at the EDAC subsystem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   5) ----------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   6) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   7) There are several things to be aware of that aren't at all obvious, like
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   8) *sockets*, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300   9) etc...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  10) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  11) These are some of the many terms that are thrown about that don't always
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  12) mean what people think they mean (Inconceivable!).  In the interest of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  13) creating a common ground for discussion, terms and their definitions
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  14) will be established.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  15) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  16) * Memory devices
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  17) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  18) The individual DRAM chips on a memory stick.  These devices commonly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  19) output 4 and 8 bits each (x4, x8). Grouping several of these in parallel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  20) provides the number of bits that the memory controller expects:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  21) typically 72 bits, in order to provide 64 bits + 8 bits of ECC data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  22) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  23) * Memory Stick
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  24) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  25) A printed circuit board that aggregates multiple memory devices in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  26) parallel.  In general, this is the Field Replaceable Unit (FRU) which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  27) gets replaced, in the case of excessive errors. Most often it is also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  28) called DIMM (Dual Inline Memory Module).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  29) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  30) * Memory Socket
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  31) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  32) A physical connector on the motherboard that accepts a single memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  33) stick. Also called a "slot" in several datasheets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  34) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  35) * Channel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  36) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  37) A memory controller channel, responsible for communicating with a group of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  38) DIMMs. Each channel has its own independent control (command) and data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  39) bus, and can be used independently or grouped with other channels.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  40) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  41) * Branch
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  42) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  43) It is typically the highest level of the hierarchy on a Fully-Buffered DIMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  44) memory controller. Typically, it contains two channels. Two channels on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  45) same branch can be used in single mode or in lockstep mode. When
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  46) lockstep is enabled, the cacheline is doubled, but it generally brings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  47) some performance penalty. Also, it is generally not possible to point to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  48) just one memory stick when an error occurs, as the error correction code
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  49) is calculated using two DIMMs instead of one. Because of that, lockstep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  50) mode is capable of correcting more errors than single mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  51) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  52) * Single-channel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  53) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  54) The data accessed by the memory controller is contained in one DIMM
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  55) only. E.g., if the data is 64 bits wide, the data flows to the CPU using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  56) one 64-bit parallel access. Typically used with SDR, DDR, DDR2 and DDR3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  57) memories. FB-DIMM and RAMBUS use a different concept for channel, so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  58) this concept doesn't apply there.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  59) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  60) * Double-channel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  61) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  62) The data accessed by the memory controller is interleaved across two
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  63) DIMMs, accessed at the same time. E.g., if each DIMM is 64 bits wide (72
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  64) bits with ECC), the data flows to the CPU using a 128-bit parallel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  65) access.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  66) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  67) * Chip-select row
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  68) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  69) This is the name of the DRAM signal used to select the DRAM ranks to be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  70) accessed. Common chip-select rows for single channel are 64 bits, for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  71) dual channel 128 bits. It may not be visible to the memory controller,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  72) as some DIMM types have a memory buffer that can hide direct access to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  73) it from the Memory Controller.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  74) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  75) * Single-Ranked stick
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  76) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  77) A single-ranked stick has one chip-select row of memory. Motherboards
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  78) commonly drive two chip-select pins to a memory stick. A single-ranked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  79) stick will occupy only one of those rows. The other will be unused.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  80) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  81) .. _doubleranked:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  82) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  83) * Double-Ranked stick
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  84) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  85) A double-ranked stick has two chip-select rows which access different
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  86) sets of memory devices.  The two rows cannot be accessed concurrently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  87) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  88) * Double-sided stick
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  89) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  90) **DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  91) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  92) A double-sided stick has two chip-select rows which access different sets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  93) of memory devices. The two rows cannot be accessed concurrently.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  94) "Double-sided" is irrespective of the memory devices being mounted on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  95) both sides of the memory stick.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  96) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  97) * Socket set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  98) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  99) All of the memory sticks that are required for a single memory access or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) all of the memory sticks spanned by a chip-select row.  A single socket
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) set has two chip-select rows and if double-sided sticks are used these
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) will occupy those chip-select rows.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) * Bank
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) This term is avoided because it is unclear when needing to distinguish
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) between chip-select rows and socket sets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) 
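The widths quoted in the definitions above follow from simple arithmetic. The sketch below is only an illustration of it: the helper names ``devices_per_rank`` and ``access_width_bits`` are hypothetical and are not part of the EDAC API, and the 72-bit and 64-bit figures are the typical values mentioned above, not values read from hardware.

```python
def devices_per_rank(device_width_bits, bus_width_bits=72):
    """Number of DRAM devices grouped in parallel to fill the bus.

    Hypothetical helper; 72 bits is the typical 64 data + 8 ECC case.
    """
    if device_width_bits <= 0 or bus_width_bits % device_width_bits:
        raise ValueError("bus width must be a multiple of the device width")
    return bus_width_bits // device_width_bits


def access_width_bits(dimm_data_bits=64, channels=1):
    """Data bits moved per parallel access when `channels` DIMMs are used together."""
    return dimm_data_bits * channels


print(devices_per_rank(4))            # x4 devices on a 72-bit bus -> 18
print(devices_per_rank(8))            # x8 devices on a 72-bit bus -> 9
print(access_width_bits(channels=1))  # single-channel access -> 64
print(access_width_bits(channels=2))  # double-channel access -> 128
```

So a 72-bit chip-select row built from x8 devices needs nine of them, while one built from x4 devices needs eighteen.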
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Memory Controllers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) Most of the EDAC core is focused on doing Memory Controller error detection.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) The :c:func:`edac_mc_alloc` function internally uses the struct ``mem_ctl_info``
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) to describe the memory controllers, which is an opaque struct for the EDAC
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) drivers. Only the EDAC core is allowed to touch it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) .. kernel-doc:: include/linux/edac.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) .. kernel-doc:: drivers/edac/edac_mc.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) PCI Controllers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) ---------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) The EDAC subsystem provides a mechanism to handle PCI controllers by calling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) :c:func:`edac_pci_alloc_ctl_info`. It uses the struct
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) :c:type:`edac_pci_ctl_info` to describe the PCI controllers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) .. kernel-doc:: drivers/edac/edac_pci.h
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) EDAC Blocks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) -----------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) The EDAC subsystem also provides a generic mechanism to report errors on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) other parts of the hardware via the :c:func:`edac_device_alloc_ctl_info` function.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) The structures :c:type:`edac_dev_sysfs_block_attribute`,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) :c:type:`edac_device_block`, :c:type:`edac_device_instance` and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) :c:type:`edac_device_ctl_info` provide a generic or abstract 'edac_device'
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) representation in sysfs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) This set of structures, and the code that implements the APIs for them, provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) for registering EDAC type devices which are NOT standard memory or PCI, like:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) - CPU caches (L1 and L2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) - DMA engines
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) - Core CPU switches
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) - Fabric switch units
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) - PCIe interface controllers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) - other EDAC/ECC type devices that can be monitored for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151)   errors, etc.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) It allows for a two-level hierarchy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) For example, a cache could be composed of L1, L2 and L3 levels of cache.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156) Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) caches. In such a case, those can be represented via the following sysfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) nodes::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) 	/sys/devices/system/edac/..
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) 	pci/		<existing pci directory (if available)>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) 	mc/		<existing memory device directory>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) 	cpu/cpu0/..	<L1 and L2 block directory>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) 		/L1-cache/ce_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) 			 /ue_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) 		/L2-cache/ce_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) 			 /ue_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) 	cpu/cpu1/..	<L1 and L2 block directory>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) 		/L1-cache/ce_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) 			 /ue_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) 		/L2-cache/ce_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) 			 /ue_count
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) 	...
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) 	the L1 and L2 directories would be "edac_device_block's"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) .. kernel-doc:: drivers/edac/edac_device.h
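
From userspace, a tree like the one shown above could be inspected with a short script. This is a minimal sketch, not part of the kernel: the function name ``collect_edac_counts`` is hypothetical, and it assumes only that counters appear as ``ce_count`` and ``ue_count`` files somewhere under the EDAC sysfs root. The actual directory names depend on the driver that registered the device.

```python
import os


def collect_edac_counts(root="/sys/devices/system/edac"):
    """Walk an EDAC sysfs tree and gather ce_count/ue_count values.

    Returns a dict mapping each directory (relative to root) that holds
    at least one counter file to a dict of {counter_name: int}.
    """
    counters = ("ce_count", "ue_count")
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        found = {}
        for name in counters:
            if name not in filenames:
                continue
            try:
                with open(os.path.join(dirpath, name)) as f:
                    found[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Unreadable or non-numeric attribute: skip it.
                continue
        if found:
            result[os.path.relpath(dirpath, root)] = found
    return result


if __name__ == "__main__":
    # Prints nothing on systems without an EDAC sysfs tree.
    for block, counts in sorted(collect_edac_counts().items()):
        print(block, counts)
```

With the layout from the example, each ``L1-cache`` and ``L2-cache`` directory would show up as one entry holding its corrected (``ce_count``) and uncorrected (``ue_count``) error totals.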