^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) =========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Linux I2C fault injection
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) =========================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) The GPIO-based I2C bus master driver can be configured to provide fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) injection capabilities. It is then meant to be connected to another I2C bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) which is driven by the I2C bus master driver under test. The GPIO fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) injection driver can create special states on the bus which the other I2C bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) master driver should handle gracefully.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) Once the Kconfig option I2C_GPIO_FAULT_INJECTOR is enabled, there will be an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) 'i2c-fault-injector' subdirectory in the kernel debugfs filesystem, usually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) mounted at /sys/kernel/debug. There will be a separate subdirectory per
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) GPIO-driven I2C bus. Each subdirectory will contain files to trigger the fault
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) injection. They are described below along with their intended use cases.
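
As a rough sketch, the per-bus injector directories can be listed with a small
shell helper. The function name 'list_fault_injectors' is made up for this
example, and /sys/kernel/debug is only assumed as the debugfs mount point when
no other path is passed in.

```shell
# Hypothetical helper: print one line per fault injector directory.
# $1 = debugfs mount point (optional, assumed to be /sys/kernel/debug)
list_fault_injectors() {
    root="${1:-/sys/kernel/debug}"
    for dir in "$root"/i2c-fault-injector/*/; do
        # If nothing matches, the pattern stays unexpanded and is skipped here.
        [ -d "$dir" ] && printf '%s\n' "${dir%/}"
    done
    return 0
}
```

If the helper prints nothing, either the Kconfig option is not enabled, debugfs
is mounted elsewhere, or no GPIO driven I2C bus is present.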
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) Wire states
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) ===========
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) "scl"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) By reading this file, you get the current state of SCL. By writing, you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) change its state to either force it low or to release it again. So, by using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) "echo 0 > scl" you force SCL low and thus, no communication will be possible
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) because the bus master under test will not be able to clock. It should detect
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) the condition of SCL being unresponsive and report an error to the upper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) layers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) "sda"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) By reading this file, you get the current state of SDA. By writing, you can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) change its state to either force it low or to release it again. So, by using
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) "echo 0 > sda" you force SDA low and thus, data cannot be transmitted. The bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) master under test should detect this condition and trigger a bus recovery (see
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) I2C specification version 4, section 3.1.16) using the helpers of the Linux I2C
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) core (see 'struct bus_recovery_info'). However, the bus recovery will not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) succeed because SDA is still pinned low until you manually release it again
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40) with "echo 1 > sda". A test with an automatic release can be done with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) "incomplete transfers" class of fault injectors.
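
The manual force-and-release sequence for both wire state files can be wrapped
in a small shell helper. This is only a sketch: 'with_wire_low' is a
hypothetical name, the injector directory is supplied by the caller, and
writing 1 is assumed to release "scl" in the same way it releases "sda".

```shell
# Sketch: pin a wire low while a test command runs, then release it again.
# $1 = injector directory (a per-bus subdirectory of 'i2c-fault-injector')
# $2 = wire state file, "scl" or "sda"
# remaining arguments = command that makes the master under test do a transfer
with_wire_low() {
    dir="$1"; wire="$2"; shift 2
    echo 0 > "$dir/$wire" || return 1   # force the wire low
    "$@"
    status=$?
    echo 1 > "$dir/$wire"               # release the wire in any case
    return "$status"
}
```

The helper returns the exit status of the test command. With the wire pinned
low, a failing transfer is the expected outcome, so a non-zero status here
usually means the master under test reported the error as it should.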
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) Incomplete transfers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) ====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) The following fault injectors create situations where SDA will be held low by a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) device. Bus recovery should be able to fix these situations. But please note:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) there are I2C client devices which detect a stuck SDA on their side and release
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) it on their own after a few milliseconds. Also, there might be an external
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) device deglitching and monitoring the I2C bus. It could also detect a stuck SDA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) and initiate a bus recovery on its own. If you want to implement bus recovery
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) in a bus master driver, make sure you have checked your hardware setup for such
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) devices beforehand. And always verify with a scope or logic analyzer!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) "incomplete_address_phase"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) --------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) This file is write only and you need to write the address of an existing I2C
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) client device to it. Then, a read transfer to this device will be started, but
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) it will stop at the ACK phase after the address of the client has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) transmitted. Because the device will ACK its presence, this results in SDA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) being pulled low by the device while SCL is high. So, similar to the "sda" file
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) above, the bus master under test should detect this condition and try a bus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) recovery. This time, however, it should succeed and the device should release
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) SDA after toggling SCL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) "incomplete_write_byte"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) -----------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) Similar to above, this file is write only and you need to write the address of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) an existing I2C client device to it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) The injector will again stop at one ACK phase, so the device will keep SDA low
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) because it acknowledges data. However, there are two differences compared to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) 'incomplete_address_phase':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) a) the message sent out will be a write message
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) b) after the address byte, a 0x00 byte will be transferred. Then, stop at ACK.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) This is a highly delicate state: the device is set up to write any data to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) register 0x00 (if it has registers) when further clock pulses happen on SCL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) This is why bus recovery (up to 9 clock pulses) must either check SDA or send
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) additional STOP conditions to ensure the bus has been released. Otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) random data will be written to a device!
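
A test run for either incomplete transfer injector could look like the
following sketch. The helper name 'check_recovery' is hypothetical. It assumes
that the address can be written in the same notation you would pass to i2cget
and that reading "sda" yields 1 once the wire has been released; verify both on
your system.

```shell
# Sketch: hang a device in an ACK phase, trigger a transfer on the bus under
# test, then report whether SDA has been released.
# $1 = injector directory, $2 = injector file name, $3 = client address
# remaining arguments = command that makes the master under test do a transfer
check_recovery() {
    dir="$1"; injector="$2"; addr="$3"; shift 3
    echo "$addr" > "$dir/$injector" || return 2
    "$@" || true                    # the transfer itself is allowed to fail
    state=$(cat "$dir/sda") || return 2
    if [ "$state" = "1" ]; then
        echo "SDA released"
        return 0
    fi
    echo "SDA still low"
    return 1
}
```

This only checks that SDA was released in the end. It cannot tell whether
stray data was written to the device after "incomplete_write_byte", so a scope
or logic analyzer is still needed for that.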
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) Lost arbitration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) Here, we want to simulate the condition where the master under test loses the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) bus arbitration against another master in a multi-master setup.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) "lose_arbitration"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) This file is write-only and you need to write the duration of the arbitration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) interference (in µs, maximum is 100ms). The calling process will then sleep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) and wait for the next bus clock. The process is interruptible, though.
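
Because the value is written in µs while the limit is stated in ms, a small
conversion helper can prevent off-by-1000 mistakes. The sketch below uses a
hypothetical function name, 'ms_to_injector_us', and treats the stated 100ms
maximum as inclusive, which is an assumption.

```shell
# Sketch: convert whole milliseconds to the microsecond value the file
# expects, refusing anything above the documented 100 ms maximum.
ms_to_injector_us() {
    ms="$1"
    case "$ms" in
        ''|*[!0-9]*|0[0-9]*)
            echo "not a plain non-negative integer: $ms" >&2
            return 1 ;;
        ????*)
            # four or more digits is far beyond the limit
            echo "duration exceeds 100 ms" >&2
            return 1 ;;
    esac
    if [ "$ms" -gt 100 ]; then
        echo "duration exceeds 100 ms" >&2
        return 1
    fi
    echo $((ms * 1000))
}
```

The printed value can then be written to the injector file in place of a
hand-computed number.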
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) Arbitration loss is triggered by waiting for the master under test to pull SCL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) low and then pulling SDA low for some time. So, the I2C address sent out
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) should be corrupted and that should be detected properly. That means that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) address sent out should have a lot of '1' bits to be able to detect corruption.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) There doesn't need to be a device at this address because arbitration lost
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) should be detected beforehand. Also note that SCL going down is monitored
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) using interrupts, so the interrupt latency might cause the first bits to not be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106) corrupted. A good starting point for using this fault injector on an otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) idle bus is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)   # echo 200 > lose_arbitration &
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110)   # i2cget -y <bus_to_test> 0x3f
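
The choice of 0x3f follows from the bus being open-drain: pulling SDA low can
only turn a '1' bit into a '0', never the reverse. The sketch below computes
the first byte a master sends for a 7-bit address and counts its '1' bits. The
function name 'address_byte_ones' is hypothetical, and the example assumes
that i2cget without a data address issues a read.

```shell
# Sketch: print the address byte (7-bit address shifted left, R/W bit in
# bit 0) and the number of '1' bits that pulling SDA low could corrupt.
# $1 = 7-bit address (decimal or 0x-prefixed hex), $2 = 1 for read, 0 for write
address_byte_ones() {
    byte=$(( (($1) << 1) | ($2) ))
    ones=0
    n=$byte
    while [ "$n" -gt 0 ]; do
        ones=$((ones + (n & 1)))
        n=$((n >> 1))
    done
    printf '0x%02x %d\n' "$byte" "$ones"
}

address_byte_ones 0x3f 1    # prints "0x7f 7"
address_byte_ones 0x08 0    # prints "0x10 1"
```

A read from 0x3f puts seven '1' bits on the wire, so there is a good chance
that at least one of them is corrupted even if interrupt latency lets the
first bits pass. A write to 0x08 offers only a single '1' bit.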
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) Panic during transfer
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) This fault injector will create a kernel panic once the master under test has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) started a transfer. This usually means that the state machine of the bus master
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) driver will be ungracefully interrupted and the bus may end up in an unusual
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) state. Use this to check if your shutdown/reboot/boot code can handle this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) scenario.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) "inject_panic"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) --------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) This file is write-only and you need to write the delay between the detected
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) start of a transmission and the induced kernel panic (in µs, maximum is 100ms).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) The calling process will then sleep and wait for the next bus clock. The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) process is interruptible, though.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) The start of a transfer is detected by waiting for the master under test to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) pull SCL low. A good starting point for using this fault injector is::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132)   # echo 0 > inject_panic &
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133)   # i2cget -y <bus_to_test> <some_address>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) Note that there doesn't need to be a device listening to the address you are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) using. Results may vary depending on that, though.