.. _NMI_rcu_doc:

Using RCU to Protect Dynamic NMI Handlers
=========================================


Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers.  This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
work in "arch/x86/oprofile/nmi_timer_int.c" and in
"arch/x86/kernel/traps.c".

The relevant pieces of code are listed below, each followed by a
brief explanation::

	static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
	{
		return 0;
	}

The dummy_nmi_callback() function is a "dummy" NMI handler that does
nothing but return zero, thus indicating that it did not handle the
NMI and allowing the NMI handler to take the default machine-specific
action::

	static nmi_callback_t nmi_callback = dummy_nmi_callback;

This nmi_callback variable is a global function pointer to the current
NMI handler::

	void do_nmi(struct pt_regs *regs, long error_code)
	{
		int cpu;

		nmi_enter();

		cpu = smp_processor_id();
		++nmi_count(cpu);

		if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
			default_do_nmi(regs);

		nmi_exit();
	}

The do_nmi() function processes each NMI.  It first disables preemption
in the same way that a hardware irq would, then increments the per-CPU
count of NMIs.  It then invokes the NMI handler stored in the nmi_callback
function pointer.  If this handler returns zero, do_nmi() invokes the
default_do_nmi() function to handle a machine-specific NMI.  Finally,
preemption is restored.

In theory, rcu_dereference_sched() is not needed, since this code runs
only on i386, which in theory does not need rcu_dereference_sched()
anyway.  However, in practice it is a good documentation aid, particularly
for anyone attempting to do something similar on Alpha or on systems
with aggressive optimizing compilers.
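
For illustration, here is a minimal sketch contrasting the two forms
of the pointer load (the fn local variable is hypothetical, introduced
only for this example)::

	nmi_callback_t fn;

	/* Plain load: nothing orders this load against later uses of
	 * the handler's data on Alpha or under aggressive pointer-value
	 * speculation, and nothing tells the reader that the pointer is
	 * RCU-protected. */
	fn = nmi_callback;

	/* Preferred: provides the read-side ordering that pairs with
	 * rcu_assign_pointer() and documents the RCU-sched protection. */
	fn = rcu_dereference_sched(nmi_callback);
	fn(regs, cpu);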

Quick Quiz:
	Why might the rcu_dereference_sched() be necessary on Alpha, given
	that the code referenced by the pointer is read-only?

:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>`

Back to the discussion of NMI and RCU::

	void set_nmi_callback(nmi_callback_t callback)
	{
		rcu_assign_pointer(nmi_callback, callback);
	}

The set_nmi_callback() function registers an NMI handler.  Note that any
data that is to be used by the callback must be initialized *before*
the call to set_nmi_callback(), as illustrated by the lifecycle sketch
near the end of this discussion.  On architectures that do not order
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
initialized values::

	void unset_nmi_callback(void)
	{
		rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
	}

This function unregisters an NMI handler, restoring the original
dummy_nmi_callback().  However, there may well be an NMI handler
currently executing on some other CPU.  We therefore cannot free
up any data structures used by the old NMI handler until its
execution completes on all other CPUs.

One way to accomplish this is via synchronize_rcu(), perhaps as
follows::

	unset_nmi_callback();
	synchronize_rcu();
	kfree(my_nmi_data);

This works because (as of v4.20) synchronize_rcu() blocks until all
CPUs complete any preemption-disabled segments of code that they were
executing.  Since NMI handlers disable preemption, synchronize_rcu()
is guaranteed not to return until all ongoing NMI handlers exit.  It
is therefore safe to free up the handler's data as soon as
synchronize_rcu() returns.

Important note: for this to work, the architecture in question must
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.

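Putting these pieces together, the registration and unregistration
lifecycle might look like the following sketch.  The my_nmi_data
structure and the my_nmi_data_init(), my_nmi_handler(), and
do_something_with() names are hypothetical, introduced only for
illustration::

	static struct my_nmi_data *my_nmi_data;

	static int my_nmi_handler(struct pt_regs *regs, int cpu)
	{
		/* my_nmi_data was initialized before set_nmi_callback()
		 * published this handler, so, per the discussion above,
		 * it is safe to use here. */
		do_something_with(my_nmi_data);
		return 1;	/* Nonzero: skip default_do_nmi(). */
	}

	void my_register_nmi(void)
	{
		my_nmi_data = kmalloc(sizeof(*my_nmi_data), GFP_KERNEL);
		if (!my_nmi_data)
			return;
		my_nmi_data_init(my_nmi_data);	  /* Initialize first... */
		set_nmi_callback(my_nmi_handler); /* ...then publish. */
	}

	void my_unregister_nmi(void)
	{
		unset_nmi_callback();
		synchronize_rcu();	/* Wait for ongoing NMI handlers. */
		kfree(my_nmi_data);
	}
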
.. _answer_quick_quiz_NMI:

Answer to Quick Quiz:
	Why might the rcu_dereference_sched() be necessary on Alpha, given
	that the code referenced by the pointer is read-only?

	The caller to set_nmi_callback() might well have
	initialized some data that is to be used by the new NMI
	handler.  In this case, the rcu_dereference_sched() would
	be needed, because otherwise a CPU that received an NMI
	just after the new handler was set might see the pointer
	to the new NMI handler, but the old, pre-initialization
	values of the handler's data.

	This same sad story can happen on other CPUs when using
	a compiler with aggressive pointer-value speculation
	optimizations.

	More important, the rcu_dereference_sched() makes it
	clear to someone reading the code that the pointer is
	being protected by RCU-sched.
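
	For concreteness, here is a minimal sketch of the race, with
	hypothetical names (new_handler and the my_data structure) and
	comments noting the ordering that the RCU primitives provide::

		/* Updater (CPU 0): */
		my_data->a = 1;		/* Initialize the handler's data... */
		rcu_assign_pointer(nmi_callback, new_handler); /* ...then publish. */

		/* Reader (CPU 1, in do_nmi()): without
		 * rcu_dereference_sched(), the CPU (on Alpha) or the
		 * compiler (via pointer-value speculation) could invoke
		 * the new handler while still observing the old value
		 * of my_data->a. */
		rcu_dereference_sched(nmi_callback)(regs, cpu);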