Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

.. SPDX-License-Identifier: GPL-2.0

=============
Kernel Stacks
=============

Kernel stacks on x86-64 bit
===========================

Most of the text from Keith Owens, hacked by AK

x86_64 page size (PAGE_SIZE) is 4K.

Like all other architectures, x86_64 has a kernel stack for every
active thread.  These thread stacks are THREAD_SIZE (4*PAGE_SIZE) big.
These stacks contain useful data as long as a thread is alive or a
zombie. While the thread is in user space the kernel stack is empty
except for the thread_info structure at the bottom.
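
The size arithmetic above can be sketched in plain user-space C.  This is
an illustration only, not kernel code: stack_size() and stack_base() are
hypothetical helper names, and the page-count multiplier is passed in as a
power-of-two "order" because it depends on the kernel version and
configuration.  The sketch also assumes that each stack is aligned to its
own size.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL   /* x86_64 page size, as stated in the text */

/* Size in bytes of a stack made of (1 << order) pages. */
unsigned long stack_size(unsigned int order)
{
    assert(order < 8);     /* arbitrary sanity limit for this sketch */
    return PAGE_SIZE << order;
}

/*
 * Lowest address of the stack that contains sp.
 * Assumption: the stack is aligned to its own size, so masking off the
 * low bits of any address inside it yields the base.
 */
uintptr_t stack_base(uintptr_t sp, unsigned int order)
{
    return sp & ~((uintptr_t)stack_size(order) - 1);
}
```

With this model, stack_size(1) describes a two-page (8K) stack and
stack_size(2) a four-page (16K) stack.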

In addition to the per-thread stacks, there are specialized stacks
associated with each CPU.  These stacks are only used while the kernel
is in control on that CPU; when a CPU returns to user space the
specialized stacks contain no useful data.  The main CPU stacks are:

* Interrupt stack.  IRQ_STACK_SIZE

  Used for external hardware interrupts.  If this is the first external
  hardware interrupt (i.e. not a nested hardware interrupt) then the
  kernel switches from the current task stack to the interrupt stack.
  Like the split thread and interrupt stacks on i386, this gives more
  room for kernel interrupt processing without having to increase the
  size of every per-thread stack.

  The interrupt stack is also used when processing a softirq.

Switching to the kernel interrupt stack is done by software based on a
per-CPU interrupt nest counter. This is needed because x86-64 "IST"
hardware stacks cannot nest without races.
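
The nest-counter idea can be modelled in a few lines of user-space C.
This is a minimal sketch, not the kernel's entry code: struct cpu_state,
cpu_init(), irq_entry_sketch() and irq_exit_sketch() are hypothetical
names, and using -1 as the "no interrupt in progress" value is an
assumption made for the illustration.

```c
#include <assert.h>

enum stack_kind { TASK_STACK, IRQ_STACK };

struct cpu_state {
    int irq_nest;            /* -1 while no hardware interrupt is handled */
    enum stack_kind stack;   /* stack the CPU is currently running on */
};

void cpu_init(struct cpu_state *cpu)
{
    cpu->irq_nest = -1;
    cpu->stack = TASK_STACK;
}

/* Entry: only the first (non-nested) interrupt switches stacks. */
void irq_entry_sketch(struct cpu_state *cpu)
{
    if (++cpu->irq_nest == 0)
        cpu->stack = IRQ_STACK;
}

/* Exit: return to the task stack when the outermost interrupt ends. */
void irq_exit_sketch(struct cpu_state *cpu)
{
    assert(cpu->irq_nest >= 0);   /* must pair with an earlier entry */
    if (--cpu->irq_nest < 0)
        cpu->stack = TASK_STACK;
}
```

A nested interrupt only increments the counter and keeps running on the
interrupt stack it is already on, so no second switch takes place.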

x86_64 also has a feature which is not available on i386, the ability
to automatically switch to a new stack for designated events such as
double fault or NMI, which makes it easier to handle these unusual
events on x86_64.  This feature is called the Interrupt Stack Table
(IST).  There can be up to 7 IST entries per CPU. The IST code is an
index into the Task State Segment (TSS). The IST entries in the TSS
point to dedicated stacks; each stack can be a different size.

An IST is selected by a non-zero value in the IST field of an
interrupt-gate descriptor.  When an interrupt occurs and the hardware
loads such a descriptor, the hardware automatically sets the new stack
pointer based on the IST value, then invokes the interrupt handler.  If
the interrupt came from user mode, then the interrupt handler prologue
will switch back to the per-thread stack.  If software wants to allow
nested IST interrupts then the handler must adjust the IST values on
entry to and exit from the interrupt handler.  (This is occasionally
done, e.g. for debug exceptions.)
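
The selection rule can be sketched as a lookup.  The structures below are
simplified stand-ins, not the real TSS or gate descriptor layouts, and
select_rsp() is a hypothetical name.  The sketch also leaves out the
ordinary privilege-level stack switch that applies when the IST field is
zero.

```c
#include <assert.h>
#include <stdint.h>

#define IST_ENTRIES 7

/* Simplified stand-in for the IST part of the TSS: slots IST1..IST7. */
struct tss_sketch {
    uint64_t ist[IST_ENTRIES];
};

/* Simplified stand-in for a gate descriptor: only the IST field. */
struct gate_sketch {
    uint8_t ist;   /* 0 = no IST switch, 1..7 = use that IST entry */
};

/* Stack pointer the handler starts on, under the model above. */
uint64_t select_rsp(const struct tss_sketch *tss,
                    const struct gate_sketch *gate,
                    uint64_t current_rsp)
{
    assert(gate->ist <= IST_ENTRIES);
    if (gate->ist == 0)
        return current_rsp;           /* keep the current stack */
    return tss->ist[gate->ist - 1];   /* load the dedicated stack top */
}
```

The lookup ignores the current stack pointer whenever the IST field is
non-zero, so a second event with the same IST value starts at the same
address as the first.  That is why a handler that wants to allow nesting
has to adjust the IST value itself.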

Events with different IST codes (i.e. with different stacks) can be
nested.  For example, a debug interrupt can safely be interrupted by an
NMI.  arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
pointers on entry to and exit from all IST events, in theory allowing
IST events with the same code to be nested.  However, in most cases the
stack size allocated to an IST assumes no nesting for the same code.
If that assumption is ever broken then the stacks will become corrupt.

The currently assigned IST stacks are:

* ESTACK_DF.  EXCEPTION_STKSZ (PAGE_SIZE).

  Used for interrupt 8 - Double Fault Exception (#DF).

  Invoked when handling one exception causes another exception. Happens
  when the kernel is very confused (e.g. kernel stack pointer corrupt).
  Using a separate stack allows the kernel to recover from it well enough
  in many cases to still output an oops.

* ESTACK_NMI.  EXCEPTION_STKSZ (PAGE_SIZE).

  Used for non-maskable interrupts (NMI).

  NMI can be delivered at any time, including when the kernel is in the
  middle of switching stacks.  Using IST for NMI events avoids making
  assumptions about the previous state of the kernel stack.

* ESTACK_DB.  EXCEPTION_STKSZ (PAGE_SIZE).

  Used for hardware debug interrupts (interrupt 1) and for software
  debug interrupts (INT3).

  When debugging a kernel, debug interrupts (both hardware and
  software) can occur at any time.  Using IST for these interrupts
  avoids making assumptions about the previous state of the kernel
  stack.

  To handle nested #DB correctly there exist two instances of DB stacks. On
  #DB entry the IST stack pointer for #DB is switched to the second instance
  so a nested #DB starts from a clean stack. The nested #DB switches
  the IST stack pointer to a guard hole to catch triple nesting.
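
  The two-instance scheme for #DB can be modelled as a small state
  machine.  This is a sketch of the behaviour described above, not the
  kernel's implementation: enum db_target, struct db_ist, db_entry() and
  db_exit() are hypothetical names.

```c
#include <assert.h>

/* Where the #DB IST entry may point in this model. */
enum db_target { DB_STACK_1, DB_STACK_2, DB_GUARD_HOLE };

struct db_ist {
    enum db_target target;   /* stack the next #DB would start on */
};

/*
 * Model a #DB entry: report the stack the event runs on, then move the
 * IST pointer one step so that a nested #DB lands elsewhere.
 * Returns 0 on success, -1 if the event hit the guard hole (a third
 * nesting level, which is what the guard hole is meant to catch).
 */
int db_entry(struct db_ist *ist, enum db_target *used)
{
    if (ist->target == DB_GUARD_HOLE)
        return -1;
    *used = ist->target;
    ist->target = (enum db_target)(ist->target + 1);
    return 0;
}

/* Model the matching exit: move the IST pointer back one step. */
void db_exit(struct db_ist *ist)
{
    assert(ist->target != DB_STACK_1);   /* must pair with an entry */
    ist->target = (enum db_target)(ist->target - 1);
}
```

  In this model the first #DB runs on the first instance, a nested #DB
  runs on the second, and a third level is reported as an error instead
  of silently overwriting a stack that is still in use.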

* ESTACK_MCE.  EXCEPTION_STKSZ (PAGE_SIZE).

  Used for interrupt 18 - Machine Check Exception (#MC).

  MCE can be delivered at any time, including when the kernel is in the
  middle of switching stacks.  Using IST for MCE events avoids making
  assumptions about the previous state of the kernel stack.

For more details see the Intel IA32 or AMD AMD64 architecture manuals.
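
As a compact summary, the assignments listed above can be written as a
lookup from exception vector to stack.  This is an illustrative sketch
only: enum estack_sketch and ist_stack_for_vector() are hypothetical
names, ESTACK_NONE is an invented "no IST stack" marker, and the enum
order is not meant to match any kernel definition.  Vector 2 for NMI
comes from the architecture manuals rather than from the list, and the
INT3 case is left out to keep one entry per stack.

```c
/* Hypothetical labels for the stacks named in the list above. */
enum estack_sketch {
    ESTACK_NONE = -1,   /* invented marker: vector has no IST stack here */
    ESTACK_DF,
    ESTACK_NMI,
    ESTACK_DB,
    ESTACK_MCE
};

/* Map an exception vector number to the IST stack listed for it. */
enum estack_sketch ist_stack_for_vector(int vector)
{
    switch (vector) {
    case 8:  return ESTACK_DF;    /* Double Fault Exception (#DF) */
    case 2:  return ESTACK_NMI;   /* non-maskable interrupt */
    case 1:  return ESTACK_DB;    /* hardware debug interrupt (#DB) */
    case 18: return ESTACK_MCE;   /* Machine Check Exception (#MC) */
    default: return ESTACK_NONE;  /* handled on the current stack */
    }
}
```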


Printing backtraces on x86
==========================

The question about the '?' preceding function names in an x86 stacktrace
keeps popping up; here's an in-depth explanation. It helps if the reader
stares at print_context_stack() and the whole machinery in and around
arch/x86/kernel/dumpstack.c.

Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:

We always scan the full kernel stack for return addresses stored on
the kernel stack(s) [1]_, from stack top to stack bottom, and print out
anything that 'looks like' a kernel text address.

If it fits into the frame pointer chain, we print it without a question
mark, knowing that it's part of the real backtrace.

If the address does not fit into our expected frame pointer chain we
still print it, but we print a '?'. It can mean two things:

 - either the address is not part of the call chain: it's just stale
   values on the kernel stack, from earlier function calls. This is
   the common case.

 - or it is part of the call chain, but the frame pointer was not set
   up properly within the function, so we don't recognize it.

This way we will always print out the real call chain (plus a few more
entries), regardless of whether the frame pointer was set up correctly
or not - but in most cases we'll get the call chain right as well. The
entries printed are strictly in stack order, so you can deduce more
information from that as well.

The most important property of this method is that we _never_ lose
information: we always strive to print _all_ addresses on the stack(s)
that look like kernel text addresses, so if debug information is wrong,
we still print out the real call chain as well - just with more question
marks than ideal.
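
The scan described above can be modelled in user-space C.  This is a
simplified sketch of the idea, not the code in dumpstack.c: the stack is
an array of words indexed from the current stack pointer, frame pointers
are stored as array indices instead of addresses, and the text range
TEXT_START..TEXT_END, struct trace_entry and scan_stack() are all
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TEXT_START 0x1000u   /* made-up "kernel text" range */
#define TEXT_END   0x2000u

struct trace_entry {
    uint64_t addr;
    int reliable;            /* 1: on the frame chain, 0: printed with '?' */
};

static int looks_like_text(uint64_t v)
{
    return v >= TEXT_START && v < TEXT_END;
}

/*
 * stack[0..n-1]: stack words, index 0 being the current stack pointer.
 * bp: index of the current frame's saved-frame-pointer slot.  The return
 *     address sits at bp + 1 and stack[bp] holds the index of the
 *     caller's slot (any value >= n ends the chain).
 * Every text-like word is recorded, in stack order; only words found
 * where the frame chain expects a return address are marked reliable.
 */
size_t scan_stack(const uint64_t *stack, size_t n, size_t bp,
                  struct trace_entry *out, size_t max_out)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < max_out; i++) {
        int reliable = 0;

        if (!looks_like_text(stack[i]))
            continue;
        if (bp < n && i == bp + 1) {
            reliable = 1;
            bp = (size_t)stack[bp];   /* follow the chain to the caller */
        }
        out[count].addr = stack[i];
        out[count].reliable = reliable;
        count++;
    }
    return count;
}
```

Stale return addresses left behind by earlier calls end up in the output
with reliable set to 0, which corresponds to the '?' marker, while the
entries on the frame chain keep their relative order among them.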

.. [1] For things like IRQ and IST stacks, we also scan those stacks, in
       the right order, and try to cross from one stack into another
       reconstructing the call chain. This works most of the time.