.. SPDX-License-Identifier: GPL-2.0

==============
Kernel Entries
==============

This file documents some of the kernel entries in
arch/x86/entry/entry_64.S.  A lot of this explanation is adapted from
an email from Ingo Molnar:

http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>

The x86 architecture has quite a few different ways to jump into
kernel code.  Most of these entry points are registered in
arch/x86/kernel/traps.c and implemented in arch/x86/entry/entry_64.S
for 64-bit, arch/x86/entry/entry_32.S for 32-bit, and finally
arch/x86/entry/entry_64_compat.S, which implements the 32-bit compatibility
syscall entry points and thus lets 32-bit processes execute syscalls
when running on 64-bit kernels.

The IDT vector assignments are listed in arch/x86/include/asm/irq_vectors.h.

Some of these entries are:

 - entry_SYSCALL_64: syscall instruction from 64-bit code.

 - entry_INT80_compat: int 0x80 from 32-bit or 64-bit code; compat syscall
   either way.

 - entry_SYSCALL_compat, entry_SYSENTER_compat: syscall and sysenter from
   32-bit code.

 - interrupt: An array of entries.  Every IDT vector that doesn't
   explicitly point somewhere else gets set to the corresponding
   entry in this array.  These entries point to a whole array of
   magically-generated functions that make their way to do_IRQ with
   the interrupt number as a parameter.

 - APIC interrupts: Various special-purpose interrupts for things
   like TLB shootdown.

 - Architecturally-defined exceptions like divide_error.
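As a rough illustration of how the interrupt array fills the IDT, the following user-space C model registers a couple of explicit entries first and then points every remaining external vector at a generic stub that forwards its own vector number to a common handler. It is a sketch under stated assumptions: `idt_gate`, `setup_idt`, `deliver`, `model_explicit_entry` and `model_do_irq` are hypothetical names, a function pointer plus a stored vector number stands in for each generated assembly stub, and only the vector values 0 (divide error), 0x20 (first external vector) and 0x80 (int 0x80) are taken from x86 Linux.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 256
#define FIRST_EXTERNAL_VECTOR 0x20	/* first non-exception vector on x86 */

/* A gate pairs a handler with the vector number its stub passes along,
 * standing in for one magically-generated assembly stub (hypothetical). */
typedef void (*handler_fn)(int vector);

struct idt_gate {
	handler_fn fn;
	int vector;
};

static struct idt_gate idt[NR_VECTORS];
static int last_irq = -1;	/* vector seen by the common IRQ handler */
static int last_explicit = -1;	/* vector seen by an explicit entry */

/* Hypothetical stand-in for do_IRQ(): receives the interrupt number. */
static void model_do_irq(int vector) { last_irq = vector; }

/* Hypothetical stand-in for explicit entries such as divide_error
 * or entry_INT80_compat. */
static void model_explicit_entry(int vector) { last_explicit = vector; }

static void setup_idt(void)
{
	/* Explicitly registered entries go in first. */
	idt[0] = (struct idt_gate){ model_explicit_entry, 0 };
	idt[0x80] = (struct idt_gate){ model_explicit_entry, 0x80 };

	/* Every external vector not pointing elsewhere gets a generic stub
	 * that forwards its own vector number to the common handler. */
	for (int v = FIRST_EXTERNAL_VECTOR; v < NR_VECTORS; v++)
		if (idt[v].fn == NULL)
			idt[v] = (struct idt_gate){ model_do_irq, v };
}

/* Simulate the CPU delivering vector v through the table. */
static void deliver(int v)
{
	assert(v >= 0 && v < NR_VECTORS && idt[v].fn != NULL);
	idt[v].fn(idt[v].vector);
}
```

After `setup_idt()`, calling `deliver(0x31)` ends up in `model_do_irq` with 0x31, while `deliver(0x80)` reaches the explicit entry instead.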

There are a few complexities here.  The different x86-64 entries
have different calling conventions.  The syscall and sysenter
instructions have their own peculiar calling conventions.  Some of
the IDT entries push an error code onto the stack; others don't.
IDT entries using the IST alternative stack mechanism need their own
magic to get the stack frames right.  (You can find some
documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
Volume 3, Chapter 6.)
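The error-code asymmetry can be pictured with a simplified C model of the 64-bit exception frame. The hardware pushes SS, RSP, RFLAGS, CS and RIP, plus an error code for some vectors; one way to give every handler the same layout is for the stubs of the other vectors to push a placeholder instead. The vector list below covers only the classic error-code exceptions and is deliberately incomplete, the -1 placeholder is an assumption of this sketch, and `entry_frame`, `build_frame` and `pushes_error_code` are hypothetical names. IST stack switching is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic exceptions whose hardware frame includes an error code:
 * #DF=8, #TS=10, #NP=11, #SS=12, #GP=13, #PF=14, #AC=17.
 * Deliberately partial: newer vectors are not listed. */
static bool pushes_error_code(int vector)
{
	switch (vector) {
	case 8: case 10: case 11: case 12: case 13: case 14: case 17:
		return true;
	default:
		return false;
	}
}

/* Frame in ascending-address order, i.e. as an entry stub sees it
 * starting at %rsp. Each slot is 8 bytes wide. */
struct entry_frame {
	uint64_t error_code;	/* from hardware, or a placeholder */
	uint64_t rip, cs, rflags, rsp, ss;
};

/* Build the normalized frame: hw_error_code is used only when the
 * vector really has one; otherwise the stub's placeholder (-1 in this
 * sketch) fills the slot so all handlers share one layout. */
static struct entry_frame build_frame(int vector, uint64_t hw_error_code,
				      uint64_t rip, uint64_t cs)
{
	struct entry_frame f = { .rip = rip, .cs = cs };

	f.error_code = pushes_error_code(vector) ? hw_error_code
						  : (uint64_t)-1;
	return f;
}
```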

Dealing with the swapgs instruction is especially tricky.  SWAPGS
exchanges the current GS base with the value held in MSR_KERNEL_GS_BASE,
so it toggles whether gs is the kernel gs or the user gs.  The swapgs
instruction is rather fragile: it must nest perfectly and only in
single depth.  It should be used only when entering kernel mode from
user mode and again when returning to user space, and precisely at
those points.  If we get that even slightly wrong, we crash.

So when we have a secondary entry that is already in kernel mode, we *must
not* use SWAPGS blindly, nor may we forget to do a SWAPGS when GS has not
been swapped yet.
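A minimal C model of SWAPGS makes the nesting rule concrete: the instruction simply exchanges two 64-bit values, so a second, blind SWAPGS from a nested kernel-mode entry undoes the first one and leaves the kernel running on the user GS base. The struct, the helper names and the two example base addresses are hypothetical; only the exchange semantics come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two GS-base values that SWAPGS exchanges. */
struct cpu {
	uint64_t gs_base;		/* active base (MSR_GS_BASE) */
	uint64_t kernel_gs_base;	/* stashed base (MSR_KERNEL_GS_BASE) */
};

static void swapgs(struct cpu *c)
{
	uint64_t tmp = c->gs_base;

	c->gs_base = c->kernel_gs_base;
	c->kernel_gs_base = tmp;
}

/* Made-up example bases: a user TLS pointer and a kernel per-CPU area. */
#define USER_GS   UINT64_C(0x00007f0000001000)
#define KERNEL_GS UINT64_C(0xffff888000010000)

static bool gs_is_kernel(const struct cpu *c)
{
	return c->gs_base == KERNEL_GS;
}

/* Entry from user mode, then a secondary entry that swaps blindly. */
static bool blind_nested_entry_leaves_kernel_gs(void)
{
	struct cpu c = { .gs_base = USER_GS, .kernel_gs_base = KERNEL_GS };

	swapgs(&c);		/* first entry: user -> kernel, correct */
	assert(gs_is_kernel(&c));
	swapgs(&c);		/* nested entry swaps again: wrong */
	return gs_is_kernel(&c);	/* false: kernel now sees user GS */
}
```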

Now, there's a secondary complication: there's a cheap way to test
which mode the CPU is in and an expensive way.

The cheap way is to read this information off the entry frame on the
kernel stack, from the saved CS in the pt_regs area::

	xorl %ebx,%ebx		/* clear ebx, a flag consumed on exit */
	testl $3,CS+8(%rsp)	/* privilege bits of saved CS (+8: return address) */
	je error_kernelspace	/* zero: we interrupted kernel mode */
	SWAPGS			/* nonzero: we came from user mode */
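The cheap check can be sketched in C as a test on the privilege bits of the saved CS, mirroring `testl $3,CS+8(%rsp)`. This sketch assumes the usual x86-64 Linux selector values (0x10 for kernel CS, 0x33 for user CS) and made-up GS base addresses; `struct cpu`, `swapgs` and `cheap_entry` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two GS-base values that SWAPGS exchanges. */
struct cpu {
	uint64_t gs_base;
	uint64_t kernel_gs_base;
};

static void swapgs(struct cpu *c)
{
	uint64_t tmp = c->gs_base;

	c->gs_base = c->kernel_gs_base;
	c->kernel_gs_base = tmp;
}

/* Assumed selector values; only the low two bits matter here. */
#define KERNEL_CS 0x10
#define USER_CS   0x33

/* Made-up example GS bases. */
#define USER_GS   UINT64_C(0x00007f0000001000)
#define KERNEL_GS UINT64_C(0xffff888000010000)

/* Cheap check: a nonzero privilege level in the saved CS means we came
 * from user mode and must SWAPGS; zero means GS was already switched.
 * Returns whether SWAPGS was executed. */
static bool cheap_entry(struct cpu *c, uint16_t saved_cs)
{
	if ((saved_cs & 3) == 0)
		return false;	/* interrupted kernel mode: leave GS */
	swapgs(c);
	return true;
}
```

The second call in a sequence like `cheap_entry(&c, USER_CS)` followed by `cheap_entry(&c, KERNEL_CS)` models a secondary entry that interrupted kernel mode: the saved CS has privilege level 0, so no second SWAPGS is issued.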

The expensive (paranoid) way is to read back the MSR_GS_BASE value
(which is what SWAPGS modifies)::

	movl $1,%ebx		/* ebx=1: no SWAPGS needed on exit */
	movl $MSR_GS_BASE,%ecx
	rdmsr			/* edx:eax = current GS base */
	testl %edx,%edx		/* sign bit = bit 63 of the GS base */
	js 1f   /* negative -> in kernel */
	SWAPGS
	xorl %ebx,%ebx		/* ebx=0: SWAPGS needed on exit */
  1:	ret
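Here is a C model of the paranoid variant. It mirrors the assembly: read the GS base, test the sign of its upper 32 bits (kernel addresses live in the upper canonical half, so bit 63 is set), and SWAPGS only when the base is a user address. The return value plays the role of %ebx. `rdmsr_gs_base` is a hypothetical stand-in for RDMSR on MSR_GS_BASE, and the base addresses are made-up examples.

```c
#include <assert.h>
#include <stdint.h>

/* The two GS-base values that SWAPGS exchanges. */
struct cpu {
	uint64_t gs_base;
	uint64_t kernel_gs_base;
};

static void swapgs(struct cpu *c)
{
	uint64_t tmp = c->gs_base;

	c->gs_base = c->kernel_gs_base;
	c->kernel_gs_base = tmp;
}

/* Made-up example GS bases. */
#define USER_GS   UINT64_C(0x00007f0000001000)
#define KERNEL_GS UINT64_C(0xffff888000010000)

/* Hypothetical stand-in for RDMSR(MSR_GS_BASE): value split as edx:eax. */
static void rdmsr_gs_base(const struct cpu *c, uint32_t *eax, uint32_t *edx)
{
	*eax = (uint32_t)c->gs_base;
	*edx = (uint32_t)(c->gs_base >> 32);
}

/* Returns the ebx flag: 1 = no SWAPGS needed on exit,
 * 0 = SWAPGS was done here and is needed on exit. */
static int paranoid_entry(struct cpu *c)
{
	uint32_t eax, edx;

	rdmsr_gs_base(c, &eax, &edx);
	(void)eax;			/* only the upper half is tested */
	if (edx & UINT32_C(0x80000000))	/* "js 1f": kernel GS base */
		return 1;
	swapgs(c);
	return 0;
}
```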

If we are at an interrupt or user-trap/gate-alike boundary, then we can
use the faster check, because the stack is a reliable indicator of
whether SWAPGS was already done.  If we see that we are a secondary
entry interrupting kernel-mode execution, then we know that the GS base
has already been switched.  If it says that we interrupted user-space
execution, then we must do the SWAPGS.

But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
which might have triggered right after a normal entry wrote CS to the
stack but before we executed SWAPGS, then the only safe way to check
the GS base is the slower method: the RDMSR.  In that window the CPU is
already running kernel code, so the CS saved by the super-atomic entry
says "kernel mode" even though GS still holds the user base.
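The race can be reproduced in a small C model. The CPU is frozen in the window where a user-to-kernel entry has started but not yet executed SWAPGS, so kernel code runs with the user GS base; an NMI arriving now saves the kernel CS in its own frame. The cheap check trusts that CS and skips SWAPGS, while the RDMSR-style check looks at the live GS base and gets it right. All names, selector values and addresses here are hypothetical examples.

```c
#include <assert.h>
#include <stdint.h>

/* The two GS-base values that SWAPGS exchanges. */
struct cpu {
	uint64_t gs_base;
	uint64_t kernel_gs_base;
};

static void swapgs(struct cpu *c)
{
	uint64_t tmp = c->gs_base;

	c->gs_base = c->kernel_gs_base;
	c->kernel_gs_base = tmp;
}

#define KERNEL_CS 0x10				/* assumed kernel selector */
#define USER_GS   UINT64_C(0x00007f0000001000)	/* made-up user base */
#define KERNEL_GS UINT64_C(0xffff888000010000)	/* made-up kernel base */

/* Cheap check: trusts the CS saved in the NMI's own frame. */
static void nmi_entry_cheap(struct cpu *c, uint16_t nmi_saved_cs)
{
	if (nmi_saved_cs & 3)
		swapgs(c);
}

/* Paranoid check: trusts the live GS base (bit 63 set = kernel). */
static void nmi_entry_paranoid(struct cpu *c)
{
	if (!(c->gs_base >> 63))
		swapgs(c);
}

/* The window: kernel code is running, SWAPGS not yet executed. */
static struct cpu state_in_window(void)
{
	return (struct cpu){ .gs_base = USER_GS, .kernel_gs_base = KERNEL_GS };
}
```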

Therefore, super-atomic entries (except NMI, which is handled separately)
must use idtentry with paranoid=1 to handle gsbase correctly.  This
triggers three main behavior changes:

 - Interrupt entry will use the slower gsbase check.
 - Interrupt entry from user mode will switch off the IST stack.
 - Interrupt exit to kernel mode will not attempt to reschedule.
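The three differences can be summarized as a small C policy table keyed on the paranoid flag. This is only a restatement of the list above in code form; `entry_policy`, `idtentry_policy` and the field names are hypothetical and do not correspond to kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* How the entry code decides whether to SWAPGS. */
enum gs_check { GS_CHECK_SAVED_CS, GS_CHECK_RDMSR };

struct entry_policy {
	enum gs_check gs_check;			/* cheap vs. paranoid check */
	bool leave_ist_if_from_user;		/* switch off the IST stack */
	bool may_resched_on_kernel_exit;	/* rescheduling considered at all */
};

/* Hypothetical mapping from the paranoid flag to the listed behaviors. */
static struct entry_policy idtentry_policy(bool paranoid)
{
	struct entry_policy p = {
		.gs_check = paranoid ? GS_CHECK_RDMSR : GS_CHECK_SAVED_CS,
		.leave_ist_if_from_user = paranoid,
		.may_resched_on_kernel_exit = !paranoid,
	};

	return p;
}
```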

We try to use IST entries and the paranoid entry code only for vectors
that absolutely need the more expensive check for the GS base; all
'normal' entry points are generated with the regular (faster) paranoid=0
variant.