^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ===================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Light-weight System Calls for IA-64
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5) Started: 13-Jan-2003
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) Last update: 27-Sep-2003
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) David Mosberger-Tang
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) <davidm@hpl.hp.com>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) Using the "epc" instruction effectively introduces a new mode of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) execution to the ia64 linux kernel. We call this mode the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14) "fsys-mode". To recap, the normal states of execution are:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) - kernel mode:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) Both the register stack and the memory stack have been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) switched over to kernel memory. The user-level state is saved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) in a pt-regs structure at the top of the kernel memory stack.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) - user mode:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) Both the register stack and the memory stack are in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) user memory. The user-level state is contained in the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) CPU registers.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) - bank 0 interruption-handling mode:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) This is the non-interruptible state which all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) interruption-handlers start execution in. The user-level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) state remains in the CPU registers and some kernel state may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) be stored in bank 0 of registers r16-r31.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) In contrast, fsys-mode has the following special properties:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) - execution is at privilege level 0 (most-privileged)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36) - CPU registers may contain a mixture of user-level and kernel-level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) state (it is the responsibility of the kernel to ensure that no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) security-sensitive kernel-level state is leaked back to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) user-level)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) - execution is interruptible and preemptible (an fsys-mode handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) can disable interrupts and avoid all other interruption-sources
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) to avoid preemption)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) - neither the memory-stack nor the register-stack can be trusted while
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) in fsys-mode (they point to the user-level stacks, which may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) be invalid, or completely bogus addresses)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) In summary, fsys-mode is much more similar to running in user-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) than it is to running in kernel-mode. Of course, given that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) privilege level is at level 0, this means that fsys-mode requires some
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) care (see below).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) How to tell fsys-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56) =====================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) Linux operates in fsys-mode when (a) the privilege level is 0 (most
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) privileged) and (b) the stacks have NOT been switched to kernel memory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) yet. For convenience, the header file <asm-ia64/ptrace.h> provides
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) three macros::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) user_mode(regs)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) user_stack(task,regs)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) fsys_mode(task,regs)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) The "regs" argument is a pointer to a pt_regs structure. The "task"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) argument is a pointer to the task structure to which the "regs"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) pointer belongs. user_mode() returns TRUE if the CPU state pointed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70) to by "regs" was executing in user mode (privilege level 3).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) user_stack() returns TRUE if the state pointed to by "regs" was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) executing on the user-level stack(s). Finally, fsys_mode() returns
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) The fsys_mode() macro is equivalent to the expression::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) !user_mode(regs) && user_stack(task,regs)
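
The relationship between the three macros can be illustrated with a small
user-space model. This is only a sketch: "struct saved_state" and the
"*_model()" helpers are hypothetical stand-ins invented for illustration.
The kernel's macros operate on a real pt_regs and task structure instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the saved CPU state; not the kernel's pt_regs. */
struct saved_state {
    unsigned int cpl;     /* privilege level: 0 = most privileged, 3 = user */
    bool on_user_stacks;  /* true if the stacks were NOT switched to kernel memory */
};

/* Models user_mode(): the state was executing at privilege level 3. */
static bool user_mode_model(const struct saved_state *s)
{
    return s->cpl == 3;
}

/* Models user_stack(): the state was executing on the user-level stack(s). */
static bool user_stack_model(const struct saved_state *s)
{
    return s->on_user_stacks;
}

/* Models fsys_mode(): privileged execution that is still on the user stacks. */
static bool fsys_mode_model(const struct saved_state *s)
{
    return !user_mode_model(s) && user_stack_model(s);
}
```

In this model only the combination "privilege level 0, user stacks" counts as
fsys-mode. Full kernel mode and plain user mode both yield false.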
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) How to write an fsyscall handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) ================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) (fsyscall_table). This table contains one entry for each system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) By default, a system call is handled by fsys_fallback_syscall(). This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) routine takes care of entering (full) kernel mode and calling the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) normal Linux system call handler. For performance-critical system
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) calls, it is possible to write a hand-tuned fsyscall_handler. For
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) example, fsys.S contains fsys_getpid(), which is a hand-tuned version
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) of the getpid() system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) The entry and exit-state of an fsyscall handler is as follows:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) Machine state on entry to fsyscall handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) ------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) ========= ===============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) r10 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) r11 saved ar.pfs (a user-level value)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) r15 system call number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) r16 "current" task pointer (in normal kernel-mode, this is in r13)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) r32-r39 system call arguments
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) b6 return address (a user-level value)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) ar.pfs previous frame-state (a user-level value)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) PSR.be cleared to zero (i.e., little-endian byte order is in effect)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) - all other registers may contain values passed in from user-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) ========= ===============================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Required machine state on exit from fsyscall handler
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) ----------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) ========= ===========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) r11 saved ar.pfs (as passed into the fsyscall handler)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) r15 system call number (as passed into the fsyscall handler)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) r32-r39 system call arguments (as passed into the fsyscall handler)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114) b6 return address (as passed into the fsyscall handler)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) ar.pfs previous frame-state (as passed into the fsyscall handler)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116) ========= ===========================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) Fsyscall handlers can execute with very little overhead, but with that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) speed comes a set of restrictions:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121) * Fsyscall-handlers MUST check for any pending work in the flags
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) member of the thread-info structure and if any of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) doing a full system call (by calling fsys_fallback_syscall).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) * Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) r15, b6, and ar.pfs) because they will be needed in case of a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) system call restart. Of course, all "preserved" registers also
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) must be preserved, in accordance with the normal calling conventions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) * Fsyscall-handlers MUST check whether argument registers contain a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) NaT value before using them in any way that could trigger a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) NaT-consumption fault. If a system call argument is found to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) contain a NaT value, an fsyscall-handler may return immediately
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) with r8=EINVAL, r10=-1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) * Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) any other operation that would trigger mandatory RSE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) (register-stack engine) traffic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) * Fsyscall-handlers MUST NOT write to any stacked registers because
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) it is not safe to assume that user-level called a handler with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) proper number of arguments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) * Fsyscall-handlers need to be careful when accessing per-CPU variables:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) unless proper safe-guards are taken (e.g., interruptions are avoided),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) execution may be pre-empted and resumed on another CPU at any given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) time.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) * Fsyscall-handlers must be careful not to leak sensitive kernel
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) information back to user-level. In particular, before returning to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) user-level, care needs to be taken to clear any scratch registers
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) that could contain sensitive information (note that the current
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) task pointer is not considered sensitive: it's already exposed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) through ar.k6).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) * Fsyscall-handlers MUST NOT access user-memory without first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) validating access-permission (this can typically be done via
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) probe.r.fault and/or probe.w.fault) and without guarding against
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) memory access exceptions (this can be done with the EX() macros
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) defined by asmmacro.h).
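
The first and third rules above amount to a short decision made at the top of
every handler. The following C model is a sketch of that decision only. Real
handlers are written in assembly in fsys.S. The mask value, the type names and
the order of the two checks are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical value; the real TIF_ALLWORK_MASK is defined by the kernel. */
#define MODEL_TIF_ALLWORK_MASK 0x1fUL

enum fsys_action {
    FSYS_FAST_PATH,    /* safe to complete the call in fsys-mode */
    FSYS_FALLBACK,     /* hand over to fsys_fallback_syscall */
    FSYS_FAIL_EINVAL   /* return immediately with r8=EINVAL, r10=-1 */
};

struct fsys_result {
    enum fsys_action action;
    long r8;   /* modelled return value / error code */
    long r10;  /* 0 on entry; -1 signals an error to user-level */
};

/*
 * Models the checks a handler performs before doing any real work.
 * thread_flags stands for the flags member of the thread-info structure,
 * arg_is_nat for "some argument register that will be used holds a NaT".
 */
static struct fsys_result fsys_preamble_model(unsigned long thread_flags,
                                              bool arg_is_nat)
{
    struct fsys_result res = { FSYS_FAST_PATH, 0, 0 };

    if (thread_flags & MODEL_TIF_ALLWORK_MASK) {
        /* Pending work: a full system call is required. */
        res.action = FSYS_FALLBACK;
        return res;
    }
    if (arg_is_nat) {
        /* Never consume a NaT at privilege level 0; fail the call instead. */
        res.action = FSYS_FAIL_EINVAL;
        res.r8 = EINVAL;
        res.r10 = -1;
        return res;
    }
    return res;
}
```

Only when both checks pass does the model continue on the fast path. The
remaining rules (no "alloc", no writes to stacked registers, validated user
memory access) constrain what the fast path itself may do.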
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) The above restrictions may seem draconian, but remember that it's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) possible to trade off some of the restrictions by paying a slightly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) higher overhead. For example, if an fsyscall-handler could benefit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) from the shadow register bank, it could temporarily disable PSR.i and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) needed. In other words, following the above rules yields extremely
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) fast system call execution (while fully preserving system call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) semantics), but there is also a lot of flexibility in handling more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) complicated cases.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) Signal handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) The delivery of (asynchronous) signals must be delayed until fsys-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) is exited. This is accomplished with the help of the lower-privilege
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) checks whether the interrupted task was in fsys-mode and, if so, sets
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180) PSR.lp and returns immediately. When fsys-mode is exited via the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) "br.ret" instruction that lowers the privilege level, a trap will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) occur. The trap handler clears PSR.lp again and returns immediately.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) The kernel exit path then checks for and delivers any pending signals.
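
The deferral sequence described above can be followed step by step in a toy
state machine. This is a user-space sketch under simplifying assumptions:
"struct task_model" and both functions are hypothetical, a single flag stands
for all pending signals, and the trap and the kernel exit path are collapsed
into one function.

```c
#include <stdbool.h>

/* Hypothetical model of the few bits of state involved in the deferral. */
struct task_model {
    bool in_fsys_mode;     /* interrupted code was running in fsys-mode */
    bool psr_lp;           /* models PSR.lp (lower-privilege transfer trap) */
    bool signal_pending;   /* an asynchronous signal is waiting */
    int  signals_delivered;
};

/* Models the check made when the kernel is about to deliver signals. */
static void notify_resume_model(struct task_model *t)
{
    if (!t->signal_pending)
        return;
    if (t->in_fsys_mode) {
        /* Too early: arm the trap and return immediately. */
        t->psr_lp = true;
        return;
    }
    t->signal_pending = false;
    t->signals_delivered++;
}

/* Models the "br.ret" that lowers the privilege level and leaves fsys-mode. */
static void leave_fsys_model(struct task_model *t)
{
    t->in_fsys_mode = false;
    if (t->psr_lp) {
        t->psr_lp = false;        /* trap handler clears PSR.lp again */
        notify_resume_model(t);   /* exit path now delivers pending signals */
    }
}
```

Calling notify_resume_model() while in_fsys_mode is set delivers nothing and
only arms psr_lp. The signal is delivered once leave_fsys_model() runs.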
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) PSR Handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) ============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) The "epc" instruction doesn't change the contents of PSR at all. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) is in contrast to a regular interruption, which clears almost all
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 190) bits. Because of that, some care needs to be taken to ensure things
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 191) work as expected. The following discussion describes how each PSR bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 192) is handled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 193)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 194) ======= =======================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 195) PSR.be Cleared when entering fsys-mode. A srlz.d instruction is used
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 196) to ensure the CPU is in little-endian mode before the first
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 197) load/store instruction is executed. PSR.be is normally NOT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 198) restored upon return from an fsys-mode handler. In other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 199) words, user-level code must not rely on PSR.be being preserved
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 200) across a system call.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 201) PSR.up Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 202) PSR.ac Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 203) PSR.mfl Unchanged. Note: fsys-mode handlers must not write floating-point registers!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 204) PSR.mfh Unchanged. Note: fsys-mode handlers must not write floating-point registers!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 205) PSR.ic Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 206) PSR.i Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 207) PSR.pk Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 208) PSR.dt Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 209) PSR.dfl Unchanged. Note: fsys-mode handlers must not write floating-point registers!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 210) PSR.dfh Unchanged. Note: fsys-mode handlers must not write floating-point registers!
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 211) PSR.sp Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 212) PSR.pp Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 213) PSR.di Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 214) PSR.si Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 215) PSR.db Unchanged. The kernel prevents user-level from setting a hardware
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 216) breakpoint that triggers at any privilege level other than
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 217) 3 (user-mode).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 218) PSR.lp Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 219) PSR.tb Lazy redirect. If a taken-branch trap occurs while in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 220) fsys-mode, the trap-handler modifies the saved machine state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 221) such that execution resumes in the gate page at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 222) syscall_via_break(), with privilege level 3. Note: the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 223) taken branch would occur on the branch invoking the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 224) fsyscall-handler, at which point, by definition, a syscall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 225) restart is still safe. If the system call number is invalid,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 226) the fsys-mode handler will return directly to user-level. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 227) return will trigger a taken-branch trap, but since the trap is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 228) taken _after_ restoring the privilege level, the CPU has already
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 229) left fsys-mode, so no special treatment is needed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 230) PSR.rt Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 231) PSR.cpl Cleared to 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 232) PSR.is Unchanged (guaranteed to be 0 on entry to the gate page).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 233) PSR.mc Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 234) PSR.it Unchanged (guaranteed to be 1).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 235) PSR.id Unchanged. Note: the ia64 linux kernel never sets this bit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 236) PSR.da Unchanged. Note: the ia64 linux kernel never sets this bit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 237) PSR.dd Unchanged. Note: the ia64 linux kernel never sets this bit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 238) PSR.ss Lazy redirect. If set, "epc" will cause a Single Step Trap to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 239) be taken. The trap handler then modifies the saved machine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 240) state such that execution resumes in the gate page at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 241) syscall_via_break(), with privilege level 3.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 242) PSR.ri Unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 243) PSR.ed Unchanged. Note: This bit could only have an effect if an fsys-mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 244) handler performed a speculative load that gets NaTted. If so, this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 245) would be the normal & expected behavior, so no special treatment is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 246) needed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 247) PSR.bn Unchanged. Note: fsys-mode handlers may clear the bit, if needed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 248) Doing so requires clearing PSR.i and PSR.ic as well.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 249) PSR.ia Unchanged. Note: the ia64 linux kernel never sets this bit.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 250) ======= =======================================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 251)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 252) Using fast system calls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 253) =======================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 254)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 255) To use fast system calls, userspace applications simply need to call
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 256) __kernel_syscall_via_epc(). For example:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 257)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 258) -- example fgettimeofday() call --
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 259)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 260) -- fgettimeofday.S --
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 261)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 262) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 263)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 264) #include <asm/asmmacro.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 265)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 266) GLOBAL_ENTRY(fgettimeofday)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 267) .prologue
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 268) .save ar.pfs, r11
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 269) mov r11 = ar.pfs
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 270) .body
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 272) movl r2 = 0xa000000000020660;; // gate address
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 273) // found by inspection of System.map for the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 274) // __kernel_syscall_via_epc() function. See
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 275) // below for how to do this for real.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 276)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 277) mov b7 = r2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 278) mov r15 = 1087 // gettimeofday syscall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 279) ;;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 280) br.call.sptk.many b6 = b7
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 281) ;;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 282)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 283) .restore sp
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 284)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 285) mov ar.pfs = r11
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 286) br.ret.sptk.many rp;; // return to caller
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 287) END(fgettimeofday)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 288)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 289) -- end fgettimeofday.S --
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 290)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 291) In reality, getting the gate address is accomplished by two extra
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 292) values passed via the ELF auxiliary vector (include/asm-ia64/elf.h):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 293)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 294) * AT_SYSINFO : is the address of __kernel_syscall_via_epc()
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 295) * AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 296)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 297) The ELF DSO is a pre-linked library that is mapped in by the kernel at
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 298) the gate page. It is a proper ELF shared object so, with a dynamic
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 299) loader that recognises the library, you should be able to make calls to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 300) the exported functions within it as with any other shared library.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 301) AT_SYSINFO points into the kernel DSO at the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 302) __kernel_syscall_via_epc() function for historical reasons (it was
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 303) used before the kernel DSO) and as a convenience.
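
As a sketch of how a userspace program might read these two values, the C
fragment below uses the C library's getauxval() instead of walking the
auxiliary vector by hand. It assumes a Linux C library that provides
<sys/auxv.h> (glibc 2.16 or later, for example). "struct gate_info" and the
helper names are hypothetical. An entry that the kernel does not supply is
reported as 0.

```c
#include <elf.h>       /* AT_SYSINFO, AT_SYSINFO_EHDR */
#include <stdio.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical container for the two auxiliary-vector values. */
struct gate_info {
    unsigned long syscall_entry;  /* AT_SYSINFO: __kernel_syscall_via_epc() */
    unsigned long dso_base;       /* AT_SYSINFO_EHDR: kernel gate ELF DSO */
};

/* Reads both values; getauxval() returns 0 for an entry that is absent. */
static struct gate_info read_gate_info(void)
{
    struct gate_info gi;

    gi.syscall_entry = getauxval(AT_SYSINFO);
    gi.dso_base = getauxval(AT_SYSINFO_EHDR);
    return gi;
}

/* Prints the values, flagging entries the kernel did not provide. */
static void print_gate_info(const struct gate_info *gi)
{
    if (gi->syscall_entry)
        printf("AT_SYSINFO      = 0x%lx\n", gi->syscall_entry);
    else
        printf("AT_SYSINFO      not provided\n");

    if (gi->dso_base)
        printf("AT_SYSINFO_EHDR = 0x%lx\n", gi->dso_base);
    else
        printf("AT_SYSINFO_EHDR not provided\n");
}
```

A program would call read_gate_info() once at start-up and use syscall_entry
in place of the hard-coded gate address from the fgettimeofday example. A
dynamic loader would normally use dso_base to resolve the exported symbols
instead.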