From: David Mosberger <davidm@napali.hpl.hp.com>
To: linux-ia64@vger.kernel.org
Subject: [Linux-ia64] fsyscall-support
Date: Wed, 15 Jan 2003 06:36:32 +0000
Message-ID: <marc-linux-ia64-105590709805677@msgid-missing>

Attached below is a patch relative to 2.5.52+ia64 patch which adds
support for light-weight system calls.  I'm happy to say that
everything seems to have fallen into place _very_ nicely.  In fact,
the patch below is actually rather small: most of its size comes from
adding the fsyscall-table and some renaming (pUser/pKern got renamed
to pUStk/pKStk to reflect their new meaning).  Ah, the other
relatively sizeable piece is--ta ta---documentation: see
Documentation/ia64/fsys.txt for details (this file needs to be
improved; suggestions welcome).
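
To make the new meaning of the renamed predicates concrete, here is a
minimal C sketch of how the execution states relate.  The struct and the
toy_* names are hypothetical stand-ins for illustration only (they are not
the kernel's pt_regs or the real macros); the one thing taken from the
patch is the relation fsys_mode == !user_mode && user_stack that fsys.txt
documents.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the saved machine state.
 * Assumption: the privilege level and the "which stacks" flag are
 * plain fields here; the real kernel derives them differently.
 */
struct toy_regs {
	unsigned cpl;		/* privilege level: 0 = most privileged, 3 = user */
	bool on_user_stacks;	/* memory and register stacks still user-level? */
};

/* TRUE if the state was executing at user privilege (level 3). */
static bool toy_user_mode(const struct toy_regs *r)
{
	return r->cpl == 3;
}

/* TRUE if the state was executing on the user-level stack(s). */
static bool toy_user_stack(const struct toy_regs *r)
{
	return r->on_user_stacks;
}

/* fsys-mode: privileged, but the stacks have not been switched yet. */
static bool toy_fsys_mode(const struct toy_regs *r)
{
	return !toy_user_mode(r) && toy_user_stack(r);
}
```

In this model, plain kernel mode is {cpl 0, kernel stacks}, user mode is
{cpl 3, user stacks}, and only {cpl 0, user stacks} counts as fsys-mode,
which is why the predicates now have to track the stacks rather than the
privilege level.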

I believe the design and the implementation of the fsyscall support is
safe and has no outstanding holes (well, at least none that I know
of).  For example, not only do fsyscalls have full system call
semantics, you can also single-step across them or taken-branch-trap
across them (extra credit for those who figure out how this works just
by looking at the code ;-).

And yet despite this, fsyscalls really _can_ be very fast: a
NULL-system call (e.g., getpid()) can run in as little as 35 cycles.
I find that pretty amazing---hats off to the ia64 & McKinley
architects!

Given this low (minimal) overhead, this ought to pretty much obviate
any desire for vsyscalls (pseudo-syscalls which run entirely in
user-level, e.g., by accessing a kernel-page that's mapped read-only).

To avoid confusion, I should point out three things:

 - The only fsyscall that's currently implemented in a light-weight
   fashion is getpid().  Of course, nobody really cares about the
   speed of getpid(), but it's easy to do and lets us establish the
   lower-bound for fsyscall overheads.  More interesting candidates
   for light-weight implementation would be gettimeofday(),
   sigprocmask(), and sigreturn(), for example.

 - In the absence of a light-weight system call handler, an fsyscall
   will fall back to a full-blown system call.  At the moment, the
   fall back path uses a "break 0x100000" for this, which is obviously
   silly and causes non-light-weight system calls to actually run
   slightly slower than before.  Next step is to streamline this path
   (e.g., avoid break 0x100000, save/restore only minimal set of
   registers).

 - Only limited testing has been done so far.  I'm working on putting
   together a system that's entirely built on top of fsyscalls, but
   the glibc pieces are not quite there yet.
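
The dispatch-with-fallback behaviour described in the first two points can
be sketched in C roughly as follows.  This is only a model under stated
assumptions: the toy_* names, the enum, and the 256-entry table size are
made up for illustration, and the real dispatch is the ia64 assembly in
gate.S and fsys.S in the patch below.  What is taken from the patch is the
base syscall number (1024), the slot of getpid (1041), and the fact that
every other slot defaults to the fallback handler.

```c
#include <stddef.h>

#define TOY_FIRST_SYSCALL 1024u	/* ia64 syscall numbers start at 1024 */
#define TOY_TABLE_SIZE    256u	/* assumption: table size for this sketch */
#define TOY_NR_GETPID     1041u	/* slot of getpid in fsyscall_table */

/* Which path a call ended up taking (hypothetical result type). */
enum toy_path { TOY_LIGHT, TOY_HEAVY, TOY_ENOSYS };

typedef enum toy_path (*toy_handler)(void);

static enum toy_path toy_ni(void)       { return TOY_ENOSYS; }	/* not implemented */
static enum toy_path toy_getpid(void)   { return TOY_LIGHT; }	/* hand-tuned handler */
static enum toy_path toy_fallback(void) { return TOY_HEAVY; }	/* full system call */

static toy_handler toy_table[TOY_TABLE_SIZE];

/* Fill the table: everything falls back except the hand-tuned entries. */
static void toy_init(void)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		toy_table[i] = toy_fallback;
	toy_table[0] = toy_ni;
	toy_table[TOY_NR_GETPID - TOY_FIRST_SYSCALL] = toy_getpid;
}

/* Range-check the syscall number, then jump through the table. */
static enum toy_path toy_dispatch(unsigned nr)
{
	unsigned idx = nr - TOY_FIRST_SYSCALL;	/* wraps around for nr < 1024 */

	if (idx >= TOY_TABLE_SIZE)
		return TOY_ENOSYS;
	return toy_table[idx]();
}
```

After toy_init(), toy_dispatch(1041) takes the light-weight path, any
other in-range number takes the heavy-weight fallback, and out-of-range
numbers fail with ENOSYS without ever entering the kernel proper.  The
single unsigned comparison covers both "too small" and "too large".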

Oh, I pushed some other changes into the lia64 bk tree before applying
this patch.  I don't think you need those in order to apply this patch
on top of 2.5.52+ia64, but I haven't actually tested it.

Enjoy,

	--david

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.895   -> 1.896  
#	include/asm-ia64/asmmacro.h	1.3     -> 1.4    
#	arch/ia64/kernel/entry.S	1.28    -> 1.29   
#	include/asm-ia64/processor.h	1.29    -> 1.30   
#	arch/ia64/kernel/entry.h	1.4     -> 1.5    
#	include/asm-ia64/ptrace.h	1.5     -> 1.6    
#	arch/ia64/kernel/head.S	1.7     -> 1.8    
#	include/asm-ia64/elf.h	1.5     -> 1.6    
#	arch/ia64/kernel/gate.S	1.9     -> 1.10   
#	arch/ia64/kernel/minstate.h	1.8     -> 1.9    
#	arch/ia64/kernel/unaligned.c	1.8     -> 1.9    
#	arch/ia64/tools/print_offsets.c	1.10    -> 1.11   
#	   arch/ia64/Kconfig	1.11    -> 1.12   
#	arch/ia64/kernel/traps.c	1.20    -> 1.21   
#	arch/ia64/kernel/Makefile	1.12    -> 1.13   
#	               (new)	        -> 1.1     arch/ia64/kernel/fsys.S
#	               (new)	        -> 1.1     Documentation/ia64/fsys.txt
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/01/14	davidm@tiger.hpl.hp.com	1.896
# ia64: Light-weight system call support (aka, "fsyscalls").  This does not (yet)
# 	accelerate normal system calls, but it puts the infrastructure in place
# 	and lets you write fsyscall-handlers to your heart's content.  A null system-
# 	call (such as getpid()) can now run in as little as 35 cycles!
# --------------------------------------------
#
diff -Nru a/Documentation/ia64/fsys.txt b/Documentation/ia64/fsys.txt
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/Documentation/ia64/fsys.txt	Tue Jan 14 22:18:08 2003
@@ -0,0 +1,219 @@
+-*-Mode: outline-*-
+
+		Light-weight System Calls for IA-64
+		-----------------------------------
+
+		        Started: 13-Jan-2003
+		    Last update: 14-Jan-2003
+
+	              David Mosberger-Tang
+		      <davidm@hpl.hp.com>
+
+Using the "epc" instruction effectively introduces a new mode of
+execution to the ia64 linux kernel.  We call this mode the
+"fsys-mode".  To recap, the normal states of execution are:
+
+  - kernel mode:
+	Both the register stack and the memory stack have been
+	switched over to kernel memory.  The user-level state
+	is saved in a pt-regs structure at the top of the kernel
+	memory stack.
+
+  - user mode:
+	Both the register stack and the memory stack are in
+	user land.  The user-level state is contained in the
+	CPU registers.
+
+  - bank 0 interruption-handling mode:
+	This is the non-interruptible state in which all
+	interruption-handlers start executing.  The user-level
+	state remains in the CPU registers and some kernel state may
+	be stored in bank 0 of registers r16-r31.
+
+Fsys-mode has the following special properties:
+
+  - execution is at privilege level 0 (most-privileged)
+
+  - CPU registers may contain a mixture of user-level and kernel-level
+    state (it is the responsibility of the kernel to ensure that no
+    security-sensitive kernel-level state is leaked back to
+    user-level)
+
+  - execution is interruptible and preemptible (an fsys-mode handler
+    can disable interrupts and avoid all other interruption-sources
+    to avoid preemption)
+
+  - neither the memory nor the register stack can be trusted while
+    in fsys-mode (they point to the user-level stacks, which may
+    be invalid)
+
+In summary, fsys-mode is much more similar to running in user-mode
+than it is to running in kernel-mode.  Of course, given that the
+privilege level is at level 0, this means that fsys-mode requires some
+care (see below).
+
+
+* How to tell fsys-mode
+
+Linux operates in fsys-mode when (a) the privilege level is 0 (most
+privileged) and (b) the stacks have NOT been switched to kernel memory
+yet.  For convenience, the header file <asm-ia64/ptrace.h> provides
+three macros:
+
+	user_mode(regs)
+	user_stack(regs)
+	fsys_mode(regs)
+
+The "regs" argument is a pointer to a pt_regs structure.  user_mode()
+returns TRUE if the CPU state pointed to by "regs" was executing in
+user mode (privilege level 3).  user_stack() returns TRUE if the state
+pointed to by "regs" was executing on the user-level stack(s).
+Finally, fsys_mode() returns TRUE if the CPU state pointed to by
+"regs" was executing in fsys-mode.  The fsys_mode() macro corresponds
+exactly to the expression:
+
+	!user_mode(regs) && user_stack(regs)
+
+* How to write an fsyscall handler
+
+The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
+(fsyscall_table).  This table contains one entry for each system call.
+By default, a system call is handled by fsys_fallback_syscall().  This
+routine takes care of entering (full) kernel mode and calling the
+normal Linux system call handler.  For performance-critical system
+calls, it is possible to write a hand-tuned fsyscall_handler.  For
+example, fsys.S contains fsys_getpid(), which is a hand-tuned version
+of the getpid() system call.
+
+The entry and exit-state of an fsyscall handler is as follows:
+
+** Machine state on entry to fsyscall handler:
+
+ - r11	  = saved ar.pfs (a user-level value)
+ - r15	  = system call number
+ - r16	  = "current" task pointer (in normal kernel-mode, this is in r13)
+ - r32-r39 = system call arguments
+ - b6	  = return address (a user-level value)
+ - ar.pfs = previous frame-state (a user-level value)
+ - PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
+ - all other registers may contain values passed in from user-mode
+
+** Required machine state on exit from fsyscall handler:
+
+ - r11	  = saved ar.pfs (as passed into the fsyscall handler)
+ - r15	  = system call number (as passed into the fsyscall handler)
+ - r32-r39 = system call arguments (as passed into the fsyscall handler)
+ - b6	  = return address (as passed into the fsyscall handler)
+ - ar.pfs = previous frame-state (as passed into the fsyscall handler)
+
+Fsyscall handlers can execute with very little overhead, but with that
+speed comes a set of restrictions:
+
+ o Fsyscall-handlers MUST check for any pending work in the flags
+   member of the thread-info structure and if any of the
+   TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
+   doing a full system call (by calling fsys_fallback_syscall).
+
+ o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
+   r15, b6, and ar.pfs) because they will be needed in case of a
+   system call restart.  Of course, all "preserved" registers also
+   must be preserved, in accordance with the normal calling conventions.
+
+ o Fsyscall-handlers MUST check argument registers for containing a
+   NaT value before using them in any way that could trigger a
+   NaT-consumption fault.  If a system call argument is found to
+   contain a NaT value, an fsyscall-handler may return immediately
+   with r8=EINVAL, r10=-1.
+
+ o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
+   any other operation that would trigger mandatory RSE
+   (register-stack engine) traffic.
+
+ o Fsyscall-handlers MUST NOT write to any stacked registers because
+   it is not safe to assume that user-level called a handler with the
+   proper number of arguments.
+
+ o Fsyscall-handlers need to be careful when accessing per-CPU variables:
+   unless proper safe-guards are taken (e.g., interruptions are avoided),
+   execution may be pre-empted and resumed on another CPU at any given
+   time.
+
+ o Fsyscall-handlers must be careful not to leak sensitive kernel
+   information back to user-level.  In particular, before returning to
+   user-level, care needs to be taken to clear any scratch registers
+   that could contain sensitive information (note that the current
+   task pointer is not considered sensitive: it's already exposed
+   through ar.k6).
+
+The above restrictions may seem draconian, but remember that it's
+possible to trade off some of the restrictions by paying a slightly
+higher overhead.  For example, if an fsyscall-handler could benefit
+from the shadow register bank, it could temporarily disable PSR.i and
+PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
+needed.  In other words, following the above rules yields extremely
+fast system call execution (while fully preserving system call
+semantics), but there is also a lot of flexibility in handling more
+complicated cases.
+
+* PSR Handling
+
+The "epc" instruction doesn't change the contents of PSR at all.  This
+is in contrast to a regular interruption, which clears almost all
+bits.  Because of that, some care needs to be taken to ensure things
+work as expected.  The following discussion describes how each PSR bit
+is handled.
+
+PSR.be	Cleared when entering fsys-mode.  A srlz.d instruction is used
+	to ensure the CPU is in little-endian mode before the first
+	load/store instruction is executed.  PSR.be is normally NOT
+	restored upon return from an fsys-mode handler.  In other
+	words, user-level code must not rely on PSR.be being preserved
+	across a system call.
+PSR.up	Unchanged.
+PSR.ac	Unchanged.
+PSR.mfl Unchanged.  Note: fsys-mode handlers must not write-registers!
+PSR.mfh	Unchanged.  Note: fsys-mode handlers must not write-registers!
+PSR.ic	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
+PSR.i	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
+PSR.pk	Unchanged.
+PSR.dt	Unchanged.
+PSR.dfl	Unchanged.  Note: fsys-mode handlers must not write-registers!
+PSR.dfh	Unchanged.  Note: fsys-mode handlers must not write-registers!
+PSR.sp	Unchanged.
+PSR.pp	Unchanged.
+PSR.di	Unchanged.
+PSR.si	Unchanged.
+PSR.db	Unchanged.  The kernel prevents user-level from setting a hardware
+	breakpoint that triggers at any privilege level other than 3 (user-mode).
+PSR.lp	Unchanged.
+PSR.tb	Lazy redirect.  If a taken-branch trap occurs while in
+	fsys-mode, the trap-handler modifies the saved machine state
+	such that execution resumes in the gate page at
+	syscall_via_break(), with privilege level 3.  Note: the
+	taken branch would occur on the branch invoking the
+	fsyscall-handler, at which point, by definition, a syscall
+	restart is still safe.  If the system call number is invalid,
+	the fsys-mode handler will return directly to user-level.  This
+	return will trigger a taken-branch trap, but since the trap is
+	taken _after_ restoring the privilege level, the CPU has already
+	left fsys-mode, so no special treatment is needed.
+PSR.rt	Unchanged.
+PSR.cpl	Cleared to 0.
+PSR.is	Unchanged (guaranteed to be 0 on entry to the gate page).
+PSR.mc	Unchanged.
+PSR.it	Unchanged (guaranteed to be 1).
+PSR.id	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.da	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.dd	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.ss	Lazy redirect.  If set, "epc" will cause a Single Step Trap to
+	be taken.  The trap handler then modifies the saved machine
+	state such that execution resumes in the gate page at
+	syscall_via_break(), with privilege level 3.
+PSR.ri	Unchanged.
+PSR.ed	Unchanged.  Note: This bit could only have an effect if an fsys-mode
+	handler performed a speculative load that gets NaTted.  If so, this
+	would be the normal & expected behavior, so no special treatment is
+	needed.
+PSR.bn	Unchanged.  Note: fsys-mode handlers may clear the bit, if needed.
+	Doing so requires clearing PSR.i and PSR.ic as well.
+PSR.ia	Unchanged.  Note: the ia64 linux kernel never sets this bit.
diff -Nru a/arch/ia64/Kconfig b/arch/ia64/Kconfig
--- a/arch/ia64/Kconfig	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/Kconfig	Tue Jan 14 22:18:08 2003
@@ -806,6 +806,9 @@
 
 menu "Kernel hacking"
 
+config FSYS
+	bool "Light-weight system-call support (via epc)"
+
 choice
 	prompt "Physical memory granularity"
 	default IA64_GRANULE_64MB
diff -Nru a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile
--- a/arch/ia64/kernel/Makefile	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/Makefile	Tue Jan 14 22:18:08 2003
@@ -12,6 +12,7 @@
 	 semaphore.o setup.o	\
 	 signal.o sys_ia64.o traps.o time.o unaligned.o unwind.o
 
+obj-$(CONFIG_FSYS) += fsys.o
 obj-$(CONFIG_IOSAPIC) += iosapic.o
 obj-$(CONFIG_IA64_PALINFO) += palinfo.o
 obj-$(CONFIG_EFI_VARS) += efivars.o
diff -Nru a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
--- a/arch/ia64/kernel/entry.S	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/entry.S	Tue Jan 14 22:18:08 2003
@@ -3,7 +3,7 @@
  *
  * Kernel entry points.
  *
- * Copyright (C) 1998-2002 Hewlett-Packard Co
+ * Copyright (C) 1998-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  * Copyright (C) 1999 VA Linux Systems
  * Copyright (C) 1999 Walt Drummond <drummond@valinux.com>
@@ -22,8 +22,8 @@
 /*
  * Global (preserved) predicate usage on syscall entry/exit path:
  *
- *	pKern:		See entry.h.
- *	pUser:		See entry.h.
+ *	pKStk:		See entry.h.
+ *	pUStk:		See entry.h.
  *	pSys:		See entry.h.
  *	pNonSys:	!pSys
  */
@@ -63,7 +63,7 @@
 	sxt4 r8=r8			// return 64-bit result
 	;;
 	stf.spill [sp]=f0
-(p6)	cmp.ne pKern,pUser=r0,r0	// a successful execve() lands us in user-mode...
+(p6)	cmp.ne pKStk,pUStk=r0,r0	// a successful execve() lands us in user-mode...
 	mov rp=loc0
 (p6)	mov ar.pfs=r0			// clear ar.pfs on success
 (p7)	br.ret.sptk.many rp
@@ -193,7 +193,7 @@
 	;;
 (p6)	srlz.d
 	ld8 sp=[r21]			// load kernel stack pointer of new task
-	mov IA64_KR(CURRENT)=r20	// update "current" application register
+	mov IA64_KR(CURRENT)=in0	// update "current" application register
 	mov r8=r13			// return pointer to previously running task
 	mov r13=in0			// set "current" pointer
 	;;
@@ -569,11 +569,12 @@
 	// fall through
 GLOBAL_ENTRY(ia64_leave_kernel)
 	PT_REGS_UNWIND_INFO(0)
-	// work.need_resched etc. mustn't get changed by this CPU before it returns to userspace:
-(pUser)	cmp.eq.unc p6,p0=r0,r0			// p6 <- pUser
-(pUser)	rsm psr.i
+	// work.need_resched etc. mustn't get changed by this CPU before it returns to
+	// user- or fsys-mode:
+(pUStk)	cmp.eq.unc p6,p0=r0,r0			// p6 <- pUStk
+(pUStk)	rsm psr.i
 	;;
-(pUser)	adds r17=TI_FLAGS+IA64_TASK_SIZE,r13
+(pUStk)	adds r17=TI_FLAGS+IA64_TASK_SIZE,r13
 	;;
 .work_processed:
 (p6)	ld4 r18=[r17]				// load current_thread_info()->flags
@@ -635,9 +636,9 @@
 	;;
 	srlz.i			// ensure interruption collection is off
 	mov b7=r15
+	bsw.0			// switch back to bank 0 (no stop bit required beforehand...)
 	;;
-	bsw.0			// switch back to bank 0
-	;;
+(pUStk)	mov r18=IA64_KR(CURRENT)	// Itanium 2: 12 cycle read latency
 	adds r16=16,r12
 	adds r17=24,r12
 	;;
@@ -665,16 +666,21 @@
 	;;
 	ld8.fill r12=[r16],16
 	ld8.fill r13=[r17],16
+(pUStk)	adds r18=IA64_TASK_THREAD_ON_USTACK_OFFSET,r18
 	;;
 	ld8.fill r14=[r16]
 	ld8.fill r15=[r17]
+(pUStk)	mov r17=1
+	;;
+(pUStk)	st1 [r18]=r17		// restore current->thread.on_ustack
 	shr.u r18=r19,16	// get byte size of existing "dirty" partition
 	;;
 	mov r16=ar.bsp		// get existing backing store pointer
 	movl r17=THIS_CPU(ia64_phys_stacked_size_p8)
 	;;
 	ld4 r17=[r17]		// r17 = cpu_data->phys_stacked_size_p8
-(pKern)	br.cond.dpnt skip_rbs_switch
+(pKStk)	br.cond.dpnt skip_rbs_switch
+
 	/*
 	 * Restore user backing store.
 	 *
@@ -788,12 +794,12 @@
 skip_rbs_switch:
 	mov b6=rB6
 	mov ar.pfs=rARPFS
-(pUser)	mov ar.bspstore=rARBSPSTORE
+(pUStk)	mov ar.bspstore=rARBSPSTORE
 (p9)	mov cr.ifs=rCRIFS
 	mov cr.ipsr=rCRIPSR
 	mov cr.iip=rCRIIP
 	;;
-(pUser)	mov ar.rnat=rARRNAT	// must happen with RSE in lazy mode
+(pUStk)	mov ar.rnat=rARRNAT	// must happen with RSE in lazy mode
 	mov ar.rsc=rARRSC
 	mov ar.unat=rARUNAT
 	mov pr=rARPR,-1
diff -Nru a/arch/ia64/kernel/entry.h b/arch/ia64/kernel/entry.h
--- a/arch/ia64/kernel/entry.h	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/entry.h	Tue Jan 14 22:18:08 2003
@@ -4,8 +4,8 @@
  * Preserved registers that are shared between code in ivt.S and entry.S.  Be
  * careful not to step on these!
  */
-#define pKern		p2	/* will leave_kernel return to kernel-mode? */
-#define pUser		p3	/* will leave_kernel return to user-mode? */
+#define pKStk		p2	/* will leave_kernel return to kernel-stacks? */
+#define pUStk		p3	/* will leave_kernel return to user-stacks? */
 #define pSys		p4	/* are we processing a (synchronous) system call? */
 #define pNonSys		p5	/* complement of pSys */
 
diff -Nru a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/arch/ia64/kernel/fsys.S	Tue Jan 14 22:18:08 2003
@@ -0,0 +1,291 @@
+/*
+ * This file contains the light-weight system call handlers (fsyscall-handlers).
+ *
+ * Copyright (C) 2003 Hewlett-Packard Co
+ * 	David Mosberger-Tang <davidm@hpl.hp.com>
+ */
+
+#include <asm/asmmacro.h>
+#include <asm/errno.h>
+#include <asm/offsets.h>
+#include <asm/thread_info.h>
+
+ENTRY(fsys_ni_syscall)
+	mov r8=ENOSYS
+	mov r10=-1
+	br.ret.sptk.many b6
+END(fsys_ni_syscall)
+
+ENTRY(fsys_getpid)
+	add r9=TI_FLAGS+IA64_TASK_SIZE,r16
+	;;
+	ld4 r9=[r9]
+	add r8=IA64_TASK_TGID_OFFSET,r16
+	;;
+	and r9=TIF_ALLWORK_MASK,r9
+	ld4 r8=[r8]
+	;;
+	cmp.ne p8,p0=0,r9
+(p8)	br.spnt.many fsys_fallback_syscall
+	br.ret.sptk.many b6
+END(fsys_getpid)
+
+	.rodata
+	.align 8
+	.globl fsyscall_table
+fsyscall_table:
+	data8 fsys_ni_syscall
+	data8 fsys_fallback_syscall	// exit			// 1025
+	data8 fsys_fallback_syscall	// read
+	data8 fsys_fallback_syscall	// write
+	data8 fsys_fallback_syscall	// open
+	data8 fsys_fallback_syscall	// close
+	data8 fsys_fallback_syscall	// creat		// 1030
+	data8 fsys_fallback_syscall	// link
+	data8 fsys_fallback_syscall	// unlink
+	data8 fsys_fallback_syscall	// execve
+	data8 fsys_fallback_syscall	// chdir
+	data8 fsys_fallback_syscall	// fchdir		// 1035
+	data8 fsys_fallback_syscall	// utimes
+	data8 fsys_fallback_syscall	// mknod
+	data8 fsys_fallback_syscall	// chmod
+	data8 fsys_fallback_syscall	// chown
+	data8 fsys_fallback_syscall	// lseek		// 1040
+	data8 fsys_getpid
+	data8 fsys_fallback_syscall	// getppid
+	data8 fsys_fallback_syscall	// mount
+	data8 fsys_fallback_syscall	// umount
+	data8 fsys_fallback_syscall	// setuid		// 1045
+	data8 fsys_fallback_syscall	// getuid
+	data8 fsys_fallback_syscall	// geteuid
+	data8 fsys_fallback_syscall	// ptrace
+	data8 fsys_fallback_syscall	// access
+	data8 fsys_fallback_syscall	// sync			// 1050
+	data8 fsys_fallback_syscall	// fsync
+	data8 fsys_fallback_syscall	// fdatasync
+	data8 fsys_fallback_syscall	// kill
+	data8 fsys_fallback_syscall	// rename
+	data8 fsys_fallback_syscall	// mkdir		// 1055
+	data8 fsys_fallback_syscall	// rmdir
+	data8 fsys_fallback_syscall	// dup
+	data8 fsys_fallback_syscall	// pipe
+	data8 fsys_fallback_syscall	// times
+	data8 fsys_fallback_syscall	// brk			// 1060
+	data8 fsys_fallback_syscall	// setgid
+	data8 fsys_fallback_syscall	// getgid
+	data8 fsys_fallback_syscall	// getegid
+	data8 fsys_fallback_syscall	// acct
+	data8 fsys_fallback_syscall	// ioctl		// 1065
+	data8 fsys_fallback_syscall	// fcntl
+	data8 fsys_fallback_syscall	// umask
+	data8 fsys_fallback_syscall	// chroot
+	data8 fsys_fallback_syscall	// ustat
+	data8 fsys_fallback_syscall	// dup2			// 1070
+	data8 fsys_fallback_syscall	// setreuid
+	data8 fsys_fallback_syscall	// setregid
+	data8 fsys_fallback_syscall	// getresuid
+	data8 fsys_fallback_syscall	// setresuid
+	data8 fsys_fallback_syscall	// getresgid		// 1075
+	data8 fsys_fallback_syscall	// setresgid
+	data8 fsys_fallback_syscall	// getgroups
+	data8 fsys_fallback_syscall	// setgroups
+	data8 fsys_fallback_syscall	// getpgid
+	data8 fsys_fallback_syscall	// setpgid		// 1080
+	data8 fsys_fallback_syscall	// setsid
+	data8 fsys_fallback_syscall	// getsid
+	data8 fsys_fallback_syscall	// sethostname
+	data8 fsys_fallback_syscall	// setrlimit
+	data8 fsys_fallback_syscall	// getrlimit		// 1085
+	data8 fsys_fallback_syscall	// getrusage
+	data8 fsys_fallback_syscall	// gettimeofday
+	data8 fsys_fallback_syscall	// settimeofday
+	data8 fsys_fallback_syscall	// select
+	data8 fsys_fallback_syscall	// poll			// 1090
+	data8 fsys_fallback_syscall	// symlink
+	data8 fsys_fallback_syscall	// readlink
+	data8 fsys_fallback_syscall	// uselib
+	data8 fsys_fallback_syscall	// swapon
+	data8 fsys_fallback_syscall	// swapoff		// 1095
+	data8 fsys_fallback_syscall	// reboot
+	data8 fsys_fallback_syscall	// truncate
+	data8 fsys_fallback_syscall	// ftruncate
+	data8 fsys_fallback_syscall	// fchmod
+	data8 fsys_fallback_syscall	// fchown		// 1100
+	data8 fsys_fallback_syscall	// getpriority
+	data8 fsys_fallback_syscall	// setpriority
+	data8 fsys_fallback_syscall	// statfs
+	data8 fsys_fallback_syscall	// fstatfs
+	data8 fsys_fallback_syscall	// gettid		// 1105
+	data8 fsys_fallback_syscall	// semget
+	data8 fsys_fallback_syscall	// semop
+	data8 fsys_fallback_syscall	// semctl
+	data8 fsys_fallback_syscall	// msgget
+	data8 fsys_fallback_syscall	// msgsnd		// 1110
+	data8 fsys_fallback_syscall	// msgrcv
+	data8 fsys_fallback_syscall	// msgctl
+	data8 fsys_fallback_syscall	// shmget
+	data8 fsys_fallback_syscall	// shmat
+	data8 fsys_fallback_syscall	// shmdt		// 1115
+	data8 fsys_fallback_syscall	// shmctl
+	data8 fsys_fallback_syscall	// syslog
+	data8 fsys_fallback_syscall	// setitimer
+	data8 fsys_fallback_syscall	// getitimer
+	data8 fsys_fallback_syscall		 		// 1120
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall	// vhangup
+	data8 fsys_fallback_syscall	// lchown
+	data8 fsys_fallback_syscall	// remap_file_pages	// 1125
+	data8 fsys_fallback_syscall	// wait4
+	data8 fsys_fallback_syscall	// sysinfo
+	data8 fsys_fallback_syscall	// clone
+	data8 fsys_fallback_syscall	// setdomainname
+	data8 fsys_fallback_syscall	// newuname		// 1130
+	data8 fsys_fallback_syscall	// adjtimex
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall	// init_module
+	data8 fsys_fallback_syscall	// delete_module
+	data8 fsys_fallback_syscall				// 1135
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall	// quotactl
+	data8 fsys_fallback_syscall	// bdflush
+	data8 fsys_fallback_syscall	// sysfs
+	data8 fsys_fallback_syscall	// personality		// 1140
+	data8 fsys_fallback_syscall	// afs_syscall
+	data8 fsys_fallback_syscall	// setfsuid
+	data8 fsys_fallback_syscall	// setfsgid
+	data8 fsys_fallback_syscall	// getdents
+	data8 fsys_fallback_syscall	// flock		// 1145
+	data8 fsys_fallback_syscall	// readv
+	data8 fsys_fallback_syscall	// writev
+	data8 fsys_fallback_syscall	// pread64
+	data8 fsys_fallback_syscall	// pwrite64
+	data8 fsys_fallback_syscall	// sysctl		// 1150
+	data8 fsys_fallback_syscall	// mmap
+	data8 fsys_fallback_syscall	// munmap
+	data8 fsys_fallback_syscall	// mlock
+	data8 fsys_fallback_syscall	// mlockall
+	data8 fsys_fallback_syscall	// mprotect		// 1155
+	data8 fsys_fallback_syscall	// mremap
+	data8 fsys_fallback_syscall	// msync
+	data8 fsys_fallback_syscall	// munlock
+	data8 fsys_fallback_syscall	// munlockall
+	data8 fsys_fallback_syscall	// sched_getparam	// 1160
+	data8 fsys_fallback_syscall	// sched_setparam
+	data8 fsys_fallback_syscall	// sched_getscheduler
+	data8 fsys_fallback_syscall	// sched_setscheduler
+	data8 fsys_fallback_syscall	// sched_yield
+	data8 fsys_fallback_syscall	// sched_get_priority_max	// 1165
+	data8 fsys_fallback_syscall	// sched_get_priority_min
+	data8 fsys_fallback_syscall	// sched_rr_get_interval
+	data8 fsys_fallback_syscall	// nanosleep
+	data8 fsys_fallback_syscall	// nfsservctl
+	data8 fsys_fallback_syscall	// prctl		// 1170
+	data8 fsys_fallback_syscall	// getpagesize
+	data8 fsys_fallback_syscall	// mmap2
+	data8 fsys_fallback_syscall	// pciconfig_read
+	data8 fsys_fallback_syscall	// pciconfig_write
+	data8 fsys_fallback_syscall	// perfmonctl		// 1175
+	data8 fsys_fallback_syscall	// sigaltstack
+	data8 fsys_fallback_syscall	// rt_sigaction
+	data8 fsys_fallback_syscall	// rt_sigpending
+	data8 fsys_fallback_syscall	// rt_sigprocmask
+	data8 fsys_fallback_syscall	// rt_sigqueueinfo	// 1180
+	data8 fsys_fallback_syscall	// rt_sigreturn
+	data8 fsys_fallback_syscall	// rt_sigsuspend
+	data8 fsys_fallback_syscall	// rt_sigtimedwait
+	data8 fsys_fallback_syscall	// getcwd
+	data8 fsys_fallback_syscall	// capget		// 1185
+	data8 fsys_fallback_syscall	// capset
+	data8 fsys_fallback_syscall	// sendfile
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall	// socket		// 1190
+	data8 fsys_fallback_syscall	// bind
+	data8 fsys_fallback_syscall	// connect
+	data8 fsys_fallback_syscall	// listen
+	data8 fsys_fallback_syscall	// accept
+	data8 fsys_fallback_syscall	// getsockname		// 1195
+	data8 fsys_fallback_syscall	// getpeername
+	data8 fsys_fallback_syscall	// socketpair
+	data8 fsys_fallback_syscall	// send
+	data8 fsys_fallback_syscall	// sendto
+	data8 fsys_fallback_syscall	// recv			// 1200
+	data8 fsys_fallback_syscall	// recvfrom
+	data8 fsys_fallback_syscall	// shutdown
+	data8 fsys_fallback_syscall	// setsockopt
+	data8 fsys_fallback_syscall	// getsockopt
+	data8 fsys_fallback_syscall	// sendmsg		// 1205
+	data8 fsys_fallback_syscall	// recvmsg
+	data8 fsys_fallback_syscall	// pivot_root
+	data8 fsys_fallback_syscall	// mincore
+	data8 fsys_fallback_syscall	// madvise
+	data8 fsys_fallback_syscall	// newstat		// 1210
+	data8 fsys_fallback_syscall	// newlstat
+	data8 fsys_fallback_syscall	// newfstat
+	data8 fsys_fallback_syscall	// clone2
+	data8 fsys_fallback_syscall	// getdents64
+	data8 fsys_fallback_syscall	// getunwind		// 1215
+	data8 fsys_fallback_syscall	// readahead
+	data8 fsys_fallback_syscall	// setxattr
+	data8 fsys_fallback_syscall	// lsetxattr
+	data8 fsys_fallback_syscall	// fsetxattr
+	data8 fsys_fallback_syscall	// getxattr		// 1220
+	data8 fsys_fallback_syscall	// lgetxattr
+	data8 fsys_fallback_syscall	// fgetxattr
+	data8 fsys_fallback_syscall	// listxattr
+	data8 fsys_fallback_syscall	// llistxattr
+	data8 fsys_fallback_syscall	// flistxattr		// 1225
+	data8 fsys_fallback_syscall	// removexattr
+	data8 fsys_fallback_syscall	// lremovexattr
+	data8 fsys_fallback_syscall	// fremovexattr
+	data8 fsys_fallback_syscall	// tkill
+	data8 fsys_fallback_syscall	// futex		// 1230
+	data8 fsys_fallback_syscall	// sched_setaffinity
+	data8 fsys_fallback_syscall	// sched_getaffinity
+	data8 fsys_fallback_syscall	// set_tid_address
+	data8 fsys_fallback_syscall	// alloc_hugepages
+	data8 fsys_fallback_syscall	// free_hugepages	// 1235
+	data8 fsys_fallback_syscall	// exit_group
+	data8 fsys_fallback_syscall	// lookup_dcookie
+	data8 fsys_fallback_syscall	// io_setup
+	data8 fsys_fallback_syscall	// io_destroy
+	data8 fsys_fallback_syscall	// io_getevents		// 1240
+	data8 fsys_fallback_syscall	// io_submit
+	data8 fsys_fallback_syscall	// io_cancel
+	data8 fsys_fallback_syscall	// epoll_create
+	data8 fsys_fallback_syscall	// epoll_ctl
+	data8 fsys_fallback_syscall	// epoll_wait		// 1245
+	data8 fsys_fallback_syscall	// restart_syscall
+	data8 fsys_fallback_syscall	// semtimedop
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1250
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1255
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1260
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1265
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1270
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall				// 1275
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
+	data8 fsys_fallback_syscall
diff -Nru a/arch/ia64/kernel/gate.S b/arch/ia64/kernel/gate.S
--- a/arch/ia64/kernel/gate.S	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/gate.S	Tue Jan 14 22:18:08 2003
@@ -2,7 +2,7 @@
  * This file contains the code that gets mapped at the upper end of each task's text
  * region.  For now, it contains the signal trampoline code only.
  *
- * Copyright (C) 1999-2002 Hewlett-Packard Co
+ * Copyright (C) 1999-2003 Hewlett-Packard Co
  * 	David Mosberger-Tang <davidm@hpl.hp.com>
  */
 
@@ -14,6 +14,85 @@
 #include <asm/page.h>
 
 	.section .text.gate, "ax"
+.start_gate:
+
+
+#if CONFIG_FSYS
+
+#include <asm/errno.h>
+
+/*
+ * On entry:
+ *	r11 = saved ar.pfs
+ *	r15 = system call #
+ *	b0  = saved return address
+ *	b6  = return address
+ * On exit:
+ *	r11 = saved ar.pfs
+ *	r15 = system call #
+ *	b0  = saved return address
+ *	all other "scratch" registers:	undefined
+ *	all "preserved" registers:	same as on entry
+ */
+GLOBAL_ENTRY(syscall_via_epc)
+	.prologue
+	.altrp b6
+	.body
+{
+	/*
+	 * Note: the kernel cannot assume that the first two instructions in this
+	 * bundle get executed.  The remaining code must be safe even if
+	 * they do not get executed.
+	 */
+	adds r17=-1024,r15
+	mov r10=0				// default to successful syscall execution
+	epc
+}
+	;;
+	rsm psr.be
+	movl r18=fsyscall_table
+
+	mov r16=IA64_KR(CURRENT)
+	mov r19=255
+	;;
+	shladd r18=r17,3,r18
+	cmp.geu p6,p0=r19,r17			// (syscall > 0 && syscall <= 1024+255)?
+	;;
+	srlz.d					// ensure little-endian byteorder is in effect
+(p6)	ld8 r18=[r18]
+	;;
+(p6)	mov b7=r18
+(p6)	br.sptk.many b7
+
+	mov r10=-1
+	mov r8=ENOSYS
+	br.ret.sptk.many b6
+END(syscall_via_epc)
+
+GLOBAL_ENTRY(syscall_via_break)
+	.prologue
+	.altrp b6
+	.body
+	break 0x100000
+	br.ret.sptk.many b6
+END(syscall_via_break)
+
+GLOBAL_ENTRY(fsys_fallback_syscall)
+	/*
+	 * It would be better/fsyser to do the SAVE_MIN magic directly here, but for now
+	 * we simply fall back on doing a system-call via break.  Good enough
+	 * to get started.  (Note: we have to do this through the gate page again, since
+	 * the br.ret will switch us back to user-level privilege.)
+	 *
+	 * XXX Move this back to fsys.S after changing it over to avoid break 0x100000.
+	 */
+	movl r2=(syscall_via_break - .start_gate) + GATE_ADDR
+	;;
+	mov b7=r2
+	br.ret.sptk.many b7
+END(fsys_fallback_syscall)
+
+#endif /* CONFIG_FSYS */
 
 #	define ARG0_OFF		(16 + IA64_SIGFRAME_ARG0_OFFSET)
 #	define ARG1_OFF		(16 + IA64_SIGFRAME_ARG1_OFFSET)
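A note on the range check in syscall_via_epc above: r17 holds the system call number minus 1024, and cmp.geu is an unsigned compare, so the single comparison against 255 rejects both numbers below 1024 (which wrap around to very large unsigned values) and numbers above 1279. The following is only a minimal C sketch of that idea; fsys_nr_in_range and fsys_nr_selfcheck are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.  It mirrors
 *	adds r17=-1024,r15 ; mov r19=255 ; cmp.geu p6,p0=r19,r17
 * by doing both the subtraction and the compare on unsigned 64-bit values.
 */
static int fsys_nr_in_range(uint64_t nr)
{
	uint64_t idx = nr - 1024;	/* wraps around for nr < 1024 */

	return 255 >= idx;		/* unsigned compare, like cmp.geu */
}

/* Boundary cases; call this from a test harness or from main(). */
void fsys_nr_selfcheck(void)
{
	assert(!fsys_nr_in_range(0));
	assert(!fsys_nr_in_range(1023));
	assert(fsys_nr_in_range(1024));
	assert(fsys_nr_in_range(1279));
	assert(!fsys_nr_in_range(1280));
}
```

When the check passes, the same idx (scaled by 8, as shladd does) indexes the table of 8-byte entries.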
diff -Nru a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
--- a/arch/ia64/kernel/head.S	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/head.S	Tue Jan 14 22:18:08 2003
@@ -5,7 +5,7 @@
  * to set up the kernel's global pointer and jump to the kernel
  * entry point.
  *
- * Copyright (C) 1998-2001 Hewlett-Packard Co
+ * Copyright (C) 1998-2001, 2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *	Stephane Eranian <eranian@hpl.hp.com>
  * Copyright (C) 1999 VA Linux Systems
@@ -143,17 +143,14 @@
 	movl r2=init_thread_union
 	cmp.eq isBP,isAP=r0,r0
 #endif
-	;;
-	extr r3=r2,0,61		// r3 = phys addr of task struct
 	mov r16=KERNEL_TR_PAGE_NUM
 	;;
 
 	// load the "current" pointer (r13) and ar.k6 with the current task
-	mov r13=r2
-	mov IA64_KR(CURRENT)=r3		// Physical address
-
+	mov IA64_KR(CURRENT)=r2		// virtual address
 	// initialize k4 to a safe value (64-128MB is mapped by TR_KERNEL)
 	mov IA64_KR(CURRENT_STACK)=r16
+	mov r13=r2
 	/*
 	 * Reserve space at the top of the stack for "struct pt_regs".  Kernel threads
 	 * don't store interesting values in that structure, but the space still needs
diff -Nru a/arch/ia64/kernel/minstate.h b/arch/ia64/kernel/minstate.h
--- a/arch/ia64/kernel/minstate.h	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/minstate.h	Tue Jan 14 22:18:08 2003
@@ -30,25 +30,23 @@
  * on interrupts.
  */
 #define MINSTATE_START_SAVE_MIN_VIRT								\
-(pUser)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */	\
-	dep r1=-1,r1,61,3;				/* r1 = current (virtual) */		\
+(pUStk)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */	\
 	;;											\
-(pUser)	mov.m rARRNAT=ar.rnat;									\
-(pUser)	addl rKRBS=IA64_RBS_OFFSET,r1;			/* compute base of RBS */		\
-(pKern) mov r1=sp;					/* get sp  */				\
-	;;											\
-(pUser) lfetch.fault.excl.nt1 [rKRBS];								\
-(pUser)	mov rARBSPSTORE=ar.bspstore;			/* save ar.bspstore */			\
-(pUser)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */	\
+(pUStk)	mov.m rARRNAT=ar.rnat;									\
+(pUStk)	addl rKRBS=IA64_RBS_OFFSET,r1;			/* compute base of RBS */		\
+(pKStk) mov r1=sp;					/* get sp  */				\
 	;;											\
-(pUser)	mov ar.bspstore=rKRBS;				/* switch to kernel RBS */		\
-(pKern) addl r1=-IA64_PT_REGS_SIZE,r1;			/* if in kernel mode, use sp (r12) */	\
+(pUStk) lfetch.fault.excl.nt1 [rKRBS];								\
+(pUStk)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */	\
+(pUStk)	mov rARBSPSTORE=ar.bspstore;			/* save ar.bspstore */			\
 	;;											\
-(pUser)	mov r18=ar.bsp;										\
-(pUser)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */		\
+(pUStk)	mov ar.bspstore=rKRBS;				/* switch to kernel RBS */		\
+(pKStk) addl r1=-IA64_PT_REGS_SIZE,r1;			/* if in kernel mode, use sp (r12) */	\
+	;;											\
+(pUStk)	mov r18=ar.bsp;										\
+(pUStk)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */		\
 
 #define MINSTATE_END_SAVE_MIN_VIRT								\
-	or r13=r13,r14;		/* make `current' a kernel virtual address */			\
 	bsw.1;			/* switch back to bank 1 (must be last in insn group) */	\
 	;;
 
@@ -57,21 +55,21 @@
  * go virtual and dont want to destroy the iip or ipsr.
  */
 #define MINSTATE_START_SAVE_MIN_PHYS								\
-(pKern) movl sp=ia64_init_stack+IA64_STK_OFFSET-IA64_PT_REGS_SIZE;				\
-(pUser)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */	\
-(pUser)	addl rKRBS=IA64_RBS_OFFSET,r1;		/* compute base of register backing store */	\
-	;;											\
-(pUser)	mov rARRNAT=ar.rnat;									\
-(pKern) dep r1=0,sp,61,3;				/* compute physical addr of sp	*/	\
-(pUser)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */	\
-(pUser)	mov rARBSPSTORE=ar.bspstore;			/* save ar.bspstore */			\
-(pUser)	dep rKRBS=-1,rKRBS,61,3;			/* compute kernel virtual addr of RBS */\
+(pKStk) movl sp=ia64_init_stack+IA64_STK_OFFSET-IA64_PT_REGS_SIZE;				\
+(pUStk)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */	\
+(pUStk)	addl rKRBS=IA64_RBS_OFFSET,r1;		/* compute base of register backing store */	\
+	;;											\
+(pUStk)	mov rARRNAT=ar.rnat;									\
+(pKStk) dep r1=0,sp,61,3;				/* compute physical addr of sp	*/	\
+(pUStk)	addl r1=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r1;	/* compute base of memory stack */	\
+(pUStk)	mov rARBSPSTORE=ar.bspstore;			/* save ar.bspstore */			\
+(pUStk)	dep rKRBS=-1,rKRBS,61,3;			/* compute kernel virtual addr of RBS */\
 	;;											\
-(pKern) addl r1=-IA64_PT_REGS_SIZE,r1;		/* if in kernel mode, use sp (r12) */		\
-(pUser)	mov ar.bspstore=rKRBS;			/* switch to kernel RBS */			\
+(pKStk) addl r1=-IA64_PT_REGS_SIZE,r1;		/* if in kernel mode, use sp (r12) */		\
+(pUStk)	mov ar.bspstore=rKRBS;			/* switch to kernel RBS */			\
 	;;											\
-(pUser)	mov r18=ar.bsp;										\
-(pUser)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */		\
+(pUStk)	mov r18=ar.bsp;										\
+(pUStk)	mov ar.rsc=0x3;		/* set eager mode, pl 0, little-endian, loadrs=0 */		\
 
 #define MINSTATE_END_SAVE_MIN_PHYS								\
 	or r12=r12,r14;		/* make sp a kernel virtual address */				\
@@ -79,11 +77,13 @@
 	;;
 
 #ifdef MINSTATE_VIRT
+# define MINSTATE_GET_CURRENT(reg)	mov reg=IA64_KR(CURRENT)
 # define MINSTATE_START_SAVE_MIN	MINSTATE_START_SAVE_MIN_VIRT
 # define MINSTATE_END_SAVE_MIN		MINSTATE_END_SAVE_MIN_VIRT
 #endif
 
 #ifdef MINSTATE_PHYS
+# define MINSTATE_GET_CURRENT(reg)	mov reg=IA64_KR(CURRENT);; dep reg=0,reg,61,3
 # define MINSTATE_START_SAVE_MIN	MINSTATE_START_SAVE_MIN_PHYS
 # define MINSTATE_END_SAVE_MIN		MINSTATE_END_SAVE_MIN_PHYS
 #endif
@@ -110,23 +110,26 @@
  * we can pass interruption state as arguments to a handler.
  */
 #define DO_SAVE_MIN(COVER,SAVE_IFS,EXTRA)							  \
-	mov rARRSC=ar.rsc;									  \
-	mov rARPFS=ar.pfs;									  \
-	mov rR1=r1;										  \
-	mov rARUNAT=ar.unat;									  \
-	mov rCRIPSR=cr.ipsr;									  \
-	mov rB6=b6;				/* rB6 = branch reg 6 */			  \
-	mov rCRIIP=cr.iip;									  \
-	mov r1=IA64_KR(CURRENT);		/* r1 = current (physical) */			  \
-	COVER;											  \
+	mov rARRSC=ar.rsc;		/* M */							  \
+	mov rARUNAT=ar.unat;		/* M */							  \
+	mov rR1=r1;			/* A */							  \
+	MINSTATE_GET_CURRENT(r1);	/* M (or M;;I) */					  \
+	mov rCRIPSR=cr.ipsr;		/* M */							  \
+	mov rARPFS=ar.pfs;		/* I */							  \
+	mov rCRIIP=cr.iip;		/* M */							  \
+	mov rB6=b6;			/* I */	/* rB6 = branch reg 6 */			  \
+	COVER;				/* B;; (or nothing) */					  \
 	;;											  \
-	invala;											  \
-	extr.u r16=rCRIPSR,32,2;		/* extract psr.cpl */				  \
+	adds r16=IA64_TASK_THREAD_ON_USTACK_OFFSET,r1;						  \
 	;;											  \
-	cmp.eq pKern,pUser=r0,r16;		/* are we in kernel mode already? (psr.cpl=0) */ \
+	ld1 r17=[r16];				/* load current->thread.on_ustack flag */	  \
+	st1 [r16]=r0;				/* clear current->thread.on_ustack flag */	  \
 	/* switch from user to kernel RBS: */							  \
 	;;											  \
+	invala;				/* M */							  \
 	SAVE_IFS;										  \
+	cmp.eq pKStk,pUStk=r0,r17;		/* are we in kernel mode already? (psr.cpl=0) */ \
+	;;											  \
 	MINSTATE_START_SAVE_MIN									  \
 	add r17=L1_CACHE_BYTES,r1			/* really: biggest cache-line size */	  \
 	;;											  \
@@ -138,23 +141,23 @@
 	;;											  \
 	lfetch.fault.excl.nt1 [r17];								  \
 	adds r17=8,r1;					/* initialize second base pointer */	  \
-(pKern)	mov r18=r0;		/* make sure r18 isn't NaT */					  \
+(pKStk)	mov r18=r0;		/* make sure r18 isn't NaT */					  \
 	;;											  \
 	st8 [r17]=rCRIIP,16;	/* save cr.iip */						  \
 	st8 [r16]=rCRIFS,16;	/* save cr.ifs */						  \
-(pUser)	sub r18=r18,rKRBS;	/* r18=RSE.ndirty*8 */						  \
+(pUStk)	sub r18=r18,rKRBS;	/* r18=RSE.ndirty*8 */						  \
 	;;											  \
 	st8 [r17]=rARUNAT,16;	/* save ar.unat */						  \
 	st8 [r16]=rARPFS,16;	/* save ar.pfs */						  \
 	shl r18=r18,16;		/* compute ar.rsc to be used for "loadrs" */			  \
 	;;											  \
 	st8 [r17]=rARRSC,16;	/* save ar.rsc */						  \
-(pUser)	st8 [r16]=rARRNAT,16;	/* save ar.rnat */						  \
-(pKern)	adds r16=16,r16;	/* skip over ar_rnat field */					  \
+(pUStk)	st8 [r16]=rARRNAT,16;	/* save ar.rnat */						  \
+(pKStk)	adds r16=16,r16;	/* skip over ar_rnat field */					  \
 	;;			/* avoid RAW on r16 & r17 */					  \
-(pUser)	st8 [r17]=rARBSPSTORE,16;	/* save ar.bspstore */					  \
+(pUStk)	st8 [r17]=rARBSPSTORE,16;	/* save ar.bspstore */					  \
 	st8 [r16]=rARPR,16;	/* save predicates */						  \
-(pKern)	adds r17=16,r17;	/* skip over ar_bspstore field */				  \
+(pKStk)	adds r17=16,r17;	/* skip over ar_bspstore field */				  \
 	;;											  \
 	st8 [r17]=rB6,16;	/* save b6 */							  \
 	st8 [r16]=r18,16;	/* save ar.rsc value for "loadrs" */				  \
diff -Nru a/arch/ia64/kernel/traps.c b/arch/ia64/kernel/traps.c
--- a/arch/ia64/kernel/traps.c	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/traps.c	Tue Jan 14 22:18:08 2003
@@ -524,6 +524,23 @@
 	      case 29: /* Debug */
 	      case 35: /* Taken Branch Trap */
 	      case 36: /* Single Step Trap */
+		if (fsys_mode(regs)) {
+			extern char syscall_via_break[], __start_gate_section[];
+			/*
+			 * Got a trap in fsys-mode: Taken Branch Trap and Single Step trap
+			 * need special handling; Debug trap is not supposed to happen.
+			 */
+			if (unlikely(vector == 29)) {
+				die("Got debug trap in fsys-mode---not supposed to happen!",
+				    regs, 0);
+				return;
+			}
+			/* re-do the system call via break 0x100000: */
+			regs->cr_iip = GATE_ADDR + (syscall_via_break - __start_gate_section);
+			ia64_psr(regs)->ri = 0;
+			ia64_psr(regs)->cpl = 3;
+			return;
+		}
 		switch (vector) {
 		      case 29:
 			siginfo.si_code = TRAP_HWBKPT;
diff -Nru a/arch/ia64/kernel/unaligned.c b/arch/ia64/kernel/unaligned.c
--- a/arch/ia64/kernel/unaligned.c	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/kernel/unaligned.c	Tue Jan 14 22:18:08 2003
@@ -331,12 +331,8 @@
 		return;
 	}
 
-	/*
-	 * Avoid using user_mode() here: with "epc", we cannot use the privilege level to
-	 * infer whether the interrupt task was running on the kernel backing store.
-	 */
-	if (regs->r12 >= TASK_SIZE) {
-		DPRINT("ignoring kernel write to r%lu; register isn't on the RBS!", r1);
+	if (!user_stack(regs)) {
+		DPRINT("ignoring kernel write to r%lu; register isn't on the kernel RBS!", r1);
 		return;
 	}
 
@@ -406,11 +402,7 @@
 		return;
 	}
 
-	/*
-	 * Avoid using user_mode() here: with "epc", we cannot use the privilege level to
-	 * infer whether the interrupt task was running on the kernel backing store.
-	 */
-	if (regs->r12 >= TASK_SIZE) {
+	if (!user_stack(regs)) {
 		DPRINT("ignoring kernel read of r%lu; register isn't on the RBS!", r1);
 		goto fail;
 	}
diff -Nru a/arch/ia64/tools/print_offsets.c b/arch/ia64/tools/print_offsets.c
--- a/arch/ia64/tools/print_offsets.c	Tue Jan 14 22:18:08 2003
+++ b/arch/ia64/tools/print_offsets.c	Tue Jan 14 22:18:08 2003
@@ -1,7 +1,7 @@
 /*
  * Utility to generate asm-ia64/offsets.h.
  *
- * Copyright (C) 1999-2002 Hewlett-Packard Co
+ * Copyright (C) 1999-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *
  * Note that this file has dual use: when building the kernel
@@ -53,7 +53,9 @@
     { "UNW_FRAME_INFO_SIZE",		sizeof (struct unw_frame_info) },
     { "", 0 },			/* spacer */
     { "IA64_TASK_THREAD_KSP_OFFSET",	offsetof (struct task_struct, thread.ksp) },
+    { "IA64_TASK_THREAD_ON_USTACK_OFFSET", offsetof (struct task_struct, thread.on_ustack) },
     { "IA64_TASK_PID_OFFSET",		offsetof (struct task_struct, pid) },
+    { "IA64_TASK_TGID_OFFSET",		offsetof (struct task_struct, tgid) },
     { "IA64_PT_REGS_CR_IPSR_OFFSET",	offsetof (struct pt_regs, cr_ipsr) },
     { "IA64_PT_REGS_CR_IIP_OFFSET",	offsetof (struct pt_regs, cr_iip) },
     { "IA64_PT_REGS_CR_IFS_OFFSET",	offsetof (struct pt_regs, cr_ifs) },
diff -Nru a/include/asm-ia64/asmmacro.h b/include/asm-ia64/asmmacro.h
--- a/include/asm-ia64/asmmacro.h	Tue Jan 14 22:18:08 2003
+++ b/include/asm-ia64/asmmacro.h	Tue Jan 14 22:18:08 2003
@@ -2,12 +2,17 @@
 #define _ASM_IA64_ASMMACRO_H
 
 /*
- * Copyright (C) 2000-2001 Hewlett-Packard Co
+ * Copyright (C) 2000-2001, 2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  */
 
 #define ENTRY(name)				\
 	.align 32;				\
+	.proc name;				\
+name:
+
+#define ENTRY_MIN_ALIGN(name)			\
+	.align 16;				\
 	.proc name;				\
 name:
 
diff -Nru a/include/asm-ia64/elf.h b/include/asm-ia64/elf.h
--- a/include/asm-ia64/elf.h	Tue Jan 14 22:18:08 2003
+++ b/include/asm-ia64/elf.h	Tue Jan 14 22:18:08 2003
@@ -4,10 +4,12 @@
 /*
  * ELF-specific definitions.
  *
- * Copyright (C) 1998, 1999, 2002 Hewlett-Packard Co
+ * Copyright (C) 1998-1999, 2002-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  */
 
+#include <linux/config.h>
+
 #include <asm/fpu.h>
 #include <asm/page.h>
 
@@ -88,6 +90,11 @@
    relevant until we have real hardware to play with... */
 #define ELF_PLATFORM	0
 
+/*
+ * This should go into linux/elf.h...
+ */
+#define AT_SYSINFO	32
+
 #ifdef __KERNEL__
 struct elf64_hdr;
 extern void ia64_set_personality (struct elf64_hdr *elf_ex, int ibcs2_interpreter);
@@ -99,7 +106,14 @@
 #define ELF_CORE_COPY_TASK_REGS(tsk, elf_gregs) dump_task_regs(tsk, elf_gregs)
 #define ELF_CORE_COPY_FPREGS(tsk, elf_fpregs) dump_task_fpu(tsk, elf_fpregs)
 
-
+#ifdef CONFIG_FSYS
+#define ARCH_DLINFO					\
+do {							\
+	extern int syscall_via_epc;			\
+	NEW_AUX_ENT(AT_SYSINFO, syscall_via_epc);	\
+} while (0)
 #endif
+
+#endif /* __KERNEL__ */
 
 #endif /* _ASM_IA64_ELF_H */
diff -Nru a/include/asm-ia64/processor.h b/include/asm-ia64/processor.h
--- a/include/asm-ia64/processor.h	Tue Jan 14 22:18:08 2003
+++ b/include/asm-ia64/processor.h	Tue Jan 14 22:18:08 2003
@@ -2,7 +2,7 @@
 #define _ASM_IA64_PROCESSOR_H
 
 /*
- * Copyright (C) 1998-2002 Hewlett-Packard Co
+ * Copyright (C) 1998-2003 Hewlett-Packard Co
  *	David Mosberger-Tang <davidm@hpl.hp.com>
  *	Stephane Eranian <eranian@hpl.hp.com>
  * Copyright (C) 1999 Asit Mallick <asit.k.mallick@intel.com>
@@ -223,7 +223,10 @@
 struct siginfo;
 
 struct thread_struct {
-	__u64 flags;			/* various thread flags (see IA64_THREAD_*) */
+	__u32 flags;			/* various thread flags (see IA64_THREAD_*) */
+	/* writing on_ustack is performance-critical, so it's worth spending 8 bits on it... */
+	__u8 on_ustack;			/* executing on user-stacks? */
+	__u8 pad[3];
 	__u64 ksp;			/* kernel stack pointer */
 	__u64 map_base;			/* base address for get_unmapped_area() */
 	__u64 task_size;		/* limit for task size */
@@ -277,6 +280,7 @@
 
 #define INIT_THREAD {				\
 	.flags =	0,			\
+	.on_ustack =	0,			\
 	.ksp =		0,			\
 	.map_base =	DEFAULT_MAP_BASE,	\
 	.task_size =	DEFAULT_TASK_SIZE,	\
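A note on the thread_struct change above: flags shrinks from 64 to 32 bits, and the four bytes this frees hold on_ustack plus three bytes of padding, so ksp and the fields after it should keep their old offsets. The sketch below checks this on stand-in structs only; old_thread_head and new_thread_head are made-up names, and the fixed-width types from <stdint.h> stand in for the kernel's __u64/__u32/__u8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the first fields of struct thread_struct (hypothetical names). */
struct old_thread_head {
	uint64_t flags;
	uint64_t ksp;
};

struct new_thread_head {
	uint32_t flags;
	uint8_t  on_ustack;	/* a single byte, so ld1/st1 can access it directly */
	uint8_t  pad[3];
	uint64_t ksp;
};

/* Compile-time check (C11): ksp sits at the same offset in both layouts. */
static_assert(offsetof(struct old_thread_head, ksp) ==
	      offsetof(struct new_thread_head, ksp),
	      "ksp offset should be unchanged");
```

This says nothing about the rest of the real structure; it only illustrates why the explicit pad[3] is there.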
diff -Nru a/include/asm-ia64/ptrace.h b/include/asm-ia64/ptrace.h
--- a/include/asm-ia64/ptrace.h	Tue Jan 14 22:18:08 2003
+++ b/include/asm-ia64/ptrace.h	Tue Jan 14 22:18:08 2003
@@ -218,6 +218,8 @@
 # define ia64_task_regs(t)		(((struct pt_regs *) ((char *) (t) + IA64_STK_OFFSET)) - 1)
 # define ia64_psr(regs)			((struct ia64_psr *) &(regs)->cr_ipsr)
 # define user_mode(regs)		(((struct ia64_psr *) &(regs)->cr_ipsr)->cpl != 0)
+# define user_stack(regs)		(current->thread.on_ustack != 0)
+# define fsys_mode(regs)		(!user_mode(regs) && user_stack(regs))
 
   struct task_struct;			/* forward decl */
 
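A note on the two new ptrace.h macros above: fsys_mode() is true exactly when the interrupted context ran at privilege level 0 while the task was still on its user stacks. The following is a simplified model using plain values instead of struct pt_regs and current; every model_* name is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model: cpl stands for psr.cpl of the interrupted context,
 * on_ustack for current->thread.on_ustack.
 */
static bool model_user_mode(unsigned cpl)
{
	return cpl != 0;
}

static bool model_user_stack(unsigned char on_ustack)
{
	return on_ustack != 0;
}

static bool model_fsys_mode(unsigned cpl, unsigned char on_ustack)
{
	return !model_user_mode(cpl) && model_user_stack(on_ustack);
}

/* All four combinations; call this from a test harness or from main(). */
void model_fsys_selfcheck(void)
{
	assert(model_fsys_mode(0, 1));	/* privileged, still on user stacks */
	assert(!model_fsys_mode(3, 1));	/* ordinary user-level execution */
	assert(!model_fsys_mode(0, 0));	/* ordinary kernel execution */
	assert(!model_fsys_mode(3, 0));
}
```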


Thread overview: 3+ messages
2003-01-15  6:36 David Mosberger [this message]
2003-01-15 16:23 ` [Linux-ia64] fsyscall-support David Mosberger
2003-01-16  1:14 ` David Mosberger
