[Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

* [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction
@ 2002-12-11 12:48 Keith Owens
  2003-03-14  4:39 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0) Keith Owens
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: Keith Owens @ 2002-12-11 12:48 UTC (permalink / raw)
  To: linux-ia64

On Tue, 10 Dec 2002 21:44:12 -0700, 
Bjorn Helgaas <bjorn_helgaas@hp.com> wrote:
>The latest ia64 kernel patch for Linux 2.4.20 is available here:
>
>    ftp://ftp.kernel.org/pub/linux/kernel/ports/ia64/v2.4/linux-2.4.20-ia64-021210.diff.gz

Trivial backport of a fix from 2.5 ia64.  Prevents the kernel looping
on a zero instruction.

--- arch/ia64/kernel/traps.c.orig	Wed Dec 11 23:44:14 2002
+++ arch/ia64/kernel/traps.c	Wed Dec 11 23:45:47 2002
@@ -143,6 +143,7 @@
 
 	switch (break_num) {
 	      case 0: /* unknown error */
+		die_if_kernel("bad break", regs, break_num);
 		sig = SIGILL; code = ILL_ILLOPC;
 		break;
 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0)
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
@ 2003-03-14  4:39 ` Keith Owens
  2003-03-15  0:01 ` Bjorn Helgaas
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-14  4:39 UTC (permalink / raw)
  To: linux-ia64

Patch is against 2.4.20-ia64-021210.
The patch allows unw_access_gr() to read from r0, to support unwind
directives such as .save ar.pfs,r0 and .save rp,r0.

Index: 20.5/arch/ia64/kernel/unwind.c
--- 20.5/arch/ia64/kernel/unwind.c Sat, 14 Sep 2002 01:17:57 +1000 kaos (linux-2.4/r/c/42_unwind.c 1.1.2.1.1.2.3.1.1.1.1.2 644)
+++ 20.5(w)/arch/ia64/kernel/unwind.c Fri, 14 Mar 2003 11:58:55 +1100 kaos (linux-2.4/r/c/42_unwind.c 1.1.2.1.1.2.3.1.1.1.1.2 644)
@@ -235,6 +235,11 @@ unw_access_gr (struct unw_frame_info *in
 	struct pt_regs *pt;
 
 	if ((unsigned) regnum - 1 >= 127) {
+		if (regnum = 0 && !write) {
+			*val = 0;	/* read r0 always returns 0 */
+			*nat = 0;
+			return 0;
+		}
 		dprintk("unwind: trying to access non-existent r%u\n", regnum);
 		return -1;
 	}



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0)
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
  2003-03-14  4:39 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0) Keith Owens
@ 2003-03-15  0:01 ` Bjorn Helgaas
  2003-03-15  1:10 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code Keith Owens
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Bjorn Helgaas @ 2003-03-15  0:01 UTC (permalink / raw)
  To: linux-ia64

On Thursday 13 March 2003 9:39 pm, Keith Owens wrote:
> Patch is against 2.4.20-ia64-021210.
> The patch allows unw_access_gr() to read from r0, to support unwind
> directives such as .save ar.pfs,r0 and .save rp,r0.

I applied this in the 2.4 tree, along with the other cleanups
from David's 2.5 tree.

Bjorn



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
  2003-03-14  4:39 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0) Keith Owens
  2003-03-15  0:01 ` Bjorn Helgaas
@ 2003-03-15  1:10 ` Keith Owens
  2003-03-15  1:30 ` David Mosberger
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-15  1:10 UTC (permalink / raw)
  To: linux-ia64

Patch is against 2.4.20-ia64-021210.

Reduce the uncontended spinlock path from 5 bundles to 3 (McKinley) or
4 (Itanium), both have one less memory access on the uncontended path.
Move the contended path out of line so we can do exponential backoff,
kdb, lock metering etc. in one place.

David, closing the 1 bundle unwind window was a bit harder than I
expected, it turns out that altrp does not specify where it applies, it
is prologue global.  To get altrp to apply after mov b7=r28, I needed
multiple prologues and bodies.  AFAICT this will unwind correctly on
any instruction within ia64_spinlock_contention.  Could you verify that
on your simulator (assuming it supports .save ar.pfs,r0)?

Note 1.  The patch uses .save ar.pfs, r0 which requires the unwind code
         be updated to allow reading from r0.  You need one of these
         unwind patches as well :-

	 2.4.20           http://external-lists.valinux.com/archives/linux-ia64/2003-March/004940.html
	 2.4.21-pre5 + bk http://external-lists.valinux.com/archives/linux-ia64/2003-March/004939.html

Note 2.  For (CONFIG_MODULES=y && CONFIG_MCKINLEY=y), you need modutils
         2.4.23.  If you get this message then upgrade your modutils.

	   Unhandled relocation of type 72 for ia64_spinlock_contention

Index: 20.5/include/asm-ia64/spinlock.h
--- 20.5/include/asm-ia64/spinlock.h Fri, 01 Mar 2002 11:01:28 +1100 kaos (linux-2.4/s/28_spinlock.h 1.1.3.1.1.1.2.1 644)
+++ 20.5(w)/include/asm-ia64/spinlock.h Fri, 14 Mar 2003 16:10:55 +1100 kaos (linux-2.4/s/28_spinlock.h 1.1.3.1.1.1.2.1 644)
@@ -15,10 +15,6 @@
 #include <asm/bitops.h>
 #include <asm/atomic.h>
 
-#undef NEW_LOCK
-
-#ifdef NEW_LOCK
-
 typedef struct {
 	volatile unsigned int lock;
 } spinlock_t;
@@ -27,81 +23,73 @@ typedef struct {
 #define spin_lock_init(x)			((x)->lock = 0)
 
 /*
- * Streamlined test_and_set_bit(0, (x)).  We use test-and-test-and-set
- * rather than a simple xchg to avoid writing the cache-line when
- * there is contention.
+ * Try to get the lock.  If we fail to get the lock, branch (not call) to
+ * ia64_spinlock_contention.  We do not use call because that stamps on ar.pfs
+ * which has unwanted side effects on the routine using spin_lock().
+ *
+ * ia64_spinlock_contention is entered with :-
+ *   r28 - address of start of spin_lock code.  Used as a "return" address
+ *         from the contention path.  mov r28=ip must be in the first bundle.
+ *   r29 - available for use.
+ *   r30 - available for use.
+ *   r31 - address of lock.
+ *   b7  - available for use.
+ *   p15 - available for use.
+ *
+ * If you patch ia64_spinlock_contention to use more registers, do not forget to
+ * update the clobber lists below.
  */
-#define spin_lock(x)									\
-{											\
+
+#ifdef	CONFIG_MCKINLEY
+#define spin_lock(x) 									\
+({											\
 	register char *addr __asm__ ("r31") = (char *) &(x)->lock;			\
 											\
 	__asm__ __volatile__ (								\
-		"mov r30=1\n"								\
+		"1:\n"		/* force a new bundle, r28 points here */		\
 		"mov ar.ccv=r0\n"							\
+		"mov r28=ip\n"								\
+		"mov r30=1\n"								\
 		";;\n"									\
 		"cmpxchg4.acq r30=[%0],r30,ar.ccv\n"					\
 		";;\n"									\
 		"cmp.ne p15,p0=r30,r0\n"						\
-		"(p15) br.call.spnt.few b7=ia64_spinlock_contention\n"			\
 		";;\n"									\
-		"1:\n"				/* force a new bundle */		\
+		"(p15) brl.cond.spnt.few ia64_spinlock_contention\n"			\
+		";;\n"									\
+		"2:\n"		/* force a new bundle */				\
 		:: "r"(addr)								\
-		: "ar.ccv", "ar.pfs", "b7", "p15", "r28", "r29", "r30", "memory");	\
-}
-
-#define spin_trylock(x)									\
+		: "ar.ccv", "r28", "r29", "r30", "b7", "p15", "memory");		\
+})
+#else	/* !CONFIG_MCKINLEY */
 ({											\
-	register long result;								\
+	register char *addr __asm__ ("r31") = (char *) &(x)->lock;			\
 											\
 	__asm__ __volatile__ (								\
+		"1:\n"		/* force a new bundle, r28 points here */		\
 		"mov ar.ccv=r0\n"							\
+		"mov r28=ip\n"								\
+		"mov r30=1\n"								\
+		";;\n"									\
+		"cmpxchg4.acq r30=[%0],r30,ar.ccv\n"					\
+		"movl r29=ia64_spinlock_contention\n"					\
+		";;\n"									\
+		"cmp.ne p15,p0=r30,r0\n"						\
+		"mov b7=r29\n"								\
+		";;\n"									\
+		"(p15) br.cond.spnt.few b7\n"						\
 		";;\n"									\
-		"cmpxchg4.acq %0=[%2],%1,ar.ccv\n"					\
-		: "=r"(result) : "r"(1), "r"(&(x)->lock) : "ar.ccv", "memory");		\
-	(result = 0);									\
+		"2:\n"		/* force a new bundle */				\
+		:: "r"(addr)								\
+		: "ar.ccv", "r28", "r29", "r30", "b7", "p15", "memory");		\
 })
-
-#define spin_is_locked(x)	((x)->lock != 0)
-#define spin_unlock(x)		do { barrier(); ((spinlock_t *) x)->lock = 0;} while (0)
-#define spin_unlock_wait(x)	do { barrier(); } while ((x)->lock)
-
-#else /* !NEW_LOCK */
-
-typedef struct {
-	volatile unsigned int lock;
-} spinlock_t;
-
-#define SPIN_LOCK_UNLOCKED			(spinlock_t) { 0 }
-#define spin_lock_init(x)			((x)->lock = 0)
-
-/*
- * Streamlined test_and_set_bit(0, (x)).  We use test-and-test-and-set
- * rather than a simple xchg to avoid writing the cache-line when
- * there is contention.
- */
-#define spin_lock(x) __asm__ __volatile__ (			\
-	"mov ar.ccv = r0\n"					\
-	"mov r29 = 1\n"						\
-	";;\n"							\
-	"1:\n"							\
-	"ld4 r2 = [%0]\n"					\
-	";;\n"							\
-	"cmp4.eq p0,p7 = r0,r2\n"				\
-	"(p7) br.cond.spnt.few 1b \n"				\
-	"cmpxchg4.acq r2 = [%0], r29, ar.ccv\n"			\
-	";;\n"							\
-	"cmp4.eq p0,p7 = r0, r2\n"				\
-	"(p7) br.cond.spnt.few 1b\n"				\
-	";;\n"							\
-	:: "r"(&(x)->lock) : "ar.ccv", "p7", "r2", "r29", "memory")
+#endif	/* CONFIG_MCKINLEY */
 
 #define spin_is_locked(x)	((x)->lock != 0)
 #define spin_unlock(x)		do { barrier(); ((spinlock_t *) x)->lock = 0; } while (0)
 #define spin_trylock(x)		(cmpxchg_acq(&(x)->lock, 0, 1) = 0)
 #define spin_unlock_wait(x)	do { barrier(); } while ((x)->lock)
 
-#endif /* !NEW_LOCK */
-
 typedef struct {
 	volatile int read_counter:31;
 	volatile int write_lock:1;
Index: 20.5/arch/ia64/kernel/ia64_ksyms.c
--- 20.5/arch/ia64/kernel/ia64_ksyms.c Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/r/c/35_ia64_ksyms 1.1.3.1.3.1.1.1.1.3 644)
+++ 20.5(w)/arch/ia64/kernel/ia64_ksyms.c Fri, 14 Mar 2003 16:11:57 +1100 kaos (linux-2.4/r/c/35_ia64_ksyms 1.1.3.1.3.1.1.1.1.3 644)
@@ -165,3 +165,9 @@ EXPORT_SYMBOL(machvec_noop);
 EXPORT_SYMBOL(pfm_install_alternate_syswide_subsystem);
 EXPORT_SYMBOL(pfm_remove_alternate_syswide_subsystem);
 #endif
+
+/* Spinlock contention path is entered via direct branch, not using a function
+ * pointer.  Fudge the declaration so we do not generate a function descriptor.
+ */
+extern char ia64_spinlock_contention[];
+EXPORT_SYMBOL_NOVERS(ia64_spinlock_contention);
Index: 20.5/arch/ia64/kernel/head.S
--- 20.5/arch/ia64/kernel/head.S Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/s/c/11_head.S 1.1.4.1.3.1.1.1.1.3 644)
+++ 20.5(w)/arch/ia64/kernel/head.S Fri, 14 Mar 2003 16:14:06 +1100 kaos (linux-2.4/s/c/11_head.S 1.1.4.1.3.1.1.1.1.3 644)
@@ -742,69 +742,62 @@ SET_REG(b5);
 #ifdef CONFIG_SMP
 
 	/*
-	 * This routine handles spinlock contention.  It uses a simple exponential backoff
-	 * algorithm to reduce unnecessary bus traffic.  The initial delay is selected from
-	 * the low-order bits of the cycle counter (a cheap "randomizer").  I'm sure this
-	 * could use additional tuning, especially on systems with a large number of CPUs.
-	 * Also, I think the maximum delay should be made a function of the number of CPUs in
-	 * the system. --davidm 00/08/05
+	 * This routine handles spinlock contention, using non-standard entry
+	 * conventions.  To avoid converting leaf routines into non-leaf, the
+	 * inline spin_lock() code uses br.cond (not br.call) to enter this
+	 * code.  r28 contains the start of the inline spin_lock() code.
 	 *
-	 * WARNING: This is not a normal procedure.  It gets called from C code without
-	 * the compiler knowing about it.  Thus, we must not use any scratch registers
-	 * beyond those that were declared "clobbered" at the call-site (see spin_lock()
-	 * macro).  We may not even use the stacked registers, because that could overwrite
-	 * output registers.  Similarly, we can't use the scratch stack area as it may be
-	 * in use, too.
+	 * Do not use gp relative variables, this code is called from the kernel
+	 * and from modules, r1 is undefined.  Do not use stacked registers, the
+	 * caller owns them.  Do not use the scratch stack space, the caller
+	 * owns it.  Do not change ar.pfs, the caller owns it.
 	 *
 	 * Inputs:
-	 *	ar.ccv = 0 (and available for use)
-	 *	r28 = available for use
-	 *	r29 = available for use
-	 *	r30 = non-zero (and available for use)
-	 *	r31 = address of lock we're trying to acquire
-	 *	p15 = available for use
+	 *   ar.ccv - 0 (and available for use)
+	 *   r12    - kernel stack pointer, but see above.
+	 *   r13    - current process.
+	 *   r28    - address of start of spin_lock code.  Used as a "return"
+	 *            address from this contention path.  Available for use
+	 *            after it has been saved.
+	 *   r29    - available for use.
+	 *   r30    - available for use.
+	 *   r31    - address of lock.
+	 *   b7     - available for use.
+	 *   p15    - available for use.
+	 *   Rest   - caller's state, do not use, especially ar.pfs.
+	 *
+	 * If you patch this code to use more registers, do not forget to update
+	 * the clobber lists for spin_lock() in include/asm-ia64/spinlock.h.
 	 */
 
-#	define delay	r28
-#	define timeout	r29
-#	define tmp	r30
-
 GLOBAL_ENTRY(ia64_spinlock_contention)
-	mov tmp=ar.itc
-	;;
-	and delay=0x3f,tmp
-	;;
-
-.retry:	add timeout=tmp,delay
-	shl delayÞlay,1
-	;;
-	dep delayÞlay,r0,0,13	// limit delay to 8192 cycles
-	;;
-	// delay a little...
-.wait:	sub tmp=tmp,timeout
-	or delay=0xf,delay	// make sure delay is non-zero (otherwise we get stuck with 0)
-	;;
-	cmp.lt p15,p0=tmp,r0
-	mov tmp=ar.itc
-(p15)	br.cond.sptk .wait
-	;;
-	ld4 tmp=[r31]
-	;;
-	cmp.ne p15,p0=tmp,r0
-	mov tmp=ar.itc
-(p15)	br.cond.sptk .retry	// lock is still busy
-	;;
-	// try acquiring lock (we know ar.ccv is still zero!):
-	mov tmp=1
-	;;
-	cmpxchg4.acq tmp=[r31],tmp,ar.ccv
-	;;
-	cmp.eq p15,p0=tmp,r0
-
-	mov tmp=ar.itc
-(p15)	br.ret.sptk.many b7	// got lock -> return
-	br .retry		// still no luck, retry
+	// To get decent unwind data, lie about our state
+	.prologue
+	.save ar.pfs, r0	// this code effectively has a zero frame size
+	.body
+	.label_state 1
+
+	.prologue
+	.save rp, r28		// r28 = my "return" address
+	.body
+	mov b7=r28
+	.body
+	.copy_state 1
+
+	.prologue
+	.altrp b7		// altrp has no 'when' field, it needs its own prologue
+	.body
+
+.retry:
+	// exponential backoff, kdb, lockmeter etc. go in here
+	//
+	;;
+	ld4 r28=[r31]
+	;;
+	cmp4.eq p15,p0=r28,r0
+(p15)	br.cond.spnt.few b7	// lock is now free, try to acquire
+	br.cond.sptk.few .retry
 
 END(ia64_spinlock_contention)
 
-#endif
+#endif	// CONFIG_SMP




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (2 preceding siblings ...)
  2003-03-15  1:10 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code Keith Owens
@ 2003-03-15  1:30 ` David Mosberger
  2003-03-15  2:36 ` Keith Owens
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-15  1:30 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Sat, 15 Mar 2003 12:10:38 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> Patch is against 2.4.20-ia64-021210.  Reduce the uncontended
  Keith> spinlock path from 5 bundles to 3 (McKinley) or 4 (Itanium),
  Keith> both have one less memory access on the uncontended path.
  Keith> Move the contended path out of line so we can do exponential
  Keith> backoff, kdb, lock metering etc. in one place.

  Keith> David, closing the 1 bundle unwind window was a bit harder
  Keith> than I expected, it turns out that altrp does not specify
  Keith> where it applies, it is prologue global.  To get altrp to
  Keith> apply after mov b7=r28, I needed multiple prologues and
  Keith> bodies.  AFAICT this will unwind correctly on any instruction
  Keith> within ia64_spinlock_contention.  Could you verify that on
  Keith> your simulator (assuming it supports .save ar.pfs,r0)?

I think it's correct, but more complicated than it has to be: you
should be able to use the general directive ".spillreg rp, b7" instead
of ".altrp".

Another suggestion: use #ifndef CONFIG_ITANIUM instead of #ifdef
CONFIG_MCKINLEY.  This ensures that brl will be used on future CPUs as
well (as it should be).

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (3 preceding siblings ...)
  2003-03-15  1:30 ` David Mosberger
@ 2003-03-15  2:36 ` Keith Owens
  2003-03-15  2:40 ` Keith Owens
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-15  2:36 UTC (permalink / raw)
  To: linux-ia64

On Fri, 14 Mar 2003 17:30:01 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>I think it's correct, but more complicated than it has to be: you
>should be able to use the general directive ".spillreg rp, b7" instead
>of ".altrp".

Done ...

>Another suggestion: use #ifndef CONFIG_ITANIUM instead of #ifdef
>CONFIG_MCKINLEY.  This ensures that brl will be used on future CPUs as
>well (as it should be).

... and done.  Take 2.

Patch is against 2.4.20-ia64-021210.

Reduce the uncontended spinlock path from 5 bundles to 3 (McKinley) or
4 (Itanium), both have one less memory access on the uncontended path.
Move the contended path out of line so we can do exponential backoff,
kdb, lock metering etc. in one place.

David, closing the 1 bundle unwind window was a bit harder than I
expected, it turns out that altrp does not specify where it applies, it
is prologue global.  To get altrp to apply after mov b7=r28, I needed
multiple prologues and bodies.  AFAICT this will unwind correctly on
any instruction within ia64_spinlock_contention.  Could you verify that
on your simulator (assuming it supports .save ar.pfs,r0)?

Note 1.  The patch uses .save ar.pfs, r0 which requires the unwind code
         be updated to allow reading from r0.  You need one of these
         unwind patches as well :-

	 2.4.20           http://external-lists.valinux.com/archives/linux-ia64/2003-March/004940.html
	 2.4.21-pre5 + bk http://external-lists.valinux.com/archives/linux-ia64/2003-March/004939.html

Note 2.  For (CONFIG_MODULES=y && CONFIG_MCKINLEY=y), you need modutils
         2.4.23.  If you get this message then upgrade your modutils.

	   Unhandled relocation of type 72 for ia64_spinlock_contention

Index: 20.5/include/asm-ia64/spinlock.h
--- 20.5/include/asm-ia64/spinlock.h Fri, 01 Mar 2002 11:01:28 +1100 kaos (linux-2.4/s/28_spinlock.h 1.1.3.1.1.1.2.1 644)
+++ 20.5(w)/include/asm-ia64/spinlock.h Sat, 15 Mar 2003 12:45:54 +1100 kaos (linux-2.4/s/28_spinlock.h 1.1.3.1.1.1.2.1 644)
@@ -15,10 +15,6 @@
 #include <asm/bitops.h>
 #include <asm/atomic.h>
 
-#undef NEW_LOCK
-
-#ifdef NEW_LOCK
-
 typedef struct {
 	volatile unsigned int lock;
 } spinlock_t;
@@ -27,81 +23,74 @@ typedef struct {
 #define spin_lock_init(x)			((x)->lock = 0)
 
 /*
- * Streamlined test_and_set_bit(0, (x)).  We use test-and-test-and-set
- * rather than a simple xchg to avoid writing the cache-line when
- * there is contention.
+ * Try to get the lock.  If we fail to get the lock, branch (not call) to
+ * ia64_spinlock_contention.  We do not use call because that stamps on ar.pfs
+ * which has unwanted side effects on the routine using spin_lock().
+ *
+ * ia64_spinlock_contention is entered with :-
+ *   r28 - address of start of spin_lock code.  Used as a "return" address
+ *         from the contention path.  mov r28=ip must be in the first bundle.
+ *   r29 - available for use.
+ *   r30 - available for use.
+ *   r31 - address of lock.
+ *   b7  - available for use.
+ *   p15 - available for use.
+ *
+ * If you patch ia64_spinlock_contention to use more registers, do not forget to
+ * update the clobber lists below.
  */
-#define spin_lock(x)									\
-{											\
+
+#ifdef	CONFIG_ITANIUM
+#define spin_lock(x) 									\
+({											\
 	register char *addr __asm__ ("r31") = (char *) &(x)->lock;			\
 											\
 	__asm__ __volatile__ (								\
-		"mov r30=1\n"								\
+		"1:\n"		/* force a new bundle, r28 points here */		\
 		"mov ar.ccv=r0\n"							\
+		"mov r28=ip\n"								\
+		"mov r30=1\n"								\
 		";;\n"									\
 		"cmpxchg4.acq r30=[%0],r30,ar.ccv\n"					\
+		"movl r29=ia64_spinlock_contention\n"					\
 		";;\n"									\
 		"cmp.ne p15,p0=r30,r0\n"						\
-		"(p15) br.call.spnt.few b7=ia64_spinlock_contention\n"			\
+		"mov b7=r29\n"								\
 		";;\n"									\
-		"1:\n"				/* force a new bundle */		\
+		"(p15) br.cond.spnt.few b7\n"						\
+		";;\n"									\
+		"2:\n"		/* force a new bundle */				\
 		:: "r"(addr)								\
-		: "ar.ccv", "ar.pfs", "b7", "p15", "r28", "r29", "r30", "memory");	\
-}
-
-#define spin_trylock(x)									\
+		: "ar.ccv", "r28", "r29", "r30", "b7", "p15", "memory");		\
+})
+#else	/* !CONFIG_ITANIUM */
+#define spin_lock(x) 									\
 ({											\
-	register long result;								\
+	register char *addr __asm__ ("r31") = (char *) &(x)->lock;			\
 											\
 	__asm__ __volatile__ (								\
+		"1:\n"		/* force a new bundle, r28 points here */		\
 		"mov ar.ccv=r0\n"							\
+		"mov r28=ip\n"								\
+		"mov r30=1\n"								\
+		";;\n"									\
+		"cmpxchg4.acq r30=[%0],r30,ar.ccv\n"					\
+		";;\n"									\
+		"cmp.ne p15,p0=r30,r0\n"						\
+		";;\n"									\
+		"(p15) brl.cond.spnt.few ia64_spinlock_contention\n"			\
 		";;\n"									\
-		"cmpxchg4.acq %0=[%2],%1,ar.ccv\n"					\
-		: "=r"(result) : "r"(1), "r"(&(x)->lock) : "ar.ccv", "memory");		\
-	(result = 0);									\
+		"2:\n"		/* force a new bundle */				\
+		:: "r"(addr)								\
+		: "ar.ccv", "r28", "r29", "r30", "b7", "p15", "memory");		\
 })
-
-#define spin_is_locked(x)	((x)->lock != 0)
-#define spin_unlock(x)		do { barrier(); ((spinlock_t *) x)->lock = 0;} while (0)
-#define spin_unlock_wait(x)	do { barrier(); } while ((x)->lock)
-
-#else /* !NEW_LOCK */
-
-typedef struct {
-	volatile unsigned int lock;
-} spinlock_t;
-
-#define SPIN_LOCK_UNLOCKED			(spinlock_t) { 0 }
-#define spin_lock_init(x)			((x)->lock = 0)
-
-/*
- * Streamlined test_and_set_bit(0, (x)).  We use test-and-test-and-set
- * rather than a simple xchg to avoid writing the cache-line when
- * there is contention.
- */
-#define spin_lock(x) __asm__ __volatile__ (			\
-	"mov ar.ccv = r0\n"					\
-	"mov r29 = 1\n"						\
-	";;\n"							\
-	"1:\n"							\
-	"ld4 r2 = [%0]\n"					\
-	";;\n"							\
-	"cmp4.eq p0,p7 = r0,r2\n"				\
-	"(p7) br.cond.spnt.few 1b \n"				\
-	"cmpxchg4.acq r2 = [%0], r29, ar.ccv\n"			\
-	";;\n"							\
-	"cmp4.eq p0,p7 = r0, r2\n"				\
-	"(p7) br.cond.spnt.few 1b\n"				\
-	";;\n"							\
-	:: "r"(&(x)->lock) : "ar.ccv", "p7", "r2", "r29", "memory")
+#endif	/* CONFIG_ITANIUM */
 
 #define spin_is_locked(x)	((x)->lock != 0)
 #define spin_unlock(x)		do { barrier(); ((spinlock_t *) x)->lock = 0; } while (0)
 #define spin_trylock(x)		(cmpxchg_acq(&(x)->lock, 0, 1) = 0)
 #define spin_unlock_wait(x)	do { barrier(); } while ((x)->lock)
 
-#endif /* !NEW_LOCK */
-
 typedef struct {
 	volatile int read_counter:31;
 	volatile int write_lock:1;
Index: 20.5/arch/ia64/kernel/ia64_ksyms.c
--- 20.5/arch/ia64/kernel/ia64_ksyms.c Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/r/c/35_ia64_ksyms 1.1.3.1.3.1.1.1.1.3 644)
+++ 20.5(w)/arch/ia64/kernel/ia64_ksyms.c Fri, 14 Mar 2003 16:11:57 +1100 kaos (linux-2.4/r/c/35_ia64_ksyms 1.1.3.1.3.1.1.1.1.3 644)
@@ -165,3 +165,9 @@ EXPORT_SYMBOL(machvec_noop);
 EXPORT_SYMBOL(pfm_install_alternate_syswide_subsystem);
 EXPORT_SYMBOL(pfm_remove_alternate_syswide_subsystem);
 #endif
+
+/* Spinlock contention path is entered via direct branch, not using a function
+ * pointer.  Fudge the declaration so we do not generate a function descriptor.
+ */
+extern char ia64_spinlock_contention[];
+EXPORT_SYMBOL_NOVERS(ia64_spinlock_contention);
Index: 20.5/arch/ia64/kernel/head.S
--- 20.5/arch/ia64/kernel/head.S Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/s/c/11_head.S 1.1.4.1.3.1.1.1.1.3 644)
+++ 20.5(w)/arch/ia64/kernel/head.S Sat, 15 Mar 2003 12:47:57 +1100 kaos (linux-2.4/s/c/11_head.S 1.1.4.1.3.1.1.1.1.3 644)
@@ -742,69 +742,53 @@ SET_REG(b5);
 #ifdef CONFIG_SMP
 
 	/*
-	 * This routine handles spinlock contention.  It uses a simple exponential backoff
-	 * algorithm to reduce unnecessary bus traffic.  The initial delay is selected from
-	 * the low-order bits of the cycle counter (a cheap "randomizer").  I'm sure this
-	 * could use additional tuning, especially on systems with a large number of CPUs.
-	 * Also, I think the maximum delay should be made a function of the number of CPUs in
-	 * the system. --davidm 00/08/05
+	 * This routine handles spinlock contention, using non-standard entry
+	 * conventions.  To avoid converting leaf routines into non-leaf, the
+	 * inline spin_lock() code uses br.cond (not br.call) to enter this
+	 * code.  r28 contains the start of the inline spin_lock() code.
 	 *
-	 * WARNING: This is not a normal procedure.  It gets called from C code without
-	 * the compiler knowing about it.  Thus, we must not use any scratch registers
-	 * beyond those that were declared "clobbered" at the call-site (see spin_lock()
-	 * macro).  We may not even use the stacked registers, because that could overwrite
-	 * output registers.  Similarly, we can't use the scratch stack area as it may be
-	 * in use, too.
+	 * Do not use gp relative variables, this code is called from the kernel
+	 * and from modules, r1 is undefined.  Do not use stacked registers, the
+	 * caller owns them.  Do not use the scratch stack space, the caller
+	 * owns it.  Do not change ar.pfs, the caller owns it.
 	 *
 	 * Inputs:
-	 *	ar.ccv = 0 (and available for use)
-	 *	r28 = available for use
-	 *	r29 = available for use
-	 *	r30 = non-zero (and available for use)
-	 *	r31 = address of lock we're trying to acquire
-	 *	p15 = available for use
+	 *   ar.ccv - 0 (and available for use)
+	 *   r12    - kernel stack pointer, but see above.
+	 *   r13    - current process.
+	 *   r28    - address of start of spin_lock code.  Used as a "return"
+	 *            address from this contention path.  Available for use
+	 *            after it has been saved.
+	 *   r29    - available for use.
+	 *   r30    - available for use.
+	 *   r31    - address of lock.
+	 *   b7     - available for use.
+	 *   p15    - available for use.
+	 *   Rest   - caller's state, do not use, especially ar.pfs.
+	 *
+	 * If you patch this code to use more registers, do not forget to update
+	 * the clobber lists for spin_lock() in include/asm-ia64/spinlock.h.
 	 */
 
-#	define delay	r28
-#	define timeout	r29
-#	define tmp	r30
-
 GLOBAL_ENTRY(ia64_spinlock_contention)
-	mov tmp=ar.itc
-	;;
-	and delay=0x3f,tmp
-	;;
-
-.retry:	add timeout=tmp,delay
-	shl delayÞlay,1
-	;;
-	dep delayÞlay,r0,0,13	// limit delay to 8192 cycles
-	;;
-	// delay a little...
-.wait:	sub tmp=tmp,timeout
-	or delay=0xf,delay	// make sure delay is non-zero (otherwise we get stuck with 0)
-	;;
-	cmp.lt p15,p0=tmp,r0
-	mov tmp=ar.itc
-(p15)	br.cond.sptk .wait
-	;;
-	ld4 tmp=[r31]
-	;;
-	cmp.ne p15,p0=tmp,r0
-	mov tmp=ar.itc
-(p15)	br.cond.sptk .retry	// lock is still busy
-	;;
-	// try acquiring lock (we know ar.ccv is still zero!):
-	mov tmp=1
-	;;
-	cmpxchg4.acq tmp=[r31],tmp,ar.ccv
-	;;
-	cmp.eq p15,p0=tmp,r0
-
-	mov tmp=ar.itc
-(p15)	br.ret.sptk.many b7	// got lock -> return
-	br .retry		// still no luck, retry
+	// To get decent unwind data, lie about our state
+	.prologue
+	.save ar.pfs, r0	// this code effectively has a zero frame size
+	.spillreg rp, r28
+	mov b7=r28
+	.spillreg rp, b7
+	.body
+
+.retry:
+	// exponential backoff, kdb, lockmeter etc. go in here
+	//
+	;;
+	ld4 r28=[r31]
+	;;
+	cmp4.eq p15,p0=r28,r0
+(p15)	br.cond.spnt.few b7	// lock is now free, try to acquire
+	br.cond.sptk.few .retry
 
 END(ia64_spinlock_contention)
 
-#endif
+#endif	// CONFIG_SMP




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (4 preceding siblings ...)
  2003-03-15  2:36 ` Keith Owens
@ 2003-03-15  2:40 ` Keith Owens
  2003-03-15  6:46 ` David Mosberger
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-15  2:40 UTC (permalink / raw)
  To: linux-ia64

On Sat, 15 Mar 2003 13:36:01 +1100, 
Keith Owens <kaos@sgi.com> wrote:
>David, closing the 1 bundle unwind window was a bit harder than I
>expected, it turns out that altrp does not specify where it applies, it
>is prologue global.  To get altrp to apply after mov b7=r28, I needed
>multiple prologues and bodies.  AFAICT this will unwind correctly on
>any instruction within ia64_spinlock_contention.  Could you verify that
>on your simulator (assuming it supports .save ar.pfs,r0)?

Damned cut and paste key!  Ignore that bit.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (5 preceding siblings ...)
  2003-03-15  2:40 ` Keith Owens
@ 2003-03-15  6:46 ` David Mosberger
  2003-03-15 10:31 ` Keith Owens
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-15  6:46 UTC (permalink / raw)
  To: linux-ia64

I thought about it some more and recalled why I was so uneasy about
claiming ar.pfs is 0: the problem is that this informs that the
_previous_ register frame was empty, not the current one.  So the
unwind info technically is still wrong.  I think you realize that, and
the kernel unwinder won't complain, since it's not paranoid about
validating accesses to stacked registers.  But still, the unwind info
is wrong and I'm not terribly comfortable with that.

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (6 preceding siblings ...)
  2003-03-15  6:46 ` David Mosberger
@ 2003-03-15 10:31 ` Keith Owens
  2003-03-27 20:29 ` David Mosberger
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-15 10:31 UTC (permalink / raw)
  To: linux-ia64

On Fri, 14 Mar 2003 22:46:28 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>I thought about it some more and recalled why I was so uneasy about
>claiming ar.pfs is 0: the problem is that this informs that the
>_previous_ register frame was empty, not the current one.  So the
>unwind info technically is still wrong.  I think you realize that, and
>the kernel unwinder won't complain, since it's not paranoid about
>validating accesses to stacked registers.  But still, the unwind info
>is wrong and I'm not terribly comfortable with that.

I agree, but the end result is benign.  Unwind will attribute the
previous frame's registers to the out of line contention code and will
attribute zero registers to the code that invoked spin_lock(), IOW the
argument lists are swapped on the top two unwind entries.  After that
the unwind is in sync and all other registers are right.

Unwind needs a way of saying "this is out of line code, not a function,
and its state is the same as this ip".  But without that feature in the
unwind spec, this is probably the best that we can do.  It is a pity
that unwind thinks that everything is a function and did not consider
out of line code.

How about putting the new spinlock code in now so I can continue with
adding kdb support for debugging hung spinlocks?  Even with the swapped
arg list, any debug data on hung spinlocks is better than none at all.
I will think some more about the unwind descriptors to see if there is
any way of avoiding the misattribution of the register usage, but the
worst case is that we live with the swapped argument list.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (7 preceding siblings ...)
  2003-03-15 10:31 ` Keith Owens
@ 2003-03-27 20:29 ` David Mosberger
  2003-03-27 23:15 ` Keith Owens
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-27 20:29 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Sat, 15 Mar 2003 21:31:53 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> On Fri, 14 Mar 2003 22:46:28 -0800, 
  Keith> David Mosberger <davidm@napali.hpl.hp.com> wrote:
  >> I thought about it some more and recalled why I was so uneasy about
  >> claiming ar.pfs is 0: the problem is that this informs that the
  >> _previous_ register frame was empty, not the current one.  So the
  >> unwind info technically is still wrong.  I think you realize that, and
  >> the kernel unwinder won't complain, since it's not paranoid about
  >> validating accesses to stacked registers.  But still, the unwind info
  >> is wrong and I'm not terribly comfortable with that.

  Keith> I agree, but the end result is benign.

I disagree.  A bug is a bug.  Relying on implementation-specific
behavior of one particular unwinder doesn't change that.

  Keith> Unwind needs a way of saying "this is out of line code, not a
  Keith> function, and its state is the same as this ip".  But without
  Keith> that feature in the unwind spec, this is probably the best
  Keith> that we can do.  It is a pity that unwind thinks that
  Keith> everything is a function and did not consider out of line
  Keith> code.

  Keith> How about putting the new spinlock code in now so I can
  Keith> continue with adding kdb support for debugging hung
  Keith> spinlocks?  Even with the swapped arg list, any debug data on
  Keith> hung spinlocks is better than none at all.  I will think some
  Keith> more about the unwind descriptors to see if there is any way
  Keith> of avoiding the misattribution of the register usage, but the
  Keith> worst case is that we live with the swapped argument list.

My experience tells me that if I put in the code now, nobody will work
on a corrected version.

I think it makes sense to start a discussion of extending the unwind
spec to make it easier to accommodate what we're trying to do here.  A
similar facility already exists in libunwind for dynamic unwind info
(since runtime function cloning naturally leads to the same issue).

Can you start this discussion?

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (8 preceding siblings ...)
  2003-03-27 20:29 ` David Mosberger
@ 2003-03-27 23:15 ` Keith Owens
  2003-03-27 23:32 ` David Mosberger
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-27 23:15 UTC (permalink / raw)
  To: linux-ia64

On Thu, 27 Mar 2003 12:29:04 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Sat, 15 Mar 2003 21:31:53 +1100, Keith Owens <kaos@sgi.com> said:
>
>  Keith> On Fri, 14 Mar 2003 22:46:28 -0800, 
>  Keith> David Mosberger <davidm@napali.hpl.hp.com> wrote:
>  >> I thought about it some more and recalled why I was so uneasy about
>  >> claiming ar.pfs is 0: the problem is that this informs that the
>  >> _previous_ register frame was empty, not the current one.  So the
>  >> unwind info technically is still wrong.  I think you realize that, and
>  >> the kernel unwinder won't complain, since it's not paranoid about
>  >> validating accesses to stacked registers.  But still, the unwind info
>  >> is wrong and I'm not terribly comfortable with that.
>
>  Keith> I agree, but the end result is benign.
>
>I disagree.  A bug is a bug.  Relying on implementation-specific
>behavior of one particular unwinder doesn't change that.

The code does not rely on any implementation specific behaviour.
Stating that ar.pfs is zero is well defined, it means that the caller
(rp in r28) of this code has no frame.  Therefore the current cfm is
attributed to the contention code.  No matter how you read the unwind
spec, that construct is well defined and assigns the correct number of
registers to the _sum_ of the main line code and the contention path.
Therefore unwind through the mainline code will work correctly.

>  Keith> How about putting the new spinlock code in now so I can
>  Keith> continue with adding kdb support for debugging hung
>  Keith> spinlocks?  Even with the swapped arg list, any debug data on
>  Keith> hung spinlocks is better than none at all.  I will think some
>  Keith> more about the unwind descriptors to see if there is any way
>  Keith> of avoiding the misattribution of the register usage, but the
>  Keith> worst case is that we live with the swapped argument list.
>
>My experience tells me that if I put in the code now, nobody will work
>on a corrected version.
>
>I think it makes sense to start a discussion of extending the unwind
>spec to make it easier to accommodate what we're trying to do here.  A
>similar facility already exists in libunwind for dynamic unwind info
>(since runtime function cloning naturally leads to the same issue).

They are superficially similar but the unwind implementation would be
quite different.  Function cloning results in a call structure which
modifies ar.pfs plus a copy of the complete prologue and body of the
function.  That cannot be handled by a single "my state is the same as
that one over there" pointer.  It needs an unwind table entry for the
length and address of the new function that points to the info block
for the function it was cloned from.  Adding an unwind table entry does
not require any change to the unwind spec, it just needs the
implementation to support multiple unwind tables, which the spec
already allows.

Out of line code entered via a direct branch needs an unwind construct
that says ar.pfs is not changed by unwinding through the return
pointer.  IOW, this is _not_ a call structure, even through it has a
return pointer.  This requires a new unwind descriptor type, all the
existing unwind descriptors implicitly assume a call structure.  The
ideal would be a copy_state descriptor that took a register name
instead of a numbered label, i.e. like B4 but taking treg instead of
label.

It is the difference between unwind data that replicates an entire
function (and requires all the unwind info for that function) and
unwind data for a single return point.

>Can you start this discussion?

I can start it, but it will take months to get agreement on the change
to the unwind spec, followed by more time for the ia64 assemblers to be
upgraded to handle the new unwind descriptor and more time for users to
upgrade to the new binutils before the kernel can use any new
construct.  I want to get debugging working for hung ia64 spinlocks
this month, not in a year's time.

BTW, the new spinlock code with the out of line contention code is
faster.

David, you added the NEW_LOCK code even though it never worked and
could never work.  But when I supply code that works, is faster, allows
for better debugging and performance monitoring you quibble about one
construct to get the unwind data right.  I do not understand your
priorities here.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (9 preceding siblings ...)
  2003-03-27 23:15 ` Keith Owens
@ 2003-03-27 23:32 ` David Mosberger
  2003-03-28  1:39 ` Keith Owens
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-27 23:32 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 28 Mar 2003 10:15:02 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> The code does not rely on any implementation specific
  Keith> behaviour.  Stating that ar.pfs is zero is well defined, it
  Keith> means that the caller (rp in r28) of this code has no frame.

No, an unwinder might check whether a stacked register is out of the
current frame and complain if so.  Ergo, it's implementation-dependent
behavior.

  >> Can you start this discussion?

  Keith> I can start it, but it will take months to get agreement on
  Keith> the change to the unwind spec, followed by more time for the
  Keith> ia64 assemblers to be upgraded to handle the new unwind
  Keith> descriptor and more time for users to upgrade to the new
  Keith> binutils before the kernel can use any new construct.  I want
  Keith> to get debugging working for hung ia64 spinlocks this month,
  Keith> not in a year's time.

We don't have to wait until all the details are settled.  What's
important is that there is a general agreement that the code in
question needs to be accommodated.

  Keith> David, you added the NEW_LOCK code even though it never
  Keith> worked and could never work.  But when I supply code that
  Keith> works, is faster, allows for better debugging and performance
  Keith> monitoring you quibble about one construct to get the unwind
  Keith> data right.  I do not understand your priorities here.

Want to guess why the NEW_LOCK code was never enabled?  If you want to
add the code with an #if 0, that's fine with me.

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (10 preceding siblings ...)
  2003-03-27 23:32 ` David Mosberger
@ 2003-03-28  1:39 ` Keith Owens
  2003-03-28  1:45 ` David Mosberger
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-28  1:39 UTC (permalink / raw)
  To: linux-ia64

On Thu, 27 Mar 2003 15:32:52 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Fri, 28 Mar 2003 10:15:02 +1100, Keith Owens <kaos@sgi.com> said:
>
>  Keith> The code does not rely on any implementation specific
>  Keith> behaviour.  Stating that ar.pfs is zero is well defined, it
>  Keith> means that the caller (rp in r28) of this code has no frame.
>
>No, an unwinder might check whether a stacked register is out of the
>current frame and complain if so.  Ergo, it's implementation-dependent
>behavior.

The only unwinder I care about is in the kernel and that handles this
condition.  libunwind also accepts this condition.  Both unwinders
check that the stacked register is within the current backing store,
not within the current frame.  You are rejecting working, faster and
better debugging code because some hypothetical unwinder might detect
an error one day.  Even if this hypothetical unwinder did detect an
error, it would simply stop the unwind.

>Want to guess why the NEW_LOCK code was never enabled?  If you want to
>add the code with an #if 0, that's fine with me.

The new spinlock code is faster, shrinks the kernel, allows for
exponential backoff and better debugging.  It should be the default in
all kernels, with an option to disable it for people with strange
unwinders.

Disable out of line spinlock code
CONFIG_IA64_DISABLE_OOL_SPINLOCK
  By default, IA64 uses out of line code for the spinlock contention
  path.  This shrinks the size of the kernel, makes uncontended locks
  faster and allows for exponential backoff and debugging on contended
  locks.

  It uses an unusual unwind mechanism that may not work if you use an
  unwinder other than the one in the kernel.  Even if your unwinder
  does not work, the only effect will be an incomplete unwind for code
  that is stuck in the spinlock contention path, this has no effect on
  the kernel itself.

  If you know that you have an IA64 unwinder that cannot cope with the
  out of line spinlock code, say Y and live with a kernel that is
  larger, slower and cannot debug hung spinlocks.  Otherwise say N.

Acceptable?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (11 preceding siblings ...)
  2003-03-28  1:39 ` Keith Owens
@ 2003-03-28  1:45 ` David Mosberger
  2003-03-28  1:49 ` Keith Owens
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-28  1:45 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 28 Mar 2003 12:39:14 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> Acceptable?

No.  Look, I have been spending lots of time tracking down and
squashing unwind bugs over the last year or so.  I'm simply not going
to accept any new code that knowingly violates unwind conventions.
Period.

I'm willing to accept your patch _after_ the issue has been brought up
with the ABI committee and there is a concensus that something needs
to be done about it.  This shouldn't take more than a couple of days
or, at most, weeks.  Plenty of time to get it resolved before 2.6
materializes.

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (12 preceding siblings ...)
  2003-03-28  1:45 ` David Mosberger
@ 2003-03-28  1:49 ` Keith Owens
  2003-03-28  1:53 ` David Mosberger
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-28  1:49 UTC (permalink / raw)
  To: linux-ia64

On Thu, 27 Mar 2003 17:45:27 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Fri, 28 Mar 2003 12:39:14 +1100, Keith Owens <kaos@sgi.com> said:
>
>  Keith> Acceptable?
>
>No.  Look, I have been spending lots of time tracking down and
>squashing unwind bugs over the last year or so.  I'm simply not going
>to accept any new code that knowingly violates unwind conventions.
>Period.
>
>I'm willing to accept your patch _after_ the issue has been brought up
>with the ABI committee and there is a concensus that something needs
>to be done about it.  This shouldn't take more than a couple of days
>or, at most, weeks.  Plenty of time to get it resolved before 2.6
>materializes.

I am giving up on getting this patch into the community.  It will be
applied to SGI ia64 kernels so they will be faster, smaller and have
better debugging.  Community kernels will have to stay with the old
code and inferior debugging for hung spinlocks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (13 preceding siblings ...)
  2003-03-28  1:49 ` Keith Owens
@ 2003-03-28  1:53 ` David Mosberger
  2003-03-28  2:10 ` Keith Owens
  2003-03-28  2:14 ` David Mosberger
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-28  1:53 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 28 Mar 2003 12:49:33 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> I am giving up on getting this patch into the community.  It will be
  Keith> applied to SGI ia64 kernels so they will be faster, smaller and have
  Keith> better debugging.  Community kernels will have to stay with the old
  Keith> code and inferior debugging for hung spinlocks.

Well, that's your right.  But please explain to me, just why is it so
hard to bring up the issue with the ABI folks?  That would have taken
less time than to argueing a moot point with me.

	--david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (14 preceding siblings ...)
  2003-03-28  1:53 ` David Mosberger
@ 2003-03-28  2:10 ` Keith Owens
  2003-03-28  2:14 ` David Mosberger
  16 siblings, 0 replies; 18+ messages in thread
From: Keith Owens @ 2003-03-28  2:10 UTC (permalink / raw)
  To: linux-ia64

On Thu, 27 Mar 2003 17:53:05 -0800, 
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Fri, 28 Mar 2003 12:49:33 +1100, Keith Owens <kaos@sgi.com> said:
>
>  Keith> I am giving up on getting this patch into the community.  It will be
>  Keith> applied to SGI ia64 kernels so they will be faster, smaller and have
>  Keith> better debugging.  Community kernels will have to stay with the old
>  Keith> code and inferior debugging for hung spinlocks.
>
>Well, that's your right.  But please explain to me, just why is it so
>hard to bring up the issue with the ABI folks?  That would have taken
>less time than to argueing a moot point with me.

Because I want to get on with kdb now!  Now after waiting for the ABI
folks to get up to speed.  I am quite happy to talk to the ABI folks
but not if it means delaying my improvements to kdb.  Put the code in
and I will talk to the ABI folks, otherwise I am not going to bother.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code
  2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
                   ` (15 preceding siblings ...)
  2003-03-28  2:10 ` Keith Owens
@ 2003-03-28  2:14 ` David Mosberger
  16 siblings, 0 replies; 18+ messages in thread
From: David Mosberger @ 2003-03-28  2:14 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 28 Mar 2003 13:10:25 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> On Thu, 27 Mar 2003 17:53:05 -0800, 
  Keith> David Mosberger <davidm@napali.hpl.hp.com> wrote:
  >>>>>>> On Fri, 28 Mar 2003 12:49:33 +1100, Keith Owens <kaos@sgi.com> said:
  >> 
  Keith> I am giving up on getting this patch into the community.  It will be
  Keith> applied to SGI ia64 kernels so they will be faster, smaller and have
  Keith> better debugging.  Community kernels will have to stay with the old
  Keith> code and inferior debugging for hung spinlocks.
  >> 
  >> Well, that's your right.  But please explain to me, just why is it so
  >> hard to bring up the issue with the ABI folks?  That would have taken
  >> less time than to argueing a moot point with me.

  Keith> Because I want to get on with kdb now!  Now after waiting for the ABI
  Keith> folks to get up to speed.  I am quite happy to talk to the ABI folks
  Keith> but not if it means delaying my improvements to kdb.  Put the code in
  Keith> and I will talk to the ABI folks, otherwise I am not going to bother.

Send a mail to the ABI folks now (and cc me) and I'll put in the
version that makes the new code available via a config option;
otherwise _I_ won't bother. ;-)

	--david


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2003-03-28  2:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-12-11 12:48 [Linux-ia64] [patch] 2.4.20-ia64-021210 prevent loop on zero instruction Keith Owens
2003-03-14  4:39 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 unwind.c - allow unw_access_gr(r0) Keith Owens
2003-03-15  0:01 ` Bjorn Helgaas
2003-03-15  1:10 ` [Linux-ia64] [patch] 2.4.20-ia64-021210 new spinlock code Keith Owens
2003-03-15  1:30 ` David Mosberger
2003-03-15  2:36 ` Keith Owens
2003-03-15  2:40 ` Keith Owens
2003-03-15  6:46 ` David Mosberger
2003-03-15 10:31 ` Keith Owens
2003-03-27 20:29 ` David Mosberger
2003-03-27 23:15 ` Keith Owens
2003-03-27 23:32 ` David Mosberger
2003-03-28  1:39 ` Keith Owens
2003-03-28  1:45 ` David Mosberger
2003-03-28  1:49 ` Keith Owens
2003-03-28  1:53 ` David Mosberger
2003-03-28  2:10 ` Keith Owens
2003-03-28  2:14 ` David Mosberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox