public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] pthread failure ???
@ 2002-06-13  0:56 Jack Steiner
  2002-06-13  8:59 ` Andreas Schwab
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Jack Steiner @ 2002-06-13  0:56 UTC (permalink / raw)
  To: linux-ia64

We have a pthread'ed application that ran fine on IA64 2.4.17.

When we upgraded the kernel to 2.4.18, the application started to
fail. The failure occurs in glibc at:
	chunk_free
	__libc_free
	...

We verified that the app consistently fails with a 2.4.18 kernel
but works fine with a 2.4.17 kernel (same app & libraries).

No other failures have been seen in other apps.



Has anyone else seen this behavior or have any ideas??


-- 
Thanks

Jack Steiner    (651-683-5302)   (vnet 233-5302)      steiner@sgi.com




* Re: [Linux-ia64] pthread failure ???
  2002-06-13  0:56 [Linux-ia64] pthread failure ??? Jack Steiner
@ 2002-06-13  8:59 ` Andreas Schwab
  2002-06-13 15:25 ` David Mosberger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Andreas Schwab @ 2002-06-13  8:59 UTC (permalink / raw)
  To: linux-ia64

Jack Steiner <steiner@sgi.com> writes:

|> We have a pthread'ed application that ran fine on IA64 2.4.17.
|> 
|> When we upgraded the kernel to 2.4.18, the application started to
|> fail. The failure occurs in glibc at:
|> 	chunk_free
|> 	__libc_free
|> 	...

Please define "failure".

|> Has anyone else seen this behavior or have any ideas??

Check your malloc/realloc/free calls, use a malloc debugger (see the
glibc manual for the glibc internal one).

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



* Re: [Linux-ia64] pthread failure ???
  2002-06-13  0:56 [Linux-ia64] pthread failure ??? Jack Steiner
  2002-06-13  8:59 ` Andreas Schwab
@ 2002-06-13 15:25 ` David Mosberger
  2002-06-13 18:14 ` Jack Steiner
  2002-06-13 18:26 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2002-06-13 15:25 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 12 Jun 2002 19:56:50 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:

  Jack> We have a pthread'ed application that ran fine on IA64 2.4.17.

  Jack> When we upgraded the kernel to 2.4.18, the application started
  Jack> to fail. The failure occurs in glibc at: chunk_free
  Jack> __libc_free ...

  Jack> We verified that the app consistently fails with a 2.4.18
  Jack> kernel but works fine with a 2.4.17 kernel (same app &
  Jack> libraries).

  Jack> No other failures have been seen in other apps.

  Jack> Has anyone else seen this behavior or have any ideas??

I'm wondering if this is related to the fix for the "sp off by 16" bug
that was introduced in the 020410 ia64 patch.  The relevant bits are below.
Can you see if the problem occurs without these changes?

	--david

diff -urN linux-2.4.18/arch/ia64/ia32/ia32_entry.S lia64-2.4/arch/ia64/ia32/ia32_entry.S
--- linux-2.4.18/arch/ia64/ia32/ia32_entry.S	Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_entry.S	Sat Feb  9 10:41:41 2002
@@ -37,7 +37,7 @@
 	mov loc1=r16				// save ar.pfs across do_fork
 	.body
 	zxt4 out1=in1				// newsp
-	mov out3=0				// stacksize
+	mov out3=16				// stacksize (compensates for 16-byte scratch area)
 	adds out2=IA64_SWITCH_STACK_SIZE+16,sp	// out2 = &regs
 	zxt4 out0=in0				// out0 = clone_flags
 	br.call.sptk.many rp=do_fork
diff -urN linux-2.4.18/arch/ia64/kernel/entry.S lia64-2.4/arch/ia64/kernel/entry.S
--- linux-2.4.18/arch/ia64/kernel/entry.S	Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/entry.S	Tue Apr  9 22:01:38 2002
@@ -115,7 +115,7 @@
 	mov loc1=r16				// save ar.pfs across do_fork
 	.body
 	mov out1=in1
-	mov out3=0
+	mov out3=16				// stacksize (compensates for 16-byte scratch area)
 	adds out2=IA64_SWITCH_STACK_SIZE+16,sp	// out2 = &regs
 	mov out0=in0				// out0 = clone_flags
 	br.call.sptk.many rp=do_fork
diff -urN linux-2.4.18/arch/ia64/kernel/process.c lia64-2.4/arch/ia64/kernel/process.c
--- linux-2.4.18/arch/ia64/kernel/process.c	Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/process.c	Tue Feb 26 14:53:42 2002
@@ -235,7 +273,7 @@
 
 	if (user_mode(child_ptregs)) {
 		if (user_stack_base) {
-			child_ptregs->r12 = user_stack_base + user_stack_size;
+			child_ptregs->r12 = user_stack_base + user_stack_size - 16;
 			child_ptregs->ar_bspstore = user_stack_base;
 			child_ptregs->ar_rnat = 0;
 			child_ptregs->loadrs = 0;



* Re: [Linux-ia64] pthread failure ???
  2002-06-13  0:56 [Linux-ia64] pthread failure ??? Jack Steiner
  2002-06-13  8:59 ` Andreas Schwab
  2002-06-13 15:25 ` David Mosberger
@ 2002-06-13 18:14 ` Jack Steiner
  2002-06-13 18:26 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: Jack Steiner @ 2002-06-13 18:14 UTC (permalink / raw)
  To: linux-ia64

> 
> >>>>> On Wed, 12 Jun 2002 19:56:50 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:
> 
>   Jack> We have a pthread'ed application that ran fine on IA64 2.4.17.
> 
>   Jack> When we upgraded the kernel to 2.4.18, the application started
>   Jack> to fail. The failure occurs in glibc at: chunk_free
>   Jack> __libc_free ...
> 
>   Jack> We verified that the app consistently fails with a 2.4.18
>   Jack> kernel but works fine with a 2.4.17 kernel (same app &
>   Jack> libraries).
> 
>   Jack> No other failures have been seen in other apps.
> 
>   Jack> Has anyone else seen this behavior or have any ideas??
> 
> I'm wondering if this is related to the fix for the "sp off by 16" bug
> that was introduced in the 020410 ia64 patch.  The relevant bits are below.
> Can you see if the problem occurs without these changes?



I undid the patch (below). It still fails. Some observations about the failure:

	- the failure is a SEGV. chunk_free tries to dereference a NULL pointer
	  (plus a small offset).

	- the failure appears to occur at the end of the test when the control process
	  is killing off child threads and freeing up structures allocated from
	  the heap.

	- gdb is not helpful on the core file. However, I hacked the kernel to
	  drop to KDB on SEGV & dumped registers that way.

	- the program runs fine when launched from gdb.

	- the address of pthread_testcancel() is frequently seen around the point of 
	  failure. I don't know if that is significant or not.

	- we are using a 2.4.18 (ia64 020410) kernel with glibc-2.2.3-10




I certainly don't rule out bugs in the app.


> 
> 	--david
> 
> diff -urN linux-2.4.18/arch/ia64/ia32/ia32_entry.S lia64-2.4/arch/ia64/ia32/ia32_entry.S
> --- linux-2.4.18/arch/ia64/ia32/ia32_entry.S	Mon Nov 26 11:18:19 2001
> +++ lia64-2.4/arch/ia64/ia32/ia32_entry.S	Sat Feb  9 10:41:41 2002
> @@ -37,7 +37,7 @@
>  	mov loc1=r16				// save ar.pfs across do_fork
>  	.body
>  	zxt4 out1=in1				// newsp
> -	mov out3=0				// stacksize
> +	mov out3=16				// stacksize (compensates for 16-byte scratch area)
>  	adds out2=IA64_SWITCH_STACK_SIZE+16,sp	// out2 = &regs
>  	zxt4 out0=in0				// out0 = clone_flags
>  	br.call.sptk.many rp=do_fork
> diff -urN linux-2.4.18/arch/ia64/kernel/entry.S lia64-2.4/arch/ia64/kernel/entry.S
> --- linux-2.4.18/arch/ia64/kernel/entry.S	Mon Nov 26 11:18:20 2001
> +++ lia64-2.4/arch/ia64/kernel/entry.S	Tue Apr  9 22:01:38 2002
> @@ -115,7 +115,7 @@
>  	mov loc1=r16				// save ar.pfs across do_fork
>  	.body
>  	mov out1=in1
> -	mov out3=0
> +	mov out3=16				// stacksize (compensates for 16-byte scratch area)
>  	adds out2=IA64_SWITCH_STACK_SIZE+16,sp	// out2 = &regs
>  	mov out0=in0				// out0 = clone_flags
>  	br.call.sptk.many rp=do_fork
> diff -urN linux-2.4.18/arch/ia64/kernel/process.c lia64-2.4/arch/ia64/kernel/process.c
> --- linux-2.4.18/arch/ia64/kernel/process.c	Mon Nov 26 11:18:21 2001
> +++ lia64-2.4/arch/ia64/kernel/process.c	Tue Feb 26 14:53:42 2002
> @@ -235,7 +273,7 @@
>  
>  	if (user_mode(child_ptregs)) {
>  		if (user_stack_base) {
> -			child_ptregs->r12 = user_stack_base + user_stack_size;
> +			child_ptregs->r12 = user_stack_base + user_stack_size - 16;
>  			child_ptregs->ar_bspstore = user_stack_base;
>  			child_ptregs->ar_rnat = 0;
>  			child_ptregs->loadrs = 0;
> 


-- 
Thanks

Jack Steiner    (651-683-5302)   (vnet 233-5302)      steiner@sgi.com




* Re: [Linux-ia64] pthread failure ???
  2002-06-13  0:56 [Linux-ia64] pthread failure ??? Jack Steiner
                   ` (2 preceding siblings ...)
  2002-06-13 18:14 ` Jack Steiner
@ 2002-06-13 18:26 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2002-06-13 18:26 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Thu, 13 Jun 2002 13:14:52 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:

  Jack> I undid the patch (below). It still fails.

OK, that's good, actually.

  Jack> - the failure is a SEGV. chunk_free tries to dereference a NULL pointer
  Jack> (plus a small offset).

I'm sure you know this, but SEGV in chunk_free is not unusual when
dynamically allocated memory got corrupted.  So the trick is to track
down where the memory corruption occurred.  Do you have a minimal test
program reproducing the problem?

	--david


