* [Linux-ia64] pthread failure ???
@ 2002-06-13 0:56 Jack Steiner
2002-06-13 8:59 ` Andreas Schwab
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Jack Steiner @ 2002-06-13 0:56 UTC (permalink / raw)
To: linux-ia64
We have a pthread'ed application that ran fine on IA64 2.4.17.
When we upgraded the kernel to 2.4.18, the application started to
fail. The failure occurs in glibc at:
chunk_free
__libc_free
...
We verified that the app consistently fails with a 2.4.18 kernel
but works fine with a 2.4.17 kernel (same app & libraries).
No other failures have been seen in other apps.
Has anyone else seen this behavior or have any ideas??
--
Thanks
Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
* Re: [Linux-ia64] pthread failure ???
From: Andreas Schwab @ 2002-06-13 8:59 UTC (permalink / raw)
To: linux-ia64
Jack Steiner <steiner@sgi.com> writes:
|> We have a pthread'ed application that ran fine on IA64 2.4.17.
|>
|> When we upgraded the kernel to 2.4.18, the application started to
|> fail. The failure occurs in glibc at:
|> chunk_free
|> __libc_free
|> ...
Please define "failure".
|> Has anyone else seen this behavior or have any ideas??
Check your malloc/realloc/free calls, use a malloc debugger (see the
glibc manual for the glibc internal one).
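As a rough illustration of what "checking your malloc/free calls" can mean in practice, here is a minimal sketch of an allocation ledger. It is not the glibc-internal debugger mentioned above; the names tracked_malloc, tracked_free and MAX_LIVE are hypothetical, and the sketch is not thread-safe (a pthread'ed program would need a mutex around the table).

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 1024            /* hypothetical cap on tracked blocks */

static void *live[MAX_LIVE];     /* pointers currently handed out */

/* Allocate a block and record it in the ledger.
 * Returns NULL if malloc fails or the ledger is full. */
void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            live[i] = p;
            return p;
        }
    }
    free(p);                     /* ledger full: refuse rather than lose track */
    return NULL;
}

/* Free only pointers the ledger knows about.
 * Returns 0 on success, -1 for a pointer that is not currently live
 * (a double free, or an address that never came from tracked_malloc). */
int tracked_free(void *p)
{
    if (p == NULL)
        return 0;                /* free(NULL) is legal, so accept it */
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == p) {
            live[i] = NULL;
            free(p);
            return 0;
        }
    }
    fprintf(stderr, "bad free of %p\n", p);
    return -1;
}
```

Routing the application's allocations through wrappers like these would flag a bad free at the call that makes it, instead of later inside chunk_free.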
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: [Linux-ia64] pthread failure ???
From: David Mosberger @ 2002-06-13 15:25 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 12 Jun 2002 19:56:50 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:
Jack> We have a pthread'ed application that ran fine on IA64 2.4.17.
Jack> When we upgraded the kernel to 2.4.18, the application started
Jack> to fail. The failure occurs in glibc at: chunk_free
Jack> __libc_free ...
Jack> We verified that the app consistently fails with a 2.4.18
Jack> kernel but works fine with a 2.4.17 kernel (same app &
Jack> libraries).
Jack> No other failures have been seen in other apps.
Jack> Has anyone else seen this behavior or have any ideas??
I'm wondering if this is related to the fix for the "sp off by 16" bug
that was introduced in the 020410 ia64 patch. The relevant bits are below.
Can you see if the problem occurs without these changes?
--david
diff -urN linux-2.4.18/arch/ia64/ia32/ia32_entry.S lia64-2.4/arch/ia64/ia32/ia32_entry.S
--- linux-2.4.18/arch/ia64/ia32/ia32_entry.S Mon Nov 26 11:18:19 2001
+++ lia64-2.4/arch/ia64/ia32/ia32_entry.S Sat Feb 9 10:41:41 2002
@@ -37,7 +37,7 @@
mov loc1=r16 // save ar.pfs across do_fork
.body
zxt4 out1=in1 // newsp
- mov out3=0 // stacksize
+ mov out3=16 // stacksize (compensates for 16-byte scratch area)
adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
zxt4 out0=in0 // out0 = clone_flags
br.call.sptk.many rp=do_fork
diff -urN linux-2.4.18/arch/ia64/kernel/entry.S lia64-2.4/arch/ia64/kernel/entry.S
--- linux-2.4.18/arch/ia64/kernel/entry.S Mon Nov 26 11:18:20 2001
+++ lia64-2.4/arch/ia64/kernel/entry.S Tue Apr 9 22:01:38 2002
@@ -115,7 +115,7 @@
mov loc1=r16 // save ar.pfs across do_fork
.body
mov out1=in1
- mov out3=0
+ mov out3=16 // stacksize (compensates for 16-byte scratch area)
adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
mov out0=in0 // out0 = clone_flags
br.call.sptk.many rp=do_fork
diff -urN linux-2.4.18/arch/ia64/kernel/process.c lia64-2.4/arch/ia64/kernel/process.c
--- linux-2.4.18/arch/ia64/kernel/process.c Mon Nov 26 11:18:21 2001
+++ lia64-2.4/arch/ia64/kernel/process.c Tue Feb 26 14:53:42 2002
@@ -235,7 +273,7 @@
if (user_mode(child_ptregs)) {
if (user_stack_base) {
- child_ptregs->r12 = user_stack_base + user_stack_size;
+ child_ptregs->r12 = user_stack_base + user_stack_size - 16;
child_ptregs->ar_bspstore = user_stack_base;
child_ptregs->ar_rnat = 0;
child_ptregs->loadrs = 0;
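
To make the arithmetic in the process.c hunk concrete, here is a small sketch of the patched computation. The function name child_sp and the macro SCRATCH_AREA are hypothetical; only the formula itself (base + size - 16) is taken from the diff, and the reading of the clone path below is an interpretation of the two entry.S hunks, not something the patch states outright.

```c
#include <stdint.h>

#define SCRATCH_AREA 16   /* bytes, per the "16-byte scratch area" patch comment */

/* Initial child stack pointer as computed by the patched line:
 *   child_ptregs->r12 = user_stack_base + user_stack_size - 16; */
uint64_t child_sp(uint64_t user_stack_base, uint64_t user_stack_size)
{
    return user_stack_base + user_stack_size - SCRATCH_AREA;
}
```

With the entry paths now passing a stacksize of 16, child_sp(newsp, 16) evaluates to newsp, so a caller that hands in a bare stack pointer appears to get the same r12 as before the patch. A caller that passes a real base and size, for example child_sp(0x10000, 0x4000), gets 0x13ff0, which is 16 bytes below the top of the region.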
* Re: [Linux-ia64] pthread failure ???
From: Jack Steiner @ 2002-06-13 18:14 UTC (permalink / raw)
To: linux-ia64
>
> >>>>> On Wed, 12 Jun 2002 19:56:50 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:
>
> Jack> We have a pthread'ed application that ran fine on IA64 2.4.17.
>
> Jack> When we upgraded the kernel to 2.4.18, the application started
> Jack> to fail. The failure occurs in glibc at: chunk_free
> Jack> __libc_free ...
>
> Jack> We verified that the app consistently fails with a 2.4.18
> Jack> kernel but works fine with a 2.4.17 kernel (same app &
> Jack> libraries).
>
> Jack> No other failures have been seen in other apps.
>
> Jack> Has anyone else seen this behavior or have any ideas??
>
> I'm wondering if this is related to the fix for the "sp off by 16" bug
> that was introduced in the 020410 ia64 patch. The relevant bits are below.
> Can you see if the problem occurs without these changes?
I undid the patch (below). It still fails. Some observations about the failure:
- the failure is a SEGV. chunk_free tries to dereference a NULL pointer
(plus a small offset).
- the failure appears to occur at the end of the test when the control process
is killing off child threads and freeing up structures allocated from
the heap.
- gdb is not helpful on the core file. However, I hacked the kernel to
drop to KDB on SEGV & dumped registers that way.
- the program runs fine when launched from gdb.
- the address of pthread_testcancel() is frequently seen around the point of
failure. I don't know if that is significant or not.
- we are using a 2.4.18 (ia64 020410) kernel with glibc-2.2.3-10
I certainly don't rule out bugs in the app.
>
> --david
>
> diff -urN linux-2.4.18/arch/ia64/ia32/ia32_entry.S lia64-2.4/arch/ia64/ia32/ia32_entry.S
> --- linux-2.4.18/arch/ia64/ia32/ia32_entry.S Mon Nov 26 11:18:19 2001
> +++ lia64-2.4/arch/ia64/ia32/ia32_entry.S Sat Feb 9 10:41:41 2002
> @@ -37,7 +37,7 @@
> mov loc1=r16 // save ar.pfs across do_fork
> .body
> zxt4 out1=in1 // newsp
> - mov out3=0 // stacksize
> + mov out3=16 // stacksize (compensates for 16-byte scratch area)
> adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
> zxt4 out0=in0 // out0 = clone_flags
> br.call.sptk.many rp=do_fork
> diff -urN linux-2.4.18/arch/ia64/kernel/entry.S lia64-2.4/arch/ia64/kernel/entry.S
> --- linux-2.4.18/arch/ia64/kernel/entry.S Mon Nov 26 11:18:20 2001
> +++ lia64-2.4/arch/ia64/kernel/entry.S Tue Apr 9 22:01:38 2002
> @@ -115,7 +115,7 @@
> mov loc1=r16 // save ar.pfs across do_fork
> .body
> mov out1=in1
> - mov out3=0
> + mov out3=16 // stacksize (compensates for 16-byte scratch area)
> adds out2=IA64_SWITCH_STACK_SIZE+16,sp // out2 = &regs
> mov out0=in0 // out0 = clone_flags
> br.call.sptk.many rp=do_fork
> diff -urN linux-2.4.18/arch/ia64/kernel/process.c lia64-2.4/arch/ia64/kernel/process.c
> --- linux-2.4.18/arch/ia64/kernel/process.c Mon Nov 26 11:18:21 2001
> +++ lia64-2.4/arch/ia64/kernel/process.c Tue Feb 26 14:53:42 2002
> @@ -235,7 +273,7 @@
>
> if (user_mode(child_ptregs)) {
> if (user_stack_base) {
> - child_ptregs->r12 = user_stack_base + user_stack_size;
> + child_ptregs->r12 = user_stack_base + user_stack_size - 16;
> child_ptregs->ar_bspstore = user_stack_base;
> child_ptregs->ar_rnat = 0;
> child_ptregs->loadrs = 0;
>
--
Thanks
Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
* Re: [Linux-ia64] pthread failure ???
From: David Mosberger @ 2002-06-13 18:26 UTC (permalink / raw)
To: linux-ia64
>>>>> On Thu, 13 Jun 2002 13:14:52 -0500 (CDT), Jack Steiner <steiner@sgi.com> said:
Jack> I undid the patch (below). It still fails.
OK, that's good, actually.
Jack> - the failure is a SEGV. chunk_free tries to dereference a NULL pointer
Jack> (plus a small offset).
I'm sure you know this, but SEGV in chunk_free is not unusual when
dynamically allocated memory got corrupted. So the trick is to track
down where the memory corruption occurred. Do you have a minimal test
program reproducing the problem?
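
One common way to narrow down where heap corruption happens is to put a known byte pattern after each block and check it later. The following is only a minimal sketch of that idea, not code from this thread or from glibc; guarded_malloc, guard_intact, guarded_free and GUARD_LEN are hypothetical names, and the returned pointer is offset by sizeof(size_t), so it may be less strictly aligned than a plain malloc result.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN 8
static const unsigned char GUARD[GUARD_LEN] =
    { 0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF };

/* Block layout: [size_t n][n user bytes][GUARD_LEN guard bytes] */
void *guarded_malloc(size_t n)
{
    unsigned char *raw = malloc(sizeof(size_t) + n + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);                           /* remember the user size */
    memcpy(raw + sizeof(size_t) + n, GUARD, GUARD_LEN);  /* plant the trailing guard */
    return raw + sizeof(size_t);
}

/* Returns 1 if the trailing guard is untouched, 0 if it was overwritten. */
int guard_intact(const void *user)
{
    const unsigned char *raw = (const unsigned char *)user - sizeof(size_t);
    size_t n;
    memcpy(&n, raw, sizeof n);
    return memcmp(raw + sizeof(size_t) + n, GUARD, GUARD_LEN) == 0;
}

/* Frees the block. Returns 0 if the guard was intact, -1 if it was damaged. */
int guarded_free(void *user)
{
    if (user == NULL)
        return 0;
    int ok = guard_intact(user);
    free((unsigned char *)user - sizeof(size_t));
    return ok ? 0 : -1;
}
```

Calling guard_intact() at intervals (for instance before and after each thread is cancelled) can bracket the point at which an overrun happens, rather than learning about it only when free() finally trips over the damage.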
--david