linuxppc-dev.lists.ozlabs.org archive mirror
* glibc-2.5 test suite hangs/crashes the machine
@ 2006-10-27  5:56 Fabio Massimo Di Nitto
  2006-10-27 16:22 ` Jeff Bailey
  2006-10-30  3:02 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-27  5:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Jeff Bailey, Ben Collins


Hi everybody,

I am in the process of bootstrapping the new toolchain for Ubuntu and I am
hitting a problem building glibc-2.5 on ppc.

This behaviour has been reproduced on 2.6.15/2.6.17 and 2.6.19-rc2 (where the
machine crashes) and with ppc32 and ppc64 kernels.
A hard reboot of the machine is required to get rid of the Zl processes hanging
around that keep spinning the CPU at 100%.

I placed the sources here: http://people.ubuntu.com/~fabbione/benh/

but I am starting to believe it is a kernel bug that we are only triggering now.

Any hint or help on what to look for would be greatly appreciated.

Thanks
Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-27  5:56 glibc-2.5 test suite hangs/crashes the machine Fabio Massimo Di Nitto
@ 2006-10-27 16:22 ` Jeff Bailey
  2006-10-30  1:47   ` Benjamin Herrenschmidt
  2006-10-30  3:02 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Bailey @ 2006-10-27 16:22 UTC (permalink / raw)
  To: Fabio Massimo Di Nitto; +Cc: Ben Collins, Paul Mackerras, linuxppc-dev

On Friday, 27 October 2006 at 07:56 +0200, Fabio Massimo Di Nitto
wrote:
> Hi everybody,
>
> i am in the process of bootstrapping the new toolchain for ubuntu and I am
> hitting a problem building glibc-2.5 on ppc.
>
> This behaviour has been reproduced on 2.6.15/2.6.17 and 2.6.19-rc2 (where the
> machine crashes) and with ppc32 and ppc64 kernels.
> A hard reboot of the machine is required to get rid of the Zl processes hanging
> around that keep spinning the CPU at 100%.
>
> I did place sources here: http://people.ubuntu.com/~fabbione/benh/
>
> but i start to believe it is a kernel bug we are exploiting only now.
>
> Any hint or help for what to look for would be extremely appreciated.

Heya Fabio, just an update, it looks like the tests that are zombie'ing
are the nptl tst-robust[1-8] tests.  According to /proc/##/wchan, the
tasks are cheerfully spinning in do_exit.

If there's other info I can get you, lemme know.
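
For reference, the /proc/##/wchan check mentioned above can be scripted. What follows is only a minimal sketch in C, assuming the usual /proc/<pid>/wchan layout (a single symbol name, possibly without a trailing newline); read_wchan is a hypothetical helper name and not something used in this thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the wait-channel symbol of a task from a wchan-style file.
 * On a live system `path` would be "/proc/<pid>/wchan".
 * Returns 0 on success and -1 if the file cannot be opened.
 */
int read_wchan(const char *path, char *buf, size_t len)
{
    FILE *f;
    size_t n;

    if (len == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Drop a trailing newline, if the file has one. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A task stuck the way described here would be expected to report do_exit; comparing the returned string against that name is left to the caller.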

-- 
Jeff Bailey
Manager, Technical Support
Canonical Ltd. - Sales, Service, and Support.
+1 514 940-8910
http://www.ubuntu.com/


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-27 16:22 ` Jeff Bailey
@ 2006-10-30  1:47   ` Benjamin Herrenschmidt
  2006-11-01 22:17     ` Steve Munroe
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-10-30  1:47 UTC (permalink / raw)
  To: Jeff Bailey
  Cc: linuxppc-dev, Fabio Massimo Di Nitto, Paul Mackerras,
	Steve Munroe, Ben Collins

On Fri, 2006-10-27 at 12:22 -0400, Jeff Bailey wrote:
> On Friday, 27 October 2006 at 07:56 +0200, Fabio Massimo Di Nitto
> wrote:
> > Hi everybody,
> > 
> > i am in the process of bootstrapping the new toolchain for ubuntu and I am
> > hitting a problem building glibc-2.5 on ppc.
> > 
> > This behaviour has been reproduced on 2.6.15/2.6.17 and 2.6.19-rc2 (where the
> > machine crashes) and with ppc32 and ppc64 kernels.
> > A hard reboot of the machine is required to get rid of the Zl processes hanging
> > around that keep spinning the CPU at 100%.
> > 
> > I did place sources here: http://people.ubuntu.com/~fabbione/benh/
> > 
> > but i start to believe it is a kernel bug we are exploiting only now.
> > 
> > Any hint or help for what to look for would be extremely appreciated.
> 
> Heya Fabio, just an update, it looks like the tests that are zombie'ing
> are the nptl tst-robust[1-8] tests.  According to /proc/##/wchan, the
> tasks are cheerfully spinning in do_exit.

So I've built that glibc with Debian 2.6.16 kernel headers (since Fabio
says the problem doesn't happen with glibc built with 2.6.19 headers)
and have run that with 2.6.19-rc3-git-du-jour.

The machine didn't crash, nor did I see any zombies with those
tst-robust[1-8] tests; however, I did get a SIGBUS with tst-robustpi1. I've
tracked it down to an alignment exception. It looks like glibc is
doing a lwarx on a non-aligned value, though I can't say precisely
what's up here. I don't know how I can get a backtrace when running
those test cases... the test harness seems to catch signals, so I suppose
it could be modified to spit one out.

At this point, it would be useful to have somebody who knows glibc to
tell us:

 - what are those tst-robust all about ? (what do they do "special" that
might trigger bad reactions with older kernels)
 - how can glibc ever do atomic operations on a non-aligned value ?

Ben.
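
As an illustration of the second question: lwarx operates on a 32-bit word and expects a word-aligned address, so a futex word sitting at an odd address will trap. The following is a minimal sketch of such an alignment test, assuming a 4-byte futex word; is_aligned and futex_word_ok are hypothetical names, not glibc or kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nonzero when `addr` is a multiple of `align` (a power of two). */
int is_aligned(const void *addr, size_t align)
{
    return ((uintptr_t)addr & (align - 1)) == 0;
}

/*
 * Assumption: the futex word is a 32-bit integer, so it has to sit on a
 * 4-byte boundary before lwarx/stwcx. can be used on it.
 */
int futex_word_ok(const void *addr)
{
    return is_aligned(addr, sizeof(uint32_t));
}
```

A check of this kind only detects the problem; it says nothing about why the address ended up misaligned in the first place.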


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-27  5:56 glibc-2.5 test suite hangs/crashes the machine Fabio Massimo Di Nitto
  2006-10-27 16:22 ` Jeff Bailey
@ 2006-10-30  3:02 ` Benjamin Herrenschmidt
  2006-10-30 11:35   ` Fabio Massimo Di Nitto
  1 sibling, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-10-30  3:02 UTC (permalink / raw)
  To: Fabio Massimo Di Nitto
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Does this patch fix it?

Index: linux-work/arch/powerpc/kernel/traps.c
===================================================================
--- linux-work.orig/arch/powerpc/kernel/traps.c	2006-10-23 14:41:37.000000000 +1000
+++ linux-work/arch/powerpc/kernel/traps.c	2006-10-30 13:59:41.000000000 +1100
@@ -843,7 +843,7 @@ void __kprobes program_check_exception(s
 
 void alignment_exception(struct pt_regs *regs)
 {
-	int fixed = 0;
+	int sig, fixed = 0;
 
 	/* we don't implement logging of alignment exceptions */
 	if (!(current->thread.align_ctl & PR_UNALIGN_SIGBUS))
@@ -856,15 +856,11 @@ void alignment_exception(struct pt_regs 
 	}
 
 	/* Operand address was bad */
-	if (fixed == -EFAULT) {
-		if (user_mode(regs))
-			_exception(SIGSEGV, regs, SEGV_ACCERR, regs->dar);
-		else
-			/* Search exception table */
-			bad_page_fault(regs, regs->dar, SIGSEGV);
-		return;
-	}
-	_exception(SIGBUS, regs, BUS_ADRALN, regs->dar);
+	sig = fixed == -EFAULT ? SIGSEGV : SIGBUS;
+	if (user_mode(regs))
+		_exception(sig, regs, SEGV_ACCERR, regs->dar);
+	else
+		bad_page_fault(regs, regs->dar, sig);
 }
 
 void StackOverflow(struct pt_regs *regs)
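
One detail worth noting when reading the diff: as posted, the new code passes SEGV_ACCERR as the si_code for both signals, whereas the removed code paired SIGBUS with BUS_ADRALN. The following user-space sketch shows the pairing the old code used. It is not the kernel code; pick_alignment_signal and struct fault_signal are hypothetical names introduced only for illustration.

```c
/* Ask glibc for the POSIX si_code constants (SEGV_ACCERR, BUS_ADRALN). */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>

struct fault_signal {
    int sig;   /* signal number to deliver */
    int code;  /* si_code describing the cause */
};

/*
 * `fixed` mimics the variable in alignment_exception(): -EFAULT means the
 * operand address itself was bad, any other value reaching this point means
 * the unaligned access could not be emulated.
 */
struct fault_signal pick_alignment_signal(int fixed)
{
    struct fault_signal fs;

    if (fixed == -EFAULT) {
        fs.sig = SIGSEGV;
        fs.code = SEGV_ACCERR;
    } else {
        fs.sig = SIGBUS;
        fs.code = BUS_ADRALN;
    }
    return fs;
}
```

Whether the posted patch intended to keep that pairing is not stated in the thread.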


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30  3:02 ` Benjamin Herrenschmidt
@ 2006-10-30 11:35   ` Fabio Massimo Di Nitto
  2006-10-30 20:36     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-30 11:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Benjamin Herrenschmidt wrote:
> Does that patch fixes it ?
> 

Tested with kernel .17 and headers from .19, and the build hangs. Still tons of
Zl processes around.

As a side note, building with kernel .19 and .19 headers, it all goes smoothly.

Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30 11:35   ` Fabio Massimo Di Nitto
@ 2006-10-30 20:36     ` Benjamin Herrenschmidt
  2006-10-31  6:37       ` Fabio Massimo Di Nitto
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-10-30 20:36 UTC (permalink / raw)
  To: Fabio Massimo Di Nitto
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
> Benjamin Herrenschmidt wrote:
> > Does that patch fixes it ?
> > 
> 
> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
> Zl processes around.
> 
> On the note building with kernel .19 and .19 headers it all goes smooth.

OK, so that's a different issue from the one I found. Have you by
any chance noted what those processes are (which test case, typically)?
Also, there's a sysrq to get a backtrace of all pending processes,
though I don't remember which one off the top of my head; it might be
useful to have a look, though.

Ben.
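
The SysRq command being looked for here is most likely 't', which dumps the state and kernel stack of every task to the kernel log. Besides the keyboard combination, it can be triggered by writing the character to /proc/sysrq-trigger as root. A minimal sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ; sysrq_trigger is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Write one SysRq command character to a trigger file.
 * On a real system `path` is "/proc/sysrq-trigger" and root is required;
 * the resulting dump appears in the kernel log (dmesg), not on stdout.
 * Returns 0 on success, -1 on failure.
 */
int sysrq_trigger(const char *path, char cmd)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputc(cmd, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling sysrq_trigger("/proc/sysrq-trigger", 't') and then saving the dmesg output would produce the kind of task dump asked for above.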


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30 20:36     ` Benjamin Herrenschmidt
@ 2006-10-31  6:37       ` Fabio Massimo Di Nitto
  2006-10-31  6:51       ` Fabio Massimo Di Nitto
  2006-10-31  9:47       ` Fabio Massimo Di Nitto
  2 siblings, 0 replies; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-31  6:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Benjamin Herrenschmidt wrote:
> On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
>> Benjamin Herrenschmidt wrote:
>>> Does that patch fixes it ?
>>>
>> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
>> Zl processes around.
>>
>> On the note building with kernel .19 and .19 headers it all goes smooth.
> 
> Ok, so there's a different issue from what I've found. You haven't by
> chance noted what those processes are (which test case typically) ?

I uploaded a build log and a ps ax output to
http://people.ubuntu.com/~fabbione/benh/

> Also, there's a sysrq to get a backtrace of all pending processes,
> though I don't remember which one off the top of my mind, might be
> useful to have a look though.

I don't remember them either. Anybody?

Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30 20:36     ` Benjamin Herrenschmidt
  2006-10-31  6:37       ` Fabio Massimo Di Nitto
@ 2006-10-31  6:51       ` Fabio Massimo Di Nitto
  2006-10-31  9:47       ` Fabio Massimo Di Nitto
  2 siblings, 0 replies; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-31  6:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Benjamin Herrenschmidt wrote:
> On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
>> Benjamin Herrenschmidt wrote:
>>> Does that patch fixes it ?
>>>
>> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
>> Zl processes around.
>>
>> On the note building with kernel .19 and .19 headers it all goes smooth.
> 
> Ok, so there's a different issue from what I've found. You haven't by
> chance noted what those processes are (which test case typically) ?
> Also, there's a sysrq to get a backtrace of all pending processes,
> though I don't remember which one off the top of my mind, might be
> useful to have a look though.
> 
> Ben.
> 

Added also dmesg+sysrq.txt at the same url.

Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30 20:36     ` Benjamin Herrenschmidt
  2006-10-31  6:37       ` Fabio Massimo Di Nitto
  2006-10-31  6:51       ` Fabio Massimo Di Nitto
@ 2006-10-31  9:47       ` Fabio Massimo Di Nitto
  2006-10-31 20:30         ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-31  9:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Benjamin Herrenschmidt wrote:
> On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
>> Benjamin Herrenschmidt wrote:
>>> Does that patch fixes it ?
>>>
>> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
>> Zl processes around.
>>
>> On the note building with kernel .19 and .19 headers it all goes smooth.
> 
> Ok, so there's a different issue from what I've found. You haven't by
> chance noted what those processes are (which test case typically) ?
> Also, there's a sysrq to get a backtrace of all pending processes,
> though I don't remember which one off the top of my mind, might be
> useful to have a look though.
> 
> Ben.
> 

After discussing with Ben on IRC, I applied 69588298188b40ed7f75c98a6fd328d82f23ca21
to kernel .17, and glibc now builds with no zombie processes and no hang whatsoever.

I suggest pushing this patch back to the stable kernel trees.

Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-31  9:47       ` Fabio Massimo Di Nitto
@ 2006-10-31 20:30         ` Benjamin Herrenschmidt
  2006-10-31 20:44           ` Fabio Massimo Di Nitto
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-10-31 20:30 UTC (permalink / raw)
  To: Fabio Massimo Di Nitto
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

On Tue, 2006-10-31 at 10:47 +0100, Fabio Massimo Di Nitto wrote:
> Benjamin Herrenschmidt wrote:
> > On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
> >> Benjamin Herrenschmidt wrote:
> >>> Does that patch fixes it ?
> >>>
> >> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
> >> Zl processes around.
> >>
> >> On the note building with kernel .19 and .19 headers it all goes smooth.
> > 
> > Ok, so there's a different issue from what I've found. You haven't by
> > chance noted what those processes are (which test case typically) ?
> > Also, there's a sysrq to get a backtrace of all pending processes,
> > though I don't remember which one off the top of my mind, might be
> > useful to have a look though.
> > 
> > Ben.
> > 
> 
> After discussing with Ben on IRC i applied 69588298188b40ed7f75c98a6fd328d82f23ca21
> to kernel .17 and glibc does build without zombie processes and no hang whatsoever.
> 
> I suggest to push this patch back to the stable kernel trees.

It also needs the alignment bits I did, though.

Ben.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-31 20:30         ` Benjamin Herrenschmidt
@ 2006-10-31 20:44           ` Fabio Massimo Di Nitto
  0 siblings, 0 replies; 15+ messages in thread
From: Fabio Massimo Di Nitto @ 2006-10-31 20:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Paul Mackerras, Ben Collins

Benjamin Herrenschmidt wrote:
> On Tue, 2006-10-31 at 10:47 +0100, Fabio Massimo Di Nitto wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Mon, 2006-10-30 at 12:35 +0100, Fabio Massimo Di Nitto wrote:
>>>> Benjamin Herrenschmidt wrote:
>>>>> Does that patch fixes it ?
>>>>>
>>>> tested with kernel .17 and headers from .19 and the build hangs. Still tons of
>>>> Zl processes around.
>>>>
>>>> On the note building with kernel .19 and .19 headers it all goes smooth.
>>> Ok, so there's a different issue from what I've found. You haven't by
>>> chance noted what those processes are (which test case typically) ?
>>> Also, there's a sysrq to get a backtrace of all pending processes,
>>> though I don't remember which one off the top of my mind, might be
>>> useful to have a look though.
>>>
>>> Ben.
>>>
>> After discussing with Ben on IRC i applied 69588298188b40ed7f75c98a6fd328d82f23ca21
>> to kernel .17 and glibc does build without zombie processes and no hang whatsoever.
>>
>> I suggest to push this patch back to the stable kernel trees.
> 
> It also need the alignment bits I did though.
> 
> Ben.
> 

I applied that one to .17 too, as we agreed for the test.

Fabio

-- 
I'm going to make him an offer he can't refuse.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-10-30  1:47   ` Benjamin Herrenschmidt
@ 2006-11-01 22:17     ` Steve Munroe
  2006-11-01 22:35       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Munroe @ 2006-11-01 22:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Fabio Massimo Di Nitto, Paul Mackerras,
	Ben Collins



Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 10/29/2006
07:47:05 PM:

> On Fri, 2006-10-27 at 12:22 -0400, Jeff Bailey wrote:
> > On Friday, 27 October 2006 at 07:56 +0200, Fabio Massimo Di Nitto
> > wrote:
> > > Hi everybody,
> > >
> > > i am in the process of bootstrapping the new toolchain for ubuntu and
> > > I am hitting a problem building glibc-2.5 on ppc.
> > >
> > > This behaviour has been reproduced on 2.6.15/2.6.17 and 2.6.19-rc2
> > > (where the machine crashes) and with ppc32 and ppc64 kernels.
> > > A hard reboot of the machine is required to get rid of the Zl
> > > processes hanging around that keep spinning the CPU at 100%.
> > >
> > > I did place sources here: http://people.ubuntu.com/~fabbione/benh/
> > >
> > > but i start to believe it is a kernel bug we are exploiting only now.
> > >
> > > Any hint or help for what to look for would be extremely appreciated.
> >
> > Heya Fabio, just an update, it looks like the tests that are zombie'ing
> > are the nptl tst-robust[1-8] tests.  According to /proc/##/wchan, the
> > tasks are cheerfully spinning in do_exit.
>
> So I've built that glibc with debian 2.6.16 kernel headers (since Fabio
> says the problem doesn't happen with glibc built with 2.6.19 headers)
> and have ran that with 2.6.19-rc3-git-du-jour.
>
> The machine didn't crash, nor did I see any zombie with those
> tst-robust[1-8], however, I did get as SIGBUS with tst-robustpi1. I've
> tracked it down to being an alignment exception. It looks like glibc is
> doing a lwarx on a non-aligned value, though I can't say precisely
> what's up here. I don't know how I can get a backtrace when running
> those test-cases... the test harness seems to catch signals, I suppose
> it could be modified to spit one out.
>
> At this point, it would be useful to have somebody who knows glibc to
> tell us:
>
>  - what are those tst-robust all about ? (what do they do "special" that
> might trigger bad reactions with older kernels)
>  - how can glibc ever do atomic operations on a non-aligned value ?
>
> Ben.
>
The tst-robustpi# tests exercise the new PTHREAD_MUTEX_ROBUST API
with the PTHREAD_PRIO_INHERIT attribute.

The futex word seems to include the waiter's TID; I don't know whether the
kernel cares about this or not.


Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-11-01 22:17     ` Steve Munroe
@ 2006-11-01 22:35       ` Benjamin Herrenschmidt
  2006-11-01 22:56         ` Steve Munroe
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-11-01 22:35 UTC (permalink / raw)
  To: Steve Munroe
  Cc: linuxppc-dev, Jeff Bailey, Fabio Massimo Di Nitto, Paul Mackerras,
	Ben Collins


> The tst-robustpi# test are exercising the new PTHREAD_MUTEX_ROBUST api,
> with PTHREAD_PRIO_INHERIT attribute.
> 
> The futex word seems to include the waiters TID, I don't know if the
> kernel cares about this or not.

OK, well, we have seen a few issues so far with these. Two are kernel
issues, but one might not be:

 - kernels 2.6.15 through 2.6.17, at least, seem to wire up the robust futex
syscalls on powerpc without properly implementing the support, which can
cause hangs in process exit. Do you have any way to "blacklist" kernels
in glibc?

 - kernel 2.6.18 and current git until yesterday (the fix got in today) have
a bug: if you manage to pass a wrong futex with a non-aligned atomic
value, it may oops the kernel. With the fix, it will return an
error.

Now what seems to be a glibc issue:

 - I've had the tst-robustpi# tests (in fact the very first one, I
haven't tested the others) die on me with a SIGBUS caused by glibc
trying to do a lwarx/stwcx. on an odd address.

I do not know for sure yet if the crash reported by Fabio with 2.6.19
(before my fix above) was related to the same kind of misaligned futex
managing to reach the kernel and triggering the oops I've talked about,
but it's very possible.

In my case, glibc was built against 2.6.16 headers; in Fabio's case, I
think it was built against 2.6.15 or .17 headers. It -seems- that Fabio
cannot reproduce the problem when building glibc against 2.6.19 headers,
though at this point I can't explain why and I haven't reproduced it here
yet.

Do you have any insight in what might be happening or should we just dig
more ?

Cheers,
Ben.


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-11-01 22:35       ` Benjamin Herrenschmidt
@ 2006-11-01 22:56         ` Steve Munroe
  2006-11-01 23:22           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Steve Munroe @ 2006-11-01 22:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Jeff Bailey, Fabio Massimo Di Nitto, Paul Mackerras,
	Ben Collins



Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 11/01/2006
04:35:52 PM:

>
> > The tst-robustpi# test are exercising the new PTHREAD_MUTEX_ROBUST api,
> > with PTHREAD_PRIO_INHERIT attribute.
> >
> > The futex word seems to include the waiters TID, I don't know if the
> > kernel cares about this or not.
>
> Ok, well, we have seen a few issues so far with these. 2 are kernel
> issues, but one might not be:
>
>  - kernels 2.6.15 .. .17 at least it seems wire the robust futex
> syscalls on powerpc without properly implementing the support, which can
> cause hangs in process exit. Do you have any way to "blacklist" kernels
> in glibc ?
>
From libc/sysdeps/unix/sysv/linux/kernel-features.h:

/* Support for inter-process robust mutexes was added in 2.6.17.  */
#if __LINUX_KERNEL_VERSION >= 0x020611
# define __ASSUME_SET_ROBUST_LIST       1
#endif

/* Support for PI futexes was added in 2.6.18.  */
#if __LINUX_KERNEL_VERSION >= 0x020612
# define __ASSUME_FUTEX_LOCK_PI 1
#endif
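
The hexadecimal thresholds in the excerpt above pack the kernel version one byte per component, so 0x020611 is 2.6.17 and 0x020612 is 2.6.18. A minimal sketch of that encoding, together with a per-architecture threshold of the kind being asked about in this message; KVER and assume_set_robust_list are hypothetical names, and the powerpc branch is an assumption for illustration, not a change glibc is known to have made.

```c
/* Pack major.minor.patch one byte per component: 2.6.17 -> 0x020611. */
#define KVER(maj, min, patch) (((maj) << 16) | ((min) << 8) | (patch))

/*
 * Hypothetical helper: would set_robust_list support be assumed for a
 * library configured with `min_kernel` as its oldest supported kernel?
 * `is_powerpc` raises the threshold from 2.6.17 to 2.6.18.
 */
int assume_set_robust_list(unsigned int min_kernel, int is_powerpc)
{
    unsigned int threshold = is_powerpc ? KVER(2, 6, 18) : KVER(2, 6, 17);

    return min_kernel >= threshold;
}
```

With this encoding, assume_set_robust_list(KVER(2, 6, 17), 1) is false while the same call with is_powerpc set to 0 is true.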

So I need to delay __ASSUME_SET_ROBUST_LIST to 2.6.18 for __powerpc__ ?

What about __ASSUME_FUTEX_LOCK_PI ?

>  - kernel 2.6.18 and current git until yesterday (fix got in today) has
> a bug if you manage to pass a wrong futex with a non-aligned atomic
> value, it will possibly oops the kernel. With the fix, it will return an
> error.
>
> Now what seems to be a glibc issue:
>
>  - I've had the tst-robustpi# tests (in fact the very first one, I
> haven't tested the others) die on me with a SIGBUS caused by glibc
> trying to do a lwarx/stwcx. on an odd address.
>
Ryan reminded me of a bug where the new robust code did not account for the
fact that the TID is not at the same place as on i386. I think Ryan has a
patch.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


* Re: glibc-2.5 test suite hangs/crashes the machine
  2006-11-01 22:56         ` Steve Munroe
@ 2006-11-01 23:22           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-11-01 23:22 UTC (permalink / raw)
  To: Steve Munroe
  Cc: linuxppc-dev, Jeff Bailey, Fabio Massimo Di Nitto, Paul Mackerras,
	Ben Collins


> So I need to delay __ASSUME_SET_ROBUST_LIST to 2.6.18 for __powerpc__ ?

The problem is not so much the headers you build with as the runtime
checks... I suppose you shouldn't bother, as we are trickling the
fixes down to distros & stable series anyway.
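
To illustrate the difference between build-time headers and what is known at run time, here is a minimal sketch of a runtime version check based on the uname(2) release string. parse_kernel_release and running_kernel_version are hypothetical names, and this is not how glibc itself decides which futex features to use.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Parse a release string such as "2.6.17-10-powerpc" into a packed
 * 0xMMmmpp value (one byte per component). Returns 0 when the string
 * does not start with three dot-separated numbers.
 */
unsigned int parse_kernel_release(const char *release)
{
    unsigned int maj, min, patch;

    if (sscanf(release, "%u.%u.%u", &maj, &min, &patch) != 3)
        return 0;
    if (patch > 255)
        patch = 255;   /* keep the patch level within its byte */
    return (maj << 16) | (min << 8) | patch;
}

/* Packed version of the running kernel, or 0 if it cannot be determined. */
unsigned int running_kernel_version(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return parse_kernel_release(u.release);
}
```

A comparison such as running_kernel_version() < 0x020612 could then gate a feature at run time; whether that is preferable to simply fixing the affected kernels is exactly the trade-off discussed here.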

> What about __ASSUME_FUTEX_LOCK_PI ?

I don't know, I'm very unfamiliar with that futex stuff. I need to look
more closely.

> >  - kernel 2.6.18 and current git until yesterday (fix got in today) has
> > a bug if you manage to pass a wrong futex with a non-aligned atomic
> > value, it will possibly oops the kernel. With the fix, it will return an
> > error.
> >
> > Now what seems to be a glibc issue:
> >
> >  - I've had the tst-robustpi# tests (in fact the very first one, I
> > haven't tested the others) die on me with a SIGBUS caused by glibc
> > trying to do a lwarx/stwcx. on an odd address.
> >
> Ryan reminded me of a bug where the new robust code did not account for the
> fact that the TID was not at the same place as i386. I think Ryan has a
> patch.

Ah ok. Can you give me the details as soon as you get them ? (In
addition to submitting the fix upstream of course :) ubuntu at least is
starting to build their next distro on top of glibc-2.5 so it would be
nice if they had the fix asap.

Thanks for looking at this !

Cheers,
Ben.


Thread overview: 15+ messages
2006-10-27  5:56 glibc-2.5 test suite hangs/crashes the machine Fabio Massimo Di Nitto
2006-10-27 16:22 ` Jeff Bailey
2006-10-30  1:47   ` Benjamin Herrenschmidt
2006-11-01 22:17     ` Steve Munroe
2006-11-01 22:35       ` Benjamin Herrenschmidt
2006-11-01 22:56         ` Steve Munroe
2006-11-01 23:22           ` Benjamin Herrenschmidt
2006-10-30  3:02 ` Benjamin Herrenschmidt
2006-10-30 11:35   ` Fabio Massimo Di Nitto
2006-10-30 20:36     ` Benjamin Herrenschmidt
2006-10-31  6:37       ` Fabio Massimo Di Nitto
2006-10-31  6:51       ` Fabio Massimo Di Nitto
2006-10-31  9:47       ` Fabio Massimo Di Nitto
2006-10-31 20:30         ` Benjamin Herrenschmidt
2006-10-31 20:44           ` Fabio Massimo Di Nitto
