public inbox for linux-kernel@vger.kernel.org
* 2.4.24 SMP lockups
@ 2004-01-09 21:04 Simon Kirby
  2004-01-09 22:20 ` Arkadiusz Miskiewicz
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Simon Kirby @ 2004-01-09 21:04 UTC (permalink / raw)
  To: linux-kernel

'lo all,

We've had about 6 cases of this now, across 4 separate boxes.  Since
upgrading to 2.4.24, our SMP web server boxes (both Intel and AMD
hardware) are randomly blowing up.  This may have happened on 2.4.23 as
well, but they weren't really running long enough to tell.  2.4.22 was
fine.  GCC 3.3.3.

These boxes are all dual CPU, and the failure case shows up suddenly with
no warning.  Sysreq-P works, but only reports from one CPU no matter how
many times I try.  In normal operation, every machine distributes all
IRQs across both CPUs, and Sysreq-P reports from both CPUs.

Mapping the EIP reported by Sysreq-P to symbols shows that the responding
CPU is spinning on a spinlock (so far I have seen .text.lock.fcntl,
.text.lock.sched, .text.lock.locks, and .text.lock.inode), which I assume
is being held by the other (dead) CPU.

Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
watchdog.

Simon-
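
[Editor's sketch] The EIP-to-symbol mapping described in the message above amounts to finding the last symbol in System.map whose address is at or below the reported EIP. The following is a minimal illustration of that lookup, not the procedure Simon actually used. The addresses and the sample map are made up for the example, and `parse_system_map` and `lookup` are hypothetical helper names.

```python
import bisect

def parse_system_map(text):
    """Parse System.map-style lines ('c011a4c0 t name') into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def lookup(symbols, eip):
    """Return (symbol name, offset) for the symbol containing eip, or None."""
    addrs = [addr for addr, _name in symbols]
    # Index of the last symbol whose start address is <= eip.
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, eip - addr

# Hypothetical sample data; real addresses come from the running
# kernel's own System.map.
SAMPLE_MAP = """\
c0118000 T schedule
c011a4c0 t .text.lock.sched
c011a700 T fork_init
"""

if __name__ == "__main__":
    syms = parse_system_map(SAMPLE_MAP)
    name, offset = lookup(syms, 0xC011A4D3)
    print(f"{name}+0x{offset:x}")
```

An EIP that resolves into a `.text.lock.*` section is in the out-of-line spin loop for a contended lock, which is what points to the CPU waiting on a spinlock rather than doing useful work.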

* Re: 2.4.24 SMP lockups
@ 2004-01-10 19:58 Marcelo Tosatti
  2004-01-11  9:01 ` Simon Kirby
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2004-01-10 19:58 UTC (permalink / raw)
  To: linux-kernel


---------- Forwarded message ----------
Date: Sat, 10 Jan 2004 17:32:55 -0200 (BRST)
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Simon Kirby <sim@netnation.com>, Andrew Morton <akpm@osdl.org>
Subject: Re: 2.4.24 SMP lockups



On Fri, 9 Jan 2004, Simon Kirby wrote:

> 'lo all,

Hi Simon,

> We've had about 6 cases of this now, across 4 separate boxes.  Since
> upgrading to 2.4.24, our SMP web server boxes (both Intel and AMD
> hardware) are randomly blowing up.  This may have happened on 2.4.23 as
> well, but they weren't really running long enough to tell.  2.4.22 was
> fine.  GCC 3.3.3.
>
> These boxes are all dual CPU, and the failure case shows up suddenly with
> no warning.  Sysreq-P works, but only reports from one CPU no matter how
> many times I try.  In normal operation, every machine distributes all
> IRQs across both CPUs, and Sysreq-P reports from both CPUs.
>
> Mapping the EIP reported by Sysreq-P to symbols shows that the responding
> CPU is spinning on a spinlock (so far I have seen .text.lock.fcntl,
> .text.lock.sched, .text.lock.locks, and .text.lock.inode), which I assume
> is being held by the other (dead) CPU.

This sounds like a deadlock. I wonder why the NMI watchdog is not
triggering.

> Even on boxes with nmi_watchdog=1, nothing is reported from the NMI
> watchdog.

Can you share all available SysRq-P output for the locked CPU? SysRq-T
output too, if possible.

Can you please describe the hardware in more detail? Is there any common
hardware used in these boxes?
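
[Editor's sketch] One way to approach the question of why the NMI watchdog stays silent is to confirm, on a box that is still healthy, that NMIs are being delivered to both CPUs at all. The per-CPU counts on the `NMI:` line of /proc/interrupts should keep rising while the watchdog is active. The snippet below is only an illustration of that check under this assumption; `nmi_counts` and `stalled_cpus` are hypothetical helper names, and the sample text is invented.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # some kernels append a text description
                counts.append(int(field))
            return counts
    return []

def stalled_cpus(before, after):
    """Return indices of CPUs whose NMI count did not increase."""
    return [cpu for cpu, (b, a) in enumerate(zip(before, after)) if a <= b]

# Invented samples standing in for two reads of /proc/interrupts
# taken a few seconds apart.
FIRST = """\
           CPU0       CPU1
  0:     100000     100000    IO-APIC-edge  timer
NMI:       1000       1000
"""
SECOND = """\
           CPU0       CPU1
  0:     100500     100500    IO-APIC-edge  timer
NMI:       1500       1000
"""

if __name__ == "__main__":
    print(stalled_cpus(nmi_counts(FIRST), nmi_counts(SECOND)))
```

On a live machine the two samples would come from reading /proc/interrupts twice with a short sleep in between. A CPU that shows up as stalled before any lockup would suggest the watchdog was never armed on that CPU.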




Thread overview: 19+ messages
2004-01-09 21:04 2.4.24 SMP lockups Simon Kirby
2004-01-09 22:20 ` Arkadiusz Miskiewicz
2004-01-10 15:51 ` Thomas Zehetbauer
     [not found] ` <Pine.LNX.4.58L.0401101719400.1310@logos.cnet>
2004-01-10 22:40   ` Andrew Morton
2004-01-11  4:12     ` Rik van Riel
2004-01-11 13:16       ` Marcelo Tosatti
2004-01-12 12:18       ` Marcelo Tosatti
2004-01-12 12:43         ` Thomas Zehetbauer
2004-01-11  8:55     ` Simon Kirby
2004-01-11  9:30       ` Willy Tarreau
2004-01-14 17:07   ` Simon Kirby
2004-01-14 17:56     ` Marcelo Tosatti
2004-01-16  2:34       ` Philippe Troin
2004-01-14 18:28     ` David Woodhouse
2004-01-14 21:01       ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2004-01-10 19:58 Marcelo Tosatti
2004-01-11  9:01 ` Simon Kirby
2004-01-14 16:23   ` Marcelo Tosatti
2004-01-15 14:35     ` Thomas Zehetbauer
