* [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Mikael Pettersson @ 2009-04-08 9:12 UTC
To: linux-kernel; +Cc: tglx
Less than an hour after updating a dual Opteron 8384 box from 2.6.29-rc6
(which it had been running for weeks) to 2.6.29 final it died with the
following watchdog-detected lockup:
BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff80315202, registers:
CPU 0
Modules linked in: autofs4 sunrpc af_packet sg sr_mod cdrom bnx2 zlib_inflate crc32 pcspkr usb_storage ohci_hcd ehci_hcd usbcore
Pid: 2758, comm: nscd Not tainted 2.6.29 #1 Toonie
RIP: 0010:[<ffffffff80315202>] [<ffffffff80315202>] rb_insert_color+0x2/0x110
RSP: 0018:ffff8801369c1c28 EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffff8801369c1d48 RCX: 0000000000000000
RDX: ffff880028014fd0 RSI: ffff880028014fd0 RDI: ffff8801369c1d48
RBP: 0000000000000001 R08: 0000000000000001 R09: ffff8801369c1d48
R10: 0000000041a28978 R11: 0000000000000202 R12: ffff880028014fc0
R13: 000000000000c350 R14: 00000195812829fb R15: 00000000ffffffff
FS: 0000000041a29940(0063) GS:ffffffff8059d040(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f6b4daac000 CR3: 000000007f363000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nscd (pid: 2758, threadinfo ffff8801369c0000, task ffff8801374d2ac0)
Stack:
00000195812829fb ffffffff8024ffba ffff8801369c1d48 ffff8801369c1d48
ffff880028014fc0 ffffffff80250644 0000000000000000 ffffffff802565c5
0000000000000286 000000000000c350 ffff8801369c1d48 ffff8801369c1d10
Call Trace:
[<ffffffff8024ffba>] ? enqueue_hrtimer+0x6a/0x80
[<ffffffff80250644>] ? hrtimer_start_range_ns+0xd4/0x160
[<ffffffff802565c5>] ? get_futex_key+0x135/0x140
[<ffffffff80257c1a>] ? futex_wait+0x23a/0x410
[<ffffffff802a96a5>] ? mntput_no_expire+0x35/0x140
[<ffffffff8024fec0>] ? hrtimer_wakeup+0x0/0x30
[<ffffffff802301b0>] ? default_wake_function+0x0/0x10
[<ffffffff80257ed9>] ? do_futex+0xe9/0x960
[<ffffffff802966c5>] ? cp_new_stat+0xe5/0x100
[<ffffffff80252e18>] ? getnstimeofday+0x48/0xe0
[<ffffffff802501c5>] ? ktime_get_ts+0x25/0x60
[<ffffffff802587d1>] ? sys_futex+0x81/0x140
[<ffffffff8024bdcc>] ? posix_ktime_get_ts+0xc/0x20
[<ffffffff8020b71b>] ? system_call_fastpath+0x16/0x1b
Code: e0 03 48 09 c1 48 89 0f c3 48 89 0e 48 8b 07 83 e0 03 48 09 c1 48 89 0f c3 49 89 48 08 eb ed 66 2e 0f 1f 84 00 00 00 00 00 41 56 <49> 89 f6 41 55 49 89 fd 41 54 55 53 66 90 49 8b 5d 00 48 83 e3
---[ end trace 5244498c0e791391 ]---
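For readers decoding the trace above: each frame has the form [<address>] symbol+0xoffset/0xsize, and a leading "?" marks an entry the stack unwinder could not confirm as a real return address. The following is a minimal sketch, not a kernel tool, that splits such a line into its parts. The names parse_frame and FRAME_RE are made up for this illustration.

```python
import re

# Matches oops frames such as:
#   [<ffffffff8024ffba>] ? enqueue_hrtimer+0x6a/0x80
# FRAME_RE and parse_frame are hypothetical names used only in this sketch.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return the fields of one oops frame, or None if the line is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "symbol": m.group("symbol"),
        "address": addr,
        "offset": offset,                  # bytes into the function
        "size": int(m.group("size"), 16),  # total length of the function
        "function_start": addr - offset,   # address the symbol starts at
        "unreliable": m.group("unreliable") is not None,
    }


if __name__ == "__main__":
    frame = parse_frame("[<ffffffff8024ffba>] ? enqueue_hrtimer+0x6a/0x80")
    print(frame["symbol"], hex(frame["function_start"]))
```

Applied to the RIP line, this reports rb_insert_color at offset 0x2 of 0x110, i.e. the NMI fired two bytes into the function. The computed function_start can be compared against the System.map of the running kernel if one is at hand.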
lspci appended, full dmesg and .config available on request.
00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part
00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port B)
00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3b)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV515 [Radeon X1300]
01:00.1 Display controller: ATI Technologies Inc RV515 [Radeon X1300] (Secondary)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
* Re: [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Ingo Molnar @ 2009-04-08 9:24 UTC
To: Mikael Pettersson, Peter Zijlstra; +Cc: linux-kernel, tglx
* Mikael Pettersson <mikpe@it.uu.se> wrote:
> Less than an hour after updating a dual Opteron 8384 box from
> 2.6.29-rc6 (which it had been running for weeks) to 2.6.29 final
> it died with the following watchdog-detected lockup:
Could you please check whether this is the bug fixed by:
7f1e2ca: hrtimer: fix rq->lock inversion (again)
Thanks,
Ingo
* Re: [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Peter Zijlstra @ 2009-04-08 9:49 UTC
To: Ingo Molnar; +Cc: Mikael Pettersson, linux-kernel, tglx
On Wed, 2009-04-08 at 11:24 +0200, Ingo Molnar wrote:
> * Mikael Pettersson <mikpe@it.uu.se> wrote:
>
> > Less than an hour after updating a dual Opteron 8384 box from
> > 2.6.29-rc6 (which it had been running for weeks) to 2.6.29 final
> > it died with the following watchdog-detected lockup:
>
> Could you please check whether this is the bug fixed by:
>
> 7f1e2ca: hrtimer: fix rq->lock inversion (again)
Doesn't look like rq->lock recursion, but it's worth a try.
Also, hrtimer_wakeup() isn't a self-rearming timer, so it's unlikely to
get stuck in a loop. Odd.
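As background on the term in the fix's title: a lock-order inversion occurs when one code path takes lock A and then lock B while another takes B and then A, so the two can deadlock against each other. The sketch below is a toy single-threaded order checker and is not how the kernel or the referenced commit works. LockOrderChecker and the lock name "timer_base_lock" are hypothetical; only "rq->lock" comes from the thread.

```python
class LockOrderChecker:
    """Toy checker: records lock nesting order and reports reversed pairs."""

    def __init__(self):
        self.held = []      # locks currently held, in acquisition order
        self.seen = set()   # (outer, inner) nesting pairs observed so far

    def acquire(self, name):
        """Record taking `name`; return any (outer, inner) pairs that invert
        an ordering seen earlier."""
        inversions = []
        for outer in self.held:
            if (name, outer) in self.seen:
                # Earlier we saw `name` taken before `outer`; now it is reversed.
                inversions.append((outer, name))
            self.seen.add((outer, name))
        self.held.append(name)
        return inversions

    def release(self, name):
        self.held.remove(name)


if __name__ == "__main__":
    checker = LockOrderChecker()
    # Path 1: rq->lock, then the (hypothetical) timer base lock.
    checker.acquire("rq->lock")
    checker.acquire("timer_base_lock")
    checker.release("timer_base_lock")
    checker.release("rq->lock")
    # Path 2: the same two locks in the opposite order.
    checker.acquire("timer_base_lock")
    print(checker.acquire("rq->lock"))
```

The second path is flagged because it nests the two locks in the reverse order of the first. With real threads, each path could end up holding one lock while waiting forever for the other.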
* Re: [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Mikael Pettersson @ 2009-04-08 11:29 UTC
To: Ingo Molnar; +Cc: Mikael Pettersson, Peter Zijlstra, linux-kernel, tglx
Ingo Molnar writes:
>
> * Mikael Pettersson <mikpe@it.uu.se> wrote:
>
> > Less than an hour after updating a dual Opteron 8384 box from
> > 2.6.29-rc6 (which it had been running for weeks) to 2.6.29 final
> > it died with the following watchdog-detected lockup:
>
> Could you please check whether this is the bug fixed by:
>
> 7f1e2ca: hrtimer: fix rq->lock inversion (again)
I am testing this now and will let you know how it goes.
* Re: [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Mikael Pettersson @ 2009-04-09 12:02 UTC
To: Mikael Pettersson; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, tglx
Mikael Pettersson writes:
> Ingo Molnar writes:
> >
> > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> >
> > > Less than an hour after updating a dual Opteron 8384 box from
> > > 2.6.29-rc6 (which it had been running for weeks) to 2.6.29 final
> > > it died with the following watchdog-detected lockup:
> >
> > Could you please check whether this is the bug fixed by:
> >
> > 7f1e2ca: hrtimer: fix rq->lock inversion (again)
>
> I am testing this now and will let you know how it goes.
The machine has now been running 2.6.29 plus that fix for more
than 24 hours with no problems. Previously it oopsed and hung
within an hour on two separate attempts to run 2.6.29 vanilla.
Thanks,
/Mikael
* Re: [BUG 2.6.29] NMI watchdog triggered in rb_insert_color called from enqueue_hrtimer
From: Ingo Molnar @ 2009-04-10 13:10 UTC
To: Mikael Pettersson, stable; +Cc: Peter Zijlstra, linux-kernel, tglx
* Mikael Pettersson <mikpe@it.uu.se> wrote:
> Mikael Pettersson writes:
> > Ingo Molnar writes:
> > >
> > > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> > >
> > > > Less than an hour after updating a dual Opteron 8384 box from
> > > > 2.6.29-rc6 (which it had been running for weeks) to 2.6.29 final
> > > > it died with the following watchdog-detected lockup:
> > >
> > > Could you please check whether this is the bug fixed by:
> > >
> > > 7f1e2ca: hrtimer: fix rq->lock inversion (again)
> >
> > I am testing this now and will let you know how it goes.
>
> The machine has now been running 2.6.29 plus that fix for more
> than 24 hours with no problems. Previously it oopsed and hung
> within an hour on two separate attempts to run 2.6.29 vanilla.
OK, thanks for testing. The fix is now upstream:
7f1e2ca: hrtimer: fix rq->lock inversion (again)
-stable team, please pick it up. I just checked, and it cherry-picks
fine on v2.6.29.
Ingo