* 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 18:13 Keith Mannthey
[not found] ` <a762e240607111125y1f9a67eleadbd1fffd053be6@mail.gmail.com>
2006-07-11 20:21 ` Andrew Morton
0 siblings, 2 replies; 5+ messages in thread
From: Keith Mannthey @ 2006-07-11 18:13 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
Hello,
I just tried booting 2.6.18-rc1-mm1 (I was booting 2.6.17-mm6 just
fine) and got the following error on boot.
CPU 15: synchronized TSC with CPU 0 (last diff 49 cycles, maxerr 4698 cycles)
Brought up 16 CPUs
testing NMI watchdog ... OK.
time.c: Using 333.333333 MHz WALL PIT GTOD PIT/HPET timer.
time.c: Detected 3002.570 MHz processor.
migration_cost=9,1121,16845
checking if image is initramfs... it is
Freeing initrd memory: 2770k freed
NMI Watchdog detected LOCKUP on CPU 8
CPU 8
Modules linked in:
Pid: 51, comm: khelper Not tainted 2.6.18-rc1-mm1-smp #2
RIP: 0010:[<ffffffff803dd6f5>] [<ffffffff803dd6f5>] .text.lock.spinlock+0x31/0x8a
RSP: 0000:ffff81065f91be70 EFLAGS: 00000086
RAX: 0000000000000000 RBX: ffff810476ce3380 RCX: 0000000000000000
RDX: ffff81046fad4108 RSI: ffff81046fad4000 RDI: ffff810476ce3384
RBP: ffff810476ce3380 R08: 0000000000000000 R09: 000000000036f849
R10: 0000000000000000 R11: 0000000000000002 R12: ffff81065f91bf04
R13: ffff81065f91bef8 R14: ffff810476dcdd18 R15: ffffffff8023f7a8
FS: 0000000000000000(0000) GS:ffff810476f79140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process khelper (pid: 51, threadinfo ffff81065f91a000, task ffff81046fedd080)
Stack: ffffffff803dd040 ffff81047003f8c0 ffff81065f91bef8 ffff810476dcdd18
0000000000000246 ffff81046fad4108 ffff810476ce3380 ffff81046fad4108
ffffffff8025b211 0000000000000000 0000000000000000 ffff81046fedd080
Call Trace:
[<ffffffff803dd040>] __down_read+0x12/0x9a
[<ffffffff8025b211>] taskstats_exit_alloc+0x59/0x8a
[<ffffffff80232e89>] do_exit+0x178/0x8f6
[<ffffffff8023f940>] request_module+0x0/0x150
[<ffffffff8020a05a>] child_rip+0x8/0x12
[<ffffffff8023f7a8>] __call_usermodehelper+0x0/0x47
[<ffffffff8023f866>] ____call_usermodehelper+0x0/0xda
[<ffffffff8020a052>] child_rip+0x0/0x12
Code: 7e f9 e9 d3 fe ff ff f3 90 83 3b 00 7e f9 e9 da fe ff ff e8
console shuts up ...
Any ideas, have we seen this? I can attach config and full dmesg if needed.
thanks,
Keith
* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
[not found] ` <a762e240607111125y1f9a67eleadbd1fffd053be6@mail.gmail.com>
@ 2006-07-11 19:00 ` Keith Mannthey
0 siblings, 0 replies; 5+ messages in thread
From: Keith Mannthey @ 2006-07-11 19:00 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
I also just tested 2.6.18-rc1, and it booted fine with the same basic
config. It must be something in -mm.
Keith
* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
2006-07-11 18:13 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP Keith Mannthey
[not found] ` <a762e240607111125y1f9a67eleadbd1fffd053be6@mail.gmail.com>
@ 2006-07-11 20:21 ` Andrew Morton
2006-07-11 20:23 ` Shailabh Nagar
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2006-07-11 20:21 UTC (permalink / raw)
To: Keith Mannthey; +Cc: linux-kernel, Shailabh Nagar
On Tue, 11 Jul 2006 11:13:00 -0700
"Keith Mannthey" <kmannth@gmail.com> wrote:
> Hello,
> I just tried booting 2.6.18-rc1-mm1 (I was booting 2.6.17-mm6 just
> fine) and got the following error on boot.
>
> CPU 15: synchronized TSC with CPU 0 (last diff 49 cycles, maxerr 4698 cycles)
> Brought up 16 CPUs
> testing NMI watchdog ... OK.
> time.c: Using 333.333333 MHz WALL PIT GTOD PIT/HPET timer.
> time.c: Detected 3002.570 MHz processor.
> migration_cost=9,1121,16845
> checking if image is initramfs... it is
> Freeing initrd memory: 2770k freed
> NMI Watchdog detected LOCKUP on CPU 8
> CPU 8
> Modules linked in:
> Pid: 51, comm: khelper Not tainted 2.6.18-rc1-mm1-smp #2
> RIP: 0010:[<ffffffff803dd6f5>] [<ffffffff803dd6f5>] .text.lock.spinlock+0x31/0x8a
> RSP: 0000:ffff81065f91be70 EFLAGS: 00000086
> RAX: 0000000000000000 RBX: ffff810476ce3380 RCX: 0000000000000000
> RDX: ffff81046fad4108 RSI: ffff81046fad4000 RDI: ffff810476ce3384
> RBP: ffff810476ce3380 R08: 0000000000000000 R09: 000000000036f849
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff81065f91bf04
> R13: ffff81065f91bef8 R14: ffff810476dcdd18 R15: ffffffff8023f7a8
> FS: 0000000000000000(0000) GS:ffff810476f79140(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process khelper (pid: 51, threadinfo ffff81065f91a000, task ffff81046fedd080)
> Stack: ffffffff803dd040 ffff81047003f8c0 ffff81065f91bef8 ffff810476dcdd18
> 0000000000000246 ffff81046fad4108 ffff810476ce3380 ffff81046fad4108
> ffffffff8025b211 0000000000000000 0000000000000000 ffff81046fedd080
> Call Trace:
> [<ffffffff803dd040>] __down_read+0x12/0x9a
> [<ffffffff8025b211>] taskstats_exit_alloc+0x59/0x8a
> [<ffffffff80232e89>] do_exit+0x178/0x8f6
> [<ffffffff8023f940>] request_module+0x0/0x150
> [<ffffffff8020a05a>] child_rip+0x8/0x12
> [<ffffffff8023f7a8>] __call_usermodehelper+0x0/0x47
> [<ffffffff8023f866>] ____call_usermodehelper+0x0/0xda
> [<ffffffff8020a052>] child_rip+0x0/0x12
>
>
> Code: 7e f9 e9 d3 fe ff ff f3 90 83 3b 00 7e f9 e9 da fe ff ff e8
> console shuts up ...
>
>
> Any ideas, have we seen this? I can attach config and full dmesg if needed.
>
Thanks. Shailabh sent the below patch through yesterday. It looks awfully
similar.
From: Shailabh Nagar <nagar@watson.ibm.com>
Shift initialization of semaphores taken on exit() path to earlier in the
bootup sequence. Without this fix, booting on large cpu machines hangs at
down_read() called on one of the per-cpu semaphores declared in taskstats.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
kernel/taskstats.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff -puN kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2 kernel/taskstats.c
--- a/kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2
+++ a/kernel/taskstats.c
@@ -501,15 +501,20 @@ static struct genl_ops taskstats_ops = {
 /* Needed early in initialization */
 void __init taskstats_init_early(void)
 {
+	unsigned int i;
+
 	taskstats_cache = kmem_cache_create("taskstats_cache",
 						sizeof(struct taskstats),
 						0, SLAB_PANIC, NULL, NULL);
+	for_each_possible_cpu(i) {
+		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
+		init_rwsem(&(per_cpu(listener_array, i).sem));
+	}
 }
 
 static int __init taskstats_init(void)
 {
 	int rc;
-	unsigned int i;
 
 	rc = genl_register_family(&family);
 	if (rc)
@@ -519,11 +524,6 @@ static int __init taskstats_init(void)
 	if (rc < 0)
 		goto err;
 
-	for_each_possible_cpu(i) {
-		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
-		init_rwsem(&(per_cpu(listener_array, i).sem));
-	}
-
 	family_registered = 1;
 	return 0;
 err:
_
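
For readers following the thread, the ordering problem this patch addresses can be sketched as a single-threaded userspace toy. It is not the kernel's rwsem code. The names toy_rwsem, toy_init_rwsem, toy_down_read, toy_up_read, init_early_sketch, exit_path_sketch, NR_CPUS_SKETCH and SPIN_LIMIT are all hypothetical, and the "1 means unlocked, 0 or less means held" convention is an assumption made for illustration. The point is only that zero-filled static storage can look like a held lock until an init routine runs, so any exit path that runs before the init routine waits forever (here: gives up after a bounded number of tries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 16   /* hypothetical; the reporting machine brought up 16 CPUs */
#define SPIN_LIMIT 1000     /* sketch-only bound so the demo terminates instead of hanging */

/* Toy lock word: 1 means unlocked, <= 0 means held (assumed convention). */
struct toy_rwsem {
	int wait_lock;
};

struct listener_list_sketch {
	struct toy_rwsem sem;
};

/* Static storage is zero-filled, so every wait_lock starts at 0 ("held"). */
struct listener_list_sketch listener_array_sketch[NR_CPUS_SKETCH];

void toy_init_rwsem(struct toy_rwsem *sem)
{
	sem->wait_lock = 1;     /* mark the lock as free */
}

/* Try to take the lock; report failure after SPIN_LIMIT tries. */
bool toy_down_read(struct toy_rwsem *sem)
{
	for (int tries = 0; tries < SPIN_LIMIT; tries++) {
		if (sem->wait_lock > 0) {
			sem->wait_lock--;   /* lock is now held */
			return true;
		}
	}
	return false;           /* a real spin loop would never return here */
}

void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->wait_lock <= 0);    /* must currently be held */
	sem->wait_lock = 1;
}

/* Mirrors the idea of the patch: initialize every per-CPU lock up front. */
void init_early_sketch(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		toy_init_rwsem(&listener_array_sketch[cpu].sem);
}

/* Stand-in for the exit path: returns false if the lock could not be taken. */
bool exit_path_sketch(size_t cpu)
{
	struct toy_rwsem *sem;

	if (cpu >= NR_CPUS_SKETCH)
		return false;
	sem = &listener_array_sketch[cpu].sem;
	if (!toy_down_read(sem))
		return false;
	/* ... the listener list would be walked here ... */
	toy_up_read(sem);
	return true;
}
```

Calling exit_path_sketch(8) before init_early_sketch() fails in this model, and succeeds once the early init has run, which is the ordering the patch enforces by moving the loop into taskstats_init_early().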
* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
2006-07-11 20:21 ` Andrew Morton
@ 2006-07-11 20:23 ` Shailabh Nagar
2006-07-11 20:44 ` Keith Mannthey
0 siblings, 1 reply; 5+ messages in thread
From: Shailabh Nagar @ 2006-07-11 20:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: Keith Mannthey, linux-kernel
Andrew Morton wrote:
> On Tue, 11 Jul 2006 11:13:00 -0700
> "Keith Mannthey" <kmannth@gmail.com> wrote:
>
>
>>Hello,
>> I just tried booting 2.6.18-rc1-mm1 (I was booting 2.6.17-mm6 just
>>fine) and got the following error on boot.
>>
>>CPU 15: synchronized TSC with CPU 0 (last diff 49 cycles, maxerr 4698 cycles)
>>Brought up 16 CPUs
>>testing NMI watchdog ... OK.
>>time.c: Using 333.333333 MHz WALL PIT GTOD PIT/HPET timer.
>>time.c: Detected 3002.570 MHz processor.
>>migration_cost=9,1121,16845
>>checking if image is initramfs... it is
>>Freeing initrd memory: 2770k freed
>>NMI Watchdog detected LOCKUP on CPU 8
>>CPU 8
>>Modules linked in:
>>Pid: 51, comm: khelper Not tainted 2.6.18-rc1-mm1-smp #2
>>RIP: 0010:[<ffffffff803dd6f5>] [<ffffffff803dd6f5>] .text.lock.spinlock+0x31/0x8a
>>RSP: 0000:ffff81065f91be70 EFLAGS: 00000086
>>RAX: 0000000000000000 RBX: ffff810476ce3380 RCX: 0000000000000000
>>RDX: ffff81046fad4108 RSI: ffff81046fad4000 RDI: ffff810476ce3384
>>RBP: ffff810476ce3380 R08: 0000000000000000 R09: 000000000036f849
>>R10: 0000000000000000 R11: 0000000000000002 R12: ffff81065f91bf04
>>R13: ffff81065f91bef8 R14: ffff810476dcdd18 R15: ffffffff8023f7a8
>>FS: 0000000000000000(0000) GS:ffff810476f79140(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
>>Process khelper (pid: 51, threadinfo ffff81065f91a000, task ffff81046fedd080)
>>Stack: ffffffff803dd040 ffff81047003f8c0 ffff81065f91bef8 ffff810476dcdd18
>> 0000000000000246 ffff81046fad4108 ffff810476ce3380 ffff81046fad4108
>> ffffffff8025b211 0000000000000000 0000000000000000 ffff81046fedd080
>>Call Trace:
>> [<ffffffff803dd040>] __down_read+0x12/0x9a
>> [<ffffffff8025b211>] taskstats_exit_alloc+0x59/0x8a
>> [<ffffffff80232e89>] do_exit+0x178/0x8f6
>> [<ffffffff8023f940>] request_module+0x0/0x150
>> [<ffffffff8020a05a>] child_rip+0x8/0x12
>> [<ffffffff8023f7a8>] __call_usermodehelper+0x0/0x47
>> [<ffffffff8023f866>] ____call_usermodehelper+0x0/0xda
>> [<ffffffff8020a052>] child_rip+0x0/0x12
>>
>>
>>Code: 7e f9 e9 d3 fe ff ff f3 90 83 3b 00 7e f9 e9 da fe ff ff e8
>>console shuts up ...
>>
>>
>>Any ideas, have we seen this? I can attach config and full dmesg if needed.
>>
>
>
> Thanks. Shailabh sent the below patch through yesterday. It looks awfully
> similar.
Yes, this lockup on boot is caused by not initializing the per-cpu
semaphores early enough. The patch below should fix it.
--Shailabh
>
> From: Shailabh Nagar <nagar@watson.ibm.com>
>
> Shift initialization of semaphores taken on exit() path to earlier in the
> bootup sequence. Without this fix, booting on large cpu machines hangs at
> down_read() called on one of the per-cpu semaphores declared in taskstats.
>
> Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
>
> kernel/taskstats.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff -puN kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2 kernel/taskstats.c
> --- a/kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2
> +++ a/kernel/taskstats.c
> @@ -501,15 +501,20 @@ static struct genl_ops taskstats_ops = {
> /* Needed early in initialization */
> void __init taskstats_init_early(void)
> {
> + unsigned int i;
> +
> taskstats_cache = kmem_cache_create("taskstats_cache",
> sizeof(struct taskstats),
> 0, SLAB_PANIC, NULL, NULL);
> + for_each_possible_cpu(i) {
> + INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
> + init_rwsem(&(per_cpu(listener_array, i).sem));
> + }
> }
>
> static int __init taskstats_init(void)
> {
> int rc;
> - unsigned int i;
>
> rc = genl_register_family(&family);
> if (rc)
> @@ -519,11 +524,6 @@ static int __init taskstats_init(void)
> if (rc < 0)
> goto err;
>
> - for_each_possible_cpu(i) {
> - INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
> - init_rwsem(&(per_cpu(listener_array, i).sem));
> - }
> -
> family_registered = 1;
> return 0;
> err:
> _
>
* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
2006-07-11 20:23 ` Shailabh Nagar
@ 2006-07-11 20:44 ` Keith Mannthey
0 siblings, 0 replies; 5+ messages in thread
From: Keith Mannthey @ 2006-07-11 20:44 UTC (permalink / raw)
To: Shailabh Nagar; +Cc: Andrew Morton, linux-kernel
On 7/11/06, Shailabh Nagar <nagar@watson.ibm.com> wrote:
> Andrew Morton wrote:
> > Thanks. Shailabh sent the below patch through yesterday. It looks awfully
> > similar.
>
>
> Yes, this lockup on boot is caused by not initializing the per-cpu
> semaphores early enough. The patch below should fix it.
>
Thanks. I applied the patch and the system booted :)
Keith