* 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 18:13 Keith Mannthey
From: Keith Mannthey @ 2006-07-11 18:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

Hello,
  I just tried booting 2.6.18-rc1-mm1 (2.6.17-mm6 was booting just
fine) and got the following error.

CPU 15: synchronized TSC with CPU 0 (last diff 49 cycles, maxerr 4698 cycles)
Brought up 16 CPUs
testing NMI watchdog ... OK.
time.c: Using 333.333333 MHz WALL PIT GTOD PIT/HPET timer.
time.c: Detected 3002.570 MHz processor.
migration_cost=9,1121,16845
checking if image is initramfs... it is
Freeing initrd memory: 2770k freed
NMI Watchdog detected LOCKUP on CPU 8
CPU 8
Modules linked in:
Pid: 51, comm: khelper Not tainted 2.6.18-rc1-mm1-smp #2
RIP: 0010:[<ffffffff803dd6f5>]  [<ffffffff803dd6f5>] .text.lock.spinlock+0x31/0x8a
RSP: 0000:ffff81065f91be70  EFLAGS: 00000086
RAX: 0000000000000000 RBX: ffff810476ce3380 RCX: 0000000000000000
RDX: ffff81046fad4108 RSI: ffff81046fad4000 RDI: ffff810476ce3384
RBP: ffff810476ce3380 R08: 0000000000000000 R09: 000000000036f849
R10: 0000000000000000 R11: 0000000000000002 R12: ffff81065f91bf04
R13: ffff81065f91bef8 R14: ffff810476dcdd18 R15: ffffffff8023f7a8
FS:  0000000000000000(0000) GS:ffff810476f79140(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process khelper (pid: 51, threadinfo ffff81065f91a000, task ffff81046fedd080)
Stack:  ffffffff803dd040 ffff81047003f8c0 ffff81065f91bef8 ffff810476dcdd18
 0000000000000246 ffff81046fad4108 ffff810476ce3380 ffff81046fad4108
 ffffffff8025b211 0000000000000000 0000000000000000 ffff81046fedd080
Call Trace:
 [<ffffffff803dd040>] __down_read+0x12/0x9a
 [<ffffffff8025b211>] taskstats_exit_alloc+0x59/0x8a
 [<ffffffff80232e89>] do_exit+0x178/0x8f6
 [<ffffffff8023f940>] request_module+0x0/0x150
 [<ffffffff8020a05a>] child_rip+0x8/0x12
 [<ffffffff8023f7a8>] __call_usermodehelper+0x0/0x47
 [<ffffffff8023f866>] ____call_usermodehelper+0x0/0xda
 [<ffffffff8020a052>] child_rip+0x0/0x12


Code: 7e f9 e9 d3 fe ff ff f3 90 83 3b 00 7e f9 e9 da fe ff ff e8
console shuts up ...


Any ideas? Have we seen this before?  I can attach my config and full dmesg if needed.

thanks,
  Keith


* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 19:00 Keith Mannthey
From: Keith Mannthey @ 2006-07-11 19:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

I also just tested 2.6.18-rc1 and it booted fine with the same basic
config, so it must be something in -mm.

Keith


* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 20:21 Andrew Morton
From: Andrew Morton @ 2006-07-11 20:21 UTC (permalink / raw)
  To: Keith Mannthey; +Cc: linux-kernel, Shailabh Nagar

On Tue, 11 Jul 2006 11:13:00 -0700
"Keith Mannthey" <kmannth@gmail.com> wrote:

Thanks.  Shailabh sent the below patch through yesterday.  It looks awfully
similar.

From: Shailabh Nagar <nagar@watson.ibm.com>

Move initialization of the per-cpu semaphores taken on the exit() path
earlier in the boot sequence.  Without this fix, booting on machines with
many CPUs hangs in down_read() on one of the per-cpu semaphores declared
in taskstats.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 kernel/taskstats.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff -puN kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2 kernel/taskstats.c
--- a/kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2
+++ a/kernel/taskstats.c
@@ -501,15 +501,20 @@ static struct genl_ops taskstats_ops = {
 /* Needed early in initialization */
 void __init taskstats_init_early(void)
 {
+	unsigned int i;
+
 	taskstats_cache = kmem_cache_create("taskstats_cache",
 						sizeof(struct taskstats),
 						0, SLAB_PANIC, NULL, NULL);
+	for_each_possible_cpu(i) {
+		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
+		init_rwsem(&(per_cpu(listener_array, i).sem));
+	}
 }
 
 static int __init taskstats_init(void)
 {
 	int rc;
-	unsigned int i;
 
 	rc = genl_register_family(&family);
 	if (rc)
@@ -519,11 +524,6 @@ static int __init taskstats_init(void)
 	if (rc < 0)
 		goto err;
 
-	for_each_possible_cpu(i) {
-		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
-		init_rwsem(&(per_cpu(listener_array, i).sem));
-	}
-
 	family_registered = 1;
 	return 0;
 err:
_
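The fix is purely an ordering change: the per-cpu list heads and rwsems move out of taskstats_init(), which per the changelog runs too late for the do_exit() path seen in the trace, and into taskstats_init_early(). A minimal userspace sketch of the same pattern follows. Everything named model_* or MODEL_* is hypothetical, POSIX pthread_rwlock_t stands in for the kernel's struct rw_semaphore, and MODEL_NR_CPUS is an assumed constant; it illustrates the ordering rule, it is not the kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 16 /* hypothetical; the reporter's box brought up 16 CPUs */

/* Simplified stand-in for the kernel's struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical stand-in for one entry of taskstats' per-cpu listener_array. */
struct model_listener_list {
	pthread_rwlock_t sem;
	struct model_list_head list;
};

static struct model_listener_list model_listener_array[MODEL_NR_CPUS];
static bool model_listeners_ready;

/* Like INIT_LIST_HEAD(): an empty list points at itself. */
static void model_init_list_head(struct model_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Analogue of taskstats_init_early(): must run before anything can exit. */
static void model_taskstats_init_early(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		model_init_list_head(&model_listener_array[i].list);
		pthread_rwlock_init(&model_listener_array[i].sem, NULL);
	}
	model_listeners_ready = true;
}

/* Analogue of the exit path: read-lock one CPU's listeners, report if any exist. */
static bool model_exit_has_listeners(int cpu)
{
	assert(model_listeners_ready); /* trips if the exit path runs before early init */
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	struct model_listener_list *l = &model_listener_array[cpu];
	pthread_rwlock_rdlock(&l->sem);
	bool any = l->list.next != &l->list;
	pthread_rwlock_unlock(&l->sem);
	return any;
}
```

Calling model_taskstats_init_early() once at startup, before any thread can reach model_exit_has_listeners(), is the userspace counterpart of the patch: the late initializer keeps only the work that genuinely has to wait (here, genetlink family registration).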



* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 20:23 Shailabh Nagar
From: Shailabh Nagar @ 2006-07-11 20:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Keith Mannthey, linux-kernel

Andrew Morton wrote:
> Thanks.  Shailabh sent the below patch through yesterday.  It looks awfully
> similar.


Yes, this lockup on boot is caused by not initializing the per-cpu
semaphores early enough. The patch below should fix it.
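This diagnosis matches the oops. The RIP is in .text.lock.spinlock, and the Code: bytes "f3 90 83 3b 00 7e f9 e9 ..." decode to pause (rep; nop), cmpl $0,(%rbx), a jle back to the pause, then a jmp back to the retry. Assuming the 2.6.18-era x86_64 spinlock, where 1 means unlocked and the lock is taken with an atomic decrement, and assuming the rwsem's __down_read() first takes an internal spinlock, a semaphore left in zero-filled per-cpu memory has a lock that already looks held, so the CPU spins until the NMI watchdog fires. The userspace model below (hypothetical names model_spinlock_t, MODEL_SPIN_LOCK_UNLOCKED, model_spin_lock_bounded, model_spin_unlock; C11 atomics; a bounded spin instead of an endless one) sketches that behaviour. It is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of a decrement-based spinlock, assumed to mirror 2.6.18-era x86_64:
 * slock == 1 means unlocked, slock <= 0 means held.
 */
typedef struct {
	atomic_int slock;
} model_spinlock_t;

/* Correct initializer: "unlocked" is 1, so zero-filled memory looks held. */
#define MODEL_SPIN_LOCK_UNLOCKED { 1 }

/*
 * Fast path plus out-of-line slow path. The real loop never gives up;
 * max_spins bounds the wait so the model can report a would-be hang.
 * Returns true if the lock was taken.
 */
static bool model_spin_lock_bounded(model_spinlock_t *l, long max_spins)
{
	for (;;) {
		/* "lock; decl": acquired if the new value is not negative. */
		if (atomic_fetch_sub(&l->slock, 1) - 1 >= 0)
			return true;
		/* "rep; nop; cmpl $0; jle": poll while the lock still looks held. */
		while (atomic_load(&l->slock) <= 0) {
			if (max_spins-- <= 0)
				return false; /* the kernel would keep spinning here */
		}
		/* Lock looked free again: retry the decrement ("jmp" back). */
	}
}

/* Release by storing the unlocked value; only a holder should call this. */
static void model_spin_unlock(model_spinlock_t *l)
{
	assert(atomic_load(&l->slock) <= 0);
	atomic_store(&l->slock, 1);
}
```

A lock initialized with MODEL_SPIN_LOCK_UNLOCKED is taken on the first try, while a zero-filled one never is, which is why init_rwsem() has to run on every CPU's listener_array entry before the first exiting task reaches down_read().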

--Shailabh

> 
> From: Shailabh Nagar <nagar@watson.ibm.com>
> 
> Move initialization of the per-cpu semaphores taken on the exit() path
> earlier in the boot sequence.  Without this fix, booting on machines with
> many CPUs hangs in down_read() on one of the per-cpu semaphores declared
> in taskstats.
> 
> Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  kernel/taskstats.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff -puN kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2 kernel/taskstats.c
> --- a/kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2
> +++ a/kernel/taskstats.c
> @@ -501,15 +501,20 @@ static struct genl_ops taskstats_ops = {
>  /* Needed early in initialization */
>  void __init taskstats_init_early(void)
>  {
> +	unsigned int i;
> +
>  	taskstats_cache = kmem_cache_create("taskstats_cache",
>  						sizeof(struct taskstats),
>  						0, SLAB_PANIC, NULL, NULL);
> +	for_each_possible_cpu(i) {
> +		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
> +		init_rwsem(&(per_cpu(listener_array, i).sem));
> +	}
>  }
>  
>  static int __init taskstats_init(void)
>  {
>  	int rc;
> -	unsigned int i;
>  
>  	rc = genl_register_family(&family);
>  	if (rc)
> @@ -519,11 +524,6 @@ static int __init taskstats_init(void)
>  	if (rc < 0)
>  		goto err;
>  
> -	for_each_possible_cpu(i) {
> -		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
> -		init_rwsem(&(per_cpu(listener_array, i).sem));
> -	}
> -
>  	family_registered = 1;
>  	return 0;
>  err:
> _
> 



* Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP
@ 2006-07-11 20:44 Keith Mannthey
From: Keith Mannthey @ 2006-07-11 20:44 UTC (permalink / raw)
  To: Shailabh Nagar; +Cc: Andrew Morton, linux-kernel

On 7/11/06, Shailabh Nagar <nagar@watson.ibm.com> wrote:
> Andrew Morton wrote:
> > Thanks.  Shailabh sent the below patch through yesterday.  It looks awfully
> > similar.
>
>
> Yes, this lockup on boot is caused by not initializing the per-cpu
> semaphores early enough. The patch below should fix it.
>

Thanks.  I applied the patch and the system booted :)

Keith
