From: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: mingo@elte.hu, torvalds@linux-foundation.org,
a.p.zijlstra@chello.nl, anton@samba.org
Subject: [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982
Date: Thu, 7 Jul 2011 15:52:32 +0530
Message-ID: <20110707102107.GA16666@in.ibm.com>
Hi,
Linux 3.0-rc fails to boot on a POWER7 system with 1TB of RAM and 896 CPUs.
The initial boot log shows a soft lockup [1], after which the machine hangs.
Dropping into xmon shows that all the CPUs are stuck at:
--------------------
cpu 0xa: Vector: 100 (System Reset) at [c000000fae51fae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
sp: c000000fae51fd60
msr: 8000000000089032
current = 0xc000000fae49d990
paca = 0xc00000000ebb1900
pid = 0, comm = kworker/0:1
cpu 0x41: Vector: 100 (System Reset) at [c000000fac01bae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
sp: c000000fac01bd60
msr: 8000000000089032
current = 0xc000000faefbf210
paca = 0xc00000000ebba280
pid = 0, comm = kworker/0:1
cpu 0x21: Vector: 100 (System Reset) at [c000000fae9abae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
sp: c000000fae9abd60
msr: 8000000000089032
current = 0xc000000fae998590
paca = 0xc00000000ebb5280
pid = 0, comm = kworker/0:1
cpu 0xb8: Vector: 100 (System Reset) at [c000000fab3dbae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
sp: c000000fab3dbd60
msr: 8000000000089032
current = 0xc000000fab3a2710
paca = 0xc00000000ebccc00
pid = 0, comm = kworker/0:1
......
......
The same is shown for all the CPUs.
a:mon> t
[link register ] c00000000005b9a4 .pseries_dedicated_idle_sleep+0x194/0x210
[c000000fae51fd60] 00000000134d0000 (unreliable)
[c000000fae51fe20] c000000000018b64 .cpu_idle+0x164/0x210
[c000000fae51fed0] c0000000005d55b0 .start_secondary+0x348/0x354
[c000000fae51ff90] c000000000009268 .start_secondary_prolog+0x10/0x14
a:mon> S
msr = 8000000000001032 sprg0= 0000000000000000
pvr = 00000000003f0201 sprg1= c00000000ebb1900
dec = 0000000030fb5b4f sprg2= c00000000ebb1900
sp = c000000fae51f440 sprg3= 000000000000000a
toc = c000000000e21f90 dar = c000011aee0c20e8
a:mon>
--------------------
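With 896 CPUs it is impractical to compare the xmon records by eye. A minimal sketch of how the records could be tallied by pc symbol is shown here, assuming the xmon output has been captured as text; the sample string holds only two of the records above, and the function names are hypothetical helpers, not part of xmon.

```python
import re
from collections import Counter

# Two of the CPU records from the xmon output above, copied verbatim.
XMON_SAMPLE = """\
cpu 0xa: Vector: 100 (System Reset) at [c000000fae51fae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
cpu 0x41: Vector: 100 (System Reset) at [c000000fac01bae0]
pc: c0000000000596b8: .plpar_hcall_norets+0x80/0xd0
lr: c00000000005b9a4: .pseries_dedicated_idle_sleep+0x194/0x210
"""

CPU_RE = re.compile(r"^cpu (0x[0-9a-f]+):", re.M)
PC_RE = re.compile(r"^pc: [0-9a-f]+: (\S+)", re.M)


def cpu_ids(text):
    """Return the CPU numbers found in the capture, as integers."""
    return [int(cpu, 16) for cpu in CPU_RE.findall(text)]


def tally_pc_symbols(text):
    """Count how many CPU records sit at each pc symbol+offset."""
    return Counter(PC_RE.findall(text))


if __name__ == "__main__":
    print(cpu_ids(XMON_SAMPLE))
    for symbol, count in tally_pc_symbols(XMON_SAMPLE).items():
        print(count, symbol)
```

A single entry in the tally means every captured CPU is parked at the same instruction, which is the situation described above.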
2.6.39 boots fine on this system, and git bisect identifies commit
cd4ea6ae3982 ("sched: Change NODE sched_domain group creation") as the cause.
Thanks,
-Mahesh.
[1]:
POWER7 performance monitor hardware support registered
Brought up 896 CPUs
Enabling Asymmetric SMT scheduling
BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
Modules linked in:
NIP: c000000000074b90 LR: c00000000008a1c4 CTR: 0000000000000000
REGS: c000000fae25f9c0 TRAP: 0901 Not tainted (3.0.0-rc6)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24000088 XER: 00000004
TASK = c000000fae248490[1] 'swapper' THREAD: c000000fae25c000 CPU: 0
GPR00: 0000e2a55cbeec50 c000000fae25fc40 c000000000e21f90 c000007b2b34cb00
GPR04: 0000000000000100 0000000000000100 c000011adcf23418 0000000000000000
GPR08: 0000000000000000 c000008b2b7d4480 c000007b2b35ef80 00000000000024ac
GPR12: 0000000044000042 c00000000ebb0000
NIP [c000000000074b90] .update_group_power+0x50/0x190
LR [c00000000008a1c4] .build_sched_domains+0x434/0x490
Call Trace:
[c000000fae25fc40] [c000000fae25fce0] 0xc000000fae25fce0 (unreliable)
[c000000fae25fce0] [c00000000008a1c4] .build_sched_domains+0x434/0x490
[c000000fae25fdd0] [c000000000867370] .sched_init_smp+0xa8/0x224
[c000000fae25fee0] [c000000000850274] .kernel_init+0x10c/0x1fc
[c000000fae25ff90] [c000000000023884] .kernel_thread+0x54/0x70
Instruction dump:
f821ff61 ebc2b1a0 7c7f1b78 7c9c2378 e9230008 eba30010 2fa90000 419e0054
e9490010 38000000 7d495378 60000000 <8169000c> e9290000 7faa4800 7c005a14
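The word in angle brackets in the instruction dump, <8169000c>, is the instruction at the NIP. As a rough sketch, it can be decoded by hand as a PowerPC D-form instruction; the helper names below are hypothetical, and the mnemonic table covers only primary opcode 32 (lwz), which is all this one word needs.

```python
def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction word into its fields."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    d = word & 0xFFFF              # 16-bit displacement
    if d & 0x8000:                 # sign-extend the displacement
        d -= 0x10000
    return opcode, rt, ra, d


# Only the opcode needed for this example; not a full disassembler.
MNEMONICS = {32: "lwz"}


def disassemble(word):
    """Render a D-form load as 'mnemonic rT,D(rA)'."""
    opcode, rt, ra, d = decode_dform(word)
    name = MNEMONICS.get(opcode, f"op{opcode}")
    return f"{name} r{rt},{d}(r{ra})"


if __name__ == "__main__":
    print(disassemble(0x8169000C))  # prints: lwz r11,12(r9)
```

So the CPU was sampled on a 32-bit load from offset 12 of the address in r9, which the register dump above shows as c000008b2b7d4480 (GPR09).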
Thread overview:
2011-07-07 10:22 [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982 Mahesh J Salgaonkar
2011-07-07 10:59 ` Peter Zijlstra
2011-07-07 11:55 ` Mahesh J Salgaonkar
2011-07-07 12:28 ` Peter Zijlstra
2011-07-14 0:34 ` Anton Blanchard
2011-07-14 4:35 ` Anton Blanchard
2011-07-14 13:16 ` Peter Zijlstra
2011-07-15 0:45 ` Anton Blanchard
2011-07-15 8:37 ` Peter Zijlstra
2011-07-18 21:35 ` Peter Zijlstra
2011-07-19 4:44 ` Anton Blanchard
2011-07-19 10:21 ` Peter Zijlstra
2011-07-20 2:03 ` Anton Blanchard
2011-07-20 10:14 ` Anton Blanchard
2011-07-20 10:45 ` Peter Zijlstra
2011-07-20 12:14 ` Anton Blanchard
2011-07-20 14:40 ` Linus Torvalds
2011-07-20 14:58 ` Peter Zijlstra
2011-07-20 16:04 ` Linus Torvalds
2011-07-20 16:42 ` Ingo Molnar
2011-07-20 16:42 ` Peter Zijlstra
2011-07-20 17:29 ` [tip:sched/urgent] sched: Avoid creating superfluous NUMA domains on non-NUMA systems tip-bot for Peter Zijlstra