* Kernel Panic booting cdrom
From: David Huffman @ 2007-04-14 0:37 UTC
To: linuxppc-dev
I created a cdrom (using the same procedure that has been working for
years) and received a kernel panic when booting a Linux system on a
System i p5 LPAR. Here is a section of the boot messages, plus the
output of .registers from OF (Open Firmware). Does anyone have an idea
what could be the cause? This same CDROM works on a standalone System p
43p-170.
David Huffman
PPC Dev
Storix, Inc.
[boot]0015 Setup Done
Built 6 zonelists
Kernel command line: root=/dev/ram0 selinux=0 maxcpus=1 devfs=nomount load_ramdisk=1 ramdisk_blocksize=1024 ramdisk_size=65536 rw raid=noautodetect init=/linuxrc console=hvc0,9600
[boot]0020 XICS Init
xics: no ISA Interrupt Controller
[boot]0021 XICS Done
PID hash table entries: 16 (order 4: 256 bytes)
CKRM Initialization
...... Initializing ClassType<taskclass> ........
...... Initializing ClassType<socketclass> ........
CKRM Initialization done
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency = 2197.800000 MHz
Console: colour dummy device 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 5
Memory: 2032640k available (0k kernel code, 0k data, 0k init)
[c000000000000000,c000000080000000]
kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
kdb_cmd[0]: defcmd archkdb "" "First line arch debugging"
kdb_cmd[6]: defcmd archkdbcpu "" "archkdb with only tasks on cpus"
kdb_cmd[12]: defcmd archkdbshort "" "archkdb with less detailed backtrace"
Security Scaffold v1.0.0 initialized
SELinux: Disabled at boot.
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
Partition configured for 8 cpus.
Brought up 1 CPUs
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 1773k freed
khelper: max 64 concurrent processes
resid is -1 name is io <NULL>
CKRM .. create res clsobj for resouce <io>class <taskclass>
par=0000000000000000
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
PCI: Probing PCI hardware done
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA PSERIES LPAR
NIP: C00000000012F210 XER: 0000000020000010 LR: C000000000476024
REGS: c00000007ff5fab0 TRAP: 0380 Not tainted (2.6.5-7.244-pseries64 SLES9_SP3_BRANCH-200512121832250000)
MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000000010, DSISR: 0000000000200000
TASK: c00000007ff5b440[1] 'swapper' THREAD: c00000007ff5c000 CPU: 0
GPR00: C000000000476024 C00000007FF5FD30 C00000000071DCC8 C000000000723338
GPR04: C0000000028CC080 C0000000028CC088 C000000000954C98 C000000002692888
GPR08: 0000000000000002 0000000000000001 C000000000643160 0000000000000001
GPR12: 0000000024000042 C000000000491000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000010 00000000001BB591 0000000001C00000
GPR24: C000000000491000 C00000000071D008 C00000000048B5F0 C0000000028CC080
GPR28: 0000000000000000 C000000000723338 C0000000006316D0 C0000000028CC088
NIP [c00000000012f210] .sysfs_create_link+0x30/0x134
LR [c000000000476024] .register_cpu+0xc8/0x104
Call Trace:
[c00000007ff5fd30] [c0000000002753cc] .sysdev_register+0x98/0x250 (unreliable)
[c00000007ff5fdd0] [c000000000476024] .register_cpu+0xc8/0x104
[c00000007ff5fe70] [c0000000004621dc] .topology_init+0x1b4/0x290
[c00000007ff5ff00] [c00000000000c654] .init+0x1a0/0x360
[c00000007ff5ff90] [c000000000017d24] .kernel_thread+0x4c/0x68
<0>Fatal exception: panic in 5 seconds
Kernel panic: Fatal exception
###############################################################
Registers
###############################################################
0 > .registers
My Fix Pt Regs:
00 800000000000b002 0000000000000000 00000000deadbeef 0000000000000001
04 00000000ea28ef60 0000000000c4f0d8 0000000000000000 0000000000c03010
08 0000000008000000 80000000001f9ca0 80000000001f9ca0 80000000001af548
0c 0000000000004000 0000000000c17100 0000000000c18000 00000000000e86b0
10 0000000000e262e5 0000000000e262e5 0000000000c457c0 0000000000c45848
14 0000000000144078 0001f89aea28ef60 0000000000000000 0000000000000000
18 0000000000c13000 0000000000c38000 0000000000c14f80 0000000000c16fc0
1c 0000000000c20000 0000000000c3fdd0 0000000000c11fb0 0000000000c11000
Special Regs:
%IV: 00000900 %CR: 22808000 %XER: 20000001 %DSISR: 00000000
%SRR0: 0000000000c4580c %SRR1: 800000000000b002
%LR: 0000000000c46974 %CTR: 0000000000000004
%DAR: 0000000000000000
Virtual PID = 0
I posted the entire boot message output at:
http://www.storix.com/p5panic.txt
* Re: Kernel Panic booting cdrom
From: Nathan Lynch @ 2007-04-14 4:45 UTC
To: David Huffman; +Cc: linuxppc-dev
Hi David-
David Huffman wrote:
> I created a cdrom (using the same procedure that has been working for
> years) and received a kernel panic when booting a Linux system on a
> System i p5 LPAR. Here is a section of the boot messages, plus the
> output of .registers from OF (Open Firmware). Does anyone have an idea
> what could be the cause?
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=128 NUMA PSERIES LPAR
> NIP: C00000000012F210 XER: 0000000020000010 LR: C000000000476024
> REGS: c00000007ff5fab0 TRAP: 0380 Not tainted (2.6.5-7.244-pseries64 SLES9_SP3_BRANCH-200512121832250000)
> MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> DAR: 0000000000000010, DSISR: 0000000000200000
> TASK: c00000007ff5b440[1] 'swapper' THREAD: c00000007ff5c000 CPU: 0
> GPR00: C000000000476024 C00000007FF5FD30 C00000000071DCC8 C000000000723338
> GPR04: C0000000028CC080 C0000000028CC088 C000000000954C98 C000000002692888
> GPR08: 0000000000000002 0000000000000001 C000000000643160 0000000000000001
> GPR12: 0000000024000042 C000000000491000 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000010 00000000001BB591 0000000001C00000
> GPR24: C000000000491000 C00000000071D008 C00000000048B5F0 C0000000028CC080
> GPR28: 0000000000000000 C000000000723338 C0000000006316D0 C0000000028CC088
> NIP [c00000000012f210] .sysfs_create_link+0x30/0x134
> LR [c000000000476024] .register_cpu+0xc8/0x104
> Call Trace:
> [c00000007ff5fd30] [c0000000002753cc] .sysdev_register+0x98/0x250 (unreliable)
> [c00000007ff5fdd0] [c000000000476024] .register_cpu+0xc8/0x104
> [c00000007ff5fe70] [c0000000004621dc] .topology_init+0x1b4/0x290
> [c00000007ff5ff00] [c00000000000c654] .init+0x1a0/0x360
> [c00000007ff5ff90] [c000000000017d24] .kernel_thread+0x4c/0x68
> <0>Fatal exception: panic in 5 seconds
From the full version at http://www.storix.com/p5panic.txt:
Linux version 2.6.5-7.244-pseries64 (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Mon Dec 12 18:32:25 UTC 2005
[boot]0012 Setup Arch
NUMA associativity depth for CPU/Memory: 3
cpu 0 maps to domain 5
cpu 1 maps to domain 5
memory region 0 to 8000000 maps to domain 5
memory region 8000000 to 9000000 maps to domain 5
memory region 9000000 to a000000 maps to domain 5
....
Could you see whether adding numa=off to the kernel command line
helps?
I think it's crashing because kernels this old assume that NUMA node
numbering, as given by firmware, begins at 0. This partition has a
single node, numbered 5.
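To make the node-numbering explanation concrete, here is a minimal Python model. It is not kernel code, and every name in it is hypothetical. It shows what goes wrong when setup code assumes nodes are numbered 0..N-1 but firmware reports a single node numbered 5, as in the `cpu 0 maps to domain 5` lines above. The lookup for node 5 finds nothing, which in C would be a NULL or unset pointer. The oops shows the fault inside `sysfs_create_link`, called from `register_cpu` during `topology_init`, at a near-NULL address (`DAR: 0000000000000010`). That is at least consistent with following a pointer for a node that was never set up. Modelling numa=off as "everything is node 0" is a simplification made for this sketch.

```python
# Hypothetical model, not the 2.6.5 kernel's code: per-node objects are
# registered under node ids, then each CPU is linked under its node's object.

def register_nodes_zero_based(num_nodes):
    """Register node objects as ids 0..num_nodes-1 (the old assumption)."""
    return {node_id: {"id": node_id} for node_id in range(num_nodes)}


def register_nodes_from_firmware(firmware_ids):
    """Register node objects under the ids firmware actually reported."""
    return {node_id: {"id": node_id} for node_id in sorted(set(firmware_ids))}


def node_for_cpu(node_table, firmware_node_id, numa_off=False):
    """Return the node object a CPU belongs under, or None if none exists.

    numa_off=True models booting with numa=off as "everything is node 0"
    (a simplification for this sketch).
    """
    if numa_off:
        firmware_node_id = 0
    return node_table.get(firmware_node_id)


# The failing partition: a single node, which firmware numbers 5.
cpu_to_node = {0: 5, 1: 5}

assumed = register_nodes_zero_based(num_nodes=1)
reported = register_nodes_from_firmware(cpu_to_node.values())

for cpu, node in cpu_to_node.items():
    print(f"cpu {cpu}: assumed={node_for_cpu(assumed, node)} "
          f"numa_off={node_for_cpu(assumed, node, numa_off=True)} "
          f"reported={node_for_cpu(reported, node)}")
```

The `assumed` table has no entry for node 5, so every CPU lookup comes back empty. Both the numa=off path and a table built from the firmware-reported ids find a node.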
* Re: Kernel Panic booting cdrom
From: David Huffman @ 2007-04-20 21:05 UTC
To: Nathan Lynch, linuxppc-dev
Nathan,
I think I determined why I received a kernel panic and the numa=off
argument fixed the problem. When we boot from cdrom we specify maxcpus=1
as a kernel argument. A system with numa enabled fails. I plan on
adding numa=off whenever I use maxcpus=1, but I wonder if you could
answer a question for me.
I originally was told that in the case where I am booting a basic system
into an initrd instead of in normal mode, I should use maxcpus=1 because
there may be power and cooling daemons that are not running and try to
limit the system resources by limiting the number of cpus. Does this
sound right? I can successfully boot a cdrom without the maxcpus flag on
an SMP system but maybe it is typically not a good idea?
I can prevent the kernel panics by removing maxcpus=1 and not adding
numa=off. I am a little more informed about numa (now), but I am fuzzy
as to all the implications with allowing more cpus for cdrom install
media. The maxcpus=1 argument was something we added to our install boot
media years ago and few here remember why it was such a great idea. The
power/resource management was the only thing we could come up with.
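Both workarounds described above can be sketched as a small helper that edits the install media's kernel command line before it is written out: drop maxcpus=1, or keep it and add numa=off. The helper is hypothetical and is not part of any Storix or yaboot tool. It defaults to dropping maxcpus, because the replies below recommend against using it at all. It treats the command line as whitespace-separated tokens, which is a simplification: quoted arguments are not handled.

```python
def adjust_cmdline(cmdline, drop_maxcpus=True):
    """Return a kernel command line with one of the two workarounds applied.

    drop_maxcpus=True removes any maxcpus=N token (no numa=off needed).
    drop_maxcpus=False keeps maxcpus=N but appends numa=off if it is missing.
    """
    tokens = cmdline.split()
    if drop_maxcpus:
        tokens = [t for t in tokens if not t.startswith("maxcpus=")]
    elif any(t.startswith("maxcpus=") for t in tokens) and "numa=off" not in tokens:
        tokens.append("numa=off")
    return " ".join(tokens)


# Command line from the failing boot (one logical line in the kernel log).
original = (
    "root=/dev/ram0 selinux=0 maxcpus=1 devfs=nomount "
    "load_ramdisk=1 ramdisk_blocksize=1024 ramdisk_size=65536 rw "
    "raid=noautodetect init=/linuxrc console=hvc0,9600"
)

print(adjust_cmdline(original))
print(adjust_cmdline(original, drop_maxcpus=False))
```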
David Huffman
Storix, Inc
Nathan Lynch wrote:
> David Huffman wrote:
>
>> Nathan,
>>
>> Thank you very much for the info. This worked to get the system booted
>> from cdrom. However, in normal mode, the numa=off argument is not in the
>> yaboot.conf file. Any idea why it works without it in normal mode, but
>> requires numa=off when using the same kernel and booting from cdrom?
>>
>
> No, that doesn't make sense to me. If you'd like to dig deeper into
> it, diff -u the output of dmesg from both boots. (and please copy the
> list next time, thanks :)
>
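The suggestion above is plain `diff -u` on two saved dmesg captures. If the captures are long, a filter can cut the noise. The sketch below uses Python's standard `difflib.unified_diff` and keeps only lines that mention NUMA, domains, CPUs, or the kernel command line before diffing. The filter keywords are our guess at what is relevant to this panic, not something from the thread, and the file names are placeholders.

```python
import difflib
import re
import sys

# Our guess at which boot messages matter for this panic; adjust as needed.
RELEVANT = re.compile(r"numa|domain|command line|cpu", re.IGNORECASE)


def relevant_lines(dmesg_text):
    """Keep only the boot messages that match RELEVANT."""
    return [line for line in dmesg_text.splitlines() if RELEVANT.search(line)]


def diff_boots(normal_text, cdrom_text):
    """Unified diff (like `diff -u`) of the relevant lines of two dmesg captures."""
    return list(difflib.unified_diff(
        relevant_lines(normal_text),
        relevant_lines(cdrom_text),
        fromfile="dmesg-normal",
        tofile="dmesg-cdrom",
        lineterm="",
    ))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diffboot.py dmesg-normal.txt dmesg-cdrom.txt
    with open(sys.argv[1]) as normal, open(sys.argv[2]) as cdrom:
        print("\n".join(diff_boots(normal.read(), cdrom.read())))
```

Filtering before diffing keeps unrelated driver messages out of the hunks. Drop the filter and diff the full captures if nothing obvious turns up.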
* Re: Kernel Panic booting cdrom
From: Michael Ellerman @ 2007-04-23 2:10 UTC
To: David Huffman; +Cc: linuxppc-dev, Nathan Lynch
On Fri, 2007-04-20 at 14:05 -0700, David Huffman wrote:
> Nathan,
>
> I think I determined why I received a kernel panic and the numa=off
> argument fixed the problem. When we boot from cdrom we specify maxcpus=1
> as a kernel argument. A system with numa enabled fails. I plan on
> adding numa=off whenever I use maxcpus=1, but I wonder if you could
> answer a question for me.
>
> I originally was told that in the case where I am booting a basic system
> into an initrd instead of in normal mode, I should use maxcpus=1 because
> there may be power and cooling daemons that are not running and try to
> limit the system resources by limiting the number of cpus. Does this
> sound right? I can successfully boot a cdrom without the maxcpus flag on
> an SMP system but maybe it is typically not a good idea?
>
> I can prevent the kernel panics by removing maxcpus=1 and not adding
> numa=off. I am a little more informed about numa (now), but I am fuzzy
> as to all the implications with allowing more cpus for cdrom install
> media. The maxcpus=1 argument was something we added to our install boot
> media years ago and few here remember why it was such a great idea. The
> power/resource management was the only thing we could come up with.
maxcpus=x is poorly tested; I would definitely NOT recommend it.
I don't think there's any issue with power etc. With maxcpus=1 all the
other CPUs are still running; they're just not used by the kernel. In
fact, a cursory glance suggests they will do less power saving in that
situation than when they're in use but idle.
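This point can be checked against the boot log itself. The failing boot printed `Partition configured for 8 cpus.` followed by `Brought up 1 CPUs`. The hypothetical helper below pulls those two numbers out of a saved log and reports how many processors the kernel left unused. Those are the CPUs that, per the explanation above, keep running without the power saving they would get if they were online but idle. The two message formats are copied from the log in this thread; other kernel versions may word them differently, in which case the helper returns None.

```python
import re

# Message formats as they appear in the boot log quoted in this thread.
CONFIGURED = re.compile(r"Partition configured for (\d+) cpus")
BROUGHT_UP = re.compile(r"Brought up (\d+) CPUs")


def unused_cpus(boot_log):
    """Return (configured, brought_up, unused) parsed from a pSeries boot log.

    Returns None if either message is missing from the log.
    """
    configured = CONFIGURED.search(boot_log)
    brought_up = BROUGHT_UP.search(boot_log)
    if configured is None or brought_up is None:
        return None
    total = int(configured.group(1))
    online = int(brought_up.group(1))
    return total, online, total - online


log = """Partition configured for 8 cpus.
Brought up 1 CPUs
"""
print(unused_cpus(log))  # (8, 1, 7)
```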
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: Kernel Panic booting cdrom
From: Linas Vepstas @ 2007-04-23 22:35 UTC
To: David Huffman; +Cc: linuxppc-dev, Nathan Lynch
On Fri, Apr 20, 2007 at 02:05:11PM -0700, David Huffman wrote:
> there may be power and cooling daemons that are not running and try to
> limit the system resources by limiting the number of cpus. Does this
> sound right?
As Michael Ellerman points out, the answer is no.
The idle loop specifically uses instructions that put the
CPU into low-power mode. If you don't call the idle loop, the
CPU probably won't go into low-power mode!
> The maxcpus=1 argument was something we added to our install boot
> media years ago and few here remember why it was such a great idea. The
> power/resource management was the only thing we could come up with.
Ooh, this is worthy of tattooing onto a cluestick.
--linas