From: Nishanth Aravamudan <nacc@us.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>,
linux-mm@kvack.org, lee.schermerhorn@hp.com, bob.picco@hp.com,
kamezawa.hiroyu@jp.fujitsu.com, mel@skynet.ie
Subject: Re: [BUG] at mm/slab.c:3320
Date: Thu, 3 Jan 2008 16:33:36 -0800
Message-ID: <20080104003336.GA2594@us.ibm.com>
In-Reply-To: <20080103155046.GA7092@skywalker>
On 03.01.2008 [21:20:46 +0530], Aneesh Kumar K.V wrote:
> On Wed, Jan 02, 2008 at 12:32:42PM -0800, Christoph Lameter wrote:
> >
> > This occurred on a 32 bit NUMA platform? Guess NUMAQ?
Not NUMA-Q AFAICT, but 32-bit, yes. It's actually unclear what's going
on with this box. The kernel clearly detected NUMA; however, I don't
think the listing in our testing grid shows any NUMA nodes in sysfs.
And in fact what the kernel detected doesn't necessarily mesh with a
normal NUMA system.
Does reverting this patch actually make the box boot? What was the last
kernel that worked on this box?
> > The dmesg that I saw was partial. Could you repost a full problem
> > description to linux-mm@kvack.org and cc the authors of memoryless node
> > support?
> >
> > Nishanth Aravamudan <nacc@us.ibm.com>
> > Lee Schermerhorn <lee.schermerhorn@hp.com>
> > Bob Picco <bob.picco@hp.com>
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Mel Gorman <mel@skynet.ie>
> > Christoph Lameter <clameter@sgi.com>
> >
> Full dmesg:
> ----------
> Booting 'autobench'
>
> root (hd0,0)
> Filesystem type is ext2fs, partition type 0x83
> kernel /boot/vmlinuz-autobench ro console=tty0 console=ttyS0,115200 autobench_args: root=/dev/sda3 ABAT:1198144312
> [Linux-bzImage, setup=0x2800, size=0x1a08e8]
> initrd /boot/initrd-autobench.img
> [Linux-initrd @ 0x37ed8000, 0x117985 bytes]
>
> Linux version 2.6.24-rc5-autokern1 (root@elm3a23) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP PREEMPT Thu Dec 20 04:16:18 EST 2007
<snip>
> Node: 0, start_pfn: 0, end_pfn: 156
> Node: 0, start_pfn: 256, end_pfn: 917393
> Node: 0, start_pfn: 1048576, end_pfn: 2752512
Hrm, this indicates 1 node with holes?
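To make the holes concrete, here is a minimal sketch (plain Python, not kernel code) that lists the gaps between the three node 0 ranges quoted above. It assumes end_pfn is exclusive, which the log does not state; the helper name find_holes is made up for illustration.

```python
# PFN ranges reported for node 0 in the early boot log, as (start_pfn, end_pfn).
# Assumption: end_pfn is exclusive (first PFN past the range).
ranges = [(0, 156), (256, 917393), (1048576, 2752512)]


def find_holes(ranges):
    """Return the (start, end) gaps between consecutive sorted PFN ranges."""
    holes = []
    ordered = sorted(ranges)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            holes.append((prev_end, next_start))
    return holes


holes = find_holes(ranges)
for start, end in holes:
    print(f"hole: pfn {start} -> {end} ({end - start} pages)")
```

Under that assumption the two gaps are PFNs 156 to 256 and 917393 to 1048576.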
> get_memcfg_from_srat: assigning address to rsdp
> RSD PTR v0 [IBM ]
> Begin SRAT table scan....
> CPU 0x00 in proximity domain 0x00
> CPU 0x02 in proximity domain 0x00
> CPU 0x10 in proximity domain 0x00
> CPU 0x12 in proximity domain 0x00
> Memory range 0x0 to 0xE0000 (type 0x0) in proximity domain 0x00 enabled
> Memory range 0x100000 to 0x120000 (type 0x0) in proximity domain 0x00 enabled
> CPU 0x20 in proximity domain 0x01
> CPU 0x22 in proximity domain 0x01
> CPU 0x30 in proximity domain 0x01
> CPU 0x32 in proximity domain 0x01
> Memory range 0x120000 to 0x2A0000 (type 0x0) in proximity domain 0x01 enabled
> acpi20_parse_srat: Entry length value is zero; can't parse any further!
But two proximity domains (NUMA nodes?) according to SRAT? And then we
get a parse error?
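As a rough cross-check of the two proximity domains, the sketch below groups the SRAT lines quoted above by domain. It only parses the log text as printed (it is not how the kernel parses the table), and the regular expressions are my own guess at the line format based on these eleven lines.

```python
import re
from collections import defaultdict

# SRAT scan lines copied from the dmesg above (quote prefixes removed).
srat_log = """\
CPU 0x00 in proximity domain 0x00
CPU 0x02 in proximity domain 0x00
CPU 0x10 in proximity domain 0x00
CPU 0x12 in proximity domain 0x00
Memory range 0x0 to 0xE0000 (type 0x0) in proximity domain 0x00 enabled
Memory range 0x100000 to 0x120000 (type 0x0) in proximity domain 0x00 enabled
CPU 0x20 in proximity domain 0x01
CPU 0x22 in proximity domain 0x01
CPU 0x30 in proximity domain 0x01
CPU 0x32 in proximity domain 0x01
Memory range 0x120000 to 0x2A0000 (type 0x0) in proximity domain 0x01 enabled
"""

HEX = r"(0x[0-9A-Fa-f]+)"
cpu_re = re.compile(rf"CPU {HEX} in proximity domain {HEX}")
mem_re = re.compile(
    rf"Memory range {HEX} to {HEX} \(type {HEX}\) in proximity domain {HEX} enabled"
)

cpus = defaultdict(list)  # domain -> list of CPU ids
mem = defaultdict(list)   # domain -> list of (start, end) as printed

for line in srat_log.splitlines():
    m = cpu_re.match(line)
    if m:
        cpus[int(m.group(2), 16)].append(int(m.group(1), 16))
        continue
    m = mem_re.match(line)
    if m:
        mem[int(m.group(4), 16)].append((int(m.group(1), 16), int(m.group(2), 16)))

domains = sorted(set(cpus) | set(mem))
for d in domains:
    print(f"domain {d}: {len(cpus[d])} CPUs, {len(mem[d])} memory range(s)")
```

Both domains show up with four CPUs each, and domain 1 has a single memory range.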
> pxm bitmap: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Number of logical nodes in system = 2
So we had 1 physical node above, but now we have 2 logical nodes?
> Number of memory chunks in system = 3
> chunk 0 nid 0 start_pfn 00000000 end_pfn 000e0000
> chunk 1 nid 0 start_pfn 00100000 end_pfn 00120000
> chunk 2 nid 1 start_pfn 00120000 end_pfn 002a0000
> Node: 0, start_pfn: 0, end_pfn: 1179648
> Node: 1, start_pfn: 1179648, end_pfn: 2752512
(side nit: why don't we always print in hex here?)
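For anyone comparing the two notations by hand, a small sketch that converts the hex chunk PFNs to decimal and collapses them per node. The chunk values are copied from the log above; treating each node's span as the min start and max end of its chunks is my assumption about how the "Node:" lines are derived.

```python
# (nid, start_pfn, end_pfn) for each memory chunk, as printed in hex above.
chunks = [
    (0, 0x00000000, 0x000E0000),
    (0, 0x00100000, 0x00120000),
    (1, 0x00120000, 0x002A0000),
]

# Assumption: a node's span runs from its lowest start_pfn to its highest end_pfn.
spans = {}
for nid, start, end in chunks:
    lo, hi = spans.get(nid, (start, end))
    spans[nid] = (min(lo, start), max(hi, end))

for nid, (start, end) in sorted(spans.items()):
    print(f"Node: {nid}, start_pfn: {start}, end_pfn: {end} ({start:#x} -> {end:#x})")
```

The decimal values match the two "Node:" lines printed right after the chunk list.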
> Reserving 16384 pages of KVA for lmem_map of node 0
> Shrinking node 0 from 1179648 pages to 1163264 pages
> Reserving 22016 pages of KVA for lmem_map of node 1
> Shrinking node 1 from 2752512 pages to 2730496 pages
> Reserving total of 38400 pages for numa KVA remap
> kva_start_pfn ~ 190464 find_max_low_pfn() ~ 229376
> max_pfn = 2752512
> 9856MB HIGHMEM available.
> 896MB LOWMEM available.
> min_low_pfn = 1945, max_low_pfn = 229376, highstart_pfn = 229376
> Low memory ends at vaddr f8000000
> node 0 will remap to vaddr ee800000 - fc000000
> node 1 will remap to vaddr f2800000 - 01600000
And we have two nodes from here on out...
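The numbers in this part of the log are at least self-consistent. A minimal sketch of the arithmetic, using only values printed above and assuming 4 KiB pages (the log does not print the page size):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; not printed in the log
MIB = 1 << 20

# Values copied from the dmesg above.
reserved = {0: 16384, 1: 22016}      # pages of KVA reserved for lmem_map
node_end = {0: 1179648, 1: 2752512}  # node end PFNs before shrinking
max_pfn = 2752512
max_low_pfn = 229376

total_reserved = sum(reserved.values())
shrunk = {nid: node_end[nid] - reserved[nid] for nid in node_end}
lowmem_mb = max_low_pfn * PAGE_SIZE // MIB
highmem_mb = (max_pfn - max_low_pfn) * PAGE_SIZE // MIB

print(f"total reserved: {total_reserved} pages")
for nid in sorted(shrunk):
    print(f"node {nid}: {node_end[nid]} -> {shrunk[nid]} pages")
print(f"{highmem_mb}MB HIGHMEM, {lowmem_mb}MB LOWMEM")
```

This reproduces the 38400-page total, both "Shrinking node" results, and the 9856MB/896MB split.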
> High memory starts at vaddr f8000000
> found SMP MP-table at 0009c540
> Zone PFN ranges:
> DMA 0 -> 4096
> Normal 4096 -> 229376
> HighMem 229376 -> 2752512
> Movable zone start PFN for each node
> early_node_map[3] active PFN ranges
> 0: 0 -> 917504
> 0: 1048576 -> 1163264
> 1: 1179648 -> 2730496
with holes as before.
<snip>
> Calibrating delay using timer specific routine.. 4002.61 BogoMIPS (lpj=8005239)
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:3320!
> invalid opcode: 0000 [#1] PREEMPT SMP
> Modules linked in:
>
> Pid: 0, comm: swapper Not tainted (2.6.24-rc5-autokern1 #1)
> EIP: 0060:[<c0181707>] EFLAGS: 00010046 CPU: 0
> EIP is at ____cache_alloc_node+0x1c/0x130
> EAX: ee4005c0 EBX: 00000000 ECX: 00000001 EDX: 000000d0
> ESI: 00000000 EDI: ee4005c0 EBP: c0408f74 ESP: c0408f54
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c0408000 task=c03d5d80 task.ti=c0408000)
> Stack: c03d5d80 c0408f6c c017ac36 00000001 000000d0 00000000 000000d0 ee4005c0
> c0408f88 c0181577 0001080c 00000246 ee4005c0 c0408fa8 c0181a97 c0408fb0
> c01395b9 000000d0 0001080c 00099800 c03dccec c0408fd0 c01395b9 c0408fd0
> Call Trace:
> [<c0105e23>] show_trace_log_lvl+0x19/0x2e
> [<c0105ee5>] show_stack_log_lvl+0x99/0xa1
> [<c010603f>] show_registers+0xb3/0x1e9
> [<c0106301>] die+0x11b/0x1fe
> [<c02fb654>] do_trap+0x8e/0xa8
> [<c01065cd>] do_invalid_op+0x88/0x92
> [<c02fb422>] error_code+0x72/0x78
> [<c0181577>] alternate_node_alloc+0x5b/0x60
> [<c0181a97>] kmem_cache_alloc+0x50/0x120
> [<c01395b9>] create_pid_cachep+0x4c/0xec
> [<c041ae65>] pidmap_init+0x2f/0x6e
> [<c040c715>] start_kernel+0x1ca/0x23e
> [<00000000>] 0x0
> =======================
> Code: ff eb 02 31 ff 89 f8 83 c4 10 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 53 83 ec 14 89 55 f0 89 4d ec 8b b4 88 88 02 00 00 85 f6 75 04 <0f> 0b eb fe e8 f3 ee ff ff 8d 46 24 89 45 e4 e8 23 97 17 00 8b
> EIP: [<c0181707>] ____cache_alloc_node+0x1c/0x130 SS:ESP 0068:c0408f54
> Kernel panic - not syncing: Attempted to kill the idle task!
> -- 0:conmux-control -- time-stamp -- Dec/20/07 2:00:36 --
> (bot:conmon-payload) disconnected
>
>
> dmidecode output for machine details
> ----------------------------------
<snip>
The DMI information also seems to indicate that there is only one node
(Node 1)?
I'll try to reproduce on the box and investigate further.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center