From: Nishanth Aravamudan <nacc@us.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>,
linux-mm@kvack.org, lee.schermerhorn@hp.com, bob.picco@hp.com,
kamezawa.hiroyu@jp.fujitsu.com, mel@skynet.ie
Subject: Re: [BUG] at mm/slab.c:3320
Date: Thu, 3 Jan 2008 16:33:36 -0800
Message-ID: <20080104003336.GA2594@us.ibm.com>
In-Reply-To: <20080103155046.GA7092@skywalker>
On 03.01.2008 [21:20:46 +0530], Aneesh Kumar K.V wrote:
> On Wed, Jan 02, 2008 at 12:32:42PM -0800, Christoph Lameter wrote:
> >
> > This occurred on a 32 bit NUMA platform? Guess NUMAQ?
Not NUMA-Q afaict, but 32-bit, yes. It's unclear what's going on with
this box, actually. Clearly the kernel detected NUMA; however, I don't
think the listing in our testing grid indicates any NUMA nodes per sysfs.
And in fact what the kernel detected doesn't necessarily match a normal
NUMA system.
Does reverting this patch actually make the box boot? What was the last
kernel that worked on this box?
> > The dmesg that I saw was partial. Could you repost a full problem
> > description to linux-mm@kvack.org and cc the authors of memoryless node
> > support?
> >
> > Nishanth Aravamudan <nacc@us.ibm.com>
> > Lee Schermerhorn <lee.schermerhorn@hp.com>
> > Bob Picco <bob.picco@hp.com>
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Mel Gorman <mel@skynet.ie>
> > Christoph Lameter <clameter@sgi.com>
> >
> Full dmesg:
> ----------
> Booting 'autobench'
>
> root (hd0,0)
> Filesystem type is ext2fs, partition type 0x83
> kernel /boot/vmlinuz-autobench ro console=tty0 console=ttyS0,115200 autobench_args: root=/dev/sda3 ABAT:1198144312
> [Linux-bzImage, setup=0x2800, size=0x1a08e8]
> initrd /boot/initrd-autobench.img
> [Linux-initrd @ 0x37ed8000, 0x117985 bytes]
>
> Linux version 2.6.24-rc5-autokern1 (root@elm3a23) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP PREEMPT Thu Dec 20 04:16:18 EST 2007
<snip>
> Node: 0, start_pfn: 0, end_pfn: 156
> Node: 0, start_pfn: 256, end_pfn: 917393
> Node: 0, start_pfn: 1048576, end_pfn: 2752512
Hrm, this indicates 1 node with holes?
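As a rough cross-check of the "1 node with holes" reading, here is a minimal Python sketch that lists the gaps between the three PFN ranges quoted above. It is not kernel code; the helper name `find_holes` and the 4 KiB page size are assumptions made for illustration.

```python
# PFN ranges copied from the "Node: 0, start_pfn ..., end_pfn ..." lines above.
ranges = [(0, 156), (256, 917393), (1048576, 2752512)]

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def find_holes(pfn_ranges):
    """Return the (start_pfn, end_pfn) gaps between consecutive sorted ranges."""
    ordered = sorted(pfn_ranges)
    holes = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            holes.append((prev_end, next_start))
    return holes


holes = find_holes(ranges)
for start, end in holes:
    pages = end - start
    print(f"hole: pfn {start} -> {end} ({pages} pages, {pages * PAGE_SIZE // 1024} KiB)")
```

Under these assumptions the ranges leave two gaps, one below 1 MiB and one just below the 4 GiB boundary (PFN 1048576).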
> get_memcfg_from_srat: assigning address to rsdp
> RSD PTR v0 [IBM ]
> Begin SRAT table scan....
> CPU 0x00 in proximity domain 0x00
> CPU 0x02 in proximity domain 0x00
> CPU 0x10 in proximity domain 0x00
> CPU 0x12 in proximity domain 0x00
> Memory range 0x0 to 0xE0000 (type 0x0) in proximity domain 0x00 enabled
> Memory range 0x100000 to 0x120000 (type 0x0) in proximity domain 0x00 enabled
> CPU 0x20 in proximity domain 0x01
> CPU 0x22 in proximity domain 0x01
> CPU 0x30 in proximity domain 0x01
> CPU 0x32 in proximity domain 0x01
> Memory range 0x120000 to 0x2A0000 (type 0x0) in proximity domain 0x01 enabled
> acpi20_parse_srat: Entry length value is zero; can't parse any further!
But two proximity domains (NUMA nodes?) according to SRAT? And then we
get a parse error?
> pxm bitmap: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Number of logical nodes in system = 2
So we had 1 physical node above, but now we have 2 logical nodes?
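The count of 2 logical nodes is at least consistent with the pxm bitmap printed above. A minimal Python sketch, assuming that bit n of byte k in the bitmap stands for proximity domain 8*k + n (that bit layout is an assumption, and `set_bits` is a hypothetical helper, not a kernel function):

```python
# First byte copied from the "pxm bitmap:" line above; the remaining bytes in
# the log are all zero, so their exact count does not change the result.
pxm_bitmap = bytes.fromhex("03") + bytes(31)


def set_bits(bitmap):
    """List set bit positions, assuming bit n of byte k is PXM 8*k + n."""
    return [
        8 * k + n
        for k, byte in enumerate(bitmap)
        for n in range(8)
        if byte & (1 << n)
    ]


domains = set_bits(pxm_bitmap)
print(domains, len(domains))
```

With 0x03 as the first byte, bits 0 and 1 are set, which matches the two proximity domains (0x00 and 0x01) seen in the SRAT scan.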
> Number of memory chunks in system = 3
> chunk 0 nid 0 start_pfn 00000000 end_pfn 000e0000
> chunk 1 nid 0 start_pfn 00100000 end_pfn 00120000
> chunk 2 nid 1 start_pfn 00120000 end_pfn 002a0000
> Node: 0, start_pfn: 0, end_pfn: 1179648
> Node: 1, start_pfn: 1179648, end_pfn: 2752512
(side nit: why don't we always print in hex here?)
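For anyone comparing the two notations, a minimal Python sketch that collapses the hex chunk list into per-node spans and prints them in both decimal and hex. The helper name `node_spans` is made up for illustration.

```python
# (nid, start_pfn, end_pfn) copied from the "chunk N nid ..." lines above (hex).
chunks = [
    (0, 0x00000000, 0x000E0000),
    (0, 0x00100000, 0x00120000),
    (1, 0x00120000, 0x002A0000),
]


def node_spans(chunk_list):
    """Collapse chunks into one (lowest start, highest end) span per node id."""
    spans = {}
    for nid, start, end in chunk_list:
        lo, hi = spans.get(nid, (start, end))
        spans[nid] = (min(lo, start), max(hi, end))
    return spans


for nid, (start, end) in sorted(node_spans(chunks).items()):
    print(f"Node: {nid}, start_pfn: {start} ({start:#x}), end_pfn: {end} ({end:#x})")
```

The decimal values agree with the two "Node:" lines printed right after the chunk list: 0x120000 is 1179648 and 0x2a0000 is 2752512.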
> Reserving 16384 pages of KVA for lmem_map of node 0
> Shrinking node 0 from 1179648 pages to 1163264 pages
> Reserving 22016 pages of KVA for lmem_map of node 1
> Shrinking node 1 from 2752512 pages to 2730496 pages
> Reserving total of 38400 pages for numa KVA remap
> kva_start_pfn ~ 190464 find_max_low_pfn() ~ 229376
> max_pfn = 2752512
> 9856MB HIGHMEM available.
> 896MB LOWMEM available.
> min_low_pfn = 1945, max_low_pfn = 229376, highstart_pfn = 229376
> Low memory ends at vaddr f8000000
> node 0 will remap to vaddr ee800000 - fc000000
> node 1 will remap to vaddr f2800000 - 01600000
And we have two nodes from here on out...
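The remap numbers above can be checked with some simple arithmetic. The following Python sketch assumes 4 KiB pages and a lowmem base of 0xC0000000 (the usual 3G/1G split on 32-bit x86; neither value is stated in the log), and `pfn_to_vaddr` is a hypothetical helper. It only covers the shrunken node sizes, the remap start addresses, and the LOWMEM/HIGHMEM totals; it does not try to reproduce the printed end addresses.

```python
PAGE_SHIFT = 12           # assumption: 4 KiB pages
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on 32-bit x86

# Values copied from the log above.
node_end_pfn = {0: 1179648, 1: 2752512}
remap_pages = {0: 16384, 1: 22016}
kva_start_pfn = 190464
max_low_pfn = 229376
max_pfn = 2752512


def pfn_to_vaddr(pfn):
    """Lowmem virtual address of a PFN under the assumptions above."""
    return PAGE_OFFSET + (pfn << PAGE_SHIFT)


# "Shrinking node N from X pages to Y pages": Y = X - reserved remap pages.
shrunk = {nid: node_end_pfn[nid] - remap_pages[nid] for nid in node_end_pfn}
total_remap = sum(remap_pages.values())

# Remap windows are laid out back to back, starting at kva_start_pfn.
remap_start = {}
pfn = kva_start_pfn
for nid in sorted(remap_pages):
    remap_start[nid] = pfn_to_vaddr(pfn)
    pfn += remap_pages[nid]

lowmem_mb = (max_low_pfn << PAGE_SHIFT) >> 20
highmem_mb = ((max_pfn - max_low_pfn) << PAGE_SHIFT) >> 20

print(shrunk, total_remap)
print({nid: hex(addr) for nid, addr in remap_start.items()})
print(hex(pfn_to_vaddr(max_low_pfn)), lowmem_mb, highmem_mb)
```

Under those assumptions the results line up with the log: 1163264 and 2730496 pages after shrinking, 38400 remap pages in total, remap windows starting at ee800000 and f2800000, low memory ending at f8000000, and 896MB/9856MB for LOWMEM/HIGHMEM.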
> High memory starts at vaddr f8000000
> found SMP MP-table at 0009c540
> Zone PFN ranges:
> DMA 0 -> 4096
> Normal 4096 -> 229376
> HighMem 229376 -> 2752512
> Movable zone start PFN for each node
> early_node_map[3] active PFN ranges
> 0: 0 -> 917504
> 0: 1048576 -> 1163264
> 1: 1179648 -> 2730496
with holes as before.
<snip>
> Calibrating delay using timer specific routine.. 4002.61 BogoMIPS (lpj=8005239)
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:3320!
> invalid opcode: 0000 [#1] PREEMPT SMP
> Modules linked in:
>
> Pid: 0, comm: swapper Not tainted (2.6.24-rc5-autokern1 #1)
> EIP: 0060:[<c0181707>] EFLAGS: 00010046 CPU: 0
> EIP is at ____cache_alloc_node+0x1c/0x130
> EAX: ee4005c0 EBX: 00000000 ECX: 00000001 EDX: 000000d0
> ESI: 00000000 EDI: ee4005c0 EBP: c0408f74 ESP: c0408f54
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c0408000 task=c03d5d80 task.ti=c0408000)
> Stack: c03d5d80 c0408f6c c017ac36 00000001 000000d0 00000000 000000d0 ee4005c0
> c0408f88 c0181577 0001080c 00000246 ee4005c0 c0408fa8 c0181a97 c0408fb0
> c01395b9 000000d0 0001080c 00099800 c03dccec c0408fd0 c01395b9 c0408fd0
> Call Trace:
> [<c0105e23>] show_trace_log_lvl+0x19/0x2e
> [<c0105ee5>] show_stack_log_lvl+0x99/0xa1
> [<c010603f>] show_registers+0xb3/0x1e9
> [<c0106301>] die+0x11b/0x1fe
> [<c02fb654>] do_trap+0x8e/0xa8
> [<c01065cd>] do_invalid_op+0x88/0x92
> [<c02fb422>] error_code+0x72/0x78
> [<c0181577>] alternate_node_alloc+0x5b/0x60
> [<c0181a97>] kmem_cache_alloc+0x50/0x120
> [<c01395b9>] create_pid_cachep+0x4c/0xec
> [<c041ae65>] pidmap_init+0x2f/0x6e
> [<c040c715>] start_kernel+0x1ca/0x23e
> [<00000000>] 0x0
> =======================
> Code: ff eb 02 31 ff 89 f8 83 c4 10 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 53 83 ec 14 89 55 f0 89 4d ec 8b b4 88 88 02 00 00 85 f6 75 04 <0f> 0b eb fe e8 f3 ee ff ff 8d 46 24 89 45 e4 e8 23 97 17 00 8b
> EIP: [<c0181707>] ____cache_alloc_node+0x1c/0x130 SS:ESP 0068:c0408f54
> Kernel panic - not syncing: Attempted to kill the idle task!
> -- 0:conmux-control -- time-stamp -- Dec/20/07 2:00:36 --
> (bot:conmon-payload) disconnected
>
>
> dmidecode output for machine details
> ----------------------------------
<snip>
The DMI information also seems to indicate that there is only one node
(Node 1)?
I'll try to reproduce on the box and investigate further.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center