Re: [BUG] at mm/slab.c:3320 - Nishanth Aravamudan


From: Nishanth Aravamudan <nacc@us.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	linux-mm@kvack.org, lee.schermerhorn@hp.com, bob.picco@hp.com,
	kamezawa.hiroyu@jp.fujitsu.com, mel@skynet.ie
Subject: Re: [BUG]  at mm/slab.c:3320
Date: Thu, 3 Jan 2008 16:33:36 -0800
Message-ID: <20080104003336.GA2594@us.ibm.com>
In-Reply-To: <20080103155046.GA7092@skywalker>

On 03.01.2008 [21:20:46 +0530], Aneesh Kumar K.V wrote:
> On Wed, Jan 02, 2008 at 12:32:42PM -0800, Christoph Lameter wrote:
> > 
> > This occurred on a 32 bit NUMA platform? Guess NUMAQ? 

Not NUMA-Q afaict, but 32-bit, yes. It's unclear what's going on with
this box, actually. Clearly the kernel detected NUMA; however the
listing in our testing grid does not indicate any NUMA nodes per sysfs,
I don't think. And in fact what the kernel detected doesn't necessarily
mesh with a normal NUMA system.

Does reverting this patch actually make the box boot? What was the last
kernel that worked on this box?

> > The dmesg that I saw was partial. Could you repost a full problem 
> > description to linux-mm@kvack.org and cc the authors of memoryless node 
> > support?
> > 
> > Nishanth Aravamudan <nacc@us.ibm.com>
> > Lee Schermerhorn <lee.schermerhorn@hp.com>
> > Bob Picco <bob.picco@hp.com>
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Mel Gorman <mel@skynet.ie>
> > Christoph Lameter <clameter@sgi.com>
> > 
> Full dmesg:
> ----------
> Booting 'autobench'
> 
> root (hd0,0)
>  Filesystem type is ext2fs, partition type 0x83
> kernel /boot/vmlinuz-autobench ro console=tty0 console=ttyS0,115200 autobench_args: root=/dev/sda3 ABAT:1198144312
>    [Linux-bzImage, setup=0x2800, size=0x1a08e8]
> initrd /boot/initrd-autobench.img
>    [Linux-initrd @ 0x37ed8000, 0x117985 bytes]
> 
> Linux version 2.6.24-rc5-autokern1 (root@elm3a23) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP PREEMPT Thu Dec 20 04:16:18 EST 2007

<snip>

> Node: 0, start_pfn: 0, end_pfn: 156
> Node: 0, start_pfn: 256, end_pfn: 917393
> Node: 0, start_pfn: 1048576, end_pfn: 2752512

Hrm, this indicates 1 node with holes?
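
To make the holes concrete, here is a rough sketch that lists the gaps between the three node 0 ranges quoted above. The numbers are copied from the log; treating end_pfn as exclusive is my assumption about the print format, and find_holes is just an illustrative helper, not anything from the kernel.

```python
# PFN ranges for node 0 as printed in the quoted dmesg.
# Assumption: end_pfn is exclusive (one past the last valid PFN).
node0_ranges = [(0, 156), (256, 917393), (1048576, 2752512)]


def find_holes(ranges):
    """Return the (start, end) gaps between consecutive PFN ranges."""
    ordered = sorted(ranges)
    holes = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            holes.append((prev_end, next_start))
    return holes


holes = find_holes(node0_ranges)
for start, end in holes:
    print(f"hole: pfn {start} -> {end} ({end - start} pages)")
```

Under that assumption there are two gaps, one just below PFN 256 and one between PFN 917393 and PFN 1048576.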

> get_memcfg_from_srat: assigning address to rsdp
> RSD PTR  v0 [IBM   ]
> Begin SRAT table scan....
> CPU 0x00 in proximity domain 0x00
> CPU 0x02 in proximity domain 0x00
> CPU 0x10 in proximity domain 0x00
> CPU 0x12 in proximity domain 0x00
> Memory range 0x0 to 0xE0000 (type 0x0) in proximity domain 0x00 enabled
> Memory range 0x100000 to 0x120000 (type 0x0) in proximity domain 0x00 enabled
> CPU 0x20 in proximity domain 0x01
> CPU 0x22 in proximity domain 0x01
> CPU 0x30 in proximity domain 0x01
> CPU 0x32 in proximity domain 0x01
> Memory range 0x120000 to 0x2A0000 (type 0x0) in proximity domain 0x01 enabled
> acpi20_parse_srat: Entry length value is zero; can't parse any further!

But two proximity domains (NUMA nodes?) according to SRAT? And then we
get a parse error?

> pxm bitmap: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> Number of logical nodes in system = 2

So we had 1 physical node above, but now we have 2 logical nodes?
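
For what it's worth, the count of 2 is consistent with the bitmap itself. A small sketch, assuming each set bit marks one proximity domain in use and that bits are numbered LSB-first within each byte (both are my assumptions about the layout, not something the log states):

```python
# The 32 bytes printed after "pxm bitmap:" in the quoted log.
pxm_bitmap = bytes.fromhex("03" + "00" * 31)


def set_bits(bitmap):
    """Return the indices of all set bits, LSB-first within each byte."""
    return [
        byte_idx * 8 + bit
        for byte_idx, byte in enumerate(bitmap)
        for bit in range(8)
        if (byte >> bit) & 1
    ]


domains = set_bits(pxm_bitmap)
print(f"proximity domains in use: {domains} (count = {len(domains)})")
```

With 0x03 in the first byte, only bits 0 and 1 are set, matching proximity domains 0x00 and 0x01 from the SRAT scan.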

> Number of memory chunks in system = 3
> chunk 0 nid 0 start_pfn 00000000 end_pfn 000e0000
> chunk 1 nid 0 start_pfn 00100000 end_pfn 00120000
> chunk 2 nid 1 start_pfn 00120000 end_pfn 002a0000
> Node: 0, start_pfn: 0, end_pfn: 1179648
> Node: 1, start_pfn: 1179648, end_pfn: 2752512

(side nit: why don't we always print in hex here?)
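
To show why the mixed formats are annoying, this sketch converts the decimal "Node:" lines to hex and checks them against the chunk table. All values are copied from the log; the variable names are mine.

```python
# (nid, start_pfn, end_pfn) from the "chunk" lines, which print in hex.
chunks = [
    (0, 0x00000000, 0x000E0000),
    (0, 0x00100000, 0x00120000),
    (1, 0x00120000, 0x002A0000),
]

# "Node:" lines from the same part of the log, which print in decimal.
node_ranges = {0: (0, 1179648), 1: (1179648, 2752512)}

# Each node's span should run from its lowest chunk start to its highest end.
spans = {}
for nid, start, end in chunks:
    lo, hi = spans.get(nid, (start, end))
    spans[nid] = (min(lo, start), max(hi, end))

for nid, (start, end) in node_ranges.items():
    print(f"Node: {nid}, start_pfn: {start:#x}, end_pfn: {end:#x}")
```

In hex the node boundaries are 0x120000 and 0x2a0000, which line up directly with the chunk and SRAT lines.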

> Reserving 16384 pages of KVA for lmem_map of node 0
> Shrinking node 0 from 1179648 pages to 1163264 pages
> Reserving 22016 pages of KVA for lmem_map of node 1
> Shrinking node 1 from 2752512 pages to 2730496 pages
> Reserving total of 38400 pages for numa KVA remap
> kva_start_pfn ~ 190464 find_max_low_pfn() ~ 229376
> max_pfn = 2752512
> 9856MB HIGHMEM available.
> 896MB LOWMEM available.
> min_low_pfn = 1945, max_low_pfn = 229376, highstart_pfn = 229376
> Low memory ends at vaddr f8000000
> node 0 will remap to vaddr ee800000 - fc000000
> node 1 will remap to vaddr f2800000 - 01600000

And we have two nodes from here on out...
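
The sizes in this part of the log at least add up. A quick sanity check, assuming 4 KiB pages (the usual page size on 32-bit x86); the figures are from the log and the names are mine:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

max_pfn = 2752512
max_low_pfn = 229376  # also highstart_pfn in the log

# Zone sizes in MB, to compare with the LOWMEM/HIGHMEM lines.
lowmem_mb = max_low_pfn * PAGE_SIZE >> 20
highmem_mb = (max_pfn - max_low_pfn) * PAGE_SIZE >> 20

# Pages reserved for each node's lmem_map, and the resulting shrunken end PFNs.
reserved = {0: 16384, 1: 22016}
node_end = {0: 1179648, 1: 2752512}
shrunk_end = {nid: node_end[nid] - reserved[nid] for nid in node_end}
total_reserved = sum(reserved.values())

print(f"{lowmem_mb}MB LOWMEM, {highmem_mb}MB HIGHMEM")
print(f"shrunk end PFNs: {shrunk_end}, total reserved: {total_reserved}")
```

This reproduces the 896MB and 9856MB figures, the shrunken ends 1163264 and 2730496, and the 38400-page total, so the accounting is internally consistent even if the node layout is not what we'd expect.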

> High memory starts at vaddr f8000000
> found SMP MP-table at 0009c540
> Zone PFN ranges:
>   DMA             0 ->     4096
>   Normal       4096 ->   229376
>   HighMem    229376 ->  2752512
> Movable zone start PFN for each node
> early_node_map[3] active PFN ranges
>     0:        0 ->   917504
>     0:  1048576 ->  1163264
>     1:  1179648 ->  2730496

with holes as before.

<snip>

> Calibrating delay using timer specific routine.. 4002.61 BogoMIPS (lpj=8005239)
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:3320!
> invalid opcode: 0000 [#1] PREEMPT SMP 
> Modules linked in:
> 
> Pid: 0, comm: swapper Not tainted (2.6.24-rc5-autokern1 #1)
> EIP: 0060:[<c0181707>] EFLAGS: 00010046 CPU: 0
> EIP is at ____cache_alloc_node+0x1c/0x130
> EAX: ee4005c0 EBX: 00000000 ECX: 00000001 EDX: 000000d0
> ESI: 00000000 EDI: ee4005c0 EBP: c0408f74 ESP: c0408f54
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c0408000 task=c03d5d80 task.ti=c0408000)
> Stack: c03d5d80 c0408f6c c017ac36 00000001 000000d0 00000000 000000d0 ee4005c0 
>        c0408f88 c0181577 0001080c 00000246 ee4005c0 c0408fa8 c0181a97 c0408fb0 
>        c01395b9 000000d0 0001080c 00099800 c03dccec c0408fd0 c01395b9 c0408fd0 
> Call Trace:
>  [<c0105e23>] show_trace_log_lvl+0x19/0x2e
>  [<c0105ee5>] show_stack_log_lvl+0x99/0xa1
>  [<c010603f>] show_registers+0xb3/0x1e9
>  [<c0106301>] die+0x11b/0x1fe
>  [<c02fb654>] do_trap+0x8e/0xa8
>  [<c01065cd>] do_invalid_op+0x88/0x92
>  [<c02fb422>] error_code+0x72/0x78
>  [<c0181577>] alternate_node_alloc+0x5b/0x60
>  [<c0181a97>] kmem_cache_alloc+0x50/0x120
>  [<c01395b9>] create_pid_cachep+0x4c/0xec
>  [<c041ae65>] pidmap_init+0x2f/0x6e
>  [<c040c715>] start_kernel+0x1ca/0x23e
>  [<00000000>] 0x0
>  =======================
> Code: ff eb 02 31 ff 89 f8 83 c4 10 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 53 83 ec 14 89 55 f0 89 4d ec 8b b4 88 88 02 00 00 85 f6 75 04 <0f> 0b eb fe e8 f3 ee ff ff 8d 46 24 89 45 e4 e8 23 97 17 00 8b 
> EIP: [<c0181707>] ____cache_alloc_node+0x1c/0x130 SS:ESP 0068:c0408f54
> Kernel panic - not syncing: Attempted to kill the idle task!
> -- 0:conmux-control -- time-stamp -- Dec/20/07  2:00:36 --
> (bot:conmon-payload) disconnected
> 
> 
> dmidecode output for machine details
> ----------------------------------

<snip>

The DMI information also seems to indicate that there is only one node
(Node 1)?

I'll try and reproduce on the box and investigate further.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


