Re: 2.6.18-rc7-mm1 -- ppc64 crash in slab_node ??

From: Andy Whitcroft <apw@shadowen.org>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Christoph Lameter <clameter@engr.sgi.com>
Subject: Re: 2.6.18-rc7-mm1 -- ppc64 crash in slab_node ??
Date: Thu, 21 Sep 2006 19:03:36 +0100
Message-ID: <4512D3F8.5030307@shadowen.org>
In-Reply-To: <20060921102823.628a2a74.akpm@osdl.org>

Andrew Morton wrote:
> On Thu, 21 Sep 2006 14:11:48 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
>> Hmmm seeing this on a ppc64 lpar.
>>
>> PID hash table entries: 4096 (order: 12, 32768 bytes)
>> Console: colour dummy device 80x25
>> Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
>> Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
>> freeing bootmem node 0
>> freeing bootmem node 1
>> Memory: 2042288k/2097152k available (5752k kernel code, 55392k reserved,
>> 1456k data, 875k bss, 252k init)
>> Unable to handle kernel paging request for data at address 0x00000004
>> Faulting instruction address: 0xc0000000000bc830
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> SMP NR_CPUS=128 NUMA
>> Modules linked in:
>> NIP: C0000000000BC830 LR: C0000000000C7DF4 CTR: 0000000000000000
>> REGS: c00000000070f990 TRAP: 0300   Not tainted  (2.6.18-rc7-mm1-autokern1)
>> MSR: 8000000000001032 <ME,IR,DR>  CR: 24004022  XER: 0000000B
>> DAR: 0000000000000004, DSISR: 0000000040000000
>> TASK = c0000000005c0900[0] 'swapper' THREAD: c00000000070c000 CPU: 0
>> GPR00: C0000000000C80DC C00000000070FC10 C00000000070B1A0 0000000000000000
>> GPR04: 00000000000000D0 0000000000000000 0000000000000000 0000000000000042
>> GPR08: 0000000000000000 C0000000005C0900 0000000000000000 C00000007FFF3800
>> GPR12: 0000000024004022 C0000000005C1480 0000000000000000 0000000000000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000001C00000
>> GPR20: 0000000000000000 0000000000000000 0000000000141000 C0000000004DA2D0
>> GPR24: 000000000199FB40 0000000000000000 0000000000042000 C0000000005F31A8
>> GPR28: C0000000005F31A8 C0000000007416E8 C0000000005FCC60 00000000000000D0
>> NIP [C0000000000BC830] .slab_node+0x10/0x78
>> LR [C0000000000C7DF4] .fallback_alloc+0x3c/0x100
>> Call Trace:
>> [C00000000070FC10] [8000000000001032] 0x8000000000001032 (unreliable)
>> [C00000000070FCB0] [C0000000000C80DC] .kmem_cache_zalloc+0x128/0x150
>> [C00000000070FD50] [C0000000000C90BC] .kmem_cache_create+0x2a0/0x6ac
>> [C00000000070FE30] [C00000000057BF90] .kmem_cache_init+0x1b4/0x4f8
>> [C00000000070FEF0] [C00000000055F7BC] .start_kernel+0x214/0x33c
>> [C00000000070FF90] [C0000000000084F4] .start_here_common+0x50/0x5c
>> Instruction dump:
>> 7fc3f378 60000000 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020
>> fbc1fff0 ebc2ce20 60000000 60000000 <a8030004> 2f800002 419e0038 2c800001
>>  <0>Kernel panic - not syncing: Attempted to kill the idle task!
>>
>> Given all the problems with -mm1 I'm not sure how hard to search for this.
>>
> 
> I guess the below will fix it.  But Christoph's machine would have oopsed
> too, if it had called fallback_alloc() this early.  So presumably it did
> not.  But yours does.  I wonder why?

Thanks, I'll push it into the testing system and see what happens.

The following at least looks suspicious to me.  It appears to say
that this machine has most of its memory in node 1.  I am pretty sure
that this machine is in fact a single-node LPAR and shouldn't be NUMA at all.

early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->    40960
    1:    40960 ->   524288

If I am doing the math right, this machine has only 32MB in node 0.
Yeah, according to the system we have one node of 32MB with both CPUs
in it, and another node with no CPUs holding the rest of its 2GB of RAM.

# cat /sys/devices/system/node/node*/*
00000000,00000000,00000000,00000003
10 20

Node 0 MemTotal:        32768 kB
[...]
00000000,00000000,00000000,00000000
20 10

Node 1 MemTotal:      2064384 kB
[...]

I'll have a look at it tomorrow and see if I can figure out what's wrong
with the layout.

:/

-apw
