crash in kmem_cache_init

* crash in kmem_cache_init
@ 2008-01-15 15:09 Olaf Hering
  2008-01-15 15:58 ` Olaf Hering
  2008-01-17 12:14 ` Pekka Enberg
  0 siblings, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-15 15:09 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev


The current Linus tree crashes in kmem_cache_init, as shown below. The
system is an 8-CPU 2.2GHz POWER5 system, model 9117-570, with 4GB RAM.
Firmware is 240_332; 2.6.23 boots fine with the same config.

2.6.24-rc1 contains a series of mm-related patches, and commit
04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to be the one that breaks it:

==> .git/BISECT_LOG <==
git-bisect start
# good: [0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d] Linux 2.6.23
git-bisect good 0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d
# bad: [cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2] Linux 2.6.24-rc1
git-bisect bad cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2
# good: [9ac52315d4cf5f561f36dabaf0720c00d3553162] sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
git-bisect good 9ac52315d4cf5f561f36dabaf0720c00d3553162
# bad: [b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0] add consts where appropriate in fs/nls/Kconfig fs/nls/Makefile fs/nls/nls_ascii.c fs/nls/nls_base.c fs/nls/nls_cp1250.c fs/nls/nls_cp1251.c fs/nls/nls_cp1255.c fs/nls/nls_cp437.c fs/nls/nls_cp737.c fs/nls/nls_cp775.c fs/nls/nls_cp850.c fs/nls/nls_cp852.c fs/nls/nls_cp855.c fs/nls/nls_cp857.c fs/nls/nls_cp860.c fs/nls/nls_cp861.c fs/nls/nls_cp862.c fs/nls/nls_cp863.c fs/nls/nls_cp864.c fs/nls/nls_cp865.c fs/nls/nls_cp866.c fs/nls/nls_cp869.c fs/nls/nls_cp874.c fs/nls/nls_cp932.c fs/nls/nls_cp936.c fs/nls/nls_cp949.c fs/nls/nls_cp950.c fs/nls/nls_euc-jp.c fs/nls/nls_iso8859-1.c fs/nls/nls_iso8859-13.c fs/nls/nls_iso8859-14.c fs/nls/nls_iso8859-15.c fs/nls/nls_iso8859-2.c fs/nls/nls_iso8859-3.c fs/nls/nls_iso8859-4.c fs/nls/nls_iso8859-5.c fs/nls/nls_iso8859-6.c fs/nls/nls_iso8859-7.c fs/nls/nls_iso8859-9.c fs/nls/nls_koi8-r.c fs/nls/nls_koi8-ru.c fs/nls/nls_koi8-u.c fs/nls/nls_utf8.c
git-bisect bad b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0
# bad: [78a26e25ce4837a03ac3b6c32cdae1958e547639] uml: separate timer initialization
git-bisect bad 78a26e25ce4837a03ac3b6c32cdae1958e547639
# good: [4acad72ded8e3f0211bd2a762e23c28229c61a51] [IPV6]: Consolidate the ip6_pol_route_(input|output) pair
git-bisect good 4acad72ded8e3f0211bd2a762e23c28229c61a51
# good: [64da82efae0d7b5f7c478021840fd329f76d965d] Add support for PCMCIA card Sierra WIreless AC850
git-bisect good 64da82efae0d7b5f7c478021840fd329f76d965d
# bad: [37b07e4163f7306aa735a6e250e8d22293e5b8de] memoryless nodes: fixup uses of node_online_map in generic code
git-bisect bad 37b07e4163f7306aa735a6e250e8d22293e5b8de
# good: [64649a58919e66ec21792dbb6c48cb3da22cbd7f] mm: trim more holes
git-bisect good 64649a58919e66ec21792dbb6c48cb3da22cbd7f
# good: [fb53b3094888be0cf8ddf052277654268904bdf5] smbfs: convert to new aops
git-bisect good fb53b3094888be0cf8ddf052277654268904bdf5
# good: [13808910713a98cc1159291e62cdfec92cc94d05] Memoryless nodes: Generic management of nodemasks for various purposes




 .............
Please wait, loading kernel...
Allocated 00a00000 bytes for kernel @ 00200000
   Elf64 kernel loaded...
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: panic=1 debug xmon=on
memory layout at init:
  alloc_bottom : 0000000000ac1000
  alloc_top    : 0000000010000000
  alloc_top_hi : 00000000da000000
  rmo_top      : 0000000010000000
  ram_top      : 00000000da000000
Looking for displays
found display   : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done
instantiating rtas at 0x000000000f6a1000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
0000000000000004 : starting cpu hw idx 0000000000000004... done
0000000000000006 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4
Device tree struct  0x0000000000cc4000 -> 0x0000000000cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #2 SMP Tue Jan 15 14:23:02 CET 2008
-----------------------------------------------------
ppc64_pft_size                = 0x1c
physicalMemorySize            = 0xda000000
htab_hash_mask                = 0x1fffff
-----------------------------------------------------
Linux version 2.6.24-rc7-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #2 SMP Tue Jan 15 14:23:02 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: panic=1 debug xmon=on
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency   = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc000000000437470
cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830]
    pc: c000000000437470: ._spin_lock+0x20/0x88
    lr: c0000000000f78a8: .cache_grow+0x7c/0x338
    sp: c00000000075bab0
   msr: 8000000000009032
   dar: 40
 dsisr: 40000000
  current = 0xc000000000665a50
  paca    = 0xc000000000666380
    pid   = 0, comm = swapper
enter ? for help
[c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
[c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
[c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
[c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
[c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon> 

* Re: crash in kmem_cache_init
  2008-01-15 15:09 crash in kmem_cache_init Olaf Hering
@ 2008-01-15 15:58 ` Olaf Hering
  2008-01-17 12:14 ` Pekka Enberg
  1 sibling, 0 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-15 15:58 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev

On Tue, Jan 15, Olaf Hering wrote:

> 
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
> 
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

2.6.24-rc6-mm1-ppc64 boots past this point, but crashes later.
Likely unrelated to the kmem_cache_init bug:

...
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x40178000000, mapped to 0xd000080080080000, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd 0000:c8:01.2: EHCI Host Controller
ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000
ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
Unable to handle kernel paging request for data at address 0x00000050
Faulting instruction address: 0xc0000000000fa1c4
cpu 0x7: Vector: 300 (Data Access) at [c0000000d82e7a70]
    pc: c0000000000fa1c4: .cache_reap+0x74/0x29c
    lr: c0000000000fa198: .cache_reap+0x48/0x29c
    sp: c0000000d82e7cf0
   msr: 8000000000009032
   dar: 50
 dsisr: 40000000
  current = 0xc0000000d82d85c0
  paca    = 0xc000000000668e00
    pid   = 27, comm = events/7
enter ? for help
[c0000000d82e7cf0] c00000000070be98 vmstat_update+0x0/0x18 (unreliable)
[c0000000d82e7da0] c000000000092994 .run_workqueue+0x120/0x210
[c0000000d82e7e40] c000000000093bb8 .worker_thread+0xcc/0xf0
[c0000000d82e7f00] c000000000097b70 .kthread+0x78/0xc4
[c0000000d82e7f90] c00000000002ab74 .kernel_thread+0x4c/0x68
7:mon> 
...

* Re: crash in kmem_cache_init
  2008-01-15 15:09 crash in kmem_cache_init Olaf Hering
  2008-01-15 15:58 ` Olaf Hering
@ 2008-01-17 12:14 ` Pekka Enberg
  2008-01-17 14:30   ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2008-01-17 12:14 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Linux MM, linuxppc-dev, linux-kernel, clameter

Hi Olaf,

[Adding Christoph as cc.]

On Jan 15, 2008 5:09 PM, Olaf Hering <olaf@aepfle.de> wrote:
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

So that's the "Memoryless nodes: Slab support" patch, which I think
caused a similar oops a while ago.

> Unable to handle kernel paging request for data at address 0x00000040
> Faulting instruction address: 0xc000000000437470
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830]
>     pc: c000000000437470: ._spin_lock+0x20/0x88
>     lr: c0000000000f78a8: .cache_grow+0x7c/0x338
>     sp: c00000000075bab0
>    msr: 8000000000009032
>    dar: 40
>  dsisr: 40000000
>   current = 0xc000000000665a50
>   paca    = 0xc000000000666380
>     pid   = 0, comm = swapper
> enter ? for help
> [c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
> [c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
> [c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
> [c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
> [c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0

Looks similar to the one discussed on linux-mm ("[BUG] at
mm/slab.c:3320" thread). Christoph?

* Re: crash in kmem_cache_init
  2008-01-17 12:14 ` Pekka Enberg
@ 2008-01-17 14:30   ` Christoph Lameter
  2008-01-17 18:12     ` Olaf Hering
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-17 14:30 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linuxppc-dev, Olaf Hering, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Pekka Enberg wrote:

> Looks similar to the one discussed on linux-mm ("[BUG] at
> mm/slab.c:3320" thread). Christoph?

Right. Try the latest version of the patch to fix it:

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
+++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
@@ -2977,7 +2977,10 @@ retry:
 	}
 	l3 = cachep->nodelists[node];
 
-	BUG_ON(ac->avail > 0 || !l3);
+	if (!l3)
+		return NULL;
+
+	BUG_ON(ac->avail > 0);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3224,7 +3227,7 @@ static void *alternate_node_alloc(struct
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	if (nid_alloc != nid_here)
+	if (nid_alloc != nid_here && node_state(nid_alloc, N_NORMAL_MEMORY))
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
 }
@@ -3439,8 +3442,14 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
- 	if (!objp)
- 		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+ 	if (!objp) {
+		int node_id = numa_node_id();
+		if (likely(cache->nodelists[node_id])) /* fast path */
+ 			objp = ____cache_alloc_node(cache, flags, node_id);
+		else /* this function can do good fallback */
+			objp = __cache_alloc_node(cache, flags, node_id,
+					__builtin_return_address(0));
+	}
 
   out:
 	return objp;

* Re: crash in kmem_cache_init
  2008-01-17 14:30   ` Christoph Lameter
@ 2008-01-17 18:12     ` Olaf Hering
  2008-01-17 18:58       ` Christoph Lameter
  2008-01-17 19:03       ` Christoph Lameter
  0 siblings, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-17 18:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Pekka Enberg wrote:
> 
> > Looks similar to the one discussed on linux-mm ("[BUG] at
> > mm/slab.c:3320" thread). Christoph?
> 
> Right. Try the latest version of the patch to fix it:

The patch does not help.
 
> Index: linux-2.6/mm/slab.c
> ===================================================================
> --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
> +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
> @@ -2977,7 +2977,10 @@ retry:
>  	}
>  	l3 = cachep->nodelists[node];
>  
> -	BUG_ON(ac->avail > 0 || !l3);
> +	if (!l3)
> +		return NULL;
> +
> +	BUG_ON(ac->avail > 0);
>  	spin_lock(&l3->list_lock);
>  
>  	/* See if we can refill from the shared array */

Is this hunk supposed to go into cache_grow()? There is no NULL check
for l3.

But if I do that, it does not help:

freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32'

Rebooting in 1 seconds..    

* Re: crash in kmem_cache_init
  2008-01-17 18:12     ` Olaf Hering
@ 2008-01-17 18:58       ` Christoph Lameter
  2008-01-17 19:54         ` Olaf Hering
  2008-01-17 21:15         ` Olaf Hering
  2008-01-17 19:03       ` Christoph Lameter
  1 sibling, 2 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-17 18:58 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

> The patch does not help.

Duh. We need to know more about the problem.

> > --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
> > +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
> > @@ -2977,7 +2977,10 @@ retry:
> >  	}
> >  	l3 = cachep->nodelists[node];
> >  
> > -	BUG_ON(ac->avail > 0 || !l3);
> > +	if (!l3)
> > +		return NULL;
> > +
> > +	BUG_ON(ac->avail > 0);
> >  	spin_lock(&l3->list_lock);
> >  
> >  	/* See if we can refill from the shared array */
> 
> Is this hunk supposed to go into cache_grow()? There is no NULL check
> for l3.

No, it's for cache_alloc_refill. cache_grow should only be called for
nodes that have memory. l3 is always used before cache_grow is called.

> freeing bootmem node 1
> Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3

Is there more backtrace information? What function called cache_grow?

* Re: crash in kmem_cache_init
  2008-01-17 18:58       ` Christoph Lameter
@ 2008-01-17 19:54         ` Olaf Hering
  2008-01-17 20:20           ` Olaf Hering
  2008-01-17 21:15         ` Olaf Hering
  1 sibling, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-17 19:54 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

[-- Attachment #1: Type: text/plain, Size: 4373 bytes --]

On Thu, Jan 17, Christoph Lameter wrote:

> > freeing bootmem node 1
> > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> > cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3
> 
> Is there more backtrace information? What function called cache_grow?

I just put an 'if (!l3) return 0;' into cache_grow, the backtrace is the
one from the initial report.
Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change
anything.


Since -mm boots further, what patch should I try?

The kernel boots on a different p570.
See attached dmesg. huckleberry boots, cranberry crashes.


--- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt	2008-01-17 20:48:18.510309000 +0100
+++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt	2008-01-17 20:48:09.425402000 +0100
@@ -1,56 +1,55 @@
 Page orders: linear mapping = 24, others = 12
-Found initrd at 0xc000000002700000:0xc000000002a93000
+Found initrd at 0xc000000001300000:0xc0000000016e6c1e
 Partition configured for 8 cpus.
 Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
 -----------------------------------------------------
-ppc64_pft_size                = 0x1b
+ppc64_pft_size                = 0x1c
 ppc64_interrupt_controller    = 0x2
 platform                      = 0x101
-physicalMemorySize            = 0x158000000
+physicalMemorySize            = 0xda000000
 ppc64_caches.dcache_line_size = 0x80
 ppc64_caches.icache_line_size = 0x80
 htab_address                  = 0x0000000000000000
-htab_hash_mask                = 0xfffff
+htab_hash_mask                = 0x1fffff
 -----------------------------------------------------
 [boot]0100 MM Init
 [boot]0100 MM Init Done
 Linux version 2.6.16.57-0.5-ppc64 (geeko@buildhost) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007
 [boot]0012 Setup Arch
-Node 0 Memory: 0x0-0xb0000000
-Node 1 Memory: 0xb0000000-0x158000000
+Node 0 Memory:
+Node 1 Memory: 0x0-0xda000000
 EEH: PCI Enhanced I/O Error Handling Enabled
-PPC64 nvram contains 7168 bytes
+PPC64 nvram contains 8192 bytes
 Using dedicated idle loop
-On node 0 totalpages: 720896
-  DMA zone: 720896 pages, LIFO batch:31
+On node 0 totalpages: 0
+  DMA zone: 0 pages, LIFO batch:0
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
-On node 1 totalpages: 688128
-  DMA zone: 688128 pages, LIFO batch:31
+On node 1 totalpages: 892928
+  DMA zone: 892928 pages, LIFO batch:31
   DMA32 zone: 0 pages, LIFO batch:0
   Normal zone: 0 pages, LIFO batch:0
   HighMem zone: 0 pages, LIFO batch:0
 [boot]0015 Setup Done
 Built 2 zonelists
-Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW500007445Q010-part5  xmon=on sysrq=1 quiet 
+Kernel command line: root=/dev/system/root  xmon=on sysrq=1 quiet 
 [boot]0020 XICS Init
 xics: no ISA interrupt controller
 [boot]0021 XICS Done
 PID hash table entries: 4096 (order: 12, 131072 bytes)
-time_init: decrementer frequency = 207.052000 MHz
-time_init: processor frequency   = 1654.344000 MHz
+time_init: decrementer frequency = 275.070000 MHz
+time_init: processor frequency   = 2197.800000 MHz
 Console: colour dummy device 80x25
-Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
-Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
-freeing bootmem node 0
+Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
+Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
 freeing bootmem node 1
-Memory: 5524952k/5636096k available (4464k kernel code, 111144k reserved, 1992k data, 836k bss, 264k init)
-Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
+Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init)
+Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320)
 Security Framework v1.0.0 initialized
 Mount-cache hash table entries: 256
 checking if image is initramfs... it is
-Freeing initrd memory: 3660k freed
+Freeing initrd memory: 3995k freed
 Processor 1 found.
 Processor 2 found.
 Processor 3 found.
@@ -61,7 +60,7 @@ Processor 7 found.
 Brought up 8 CPUs
 Node 0 CPUs: 0-3
 Node 1 CPUs: 4-7
-migration_cost=41,0,4308
+migration_cost=38,0,3225
 NET: Registered protocol family 16
 PCI: Probing PCI hardware
 IOMMU table initialized, virtual merging enabled

[-- Attachment #2: huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt --]
[-- Type: text/plain, Size: 16674 bytes --]

Page orders: linear mapping = 24, others = 12
Found initrd at 0xc000000002700000:0xc000000002a93000
Partition configured for 8 cpus.
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
-----------------------------------------------------
ppc64_pft_size                = 0x1b
ppc64_interrupt_controller    = 0x2
platform                      = 0x101
physicalMemorySize            = 0x158000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address                  = 0x0000000000000000
htab_hash_mask                = 0xfffff
-----------------------------------------------------
[boot]0100 MM Init
[boot]0100 MM Init Done
Linux version 2.6.16.57-0.5-ppc64 (geeko@buildhost) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007
[boot]0012 Setup Arch
Node 0 Memory: 0x0-0xb0000000
Node 1 Memory: 0xb0000000-0x158000000
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Using dedicated idle loop
On node 0 totalpages: 720896
  DMA zone: 720896 pages, LIFO batch:31
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
On node 1 totalpages: 688128
  DMA zone: 688128 pages, LIFO batch:31
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
[boot]0015 Setup Done
Built 2 zonelists
Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW500007445Q010-part5  xmon=on sysrq=1 quiet 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 131072 bytes)
time_init: decrementer frequency = 207.052000 MHz
time_init: processor frequency   = 1654.344000 MHz
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 5524952k/5636096k available (4464k kernel code, 111144k reserved, 1992k data, 836k bss, 264k init)
Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
checking if image is initramfs... it is
Freeing initrd memory: 3660k freed
Processor 1 found.
Processor 2 found.
Processor 3 found.
Processor 4 found.
Processor 5 found.
Processor 6 found.
Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-3
Node 1 CPUs: 4-7
migration_cost=41,0,4308
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging enabled
mapping IO 3fe00100000 -> d000080000000000, size: 100000
mapping IO 3fe00600000 -> d000080000100000, size: 100000
mapping IO 3fe00300000 -> d000080000200000, size: 100000
PCI: Probing PCI hardware done
Registering pmac pic with sysfs...
usbcore: registered new driver usbfs
usbcore: registered new driver hub
IBM eBus Device Driver
RTAS daemon started
RTAS: event: 109, Type: Platform Error, Severity: 2
probe_bus_pseries: processing c000000157ff7058
probe_bus_pseries: processing c000000157ff7228
probe_bus_pseries: processing c000000157ff7378
probe_bus_pseries: processing c000000157ff74e8
probe_bus_pseries: processing c000000157ff7658
audit: initializing netlink socket (disabled)
audit(1200599258.200:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
rpaphp: Slot [0001:00:02.0](PCI location=U7879.001.DQD02EK-P1-C3) registered
rpaphp: Slot [0001:00:02.2](PCI location=U7879.001.DQD02EK-P1-C4) registered
rpaphp: Slot [0001:00:02.4](PCI location=U7879.001.DQD02EK-P1-C5) registered
rpaphp: Slot [0001:00:02.6](PCI location=U7879.001.DQD02EK-P1-C6) registered
rpaphp: Slot [0002:00:02.0](PCI location=U7879.001.DQD02EK-P1-C1) registered
rpaphp: Slot [0002:00:02.6](PCI location=U7879.001.DQD02EK-P1-C2) registered
matroxfb: Matrox G450 detected
PInS data found at offset 31168
PInS memtype = 5
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x400C0000000, mapped to 0xd000080080004000, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
RAMDISK driver initialized: 16 RAM disks of 123456K size 1024 blocksize
input: Macintosh mouse button emulation as /class/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd 0000:c8:01.2: EHCI Host Controller
ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:c8:01.2: irq 101, io mem 0x400a0002000
ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: new device found, idVendor=0000, idProduct=0000
usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ehci_hcd
usb usb1: SerialNumber: 0000:c8:01.2
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd 0000:c8:01.0: OHCI Host Controller
ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:c8:01.0: irq 101, io mem 0x400a0001000
usb usb2: new device found, idVendor=0000, idProduct=0000
usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: OHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ohci_hcd
usb usb2: SerialNumber: 0000:c8:01.0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
hub 2-0:1.0: over-current change on port 1
ohci_hcd 0000:c8:01.1: OHCI Host Controller
ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:c8:01.1: irq 101, io mem 0x400a0000000
usb usb3: new device found, idVendor=0000, idProduct=0000
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: OHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ohci_hcd
usb usb3: SerialNumber: 0000:c8:01.1
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
hub 3-0:1.0: over-current change on port 1
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
oprofile: using ppc64/power5 performance monitoring.
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
Freeing unused kernel memory: 264k freed
SCSI subsystem initialized
libata version 2.00 loaded.
ipr: IBM Power RAID SCSI Device Driver version: 2.2.0.2 (November 14, 2007)
ipr 0000:c0:01.0: Found IOA with IRQ: 99
ipr 0000:c0:01.0: Starting IOA initialization sequence.
ipr 0000:c0:01.0: Adapter firmware version: 020A004E
ipr 0000:c0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
  Vendor: IBM       Model: ST373453LC        Rev: C51A
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 10 08
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
sda: Write Protect is off
sda: Mode Sense: cb 00 10 08
SCSI device sda: drive cache: write through w/ FUA
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
sd 0:0:4:0: Attached scsi disk sda
  Vendor: IBM       Model: VSBPD3E   U4SCSI  Rev: 4812
  Type:   Enclosure                          ANSI SCSI revision: 02
sd 0:0:4:0: Attached scsi generic sg0 type 0
 0:0:15:0: Attached scsi generic sg1 type 13
scsi: unknown device type 31
  Vendor: IBM       Model: 570B001           Rev: 0150
  Type:   Unknown                            ANSI SCSI revision: 00
 0:255:255:255: Attached scsi generic sg2 type 31
ipr 0002:c8:01.0: Found IOA with IRQ: 133
ipr 0002:c8:01.0: Starting IOA initialization sequence.
ipr 0002:c8:01.0: Adapter firmware version: 020A004E
ipr 0002:c8:01.0: IOA initialized.
scsi1 : IBM 570B Storage Adapter
  Vendor: IBM       Model: IC35L073UCDY10-0  Rev: S28G
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 00 08
SCSI device sdb: drive cache: write through
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
sdb: Write Protect is off
sdb: Mode Sense: cb 00 00 08
SCSI device sdb: drive cache: write through
 sdb: sdb1
sd 1:0:3:0: Attached scsi disk sdb
sd 1:0:3:0: Attached scsi generic sg3 type 0
  Vendor: IBM       Model: ST373453LC        Rev: C51A
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
sdc: Write Protect is off
sdc: Mode Sense: cb 00 10 08
SCSI device sdc: drive cache: write through w/ FUA
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
sdc: Write Protect is off
sdc: Mode Sense: cb 00 10 08
SCSI device sdc: drive cache: write through w/ FUA
 sdc: sdc1
sd 1:0:5:0: Attached scsi disk sdc
sd 1:0:5:0: Attached scsi generic sg4 type 0
  Vendor: IBM       Model: VSBPD3E   U4SCSI  Rev: 4812
  Type:   Enclosure                          ANSI SCSI revision: 02
 1:0:15:0: Attached scsi generic sg5 type 13
scsi: unknown device type 31
  Vendor: IBM       Model: 570B001           Rev: 0150
  Type:   Unknown                            ANSI SCSI revision: 00
 1:255:255:255: Attached scsi generic sg6 type 31
pata_pdc2027x 0002:d0:01.0: version 0.74-ac5
PCI: Enabling device: (0002:d0:01.0), cmd 3
pata_pdc2027x 0002:d0:01.0: PLL input clock 32760 kHz
ata1: PATA max UDMA/133 cmd 0xD0000800820887C0 ctl 0xD000080082088FDA bmdma 0xD000080082088000 irq 135
ata2: PATA max UDMA/133 cmd 0xD0000800820885C0 ctl 0xD000080082088DDA bmdma 0xD000080082088008 irq 135
scsi2 : pata_pdc2027x
ata1.00: ATAPI, max UDMA/33
ata1.00: configured for UDMA/33
scsi3 : pata_pdc2027x
ATA: abnormal status 0x8 on port 0xD0000800820885DF
  Vendor: IBM       Model: DROM00205         Rev: NR38
  Type:   CD-ROM                             ANSI SCSI revision: 02
 2:0:0:0: Attached scsi generic sg7 type 5
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 2:0:0:0: Attached scsi CD-ROM sr0
ReiserFS: sda5: found reiserfs format "3.6" with standard journal
ReiserFS: sda5: using ordered data mode
reiserfs: using flush barriers
ReiserFS: sda5: journal params: device sda5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sda5: checking transaction log (sda5)
ReiserFS: sda5: Using r5 hash to sort names
Adding 1050616k swap on /dev/disk/by-label/vscsi_swap.  Priority:-1 extents:1 across:1050616k
Intel(R) PRO/1000 Network Driver - version 7.6.9.1-NAPI
Copyright (c) 1999-2007 Intel Corporation.
PCI: Enabling device: (0000:d0:01.0), cmd 3
e1000: 0000:d0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0e:78
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0000:d0:01.1), cmd 3
e1000: 0000:d0:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0e:79
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:c0:01.0), cmd 3
e1000: 0001:c0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:11:25:c0:5a:13
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:c8:01.0), cmd 3
e1000: 0001:c8:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:1b:ee
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: Enabling device: (0001:c8:01.1), cmd 3
e1000: 0001:c8:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:1b:ef
e1000: eth4: e1000_probe: Intel(R) PRO/1000 Network Connection
md: md0 stopped.
device-mapper: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
dm-netlink version 0.0.2 loaded
md: bind<sdc1>
md: bind<sdb1>
md: raid0 personality registered for level 0
md0: setting max_sectors to 64, segment boundary to 16383
raid0: looking at sdb1
raid0:   comparing sdb1(71673856) with sdb1(71673856)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sdc1
raid0:   comparing sdc1(71673856) with sdb1(71673856)
raid0:   EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 143347712 blocks.
raid0 : conf->hash_spacing is 143347712 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 8 bytes for hash.
loop: loaded (max 8 devices)
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
ReiserFS: sda6: found reiserfs format "3.6" with standard journal
ReiserFS: sda6: using ordered data mode
reiserfs: using flush barriers
ReiserFS: sda6: journal params: device sda6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sda6: checking transaction log (sda6)
ReiserFS: sda6: Using r5 hash to sort names
AppArmor: AppArmor (version 2.0-19.43r6320) initialized
audit(1200599270.450:2): AppArmor (version 2.0-19.43r6320) initialized

ib_core: module not supported by Novell, setting U taint flag.
ib_mad: module not supported by Novell, setting U taint flag.
ib_mthca: module not supported by Novell, setting U taint flag.
ib_umad: module not supported by Novell, setting U taint flag.
ib_uverbs: module not supported by Novell, setting U taint flag.
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
ib_sa: module not supported by Novell, setting U taint flag.
ib_cm: module not supported by Novell, setting U taint flag.
ib_ipoib: module not supported by Novell, setting U taint flag.
iw_cm: module not supported by Novell, setting U taint flag.
ib_addr: module not supported by Novell, setting U taint flag.
rdma_cm: module not supported by Novell, setting U taint flag.
ib_sdp: module not supported by Novell, setting U taint flag.
NET: Registered protocol family 27
ib_srp: module not supported by Novell, setting U taint flag.
rdma_ucm: module not supported by Novell, setting U taint flag.
ADDRCONF(NETDEV_UP): eth0: link is not ready
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
audit(1200599280.383:3): audit_pid=3972 old=0 by auid=4294967295
sd 0:0:4:0: queue not ready for req c000000157bdb2a8
sd 0:0:4:0: queue not ready for req c000000157bdb2a8
sd 0:0:4:0: queue not ready for req c000000156d3b2a8
sd 0:0:4:0: queue not ready for req c000000156d3b2a8
sd 0:0:4:0: queue not ready for req c0000000b35ed2a8
sd 0:0:4:0: queue not ready for req c0000000b35ed2a8
sd 0:0:4:0: queue not ready for req c00000000fc71728
sd 0:0:4:0: queue not ready for req c00000000fc713c8
sd 0:0:4:0: queue not ready for req c000000156082188
sd 0:0:4:0: queue not ready for req c000000156082188
sd 0:0:4:0: queue not ready for req c00000000fc712a8
sd 0:0:4:0: queue not ready for req c00000000fc714e8
sd 0:0:4:0: queue not ready for req c00000000fc714e8
sd 0:0:4:0: queue not ready for req c00000000fc71608
sd 0:0:4:0: queue not ready for req c00000000fc71608
sd 0:0:4:0: queue not ready for req c00000000fd6c848
sd 0:0:4:0: queue not ready for req c00000000fd6c3c8
sd 0:0:4:0: queue not ready for req c00000000fd6c068
sd 0:0:4:0: queue not ready for req c00000000fd6c728
sd 0:0:4:0: queue not ready for req c0000001560824e8
sd 0:0:4:0: queue not ready for req c000000156082968
sd 0:0:4:0: queue not ready for req c000000003364728
sd 0:0:4:0: queue not ready for req c00000000fe41188
sd 0:0:4:0: queue not ready for req c00000000f8e0968

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 19:54         ` Olaf Hering
@ 2008-01-17 20:20           ` Olaf Hering
  2008-01-19  4:56             ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-17 20:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Olaf Hering wrote:

> Since -mm boots further, what patch should I try?

rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 20:20           ` Olaf Hering
@ 2008-01-19  4:56             ` Christoph Lameter
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-19  4:56 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

> On Thu, Jan 17, Olaf Hering wrote:
> 
> > Since -mm boots further, what patch should I try?
> 
> rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.

Sigh. It looks like we need alien cache structures in some cases for nodes
that have no memory. We must allocate structures for all nodes regardless
of whether they have allocatable memory.

 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 18:58       ` Christoph Lameter
  2008-01-17 19:54         ` Olaf Hering
@ 2008-01-17 21:15         ` Olaf Hering
  2008-01-18  6:56           ` Olaf Hering
                             ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-17 21:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Olaf Hering wrote:
> 
> > The patch does not help.
> 
> Duh. We need to know more about the problem.

cache_grow is called from 3 places. The third call has cleared l3 for
some reason.


....
Allocated 00a00000 bytes for kernel @ 00200000
   Elf64 kernel loaded...
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line:  xmon=on sysrq=1 debug panic=1 
memory layout at init:
  alloc_bottom : 0000000000ac1000
  alloc_top    : 0000000010000000
  alloc_top_hi : 00000000da000000
  rmo_top      : 0000000010000000
  ram_top      : 00000000da000000
Looking for displays
found display   : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done
instantiating rtas at 0x000000000f6a1000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
0000000000000004 : starting cpu hw idx 0000000000000004... done
0000000000000006 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4
Device tree struct  0x0000000000cc4000 -> 0x0000000000cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008
-----------------------------------------------------
ppc64_pft_size                = 0x1c
physicalMemorySize            = 0xda000000
htab_hash_mask                = 0x1fffff
-----------------------------------------------------
Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line:  xmon=on sysrq=1 debug panic=1 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency   = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b880 TRAP: 0700   Not tainted  (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000022  XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb00 c0000000007544c0 0000000000000063 
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0 
GPR24: 0000000000000000 c0000000006a4fb8 c0000000006a4fb8 c0000000005fdc80 
GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004 
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb00] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
[c00000000075bc90] [c0000000000f842c] .kmem_cache_alloc+0xd0/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b890 TRAP: 0700   Not tainted  (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000022  XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063 
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0 
GPR24: 0000000000000000 00000000000080d0 c0000000006a4fb8 c0000000006a4fb8 
GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004 
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
[c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 0000000000000000
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b890 TRAP: 0700   Not tainted  (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000022  XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063 
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000080d0 
GPR24: 0000000000000001 c0000000d9fe4b00 c0000000006a4fb8 0000000000000000 
GPR28: c0000000d8000000 00000000000000d0 c0000000006e5b80 0000000000000004 
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4
[c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc0000000004377b8
cpu 0x0: Vector: 300 (Data Access) at [c00000000075b810]
    pc: c0000000004377b8: ._spin_lock+0x20/0x88
    lr: c0000000000f790c: .cache_grow+0xe0/0x39c
    sp: c00000000075ba90
   msr: 8000000000009032
   dar: 40
 dsisr: 40000000
  current = 0xc000000000665a50
  paca    = 0xc000000000666380
    pid   = 0, comm = swapper
enter ? for help
[c00000000075bb10] c0000000000f790c .cache_grow+0xe0/0x39c
[c00000000075bbe0] c0000000000f7d68 .fallback_alloc+0x1a0/0x1f4
[c00000000075bc90] c0000000000f846c .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] c0000000000fb4e8 .kmem_cache_create+0x208/0x478
[c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon> 
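
The fault address 0x40 (dar: 40) reported in ._spin_lock is consistent with cache_grow() taking the list lock through a NULL l3 pointer. A minimal sketch, assuming the 2.6.24-era field order of struct kmem_list3 in mm/slab.c and a 64-bit layout. The model_ types are hypothetical stand-ins, not kernel definitions, and the 4-byte lock is an assumption that does not affect the offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct list_head on a 64-bit kernel: two 8-byte pointers. */
struct model_list_head {
	uint64_t next;
	uint64_t prev;
};

/*
 * Hypothetical model of the leading fields of struct kmem_list3.
 * The field order is an assumption based on 2.6.24-era mm/slab.c.
 */
struct model_kmem_list3 {
	struct model_list_head slabs_partial;
	struct model_list_head slabs_full;
	struct model_list_head slabs_free;
	uint64_t free_objects;	/* unsigned long on ppc64 */
	uint32_t free_limit;
	uint32_t colour_next;
	uint32_t list_lock;	/* stand-in for spinlock_t */
};

/* Address that &l3->list_lock evaluates to when l3 is NULL. */
static size_t model_list_lock_offset(void)
{
	return offsetof(struct model_kmem_list3, list_lock);
}
```

Under these assumptions the three list heads take 48 bytes and the three counters another 16, so the lock sits 64 bytes (0x40) into the structure.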



-- 
Used patch:

Index: linux-2.6.24-rc8/include/linux/olh.h
===================================================================
--- /dev/null
+++ linux-2.6.24-rc8/include/linux/olh.h
@@ -0,0 +1,6 @@
+#ifndef __LINUX_OLH_H
+#define __LINUX_OLH_H
+#define olh(fmt,args ...) \
+        printk(KERN_DEBUG "%s(%u) %s(%u):c%u,j%lu " fmt "\n",__FUNCTION__,__LINE__,current->comm,current->pid,smp_processor_id(),jiffies,##args)
+#endif
+
Index: linux-2.6.24-rc8/mm/slab.c
===================================================================
--- linux-2.6.24-rc8.orig/mm/slab.c
+++ linux-2.6.24-rc8/mm/slab.c
@@ -110,6 +110,7 @@
 #include       <linux/fault-inject.h>
 #include       <linux/rtmutex.h>
 #include       <linux/reciprocal_div.h>
+#include       <linux/olh.h>

 #include       <asm/cacheflush.h>
 #include       <asm/tlbflush.h>
@@ -2764,6 +2765,7 @@ static int cache_grow(struct kmem_cache 
        size_t offset;
        gfp_t local_flags;
        struct kmem_list3 *l3;
+       int i;
 
        /*
         * Be lazy and only check for valid flags here,  keeping it out of the
@@ -2772,6 +2774,9 @@ static int cache_grow(struct kmem_cache 
        BUG_ON(flags & GFP_SLAB_BUG_MASK);
        local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
+       for (i=0;i<4;i++)
+               olh("cachep %p nodeid %d l3 %p",cachep,i,cachep->nodelists[nodeid]);
+       WARN_ON(1);
        /* Take the l3 list lock to change the colour_next on this node */
        check_irq_off();
        l3 = cachep->nodelists[nodeid];

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 21:15         ` Olaf Hering
@ 2008-01-18  6:56           ` Olaf Hering
  2008-01-18 18:42             ` Christoph Lameter
  2008-01-19  4:55             ` Christoph Lameter
  2008-01-18 18:47           ` Christoph Lameter
  2008-01-18 18:51           ` Christoph Lameter
  2 siblings, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-18  6:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Olaf Hering wrote:

> On Thu, Jan 17, Christoph Lameter wrote:
> 
> > On Thu, 17 Jan 2008, Olaf Hering wrote:
> > 
> > > The patch does not help.
> > 
> > Duh. We need to know more about the problem.
> 
> cache_grow is called from 3 places. The third call has cleared l3 for
> some reason.

Typo in debug patch.

calls cache_grow with nodeid 0
> [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
calls cache_grow with nodeid 0
> [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8

calls cache_grow with nodeid 1
> [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4
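
The typo is presumably the array index in the debug loop: the posted patch prints cachep->nodelists[nodeid] on every iteration while labelling the line with the loop counter, so all four lines of one dump show the same slot. A minimal sketch of the difference, using a hypothetical model_dbg_cache in place of struct kmem_cache:

```c
#include <stddef.h>

#define MODEL_DBG_NODES 4

/* Hypothetical stand-in for the cachep->nodelists[] array. */
struct model_dbg_cache {
	void *nodelists[MODEL_DBG_NODES];
};

/* As posted: every iteration reads the slot for nodeid, not for i. */
static void model_dump_as_posted(const struct model_dbg_cache *c, int nodeid,
				 void *out[MODEL_DBG_NODES])
{
	int i;

	for (i = 0; i < MODEL_DBG_NODES; i++)
		out[i] = c->nodelists[nodeid];
}

/* Presumed intent: report the slot of each node in turn. */
static void model_dump_per_node(const struct model_dbg_cache *c,
				void *out[MODEL_DBG_NODES])
{
	int i;

	for (i = 0; i < MODEL_DBG_NODES; i++)
		out[i] = c->nodelists[i];
}
```

Read this way, the three dumps show nodelists[0], nodelists[0] and nodelists[1], which matches the nodeid annotations on the three callers.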

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18  6:56           ` Olaf Hering
@ 2008-01-18 18:42             ` Christoph Lameter
  2008-01-19  4:55             ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 18:42 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
> 
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Hmmm... fallback_alloc should not be called during bootstrap.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18  6:56           ` Olaf Hering
  2008-01-18 18:42             ` Christoph Lameter
@ 2008-01-19  4:55             ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-19  4:55 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
> 
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Okay that makes sense. You have no node 0 with normal memory but the node
assigned to the executing processor is zero (correct?). Thus it needs to
fall back to node 1 and that is not possible during bootstrap. You need to
run kmem_cache_init() on a cpu that belongs to a node with memory.

Or we need to revert the patch, which would again allocate control
structures for all online nodes regardless of whether they have memory.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the 
situation? (However, we tried this on the other thread without success).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 21:15         ` Olaf Hering
  2008-01-18  6:56           ` Olaf Hering
@ 2008-01-18 18:47           ` Christoph Lameter
  2008-01-18 21:30             ` Mel Gorman
  2008-01-18 18:51           ` Christoph Lameter
  2 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 18:47 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

> early_node_map[1] active PFN ranges
>     1:        0 ->   892928
> Could not find start_pfn for node 0

Corrupted min_pfn?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 18:47           ` Christoph Lameter
@ 2008-01-18 21:30             ` Mel Gorman
  2008-01-18 21:43               ` Christoph Lameter
  2008-01-18 22:16               ` Christoph Lameter
  0 siblings, 2 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-18 21:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linuxppc-dev, Olaf Hering, Pekka Enberg, linux-kernel, Linux MM

On (18/01/08 10:47), Christoph Lameter didst pronounce:
> On Thu, 17 Jan 2008, Olaf Hering wrote:
> 
> > early_node_map[1] active PFN ranges
> >     1:        0 ->   892928
> > Could not find start_pfn for node 0
> 
> Corrupted min_pfn?
> 

Doubtful. Node 0 has no memory but it is still being initialised.

Still, I looked closer at what is going on when that message gets
displayed and I see this in free_area_init_nodes()

        for_each_online_node(nid) {
                pg_data_t *pgdat = NODE_DATA(nid);
                free_area_init_node(nid, pgdat, NULL,
                                find_min_pfn_for_node(nid), NULL);

                /* Any memory on that node */
                if (pgdat->node_present_pages)
                        node_set_state(nid, N_HIGH_MEMORY);
                check_for_regular_memory(pgdat);
        }

This "Any memory on that node" thing is new and it says if there is any
memory on the node, set N_HIGH_MEMORY. Fine I guess, I haven't tracked these
changes closely. It calls check_for_regular_memory() which looks like

static void check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
        enum zone_type zone_type;

        for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
                struct zone *zone = &pgdat->node_zones[zone_type];
                if (zone->present_pages)
                        node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
        }
#endif
}

i.e. go through the other zones and if any of them have memory, set
N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
POWER.... That sounds bad.

mel@arnold:~/git/linux-2.6/mm$ grep -n N_NORMAL_MEMORY slab.c 
1593:           for_each_node_state(nid, N_NORMAL_MEMORY) {
1971:   for_each_node_state(node, N_NORMAL_MEMORY) {
2102:                   for_each_node_state(node, N_NORMAL_MEMORY) {
3818:   for_each_node_state(node, N_NORMAL_MEMORY) {

and one of them is in kmem_cache_init(). That seems very significant.
Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
being set would cause trouble for slab?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 21:30             ` Mel Gorman
@ 2008-01-18 21:43               ` Christoph Lameter
  2008-01-18 22:16               ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 21:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linuxppc-dev, Olaf Hering, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Mel Gorman wrote:

> static void check_for_regular_memory(pg_data_t *pgdat)
> {
> #ifdef CONFIG_HIGHMEM
>         enum zone_type zone_type;
> 
>         for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
>                 struct zone *zone = &pgdat->node_zones[zone_type];
>                 if (zone->present_pages)
>                         node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
>         }
> #endif
> }
> 
> i.e. go through the other zones and if any of them have memory, set
> N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
> PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
> POWER.... That sounds bad.

Argh. We may need to do a

node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY) in the !HIGHMEM case.

> and one of them is in kmem_cache_init(). That seems very significant.
> Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
> being set would cause trouble for slab?

Yes. That results in the per node structures not being created and thus l3 
== NULL. Explains our failures.
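
A minimal sketch of that failure mode, assuming the per-node structures are set up by a loop that visits only the nodes present in the N_NORMAL_MEMORY mask. All model_ names are hypothetical and a plain bitmask stands in for the kernel nodemask:

```c
#include <stddef.h>

#define MODEL_NODES 4

struct model_list3 {
	int initialised;
};

struct model_cache {
	struct model_list3 *nodelists[MODEL_NODES];
};

/*
 * Hypothetical model: create per-node structures only for nodes whose
 * bit is set in normal_mask, the way a loop written with
 * for_each_node_state(node, N_NORMAL_MEMORY) visits only those nodes.
 */
static void model_setup_nodelists(struct model_cache *c,
				  unsigned long normal_mask,
				  struct model_list3 *storage)
{
	int node;

	for (node = 0; node < MODEL_NODES; node++) {
		c->nodelists[node] = NULL;
		if (normal_mask & (1UL << node)) {
			storage[node].initialised = 1;
			c->nodelists[node] = &storage[node];
		}
	}
}

/* Returns 1 if a cache_grow()-style lookup would see l3 == NULL. */
static int model_l3_is_null(const struct model_cache *c, int nodeid)
{
	return c->nodelists[nodeid] == NULL;
}
```

With an empty mask every slot stays NULL, so any later lookup for a node that does have memory still hands back a NULL l3.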

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 21:30             ` Mel Gorman
  2008-01-18 21:43               ` Christoph Lameter
@ 2008-01-18 22:16               ` Christoph Lameter
  2008-01-18 22:19                 ` Nish Aravamudan
                                   ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 22:16 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	KAMEZAWA Hiroyuki

Could you try this patch?

Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

It seems that we scan through zones to set N_NORMAL_MEMORY only if
CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
in the !CONFIG_HIGHMEM case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2008-01-18 14:08:41.000000000 -0800
+++ linux-2.6/mm/page_alloc.c	2008-01-18 14:13:34.000000000 -0800
@@ -3812,7 +3812,6 @@ restart:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -3820,7 +3819,6 @@ static void check_for_regular_memory(pg_
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 22:16               ` Christoph Lameter
@ 2008-01-18 22:19                 ` Nish Aravamudan
  2008-01-18 22:38                 ` Christoph Lameter
  2008-01-18 22:57                 ` Olaf Hering
  2 siblings, 0 replies; 61+ messages in thread
From: Nish Aravamudan @ 2008-01-18 22:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	KAMEZAWA Hiroyuki

On 1/18/08, Christoph Lameter <clameter@sgi.com> wrote:
> Could you try this patch?
>
> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support
> HIGHMEM
>
> It seems that we scan through zones to set N_NORMAL_MEMORY only if
> CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
> in the !CONFIG_HIGHMEM case.

I'm testing this exact patch right now on the machine Mel saw the issues with.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 22:16               ` Christoph Lameter
  2008-01-18 22:19                 ` Nish Aravamudan
@ 2008-01-18 22:38                 ` Christoph Lameter
  2008-01-18 22:57                 ` Olaf Hering
  2 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 22:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	KAMEZAWA Hiroyuki

On Fri, 18 Jan 2008, Christoph Lameter wrote:

> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

If !CONFIG_HIGHMEM then

enum node_states {
#ifdef CONFIG_HIGHMEM
        N_HIGH_MEMORY,          /* The node has regular or high memory */
#else
        N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif

So
        for_each_online_node(nid) {
                pg_data_t *pgdat = NODE_DATA(nid);
                free_area_init_node(nid, pgdat, NULL,
                                find_min_pfn_for_node(nid), NULL);

                /* Any memory on that node */
                if (pgdat->node_present_pages)
                        node_set_state(nid, N_HIGH_MEMORY);
                        ^^^ sets N_NORMAL_MEMORY
                check_for_regular_memory(pgdat);
        }
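
To make the aliasing concrete, here is a small user-space sketch (not kernel
code). It assumes a simplified one-word bitmask per state; state_mask and the
two helpers are illustrative stand-ins for the kernel's nodemask machinery.

```c
#include <assert.h>

/* Toy model of enum node_states when CONFIG_HIGHMEM is not defined. */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,	/* alias, not a separate state */
#endif
	N_CPU,
	NR_NODE_STATES
};

/* Hypothetical storage: one bitmask per state, one bit per node id. */
static unsigned long state_mask[NR_NODE_STATES];

static void node_set_state(int nid, enum node_states state)
{
	state_mask[state] |= 1UL << nid;
}

static int node_state(int nid, enum node_states state)
{
	return (int)((state_mask[state] >> nid) & 1UL);
}
```

With this model, node_set_state(2, N_HIGH_MEMORY) also makes
node_state(2, N_NORMAL_MEMORY) true, because both names index the same mask.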

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 22:16               ` Christoph Lameter
  2008-01-18 22:19                 ` Nish Aravamudan
  2008-01-18 22:38                 ` Christoph Lameter
@ 2008-01-18 22:57                 ` Olaf Hering
  2008-01-22 19:54                   ` Mel Gorman
  2 siblings, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-18 22:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	KAMEZAWA Hiroyuki

On Fri, Jan 18, Christoph Lameter wrote:

> Could you try this patch?

Does not help, same crash.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-18 22:57                 ` Olaf Hering
@ 2008-01-22 19:54                   ` Mel Gorman
  2008-01-22 20:11                     ` Christoph Lameter
  2008-01-22 21:45                     ` Olaf Hering
  0 siblings, 2 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-22 19:54 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
> 
> > Could you try this patch?
> 
> Does not help, same crash.
> 

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem
that was fixed for 2.6.24 by reverting a change. This fix can be found at
http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
. Can you please check on your machine if it fixes your problem?

I am 99.9999% sure it will *not* fix your problem because there were two
bugs, not one as previously believed. On two test machines here, this
kmem_cache_init problem still happens even with the revert that fixed a third
machine. I was delayed in testing because these boxen were unavailable from
Friday until yesterday evening (a stellar display of timing). It was missed on
TKO because it was SLAB-specific and those machines were testing SLUB. I found
that the patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the
revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something
more fundamental may be going wrong. The crash occurs because l3 is NULL
and the cache is kmem_cache, so this is early in the boot process. It is
selecting l3 based on node 2, which is correct in terms of available memory,
but it initialises the lists on node 0 because that is the node the CPUs are
located on. Hence it later uses an uninitialised nodelists entry and BLAM. The
relevant parts of the log, showing the memoryless nodes in relation to CPUs, are:

early_node_map[1] active PFN ranges
    2:        0 ->  1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?

====
Recent changes to how slab operates mean a situation can occur on systems
with memoryless nodes whereby the nodeid used when growing the slab does
not map to an initialised kmem_list3. The following patch checks the
indicated preferred nodeid and, if it has no kmem_list3, uses numa_node_id()
instead.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>

--- 
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);

 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;

 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);

 retry:

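For anyone reading along without a tree handy, the fallback added in both
hunks can be modelled in user space roughly as follows. This is only a sketch:
the structures are cut down to the one field that matters, and current_node
and pick_l3() are hypothetical stand-ins (for numa_node_id() and for the
inline lookup) that do not exist in mm/slab.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 4

/* Cut-down stand-ins for the real slab structures. */
struct kmem_list3 { int node; };
struct kmem_cache { struct kmem_list3 *nodelists[MAX_NUMNODES]; };

/* Hypothetical stand-in for numa_node_id(): the node of the running CPU. */
static int current_node = 0;

/*
 * Look up the per-node lists for *nodeid. If that node has none yet,
 * retry with the running CPU's node and report the node actually used.
 * May still return NULL, which the patch catches with BUG_ON(!l3).
 */
static struct kmem_list3 *pick_l3(struct kmem_cache *cachep, int *nodeid)
{
	struct kmem_list3 *l3 = cachep->nodelists[*nodeid];

	if (!l3) {
		*nodeid = current_node;
		l3 = cachep->nodelists[*nodeid];
	}
	return l3;
}
```

On the failing machine the request arrives with nodeid 2 while only
nodelists[0] is populated, so the lookup lands on node 0 instead of
dereferencing NULL.
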
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 19:54                   ` Mel Gorman
@ 2008-01-22 20:11                     ` Christoph Lameter
  2008-01-22 21:26                       ` Mel Gorman
  2008-01-22 21:45                     ` Olaf Hering
  1 sibling, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 20:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> Christoph/Pekka, this patch is papering over the problem and something
> more fundamental may be going wrong. The crash occurs because l3 is NULL
> and the cache is kmem_cache, so this is early in the boot process. It is
> selecting l3 based on node 2, which is correct in terms of available memory,
> but it initialises the lists on node 0 because that is the node the CPUs are
> located on. Hence it later uses an uninitialised nodelists entry and BLAM. The
> relevant parts of the log, showing the memoryless nodes in relation to CPUs, are:

Would it be possible to run the bootstrap on a cpu that has a 
node with memory associated to it? I believe we had the same situation 
last year when GFP_THISNODE was introduced?

After you reverted the slab memoryless node patch there should be per node 
structures created for node 0 unless the node is marked offline. Is it? If 
so then you are booting a cpu that is associated with an offline node. 

> Can you see a better solution than this?

Well this means that bootstrap will work by introducing foreign objects 
into the per cpu queue (should only hold per cpu objects). They will 
later be consumed and then the queues will contain the right objects so 
the effect of the patch is minimal.

I thought we fixed the similar situation last year by dropping 
GFP_THISNODE for some allocations?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 20:11                     ` Christoph Lameter
@ 2008-01-22 21:26                       ` Mel Gorman
  2008-01-22 21:34                         ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-22 21:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2 which is correct in terms of available memory
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located. Hence later it uses an uninitialised nodelists and BLAM. Relevant
> > parts of the log for seeing the memoryless nodes in relation to CPUs is;
> 
> Would it be possible to run the bootstrap on a cpu that has a 
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to
be on a node with no memory. It's best to assume I cannot get the machine
reconfigured (which just hides the bug anyway). Physically, it's thousands
of miles away so I can't do the work. I can get lab support to do the job
but that will take a fair while and at the end of the day, it doesn't tell
us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation 
> last year when GFP_THISNODE was introduced?
> 

It feels vaguely familiar but I don't recall it in sufficient detail to
recognise whether this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node 
> structures created for node 0 unless the node is marked offline. Is it? If 
> so then you are booting a cpu that is associated with an offline node. 
> 

I'll roll a patch that prints out the online states before startup and
see what it looks like.

> > Can you see a better solution than this?
> 
> Well this means that bootstrap will work by introducing foreign objects 
> into the per cpu queue (should only hold per cpu objects). They will 
> later be consumed and then the queues will contain the right objects so 
> the effect of the patch is minimal.
> 

By minimal, do you mean that you expect it to break in some other
respect later, or minimal as in "this is bad but should not have any
adverse impact"?

> I thought we fixed the similar situation last year by dropping 
> GFP_THISNODE for some allocations?
> 

Whether this was a problem fixed in the past or not, it's broken again now
:( . It's possible that there is a __GFP_THISNODE that can be dropped early
at boot-time that would also fix this problem in a way that doesn't
affect runtime (like altering cache_grow in my patch does).

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 21:26                       ` Mel Gorman
@ 2008-01-22 21:34                         ` Christoph Lameter
  2008-01-22 22:50                           ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 21:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> > After you reverted the slab memoryless node patch there should be per node 
> > structures created for node 0 unless the node is marked offline. Is it? If 
> > so then you are booting a cpu that is associated with an offline node. 
> > 
> 
> I'll roll a patch that prints out the online states before startup and
> see what it looks like.

Ok. Great.

> 
> > > Can you see a better solution than this?
> > 
> > Well this means that bootstrap will work by introducing foreign objects 
> > into the per cpu queue (should only hold per cpu objects). They will 
> > later be consumed and then the queues will contain the right objects so 
> > the effect of the patch is minimal.
> > 
> 
> By minimal, do you mean that you expect it to break in some other
> respect later, or minimal as in "this is bad but should not have any
> adverse impact"?

Should not have any adverse impact after the objects from the cpu queue 
have been consumed. If the cache_reaper tries to shift objects back 
from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure 
you run the tests with full debugging please.

> Whether this was a problem fixed in the past or not, it's broken again now
> :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> at boot-time that would also fix this problem in a way that doesn't
> affect runtime (like altering cache_grow in my patch does).

The dropping of GFP_THISNODE has the same effect as your patch. 
Objects from another node get into the per cpu queue. And on free we 
assume that per cpu queue objects are from the local node. If debug is on 
then we check that with BUG_ONs.
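
A rough user-space model of that invariant, in case it helps. All names here
(struct object, home_node, count_foreign, ac_put, ac_get) are made up for
illustration, and the real array_cache has more fields than this.

```c
#include <assert.h>
#include <stddef.h>

#define AC_LIMIT 8

/* Hypothetical object that remembers which node its slab page came from. */
struct object { int home_node; };

/* Cut-down per cpu queue: a LIFO of object pointers owned by one node. */
struct array_cache {
	int node;			/* node of the CPU owning this queue */
	unsigned int avail;		/* number of queued objects */
	struct object *entry[AC_LIMIT];
};

/* Count queued objects that violate the "local node only" assumption. */
static unsigned int count_foreign(const struct array_cache *ac)
{
	unsigned int i, foreign = 0;

	for (i = 0; i < ac->avail; i++)
		if (ac->entry[i]->home_node != ac->node)
			foreign++;
	return foreign;
}

/* Queue an object without checking its node, as the early fallback would. */
static int ac_put(struct array_cache *ac, struct object *obj)
{
	if (ac->avail >= AC_LIMIT)
		return -1;
	ac->entry[ac->avail++] = obj;
	return 0;
}

/* Hand out the most recently queued object, or NULL if the queue is empty. */
static struct object *ac_get(struct array_cache *ac)
{
	return ac->avail ? ac->entry[--ac->avail] : NULL;
}
```

A node 2 object queued on a node 0 CPU makes count_foreign() return 1. Once
ac_get() has handed that object out, the count drops back to 0, which is the
"consumed later" case. A debug check of the BUG_ON kind corresponds to
asserting that count_foreign() is 0 before objects are moved back into slabs.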

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 21:34                         ` Christoph Lameter
@ 2008-01-22 22:50                           ` Mel Gorman
  2008-01-22 22:57                             ` Christoph Lameter
  2008-01-22 22:59                             ` Pekka Enberg
  0 siblings, 2 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-22 22:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

[-- Attachment #1: Type: text/plain, Size: 8266 bytes --]

On (22/01/08 13:34), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > > After you reverted the slab memoryless node patch there should be per node 
> > > structures created for node 0 unless the node is marked offline. Is it? If 
> > > so then you are booting a cpu that is associated with an offline node. 
> > > 
> > 
> > I'll roll a patch that prints out the online states before startup and
> > see what it looks like.
> 
> Ok. Great.
> 

The dmesg output is below.


> > 
> > > > Can you see a better solution than this?
> > > 
> > > Well this means that bootstrap will work by introducing foreign objects 
> > > into the per cpu queue (should only hold per cpu objects). They will 
> > > later be consumed and then the queues will contain the right objects so 
> > > the effect of the patch is minimal.
> > > 
> > 
> > By minimal, do you mean that you expect it to break in some other
> > respect later, or minimal as in "this is bad but should not have any
> > adverse impact"?
> 
> Should not have any adverse impact after the objects from the cpu queue 
> have been consumed. If the cache_reaper tries to shift objects back 
> from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure 
> you run the tests with full debugging please.
> 

I am not running a full range of tests at the moment. Just getting boot
first. I'll queue up a range of tests to run with DEBUG on now but it'll
be the morning before I have the results.

> > Whether this was a problem fixed in the past or not, it's broken again now
> > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > at boot-time that would also fix this problem in a way that doesn't
> > affect runtime (like altering cache_grow in my patch does).
> 
> The dropping of GFP_THISNODE has the same effect as your patch. 

The dropping of it totally? If so, this patch might fix a boot but it'll
potentially be a performance regression on NUMA machines that only have
nodes with memory, right?

> Objects from another node get into the per cpu queue. And on free we 
> assume that per cpu queue objects are from the local node. If debug is on 
> then we check that with BUG_ONs.
> 

The interesting parts of the dmesg output are

Online nodes
o 0
o 2
Nodes with regular memory
o 2
Current running CPU 0 is associated with node 0
Current node is 0

So, at a glance, node 2 has regular memory but it's trying to use node 0.
I've attached the patch I used against 2.6.24-rc8. It includes the revert.

Here is the full output


Please wait, loading kernel...
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02400000, size: 1192 Kbytes
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8 
memory layout at init:
  alloc_bottom : 000000000252a000
  alloc_top    : 0000000008000000
  alloc_top_hi : 0000000100000000
  rmo_top      : 0000000008000000
  ram_top      : 0000000100000000
Looking for displays
instantiating rtas at 0x00000000077d9000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000262b000 -> 0x000000000262c1d3
Device tree struct  0x000000000262d000 -> 0x0000000002635000
Calling quiesce ...
returning from prom_init
Partition configured for 4 cpus.
Starting Linux PPC64 #1 SMP Tue Jan 22 17:15:48 EST 2008
-----------------------------------------------------
ppc64_pft_size                = 0x1a
physicalMemorySize            = 0x100000000
htab_hash_mask                = 0x7ffff
-----------------------------------------------------
Linux version 2.6.24-rc8-autokern1 (root@gekko-lp3.ltc.austin.ibm.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Tue Jan 22 17:15:48 EST 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    2:        0 ->  1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 1034240
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 238.059000 MHz
time_init: processor frequency   = 1904.472000 MHz
clocksource: timebase mult[10cd746] shift[22] registered
clockevent: decrementer mult[3cf1] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 2
Memory: 4105560k/4194304k available (5004k kernel code, 88744k reserved, 876k data, 559k bss, 272k init)
Online nodes
o 0
o 2
Nodes with regular memory
o 2
Current running CPU 0 is associated with node 0
Current node is 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 1
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 2
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 3
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 4
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 5
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 6
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 7
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 8
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 9
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 10
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 11
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 12
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 13
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 14
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 15
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 16
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 17
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 18
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 19
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 20
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 21
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 22
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 23
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 24
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 25
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 26
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 27
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 28
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 29
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 30
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 31
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 32
kmem_cache_init Setting kmem_cache initkmem_list3 0
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc0000000003c8c00
cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840]
    pc: c0000000003c8c00: __lock_text_start+0x20/0x88
    lr: c0000000000dadec: .cache_grow+0x7c/0x338
    sp: c0000000005c3ac0
   msr: 8000000000009032
   dar: 40
 dsisr: 40000000
  current = 0xc000000000500f10
  paca    = 0xc000000000501b80
    pid   = 0, comm = swapper
enter ? for help
[c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338
[c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224
[c0000000005c3cb0] c0000000000db958 .kmem_cache_alloc+0xe0/0x14c
[c0000000005c3d50] c0000000000dcccc .kmem_cache_create+0x230/0x4cc
[c0000000005c3e30] c0000000004c05f4 .kmem_cache_init+0x310/0x640
[c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc
[c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0
0:mon>

[-- Attachment #2: debug-slab-with-revert.diff --]
[-- Type: text/x-diff, Size: 5708 bytes --]

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-005-debug-slab/mm/slab.c
--- linux-2.6.24-rc8-clean/mm/slab.c	2008-01-16 04:22:48.000000000 +0000
+++ linux-2.6.24-rc8-005-debug-slab/mm/slab.c	2008-01-22 21:36:50.000000000 +0000
@@ -348,6 +348,7 @@ static int slab_early_init = 1;
 
 static void kmem_list3_init(struct kmem_list3 *parent)
 {
+	printk(" o kmem_list3_init\n");
 	INIT_LIST_HEAD(&parent->slabs_full);
 	INIT_LIST_HEAD(&parent->slabs_partial);
 	INIT_LIST_HEAD(&parent->slabs_free);
@@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long 
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
 
+	printk("cpuup_prepare %ld\n", cpu);
 	list_for_each_entry(cachep, &cache_chain, next) {
 		/*
 		 * Set up the size64 kmemlist for cpu before we can
@@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long 
 		 * node has not already allocated this
 		 */
 		if (!cachep->nodelists[node]) {
+			printk(" o allocing %s %d\n", cachep->name, node);
 			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
 			if (!l3)
 				goto bad;
@@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long 
 			 * protection here.
 			 */
 			cachep->nodelists[node] = l3;
+			printk(" o l3 setup\n");
 		}
 
 		spin_lock_irq(&cachep->nodelists[node]->list_lock);
@@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long 
 	}
 	return 0;
 bad:
+	printk(" o bad\n");
 	cpuup_canceled(cpu);
 	return -ENOMEM;
 }
@@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache 
 	spin_lock_init(&ptr->list_lock);
 
 	MAKE_ALL_LISTS(cachep, ptr, nodeid);
+	printk("init_list RESETTING %s node %d\n", cachep->name, nodeid);
 	cachep->nodelists[nodeid] = ptr;
 	local_irq_enable();
 }
@@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void)
 		numa_platform = 0;
 	}
 
+	printk("Online nodes\n");
+	for_each_online_node(node)
+		printk("o %d\n", node);
+	printk("Nodes with regular memory\n");
+	for_each_node_state(node, N_NORMAL_MEMORY)
+		printk("o %d\n", node);
+	printk("Current running CPU %d is associated with node %d\n",
+		smp_processor_id(),
+		cpu_to_node(smp_processor_id()));
+	printk("Current node is %d\n",
+		numa_node_id());
+
 	for (i = 0; i < NUM_INIT_LISTS; i++) {
 		kmem_list3_init(&initkmem_list3[i]);
 		if (i < MAX_NUMNODES)
 			cache_cache.nodelists[i] = NULL;
+		printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i);
 	}
 
 	/*
@@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void)
 	cache_cache.colour_off = cache_line_size();
 	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
 	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
+	printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node);
+	printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node);
 
 	/*
 	 * struct kmem_cache size depends on nr_node_ids, which
@@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void)
 		/* Replace the static kmem_list3 structures for the boot cpu */
 		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-		for_each_node_state(nid, N_NORMAL_MEMORY) {
+		for_each_online_node(nid) {
 			init_list(malloc_sizes[INDEX_AC].cs_cachep,
 				  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,11 +1989,13 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	printk("set_up_list3s %s index %d\n", cachep->name, index);
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
 		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+		printk("set_up_list3s %s index %d\n", cachep->name, index);
 	}
 }
 
@@ -2099,11 +2122,13 @@ static int __init_refok setup_cpu_cache(
 			g_cpucache_up = PARTIAL_L3;
 		} else {
 			int node;
-			for_each_node_state(node, N_NORMAL_MEMORY) {
+			printk("setup_cpu_cache %s\n", cachep->name);
+			for_each_online_node(node) {
 				cachep->nodelists[node] =
 				    kmalloc_node(sizeof(struct kmem_list3),
 						GFP_KERNEL, node);
 				BUG_ON(!cachep->nodelists[node]);
+				printk(" o allocated node %d\n", node);
 				kmem_list3_init(cachep->nodelists[node]);
 			}
 		}
@@ -3815,8 +3840,10 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	printk("alloc_kmemlist %s\n", cachep->name);
+	for_each_online_node(node) {
 
+		printk(" o node %d\n", node);
                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);
                         if (!new_alien)
@@ -3837,6 +3864,7 @@ static int alloc_kmemlist(struct kmem_ca
 		l3 = cachep->nodelists[node];
 		if (l3) {
 			struct array_cache *shared = l3->shared;
+			printk(" o l3 exists\n");
 
 			spin_lock_irq(&l3->list_lock);
 
@@ -3856,10 +3884,12 @@ static int alloc_kmemlist(struct kmem_ca
 			free_alien_cache(new_alien);
 			continue;
 		}
+		printk(" o allocing l3\n");
 		l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node);
 		if (!l3) {
 			free_alien_cache(new_alien);
 			kfree(new_shared);
+			printk(" o allocing l3 failed\n");
 			goto fail;
 		}
 
@@ -3871,6 +3901,7 @@ static int alloc_kmemlist(struct kmem_ca
 		l3->free_limit = (1 + nr_cpus_node(node)) *
 					cachep->batchcount + cachep->num;
 		cachep->nodelists[node] = l3;
+		printk(" o setting node %d 0x%lX\n", node, (unsigned long)l3);
 	}
 	return 0;
 
@@ -3886,6 +3917,7 @@ fail:
 				free_alien_cache(l3->alien);
 				kfree(l3);
 				cachep->nodelists[node] = NULL;
+				printk(" o setting node %d FAIL NULL\n", node);
 			}
 			node--;
 		}

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 22:50                           ` Mel Gorman
@ 2008-01-22 22:57                             ` Christoph Lameter
  2008-01-22 23:10                               ` Mel Gorman
  2008-01-22 22:59                             ` Pekka Enberg
  1 sibling, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 22:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> > > Whether this was a problem fixed in the past or not, it's broken again now
> > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > at boot-time that would also fix this problem in a way that doesn't
> > > affect runtime (like altering cache_grow in my patch does).
> > 
> > The dropping of GFP_THISNODE has the same effect as your patch. 
> 
> The dropping of it totally? If so, this patch might fix a boot but it'll
> potentially be a performance regression on NUMA machines that only have
> nodes with memory, right?

No, only the dropping during early allocations.

> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
> 
> So node 2 has regular memory but it's trying to use node 0 at a glance.
> I've attached the patch I used against 2.6.24-rc8. It includes the revert.

We need the current processor to be attached to a node that has 
memory. We cannot fall back that early because the structures for the 
other nodes do not exist yet.

> Online nodes
> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>  o kmem_list3_init

This needs to be node 2.

> [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338
> [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224

Fallback during bootstrap.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 22:57                             ` Christoph Lameter
@ 2008-01-22 23:10                               ` Mel Gorman
  2008-01-22 23:14                                 ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-22 23:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On (22/01/08 14:57), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> 
> > > > Whether this was a problem fixed in the past or not, it's broken again now
> > > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > > at boot-time that would also fix this problem in a way that doesn't
> > > > affect runtime (like altering cache_grow in my patch does).
> > > 
> > > The dropping of GFP_THISNODE has the same effect as your patch. 
> > 
> > The dropping of it totally? If so, this patch might fix a boot but it'll
> > potentially be a performance regression on NUMA machines that only have
> > nodes with memory, right?
> 
> No, only the dropping during early allocations.
> 

We can live with that if the machine otherwise survives during tests.
They are kicked off at the moment with CONFIG_SLAB_DEBUG set but the point
is moot if the patch doesn't work for Olaf. Am still waiting to hear if
the two patches in combination work for him.

> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> > 
> > So node 2 has regular memory but it's trying to use node 0 at a glance.
> > I've attached the patch I used against 2.6.24-rc8. It includes the revert.
> 
> We need the current processor to be attached to a node that has 
> memory. We cannot fall back that early because the structures for the 
> other nodes do not exist yet.
> 

Or bodge it early in the boot process so that a node with memory is
always used.
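
Something along these lines is what I have in mind, as a user-space sketch
only. It assumes the set of nodes with memory fits in a single bitmask, and
bootstrap_node() is a made-up name, not an existing helper.

```c
#include <assert.h>

#define MAX_NUMNODES 8

/*
 * Pick the node whose lists are set up first during bootstrap.
 * memory_mask has bit N set if node N has regular memory.
 * Prefer the running CPU's node, else take the lowest node with memory.
 * Returns -1 if no node has memory at all.
 */
static int bootstrap_node(unsigned long memory_mask, int cpu_node)
{
	int nid;

	if (memory_mask & (1UL << cpu_node))
		return cpu_node;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (memory_mask & (1UL << nid))
			return nid;

	return -1;
}
```

For the machine in this thread (CPU on node 0, memory only on node 2) this
would pick node 2, while a machine whose CPU node has memory is unaffected.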

> > Online nodes
> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> >  o kmem_list3_init
> 
> This needs to be node 2.
> 

Rather it should be 2. I'll admit the physical setup of this machine is
.... less than ideal but clearly it's something that can happen even if
it's a bad idea.

> > [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338
> > [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224
> 
> Fallback during bootstrap.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 23:10                               ` Mel Gorman
@ 2008-01-22 23:14                                 ` Christoph Lameter
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> Rather it should be 2. I'll admit the physical setup of this machine is
> .... less than ideal but clearly it's something that can happen even if
> it's a bad idea.

Ok. Let's hope that Pekka's find does the trick. But this would mean that 
fallback gets memory from node 2 for the page allocator. Then fallback 
alloc is going to try to insert it into the l3 of node 2 which is not 
there yet. So another oops. Sigh.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 22:50                           ` Mel Gorman
  2008-01-22 22:57                             ` Christoph Lameter
@ 2008-01-22 22:59                             ` Pekka Enberg
  2008-01-22 23:12                               ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2008-01-22 22:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

Mel Gorman wrote:
> Faulting instruction address: 0xc0000000003c8c00
> cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840]
>     pc: c0000000003c8c00: __lock_text_start+0x20/0x88
>     lr: c0000000000dadec: .cache_grow+0x7c/0x338
>     sp: c0000000005c3ac0
>    msr: 8000000000009032
>    dar: 40
>  dsisr: 40000000
>   current = 0xc000000000500f10
>   paca    = 0xc000000000501b80
>     pid   = 0, comm = swapper
> enter ? for help
> [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338
> [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224
> [c0000000005c3cb0] c0000000000db958 .kmem_cache_alloc+0xe0/0x14c
> [c0000000005c3d50] c0000000000dcccc .kmem_cache_create+0x230/0x4cc
> [c0000000005c3e30] c0000000004c05f4 .kmem_cache_init+0x310/0x640
> [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc
> [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0
> 0:mon>

I mentioned this already but received no response (maybe I am missing 
something totally obvious here):

When we call fallback_alloc() because the current node has ->nodelists 
set to NULL, we end up calling kmem_getpages() with -1 as the node id 
which is then translated to numa_node_id() by alloc_pages_node. But the 
reason we called fallback_alloc() in the first place is because 
numa_node_id() doesn't have a ->nodelist which makes cache_grow() oops.
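
The chain described above can be modelled in user space. This is only a
sketch of the reasoning, not kernel code: struct list3, struct cache,
NODE_ANY, current_node, resolve_node() and grow_on() are hypothetical
stand-ins for kmem_list3, kmem_cache, the -1 node id, numa_node_id(),
the translation done by alloc_pages_node() and cache_grow(). Where the
real code would dereference a NULL l3 and oops, the model returns -1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NODE_ANY  (-1)   /* stand-in for the "-1" node id in the mail */

struct list3 { int slabs; };            /* toy kmem_list3 */

struct cache {
    struct list3 *nodelists[MAX_NODES]; /* NULL until bootstrap sets it up */
};

/* Toy numa_node_id(): the node the boot CPU sits on. */
static int current_node = 0;

/* Toy version of the translation: -1 becomes the current node. */
static int resolve_node(int nodeid)
{
    return nodeid == NODE_ANY ? current_node : nodeid;
}

/*
 * Toy cache_grow(): returns 0 on success and -1 in the case where the
 * real code would take the lock of a NULL l3.
 */
static int grow_on(struct cache *c, int nodeid)
{
    int node = resolve_node(nodeid);

    assert(node >= 0 && node < MAX_NODES);

    struct list3 *l3 = c->nodelists[node];
    if (l3 == NULL)
        return -1;
    l3->slabs++;
    return 0;
}
```

With only node 1 populated and the CPU on node 0, grow_on(&c, NODE_ANY)
lands back on node 0 and fails, which is the loop described above.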

			Pekka

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 22:59                             ` Pekka Enberg
@ 2008-01-22 23:12                               ` Christoph Lameter
  2008-01-22 23:18                                 ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka Enberg wrote:

> When we call fallback_alloc() because the current node has ->nodelists set to
> NULL, we end up calling kmem_getpages() with -1 as the node id which is then
> translated to numa_node_id() by alloc_pages_node. But the reason we called
> fallback_alloc() in the first place is because numa_node_id() doesn't have a
> ->nodelist which makes cache_grow() oops.

Right, if nodeid == -1 then we need to call alloc_pages... 
Essentially a revert of 50c85a19e7b3928b5b5188524c44ffcbacdd4e35 from 2005.

But I doubt that this is it. The fallback logic was added later and it 
worked fine.


---
 mm/slab.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-22 15:05:26.185452369 -0800
+++ linux-2.6/mm/slab.c	2008-01-22 15:05:59.301637009 -0800
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 23:12                               ` Christoph Lameter
@ 2008-01-22 23:18                                 ` Christoph Lameter
  2008-01-23  8:19                                   ` Pekka Enberg
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:18 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Christoph Lameter wrote:

> But I doubt that this is it. The fallback logic was added later and it 
> worked fine.

My patch is useless (fascinating history of the changelog there though). 
fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that 
alloc_pages_node() will try to allocate on the current node but fall back 
to a neighboring node if nothing is there....
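
A rough user-space model of that difference, as a sketch only:
free_pages[], alloc_page_on() and the this_node_only flag are
hypothetical, and the round-robin search order is a simplification (the
real allocator walks a zonelist ordered by distance).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Toy per-node free page counters; all zero until a caller fills them. */
static int free_pages[MAX_NODES];

/*
 * Try the preferred node first. Without this_node_only, move on to the
 * other nodes; with it, give up as soon as the preferred node is empty.
 * Returns the node the page came from, or -1 if nothing was found.
 */
static int alloc_page_on(int preferred, bool this_node_only)
{
    assert(preferred >= 0 && preferred < MAX_NODES);

    for (int i = 0; i < MAX_NODES; i++) {
        int node = (preferred + i) % MAX_NODES;

        if (free_pages[node] > 0) {
            free_pages[node]--;
            return node;
        }
        if (this_node_only)
            break;
    }
    return -1;
}
```

With node 0 empty and node 1 holding pages, a request for node 0
succeeds on node 1 unless this_node_only is set.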

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 23:18                                 ` Christoph Lameter
@ 2008-01-23  8:19                                   ` Pekka Enberg
  2008-01-23  8:40                                     ` Olaf Hering
  0 siblings, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2008-01-23  8:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

Hi Christoph,

On Jan 23, 2008 1:18 AM, Christoph Lameter <clameter@sgi.com> wrote:
> My patch is useless (fascinating history of the changelog there though).
> fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> alloc_pages_node() will try to allocate on the current node but fall back
> to a neighboring node if nothing is there....

Sure, but I was referring to the scenario where current node _has_
pages available but no ->nodelists. Olaf, did you try it?

                        Pekka

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-23  8:19                                   ` Pekka Enberg
@ 2008-01-23  8:40                                     ` Olaf Hering
  0 siblings, 0 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-23  8:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, Jan 23, Pekka Enberg wrote:

> Hi Christoph,
> 
> On Jan 23, 2008 1:18 AM, Christoph Lameter <clameter@sgi.com> wrote:
> > My patch is useless (fascinating history of the changelog there though).
> > fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> > alloc_pages_node() will try to allocate on the current node but fall back
> > to a neighboring node if nothing is there....
> 
> Sure, but I was referring to the scenario where current node _has_
> pages available but no ->nodelists. Olaf, did you try it?

Does not help.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 19:54                   ` Mel Gorman
  2008-01-22 20:11                     ` Christoph Lameter
@ 2008-01-22 21:45                     ` Olaf Hering
  2008-01-22 22:12                       ` Nish Aravamudan
  2008-01-22 22:23                       ` Christoph Lameter
  1 sibling, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-22 21:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Tue, Jan 22, Mel Gorman wrote:

> http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> .. Can you please check on your machine if it fixes your problem?

It does not fix or change the nature of the crash.

> Olaf, please confirm whether you need the patch below as well as the
> revert to make your machine boot.

It crashes now in a different way if the patch below is applied:

Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency   = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x00000058
Faulting instruction address: 0xc0000000000fe018
cpu 0x0: Vector: 300 (Data Access) at [c00000000075bac0]
    pc: c0000000000fe018: .setup_cpu_cache+0x184/0x1f4
    lr: c0000000000fdfa8: .setup_cpu_cache+0x114/0x1f4
    sp: c00000000075bd40
   msr: 8000000000009032
   dar: 58
 dsisr: 42000000
  current = 0xc000000000665a50
  paca    = 0xc000000000666380
    pid   = 0, comm = swapper
enter ? for help
[c00000000075bd40] c0000000000fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
[c00000000075be20] c0000000005e6780 .kmem_cache_init+0x284/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon> 

0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
2106                                    BUG_ON(!cachep->nodelists[node]);
2107                                    kmem_list3_init(cachep->nodelists[node]);
2108                            }
2109                    }
2110            }
2111            cachep->nodelists[numa_node_id()]->next_reap =
2112                            jiffies + REAPTIMEOUT_LIST3 +
2113                            ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
2114
2115            cpu_cache_get(cachep)->avail = 0;

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 21:45                     ` Olaf Hering
@ 2008-01-22 22:12                       ` Nish Aravamudan
  2008-01-22 22:23                       ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Nish Aravamudan @ 2008-01-22 22:12 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki, Christoph Lameter

On 1/22/08, Olaf Hering <olaf@aepfle.de> wrote:
> On Tue, Jan 22, Mel Gorman wrote:
>
> > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> > .. Can you please check on your machine if it fixes your problem?
>
> It does not fix or change the nature of the crash.
>
> > Olaf, please confirm whether you need the patch below as well as the
> > revert to make your machine boot.
>
> It crashes now in a different way if the patch below is applied:

Was this with the revert Mel mentioned applied as well? I get the
feeling both patches are needed to fix up the memoryless SLAB issue.

> Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008

<snip>

> early_node_map[1] active PFN ranges
>     1:        0 ->   892928

<snip>

> Unable to handle kernel paging request for data at address 0x00000058
> Faulting instruction address: 0xc0000000000fe018
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075bac0]
>     pc: c0000000000fe018: .setup_cpu_cache+0x184/0x1f4
>     lr: c0000000000fdfa8: .setup_cpu_cache+0x114/0x1f4
>     sp: c00000000075bd40
>    msr: 8000000000009032
>    dar: 58
>  dsisr: 42000000
>   current = 0xc000000000665a50
>   paca    = 0xc000000000666380
>     pid   = 0, comm = swapper
> enter ? for help
> [c00000000075bd40] c0000000000fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
> [c00000000075be20] c0000000005e6780 .kmem_cache_init+0x284/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
> 0:mon>
>
> 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                                    BUG_ON(!cachep->nodelists[node]);
> 2107                                    kmem_list3_init(cachep->nodelists[node]);

I might be barking up the wrong tree, but this block above is supposed
to set up the cachep->nodelists[*] that are used immediately below.
But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or
whatever, you might get a bad access right below for node 0 that has
no memory, if that's the node we're running on...
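
To make that concrete, here is a small user-space sketch. The names
(struct list3, init_lists, set_up_lists(), node_usable()) and the
bitmask representation of the node sets are assumptions for
illustration, not the kernel's API. It uses the layout from Olaf's log:
nodes 0 and 1 online, only node 1 with memory, boot CPU on node 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

struct list3 { unsigned long next_reap; };   /* toy kmem_list3 */

/* Static bootstrap lists, one per possible node. */
static struct list3 init_lists[MAX_NODES];

/* Populate nodelists[] for every node whose bit is set in mask. */
static void set_up_lists(struct list3 *nodelists[MAX_NODES], unsigned mask)
{
    for (int node = 0; node < MAX_NODES; node++)
        nodelists[node] = (mask & (1u << node)) ? &init_lists[node] : NULL;
}

/* True if a later nodelists[node]->... access would be safe. */
static bool node_usable(struct list3 *nodelists[MAX_NODES], int node)
{
    assert(node >= 0 && node < MAX_NODES);
    return nodelists[node] != NULL;
}
```

Iterating over the "has memory" mask (0x2) leaves the entry for node 0
NULL, while iterating over the "online" mask (0x3) fills it in.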

> 2108                            }
> 2109                    }
> 2110            }
> 2111            cachep->nodelists[numa_node_id()]->next_reap =
> 2112                            jiffies + REAPTIMEOUT_LIST3 +
> 2113                            ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115            cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 21:45                     ` Olaf Hering
  2008-01-22 22:12                       ` Nish Aravamudan
@ 2008-01-22 22:23                       ` Christoph Lameter
  2008-01-23  7:58                         ` Olaf Hering
  1 sibling, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 22:23 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Olaf Hering wrote:

> It crashes now in a different way if the patch below is applied:

Yup, no l3 structure for the current node. We are early in bootstrap. You 
could just check if the l3 is there and if not just skip starting the 
reaper? This will be redone later anyway. Not sure if this will solve all 
your issues though. An l3 for the current node that we are booting on 
needs to be created early on for SLAB bootstrap to succeed. AFAICT SLUB 
doesn't care and simply uses whatever the page allocator gives it for the 
cpu slab. We may have gotten there because you only tested with SLUB 
recently and thus changes got in that broke SLAB boot assumptions.

> 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                                    BUG_ON(!cachep->nodelists[node]);
> 2107                                    kmem_list3_init(cachep->nodelists[node]);
> 2108                            }
> 2109                    }
> 2110            }

if (cachep->nodelists[numa_node_id()])
	return;

> 2111            cachep->nodelists[numa_node_id()]->next_reap =
> 2112                            jiffies + REAPTIMEOUT_LIST3 +
> 2113                            ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115            cpu_cache_get(cachep)->avail = 0;
> 
> 
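
Note that the two-line check as written above returns when the l3 is
present; the description ("if not just skip") suggests the negated test
was meant, and the patch Olaf posts in his follow-up uses the negated
form. A minimal user-space sketch of that guard, with hypothetical names
(arm_reaper(), REAP_TIMEOUT, struct list3, struct cache) and with
"skip and report 0" as an assumed choice of return value:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define REAP_TIMEOUT 400UL   /* hypothetical stand-in for REAPTIMEOUT_LIST3 */

struct list3 { unsigned long next_reap; };   /* toy kmem_list3 */

struct cache { struct list3 *nodelists[MAX_NODES]; };

/*
 * Returns 1 if the reap timer was armed, 0 if it was skipped because
 * the node has no list yet.
 */
static int arm_reaper(struct cache *c, int node, unsigned long now)
{
    assert(node >= 0 && node < MAX_NODES);

    struct list3 *l3 = c->nodelists[node];
    if (!l3)                 /* note the negation */
        return 0;
    l3->next_reap = now + REAP_TIMEOUT;
    return 1;
}
```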

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-22 22:23                       ` Christoph Lameter
@ 2008-01-23  7:58                         ` Olaf Hering
  2008-01-23 10:50                           ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-23  7:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Tue, Jan 22, Christoph Lameter wrote:

> > 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > 2106                                    BUG_ON(!cachep->nodelists[node]);
> > 2107                                    kmem_list3_init(cachep->nodelists[node]);
> > 2108                            }
> > 2109                    }
> > 2110            }
> 
> if (cachep->nodelists[numa_node_id()])
> 	return;

Does not help.


Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #48 SMP Wed Jan 23 08:54:23 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1  
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency   = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32(DMA)'

Rebooting in 1 seconds..

---
 mm/slab.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 		/* Replace the static kmem_list3 structures for the boot cpu */
 		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-		for_each_node_state(nid, N_NORMAL_MEMORY) {
+		for_each_online_node(nid) {
 			init_list(malloc_sizes[INDEX_AC].cs_cachep,
 				  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2108,6 +2108,8 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
+	if (!cachep->nodelists[numa_node_id()])
+		return -ENODEV;
 	cachep->nodelists[numa_node_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -2775,6 +2777,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3324,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3826,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 
                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-23  7:58                         ` Olaf Hering
@ 2008-01-23 10:50                           ` Mel Gorman
  2008-01-23 12:14                             ` Olaf Hering
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 10:50 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On (23/01/08 08:58), Olaf Hering didst pronounce:
> On Tue, Jan 22, Christoph Lameter wrote:
> 
> > > 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > > 2106                                    BUG_ON(!cachep->nodelists[node]);
> > > 2107                                    kmem_list3_init(cachep->nodelists[node]);
> > > 2108                            }
> > > 2109                    }
> > > 2110            }
> > 
> > if (cachep->nodelists[numa_node_id()])
> > 	return;
> 
> Does not help.
> 

Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
following patch against 2.6.24-rc8 please? It contains the debug information
that helped me figure out what was going wrong on the PPC64 machine here,
the revert and the !l3 checks (i.e. the two patches that made machines I
have access to work). Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-015_debug_slab/mm/slab.c
--- linux-2.6.24-rc8-clean/mm/slab.c	2008-01-16 04:22:48.000000000 +0000
+++ linux-2.6.24-rc8-015_debug_slab/mm/slab.c	2008-01-23 10:44:36.000000000 +0000
@@ -348,6 +348,7 @@ static int slab_early_init = 1;
 
 static void kmem_list3_init(struct kmem_list3 *parent)
 {
+	printk(" o kmem_list3_init\n");
 	INIT_LIST_HEAD(&parent->slabs_full);
 	INIT_LIST_HEAD(&parent->slabs_partial);
 	INIT_LIST_HEAD(&parent->slabs_free);
@@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long 
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
 
+	printk("cpuup_prepare %ld\n", cpu);
 	list_for_each_entry(cachep, &cache_chain, next) {
 		/*
 		 * Set up the size64 kmemlist for cpu before we can
@@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long 
 		 * node has not already allocated this
 		 */
 		if (!cachep->nodelists[node]) {
+			printk(" o allocing %s %d\n", cachep->name, node);
 			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
 			if (!l3)
 				goto bad;
@@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long 
 			 * protection here.
 			 */
 			cachep->nodelists[node] = l3;
+			printk(" o l3 setup\n");
 		}
 
 		spin_lock_irq(&cachep->nodelists[node]->list_lock);
@@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long 
 	}
 	return 0;
 bad:
+	printk(" o bad\n");
 	cpuup_canceled(cpu);
 	return -ENOMEM;
 }
@@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache 
 	spin_lock_init(&ptr->list_lock);
 
 	MAKE_ALL_LISTS(cachep, ptr, nodeid);
+	printk("init_list RESETTING %s node %d\n", cachep->name, nodeid);
 	cachep->nodelists[nodeid] = ptr;
 	local_irq_enable();
 }
@@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void)
 		numa_platform = 0;
 	}
 
+	printk("Online nodes\n");
+	for_each_online_node(node)
+		printk("o %d\n", node);
+	printk("Nodes with regular memory\n");
+	for_each_node_state(node, N_NORMAL_MEMORY)
+		printk("o %d\n", node);
+	printk("Current running CPU %d is associated with node %d\n",
+		smp_processor_id(),
+		cpu_to_node(smp_processor_id()));
+	printk("Current node is %d\n",
+		numa_node_id());
+
 	for (i = 0; i < NUM_INIT_LISTS; i++) {
 		kmem_list3_init(&initkmem_list3[i]);
 		if (i < MAX_NUMNODES)
 			cache_cache.nodelists[i] = NULL;
+		printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i);
 	}
 
 	/*
@@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void)
 	cache_cache.colour_off = cache_line_size();
 	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
 	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
+	printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node);
+	printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node);
 
 	/*
 	 * struct kmem_cache size depends on nr_node_ids, which
@@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void)
 		/* Replace the static kmem_list3 structures for the boot cpu */
 		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-		for_each_node_state(nid, N_NORMAL_MEMORY) {
+		for_each_online_node(nid) {
 			init_list(malloc_sizes[INDEX_AC].cs_cachep,
 				  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,11 +1989,13 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	printk("set_up_list3s %s index %d\n", cachep->name, index);
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
 		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+		printk("set_up_list3s %s index %d\n", cachep->name, index);
 	}
 }
 
@@ -2099,11 +2122,13 @@ static int __init_refok setup_cpu_cache(
 			g_cpucache_up = PARTIAL_L3;
 		} else {
 			int node;
-			for_each_node_state(node, N_NORMAL_MEMORY) {
+			printk("setup_cpu_cache %s\n", cachep->name);
+			for_each_online_node(node) {
 				cachep->nodelists[node] =
 				    kmalloc_node(sizeof(struct kmem_list3),
 						GFP_KERNEL, node);
 				BUG_ON(!cachep->nodelists[node]);
+				printk(" o allocated node %d\n", node);
 				kmem_list3_init(cachep->nodelists[node]);
 			}
 		}
@@ -2775,6 +2800,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3347,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,8 +3849,10 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	printk("alloc_kmemlist %s\n", cachep->name);
+	for_each_online_node(node) {
 
+		printk(" o node %d\n", node);
                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);
                         if (!new_alien)
@@ -3837,6 +3873,7 @@ static int alloc_kmemlist(struct kmem_ca
 		l3 = cachep->nodelists[node];
 		if (l3) {
 			struct array_cache *shared = l3->shared;
+			printk(" o l3 exists\n");
 
 			spin_lock_irq(&l3->list_lock);
 
@@ -3856,10 +3893,12 @@ static int alloc_kmemlist(struct kmem_ca
 			free_alien_cache(new_alien);
 			continue;
 		}
+		printk(" o allocing l3\n");
 		l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node);
 		if (!l3) {
 			free_alien_cache(new_alien);
 			kfree(new_shared);
+			printk(" o allocing l3 failed\n");
 			goto fail;
 		}
 
@@ -3871,6 +3910,7 @@ static int alloc_kmemlist(struct kmem_ca
 		l3->free_limit = (1 + nr_cpus_node(node)) *
 					cachep->batchcount + cachep->num;
 		cachep->nodelists[node] = l3;
+		printk(" o setting node %d 0x%lX\n", node, (unsigned long)l3);
 	}
 	return 0;
 
@@ -3886,6 +3926,7 @@ fail:
 				free_alien_cache(l3->alien);
 				kfree(l3);
 				cachep->nodelists[node] = NULL;
+				printk(" o setting node %d FAIL NULL\n", node);
 			}
 			node--;
 		}

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-23 10:50                           ` Mel Gorman
@ 2008-01-23 12:14                             ` Olaf Hering
  2008-01-23 12:52                               ` Olaf Hering
  2008-01-23 13:41                               ` crash in kmem_cache_init Mel Gorman
  0 siblings, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-23 12:14 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, Jan 23, Mel Gorman wrote:

> Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> following patch against 2.6.24-rc8 please? It contains the debug information
> that helped me figure out what was going wrong on the PPC64 machine here,
> the revert and the !l3 checks (i.e. the two patches that made machines I
> have access to work). Thanks

It boots with your change.


boot: x
Please wait, loading kernel...
Allocated 00a00000 bytes for kernel @ 00200000
   Elf64 kernel loaded...
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: debug xmon=on panic=1 loglevel=8 
memory layout at init:
  alloc_bottom : 0000000000ac1000
  alloc_top    : 0000000010000000
  alloc_top_hi : 00000000da000000
  rmo_top      : 0000000010000000
  ram_top      : 00000000da000000
Looking for displays
found display   : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done
instantiating rtas at 0x000000000f6a1000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
0000000000000004 : starting cpu hw idx 0000000000000004... done
0000000000000006 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4
Device tree struct  0x0000000000cc4000 -> 0x0000000000cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #52 SMP Wed Jan 23 13:05:38 CET 2008
-----------------------------------------------------
ppc64_pft_size                = 0x1c
physicalMemorySize            = 0xda000000
htab_hash_mask                = 0x1fffff
-----------------------------------------------------
Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #52 SMP Wed Jan 23 13:05:38 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency   = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Online nodes
o 0
o 1
Nodes with regular memory
o 1
Current running CPU 0 is associated with node 0
Current node is 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 1
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 2
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 3
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 4
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 5
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 6
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 7
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 8
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 9
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 10
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 11
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 12
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 13
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 14
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 15
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 16
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 17
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 18
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 19
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 20
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 21
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 22
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 23
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 24
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 25
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 26
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 27
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 28
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 29
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 30
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 31
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 32
kmem_cache_init Setting kmem_cache NULL 0
kmem_cache_init Setting kmem_cache initkmem_list3 0
set_up_list3s size-32 index 1
set_up_list3s size-32 index 1
set_up_list3s size-32 index 1
set_up_list3s size-128 index 17
set_up_list3s size-128 index 17
set_up_list3s size-128 index 17
setup_cpu_cache size-32(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-64
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-64(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-128(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-256
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-256(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-512
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-512(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-1024
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-1024(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-2048
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-2048(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-4096
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-4096(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-8192
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-8192(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-16384
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-16384(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-32768
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-32768(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-65536
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-65536(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-131072
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-131072(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-262144
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-262144(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-524288
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-524288(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-1048576
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-1048576(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-2097152
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-2097152(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-4194304
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-4194304(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-8388608
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-8388608(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-16777216
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
setup_cpu_cache size-16777216(DMA)
 o allocated node 0
 o kmem_list3_init
 o allocated node 1
 o kmem_list3_init
init_list RESETTING kmem_cache node 0
init_list RESETTING size-32 node 0
init_list RESETTING size-128 node 0
init_list RESETTING size-32 node 1
init_list RESETTING size-128 node 1
alloc_kmemlist size-16777216(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-16777216
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-8388608(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-8388608
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-4194304(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-4194304
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-2097152(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-2097152
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-1048576(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-1048576
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-524288(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-524288
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-262144(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-262144
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-131072(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-131072
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-65536(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-65536
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-32768(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-32768
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-16384(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-16384
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-8192(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-8192
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-4096(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-4096
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-2048(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-2048
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-1024(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-1024
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-512(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-512
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-256(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-256
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-128(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-64(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-64
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-32(DMA)
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-128
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist size-32
 o node 0
 o l3 exists
 o node 1
 o l3 exists
alloc_kmemlist kmem_cache
 o node 0
 o l3 exists
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D802FA00
alloc_kmemlist numa_policy
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D802FC00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D802FD80
alloc_kmemlist shared_policy_node
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D802FF00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D803E180
Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320)
alloc_kmemlist pid_1
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D803E300
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D803E480
alloc_kmemlist pid_namespace
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D803E600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D803E780
alloc_kmemlist pgd_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D803E900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D803EA80
alloc_kmemlist pud_pmd_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D803EC00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D803ED80
alloc_kmemlist anon_vma
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D803EF00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D804C180
alloc_kmemlist task_struct
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D804C300
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D804C480
alloc_kmemlist sighand_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D804C600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D804C780
alloc_kmemlist signal_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D804C900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D804CA80
alloc_kmemlist files_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D804CC00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D804CD80
alloc_kmemlist fs_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D804CF00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8057180
alloc_kmemlist vm_area_struct
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8057300
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8057480
alloc_kmemlist mm_struct
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8057600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8057780
alloc_kmemlist buffer_head
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8057900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8057A80
alloc_kmemlist idr_layer_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8057C80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8057E00
alloc_kmemlist key_jar
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8057F80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8066200
Security Framework initialized
Capability LSM initialized
Failure registering Root Plug module with the kernel
Failure registering Root Plug  module with primary security module.
alloc_kmemlist names_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8066380
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8066500
alloc_kmemlist filp
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8066680
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8066800
alloc_kmemlist dentry
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8066980
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8066B00
alloc_kmemlist inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8066C80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8066E00
alloc_kmemlist mnt_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8066F80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8074200
Mount-cache hash table entries: 256
alloc_kmemlist sysfs_dir_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8074380
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8074500
alloc_kmemlist bdev_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8074700
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8074880
alloc_kmemlist radix_tree_node
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8074A00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8074B80
alloc_kmemlist sigqueue
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8074D00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8074E80
alloc_kmemlist proc_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D808E100
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D808E280
alloc_kmemlist taskstats
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D808E400
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D808E580
alloc_kmemlist task_delay_info
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D808E700
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D808E880
cpuup_prepare 1
clockevent: decrementer mult[466a] shift[16] cpu[1]
Processor 1 found.
cpuup_prepare 2
clockevent: decrementer mult[466a] shift[16] cpu[2]
Processor 2 found.
cpuup_prepare 3
clockevent: decrementer mult[466a] shift[16] cpu[3]
Processor 3 found.
cpuup_prepare 4
clockevent: decrementer mult[466a] shift[16] cpu[4]
Processor 4 found.
cpuup_prepare 5
clockevent: decrementer mult[466a] shift[16] cpu[5]
Processor 5 found.
cpuup_prepare 6
clockevent: decrementer mult[466a] shift[16] cpu[6]
Processor 6 found.
cpuup_prepare 7
clockevent: decrementer mult[466a] shift[16] cpu[7]
Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-3
Node 1 CPUs: 4-7
net_namespace: 120 bytes
alloc_kmemlist file_lock_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D82C6680
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D82C6800
alloc_kmemlist skbuff_head_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D82C6980
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D82C6B00
alloc_kmemlist skbuff_fclone_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D82C6D00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D82C6E80
alloc_kmemlist sock_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8372180
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8372300
NET: Registered protocol family 16
IBM eBus Device Driver
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging enabled
PCI: Probing PCI hardware done
Registering pmac pic with sysfs...
alloc_kmemlist bio
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83E9580
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83E9700
alloc_kmemlist biovec-1
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83E9880
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83E9A00
alloc_kmemlist biovec-4
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83E9B80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83E9D00
alloc_kmemlist biovec-16
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83E9E80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83F4100
alloc_kmemlist biovec-64
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83F4300
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83F4480
alloc_kmemlist biovec-128
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83F4600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83F4780
alloc_kmemlist biovec-256
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D83F4900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D83F4A80
alloc_kmemlist blkdev_requests
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8401580
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8401700
alloc_kmemlist blkdev_queue
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8401880
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8401A00
alloc_kmemlist blkdev_ioc
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8401B80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8401D00
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
alloc_kmemlist eventpoll_epi
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8439A80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8439C00
alloc_kmemlist eventpoll_pwq
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8439D80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8439F00
alloc_kmemlist TCP
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8486380
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8486500
alloc_kmemlist request_sock_TCP
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8486680
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8486800
alloc_kmemlist tw_sock_TCP
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8486980
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8486B00
alloc_kmemlist UDP
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8486D00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D8486E80
alloc_kmemlist RAW
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D849D180
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D849D300
NET: Registered protocol family 2
Time: timebase clocksource has been installed.
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 4
Switched to high resolution mode on CPU 5
Switched to high resolution mode on CPU 6
Switched to high resolution mode on CPU 7
alloc_kmemlist arp_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D849D480
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D849D600
alloc_kmemlist ip_dst_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D849DC80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D849DE00
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
alloc_kmemlist xfrm_dst_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84A8300
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84A8480
alloc_kmemlist secpath_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84A8600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84A8780
alloc_kmemlist inet_peer_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84A8900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84A8A80
alloc_kmemlist tcp_bind_bucket
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84A8C00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84A8D80
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
alloc_kmemlist UDP-Lite
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84A8F80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84C6200
alloc_kmemlist ip_mrt_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84C6380
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84C6500
alloc_kmemlist rtas_flash_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D8294880
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84E5F80
alloc_kmemlist hugepte_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84EA800
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84EA680
alloc_kmemlist uid_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84EA400
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84EA280
alloc_kmemlist posix_timers_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D84EA100
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D84EAF00
alloc_kmemlist nsproxy
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85BB180
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85BB300
audit: initializing netlink socket (disabled)
audit(1201090162.460:1): initialized
RTAS daemon started
RTAS: event: 88, Type: Platform Error, Severity: 2
Total HugeTLB memory allocated, 0
alloc_kmemlist shmem_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85BB680
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85BB800
alloc_kmemlist fasync_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85BBA00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85BBB80
alloc_kmemlist kiocb
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85BBD00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85BBE80
alloc_kmemlist kioctx
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85EF180
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85EF300
alloc_kmemlist inotify_watch_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85EF880
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85EFA00
alloc_kmemlist inotify_event_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85EFB80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85EFD00
VFS: Disk quotas dquot_6.5.1
alloc_kmemlist dquot
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85EFE80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85FD100
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
alloc_kmemlist dnotify_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85FD280
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85FD400
alloc_kmemlist reiser_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85FD600
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85FD780
alloc_kmemlist ext3_xattr
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85FD980
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85FDB00
alloc_kmemlist ext3_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D85FDD00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D85FDE80
alloc_kmemlist revoke_record
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6036100
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6036280
alloc_kmemlist revoke_table
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6036400
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6036580
alloc_kmemlist journal_head
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6036700
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6036880
alloc_kmemlist journal_handle
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6036A00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6036B80
alloc_kmemlist ext2_xattr
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6036D80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6036F00
alloc_kmemlist ext2_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6046200
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6046380
alloc_kmemlist hugetlbfs_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6046580
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6046700
alloc_kmemlist fat_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6046900
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6046A80
alloc_kmemlist fat_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6046C80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6046E00
alloc_kmemlist isofs_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6052100
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6052280
alloc_kmemlist mqueue_inode_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6052500
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6052680
alloc_kmemlist bsg_cmd
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6052880
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6052A00
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
alloc_kmemlist cfq_queue
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6052C00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6052D80
alloc_kmemlist cfq_io_context
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D6052F00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D6070180
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
rpaphp: Slot [0001:00:02.0](PCI location=U7879.001.DQD04M6-P1-C3) registered
rpaphp: Slot [0001:00:02.2](PCI location=U7879.001.DQD04M6-P1-C4) registered
rpaphp: Slot [0001:00:02.4](PCI location=U7879.001.DQD04M6-P1-C5) registered
rpaphp: Slot [0001:00:02.6](PCI location=U7879.001.DQD04M6-P1-C6) registered
rpaphp: Slot [0002:00:02.0](PCI location=U7879.001.DQD04M6-P1-C1) registered
rpaphp: Slot [0002:00:02.6](PCI location=U7879.001.DQD04M6-P1-C2) registered
matroxfb: Matrox G450 detected
PInS data found at offset 31168
PInS memtype = 5
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x40178000000, mapped to 0xd000080080080000, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd 0000:c8:01.2: EHCI Host Controller
ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000
ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:c8:01.0: OHCI Host Controller
ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:c8:01.0: irq 85, io mem 0x400a0001000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:c8:01.1: OHCI Host Controller
ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:c8:01.1: irq 85, io mem 0x400a0000000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
mice: PS/2 mouse device common for all mice
EDAC MC: Ver: 2.1.0 Jan 23 2008
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
/home/olaf/kernel/git/linux-2.6.24-rc8/drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
oprofile: using ppc64/power5+ performance monitoring.
alloc_kmemlist flow_cache
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D612FA80
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D612FC00
alloc_kmemlist UNIX
 o node 0
 o allocing l3
 o kmem_list3_init
 o setting node 0 0xC0000000D612FE00
 o node 1
 o allocing l3
 o kmem_list3_init
 o setting node 1 0xC0000000D612FF80
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
registered taskstats version 1
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
VFS: Cannot open root device "<NULL>" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Rebooting in 1 seconds..    


* Re: crash in kmem_cache_init
  2008-01-23 12:14                             ` Olaf Hering
@ 2008-01-23 12:52                               ` Olaf Hering
  2008-01-23 13:55                                 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
  2008-01-23 13:41                               ` crash in kmem_cache_init Mel Gorman
  1 sibling, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-23 12:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, Jan 23, Olaf Hering wrote:

> On Wed, Jan 23, Mel Gorman wrote:
> 
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
> 
> It boots with your change.

This version of the patch boots ok for me:
Maybe I made a mistake with earlier patches, no idea.

---
 mm/slab.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 		/* Replace the static kmem_list3 structures for the boot cpu */
 		init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-		for_each_node_state(nid, N_NORMAL_MEMORY) {
+		for_each_online_node(nid) {
 			init_list(malloc_sizes[INDEX_AC].cs_cachep,
 				  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct 
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2099,7 +2099,7 @@ static int __init_refok setup_cpu_cache(
 			g_cpucache_up = PARTIAL_L3;
 		} else {
 			int node;
-			for_each_node_state(node, N_NORMAL_MEMORY) {
+			for_each_online_node(node) {
 				cachep->nodelists[node] =
 				    kmalloc_node(sizeof(struct kmem_list3),
 						GFP_KERNEL, node);
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3824,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 
                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);


* [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 12:52                               ` Olaf Hering
@ 2008-01-23 13:55                                 ` Mel Gorman
  2008-01-23 14:18                                   ` Pekka J Enberg
                                                     ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 13:55 UTC (permalink / raw)
  To: akpm, Christoph Lameter, Pekka Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan,
	KAMEZAWA Hiroyuki

This patch in combination with a partial revert of commit
04231b3002ac53f8a64a7bd142fde3fa4b6808c6 fixes a regression between 2.6.23
and 2.6.24-rc8 where a PPC64 machine with all CPUs on a memoryless node fails
to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24.

With memoryless-node configurations, it is possible that all the CPUs are
associated with a node with no memory. Early in the boot process, the
nodelists that allow fallback_alloc to work are not yet set up, so an Oops
occurs and the machine fails to boot.

This patch adds the necessary checks to make sure a kmem_list3 exists for
the preferred node used when growing the cache. If the preferred node has
no nodelist then the currently running node is used instead. This
problem only affects the SLAB allocator; SLUB appears to work fine.
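
The check described above can be modelled in user space roughly as follows.
This is only a sketch: model_cache, model_list3, MODEL_MAX_NODES and
model_pick_l3() are hypothetical stand-ins for struct kmem_cache, struct
kmem_list3, MAX_NUMNODES and the open-coded check in the patch, and the
running node is passed in explicitly instead of calling numa_node_id().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

struct model_list3 { int colour_next; };	/* stand-in for struct kmem_list3 */
struct model_cache { struct model_list3 *nodelists[MODEL_MAX_NODES]; };

/*
 * Pick the per-node list for an allocation aimed at *nodeid. If that node
 * has no list, retarget the allocation to the running node, as the patch
 * does. The result can still be NULL when the running node has no list
 * either, which is why the patch keeps a BUG_ON(!l3) after the check.
 */
static struct model_list3 *model_pick_l3(struct model_cache *c, int *nodeid,
					 int running_node)
{
	struct model_list3 *l3;

	assert(*nodeid >= 0 && *nodeid < MODEL_MAX_NODES);
	assert(running_node >= 0 && running_node < MODEL_MAX_NODES);

	l3 = c->nodelists[*nodeid];
	if (!l3) {
		*nodeid = running_node;
		l3 = c->nodelists[*nodeid];
	}
	return l3;
}
```

Note that the sketch makes the remaining gap visible: nothing in it
guarantees that the running node has a list.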

Signed-off-by: Mel Gorman <mel@csn.ul.ie>

---
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);

 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;

 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);

 retry:

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 13:55                                 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
@ 2008-01-23 14:18                                   ` Pekka J Enberg
  2008-01-23 14:32                                     ` Pekka J Enberg
  2008-01-23 18:35                                     ` Christoph Lameter
  2008-01-23 14:27                                   ` Olaf Hering
  2008-01-23 18:41                                   ` Christoph Lameter
  2 siblings, 2 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi Mel,

On Wed, 23 Jan 2008, Mel Gorman wrote:
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
> --- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
> +++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
> @@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache 
>  	/* Take the l3 list lock to change the colour_next on this node */
>  	check_irq_off();
>  	l3 = cachep->nodelists[nodeid];
> +	if (!l3) {
> +		nodeid = numa_node_id();
> +		l3 = cachep->nodelists[nodeid];
> +	}
> +	BUG_ON(!l3);
>  	spin_lock(&l3->list_lock);
>  
>  	/* Get colour for the slab, and cal the next value. */
> @@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
>  	int x;
>  
>  	l3 = cachep->nodelists[nodeid];
> +	if (!l3) {
> +		nodeid = numa_node_id();
> +		l3 = cachep->nodelists[nodeid];
> +	}

What guarantees that current node ->nodelists is never NULL?

I still think Christoph's kmem_getpages() patch is correct (to fix 
cache_grow() oops) but I overlooked the fact that none of the callers of 
____cache_alloc_node() deal with bootstrapping (with the exception of 
__cache_alloc_node() that even has a comment about it).

But what I am really wondering about is, why wasn't the 
N_NORMAL_MEMORY revert enough? I assume this used to work before so what 
more do we need to revert for 2.6.24?

			Pekka


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:18                                   ` Pekka J Enberg
@ 2008-01-23 14:32                                     ` Pekka J Enberg
  2008-01-23 14:49                                       ` Pekka J Enberg
  2008-01-23 18:35                                     ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> I still think Christoph's kmem_getpages() patch is correct (to fix 
> cache_grow() oops) but I overlooked the fact that none of the callers of 
> ____cache_alloc_node() deal with bootstrapping (with the exception of 
> __cache_alloc_node() that even has a comment about it).

So something like this (totally untested) patch on top of current git:

---
 mm/slab.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 
@@ -2976,8 +2980,9 @@ retry:
 		batchcount = BATCHREFILL_LIMIT;
 	}
 	l3 = cachep->nodelists[node];
+	if (!l3)
+		return NULL;
 
-	BUG_ON(ac->avail > 0 || !l3);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3317,7 +3322,8 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
-	BUG_ON(!l3);
+	if (!l3)
+		return fallback_alloc(cachep, flags);
 
 retry:
 	check_irq_off();
@@ -3394,12 +3400,6 @@ __cache_alloc_node(struct kmem_cache *ca
 	if (unlikely(nodeid == -1))
 		nodeid = numa_node_id();
 
-	if (unlikely(!cachep->nodelists[nodeid])) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
 	if (nodeid == numa_node_id()) {
 		/*
 		 * Use the locally cached objects if possible.


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:32                                     ` Pekka J Enberg
@ 2008-01-23 14:49                                       ` Pekka J Enberg
  2008-01-23 15:56                                         ` Mel Gorman
  2008-01-23 18:36                                         ` Christoph Lameter
  0 siblings, 2 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> > I still think Christoph's kmem_getpages() patch is correct (to fix 
> > cache_grow() oops) but I overlooked the fact that none of the callers of 
> > ____cache_alloc_node() deal with bootstrapping (with the exception of 
> > __cache_alloc_node() that even has a comment about it).
> 
> So something like this (totally untested) patch on top of current git:

Sorry, removed a BUG_ON() from cache_alloc_refill() by mistake, here's a 
better one:

[PATCH] slab: fix allocation on memoryless nodes
From: Pekka Enberg <penberg@cs.helsinki.fi>

As memoryless nodes do not have a nodelist, change cache_alloc_refill() to bail
out for those and let ____cache_alloc_node() always deal with that by resorting
to fallback_alloc().

Furthermore, don't let kmem_getpages() call alloc_pages_node() if the nodeid
passed to it is -1, as the latter will always translate that to numa_node_id(),
which might be the node whose missing ->nodelists entry caused the invocation
of fallback_alloc() in the first place (for example, during bootstrap).
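
The two decisions described above can be sketched in user space as follows.
This is an illustration under assumptions, not kernel code: the enums,
struct model_cache and both helpers are hypothetical names, and the sketch
only records which call or path would be chosen rather than allocating
anything.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

enum model_page_call { MODEL_ALLOC_PAGES, MODEL_ALLOC_PAGES_NODE };
enum model_alloc_path { MODEL_PATH_NODE_LIST, MODEL_PATH_FALLBACK };

struct model_cache { const void *nodelists[MODEL_MAX_NODES]; };

/*
 * kmem_getpages() side: a nodeid of -1 means "no node preference", so the
 * node-less allocator entry point is chosen instead of the per-node one.
 */
static enum model_page_call model_page_call_for(int nodeid)
{
	return nodeid == -1 ? MODEL_ALLOC_PAGES : MODEL_ALLOC_PAGES_NODE;
}

/*
 * ____cache_alloc_node() side: a node without a list is sent to the
 * fallback path instead of tripping a BUG_ON().
 */
static enum model_alloc_path model_path_for(const struct model_cache *c,
					    int nodeid)
{
	assert(nodeid >= 0 && nodeid < MODEL_MAX_NODES);
	return c->nodelists[nodeid] ? MODEL_PATH_NODE_LIST
				    : MODEL_PATH_FALLBACK;
}
```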

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---
 mm/slab.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 
@@ -2975,9 +2979,11 @@ retry:
 		 */
 		batchcount = BATCHREFILL_LIMIT;
 	}
+	BUG_ON(ac->avail > 0);
 	l3 = cachep->nodelists[node];
+	if (!l3)
+		return NULL;
 
-	BUG_ON(ac->avail > 0 || !l3);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3317,7 +3323,8 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
-	BUG_ON(!l3);
+	if (!l3)
+		return fallback_alloc(cachep, flags);
 
 retry:
 	check_irq_off();
@@ -3394,12 +3401,6 @@ __cache_alloc_node(struct kmem_cache *ca
 	if (unlikely(nodeid == -1))
 		nodeid = numa_node_id();
 
-	if (unlikely(!cachep->nodelists[nodeid])) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
 	if (nodeid == numa_node_id()) {
 		/*
 		 * Use the locally cached objects if possible.


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:49                                       ` Pekka J Enberg
@ 2008-01-23 15:56                                         ` Mel Gorman
  2008-01-23 17:29                                           ` Pekka J Enberg
  2008-01-23 18:36                                         ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 15:56 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On (23/01/08 16:49), Pekka J Enberg didst pronounce:
> Hi,
> 
> On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> > > I still think Christoph's kmem_getpages() patch is correct (to fix 
> > > cache_grow() oops) but I overlooked the fact that none of the callers of 
> > > ____cache_alloc_node() deal with bootstrapping (with the exception of 
> > > __cache_alloc_node() that even has a comment about it).
> > 
> > So something like this (totally untested) patch on top of current git:
> 
> Sorry, removed a BUG_ON() from cache_alloc_refill() by mistake, here's a 
> better one:
> 

Applied in combination with the N_NORMAL_MEMORY revert and it fails to
boot. Console is as follows;

Linux version 2.6.24-rc8-autokern1 (root@gekko-lp3.ltc.austin.ibm.com)
(gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #2 SMP Wed Jan 23
10:37:36 EST 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    2:        0 ->  1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 1034240
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6
ABAT:1201101591 loglevel=8 
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 238.059000 MHz
time_init: processor frequency   = 1904.472000 MHz
clocksource: timebase mult[10cd746] shift[22] registered
clockevent: decrementer mult[3cf1] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 2
Memory: 4105560k/4194304k available (5004k kernel code, 88744k reserved,
876k data, 559k bss, 272k init)
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc0000000003c8ae8
cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840]
    pc: c0000000003c8ae8: __lock_text_start+0x20/0x88
    lr: c0000000000dadb4: .cache_grow+0x7c/0x338
    sp: c0000000005c3ac0
   msr: 8000000000009032
   dar: 40
 dsisr: 40000000
  current = 0xc000000000500f10
  paca    = 0xc000000000501b80
    pid   = 0, comm = swapper
enter ? for help
[c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338
[c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224
[c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c
[c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc
[c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c
[c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc
[c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0

0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782).
2777            local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
2778    
2779            /* Take the l3 list lock to change the colour_next on this node */
2780            check_irq_off();
2781            l3 = cachep->nodelists[nodeid];
2782            spin_lock(&l3->list_lock);
2783    
2784            /* Get colour for the slab, and cal the next value. */
2785            offset = l3->colour_next;
2786            l3->colour_next++;
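
The fault address 0x40 is consistent with l3 being NULL at line 2782, since
spin_lock() is then handed the offset of list_lock within the structure. A
rough user-space check follows, assuming the leading fields of struct
kmem_list3 are laid out as in mm/slab.c of this period and that pointers and
unsigned long are 64 bits wide. The model_ types are stand-ins, not the
kernel definitions, and spinlock_t is replaced by unsigned int purely for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct list_head: two pointers. */
struct model_list_head { struct model_list_head *next, *prev; };

/* Assumed leading fields of struct kmem_list3, in order. */
struct model_kmem_list3 {
	struct model_list_head slabs_partial;
	struct model_list_head slabs_full;
	struct model_list_head slabs_free;
	unsigned long free_objects;
	unsigned int free_limit;
	unsigned int colour_next;
	unsigned int list_lock;		/* stand-in for spinlock_t */
};

/* Address that &l3->list_lock evaluates to when l3 is NULL. */
static size_t model_list_lock_offset(void)
{
	return offsetof(struct model_kmem_list3, list_lock);
}
```

With 64-bit pointers the three list heads take 48 bytes, free_objects 8 and
the two unsigned ints 8, which puts list_lock at byte 64 (0x40).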

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 15:56                                         ` Mel Gorman
@ 2008-01-23 17:29                                           ` Pekka J Enberg
  2008-01-23 17:42                                             ` Pekka J Enberg
                                                               ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 17:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

On Wed, 23 Jan 2008, Mel Gorman wrote:
> Applied in combination with the N_NORMAL_MEMORY revert and it fails to
> boot. Console is as follows;

Thanks for testing!
 
On Wed, 23 Jan 2008, Mel Gorman wrote:
> [c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338
> [c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224
> [c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c
> [c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc
> [c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c
> [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc
> [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0
> 
> 0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782).
> 2777            local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
> 2778    
> 2779            /* Take the l3 list lock to change the colour_next on this node */
> 2780            check_irq_off();
> 2781            l3 = cachep->nodelists[nodeid];
> 2782            spin_lock(&l3->list_lock);
> 2783    
> 2784            /* Get colour for the slab, and cal the next value. */
> 2785            offset = l3->colour_next;
> 2786            l3->colour_next++;

Ok, so it's too early to fallback_alloc() because in kmem_cache_init() we 
do:

        for (i = 0; i < NUM_INIT_LISTS; i++) {
                kmem_list3_init(&initkmem_list3[i]);
                if (i < MAX_NUMNODES)
                        cache_cache.nodelists[i] = NULL;
        }

Fine. But, why are we hitting fallback_alloc() in the first place? It's 
definitely not because of missing ->nodelists as we do:

        cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];

before attempting to set up kmalloc caches. Now, if I understood 
correctly, we're booting off a memoryless node so kmem_getpages() will 
return NULL thus forcing us to fallback_alloc() which is unavailable at 
this point.

As far as I can tell, there are two ways to fix this:

  (1) don't boot off a memoryless node (why are we doing this in the first 
      place?)
  (2) initialize cache_cache.nodelists with initkmem_list3 equivalents
      for *each node that has normal memory*
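
The difference between the current bootstrap and option (2) can be sketched
in user space as below. Everything here is a hypothetical model: the
model_ names, MODEL_MAX_NODES and the has_memory[] array are assumptions
made for illustration, with the node layout in the usage note taken from
the boot log earlier in the thread (memory on node 2, boot CPU on node 0).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

struct model_list3 { int unused; };	/* stand-in for struct kmem_list3 */
struct model_cache { struct model_list3 *nodelists[MODEL_MAX_NODES]; };

/* Stand-in for the static initkmem_list3[] bootstrap array. */
static struct model_list3 model_init_lists[MODEL_MAX_NODES];

/* Current behaviour: clear every entry, then give only the boot node a list. */
static void model_bootstrap_boot_node_only(struct model_cache *c, int boot_node)
{
	assert(boot_node >= 0 && boot_node < MODEL_MAX_NODES);
	for (int i = 0; i < MODEL_MAX_NODES; i++)
		c->nodelists[i] = NULL;
	c->nodelists[boot_node] = &model_init_lists[0];
}

/* Option (2): give every node that has memory its own bootstrap list. */
static void model_bootstrap_memory_nodes(struct model_cache *c,
					 const int *has_memory)
{
	for (int i = 0; i < MODEL_MAX_NODES; i++)
		c->nodelists[i] = has_memory[i] ? &model_init_lists[i] : NULL;
}

/* Count nodes that own memory but have no list to put new slabs on. */
static int model_memory_nodes_without_list(const struct model_cache *c,
					   const int *has_memory)
{
	int missing = 0;

	for (int i = 0; i < MODEL_MAX_NODES; i++)
		if (has_memory[i] && !c->nodelists[i])
			missing++;
	return missing;
}
```

With has_memory = {0, 0, 1, 0} and boot node 0, the first bootstrap leaves
one memory node without a list, while the second leaves none.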

I am still wondering why this worked before, though.

			Pekka


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 17:29                                           ` Pekka J Enberg
@ 2008-01-23 17:42                                             ` Pekka J Enberg
  2008-01-23 18:51                                             ` Christoph Lameter
  2008-01-23 19:52                                             ` Nishanth Aravamudan
  2 siblings, 0 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 17:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> As far as I can tell, there are two ways to fix this:

[snip]
 
>   (2) initialize cache_cache.nodelists with initkmem_list3 equivalents
>       for *each node that has normal memory*

An untested patch follows:

---
 mm/slab.c |   39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -304,11 +304,11 @@ struct kmem_list3 {
 /*
  * Need this for bootstrapping a per node allocator.
  */
-#define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)
+#define NUM_INIT_LISTS (3 * MAX_NUMNODES)
 struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
 #define	CACHE_CACHE 0
-#define	SIZE_AC 1
-#define	SIZE_L3 (1 + MAX_NUMNODES)
+#define	SIZE_AC MAX_NUMNODES
+#define	SIZE_L3 (2 * MAX_NUMNODES)
 
 static int drain_freelist(struct kmem_cache *cache,
 			struct kmem_list3 *l3, int tofree);
@@ -1410,6 +1410,22 @@ static void init_list(struct kmem_cache 
 }
 
 /*
+ * For setting up all the kmem_list3s for cache whose buffer_size is same as
+ * size of kmem_list3.
+ */
+static void __init set_up_list3s(struct kmem_cache *cachep, int index)
+{
+	int node;
+
+	for_each_node_state(node, N_NORMAL_MEMORY) {
+		cachep->nodelists[node] = &initkmem_list3[index + node];
+		cachep->nodelists[node]->next_reap = jiffies +
+		    REAPTIMEOUT_LIST3 +
+		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
+	}
+}
+
+/*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
  */
@@ -1432,6 +1448,7 @@ void __init kmem_cache_init(void)
 		if (i < MAX_NUMNODES)
 			cache_cache.nodelists[i] = NULL;
 	}
+	set_up_list3s(&cache_cache, CACHE_CACHE);
 
 	/*
 	 * Fragmentation resistance on low memory - only use bigger
@@ -1964,22 +1981,6 @@ static void slab_destroy(struct kmem_cac
 	}
 }
 
-/*
- * For setting up all the kmem_list3s for cache whose buffer_size is same as
- * size of kmem_list3.
- */
-static void __init set_up_list3s(struct kmem_cache *cachep, int index)
-{
-	int node;
-
-	for_each_node_state(node, N_NORMAL_MEMORY) {
-		cachep->nodelists[node] = &initkmem_list3[index + node];
-		cachep->nodelists[node]->next_reap = jiffies +
-		    REAPTIMEOUT_LIST3 +
-		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
-	}
-}
-
 static void __kmem_cache_destroy(struct kmem_cache *cachep)
 {
 	int i;
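
The index arithmetic behind the NUM_INIT_LISTS change can be checked in user
space with a small sketch. The OLD_/NEW_ constants mirror the values in the
patch above, but MODEL_MAX_NUMNODES = 4 and the two helper functions are
assumptions made for illustration only.

```c
#include <assert.h>

#define MODEL_MAX_NUMNODES 4	/* hypothetical; the real value is per config */

/* Layout before the patch: cache_cache owns a single slot. */
enum {
	OLD_CACHE_CACHE = 0,
	OLD_SIZE_AC = 1,
	OLD_SIZE_L3 = 1 + MODEL_MAX_NUMNODES,
	OLD_NUM_INIT_LISTS = 2 * MODEL_MAX_NUMNODES + 1
};

/* Layout after the patch: each of the three caches owns one slot per node. */
enum {
	NEW_CACHE_CACHE = 0,
	NEW_SIZE_AC = MODEL_MAX_NUMNODES,
	NEW_SIZE_L3 = 2 * MODEL_MAX_NUMNODES,
	NEW_NUM_INIT_LISTS = 3 * MODEL_MAX_NUMNODES
};

/* Slot used for a node, computed as set_up_list3s() does: index + node. */
static int model_slot(int base, int node)
{
	assert(node >= 0 && node < MODEL_MAX_NUMNODES);
	return base + node;
}

/* 1 if the three per-node regions are disjoint and fit in the array. */
static int model_layout_is_disjoint(int cache_base, int ac_base, int l3_base,
				    int num_lists)
{
	int last = MODEL_MAX_NUMNODES - 1;

	return model_slot(cache_base, last) < ac_base &&
	       model_slot(ac_base, last) < l3_base &&
	       model_slot(l3_base, last) < num_lists;
}
```

Under the old layout, indexing cache_cache per node would collide with the
SIZE_AC region (slot 1 is shared), which is why the patch widens the array
before calling set_up_list3s() on cache_cache.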


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 17:29                                           ` Pekka J Enberg
  2008-01-23 17:42                                             ` Pekka J Enberg
@ 2008-01-23 18:51                                             ` Christoph Lameter
  2008-01-23 19:52                                             ` Nishanth Aravamudan
  2 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-23 18:51 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka J Enberg wrote:

> Fine. But, why are we hitting fallback_alloc() in the first place? It's 
> definitely not because of missing ->nodelists as we do:
> 
>         cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
> 
> before attempting to set up kmalloc caches. Now, if I understood 
> correctly, we're booting off a memoryless node so kmem_getpages() will 
> return NULL thus forcing us to fallback_alloc() which is unavailable at 
> this point.
> 
> As far as I can tell, there are two ways to fix this:
> 
>   (1) don't boot off a memoryless node (why are we doing this in the first 
>       place?)

Right. That is the solution that I would prefer.

>   (2) initialize cache_cache.nodelists with initkmem_list3 equivalents
>       for *each node that has normal memory*

Or simply do it for all. SLAB bootstrap is a very complex thing, though.

> 
> I am still wondering why this worked before, though.

I doubt it ever worked for SLAB.


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 17:29                                           ` Pekka J Enberg
  2008-01-23 17:42                                             ` Pekka J Enberg
  2008-01-23 18:51                                             ` Christoph Lameter
@ 2008-01-23 19:52                                             ` Nishanth Aravamudan
  2008-01-23 21:02                                               ` Pekka Enberg
  2 siblings, 1 reply; 61+ messages in thread
From: Nishanth Aravamudan @ 2008-01-23 19:52 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki,
	Christoph Lameter

On 23.01.2008 [19:29:15 +0200], Pekka J Enberg wrote:
> Hi,
> 
> On Wed, 23 Jan 2008, Mel Gorman wrote:
> > Applied in combination with the N_NORMAL_MEMORY revert and it fails to
> > boot. Console is as follows;
> 
> Thanks for testing!
> 
> On Wed, 23 Jan 2008, Mel Gorman wrote:
> > [c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338
> > [c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224
> > [c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c
> > [c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc
> > [c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c
> > [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc
> > [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0
> > 
> > 0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782).
> > 2777            local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
> > 2778    
> > 2779            /* Take the l3 list lock to change the colour_next on this node */
> > 2780            check_irq_off();
> > 2781            l3 = cachep->nodelists[nodeid];
> > 2782            spin_lock(&l3->list_lock);
> > 2783    
> > 2784            /* Get colour for the slab, and cal the next value. */
> > 2785            offset = l3->colour_next;
> > 2786            l3->colour_next++;
> 
> Ok, so it's too early to fallback_alloc() because in kmem_cache_init() we 
> do:
> 
>         for (i = 0; i < NUM_INIT_LISTS; i++) {
>                 kmem_list3_init(&initkmem_list3[i]);
>                 if (i < MAX_NUMNODES)
>                         cache_cache.nodelists[i] = NULL;
>         }
> 
> Fine. But, why are we hitting fallback_alloc() in the first place? It's 
> definitely not because of missing ->nodelists as we do:
> 
>         cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
> 
> before attempting to set up kmalloc caches. Now, if I understood 
> correctly, we're booting off a memoryless node so kmem_getpages() will 
> return NULL thus forcing us to fallback_alloc() which is unavailable at 
> this point.
> 
> As far as I can tell, there are two ways to fix this:
> 
>   (1) don't boot off a memoryless node (why are we doing this in the first 
>       place?)

On at least one of the machines in question, wasn't it the case that
node 0 had all the memory and node 1 had all the CPUs? In that case, you
would have to boot off a memoryless node? And as long as that is a
physically valid configuration, the kernel should handle it.

>   (2) initialize cache_cache.nodelists with initkmem_list3 equivalents
>       for *each node that has normal memory*
> 
> I am still wondering why this worked before, though.

I bet we didn't notice this breaking because SLUB became the default and
SLAB isn't on in the test.kernel.org testing, for instance. Perhaps we
should add a second set of runs for some of the boxes there to run with
CONFIG_SLAB on?

I'm curious if we know, for sure, of a kernel with CONFIG_SLAB=y that
has booted all of the boxes reporting issues? That is, did they all work
with 2.6.23?

Thanks,
Nish


* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 19:52                                             ` Nishanth Aravamudan
@ 2008-01-23 21:02                                               ` Pekka Enberg
  2008-01-23 21:14                                                 ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2008-01-23 21:02 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki,
	Christoph Lameter

Hi,

On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> On at least one of the machines in question, wasn't it the case that
> node 0 had all the memory and node 1 had all the CPUs? In that case, you
> would have to boot off a memoryless node? And as long as that is a
> physically valid configuration, the kernel should handle it.

Agreed. Here's the patch that should fix it:

http://lkml.org/lkml/2008/1/23/332

On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> I bet we didn't notice this breaking because SLUB became the default and
> SLAB isn't on in the test.kernel.org testing, for instance. Perhaps we
> should add a second set of runs for some of the boxes there to run with
> CONFIG_SLAB on?

Sure.

On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote:
> I'm curious if we know, for sure, of a kernel with CONFIG_SLAB=y that
> has booted all of the boxes reporting issues? That is, did they all work
> with 2.6.23?

I think Mel said that their configuration did work with 2.6.23
although I also wonder how that's possible. AFAIK there have been some
changes in the page allocator that might explain this. That is, if
kmem_getpages() returned pages for memoryless node before, bootstrap
would have worked.

                           Pekka

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 21:02                                               ` Pekka Enberg
@ 2008-01-23 21:14                                                 ` Christoph Lameter
  2008-01-23 21:36                                                   ` Nishanth Aravamudan
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-23 21:14 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka Enberg wrote:

> I think Mel said that their configuration did work with 2.6.23
> although I also wonder how that's possible. AFAIK there have been some
> changes in the page allocator that might explain this. That is, if
> kmem_getpages() returned pages for memoryless node before, bootstrap
> would have worked.

Regular kmem_getpages is called with GFP_THISNODE set. There was some 
breakage in 2.6.22 and before with GFP_THISNODE returning pages from the 
wrong node if a node had no memory. So it may have worked accidentally and 
in an unsafe manner because the pages would have been associated with the 
wrong node which could trigger bug ons and locking troubles.
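
The difference between a strict and a non-strict request can be
illustrated with a small userspace model. This is only a sketch, not
kernel code: pick_node(), NR_NODES and NO_NODE are hypothetical names,
and the array of free page counts stands in for the real zonelists.
A strict request (modelling GFP_THISNODE) fails on a memoryless node,
while a non-strict one falls back to another node that has memory.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2      /* hypothetical node count for this model */
#define NO_NODE  (-1)   /* hypothetical "allocation failed" marker */

/*
 * Return the node that would satisfy a page request, or NO_NODE.
 * strict == true models a GFP_THISNODE-style request: only the
 * preferred node may be used, with no fallback.
 */
static int pick_node(const int free_pages[NR_NODES], int preferred, bool strict)
{
    assert(preferred >= 0 && preferred < NR_NODES);

    if (free_pages[preferred] > 0)
        return preferred;
    if (strict)
        return NO_NODE;

    /* Non-strict: try the remaining nodes in round-robin order. */
    for (int i = 1; i < NR_NODES; i++) {
        int node = (preferred + i) % NR_NODES;

        if (free_pages[node] > 0)
            return node;
    }
    return NO_NODE;
}
```

With node 0 memoryless and all memory on node 1, a strict request for
node 0 fails, whereas a non-strict request is satisfied from node 1.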

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 21:14                                                 ` Christoph Lameter
@ 2008-01-23 21:36                                                   ` Nishanth Aravamudan
  2008-01-24  3:13                                                     ` Christoph Lameter
  0 siblings, 1 reply; 61+ messages in thread
From: Nishanth Aravamudan @ 2008-01-23 21:36 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, akpm,
	KAMEZAWA Hiroyuki

On 23.01.2008 [13:14:26 -0800], Christoph Lameter wrote:
> On Wed, 23 Jan 2008, Pekka Enberg wrote:
> 
> > I think Mel said that their configuration did work with 2.6.23
> > although I also wonder how that's possible. AFAIK there have been some
> > changes in the page allocator that might explain this. That is, if
> > kmem_getpages() returned pages for memoryless node before, bootstrap
> > would have worked.
> 
> Regular kmem_getpages is called with GFP_THISNODE set. There was some
> breakage in 2.6.22 and before with GFP_THISNODE returning pages from
> the wrong node if a node had no memory. So it may have worked
> accidentally and in an unsafe manner because the pages would have been
> associated with the wrong node which could trigger bug ons and locking
> troubles.

Right, so it might have functioned before, but the correctness was
wobbly at best... Certainly the memoryless patch series has tightened
that up, but we missed these SLAB issues.

I see that your patch fixed Olaf's machine, Pekka. Nice work on
everyone's part tracking this stuff down.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 21:36                                                   ` Nishanth Aravamudan
@ 2008-01-24  3:13                                                     ` Christoph Lameter
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-24  3:13 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Nishanth Aravamudan wrote:

> Right, so it might have functioned before, but the correctness was
> wobbly at best... Certainly the memoryless patch series has tightened
> that up, but we missed these SLAB issues.
> 
> I see that your patch fixed Olaf's machine, Pekka. Nice work on
> everyone's part tracking this stuff down.

Another important result is that I found that GFP_THISNODE is actually 
required for proper SLAB operation and not only an optimization. Fallback 
can lead to very bad results. I have two customer reported instances of 
SLAB corruption here that can be explained now due to fallback to another 
node. Foreign objects enter the per cpu queue. The wrong node lock is 
taken during cache_flusharray(). Fields in the struct slab can become 
corrupted. It typically hits the list field and the inuse field.
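
A toy userspace model of the locking hazard described above, as a
sketch only: struct object, home_node and count_unprotected() are
hypothetical names, and the model assumes the flush takes a single
lock, that of the flushing CPU's own node. Any queued object whose
home node differs would then have its slab updated without the lock
that actually protects it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached object; records its home node. */
struct object {
    int home_node;
};

/*
 * Count the queued objects that would be handled under the wrong
 * node's lock if the flush takes only the lock of flushing_node.
 */
static size_t count_unprotected(const struct object *queue, size_t len,
                                int flushing_node)
{
    size_t n = 0;

    assert(queue != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (queue[i].home_node != flushing_node)
            n++;
    }
    return n;
}
```

A queue that only ever holds objects from the CPU's own node yields
zero here, which is the invariant that strict node-local allocation is
meant to preserve.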

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:49                                       ` Pekka J Enberg
  2008-01-23 15:56                                         ` Mel Gorman
@ 2008-01-23 18:36                                         ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-23 18:36 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka J Enberg wrote:

> Furthermore, don't let kmem_getpages() call alloc_pages_node() if nodeid passed
> to it is -1 as the latter will always translate that to numa_node_id() which
> might not have ->nodelist that caused the invocation of fallback_alloc() in the
> first place (for example, during bootstrap).

kmem_getpages is called without GFP_THISNODE. This 
alloc_pages_node(numa_node_id(), ...) will fall back to the next node with 
memory.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:18                                   ` Pekka J Enberg
  2008-01-23 14:32                                     ` Pekka J Enberg
@ 2008-01-23 18:35                                     ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-23 18:35 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka J Enberg wrote:

> I still think Christoph's kmem_getpages() patch is correct (to fix 
> cache_grow() oops) but I overlooked the fact that none of the callers of 
> ____cache_alloc_node() deal with bootstrapping (with the exception of 
> __cache_alloc_node() that even has a comment about it).

My patch is useless. kmem_getpages called with nodeid == -1 falls back 
correctly to the available node. The problem is that the node structures 
for the page do not exist.
 
> But what I am really wondering about is, why wasn't the 
> N_NORMAL_MEMORY revert enough? I assume this used to work before so what 
> more do we need to revert for 2.6.24?

I think that is because SLUB relaxed the requirements on having regular 
memory on the boot node. Now the expectation is that SLAB can do the same.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 13:55                                 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
  2008-01-23 14:18                                   ` Pekka J Enberg
@ 2008-01-23 14:27                                   ` Olaf Hering
  2008-01-23 14:42                                     ` Mel Gorman
  2008-01-23 18:41                                   ` Christoph Lameter
  2 siblings, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-23 14:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, linuxppc-dev, linux-kernel, Linux MM,
	Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, Jan 23, Mel Gorman wrote:

> This patch in combination with a partial revert of commit
> 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 fixes a regression between 2.6.23
> and 2.6.24-rc8 where a PPC64 machine with all CPUS on a memoryless node fails
> to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24.

This change alone does not help, it's not the version I tested.
Will all the changes below go into 2.6.24 as well, in a separate patch?

-       for_each_node_state(node, N_NORMAL_MEMORY) {
+       for_each_online_node(node) {

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:27                                   ` Olaf Hering
@ 2008-01-23 14:42                                     ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 14:42 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, linuxppc-dev, linux-kernel, Linux MM,
	Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On (23/01/08 15:27), Olaf Hering didst pronounce:
> On Wed, Jan 23, Mel Gorman wrote:
> 
> > This patch in combination with a partial revert of commit
> > 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 fixes a regression between 2.6.23
> > and 2.6.24-rc8 where a PPC64 machine with all CPUS on a memoryless node fails
> > to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24.
> 
> This change alone does not help, it's not the version I tested.
> Will all the changes below go into 2.6.24 as well, in a separate patch?
> 
> -       for_each_node_state(node, N_NORMAL_MEMORY) {
> +       for_each_online_node(node) {

Those changes are already in a separate patch and have been sent. I don't
see it in git yet but it should be on the way.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 13:55                                 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
  2008-01-23 14:18                                   ` Pekka J Enberg
  2008-01-23 14:27                                   ` Olaf Hering
@ 2008-01-23 18:41                                   ` Christoph Lameter
  2 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-23 18:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan,
	akpm, KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Mel Gorman wrote:

> This patch adds the necessary checks to make sure a kmem_list3 exists for
> the preferred node used when growing the cache. If the preferred node has
> no nodelist then the currently running node is used instead. This
> problem only affects the SLAB allocator, SLUB appears to work fine.

That is a dangerous thing to do. SLAB per cpu queues will contain foreign 
objects which may cause troubles when pushing the objects back. I think we 
may be lucky that these objects are consumed at boot. If all of the 
foreign objects are consumed at boot then we are fine. At least an 
explanation as to this issue should be added to the patch.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-23 12:14                             ` Olaf Hering
  2008-01-23 12:52                               ` Olaf Hering
@ 2008-01-23 13:41                               ` Mel Gorman
  1 sibling, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 13:41 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On (23/01/08 13:14), Olaf Hering didst pronounce:
> On Wed, Jan 23, Mel Gorman wrote:
> 
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
> 
> It boots with your change.
> 

....... Nice one! As the only addition here is debugging output, I can
only assume that the two patches were being booted in isolation instead
of combination earlier. The two threads have been a little confused with
hand waving so that can easily happen.

Looking at your log:

> early_node_map[1] active PFN ranges
>     1:        0 ->   892928

All memory on node 1

> Online nodes
> o 0
> o 1
> Nodes with regular memory
> o 1
> Current running CPU 0 is associated with node 0
> Current node is 0

Running CPU associated with node 0 so other than being node 1 instead of
node 2, your machine is similar to the one I had the problem on in terms
of memoryless nodes and CPU configuration.

> VFS: Cannot open root device "<NULL>" or unknown-block(0,0)
> Please append a correct "root=" boot option; here are the available partitions:
> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> Rebooting in 1 seconds..    
> 

I see it failed to complete boot but I'm going to assume this is a relatively
normal command-line, .config or initrd problem and not a regression of
some type.

I'll post a patch suitable for pick-up shortly. The two patches ran in
combination with CONFIG_DEBUG_SLAB through compile-based stress tests without
difficulty, so hopefully there are no new surprises hiding in the corners.

Thanks Olaf.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 21:15         ` Olaf Hering
  2008-01-18  6:56           ` Olaf Hering
  2008-01-18 18:47           ` Christoph Lameter
@ 2008-01-18 18:51           ` Christoph Lameter
  2 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-18 18:51 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Mel Gorman, linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

>   Normal     892928 ->   892928
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     1:        0 ->   892928
> Could not find start_pfn for node 0

We only have a single node that is node 1? And then we initialize nodes 0 
to 3?

> Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0

???

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: crash in kmem_cache_init
  2008-01-17 18:12     ` Olaf Hering
  2008-01-17 18:58       ` Christoph Lameter
@ 2008-01-17 19:03       ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-17 19:03 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

Could you try Pekka's suggestion of reverting  
04231b3002ac53f8a64a7bd142fde3fa4b6808c6 ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2008-01-24  3:13 UTC | newest]

Thread overview: 61+ messages
2008-01-15 15:09 crash in kmem_cache_init Olaf Hering
2008-01-15 15:58 ` Olaf Hering
2008-01-17 12:14 ` Pekka Enberg
2008-01-17 14:30   ` Christoph Lameter
2008-01-17 18:12     ` Olaf Hering
2008-01-17 18:58       ` Christoph Lameter
2008-01-17 19:54         ` Olaf Hering
2008-01-17 20:20           ` Olaf Hering
2008-01-19  4:56             ` Christoph Lameter
2008-01-17 21:15         ` Olaf Hering
2008-01-18  6:56           ` Olaf Hering
2008-01-18 18:42             ` Christoph Lameter
2008-01-19  4:55             ` Christoph Lameter
2008-01-18 18:47           ` Christoph Lameter
2008-01-18 21:30             ` Mel Gorman
2008-01-18 21:43               ` Christoph Lameter
2008-01-18 22:16               ` Christoph Lameter
2008-01-18 22:19                 ` Nish Aravamudan
2008-01-18 22:38                 ` Christoph Lameter
2008-01-18 22:57                 ` Olaf Hering
2008-01-22 19:54                   ` Mel Gorman
2008-01-22 20:11                     ` Christoph Lameter
2008-01-22 21:26                       ` Mel Gorman
2008-01-22 21:34                         ` Christoph Lameter
2008-01-22 22:50                           ` Mel Gorman
2008-01-22 22:57                             ` Christoph Lameter
2008-01-22 23:10                               ` Mel Gorman
2008-01-22 23:14                                 ` Christoph Lameter
2008-01-22 22:59                             ` Pekka Enberg
2008-01-22 23:12                               ` Christoph Lameter
2008-01-22 23:18                                 ` Christoph Lameter
2008-01-23  8:19                                   ` Pekka Enberg
2008-01-23  8:40                                     ` Olaf Hering
2008-01-22 21:45                     ` Olaf Hering
2008-01-22 22:12                       ` Nish Aravamudan
2008-01-22 22:23                       ` Christoph Lameter
2008-01-23  7:58                         ` Olaf Hering
2008-01-23 10:50                           ` Mel Gorman
2008-01-23 12:14                             ` Olaf Hering
2008-01-23 12:52                               ` Olaf Hering
2008-01-23 13:55                                 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
2008-01-23 14:18                                   ` Pekka J Enberg
2008-01-23 14:32                                     ` Pekka J Enberg
2008-01-23 14:49                                       ` Pekka J Enberg
2008-01-23 15:56                                         ` Mel Gorman
2008-01-23 17:29                                           ` Pekka J Enberg
2008-01-23 17:42                                             ` Pekka J Enberg
2008-01-23 18:51                                             ` Christoph Lameter
2008-01-23 19:52                                             ` Nishanth Aravamudan
2008-01-23 21:02                                               ` Pekka Enberg
2008-01-23 21:14                                                 ` Christoph Lameter
2008-01-23 21:36                                                   ` Nishanth Aravamudan
2008-01-24  3:13                                                     ` Christoph Lameter
2008-01-23 18:36                                         ` Christoph Lameter
2008-01-23 18:35                                     ` Christoph Lameter
2008-01-23 14:27                                   ` Olaf Hering
2008-01-23 14:42                                     ` Mel Gorman
2008-01-23 18:41                                   ` Christoph Lameter
2008-01-23 13:41                               ` crash in kmem_cache_init Mel Gorman
2008-01-18 18:51           ` Christoph Lameter
2008-01-17 19:03       ` Christoph Lameter
