kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!

public inbox for linux-kernel@vger.kernel.org

* kernel BUG in __cache_alloc_node at  linux-2.6.git/mm/slab.c:3177!
@ 2006-10-13 18:41 Will Schmidt
  2006-10-13 19:05 ` Christoph Lameter
  0 siblings, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-13 18:41 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, Christoph Lameter

Hi Folks,
    I am seeing a crash on a POWER5 LPAR when booting the linux-2.6 git
tree.  It happens fairly early during boot, so I've included the whole
log below.  This partition has 8 procs (shared, including threads) and
512M RAM.

A bisect claims: 
765c4507af71c39aba21006bbd3ec809fe9714ff is first bad commit
commit 765c4507af71c39aba21006bbd3ec809fe9714ff
Author: Christoph Lameter <clameter@sgi.com>
Date:   Wed Sep 27 01:50:08 2006 -0700

    [PATCH] GFP_THISNODE for the slab allocator

I am willing to dig deeper, but am looking for pointers on what to poke next.

Thanks, 
-Will

-----------------------------------------------------
ppc64_pft_size                = 0x18
physicalMemorySize            = 0x22000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address                  = 0x0000000000000000
htab_hash_mask                = 0x1ffff
-----------------------------------------------------
Linux version 2.6.19-rc1-gb8a3ad5b (willschm@airbag2) (gcc version 4.1.0
(SUSE Linux)) #56 SMP Fri Oct 13 13:06:18 CDT 2006
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: No capable adapters found
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->   139264
  Normal     139264 ->   139264
early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->    90112
    1:    90112 ->   139264
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 136576
Kernel command line: root=/dev/sda3  xmon=on
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 530256k/557056k available (5508k kernel code, 30468k reserved,
2224k data, 543k bss, 244k init)
kernel BUG in __cache_alloc_node
at /development/kernels/linux-2.6.git/mm/slab.c:3177!
cpu 0x0: Vector: 700 (Program Check) at [c0000000007938d0]
    pc: c0000000000b3c78: .__cache_alloc_node+0x44/0x1e8
    lr: c0000000000b3ec8: .fallback_alloc+0xac/0xf0
    sp: c000000000793b50
   msr: 8000000000021032
  current = 0xc000000000583a90
  paca    = 0xc000000000584300
    pid   = 0, comm = swapper
kernel BUG in __cache_alloc_node
at /development/kernels/linux-2.6.git/mm/slab.c:3177!
enter ? for help
[c000000000793c00] c0000000000b3ec8 .fallback_alloc+0xac/0xf0
[c000000000793ca0] c0000000000b4478 .kmem_cache_zalloc+0xc8/0x11c
[c000000000793d40] c0000000000b6624 .kmem_cache_create+0x1e8/0x5e0
[c000000000793e30] c00000000053e834 .kmem_cache_init+0x1d8/0x4b0
[c000000000793ef0] c000000000524748 .start_kernel+0x244/0x328
[c000000000793f90] c0000000000084f8 .start_here_common+0x54/0x5c
0:mon>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at  linux-2.6.git/mm/slab.c:3177!
  2006-10-13 18:41 kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177! Will Schmidt
@ 2006-10-13 19:05 ` Christoph Lameter
  2006-10-13 19:53   ` Will Schmidt
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-13 19:05 UTC (permalink / raw)
  To: Will Schmidt; +Cc: linuxppc-dev, linux-kernel

On Fri, 13 Oct 2006, Will Schmidt wrote:

>     Am seeing a crash on a power5 LPAR when booting the linux-2.6 git
> tree.  It's fairly early during boot, so I've included the whole log
> below.   This partition has 8 procs, (shared, including threads), and
> 512M RAM.  

This looks like slab bootstrap. Are you bootstrapping while having
zonelists built with zones that are only going to be populated later?
This will lead to incorrect NUMA placement of lots of slab structures on
bootup.

Check if the patch below may cure the oops. Your memory is likely
still placed on the wrong NUMA nodes since we have to fall back from
the intended node.

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2006-10-13 11:59:55.000000000 -0700
+++ linux-2.6/mm/slab.c	2006-10-13 12:03:15.000000000 -0700
@@ -3154,7 +3154,8 @@ void *fallback_alloc(struct kmem_cache *
 
 	for (z = zonelist->zones; *z && !obj; z++)
 		if (zone_idx(*z) <= ZONE_NORMAL &&
-				cpuset_zone_allowed(*z, flags))
+				cpuset_zone_allowed(*z, flags) &&
+				(*z)->free_pages)
 			obj = __cache_alloc_node(cache,
 					flags | __GFP_THISNODE,
 					zone_to_nid(*z));


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 19:05 ` Christoph Lameter
@ 2006-10-13 19:53   ` Will Schmidt
  2006-10-13 20:57     ` Will Schmidt
  0 siblings, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-13 19:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, linux-kernel

On Fri, 2006-13-10 at 12:05 -0700, Christoph Lameter wrote:
> On Fri, 13 Oct 2006, Will Schmidt wrote:
> 
> >     Am seeing a crash on a power5 LPAR when booting the linux-2.6 git
> > tree.  It's fairly early during boot, so I've included the whole log
> > below.   This partition has 8 procs, (shared, including threads), and
> > 512M RAM.  
> 
> This looks like slab bootstrap. You are bootstrapping while having 
> zonelists build with zones that are only going to be populated later? 
> This will lead to incorrect NUMA placement of lots of slab structures on 
> bootup.

I don't think so, but it's not an area I'm very familiar with.  One
of the other PPC folks might chime in with something here.

> 
> Check if the patch below may cure the oops. Your memory is likely 
> still placed on the wrong numa nodes since we have to fallback from 
> the intended node.

Nope, no change with this patch.

> 
> Index: linux-2.6/mm/slab.c
> ===================================================================
> --- linux-2.6.orig/mm/slab.c	2006-10-13 11:59:55.000000000 -0700
> +++ linux-2.6/mm/slab.c	2006-10-13 12:03:15.000000000 -0700
> @@ -3154,7 +3154,8 @@ void *fallback_alloc(struct kmem_cache *
> 
>  	for (z = zonelist->zones; *z && !obj; z++)
>  		if (zone_idx(*z) <= ZONE_NORMAL &&
> -				cpuset_zone_allowed(*z, flags))
> +				cpuset_zone_allowed(*z, flags) &&
> +				(*z)->free_pages)
>  			obj = __cache_alloc_node(cache,
>  					flags | __GFP_THISNODE,
>  					zone_to_nid(*z));



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 19:53   ` Will Schmidt
@ 2006-10-13 20:57     ` Will Schmidt
  2006-10-13 21:22       ` Nathan Lynch
  2006-10-13 22:22       ` Christoph Lameter
  0 siblings, 2 replies; 47+ messages in thread
From: Will Schmidt @ 2006-10-13 20:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, linux-kernel

On Fri, 2006-13-10 at 14:53 -0500, Will Schmidt wrote:
> On Fri, 2006-13-10 at 12:05 -0700, Christoph Lameter wrote:
> > On Fri, 13 Oct 2006, Will Schmidt wrote:
> > 
> > >     Am seeing a crash on a power5 LPAR when booting the linux-2.6 git
> > > tree.  It's fairly early during boot, so I've included the whole log
> > > below.   This partition has 8 procs, (shared, including threads), and
> > > 512M RAM.  
> > 
> > This looks like slab bootstrap. You are bootstrapping while having 
> > zonelists build with zones that are only going to be populated later? 
> > This will lead to incorrect NUMA placement of lots of slab structures on 
> > bootup.
> 
> I dont think so..   but it's not an area I'm very familiar with.   one
> of the other PPC folks might chime in with something here.  
> 
> > 
> > Check if the patch below may cure the oops. Your memory is likely 
> > still placed on the wrong numa nodes since we have to fallback from 
> > the intended node.
> 
> Nope, no change with this patch.
> 

Here is another boot log, with that patch applied and with the
numa=debug parameter.

-----------------------------------------------------
ppc64_pft_size                = 0x18
physicalMemorySize            = 0x22000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address                  = 0x0000000000000000
htab_hash_mask                = 0x1ffff
-----------------------------------------------------
Linux version 2.6.19-rc1-gb8a3ad5b-dirty (willschm@airbag2) (gcc version
4.1.0 (SUSE Linux)) #60 SMP Fri Oct 13 14:48:20 CDT 2006
[boot]0012 Setup Arch
NUMA associativity depth for CPU/Memory: 3
adding cpu 0 to node 0
node 0
NODE_DATA() = c000000015ffee80
start_paddr = 8000000
end_paddr = 16000000
bootmap_paddr = 15ffc000
reserve_bootmem ffc0000 40000
reserve_bootmem 15ffc000 2000
reserve_bootmem 15ffee80 1180
node 1
NODE_DATA() = c000000021ff7c80
start_paddr = 0
end_paddr = 22000000
bootmap_paddr = 21ff2000
reserve_bootmem 0 847000
reserve_bootmem 264b000 9000
reserve_bootmem 77b2000 84e000
reserve_bootmem 21ff2000 5000
reserve_bootmem 21ff7c80 1180
reserve_bootmem 21ff8e58 71a4
No ramdisk, default root is /dev/sda2
EEH: No capable adapters found
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->   139264
  Normal     139264 ->   139264
early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->    90112
    1:    90112 ->   139264
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 136576
Kernel command line: root=/dev/sda3  xmon=on  numa=debug
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 530256k/557056k available (5508k kernel code, 30468k reserved,
2224k data, 543k bss, 244k init)
kernel BUG in __cache_alloc_node
at /development/kernels/linux-2.6.git/mm/slab.c:3178!
cpu 0x0: Vector: 700 (Program Check) at [c0000000007938d0]
    pc: c0000000000b3c78: .__cache_alloc_node+0x44/0x1e8
    lr: c0000000000b3ed4: .fallback_alloc+0xb8/0xfc
    sp: c000000000793b50
   msr: 8000000000021032
  current = 0xc000000000583a90
  paca    = 0xc000000000584300
    pid   = 0, comm = swapper
kernel BUG in __cache_alloc_node
at /development/kernels/linux-2.6.git/mm/slab.c:3178!
enter ? for help
[c000000000793c00] c0000000000b3ed4 .fallback_alloc+0xb8/0xfc
[c000000000793ca0] c0000000000b4484 .kmem_cache_zalloc+0xc8/0x11c
[c000000000793d40] c0000000000b6630 .kmem_cache_create+0x1e8/0x5e0
[c000000000793e30] c00000000053e834 .kmem_cache_init+0x1d8/0x4b0
[c000000000793ef0] c000000000524748 .start_kernel+0x244/0x328
[c000000000793f90] c0000000000084f8 .start_here_common+0x54/0x5c
0:mon>                          




* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 20:57     ` Will Schmidt
@ 2006-10-13 21:22       ` Nathan Lynch
  2006-10-13 21:34         ` Anton Blanchard
  2006-10-13 22:01         ` Mike Kravetz
  2006-10-13 22:22       ` Christoph Lameter
  1 sibling, 2 replies; 47+ messages in thread
From: Nathan Lynch @ 2006-10-13 21:22 UTC (permalink / raw)
  To: Will Schmidt; +Cc: Christoph Lameter, linuxppc-dev, linux-kernel

Will Schmidt wrote:
> On Fri, 2006-13-10 at 14:53 -0500, Will Schmidt wrote:
> > On Fri, 2006-13-10 at 12:05 -0700, Christoph Lameter wrote:
> > > On Fri, 13 Oct 2006, Will Schmidt wrote:
> > > 
> > > >     Am seeing a crash on a power5 LPAR when booting the linux-2.6 git
> > > > tree.  It's fairly early during boot, so I've included the whole log
> > > > below.   This partition has 8 procs, (shared, including threads), and
> > > > 512M RAM.  
> > > 
> > > This looks like slab bootstrap. You are bootstrapping while having 
> > > zonelists build with zones that are only going to be populated later? 
> > > This will lead to incorrect NUMA placement of lots of slab structures on 
> > > bootup.
> > 
> > I dont think so..   but it's not an area I'm very familiar with.   one
> > of the other PPC folks might chime in with something here.  
> > 
> > > 
> > > Check if the patch below may cure the oops. Your memory is likely 
> > > still placed on the wrong numa nodes since we have to fallback from 
> > > the intended node.
> > 
> > Nope, no change with this patch.
> > 
> 
> Here is another boot log, with that patch applied, and with a numa=debug
> parm. 
> 
> -----------------------------------------------------
> ppc64_pft_size                = 0x18
> physicalMemorySize            = 0x22000000
> ppc64_caches.dcache_line_size = 0x80
> ppc64_caches.icache_line_size = 0x80
> htab_address                  = 0x0000000000000000
> htab_hash_mask                = 0x1ffff
> -----------------------------------------------------
> Linux version 2.6.19-rc1-gb8a3ad5b-dirty (willschm@airbag2) (gcc version
> 4.1.0 (SUSE Linux)) #60 SMP Fri Oct 13 14:48:20 CDT 2006
> [boot]0012 Setup Arch
> NUMA associativity depth for CPU/Memory: 3
> adding cpu 0 to node 0
> node 0
> NODE_DATA() = c000000015ffee80
> start_paddr = 8000000
> end_paddr = 16000000
> bootmap_paddr = 15ffc000
> reserve_bootmem ffc0000 40000
> reserve_bootmem 15ffc000 2000
> reserve_bootmem 15ffee80 1180
> node 1
> NODE_DATA() = c000000021ff7c80
> start_paddr = 0
> end_paddr = 22000000

Strange, node 0 appears to be in the middle of node 1.
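
For illustration, the overlap can be checked directly from the
start_paddr/end_paddr values in the numa=debug log. This is only a
userspace sketch, not kernel code; struct node_span and span_contains()
are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical address span of a node as a half-open interval [start, end).
 * Hypothetical type for illustration; not a kernel structure. */
struct node_span {
	uint64_t start_paddr;
	uint64_t end_paddr;
};

/* Return true if 'inner' lies entirely within 'outer'. */
static bool span_contains(struct node_span outer, struct node_span inner)
{
	return outer.start_paddr <= inner.start_paddr &&
	       inner.end_paddr <= outer.end_paddr;
}

/* Values copied from the numa=debug boot log quoted above. */
static const struct node_span node0 = { 0x8000000ULL, 0x16000000ULL };
static const struct node_span node1 = { 0x0ULL, 0x22000000ULL };
```

With these values span_contains(node1, node0) is true: node 0's span
0x8000000-0x16000000 sits inside the span 0x0-0x22000000 reported for
node 1.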


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 21:22       ` Nathan Lynch
@ 2006-10-13 21:34         ` Anton Blanchard
  2006-10-13 22:01         ` Mike Kravetz
  1 sibling, 0 replies; 47+ messages in thread
From: Anton Blanchard @ 2006-10-13 21:34 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Will Schmidt, linuxppc-dev, linux-kernel, Christoph Lameter


Hi,

> Strange, node 0 appears to be in the middle of node 1.

It's an odd setup and may be a firmware issue, but I've seen it a number
of times on POWER5 boxes.

Anton


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 21:22       ` Nathan Lynch
  2006-10-13 21:34         ` Anton Blanchard
@ 2006-10-13 22:01         ` Mike Kravetz
  1 sibling, 0 replies; 47+ messages in thread
From: Mike Kravetz @ 2006-10-13 22:01 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Will Schmidt, linuxppc-dev, linux-kernel, Christoph Lameter

On Fri, Oct 13, 2006 at 04:22:02PM -0500, Nathan Lynch wrote:
> Will Schmidt wrote:
> > NUMA associativity depth for CPU/Memory: 3
> > adding cpu 0 to node 0
> > node 0
> > NODE_DATA() = c000000015ffee80
> > start_paddr = 8000000
> > end_paddr = 16000000
> > bootmap_paddr = 15ffc000
> > reserve_bootmem ffc0000 40000
> > reserve_bootmem 15ffc000 2000
> > reserve_bootmem 15ffee80 1180
> > node 1
> > NODE_DATA() = c000000021ff7c80
> > start_paddr = 0
> > end_paddr = 22000000
> 
> Strange, node 0 appears to be in the middle of node 1.

IIRC, this is fairly common.  Or, it was on the system/LPAR I had access
to.  I'd check again, but I lost easy access to that system. :(

-- 
Mike


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 20:57     ` Will Schmidt
  2006-10-13 21:22       ` Nathan Lynch
@ 2006-10-13 22:22       ` Christoph Lameter
  2006-10-16 16:00         ` Will Schmidt
  2006-10-16 19:20         ` Will Schmidt
  1 sibling, 2 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-13 22:22 UTC (permalink / raw)
  To: Will Schmidt; +Cc: linuxppc-dev, linux-kernel

Here is another fallback fix that checks whether the slab has already
been set up for this node. MPOL_INTERLEAVE could redirect the allocation.

Index: linux-2.6.19-rc1-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc1-mm1.orig/mm/slab.c	2006-10-10 21:47:12.949563383 -0500
+++ linux-2.6.19-rc1-mm1/mm/slab.c	2006-10-13 17:21:31.937863714 -0500
@@ -3158,12 +3158,15 @@ void *fallback_alloc(struct kmem_cache *
 	struct zone **z;
 	void *obj = NULL;
 
-	for (z = zonelist->zones; *z && !obj; z++)
+	for (z = zonelist->zones; *z && !obj; z++) {
+		int nid = zone_to_nid(*z);
+
 		if (zone_idx(*z) <= ZONE_NORMAL &&
-				cpuset_zone_allowed(*z, flags))
+				cpuset_zone_allowed(*z, flags) &&
+				cache->nodelists[nid])
 			obj = __cache_alloc_node(cache,
-					flags | __GFP_THISNODE,
-					zone_to_nid(*z));
+					flags | __GFP_THISNODE, nid);
+	}
 	return obj;
 }
 



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 22:22       ` Christoph Lameter
@ 2006-10-16 16:00         ` Will Schmidt
  2006-10-16 19:20         ` Will Schmidt
  1 sibling, 0 replies; 47+ messages in thread
From: Will Schmidt @ 2006-10-16 16:00 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, linux-kernel

On Fri, 2006-13-10 at 15:22 -0700, Christoph Lameter wrote:
> Here is another fall back fix checking if the slab has already been setup 
> for this node. MPOL_INTERLEAVE could redirect the allocation.
> 

With this patch applied, I get a different error in the same area:

freeing bootmem node 0
freeing bootmem node 1
Memory: 530256k/557056k available (5508k kernel code, 30468k reserved,
2224k data, 543k bss, 244k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab
`size-32'



> Index: linux-2.6.19-rc1-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.19-rc1-mm1.orig/mm/slab.c	2006-10-10 21:47:12.949563383 -0500
> +++ linux-2.6.19-rc1-mm1/mm/slab.c	2006-10-13 17:21:31.937863714 -0500
> @@ -3158,12 +3158,15 @@ void *fallback_alloc(struct kmem_cache *
>  	struct zone **z;
>  	void *obj = NULL;
> 
> -	for (z = zonelist->zones; *z && !obj; z++)
> +	for (z = zonelist->zones; *z && !obj; z++) {
> +		int nid = zone_to_nid(*z);
> +
>  		if (zone_idx(*z) <= ZONE_NORMAL &&
> -				cpuset_zone_allowed(*z, flags))
> +				cpuset_zone_allowed(*z, flags) &&
> +				cache->nodelists[nid])
>  			obj = __cache_alloc_node(cache,
> -					flags | __GFP_THISNODE,
> -					zone_to_nid(*z));
> +					flags | __GFP_THISNODE, nid);
> +	}
>  	return obj;
>  }
> 
> 



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-13 22:22       ` Christoph Lameter
  2006-10-16 16:00         ` Will Schmidt
@ 2006-10-16 19:20         ` Will Schmidt
  2006-10-16 19:25           ` Christoph Lameter
  1 sibling, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-16 19:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, linux-kernel


Here is the content of /sys/devices/system/node/*  and /proc/meminfo.  
This is from the same partition, booted with a 2.6.16-ish distro
kernel. 

Notice that the node1/meminfo MemUsed value seems just a little bit
elevated.  MemFree being larger than MemTotal seems a bit wrong too.

14:07:43 0 willschm@airbag2:~> find /sys/devices/system/node -type f
-print -exec cat {} \;
/sys/devices/system/node/node1/distance
20 10
/sys/devices/system/node/node1/numastat
numa_hit 6279
numa_miss 141588
numa_foreign 0
interleave_hit 5218
local_node 0
other_node 147867
/sys/devices/system/node/node1/meminfo

Node 1 MemTotal:       327680 kB
Node 1 MemFree:        435704 kB
Node 1 MemUsed:      18446744073709443592 kB
Node 1 Active:          41412 kB
Node 1 Inactive:        19976 kB
Node 1 HighTotal:           0 kB
Node 1 HighFree:            0 kB
Node 1 LowTotal:       327680 kB
Node 1 LowFree:        435704 kB
Node 1 Dirty:               0 kB
Node 1 Writeback:           0 kB
Node 1 Mapped:              0 kB
Node 1 Slab:                0 kB
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
/sys/devices/system/node/node1/cpumap
00000000,00000000,00000000,00000000
/sys/devices/system/node/node0/distance
10 20
/sys/devices/system/node/node0/numastat
numa_hit 0
numa_miss 0
numa_foreign 141749
interleave_hit 0
local_node 0
other_node 0
/sys/devices/system/node/node0/meminfo

Node 0 MemTotal:       229376 kB
Node 0 MemFree:             0 kB
Node 0 MemUsed:        229376 kB
Node 0 Active:              0 kB
Node 0 Inactive:            0 kB
Node 0 HighTotal:           0 kB
Node 0 HighFree:            0 kB
Node 0 LowTotal:       229376 kB
Node 0 LowFree:             0 kB
Node 0 Dirty:               8 kB
Node 0 Writeback:           0 kB
Node 0 Mapped:          33940 kB
Node 0 Slab:            25500 kB
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
/sys/devices/system/node/node0/cpumap
00000000,00000000,00000000,000000ff

---
14:07:45 0 willschm@airbag2:~> cat /proc/meminfo
MemTotal:       531628 kB
MemFree:        436000 kB
Buffers:          2880 kB
Cached:          35156 kB
SwapCached:          0 kB
Active:          41364 kB
Inactive:        19976 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       531628 kB
LowFree:        436000 kB
SwapTotal:      803240 kB
SwapFree:       803240 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:          33776 kB
Slab:            25332 kB
CommitLimit:   1069052 kB
Committed_AS:    81980 kB
PageTables:       1088 kB
VmallocTotal: 8589934592 kB
VmallocUsed:      2560 kB
VmallocChunk: 8589931608 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:    16384 kB







* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-16 19:20         ` Will Schmidt
@ 2006-10-16 19:25           ` Christoph Lameter
  2006-10-16 20:50             ` Will Schmidt
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-16 19:25 UTC (permalink / raw)
  To: Will Schmidt; +Cc: linuxppc-dev, linux-kernel

On Mon, 16 Oct 2006, Will Schmidt wrote:

> Node 1 MemTotal:       327680 kB
> Node 1 MemFree:        435704 kB

Too big.

> Node 1 MemUsed:      18446744073709443592 kB

MemUsed is going negative?

> Node 1 Active:          41412 kB
> Node 1 Inactive:        19976 kB
> Node 1 HighTotal:           0 kB
> Node 1 HighFree:            0 kB
> Node 1 LowTotal:       327680 kB
> Node 1 LowFree:        435704 kB
> Node 1 Dirty:               0 kB
> Node 1 Writeback:           0 kB
> Node 1 Mapped:              0 kB
> Node 1 Slab:                0 kB

Zero slab? That cannot be. The slab allocator always allocates on each
node. Or is this <2.6.18 with the strange counters that we had before?


> Node 0 MemTotal:       229376 kB
> Node 0 MemFree:             0 kB
> Node 0 MemUsed:        229376 kB

Node 0 is filled up during bootup?
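
The huge MemUsed figure quoted above is what an unsigned 64-bit
subtraction produces when MemFree exceeds MemTotal. The sketch below
assumes MemUsed is derived as MemTotal - MemFree in unsigned arithmetic;
mem_used_kb() is a made-up name for this example, not a kernel function.

```c
#include <stdint.h>

/* Assumed derivation: used = total - free, in unsigned 64-bit
 * arithmetic. When free_kb > total_kb the result wraps modulo 2^64
 * instead of going negative. */
static uint64_t mem_used_kb(uint64_t total_kb, uint64_t free_kb)
{
	return total_kb - free_kb;
}
```

With the node 1 values from the report (MemTotal 327680 kB, MemFree
435704 kB) this yields 18446744073709443592, which is 2^64 - 108024,
i.e. -108024 kB read as an unsigned number.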



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-16 19:25           ` Christoph Lameter
@ 2006-10-16 20:50             ` Will Schmidt
  2006-10-16 23:37               ` Christoph Lameter
  0 siblings, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-16 20:50 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, linux-kernel

On Mon, 2006-16-10 at 12:25 -0700, Christoph Lameter wrote:


> zero slab??? That cannot be. The slab allocator always allocs on each 
> node. Or is this <2.6.18 with the strange counters that we had before?

This is output from 2.6.18-rc2.   MemFree, MemTotal, MemUsed still
wrong.  Node0 slab is still zero.  I've also attached the numa=debug
boot log from this boot, in case it has any clues that were missing from
the other boot log. 

15:40:53 0 willschm@airbag2:~> find /sys/devices/system/node/ -type f -exec cat {} \;
20 10
numa_hit 4952
numa_miss 152776
numa_foreign 0
interleave_hit 3176
local_node 0
other_node 157761

Node 1 MemTotal:       327680 kB
Node 1 MemFree:        441136 kB
Node 1 MemUsed:      18446744073709438160 kB
Node 1 Active:          39008 kB
Node 1 Inactive:        18040 kB
Node 1 HighTotal:           0 kB
Node 1 HighFree:            0 kB
Node 1 LowTotal:       327680 kB
Node 1 LowFree:        441136 kB
Node 1 Dirty:               0 kB
Node 1 Writeback:           0 kB
Node 1 FilePages:       39868 kB
Node 1 Mapped:          15080 kB
Node 1 AnonPages:       17172 kB
Node 1 PageTables:        956 kB
Node 1 NFS Unstable:        0 kB
Node 1 Bounce:              0 kB
Node 1 Slab:            26036 kB
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
00000000,00000000,00000000,00000000
10 20
numa_hit 0
numa_miss 0
numa_foreign 152941
interleave_hit 0
local_node 0
other_node 0

Node 0 MemTotal:       229376 kB
Node 0 MemFree:             0 kB
Node 0 MemUsed:        229376 kB
Node 0 Active:              0 kB
Node 0 Inactive:            0 kB
Node 0 HighTotal:           0 kB
Node 0 HighFree:            0 kB
Node 0 LowTotal:       229376 kB
Node 0 LowFree:             0 kB
Node 0 Dirty:               0 kB
Node 0 Writeback:           0 kB
Node 0 FilePages:           0 kB
Node 0 Mapped:              0 kB
Node 0 AnonPages:           0 kB
Node 0 PageTables:          0 kB
Node 0 NFS Unstable:        0 kB
Node 0 Bounce:              0 kB
Node 0 Slab:                0 kB
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
00000000,00000000,00000000,000000ff
15:40:57 0 willschm@airbag2:~>                

-----------------------------------------------------
ppc64_pft_size                = 0x18
physicalMemorySize            = 0x22000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address                  = 0x0000000000000000
htab_hash_mask                = 0x1ffff
-----------------------------------------------------
Linux version 2.6.18-rc2 (willschm@airbag2) (gcc version 4.1.0 (SUSE
Linux)) #1 SMP Mon Oct 16 15:27:37 CDT 2006
[boot]0012 Setup Arch
NUMA associativity depth for CPU/Memory: 3
add_region nid 1 start_pfn 0x0 pages 0x8000
add_region nid 0 start_pfn 0x8000 pages 0x2000
add_region nid 0 start_pfn 0xa000 pages 0x2000
add_region nid 0 start_pfn 0xc000 pages 0x2000
add_region nid 0 start_pfn 0xe000 pages 0x2000
add_region nid 0 start_pfn 0x10000 pages 0x2000
add_region nid 0 start_pfn 0x12000 pages 0x2000
add_region nid 0 start_pfn 0x14000 pages 0x2000
add_region nid 1 start_pfn 0x16000 pages 0x2000
add_region nid 1 start_pfn 0x18000 pages 0x2000
add_region nid 1 start_pfn 0x1a000 pages 0x2000
add_region nid 1 start_pfn 0x1c000 pages 0x2000
add_region nid 1 start_pfn 0x1e000 pages 0x2000
add_region nid 1 start_pfn 0x20000 pages 0x2000
Node 0 Memory: 0x8000000-0x16000000
Node 1 Memory: 0x0-0x8000000 0x16000000-0x22000000
adding cpu 0 to node 0
node 0
NODE_DATA() = c000000015ffd780
start_paddr = 8000000
end_paddr = 16000000
bootmap_paddr = 15ffb000
free_bootmem 8000000 e000000
reserve_bootmem ffc0000 40000
reserve_bootmem 15ffb000 2000
reserve_bootmem 15ffd780 2880
node 1
NODE_DATA() = c000000021ff6580
start_paddr = 0
end_paddr = 22000000
bootmap_paddr = 21ff1000
free_bootmem 0 8000000
free_bootmem 16000000 c000000
reserve_bootmem 0 802000
reserve_bootmem 2606000 9000
reserve_bootmem 77b2000 84e000
reserve_bootmem 21ff1000 5000
reserve_bootmem 21ff6580 2880
reserve_bootmem 21ff8e58 71a4
No ramdisk, default root is /dev/sda2
EEH: No capable adapters found
PPC64 nvram contains 7168 bytes
Using shared processor idle loop
free_area_init node 0 e000 8000 (hole: 0)
On node 0 totalpages: 57344
  DMA zone: 57344 pages, LIFO batch:15
free_area_init node 1 22000 0 (hole: e000)
On node 1 totalpages: 81920
  DMA zone: 81920 pages, LIFO batch:15
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 139264




* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-16 20:50             ` Will Schmidt
@ 2006-10-16 23:37               ` Christoph Lameter
  2006-10-18  6:11                 ` Paul Mackerras
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-16 23:37 UTC (permalink / raw)
  To: Will Schmidt; +Cc: linuxppc-dev, linux-kernel

On Mon, 16 Oct 2006, Will Schmidt wrote:

> This is output from 2.6.18-rc2.   MemFree, MemTotal, MemUsed still
> wrong.  Node0 slab is still zero.  I've also attached the numa=debug
> boot log from this boot, in case it has any clues that were missing from
> the other boot log. 

It looks as if node 0 is already full on bootup. The new code in 2.6.19
controls locality more strictly in the slab. 2.6.18 and earlier
tolerated a request for a page from the slab allocator for node 0
returning memory on node 1, even if node 1 had not been bootstrapped
yet. But this caused a problem in the slab because the node lists
dedicated to node 0 then had memory from node 1 in them (which led to
latency problems, since slab code subsequently assumes that node-local
memory is very fast, which is no longer true with corrupted per-node
lists).

You must bootstrap on a node that has memory available. Bootstrapping
the slab on node 1 would work.

> Node 0 MemTotal:       229376 kB
> Node 0 MemUsed:        229376 kB

^^^^^ This node should not be full!!!

Increase memory on node 0 so that the slab can bootstrap.
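
As a rough illustration of "bootstrap on a node that has memory
available", here is a userspace sketch that picks the first node with a
non-zero MemFree. It is not how the kernel selects its bootstrap node;
first_node_with_free_memory() is a hypothetical helper, and the inputs
would be the per-node MemFree values from the sysfs output quoted
earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first node whose free
 * memory is non-zero, or -1 if every node is full. */
static int first_node_with_free_memory(const uint64_t *free_kb,
				       size_t nr_nodes)
{
	size_t nid;

	for (nid = 0; nid < nr_nodes; nid++)
		if (free_kb[nid] > 0)
			return (int)nid;
	return -1;
}
```

Given the 2.6.18-rc2 figures (node 0 MemFree 0 kB, node 1 MemFree
441136 kB), the helper returns 1.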


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-16 23:37               ` Christoph Lameter
@ 2006-10-18  6:11                 ` Paul Mackerras
  2006-10-18 15:12                   ` Christoph Lameter
  2006-10-18 16:06                   ` Christoph Lameter
  0 siblings, 2 replies; 47+ messages in thread
From: Paul Mackerras @ 2006-10-18  6:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

Christoph,

I also am hitting this BUG on a POWER5 partition.  The relevant boot
messages are:

Zone PFN ranges:
  DMA             0 ->   524288
  Normal     524288 ->   524288
early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->   278528
    1:   278528 ->   524288
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 513760
Kernel command line: root=/dev/sdc3
[snip]
freeing bootmem node 0
freeing bootmem node 1
Memory: 2046852k/2097152k available (5512k kernel code, 65056k reserved, 2204k data, 554k bss, 256k init)
kernel BUG in __cache_alloc_node at /home/paulus/kernel/powerpc/mm/slab.c:3177!

Since this is a virtualized system there is every possibility that the
memory we get won't be divided into nodes in the nice neat manner you
seem to be expecting.  It just depends on what memory the hypervisor
has free, and on what nodes, when the partition is booted.

In other words, the assumption that node pfn ranges won't overlap is
completely untenable for us.

Linus' tree is currently broken for us.  Any suggestions for how to
fix it, since I am not very familiar with the NUMA code?

Paul.


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18  6:11                 ` Paul Mackerras
@ 2006-10-18 15:12                   ` Christoph Lameter
  2006-10-18 21:19                     ` Paul Mackerras
  2006-10-18 16:06                   ` Christoph Lameter
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-18 15:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

On Wed, 18 Oct 2006, Paul Mackerras wrote:

> Since this is a virtualized system there is every possibility that the
> memory we get won't be divided into nodes in the nice neat manner you
> seem to be expecting.  It just depends on what memory the hypervisor
> has free, and on what nodes, when the partition is booted.

The only expectation is that memory is available on the node that you are 
bootstrapping the slab allocator from.
 
> In other words, the assumption that node pfn ranges won't overlap is
> completely untenable for us.

That does not matter for this problem.
 
> Linus' tree is currently broken for us.  Any suggestions for how to
> fix it, since I am not very familiar with the NUMA code?

Have memory available for slab bootstrap on node 0? Or modify the boot 
code in such a way that it runs on node 1 or any other node that has 
memory available.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18 15:12                   ` Christoph Lameter
@ 2006-10-18 21:19                     ` Paul Mackerras
  2006-10-18 21:26                       ` Christoph Lameter
  2006-10-18 21:49                       ` Christoph Lameter
  0 siblings, 2 replies; 47+ messages in thread
From: Paul Mackerras @ 2006-10-18 21:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

Christoph Lameter writes:

> > Linus' tree is currently broken for us.  Any suggestions for how to
> > fix it, since I am not very familiar with the NUMA code?
> 
> Have memory available for slab boot strap on node 0? Or modify the boot 
> code in such a way that it runs on node 1 or any other node that has 
> memory available.

OK, then I don't understand.  There is about 1GB of memory on node 0,
which is about half of the partition's memory, and it is even in a
contiguous chunk, but it doesn't start at pfn 0:

early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->   278528
    1:   278528 ->   524288

So it's not that node 0 doesn't have any pages.  Any other clues?

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18 21:19                     ` Paul Mackerras
@ 2006-10-18 21:26                       ` Christoph Lameter
  2006-10-18 21:49                       ` Christoph Lameter
  1 sibling, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-18 21:26 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

On Thu, 19 Oct 2006, Paul Mackerras wrote:

> > Have memory available for slab boot strap on node 0? Or modify the boot 
> > code in such a way that it runs on node 1 or any other node that has 
> > memory available.
> 
> OK, then I don't understand.  There is about 1GB of memory on node 0,
> which is about half of the partition's memory, and it is even in a
> contiguous chunk, but it doesn't start at pfn 0:

And the memory is available? In some messages it showed that all of node 0 
memory was allocated on bootup! We end up in fallback_alloc which means 
that an allocation attempt failed to obtain memory. Could you figure out 
what exactly we are trying to allocate? Add some printk's? Why do we 
fallback?

> So it's not that node 0 doesn't have any pages.  Any other clues?

We are falling back. So something is going wrong. Either we request memory 
from an overallocated node or the page allocator for some other reason is 
not giving us the requested memory. If we figure out why then the fix is 
probably very simple.

I have no way of investigating the issue except by conjecture and code 
review since I have no ppc hardware.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18 21:19                     ` Paul Mackerras
  2006-10-18 21:26                       ` Christoph Lameter
@ 2006-10-18 21:49                       ` Christoph Lameter
  2006-10-19  5:03                         ` Paul Mackerras
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-18 21:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

Here is a patch to add some printks to try to figure out what is going on. 
Run with this and send me the console output leading up to the failure.


Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-17 18:43:47.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-18 16:47:42.904912835 -0500
@@ -2005,6 +2005,7 @@ static int setup_cpu_cache(struct kmem_c
 		return enable_cpucache(cachep);
 
 	if (g_cpucache_up == NONE) {
+		printk(KERN_CRIT "setup_cpu_cache: NONE\n");
 		/*
 		 * Note: the first kmem_cache_create must create the cache
 		 * that's used by kmalloc(24), otherwise the creation of
@@ -2023,6 +2024,7 @@ static int setup_cpu_cache(struct kmem_c
 		else
 			g_cpucache_up = PARTIAL_AC;
 	} else {
+		printk(KERN_CRIT "setup_cpu_cache: PARTIAL\n");
 		cachep->array[smp_processor_id()] =
 			kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
 
@@ -2219,6 +2221,7 @@ kmem_cache_create (const char *name, siz
 	align = ralign;
 
 	/* Get cache's description obj. */
+	printk(KERN_CRIT "Get cache descritor\n");
 	cachep = kmem_cache_zalloc(&cache_cache, SLAB_KERNEL);
 	if (!cachep)
 		goto oops;
@@ -3082,6 +3085,7 @@ static inline void *____cache_alloc(stru
 	void *objp;
 	struct array_cache *ac;
 
+	printk(KERN_CRIT "__cache_alloc\n");
 	check_irq_off();
 	ac = cpu_cache_get(cachep);
 	if (likely(ac->avail)) {
@@ -3135,6 +3139,7 @@ static void *alternate_node_alloc(struct
 {
 	int nid_alloc, nid_here;
 
+	printk(KERN_CRIT "alternate_node_alloc\n");
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
 	nid_alloc = nid_here = numa_node_id();
@@ -3160,6 +3165,7 @@ void *fallback_alloc(struct kmem_cache *
 	struct zone **z;
 	void *obj = NULL;
 
+	printk(KERN_CRIT "fallback_alloc\n");
 	for (z = zonelist->zones; *z && !obj; z++)
 		if (zone_idx(*z) <= ZONE_NORMAL &&
 				cpuset_zone_allowed(*z, flags))
@@ -3181,6 +3187,8 @@ static void *__cache_alloc_node(struct k
 	void *obj;
 	int x;
 
+	printk("__cache_alloc_node %d\n", nodeid);
+
 	l3 = cachep->nodelists[nodeid];
 	BUG_ON(!l3);
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18 21:49                       ` Christoph Lameter
@ 2006-10-19  5:03                         ` Paul Mackerras
  2006-10-19 16:16                           ` Christoph Lameter
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2006-10-19  5:03 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

Christoph Lameter writes:

> Here is patch to add some printk to try to figure out what is going on. 
> Run with this and send me the console output leading up to the failure.

Here...  Thanks for your help on this.  I'll poke a bit further.

Linux version 2.6.19-rc2-test (paulus@drongo) (gcc version 4.1.2 20060928 (prerelease) (Debian 4.1.1-15)) #37 SMP Thu Oct 19 14:05:18 EST 2006
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->   524288
  Normal     524288 ->   524288
early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->   278528
    1:   278528 ->   524288
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 513760
Kernel command line: root=/dev/sdc3
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 2046852k/2097152k available (5512k kernel code, 65056k reserved, 2204k data, 554k bss, 256k init)
Get cache descritor
__cache_alloc
__cache_alloc_node 0
fallback_alloc
__cache_alloc_node 0
__cache_alloc_node 1
kernel BUG in __cache_alloc_node at /home/paulus/kernel/powerpc/mm/slab.c:3185!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19  5:03                         ` Paul Mackerras
@ 2006-10-19 16:16                           ` Christoph Lameter
  2006-10-19 16:30                             ` Anton Blanchard
  2006-10-19 20:38                             ` Will Schmidt
  0 siblings, 2 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 16:16 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

On Thu, 19 Oct 2006, Paul Mackerras wrote:

> Get cache descritor

Attempt to allocate the first descriptor for the first cache.

> __cache_alloc

Attempt to allocate from the caches of node 0 (which are empty on 
bootstrap). We try to replenish the caches of node 0 which should have 
succeeded. I guess that this failed due to no pages available on 
node 0. This should not happen!

It worked before 2.6.19 because the slab allocator allowed the page 
allocator to fallback to node 1. However, we then put pages from node 1 
on the per node lists for node 0. This was fixed in 2.6.19 using 
GFP_THISNODE.

> __cache_alloc_node 0

Now we go to __cache_alloc_node because it knows how to get memory from 
different nodes (we should not get here at all; there should be memory on 
node 0!).

> fallback_alloc

We failed another attempt to get memory from node 0. Now we are going down 
the zonelist.

> __cache_alloc_node 0

First attempt on node 0 (the head of the fallback list) which again has no 
pages available.

> __cache_alloc_node 1

Attempt to allocate from node 1 (second zone on the fallback list)

> kernel BUG in __cache_alloc_node at /home/paulus/kernel/powerpc/mm/slab.c:3185!

Node 1 has not been setup yet since we have not completed bootstrap so we 
BUG out.

Would you please make memory available on the node that you bootstrap 
the slab allocator on? numa_node_id() must point to a node that has memory 
available.
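
The path traced above can be modelled in a few lines of user-space C. This
is only a toy sketch, not the code in mm/slab.c: toy_node, toy_alloc_node
and toy_fallback_alloc are hypothetical names, and the BUG_ON is turned
into a return code so the outcome can be inspected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node slab state during bootstrap. */
struct toy_node {
	bool nodelist_ready;	/* has cachep->nodelists[nid] been set up? */
	long free_pages;	/* pages the page allocator can hand out */
};

enum toy_result { TOY_OK, TOY_NO_MEMORY, TOY_BUG };

/* Models __cache_alloc_node(): a missing node list is fatal. */
static enum toy_result toy_alloc_node(struct toy_node *nodes, int nid)
{
	if (!nodes[nid].nodelist_ready)
		return TOY_BUG;		/* corresponds to BUG_ON(!l3) */
	if (nodes[nid].free_pages == 0)
		return TOY_NO_MEMORY;	/* strict: no borrowing from other nodes */
	nodes[nid].free_pages--;
	return TOY_OK;
}

/* Models fallback_alloc(): try each node in zonelist order. */
static enum toy_result toy_fallback_alloc(struct toy_node *nodes,
					  const int *zonelist, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		enum toy_result r = toy_alloc_node(nodes, zonelist[i]);

		if (r != TOY_NO_MEMORY)
			return r;	/* success, or the simulated BUG */
	}
	return TOY_NO_MEMORY;
}
```

With node 0 set up but empty and node 1 not yet set up, the fallback walk
visits node 0, then node 1, and ends in the simulated BUG, matching the
"__cache_alloc_node 0" / "__cache_alloc_node 1" lines in the log.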

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 16:16                           ` Christoph Lameter
@ 2006-10-19 16:30                             ` Anton Blanchard
  2006-10-19 16:49                               ` Christoph Lameter
  2006-10-19 17:03                               ` Christoph Lameter
  2006-10-19 20:38                             ` Will Schmidt
  1 sibling, 2 replies; 47+ messages in thread
From: Anton Blanchard @ 2006-10-19 16:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel


Hi,

> Would you please make memory available on the node that you bootstrap 
> the slab allocator on? numa_node_id() must point to a node that has memory 
> available.

So we've gone from something that worked around suboptimal memory
layouts to something that panics. Sounds like a step backwards to me.

Anton

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 16:30                             ` Anton Blanchard
@ 2006-10-19 16:49                               ` Christoph Lameter
  2006-10-19 22:23                                 ` Paul Mackerras
  2006-10-19 17:03                               ` Christoph Lameter
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 16:49 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel

On Fri, 20 Oct 2006, Anton Blanchard wrote:

> > Would you please make memory available on the node that you bootstrap 
> > the slab allocator on? numa_node_id() must point to a node that has memory 
> > available.
> 
> So we've gone from something that worked around sub optimal memory
> layouts to something that panics. Sounds like a step backwards to me.

Could you confirm that there is indeed no memory on node 0? 

The expectation to have memory available on the node that you 
bootstrap on is not unrealistic.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 16:49                               ` Christoph Lameter
@ 2006-10-19 22:23                                 ` Paul Mackerras
  2006-10-19 22:31                                   ` Christoph Lameter
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2006-10-19 22:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Anton Blanchard, akpm, linuxppc-dev, linux-kernel

Christoph Lameter writes:

> Could you confirm that there is indeed no memory on node 0? 

There is about a gigabyte of memory on node 0.

> The expectation to have memory available on the node that you 
> bootstrap on is not unrealistic.

What exactly does "available" mean in this context?  The console log I
posted earlier showed node 0 as having an active PFN range of 32768 -
278528 (245760 pages, or 960MB), and then showed a "freeing bootmem
node 0" message, *before* we hit the BUG.

If "available" doesn't mean "there are active pages which have been
given to the VM system via free_all_bootmem_node()", what does it
mean?
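
For reference, the arithmetic behind those figures, as a small sketch. The
helper names are made up for illustration, and the 4096-byte page size is
an assumption (it is consistent with 524288 pages being reported as
2097152k in the boot log).

```c
#include <stdint.h>

/* Number of pages in the half-open PFN range [start_pfn, end_pfn). */
static uint64_t pfn_range_pages(uint64_t start_pfn, uint64_t end_pfn)
{
	return end_pfn - start_pfn;
}

/* Size of that many pages in MiB, for a given page size in bytes. */
static uint64_t pages_to_mib(uint64_t pages, uint64_t page_size)
{
	return pages * page_size / (1024 * 1024);
}
```

pfn_range_pages(32768, 278528) gives 245760, and pages_to_mib(245760, 4096)
gives 960.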

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 22:23                                 ` Paul Mackerras
@ 2006-10-19 22:31                                   ` Christoph Lameter
  2006-10-20  7:18                                     ` Paul Mackerras
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 22:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Anton Blanchard, akpm, linuxppc-dev, linux-kernel

On Fri, 20 Oct 2006, Paul Mackerras wrote:

> What exactly does "available" mean in this context?  The console log I
> posted earlier showed node 0 as having an active PFN range of 32768 -
> 278528 (245760 pages, or 960MB), and then showed a "freeing bootmem
> node 0" message, *before* we hit the BUG.

Available in the sense that the page allocator can allocate from them. 
Will's console output shows that all memory of node 0 is allocated and not 
available.

> If "available" doesn't mean "there are active pages which have been
> given to the VM system via free_all_bootmem_node()", what does it
> mean?

The page allocator must be running and able to serve pages from the boot 
node. This fails for some reason and the slab cannot bootstrap. Memory not 
being available is the first guess. Could you trace the allocation in the 
page allocator (__alloc_pages) when the slab attempts to bootstrap and 
figure out why exactly the allocation fails?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 22:31                                   ` Christoph Lameter
@ 2006-10-20  7:18                                     ` Paul Mackerras
  2006-10-20 14:18                                       ` Andy Whitcroft
  2006-10-20 17:34                                       ` Christoph Lameter
  0 siblings, 2 replies; 47+ messages in thread
From: Paul Mackerras @ 2006-10-20  7:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Anton Blanchard, akpm, linuxppc-dev, linux-kernel

Christoph Lameter writes:

> The page allocator must be running and able to serve pages from the boot 
> node. This fails for some reason and the slab cannot bootstrap. The memory 
> not available is the first guess. Could you trace the allocation in the 
> page allocator (__alloc_pages) when the slab attempts to bootstrap and 
> figure out why exactly the allocation fails?

What is happening is that all pages are getting their zone id field in
their page->flags set to point to zone for node 1 by memmap_init_zone
calling set_page_links (which does set_page_zone).  Thus, when those
pages get freed by free_all_bootmem_node, they all end up in the zone
for node 1.

memmap_init_zone is called (as memmap_init, since we don't have
__HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
is called from free_area_init_core.  Now the thing is that memmap_init
and init_currently_empty_zone are called with the node's start PFN and
size in pages, *including* holes.  On the partition I'm using we have
these PFN ranges for the nodes:

    1:        0 ->    32768
    0:    32768 ->   278528
    1:   278528 ->   524288

So node 0's start PFN is 32768 and its size is 245760 pages, and so we
correctly set pages 32768 to 278527 to be in the zone for node 0.
Then for node 1, we have the start PFN is 0 and the size is 524288, so
we then go through and set *all* pages of memory to be in the zone for
node 1, including the pages which are actually on node 0.

That's why we can't allocate any pages on node 0, and the kmem cache
bootstrapping blows up.

I don't know this code well enough to know what the correct fix is.
Clearly memmap_init_zone should only be touching the pages that are
actually present in the zone, but I don't know exactly what data
structures it should be using to know what those pages are.
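
To make the overwrite concrete, here is a toy user-space model of it. It is
a sketch only: toy_page_nid stands in for the node/zone bits in
page->flags, and the toy_* functions are hypothetical, not the kernel's
memmap_init_zone.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_PAGES	524288u
#define TOY_NO_NODE	0xff

/* One owner per page, standing in for the node bits in page->flags. */
static uint8_t toy_page_nid[TOY_NR_PAGES];

static void toy_memmap_reset(void)
{
	memset(toy_page_nid, TOY_NO_NODE, sizeof(toy_page_nid));
}

/* Models the unpatched loop: every PFN in the span is claimed for nid. */
static void toy_memmap_init_span(int nid, uint32_t start_pfn, uint32_t size)
{
	for (uint32_t pfn = start_pfn; pfn < start_pfn + size; pfn++)
		toy_page_nid[pfn] = (uint8_t)nid;
}

/* Count how many pages currently claim to belong to nid. */
static uint32_t toy_count_pages(int nid)
{
	uint32_t n = 0;

	for (uint32_t pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		if (toy_page_nid[pfn] == nid)
			n++;
	return n;
}
```

Initialising node 0 with (32768, 245760) and then node 1 with (0, 524288)
leaves toy_count_pages(0) at zero: node 1's span, holes included, has
claimed every page.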

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20  7:18                                     ` Paul Mackerras
@ 2006-10-20 14:18                                       ` Andy Whitcroft
  2006-10-20 14:31                                         ` [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc Andy Whitcroft
                                                           ` (3 more replies)
  2006-10-20 17:34                                       ` Christoph Lameter
  1 sibling, 4 replies; 47+ messages in thread
From: Andy Whitcroft @ 2006-10-20 14:18 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Anton Blanchard, akpm, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz

Paul Mackerras wrote:
> Christoph Lameter writes:
> 
>> The page allocator must be running and able to serve pages from the boot 
>> node. This fails for some reason and the slab cannot bootstrap. The memory 
>> not available is the first guess. Could you trace the allocation in the 
>> page allocator (__alloc_pages) when the slab attempts to bootstrap and 
>> figure out why exactly the allocation fails?
> 
> What is happening is that all pages are getting their zone id field in
> their page->flags set to point to zone for node 1 by memmap_init_zone
> calling set_page_links (which does set_page_zone).  Thus, when those
> pages get freed by free_all_bootmem_node, they all end up in the zone
> for node 1.
> 
> memmap_init_zone is called (as memmap_init, since we don't have
> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
> is called from free_area_init_core.  Now the thing is that memmap_init
> and init_currently_empty_zone are called with the node's start PFN and
> size in pages, *including* holes.  On the partition I'm using we have
> these PFN ranges for the nodes:
> 
>     1:        0 ->    32768
>     0:    32768 ->   278528
>     1:   278528 ->   524288
> 
> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
> correctly set pages 32768 to 278527 to be in the zone for node 0.
> Then for node 1, we have the start PFN is 0 and the size is 524288, so
> we then go through and set *all* pages of memory to be in the zone for
> node 1, including the pages which are actually on node 0.
> 
> That's why we can't allocate any pages on node 0, and the kmem cache
> bootstrapping blows up.
> 
> I don't know this code well enough to know what the correct fix is.
> Clearly memmap_init_zone should only be touching the pages that are
> actually present in the zone, but I don't know exactly what data
> structures it should be using to know what those pages are.

Mel Gorman and I have been poking at this from different ends.  Mel from
the context of this thread and myself trying to fix a machine which was
exhibiting only 32MB of RAM in node 0 and the rest in node 1.

I remember that we used to have code to cope with this in the ppc64
architecture, indeed I remember reviewing it all that time ago.  Looking
at the current state of the tree it was removed in the two patches below
in mainline:
	"[PATCH] Remove SPAN_OTHER_NODES config definition"
	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"

These commits:
	f62859bb6871c5e4a8e591c60befc8caaf54db8c
	a94b3ab7eab4edcc9b2cb474b188f774c331adf7

I'll follow up to this email with the reversion patch we used in
testing.  It seems to sort this problem out at least, though now it's
blam'ing in ibmveth, so am retesting with yet another patch.  This patch
reverts the two patches above and updates the commentary on the Kconfig
entry.

-apw

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc
  2006-10-20 14:18                                       ` Andy Whitcroft
@ 2006-10-20 14:31                                         ` Andy Whitcroft
  2006-10-20 16:30                                           ` Mel Gorman
  2006-10-20 14:59                                         ` kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177! Mike Kravetz
                                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Andy Whitcroft @ 2006-10-20 14:31 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Anton Blanchard, akpm, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz, Andy Whitcroft

Reintroduce NODES_SPAN_OTHER_NODES for powerpc

Revert "[PATCH] Remove SPAN_OTHER_NODES config definition"
    This reverts commit f62859bb6871c5e4a8e591c60befc8caaf54db8c.
Revert "[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
    This reverts commit a94b3ab7eab4edcc9b2cb474b188f774c331adf7.

Also update the comments to indicate that this is still required
and where it's used.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8b69104..2bd9b7f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -751,6 +751,15 @@ config ARCH_MEMORY_PROBE
 	def_bool y
 	depends on MEMORY_HOTPLUG
 
+# Some NUMA nodes have memory ranges that span
+# other nodes.  Even though a pfn is valid and
+# between a node's start and end pfns, it may not
+# reside on that node.  See memmap_init_zone()
+# for details.
+config NODES_SPAN_OTHER_NODES
+	def_bool y
+	depends on NEED_MULTIPLE_NODES
+
 config PPC_64K_PAGES
 	bool "64k page size"
 	depends on PPC64
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 9828663..d2833c1 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -184,6 +184,7 @@ CONFIG_SPLIT_PTLOCK_CPUS=4
 CONFIG_MIGRATION=y
 CONFIG_RESOURCES_64BIT=y
 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
+CONFIG_NODES_SPAN_OTHER_NODES=y
 # CONFIG_PPC_64K_PAGES is not set
 CONFIG_SCHED_SMT=y
 CONFIG_PROC_DEVICETREE=y
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 59855b8..ed0762b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -674,6 +674,12 @@ #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_NODES_SPAN_OTHER_NODES
+#define early_pfn_in_nid(pfn, nid)	(early_pfn_to_nid(pfn) == (nid))
+#else
+#define early_pfn_in_nid(pfn, nid)	(1)
+#endif
+
 #ifndef early_pfn_valid
 #define early_pfn_valid(pfn)	(1)
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 40db96a..57fa189 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1688,6 +1688,8 @@ void __meminit memmap_init_zone(unsigned
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		if (!early_pfn_valid(pfn))
 			continue;
+		if (!early_pfn_in_nid(pfn, nid))
+			continue;
 		page = pfn_to_page(pfn);
 		set_page_links(page, zone, nid, pfn);
 		init_page_count(page);
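
The effect of the early_pfn_in_nid() check in the last hunk can be
illustrated in user space. This is a rough sketch under stated assumptions:
the sim_* names are hypothetical, sim_early_pfn_to_nid is a simple table
lookup standing in for the architecture's early_pfn_to_nid(), and the table
is the early_node_map[] from the boot log posted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_NR_PAGES	524288u
#define SIM_NO_NODE	0xff

struct sim_range {
	int nid;
	uint32_t start_pfn, end_pfn;	/* half-open: [start_pfn, end_pfn) */
};

/* early_node_map[] as printed in the boot log. */
static const struct sim_range sim_early_node_map[] = {
	{ 1,      0,  32768 },
	{ 0,  32768, 278528 },
	{ 1, 278528, 524288 },
};

static uint8_t sim_page_nid[SIM_NR_PAGES];

/* Stand-in for early_pfn_to_nid(): which node really owns this PFN? */
static int sim_early_pfn_to_nid(uint32_t pfn)
{
	size_t n = sizeof(sim_early_node_map) / sizeof(sim_early_node_map[0]);

	for (size_t i = 0; i < n; i++)
		if (pfn >= sim_early_node_map[i].start_pfn &&
		    pfn < sim_early_node_map[i].end_pfn)
			return sim_early_node_map[i].nid;
	return -1;
}

/* Models the patched loop: skip PFNs that belong to another node. */
static void sim_memmap_init(int nid, uint32_t start_pfn, uint32_t size)
{
	for (uint32_t pfn = start_pfn; pfn < start_pfn + size; pfn++) {
		if (sim_early_pfn_to_nid(pfn) != nid)
			continue;	/* the early_pfn_in_nid() test */
		sim_page_nid[pfn] = (uint8_t)nid;
	}
}

/* Count how many pages currently claim to belong to nid. */
static uint32_t sim_count_pages(int nid)
{
	uint32_t n = 0;

	for (uint32_t pfn = 0; pfn < SIM_NR_PAGES; pfn++)
		if (sim_page_nid[pfn] == nid)
			n++;
	return n;
}
```

With the check in place, initialising node 1 over its whole span
(0, 524288) no longer disturbs the 245760 pages that belong to node 0.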

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc
  2006-10-20 14:31                                         ` [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc Andy Whitcroft
@ 2006-10-20 16:30                                           ` Mel Gorman
  0 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2006-10-20 16:30 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Paul Mackerras, Christoph Lameter, Anton Blanchard, akpm,
	linuxppc-dev, linux-kernel, Mike Kravetz

On Fri, 20 Oct 2006, Andy Whitcroft wrote:

> Reintroduce NODES_SPAN_OTHER_NODES for powerpc
>
> Revert "[PATCH] Remove SPAN_OTHER_NODES config definition"
>    This reverts commit f62859bb6871c5e4a8e591c60befc8caaf54db8c.
> Revert "[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
>    This reverts commit a94b3ab7eab4edcc9b2cb474b188f774c331adf7.
>
> Also update the comments to indicate that this is still required
> and where its used.
>
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Acked-by: Mel Gorman <mel@csn.ul.ie>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 14:18                                       ` Andy Whitcroft
  2006-10-20 14:31                                         ` [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc Andy Whitcroft
@ 2006-10-20 14:59                                         ` Mike Kravetz
  2006-10-20 15:19                                         ` Will Schmidt
  2006-10-20 16:00                                         ` Andy Whitcroft
  3 siblings, 0 replies; 47+ messages in thread
From: Mike Kravetz @ 2006-10-20 14:59 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Paul Mackerras, Christoph Lameter, Anton Blanchard, akpm,
	linuxppc-dev, linux-kernel, Mel Gorman

On Fri, Oct 20, 2006 at 03:18:52PM +0100, Andy Whitcroft wrote:
> I remember that we used to have code to cope with this in the ppc64
> architecture, indeed I remember reviewing it all that time ago.  Looking
> at the current state of the tree it was removed in the two patches below
> in mainline:
> 	"[PATCH] Remove SPAN_OTHER_NODES config definition"
> 	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"

That was me.  Seem to remember some discussion that these were only
needed for DISCONTIGMEM, so I removed them when the DISCONTIGMEM option
for power went away.  But, that is clearly NOT the case.  Appears that
SPARSEMEM and the old slab code covered up the issue.  Sorry about that.

Thanks!
-- 
Mike

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 14:18                                       ` Andy Whitcroft
  2006-10-20 14:31                                         ` [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc Andy Whitcroft
  2006-10-20 14:59                                         ` kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177! Mike Kravetz
@ 2006-10-20 15:19                                         ` Will Schmidt
  2006-10-20 16:00                                         ` Andy Whitcroft
  3 siblings, 0 replies; 47+ messages in thread
From: Will Schmidt @ 2006-10-20 15:19 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Christoph Lameter, Anton Blanchard, akpm, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz

On Fri, 2006-20-10 at 15:18 +0100, Andy Whitcroft wrote:
> Paul Mackerras wrote:
> > Christoph Lameter writes:
> > 

I got dropped off the CC list somewhere..  :-(     

If something is bouncing, let me know; otherwise please don't do
that.


> Mel Gorman and I have been poking at this from different ends.  Mel from
> the context of this thread and myself trying to fix a machine which was
> exhibiting on 32MB of ram in node 0 and the rest in node 1.
> 
> I remember that we used to have code to cope with this in the ppc64
> architecture, indeed I remember reviewing it all that time ago.  Looking
> at the current state of the tree it was removed in the two patches below
> in mainline:
> 	"[PATCH] Remove SPAN_OTHER_NODES config definition"
> 	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
> 
> These commits:
> 	f62859bb6871c5e4a8e591c60befc8caaf54db8c
> 	a94b3ab7eab4edcc9b2cb474b188f774c331adf7
> 
> I'll follow up to this email with the reversion patch we used in
> testing.  It seems to sort this problem out at least, though now its
> blam'ing in ibmveth, so am retesting with yet another patch.  This patch
> reverts the two patches above and updates the commentry on the Kconfig
> entry.

I've got a couple LPARs that exhibit the problem, so can verify your
patch once I see it.. 

-Will


> 
> -apw


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 14:18                                       ` Andy Whitcroft
                                                           ` (2 preceding siblings ...)
  2006-10-20 15:19                                         ` Will Schmidt
@ 2006-10-20 16:00                                         ` Andy Whitcroft
  2006-10-20 17:09                                           ` Andrew Morton
  2006-10-20 17:13                                           ` Will Schmidt
  3 siblings, 2 replies; 47+ messages in thread
From: Andy Whitcroft @ 2006-10-20 16:00 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andy Whitcroft, Christoph Lameter, Anton Blanchard, akpm,
	linuxppc-dev, linux-kernel, Mel Gorman, Mike Kravetz,
	will_schmidt

Andy Whitcroft wrote:
> Paul Mackerras wrote:
>> Christoph Lameter writes:
>>
>>> The page allocator must be running and able to serve pages from the boot 
>>> node. This fails for some reason and the slab cannot bootstrap. The memory 
>>> not available is the first guess. Could you trace the allocation in the 
>>> page allocator (__alloc_pages) when the slab attempts to bootstrap and 
>>> figure out why exactly the allocation fails?
>> What is happening is that all pages are getting their zone id field in
>> their page->flags set to point to zone for node 1 by memmap_init_zone
>> calling set_page_links (which does set_page_zone).  Thus, when those
>> pages get freed by free_all_bootmem_node, they all end up in the zone
>> for node 1.
>>
>> memmap_init_zone is called (as memmap_init, since we don't have
>> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
>> is called from free_area_init_core.  Now the thing is that memmap_init
>> and init_currently_empty_zone are called with the node's start PFN and
>> size in pages, *including* holes.  On the partition I'm using we have
>> these PFN ranges for the nodes:
>>
>>     1:        0 ->    32768
>>     0:    32768 ->   278528
>>     1:   278528 ->   524288
>>
>> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
>> correctly set pages 32768 to 278527 to be in the zone for node 0.
>> Then for node 1, we have the start PFN is 0 and the size is 524288, so
>> we then go through and set *all* pages of memory to be in the zone for
>> node 1, including the pages which are actually on node 0.
>>
>> That's why we can't allocate any pages on node 0, and the kmem cache
>> bootstrapping blows up.
>>
>> I don't know this code well enough to know what the correct fix is.
>> Clearly memmap_init_zone should only be touching the pages that are
>> actually present in the zone, but I don't know exactly what data
>> structures it should be using to know what those pages are.
> 
> Mel Gorman and I have been poking at this from different ends.  Mel from
> the context of this thread and myself trying to fix a machine which was
> exhibiting on 32MB of ram in node 0 and the rest in node 1.
> 
> I remember that we used to have code to cope with this in the ppc64
> architecture, indeed I remember reviewing it all that time ago.  Looking
> at the current state of the tree it was removed in the two patches below
> in mainline:
> 	"[PATCH] Remove SPAN_OTHER_NODES config definition"
> 	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
> 
> These commits:
> 	f62859bb6871c5e4a8e591c60befc8caaf54db8c
> 	a94b3ab7eab4edcc9b2cb474b188f774c331adf7
> 
> I'll follow up to this email with the reversion patch we used in
> testing.  It seems to sort this problem out at least, though now it's
> blam'ing in ibmveth, so am retesting with yet another patch.  This patch
> reverts the two patches above and updates the commentary on the Kconfig
> entry.

Ok, I've just gotten a successful boot on this box for the first time in
like 15 git releases.  I needed the three patches below:

clameter-fallback_alloc_fix2 -- from earlier in this thread, under the
message ID below:
    <Pine.LNX.4.64.0610131515200.28279@schroedinger.engr.sgi.com>

Reintroduce-NODES_SPAN_OTHER_NODES-for-powerpc -- the patch I just
submitted, under the message ID below:
    <8a76dfd735e544016c5f04c98617b87d@pinky>

ibmveth-fix-index-increment-calculation -- this patch is already in -mm.

Feel free to take this as an ACK for the patches other than mine.

Acked-by: Andy Whitcroft <apw@shadowen.org>

-apw


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 16:00                                         ` Andy Whitcroft
@ 2006-10-20 17:09                                           ` Andrew Morton
  2006-10-20 17:46                                             ` Christoph Lameter
  2006-10-20 17:13                                           ` Will Schmidt
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2006-10-20 17:09 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Paul Mackerras, Christoph Lameter, Anton Blanchard, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz, will_schmidt, Jeff Garzik

On Fri, 20 Oct 2006 17:00:34 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> > I'll follow up to this email with the reversion patch we used in
> > testing.  It seems to sort this problem out at least, though now it's
> > blam'ing in ibmveth, so am retesting with yet another patch.  This patch
> > reverts the two patches above and updates the commentary on the Kconfig
> > entry.
> 
> Ok, I've just gotten a successful boot on this box for the first time in
> like 15 git releases.  I needed the three patches below:
> 
> clameter-fallback_alloc_fix2 -- from earlier in this thread, under the
> message ID below:
>     <Pine.LNX.4.64.0610131515200.28279@schroedinger.engr.sgi.com>

That's this:

Here is another fall back fix checking if the slab has already been setup 
for this node. MPOL_INTERLEAVE could redirect the allocation.

Index: linux-2.6.19-rc1-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc1-mm1.orig/mm/slab.c	2006-10-10 21:47:12.949563383 -0500
+++ linux-2.6.19-rc1-mm1/mm/slab.c	2006-10-13 17:21:31.937863714 -0500
@@ -3158,12 +3158,15 @@ void *fallback_alloc(struct kmem_cache *
 	struct zone **z;
 	void *obj = NULL;
 
-	for (z = zonelist->zones; *z && !obj; z++)
+	for (z = zonelist->zones; *z && !obj; z++) {
+		int nid = zone_to_nid(*z);
+
 		if (zone_idx(*z) <= ZONE_NORMAL &&
-				cpuset_zone_allowed(*z, flags))
+				cpuset_zone_allowed(*z, flags) &&
+				cache->nodelists[nid])
 			obj = __cache_alloc_node(cache,
-					flags | __GFP_THISNODE,
-					zone_to_nid(*z));
+					flags | __GFP_THISNODE, nid);
+	}
 	return obj;
 }
 

Christoph, can you please finalise and resend that?

> Reintroduce-NODES_SPAN_OTHER_NODES-for-powerpc -- the patch I just
> submitted, under the message ID below:
>     <8a76dfd735e544016c5f04c98617b87d@pinky>

OK, I got that.

> ibmveth-fix-index-increment-calculation -- this patch is already in -mm.

Normally a Jeff thing, but small-and-simple.  I'll send that in to Linus
today.



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 17:09                                           ` Andrew Morton
@ 2006-10-20 17:46                                             ` Christoph Lameter
  2006-10-20 18:07                                               ` Andy Whitcroft
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-20 17:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, Paul Mackerras, Anton Blanchard, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz, will_schmidt, Jeff Garzik

Here is the patch:

Slab: Do not fallback to nodes that have not been bootstrapped yet

The zonelist may contain zones of nodes that have not been bootstrapped 
and we will oops if we try to allocate from those zones. So check if the 
node information for the slab and the node has been set up before
attempting an allocation. If it has not been set up then skip that zone.

Usually we will not encounter this situation since the slab bootstrap
code avoids falling back before we have set up the respective nodes, but we
seem to have special needs for ppc.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-20 12:39:02.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-20 12:41:04.137684581 -0500
@@ -3160,12 +3160,15 @@ void *fallback_alloc(struct kmem_cache *
 	struct zone **z;
 	void *obj = NULL;
 
-	for (z = zonelist->zones; *z && !obj; z++)
+	for (z = zonelist->zones; *z && !obj; z++) {
+		int nid = zone_to_nid(*z);
+
 		if (zone_idx(*z) <= ZONE_NORMAL &&
-				cpuset_zone_allowed(*z, flags))
+				cpuset_zone_allowed(*z, flags) &&
+				cache->nodelists[nid])
 			obj = __cache_alloc_node(cache,
-					flags | __GFP_THISNODE,
-					zone_to_nid(*z));
+					flags | __GFP_THISNODE, nid);
+	}
 	return obj;
 }
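
The effect of the added cache->nodelists[nid] test can be sketched in
userspace. This is only a model under stated assumptions: struct
node_lists, struct cache_model and fallback_alloc_model() are hypothetical
stand-ins, the zonelist is reduced to an array of node ids, and the
zone_idx and cpuset checks from the real loop are left out.

```c
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for the per-node slab lists of one cache. */
struct node_lists {
	int free_objects;
};

/* Hypothetical stand-in for struct kmem_cache. */
struct cache_model {
	/* NULL until the slab bootstrap has set up this node */
	struct node_lists *nodelists[MAX_NODES];
};

/*
 * Walk the node ids in zonelist order and take an object from the first
 * node that is both bootstrapped and non-empty.  Returns the node id that
 * served the request, or -1 if no node could.
 */
int fallback_alloc_model(struct cache_model *cache,
			 const int *zonelist_nids, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int nid = zonelist_nids[i];

		/* the check the patch adds: skip nodes with no lists yet */
		if (!cache->nodelists[nid])
			continue;

		if (cache->nodelists[nid]->free_objects > 0) {
			cache->nodelists[nid]->free_objects--;
			return nid;
		}
	}
	return -1;
}
```

With node 0 left as NULL and one free object on node 1, a walk over
{0, 1} skips node 0 and is served from node 1; a second call returns -1.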
 



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 17:46                                             ` Christoph Lameter
@ 2006-10-20 18:07                                               ` Andy Whitcroft
  0 siblings, 0 replies; 47+ messages in thread
From: Andy Whitcroft @ 2006-10-20 18:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Paul Mackerras, Anton Blanchard, linuxppc-dev,
	linux-kernel, Mel Gorman, Mike Kravetz, will_schmidt, Jeff Garzik

Christoph Lameter wrote:
> Here is the patch:
> 
> Slab: Do not fallback to nodes that have not been bootstrapped yet
> 
> The zonelist may contain zones of nodes that have not been bootstrapped 
> and we will oops if we try to allocate from those zones. So check if the 
> node information for the slab and the node has been set up before
> attempting an allocation. If it has not been set up then skip that zone.
> 
> Usually we will not encounter this situation since the slab bootstrap
> code avoids falling back before we have set up the respective nodes, but we
> seem to have special needs for ppc.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.19-rc2-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-20 12:39:02.000000000 -0500
> +++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-20 12:41:04.137684581 -0500
> @@ -3160,12 +3160,15 @@ void *fallback_alloc(struct kmem_cache *
>  	struct zone **z;
>  	void *obj = NULL;
>  
> -	for (z = zonelist->zones; *z && !obj; z++)
> +	for (z = zonelist->zones; *z && !obj; z++) {
> +		int nid = zone_to_nid(*z);
> +
>  		if (zone_idx(*z) <= ZONE_NORMAL &&
> -				cpuset_zone_allowed(*z, flags))
> +				cpuset_zone_allowed(*z, flags) &&
> +				cache->nodelists[nid])
>  			obj = __cache_alloc_node(cache,
> -					flags | __GFP_THISNODE,
> -					zone_to_nid(*z));
> +					flags | __GFP_THISNODE, nid);
> +	}
>  	return obj;
>  }
>  
> 

Applied this and the previous version, diff says they are identical, so
my previous testing applies.

Acked-by: Andy Whitcroft <apw@shadowen.org>

-apw


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 16:00                                         ` Andy Whitcroft
  2006-10-20 17:09                                           ` Andrew Morton
@ 2006-10-20 17:13                                           ` Will Schmidt
  1 sibling, 0 replies; 47+ messages in thread
From: Will Schmidt @ 2006-10-20 17:13 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Paul Mackerras, Christoph Lameter, Anton Blanchard, akpm,
	linuxppc-dev, linux-kernel, Mel Gorman, Mike Kravetz

On Fri, 2006-20-10 at 17:00 +0100, Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> > Paul Mackerras wrote:
> >> Christoph Lameter writes:

> Ok, I've just gotten a successful boot on this box for the first time in
> like 15 git releases.  I needed the three patches below:
> 
> clameter-fallback_alloc_fix2 -- from earlier in this thread, under the
> message ID below:
>     <Pine.LNX.4.64.0610131515200.28279@schroedinger.engr.sgi.com>
> 
> Reintroduce-NODES_SPAN_OTHER_NODES-for-powerpc -- the patch I just
> submitted, under the message ID below:
>     <8a76dfd735e544016c5f04c98617b87d@pinky>
> 
> ibmveth-fix-index-increment-calculation -- this patch is already in -mm.
> 
> Feel free to take this as an ACK for the patches other than mine.
> 
> Acked-by: Andy Whitcroft <apw@shadowen.org>
> 
> -apw

I've applied these three blobs to the linux-2.6.git tree and verified
that it does fix the problem.
And a "Thanks!" to Christoph for being responsive, even when the
problem wasn't introduced by him. :)

Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>






* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20  7:18                                     ` Paul Mackerras
  2006-10-20 14:18                                       ` Andy Whitcroft
@ 2006-10-20 17:34                                       ` Christoph Lameter
  2006-10-20 22:54                                         ` Paul Mackerras
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-20 17:34 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Anton Blanchard, akpm, linuxppc-dev, linux-kernel

On Fri, 20 Oct 2006, Paul Mackerras wrote:

> What is happening is that all pages are getting their zone id field in
> their page->flags set to point to zone for node 1 by memmap_init_zone
> calling set_page_links (which does set_page_zone).  Thus, when those
> pages get freed by free_all_bootmem_node, they all end up in the zone
> for node 1.

Ok. So no memory on node 0? Then my patch to reenable fallback in the slab
should have worked but it did not. Could you retest with the patch that 
Will tried? If that is not working then comment out the 3 lines with 
__GFP_THISNODE in get_page_from_freelist. That will reenable fallback 
globally. If that does not work then I doubt that this is my issue.

> memmap_init_zone is called (as memmap_init, since we don't have
> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
> is called from free_area_init_core.  Now the thing is that memmap_init
> and init_currently_empty_zone are called with the node's start PFN and
> size in pages, *including* holes.  On the partition I'm using we have
> these PFN ranges for the nodes:
> 
>     1:        0 ->    32768
>     0:    32768 ->   278528
>     1:   278528 ->   524288
> 
> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
> correctly set pages 32768 to 278527 to be in the zone for node 0.
> Then for node 1, we have the start PFN is 0 and the size is 524288, so
> we then go through and set *all* pages of memory to be in the zone for
> node 1, including the pages which are actually on node 0.

I do not get it. You first mark all pages on node 0 then we run the bootup 
code and later we shift those pages into node 0? So the slab bootstrap is 
running when all pages are marked as being part of node 1 then later we 
switch those pages under it to node 0?
 
> I don't know this code well enough to know what the correct fix is.
> Clearly memmap_init_zone should only be touching the pages that are
> actually present in the zone, but I don't know exactly what data
> structures it should be using to know what those pages are.

The fix that I posted yesterday should have reenabled fallback in the 
slab during bootstrap and should have made the system work. Here it is 
again:

Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-19 11:54:24.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-19 16:32:09.454825851 -0500
@@ -1589,7 +1589,10 @@ static void *kmem_getpages(struct kmem_c
 	 * the needed fallback ourselves since we want to serve from our
 	 * per node object lists first for other nodes.
 	 */
-	flags |= cachep->gfpflags | GFP_THISNODE;
+	if (g_cpucache_up != FULL)
+		flags |= cachep->gfpflags & ~__GFP_THISNODE;
+	else
+		flags |= cachep->gfpflags | GFP_THISNODE;
 
 	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page)


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-20 17:34                                       ` Christoph Lameter
@ 2006-10-20 22:54                                         ` Paul Mackerras
  0 siblings, 0 replies; 47+ messages in thread
From: Paul Mackerras @ 2006-10-20 22:54 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Anton Blanchard, akpm, linuxppc-dev, linux-kernel

Christoph Lameter writes:

> I do not get it. You first mark all pages on node 0 then we run the bootup 
> code and later we shift those pages into node 0? So the slab bootstrap is 
> running when all pages are marked as being part of node 1 then later we 
> switch those pages under it to node 0?

No, the bootmem code correctly marks all the pages in node 0 as being
in node 0.  Then it goes through and marks *all* pages as being in
node 1, because it marks all pages between the first and last pages in
the node as being in the node.  The first page in node 1 is before all
the pages in node 0, and the last page in node 1 is after all the
pages in node 0.

So we end up with the system thinking all the memory is in node 1,
although in fact half the memory is in node 0.
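
The overwrite described above can be reproduced with a small userspace
sketch. It uses the PFN ranges quoted earlier in this thread for this
partition and assumes, as described, that node 0 is marked first and node 1
second. struct pfn_range, pages_by_range() and pages_by_span() are
hypothetical names for illustration, not kernel interfaces.

```c
#include <stddef.h>

/* One contiguous run of page frames on a node (hypothetical type). */
struct pfn_range {
	int nid;
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

/* PFN ranges as quoted in this thread. */
static const struct pfn_range ranges[] = {
	{ 1,      0,  32768 },
	{ 0,  32768, 278528 },
	{ 1, 278528, 524288 },
};

#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))
#define NR_NODES  2
#define MAX_PFN   524288UL

/* Pages credited to @nid when only its real ranges are counted. */
unsigned long pages_by_range(int nid)
{
	unsigned long count = 0;
	size_t i;

	for (i = 0; i < NR_RANGES; i++)
		if (ranges[i].nid == nid)
			count += ranges[i].end_pfn - ranges[i].start_pfn;
	return count;
}

/*
 * Pages credited to @nid when every node, in ascending node order, marks
 * its whole span (lowest start PFN to highest end PFN, holes included).
 */
unsigned long pages_by_span(int nid)
{
	unsigned long start[NR_NODES], end[NR_NODES];
	unsigned long pfn, count = 0;
	size_t i;
	int n;

	for (n = 0; n < NR_NODES; n++) {
		start[n] = MAX_PFN;
		end[n] = 0;
	}
	for (i = 0; i < NR_RANGES; i++) {
		n = ranges[i].nid;
		if (ranges[i].start_pfn < start[n])
			start[n] = ranges[i].start_pfn;
		if (ranges[i].end_pfn > end[n])
			end[n] = ranges[i].end_pfn;
	}

	for (pfn = 0; pfn < MAX_PFN; pfn++) {
		int owner = -1;

		for (n = 0; n < NR_NODES; n++)
			if (pfn >= start[n] && pfn < end[n])
				owner = n;	/* the later node overwrites */
		if (owner == nid)
			count++;
	}
	return count;
}
```

Counting by range gives node 0 its 245760 pages and node 1 the remaining
278528. Counting by span leaves node 0 with no pages at all, because
node 1's span covers the whole of memory and is marked last.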

Anyway, it looks like this problem wasn't introduced by your patches,
and is solved by the patch Andy Whitcroft posted, so thanks for your
assistance with this.

Paul.


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 16:30                             ` Anton Blanchard
  2006-10-19 16:49                               ` Christoph Lameter
@ 2006-10-19 17:03                               ` Christoph Lameter
  2006-10-19 18:07                                 ` Christoph Lameter
  2006-10-19 20:37                                 ` Will Schmidt
  1 sibling, 2 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 17:03 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel

I would expect this patch to fix your issues. This will allow fallback 
allocations to occur in the page allocator during slab bootstrap. This 
means your per node queues will be contaminated as they were before. After 
the slab allocator is fully booted, the per node queues will gradually
become node clean.

I think it would be better if the PPC arch would fix this issue 
by either making memory available on node 0 or setting up node 1 as 
the boot node.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-19 11:54:24.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-19 11:59:24.208194796 -0500
@@ -1589,7 +1589,10 @@ static void *kmem_getpages(struct kmem_c
 	 * the needed fallback ourselves since we want to serve from our
 	 * per node object lists first for other nodes.
 	 */
-	flags |= cachep->gfpflags | GFP_THISNODE;
+	if (g_cpucache_up != FULL)
+		flags |= cachep->gfpflags;
+	else
+		flags |= cachep->gfpflags | GFP_THISNODE;
 
 	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page)


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 17:03                               ` Christoph Lameter
@ 2006-10-19 18:07                                 ` Christoph Lameter
  2006-10-19 20:37                                 ` Will Schmidt
  1 sibling, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 18:07 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel

On Thu, 19 Oct 2006, Christoph Lameter wrote:

> I would expect this patch to fix your issues. This will allow fallback 
> allocations to occur in the page allocator during slab bootstrap. This 
> means your per node queues will be contaminated as they were before. After 
> the slab allocator is fully booted, the per node queues will gradually
> become node clean.

Forgot to mention the results of this contamination: The bootstrap process 
exercises fine control over data structures to place them in such a way 
that the slab allocator can perform optimally. F.e. data structures are 
placed in such a way on nodes that a kmalloc does not need a single off 
node reference.

The contamination will disrupt this placement. The slab believes that 
memory is from a different node than where it actually came from. As a 
result key data structures (such as cpucache descriptors) are placed 
on the wrong node. kmalloc and other slab operations may require
off node allocations for every call. Depending on the NUMA factor this may 
have a significant influence on overall system performance (We have 
measured this effect to cause a drop of 20% in AIM7 performance!).

In addition to this stuff, I am right now dealing with huge page 
fault serialization (introduced to safely support DB2) and sparsemem 
continually causing nested table lookups in fundamental vm operations. All 
work of IBM people. Not interested in performance at all?


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 17:03                               ` Christoph Lameter
  2006-10-19 18:07                                 ` Christoph Lameter
@ 2006-10-19 20:37                                 ` Will Schmidt
  2006-10-19 21:28                                   ` Christoph Lameter
  2006-10-19 21:39                                   ` Christoph Lameter
  1 sibling, 2 replies; 47+ messages in thread
From: Will Schmidt @ 2006-10-19 20:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Anton Blanchard, akpm, linuxppc-dev, Paul Mackerras, linux-kernel

On Thu, 2006-19-10 at 10:03 -0700, Christoph Lameter wrote:
> I would expect this patch to fix your issues. This will allow fallback 
> allocations to occur in the page allocator during slab bootstrap. This 
> means your per node queues will be contaminated as they were before. After 
> the slab allocator is fully booted, the per node queues will gradually
> become node clean.
> 
> I think it would be better if the PPC arch would fix this issue 
> by either making memory  available on node 0 or setting up node 1 as 
> the boot node.
> 

This didn't fix the problem on my box.  I tried this both against mm and
linux-2.6.git 



> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.19-rc2-mm1/mm/slab.c
> ===================================================================
> --- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-19 11:54:24.000000000 -0500
> +++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-19 11:59:24.208194796 -0500
> @@ -1589,7 +1589,10 @@ static void *kmem_getpages(struct kmem_c
>  	 * the needed fallback ourselves since we want to serve from our
>  	 * per node object lists first for other nodes.
>  	 */
> -	flags |= cachep->gfpflags | GFP_THISNODE;
> +	if (g_cpucache_up != FULL)
> +		flags |= cachep->gfpflags;
> +	else
> +		flags |= cachep->gfpflags | GFP_THISNODE;
> 
>  	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
>  	if (!page)



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 20:37                                 ` Will Schmidt
@ 2006-10-19 21:28                                   ` Christoph Lameter
  2006-10-19 21:43                                     ` Will Schmidt
  2006-10-19 21:39                                   ` Christoph Lameter
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 21:28 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Anton Blanchard, akpm, linuxppc-dev, Paul Mackerras, linux-kernel

On Thu, 19 Oct 2006, Will Schmidt wrote:

> This didn't fix the problem on my box.  I tried this both against mm and
> linux-2.6.git 

Same failure condition? Would you also apply the printk patch and send 
me the output?


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 21:28                                   ` Christoph Lameter
@ 2006-10-19 21:43                                     ` Will Schmidt
  2006-10-19 22:00                                       ` Christoph Lameter
  0 siblings, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-19 21:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Anton Blanchard, akpm, linuxppc-dev, Paul Mackerras, linux-kernel

On Thu, 2006-19-10 at 14:28 -0700, Christoph Lameter wrote:
> On Thu, 19 Oct 2006, Will Schmidt wrote:
> 
> > This didn't fix the problem on my box.  I tried this both against mm and
> > linux-2.6.git 
> 
> Same failure condition? Would you also apply the printk patch and send 
> me the output?

Yup, here it is:

-----------------------------------------------------
ppc64_pft_size                = 0x18
physicalMemorySize            = 0x22000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address                  = 0x0000000000000000
htab_hash_mask                = 0x1ffff
-----------------------------------------------------
Linux version 2.6.19-rc2-mm1 (willschm@airbag2) (gcc version 4.1.0 (SUSE
Linux)) #2 SMP Thu Oct 19 16:37:26 CDT 2006
[boot]0012 Setup Arch
NUMA associativity depth for CPU/Memory: 3
adding cpu 0 to node 0
node 0
NODE_DATA() = c000000015ffed80
start_paddr = 8000000
end_paddr = 16000000
bootmap_paddr = 15ffc000
reserve_bootmem ffc0000 40000
reserve_bootmem 15ffc000 2000
reserve_bootmem 15ffed80 1280
node 1
NODE_DATA() = c000000021ff7b80
start_paddr = 0
end_paddr = 22000000
bootmap_paddr = 21ff2000
reserve_bootmem 0 851000
reserve_bootmem 2655000 9000
reserve_bootmem 77b2000 84e000
reserve_bootmem 21ff2000 5000
reserve_bootmem 21ff7b80 1280
reserve_bootmem 21ff8e58 71a4
No ramdisk, default root is /dev/sda2
EEH: No capable adapters found
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA             0 ->   139264
  Normal     139264 ->   139264
early_node_map[3] active PFN ranges
    1:        0 ->    32768
    0:    32768 ->    90112
    1:    90112 ->   139264
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 136576
Kernel command line: root=/dev/sda3  xmon=on numa=debug
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 530216k/557056k available (5544k kernel code, 30508k reserved,
2232k data, 548k bss, 248k init)
Get cache descritor
__cache_alloc
__cache_alloc_node 0
fallback_alloc
__cache_alloc_node 0
__cache_alloc_node 1
kernel BUG in __cache_alloc_node
at /development/kernels/2.6-mm/mm/slab.c:3193!
cpu 0x0: Vector: 700 (Program Check) at [c00000000079b8d0]
    pc: c0000000000b70f8: .__cache_alloc_node+0x5c/0x208
    lr: c0000000000b70e0: .__cache_alloc_node+0x44/0x208
    sp: c00000000079bb50
   msr: 8000000000021032
  current = 0xc00000000058ca90
  paca    = 0xc00000000058d380
    pid   = 0, comm = swapper
kernel BUG in __cache_alloc_node
at /development/kernels/2.6-mm/mm/slab.c:3193!
enter ? for help
[c00000000079bc00] c0000000000b735c .fallback_alloc+0xb8/0xfc
[c00000000079bca0] c0000000000b7930 .kmem_cache_zalloc+0xd4/0x128
[c00000000079bd40] c0000000000b9af4 .kmem_cache_create+0x1f4/0x604
[c00000000079be30] c000000000546d98 .kmem_cache_init+0x1d8/0x4b0
[c00000000079bef0] c00000000052c748 .start_kernel+0x244/0x328
[c00000000079bf90] c0000000000084f8 .start_here_common+0x54/0x5c
0:mon>  



* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 21:43                                     ` Will Schmidt
@ 2006-10-19 22:00                                       ` Christoph Lameter
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 22:00 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Anton Blanchard, akpm, linuxppc-dev, Paul Mackerras, linux-kernel

On Thu, 19 Oct 2006, Will Schmidt wrote:

> Get cache descritor
> __cache_alloc
> __cache_alloc_node 0

Hmmm... Still no fallback? Weird, would you apply the other patch that 
filters the __GFP_THISNODE flag and try again? Could you try to add some 
printk's to the page allocator to figure out what is going on there? Or is 
it clear that the node is overallocated?

Could it be that the node online mask contains a node that has not been 
bootstrapped yet?

Don't you have someone who can debug this? This is kind of an awkward back 
and forth with me guessing what the system does. Someone with knowledge 
about the way NUMA is implemented in the arch code?


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 20:37                                 ` Will Schmidt
  2006-10-19 21:28                                   ` Christoph Lameter
@ 2006-10-19 21:39                                   ` Christoph Lameter
  1 sibling, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 21:39 UTC (permalink / raw)
  To: Will Schmidt
  Cc: Anton Blanchard, akpm, linuxppc-dev, Paul Mackerras, linux-kernel

On Thu, 19 Oct 2006, Will Schmidt wrote:

> This didn't fix the problem on my box.  I tried this both against mm and
> linux-2.6.git 

GFP_THISNODE is also set at a higher level for fallback but it should not 
be set for the initial allocation. If you try this with the debug printks
then please use this patch to make sure that all allocs fall back:

Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c	2006-10-19 11:54:24.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c	2006-10-19 16:32:09.454825851 -0500
@@ -1589,7 +1589,10 @@ static void *kmem_getpages(struct kmem_c
 	 * the needed fallback ourselves since we want to serve from our
 	 * per node object lists first for other nodes.
 	 */
-	flags |= cachep->gfpflags | GFP_THISNODE;
+	if (g_cpucache_up != FULL)
+		flags |= cachep->gfpflags & ~__GFP_THISNODE;
+	else
+		flags |= cachep->gfpflags | GFP_THISNODE;
 
 	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page)
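
A note on the flag arithmetic, as a sketch with made-up bit values
(SK_GFP_THISNODE and SK_GFP_NOWARN are hypothetical and are not the
kernel's definitions): OR-ing a masked value into flags keeps the bit out
of what is added, but it does not remove a bit the caller already put in
flags. Whether a caller actually reaches kmem_getpages with the bit set
during bootstrap is not established here.

```c
/* Hypothetical bit values, for illustration only. */
#define SK_GFP_THISNODE 0x1u
#define SK_GFP_NOWARN   0x2u

/* Same shape as "flags |= cache_flags & ~THISNODE" in the patch. */
unsigned int or_in_masked(unsigned int flags, unsigned int cache_flags)
{
	return flags | (cache_flags & ~SK_GFP_THISNODE);
}

/* Alternative shape: merge first, then clear the bit from the result. */
unsigned int merge_then_clear(unsigned int flags, unsigned int cache_flags)
{
	return (flags | cache_flags) & ~SK_GFP_THISNODE;
}
```

If flags arrives with SK_GFP_THISNODE already set, or_in_masked() returns
it still set, while merge_then_clear() returns it cleared.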


* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 16:16                           ` Christoph Lameter
  2006-10-19 16:30                             ` Anton Blanchard
@ 2006-10-19 20:38                             ` Will Schmidt
  2006-10-19 21:30                               ` Christoph Lameter
  1 sibling, 1 reply; 47+ messages in thread
From: Will Schmidt @ 2006-10-19 20:38 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel

On Thu, 2006-19-10 at 09:16 -0700, Christoph Lameter wrote:
> On Thu, 19 Oct 2006, Paul Mackerras wrote:
> 
> > Get cache descritor
> 
> Attempt to allocate the first descriptor for the first cache.
> 
> > __cache_alloc
> 
> Attempt to allocate from the caches of node 0 (which are empty on 
> bootstrap). We try to replenish the caches of node 0 which should have 
> succeeded. I guess that this failed due to no pages available on 
> node 0. This should not happen!

Is there a hook where we can see what/where the memory is going?  Does
it seem reasonable for all of the memory that is in node 0 to be
consumed? 
Mine appears to have... 
Node 0 MemTotal:       229376 kB
Node 0 MemFree:             0 kB
Node 0 MemUsed:        229376 kB

And one of Paul's earlier notes mentioned about a gig of ram on node0;
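
These figures can be cross-checked against the PFN ranges printed in the
boot logs. The sketch below assumes 4 kB pages, which is inferred from the
log (139264 pages reported against 557056k of memory) rather than stated
anywhere in this thread; range_kb() is a hypothetical helper.

```c
/* Assumed page size in kB, inferred from 139264 pages == 557056k. */
#define PAGE_KB 4UL

/* Size in kB of the PFN range [start_pfn, end_pfn). */
unsigned long range_kb(unsigned long start_pfn, unsigned long end_pfn)
{
	return (end_pfn - start_pfn) * PAGE_KB;
}
```

For this box, range_kb(32768, 90112) is 229376, which matches the Node 0
MemTotal above exactly. For the ranges Paul quoted,
range_kb(32768, 278528) is 983040 kB, roughly 0.94 GiB.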

-Will





* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-19 20:38                             ` Will Schmidt
@ 2006-10-19 21:30                               ` Christoph Lameter
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-19 21:30 UTC (permalink / raw)
  To: Will Schmidt; +Cc: Paul Mackerras, akpm, linuxppc-dev, linux-kernel

On Thu, 19 Oct 2006, Will Schmidt wrote:

> Is there a hook where we can see what/where the memory is going?  Does
> it seem reasonable for all of the memory that is in node 0 to be
> consumed? 
> Mine appears to have... 
> Node 0 MemTotal:       229376 kB
> Node 0 MemFree:             0 kB
> Node 0 MemUsed:        229376 kB

The memory is likely consumed before the slab allocator bootstrap code is 
reached.

> And one of Paul's earlier notes mentioned about a gig of ram on node0;

Yeah. I cannot make sense out of all of this. What is so special about 
node 0?




* Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
  2006-10-18  6:11                 ` Paul Mackerras
  2006-10-18 15:12                   ` Christoph Lameter
@ 2006-10-18 16:06                   ` Christoph Lameter
  1 sibling, 0 replies; 47+ messages in thread
From: Christoph Lameter @ 2006-10-18 16:06 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Will Schmidt, akpm, linuxppc-dev, linux-kernel

On Wed, 18 Oct 2006, Paul Mackerras wrote:

> Linus' tree is currently broken for us.  Any suggestions for how to
> fix it, since I am not very familiar with the NUMA code?

I am not very familiar with the powerpc code and what I got here is 
conjecture from various messages. It would help to get some clarification 
on what is going on with node 0 memory. Is there really no memory 
available from node 0 on bootup? Why is this? 

If this is the case then you already have had issues for a long time with 
per node memory lists being contaminated on bootup.

Why would you attempt to boot Linux on a memory node without 
memory?


end of thread, other threads:[~2006-10-20 22:54 UTC | newest]

Thread overview: 47+ messages
2006-10-13 18:41 kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177! Will Schmidt
2006-10-13 19:05 ` Christoph Lameter
2006-10-13 19:53   ` Will Schmidt
2006-10-13 20:57     ` Will Schmidt
2006-10-13 21:22       ` Nathan Lynch
2006-10-13 21:34         ` Anton Blanchard
2006-10-13 22:01         ` Mike Kravetz
2006-10-13 22:22       ` Christoph Lameter
2006-10-16 16:00         ` Will Schmidt
2006-10-16 19:20         ` Will Schmidt
2006-10-16 19:25           ` Christoph Lameter
2006-10-16 20:50             ` Will Schmidt
2006-10-16 23:37               ` Christoph Lameter
2006-10-18  6:11                 ` Paul Mackerras
2006-10-18 15:12                   ` Christoph Lameter
2006-10-18 21:19                     ` Paul Mackerras
2006-10-18 21:26                       ` Christoph Lameter
2006-10-18 21:49                       ` Christoph Lameter
2006-10-19  5:03                         ` Paul Mackerras
2006-10-19 16:16                           ` Christoph Lameter
2006-10-19 16:30                             ` Anton Blanchard
2006-10-19 16:49                               ` Christoph Lameter
2006-10-19 22:23                                 ` Paul Mackerras
2006-10-19 22:31                                   ` Christoph Lameter
2006-10-20  7:18                                     ` Paul Mackerras
2006-10-20 14:18                                       ` Andy Whitcroft
2006-10-20 14:31                                         ` [PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc Andy Whitcroft
2006-10-20 16:30                                           ` Mel Gorman
2006-10-20 14:59                                         ` kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177! Mike Kravetz
2006-10-20 15:19                                         ` Will Schmidt
2006-10-20 16:00                                         ` Andy Whitcroft
2006-10-20 17:09                                           ` Andrew Morton
2006-10-20 17:46                                             ` Christoph Lameter
2006-10-20 18:07                                               ` Andy Whitcroft
2006-10-20 17:13                                           ` Will Schmidt
2006-10-20 17:34                                       ` Christoph Lameter
2006-10-20 22:54                                         ` Paul Mackerras
2006-10-19 17:03                               ` Christoph Lameter
2006-10-19 18:07                                 ` Christoph Lameter
2006-10-19 20:37                                 ` Will Schmidt
2006-10-19 21:28                                   ` Christoph Lameter
2006-10-19 21:43                                     ` Will Schmidt
2006-10-19 22:00                                       ` Christoph Lameter
2006-10-19 21:39                                   ` Christoph Lameter
2006-10-19 20:38                             ` Will Schmidt
2006-10-19 21:30                               ` Christoph Lameter
2006-10-18 16:06                   ` Christoph Lameter
