* crash in kmem_cache_init @ 2008-01-15 15:09 Olaf Hering 2008-01-15 15:58 ` Olaf Hering 2008-01-17 12:14 ` Pekka Enberg 0 siblings, 2 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-15 15:09 UTC (permalink / raw) To: linux-kernel, linuxppc-dev Current linus tree crashes in kmem_cache_init, as shown below. The system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram. Firmware is 240_332, 2.6.23 boots ok with the same config. There is a series of mm related patches in 2.6.24-rc1: commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it, ==> .git/BISECT_LOG <== git-bisect start # good: [0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d] Linux 2.6.23 git-bisect good 0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d # bad: [cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2] Linux 2.6.24-rc1 git-bisect bad cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2 # good: [9ac52315d4cf5f561f36dabaf0720c00d3553162] sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields git-bisect good 9ac52315d4cf5f561f36dabaf0720c00d3553162 # bad: [b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0] add consts where appropriate in fs/nls/Kconfig fs/nls/Makefile fs/nls/nls_ascii.c fs/nls/nls_base.c fs/nls/nls_cp1250.c fs/nls/nls_cp1251.c fs/nls/nls_cp1255.c fs/nls/nls_cp437.c fs/nls/nls_cp737.c fs/nls/nls_cp775.c fs/nls/nls_cp850.c fs/nls/nls_cp852.c fs/nls/nls_cp855.c fs/nls/nls_cp857.c fs/nls/nls_cp860.c fs/nls/nls_cp861.c fs/nls/nls_cp862.c fs/nls/nls_cp863.c fs/nls/nls_cp864.c fs/nls/nls_cp865.c fs/nls/nls_cp866.c fs/nls/nls_cp869.c fs/nls/nls_cp874.c fs/nls/nls_cp932.c fs/nls/nls_cp936.c fs/nls/nls_cp949.c fs/nls/nls_cp950.c fs/nls/nls_euc-jp.c fs/nls/nls_iso8859-1.c fs/nls/nls_iso8859-13.c fs/nls/nls_iso8859-14.c fs/nls/nls_iso8859-15.c fs/nls/nls_iso8859-2.c fs/nls/nls_iso8859-3.c fs/nls/nls_iso8859-4.c fs/nls/nls_iso8859-5.c fs/nls/nls_iso8859-6.c fs/nls/nls_iso8859-7.c fs/nls/nls_iso8859-9.c fs/nls/nls_koi8-r.c fs/nls/nls_koi8-ru.c fs/nls/nls_koi8-u.c fs/nls/nls_utf8.c git-bisect bad 
b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0 # bad: [78a26e25ce4837a03ac3b6c32cdae1958e547639] uml: separate timer initialization git-bisect bad 78a26e25ce4837a03ac3b6c32cdae1958e547639 # good: [4acad72ded8e3f0211bd2a762e23c28229c61a51] [IPV6]: Consolidate the ip6_pol_route_(input|output) pair git-bisect good 4acad72ded8e3f0211bd2a762e23c28229c61a51 # good: [64da82efae0d7b5f7c478021840fd329f76d965d] Add support for PCMCIA card Sierra WIreless AC850 git-bisect good 64da82efae0d7b5f7c478021840fd329f76d965d # bad: [37b07e4163f7306aa735a6e250e8d22293e5b8de] memoryless nodes: fixup uses of node_online_map in generic code git-bisect bad 37b07e4163f7306aa735a6e250e8d22293e5b8de # good: [64649a58919e66ec21792dbb6c48cb3da22cbd7f] mm: trim more holes git-bisect good 64649a58919e66ec21792dbb6c48cb3da22cbd7f # good: [fb53b3094888be0cf8ddf052277654268904bdf5] smbfs: convert to new aops git-bisect good fb53b3094888be0cf8ddf052277654268904bdf5 # good: [13808910713a98cc1159291e62cdfec92cc94d05] Memoryless nodes: Generic management of nodemasks for various purposes ............. Please wait, loading kernel... Allocated 00a00000 bytes for kernel @ 00200000 Elf64 kernel loaded... OF stdout device is: /vdevice/vty@30000000 Hypertas detected, assuming LPAR ! command line: panic=1 debug xmon=on memory layout at init: alloc_bottom : 0000000000ac1000 alloc_top : 0000000010000000 alloc_top_hi : 00000000da000000 rmo_top : 0000000010000000 ram_top : 00000000da000000 Looking for displays found display : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done instantiating rtas at 0x000000000f6a1000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done 0000000000000004 : starting cpu hw idx 0000000000000004... done 0000000000000006 : starting cpu hw idx 0000000000000006... done copying OF device tree ... Building dt strings... Building dt structure... 
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4 Device tree struct 0x0000000000cc4000 -> 0x0000000000cd6000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #2 SMP Tue Jan 15 14:23:02 CET 2008 ----------------------------------------------------- ppc64_pft_size = 0x1c physicalMemorySize = 0xda000000 htab_hash_mask = 0x1fffff ----------------------------------------------------- Linux version 2.6.24-rc7-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #2 SMP Tue Jan 15 14:23:02 CET 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 892928 Normal 892928 -> 892928 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1: 0 -> 892928 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720 Policy zone: DMA Kernel command line: panic=1 debug xmon=on [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 275.070000 MHz time_init: processor frequency = 2197.800000 MHz clocksource: timebase mult[e8ab05] shift[22] registered clockevent: decrementer mult[466a] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) Unable to handle kernel paging request for data at address 0x00000040 Faulting instruction address: 0xc000000000437470 cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830] pc: c000000000437470: ._spin_lock+0x20/0x88 lr: c0000000000f78a8: .cache_grow+0x7c/0x338 sp: c00000000075bab0 msr: 
8000000000009032
dar: 40
dsisr: 40000000
current = 0xc000000000665a50
paca = 0xc000000000666380
pid = 0, comm = swapper
enter ? for help
[c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
[c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
[c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
[c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
[c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon>
* Re: crash in kmem_cache_init 2008-01-15 15:09 crash in kmem_cache_init Olaf Hering @ 2008-01-15 15:58 ` Olaf Hering 2008-01-17 12:14 ` Pekka Enberg 1 sibling, 0 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-15 15:58 UTC (permalink / raw) To: linux-kernel, linuxppc-dev On Tue, Jan 15, Olaf Hering wrote: > > Current linus tree crashes in kmem_cache_init, as shown below. The > system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram. > Firmware is 240_332, 2.6.23 boots ok with the same config. > > There is a series of mm related patches in 2.6.24-rc1: > commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it, 2.6.24-rc6-mm1-ppc64 boots past this point, but crashes later. Likely unrelated to the kmem_cache_init bug: ... matroxfb: 640x480x8bpp (virtual: 640x26214) matroxfb: framebuffer at 0x40178000000, mapped to 0xd000080080080000, size 33554432 Console: switching to colour frame buffer device 80x30 fb0: MATROX frame buffer device matroxfb_crtc2: secondary head of fb0 was registered as fb1 vio_register_driver: driver hvc_console registering HVSI: registered 0 devices Generic RTC Driver v1.07 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>) input: Macintosh mouse button emulation as /devices/virtual/input/input0 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ehci_hcd 0000:c8:01.2: EHCI Host Controller ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1 ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000 ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 5 ports detected Unable to handle kernel paging request for data at address 0x00000050 Faulting instruction address: 0xc0000000000fa1c4 cpu 0x7: Vector: 300 (Data Access) at 
[c0000000d82e7a70]
pc: c0000000000fa1c4: .cache_reap+0x74/0x29c
lr: c0000000000fa198: .cache_reap+0x48/0x29c
sp: c0000000d82e7cf0
msr: 8000000000009032
dar: 50
dsisr: 40000000
current = 0xc0000000d82d85c0
paca = 0xc000000000668e00
pid = 27, comm = events/7
enter ? for help
[c0000000d82e7cf0] c00000000070be98 vmstat_update+0x0/0x18 (unreliable)
[c0000000d82e7da0] c000000000092994 .run_workqueue+0x120/0x210
[c0000000d82e7e40] c000000000093bb8 .worker_thread+0xcc/0xf0
[c0000000d82e7f00] c000000000097b70 .kthread+0x78/0xc4
[c0000000d82e7f90] c00000000002ab74 .kernel_thread+0x4c/0x68
7:mon>
...
* Re: crash in kmem_cache_init
  2008-01-15 15:09 crash in kmem_cache_init Olaf Hering
  2008-01-15 15:58 ` Olaf Hering
@ 2008-01-17 12:14 ` Pekka Enberg
  2008-01-17 14:30   ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Pekka Enberg @ 2008-01-17 12:14 UTC
To: Olaf Hering; +Cc: Linux MM, linuxppc-dev, linux-kernel, clameter

Hi Olaf,

[Adding Christoph as cc.]

On Jan 15, 2008 5:09 PM, Olaf Hering <olaf@aepfle.de> wrote:
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

So that's the "Memoryless nodes: Slab support" patch that I think caused a
similar oops a while ago.

> Unable to handle kernel paging request for data at address 0x00000040
> Faulting instruction address: 0xc000000000437470
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830]
>     pc: c000000000437470: ._spin_lock+0x20/0x88
>     lr: c0000000000f78a8: .cache_grow+0x7c/0x338
>     sp: c00000000075bab0
>    msr: 8000000000009032
>    dar: 40
>  dsisr: 40000000
>   current = 0xc000000000665a50
>   paca    = 0xc000000000666380
>     pid   = 0, comm = swapper
> enter ? for help
> [c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
> [c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
> [c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
> [c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
> [c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0

Looks similar to the one discussed on linux-mm ("[BUG] at
mm/slab.c:3320" thread). Christoph?
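A note on reading the oops quoted above: a data fault at 0x40 with pc in ._spin_lock and lr in .cache_grow is what one would expect if a lock embedded in a structure were taken through a NULL structure pointer, because the address touched is then just the member's offset. The userspace sketch below illustrates the arithmetic only. The struct, its field order and the function name are hypothetical stand-ins and are not copied from mm/slab.c; whether the real per-node structure has this layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. */
struct list_head_sketch {
	void *next, *prev;
};

struct node_lists_sketch {
	struct list_head_sketch slabs_partial;
	struct list_head_sketch slabs_full;
	struct list_head_sketch slabs_free;
	unsigned long free_objects;
	unsigned int free_limit;
	unsigned int colour_next;
	int list_lock;			/* stand-in for a spinlock */
};

/*
 * Address that taking &l3->list_lock would touch for a given base
 * address of the structure. With a NULL base this is simply the
 * offset of the member. On a target with 8-byte pointers and longs
 * the members in front of list_lock add up to 48 + 8 + 4 + 4 = 64
 * bytes, i.e. 0x40.
 */
static uintptr_t lock_address(uintptr_t l3_base)
{
	size_t off = offsetof(struct node_lists_sketch, list_lock);

	assert(l3_base <= UINTPTR_MAX - off);	/* no wrap-around */
	return l3_base + off;
}
```

With this assumed layout, a small fault address such as the reported dar value points at a NULL base pointer rather than at a corrupted lock.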
* Re: crash in kmem_cache_init
  2008-01-17 12:14 ` Pekka Enberg
@ 2008-01-17 14:30   ` Christoph Lameter
  2008-01-17 18:12     ` Olaf Hering
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-17 14:30 UTC
To: Pekka Enberg; +Cc: linuxppc-dev, Olaf Hering, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Pekka Enberg wrote:

> Looks similar to the one discussed on linux-mm ("[BUG] at
> mm/slab.c:3320" thread). Christoph?

Right. Try the latest version of the patch to fix it:

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
+++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
@@ -2977,7 +2977,10 @@ retry:
 	}
 	l3 = cachep->nodelists[node];
 
-	BUG_ON(ac->avail > 0 || !l3);
+	if (!l3)
+		return NULL;
+
+	BUG_ON(ac->avail > 0);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3224,7 +3227,7 @@ static void *alternate_node_alloc(struct
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	if (nid_alloc != nid_here)
+	if (nid_alloc != nid_here && node_state(nid_alloc, N_NORMAL_MEMORY))
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
 }
@@ -3439,8 +3442,14 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
-	if (!objp)
-		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp) {
+		int node_id = numa_node_id();
+		if (likely(cache->nodelists[node_id])) /* fast path */
+			objp = ____cache_alloc_node(cache, flags, node_id);
+		else /* this function can do good fallback */
+			objp = __cache_alloc_node(cache, flags, node_id,
+					__builtin_return_address(0));
+	}
 
   out:
 	return objp;
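The first and third hunks of the patch above share one idea: a node without per-node lists gets an early return or a detour to a fallback path instead of a lock on a NULL pointer. A much simplified userspace sketch of that pattern follows. All names here (cache_sketch, take_from_node, take_from_any_node, alloc_object) are hypothetical, and the counters only stand in for real slab lists; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES_SKETCH 2

/* Hypothetical per-node state: just a count of free objects. */
struct node_lists_sketch {
	int free_objects;
};

/* Hypothetical cache: one pointer per node, NULL for a node without lists. */
struct cache_sketch {
	struct node_lists_sketch *nodelists[NR_NODES_SKETCH];
};

/*
 * Take one object from the given node. Returns 1 on success and 0 when
 * the node has no lists at all or its lists are empty.
 */
static int take_from_node(struct cache_sketch *c, int node)
{
	struct node_lists_sketch *l3;

	assert(node >= 0 && node < NR_NODES_SKETCH);
	l3 = c->nodelists[node];
	if (!l3)			/* memoryless node: bail out early */
		return 0;
	if (l3->free_objects == 0)
		return 0;
	l3->free_objects--;
	return 1;
}

/* Slow path: try every node in turn. Returns the serving node or -1. */
static int take_from_any_node(struct cache_sketch *c)
{
	for (int n = 0; n < NR_NODES_SKETCH; n++)
		if (take_from_node(c, n))
			return n;
	return -1;
}

/*
 * Use the local node when it has lists and can serve the request,
 * otherwise fall back to scanning all nodes.
 */
static int alloc_object(struct cache_sketch *c, int local_node)
{
	if (c->nodelists[local_node] && take_from_node(c, local_node))
		return local_node;
	return take_from_any_node(c);
}
```

In this sketch a caller running on a node with a NULL entry is served from another node, and gets -1 once every node is exhausted.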
* Re: crash in kmem_cache_init
  2008-01-17 14:30   ` Christoph Lameter
@ 2008-01-17 18:12     ` Olaf Hering
  2008-01-17 18:58       ` Christoph Lameter
  2008-01-17 19:03       ` Christoph Lameter
  0 siblings, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-17 18:12 UTC
To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Pekka Enberg wrote:
>
> > Looks similar to the one discussed on linux-mm ("[BUG] at
> > mm/slab.c:3320" thread). Christoph?
>
> Right. Try the latest version of the patch to fix it:

The patch does not help.

> Index: linux-2.6/mm/slab.c
> ===================================================================
> --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
> +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
> @@ -2977,7 +2977,10 @@ retry:
>  	}
>  	l3 = cachep->nodelists[node];
>
> -	BUG_ON(ac->avail > 0 || !l3);
> +	if (!l3)
> +		return NULL;
> +
> +	BUG_ON(ac->avail > 0);
>  	spin_lock(&l3->list_lock);
>
>  	/* See if we can refill from the shared array */

Is this hunk supposed to go into cache_grow()? There is no NULL check
for l3.

But if I do that, it does not help:

freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32'

Rebooting in 1 seconds..
* Re: crash in kmem_cache_init
  2008-01-17 18:12     ` Olaf Hering
@ 2008-01-17 18:58       ` Christoph Lameter
  2008-01-17 19:54         ` Olaf Hering
  2008-01-17 21:15         ` Olaf Hering
  2008-01-17 19:03       ` Christoph Lameter
  1 sibling, 2 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-17 18:58 UTC
To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

> The patch does not help.

Duh. We need to know more about the problem.

> > --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
> > +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
> > @@ -2977,7 +2977,10 @@ retry:
> >  	}
> >  	l3 = cachep->nodelists[node];
> >
> > -	BUG_ON(ac->avail > 0 || !l3);
> > +	if (!l3)
> > +		return NULL;
> > +
> > +	BUG_ON(ac->avail > 0);
> >  	spin_lock(&l3->list_lock);
> >
> >  	/* See if we can refill from the shared array */
>
> Is this hunk supposed to go into cache_grow()? There is no NULL check
> for l3.

No, it's for cache_alloc_refill. cache_grow should only be called for
nodes that have memory. l3 is always used before cache_grow is called.

> freeing bootmem node 1
> Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3

Is there more backtrace information? What function called cache_grow?
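The rule stated above, that a cache should only be grown on nodes that have memory, can be illustrated with a tiny node-state mask. The sketch below is a userspace illustration under stated assumptions: the helper names (nodes_with_memory, node_has_memory, pick_grow_node) are hypothetical and the mask is only a stand-in for the kernel's node states. The page counts in the usage note come from the initial report's boot log, where node 1 covers PFNs 0 to 892928 and no start_pfn is found for node 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Build a bitmask with bit n set when node n owns at least one page. */
static unsigned int nodes_with_memory(const unsigned long pages[], int nr_nodes)
{
	unsigned int mask = 0;

	assert(nr_nodes >= 0 && nr_nodes <= 32);	/* one bit per node */
	for (int n = 0; n < nr_nodes; n++)
		if (pages[n] > 0)
			mask |= 1u << n;
	return mask;
}

/* True when the given node has its bit set in the mask. */
static bool node_has_memory(unsigned int mask, int node)
{
	return (mask >> node) & 1u;
}

/*
 * Choose the node to grow a cache on: the preferred node when it has
 * memory, otherwise the lowest-numbered node that does, or -1 when no
 * node has memory at all.
 */
static int pick_grow_node(unsigned int mask, int preferred, int nr_nodes)
{
	if (node_has_memory(mask, preferred))
		return preferred;
	for (int n = 0; n < nr_nodes; n++)
		if (node_has_memory(mask, n))
			return n;
	return -1;
}
```

Fed with the page counts { 0, 892928 }, the mask has only bit 1 set, so a request that prefers node 0 is redirected to node 1 instead of being grown on the empty node.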
* Re: crash in kmem_cache_init 2008-01-17 18:58 ` Christoph Lameter @ 2008-01-17 19:54 ` Olaf Hering 2008-01-17 20:20 ` Olaf Hering 2008-01-17 21:15 ` Olaf Hering 1 sibling, 1 reply; 61+ messages in thread From: Olaf Hering @ 2008-01-17 19:54 UTC (permalink / raw) To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM [-- Attachment #1: Type: text/plain, Size: 4373 bytes --] On Thu, Jan 17, Christoph Lameter wrote: > > freeing bootmem node 1 > > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) > > cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3 > > Is there more backtrace information? What function called cache_grow? I just put a 'if (!l3) return 0;' into cache_grow, the backtrace is the one from the initial report. Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change anything. Since -mm boots further, what patch should I try? The kernel boots on a different p570. See attached dmesg. huckleberry boots, cranberry crashes. --- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:18.510309000 +0100 +++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:09.425402000 +0100 @@ -1,56 +1,55 @@ Page orders: linear mapping = 24, others = 12 -Found initrd at 0xc000000002700000:0xc000000002a93000 +Found initrd at 0xc000000001300000:0xc0000000016e6c1e Partition configured for 8 cpus. 
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007 ----------------------------------------------------- -ppc64_pft_size = 0x1b +ppc64_pft_size = 0x1c ppc64_interrupt_controller = 0x2 platform = 0x101 -physicalMemorySize = 0x158000000 +physicalMemorySize = 0xda000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x0000000000000000 -htab_hash_mask = 0xfffff +htab_hash_mask = 0x1fffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.16.57-0.5-ppc64 (geeko@buildhost) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007 [boot]0012 Setup Arch -Node 0 Memory: 0x0-0xb0000000 -Node 1 Memory: 0xb0000000-0x158000000 +Node 0 Memory: +Node 1 Memory: 0x0-0xda000000 EEH: PCI Enhanced I/O Error Handling Enabled -PPC64 nvram contains 7168 bytes +PPC64 nvram contains 8192 bytes Using dedicated idle loop -On node 0 totalpages: 720896 - DMA zone: 720896 pages, LIFO batch:31 +On node 0 totalpages: 0 + DMA zone: 0 pages, LIFO batch:0 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO batch:0 -On node 1 totalpages: 688128 - DMA zone: 688128 pages, LIFO batch:31 +On node 1 totalpages: 892928 + DMA zone: 892928 pages, LIFO batch:31 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO batch:0 [boot]0015 Setup Done Built 2 zonelists -Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW500007445Q010-part5 xmon=on sysrq=1 quiet +Kernel command line: root=/dev/system/root xmon=on sysrq=1 quiet [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) -time_init: decrementer frequency = 207.052000 MHz -time_init: processor frequency = 1654.344000 MHz +time_init: decrementer frequency = 275.070000 MHz +time_init: processor frequency = 2197.800000 MHz Console: colour 
dummy device 80x25 -Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) -Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) -freeing bootmem node 0 +Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) +Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 -Memory: 5524952k/5636096k available (4464k kernel code, 111144k reserved, 1992k data, 836k bss, 264k init) -Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480) +Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init) +Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 checking if image is initramfs... it is -Freeing initrd memory: 3660k freed +Freeing initrd memory: 3995k freed Processor 1 found. Processor 2 found. Processor 3 found. @@ -61,7 +60,7 @@ Processor 7 found. Brought up 8 CPUs Node 0 CPUs: 0-3 Node 1 CPUs: 4-7 -migration_cost=41,0,4308 +migration_cost=38,0,3225 NET: Registered protocol family 16 PCI: Probing PCI hardware IOMMU table initialized, virtual merging enabled [-- Attachment #2: huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt --] [-- Type: text/plain, Size: 16674 bytes --] Page orders: linear mapping = 24, others = 12 Found initrd at 0xc000000002700000:0xc000000002a93000 Partition configured for 8 cpus. 
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007 ----------------------------------------------------- ppc64_pft_size = 0x1b ppc64_interrupt_controller = 0x2 platform = 0x101 physicalMemorySize = 0x158000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x0000000000000000 htab_hash_mask = 0xfffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.16.57-0.5-ppc64 (geeko@buildhost) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007 [boot]0012 Setup Arch Node 0 Memory: 0x0-0xb0000000 Node 1 Memory: 0xb0000000-0x158000000 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Using dedicated idle loop On node 0 totalpages: 720896 DMA zone: 720896 pages, LIFO batch:31 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO batch:0 On node 1 totalpages: 688128 DMA zone: 688128 pages, LIFO batch:31 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO batch:0 [boot]0015 Setup Done Built 2 zonelists Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW500007445Q010-part5 xmon=on sysrq=1 quiet [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 207.052000 MHz time_init: processor frequency = 1654.344000 MHz Console: colour dummy device 80x25 Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) freeing bootmem node 0 freeing bootmem node 1 Memory: 5524952k/5636096k available (4464k kernel code, 111144k reserved, 1992k data, 836k bss, 264k init) Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 checking if image is initramfs... 
it is Freeing initrd memory: 3660k freed Processor 1 found. Processor 2 found. Processor 3 found. Processor 4 found. Processor 5 found. Processor 6 found. Processor 7 found. Brought up 8 CPUs Node 0 CPUs: 0-3 Node 1 CPUs: 4-7 migration_cost=41,0,4308 NET: Registered protocol family 16 PCI: Probing PCI hardware IOMMU table initialized, virtual merging enabled mapping IO 3fe00100000 -> d000080000000000, size: 100000 mapping IO 3fe00600000 -> d000080000100000, size: 100000 mapping IO 3fe00300000 -> d000080000200000, size: 100000 PCI: Probing PCI hardware done Registering pmac pic with sysfs... usbcore: registered new driver usbfs usbcore: registered new driver hub IBM eBus Device Driver RTAS daemon started RTAS: event: 109, Type: Platform Error, Severity: 2 probe_bus_pseries: processing c000000157ff7058 probe_bus_pseries: processing c000000157ff7228 probe_bus_pseries: processing c000000157ff7378 probe_bus_pseries: processing c000000157ff74e8 probe_bus_pseries: processing c000000157ff7658 audit: initializing netlink socket (disabled) audit(1200599258.200:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Initializing Cryptographic API io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1 rpaphp: Slot [0001:00:02.0](PCI location=U7879.001.DQD02EK-P1-C3) registered rpaphp: Slot [0001:00:02.2](PCI location=U7879.001.DQD02EK-P1-C4) registered rpaphp: Slot [0001:00:02.4](PCI location=U7879.001.DQD02EK-P1-C5) registered rpaphp: Slot [0001:00:02.6](PCI location=U7879.001.DQD02EK-P1-C6) registered rpaphp: Slot [0002:00:02.0](PCI location=U7879.001.DQD02EK-P1-C1) registered rpaphp: Slot [0002:00:02.6](PCI location=U7879.001.DQD02EK-P1-C2) registered matroxfb: Matrox G450 detected PInS data found at offset 
31168 PInS memtype = 5 matroxfb: 640x480x8bpp (virtual: 640x26214) matroxfb: framebuffer at 0x400C0000000, mapped to 0xd000080080004000, size 33554432 Console: switching to colour frame buffer device 80x30 fb0: MATROX frame buffer device matroxfb_crtc2: secondary head of fb0 was registered as fb1 vio_register_driver: driver hvc_console registering HVSI: registered 0 devices Generic RTC Driver v1.07 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>) RAMDISK driver initialized: 16 RAM disks of 123456K size 1024 blocksize input: Macintosh mouse button emulation as /class/input/input0 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ehci_hcd 0000:c8:01.2: EHCI Host Controller ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1 ehci_hcd 0000:c8:01.2: irq 101, io mem 0x400a0002000 ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: new device found, idVendor=0000, idProduct=0000 usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb1: Product: EHCI Host Controller usb usb1: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ehci_hcd usb usb1: SerialNumber: 0000:c8:01.2 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 5 ports detected ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ohci_hcd 0000:c8:01.0: OHCI Host Controller ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:c8:01.0: irq 101, io mem 0x400a0001000 usb usb2: new device found, idVendor=0000, idProduct=0000 usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb2: Product: OHCI Host Controller usb usb2: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ohci_hcd usb usb2: SerialNumber: 0000:c8:01.0 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 3 
ports detected hub 2-0:1.0: over-current change on port 1 ohci_hcd 0000:c8:01.1: OHCI Host Controller ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3 ohci_hcd 0000:c8:01.1: irq 101, io mem 0x400a0000000 usb usb3: new device found, idVendor=0000, idProduct=0000 usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb3: Product: OHCI Host Controller usb usb3: Manufacturer: Linux 2.6.16.57-0.5-ppc64 ohci_hcd usb usb3: SerialNumber: 0000:c8:01.1 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected hub 3-0:1.0: over-current change on port 1 usbcore: registered new driver hiddev usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver mice: PS/2 mouse device common for all mice md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 oprofile: using ppc64/power5 performance monitoring. NET: Registered protocol family 2 IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered NET: Registered protocol family 1 NET: Registered protocol family 17 NET: Registered protocol family 15 Freeing unused kernel memory: 264k freed SCSI subsystem initialized libata version 2.00 loaded. ipr: IBM Power RAID SCSI Device Driver version: 2.2.0.2 (November 14, 2007) ipr 0000:c0:01.0: Found IOA with IRQ: 99 ipr 0000:c0:01.0: Starting IOA initialization sequence. ipr 0000:c0:01.0: Adapter firmware version: 020A004E ipr 0000:c0:01.0: IOA initialized. 
scsi0 : IBM 570B Storage Adapter Vendor: IBM Model: ST373453LC Rev: C51A Type: Direct-Access ANSI SCSI revision: 03 SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off sda: Mode Sense: cb 00 10 08 SCSI device sda: drive cache: write through w/ FUA SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB) sda: Write Protect is off sda: Mode Sense: cb 00 10 08 SCSI device sda: drive cache: write through w/ FUA sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > sd 0:0:4:0: Attached scsi disk sda Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812 Type: Enclosure ANSI SCSI revision: 02 sd 0:0:4:0: Attached scsi generic sg0 type 0 0:0:15:0: Attached scsi generic sg1 type 13 scsi: unknown device type 31 Vendor: IBM Model: 570B001 Rev: 0150 Type: Unknown ANSI SCSI revision: 00 0:255:255:255: Attached scsi generic sg2 type 31 ipr 0002:c8:01.0: Found IOA with IRQ: 133 ipr 0002:c8:01.0: Starting IOA initialization sequence. ipr 0002:c8:01.0: Adapter firmware version: 020A004E ipr 0002:c8:01.0: IOA initialized. 
scsi1 : IBM 570B Storage Adapter Vendor: IBM Model: IC35L073UCDY10-0 Rev: S28G Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB) sdb: Write Protect is off sdb: Mode Sense: cb 00 00 08 SCSI device sdb: drive cache: write through SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB) sdb: Write Protect is off sdb: Mode Sense: cb 00 00 08 SCSI device sdb: drive cache: write through sdb: sdb1 sd 1:0:3:0: Attached scsi disk sdb sd 1:0:3:0: Attached scsi generic sg3 type 0 Vendor: IBM Model: ST373453LC Rev: C51A Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB) sdc: Write Protect is off sdc: Mode Sense: cb 00 10 08 SCSI device sdc: drive cache: write through w/ FUA SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB) sdc: Write Protect is off sdc: Mode Sense: cb 00 10 08 SCSI device sdc: drive cache: write through w/ FUA sdc: sdc1 sd 1:0:5:0: Attached scsi disk sdc sd 1:0:5:0: Attached scsi generic sg4 type 0 Vendor: IBM Model: VSBPD3E U4SCSI Rev: 4812 Type: Enclosure ANSI SCSI revision: 02 1:0:15:0: Attached scsi generic sg5 type 13 scsi: unknown device type 31 Vendor: IBM Model: 570B001 Rev: 0150 Type: Unknown ANSI SCSI revision: 00 1:255:255:255: Attached scsi generic sg6 type 31 pata_pdc2027x 0002:d0:01.0: version 0.74-ac5 PCI: Enabling device: (0002:d0:01.0), cmd 3 pata_pdc2027x 0002:d0:01.0: PLL input clock 32760 kHz ata1: PATA max UDMA/133 cmd 0xD0000800820887C0 ctl 0xD000080082088FDA bmdma 0xD000080082088000 irq 135 ata2: PATA max UDMA/133 cmd 0xD0000800820885C0 ctl 0xD000080082088DDA bmdma 0xD000080082088008 irq 135 scsi2 : pata_pdc2027x ata1.00: ATAPI, max UDMA/33 ata1.00: configured for UDMA/33 scsi3 : pata_pdc2027x ATA: abnormal status 0x8 on port 0xD0000800820885DF Vendor: IBM Model: DROM00205 Rev: NR38 Type: CD-ROM ANSI SCSI revision: 02 2:0:0:0: Attached scsi generic sg7 type 5 sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray Uniform 
CD-ROM driver Revision: 3.20 sr 2:0:0:0: Attached scsi CD-ROM sr0 ReiserFS: sda5: found reiserfs format "3.6" with standard journal ReiserFS: sda5: using ordered data mode reiserfs: using flush barriers ReiserFS: sda5: journal params: device sda5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: sda5: checking transaction log (sda5) ReiserFS: sda5: Using r5 hash to sort names Adding 1050616k swap on /dev/disk/by-label/vscsi_swap. Priority:-1 extents:1 across:1050616k Intel(R) PRO/1000 Network Driver - version 7.6.9.1-NAPI Copyright (c) 1999-2007 Intel Corporation. PCI: Enabling device: (0000:d0:01.0), cmd 3 e1000: 0000:d0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0e:78 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0000:d0:01.1), cmd 3 e1000: 0000:d0:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:dd:0e:79 e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0001:c0:01.0), cmd 3 e1000: 0001:c0:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:11:25:c0:5a:13 e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0001:c8:01.0), cmd 3 e1000: 0001:c8:01.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:1b:ee e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection PCI: Enabling device: (0001:c8:01.1), cmd 3 e1000: 0001:c8:01.1: e1000_probe: (PCI-X:133MHz:64-bit) 00:09:6b:6e:1b:ef e1000: eth4: e1000_probe: Intel(R) PRO/1000 Network Connection md: md0 stopped. 
device-mapper: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com dm-netlink version 0.0.2 loaded md: bind<sdc1> md: bind<sdb1> md: raid0 personality registered for level 0 md0: setting max_sectors to 64, segment boundary to 16383 raid0: looking at sdb1 raid0: comparing sdb1(71673856) with sdb1(71673856) raid0: END raid0: ==> UNIQUE raid0: 1 zones raid0: looking at sdc1 raid0: comparing sdc1(71673856) with sdb1(71673856) raid0: EQUAL raid0: FINAL 1 zones raid0: done. raid0 : md_size is 143347712 blocks. raid0 : conf->hash_spacing is 143347712 blocks. raid0 : nb_zone is 1. raid0 : Allocating 8 bytes for hash. loop: loaded (max 8 devices) kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. ReiserFS: sda6: found reiserfs format "3.6" with standard journal ReiserFS: sda6: using ordered data mode reiserfs: using flush barriers ReiserFS: sda6: journal params: device sda6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: sda6: checking transaction log (sda6) ReiserFS: sda6: Using r5 hash to sort names AppArmor: AppArmor (version 2.0-19.43r6320) initialized audit(1200599270.450:2): AppArmor (version 2.0-19.43r6320) initialized ib_core: module not supported by Novell, setting U taint flag. ib_mad: module not supported by Novell, setting U taint flag. ib_mthca: module not supported by Novell, setting U taint flag. ib_umad: module not supported by Novell, setting U taint flag. ib_uverbs: module not supported by Novell, setting U taint flag. NET: Registered protocol family 10 lo: Disabled Privacy Extensions IPv6 over IPv4 tunneling driver ib_sa: module not supported by Novell, setting U taint flag. ib_cm: module not supported by Novell, setting U taint flag. ib_ipoib: module not supported by Novell, setting U taint flag. iw_cm: module not supported by Novell, setting U taint flag. ib_addr: module not supported by Novell, setting U taint flag. 
rdma_cm: module not supported by Novell, setting U taint flag. ib_sdp: module not supported by Novell, setting U taint flag. NET: Registered protocol family 27 ib_srp: module not supported by Novell, setting U taint flag. rdma_ucm: module not supported by Novell, setting U taint flag. ADDRCONF(NETDEV_UP): eth0: link is not ready e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready audit(1200599280.383:3): audit_pid=3972 old=0 by auid=4294967295 sd 0:0:4:0: queue not ready for req c000000157bdb2a8 sd 0:0:4:0: queue not ready for req c000000157bdb2a8 sd 0:0:4:0: queue not ready for req c000000156d3b2a8 sd 0:0:4:0: queue not ready for req c000000156d3b2a8 sd 0:0:4:0: queue not ready for req c0000000b35ed2a8 sd 0:0:4:0: queue not ready for req c0000000b35ed2a8 sd 0:0:4:0: queue not ready for req c00000000fc71728 sd 0:0:4:0: queue not ready for req c00000000fc713c8 sd 0:0:4:0: queue not ready for req c000000156082188 sd 0:0:4:0: queue not ready for req c000000156082188 sd 0:0:4:0: queue not ready for req c00000000fc712a8 sd 0:0:4:0: queue not ready for req c00000000fc714e8 sd 0:0:4:0: queue not ready for req c00000000fc714e8 sd 0:0:4:0: queue not ready for req c00000000fc71608 sd 0:0:4:0: queue not ready for req c00000000fc71608 sd 0:0:4:0: queue not ready for req c00000000fd6c848 sd 0:0:4:0: queue not ready for req c00000000fd6c3c8 sd 0:0:4:0: queue not ready for req c00000000fd6c068 sd 0:0:4:0: queue not ready for req c00000000fd6c728 sd 0:0:4:0: queue not ready for req c0000001560824e8 sd 0:0:4:0: queue not ready for req c000000156082968 sd 0:0:4:0: queue not ready for req c000000003364728 sd 0:0:4:0: queue not ready for req c00000000fe41188 sd 0:0:4:0: queue not ready for req c00000000f8e0968 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 19:54 ` Olaf Hering @ 2008-01-17 20:20 ` Olaf Hering 2008-01-19 4:56 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread From: Olaf Hering @ 2008-01-17 20:20 UTC (permalink / raw) To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM On Thu, Jan 17, Olaf Hering wrote: > Since -mm boots further, what patch should I try? rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 20:20 ` Olaf Hering @ 2008-01-19 4:56 ` Christoph Lameter 0 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-19 4:56 UTC (permalink / raw) To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM On Thu, 17 Jan 2008, Olaf Hering wrote: > On Thu, Jan 17, Olaf Hering wrote: > > > Since -mm boots further, what patch should I try? > > rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL. Sigh. It looks like we need alien cache structures in some cases for nodes that have no memory. We must allocate structures for all nodes regardless if they have allocatable memory or not. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 18:58 ` Christoph Lameter 2008-01-17 19:54 ` Olaf Hering @ 2008-01-17 21:15 ` Olaf Hering 2008-01-18 6:56 ` Olaf Hering ` (2 more replies) 1 sibling, 3 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-17 21:15 UTC (permalink / raw) To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM On Thu, Jan 17, Christoph Lameter wrote: > On Thu, 17 Jan 2008, Olaf Hering wrote: > > > The patch does not help. > > Duh. We need to know more about the problem. cache_grow is called from 3 places. The third call has cleared l3 for some reason. .... Allocated 00a00000 bytes for kernel @ 00200000 Elf64 kernel loaded... OF stdout device is: /vdevice/vty@30000000 Hypertas detected, assuming LPAR ! command line: xmon=on sysrq=1 debug panic=1 memory layout at init: alloc_bottom : 0000000000ac1000 alloc_top : 0000000010000000 alloc_top_hi : 00000000da000000 rmo_top : 0000000010000000 ram_top : 00000000da000000 Looking for displays found display : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done instantiating rtas at 0x000000000f6a1000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done 0000000000000004 : starting cpu hw idx 0000000000000004... done 0000000000000006 : starting cpu hw idx 0000000000000006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4 Device tree struct 0x0000000000cc4000 -> 0x0000000000cd6000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. 
Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008 ----------------------------------------------------- ppc64_pft_size = 0x1c physicalMemorySize = 0xda000000 htab_hash_mask = 0x1fffff ----------------------------------------------------- Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 892928 Normal 892928 -> 892928 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1: 0 -> 892928 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720 Policy zone: DMA Kernel command line: xmon=on sysrq=1 debug panic=1 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 275.070000 MHz time_init: processor frequency = 2197.800000 MHz clocksource: timebase mult[e8ab05] shift[22] registered clockevent: decrementer mult[466a] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0 ------------[ cut here ]------------ Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779 
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404 REGS: c00000000075b880 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64) MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001 TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0 GPR00: 0000000000000004 c00000000075bb00 c0000000007544c0 0000000000000063 GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0 GPR24: 0000000000000000 c0000000006a4fb8 c0000000006a4fb8 c0000000005fdc80 GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004 NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c LR [c0000000000f78e0] .cache_grow+0xb4/0x39c Call Trace: [c00000000075bb00] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable) [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0 [c00000000075bc90] [c0000000000f842c] .kmem_cache_alloc+0xd0/0x294 [c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478 [c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4 [c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc [c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0 Instruction dump: e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0 
------------[ cut here ]------------ Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779 NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404 REGS: c00000000075b890 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64) MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001 TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0 GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063 GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0 GPR24: 0000000000000000 00000000000080d0 c0000000006a4fb8 c0000000006a4fb8 GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004 NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c LR [c0000000000f78e0] .cache_grow+0xb4/0x39c Call Trace: [c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable) [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8 [c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294 [c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478 [c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4 [c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc [c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0 Instruction dump: e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 0000000000000000 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 0000000000000000 cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 0000000000000000 
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 0000000000000000 ------------[ cut here ]------------ Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779 NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404 REGS: c00000000075b890 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64) MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001 TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0 GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063 GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000 GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8 GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000 GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000080d0 GPR24: 0000000000000001 c0000000d9fe4b00 c0000000006a4fb8 0000000000000000 GPR28: c0000000d8000000 00000000000000d0 c0000000006e5b80 0000000000000004 NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c LR [c0000000000f78e0] .cache_grow+0xb4/0x39c Call Trace: [c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable) [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4 [c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294 [c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478 [c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4 [c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc [c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0 Instruction dump: e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000 381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468 Unable to handle kernel paging request for data at address 0x00000040 Faulting instruction address: 0xc0000000004377b8 cpu 0x0: Vector: 300 (Data Access) at [c00000000075b810] pc: c0000000004377b8: 
._spin_lock+0x20/0x88 lr: c0000000000f790c: .cache_grow+0xe0/0x39c sp: c00000000075ba90 msr: 8000000000009032 dar: 40 dsisr: 40000000 current = 0xc000000000665a50 paca = 0xc000000000666380 pid = 0, comm = swapper enter ? for help [c00000000075bb10] c0000000000f790c .cache_grow+0xe0/0x39c [c00000000075bbe0] c0000000000f7d68 .fallback_alloc+0x1a0/0x1f4 [c00000000075bc90] c0000000000f846c .kmem_cache_alloc+0x110/0x294 [c00000000075bd40] c0000000000fb4e8 .kmem_cache_create+0x208/0x478 [c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4 [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0 0:mon> -- Used patch: Index: linux-2.6.24-rc8/include/linux/olh.h =================================================================== --- /dev/null +++ linux-2.6.24-rc8/include/linux/olh.h @@ -0,0 +1,6 @@ +#ifndef __LINUX_OLH_H +#define __LINUX_OLH_H +#define olh(fmt,args ...) \ + printk(KERN_DEBUG "%s(%u) %s(%u):c%u,j%lu " fmt "\n",__FUNCTION__,__LINE__,current->comm,current->pid,smp_processor_id(),jiffies,##args) +#endif + Index: linux-2.6.24-rc8/mm/slab.c =================================================================== --- linux-2.6.24-rc8.orig/mm/slab.c +++ linux-2.6.24-rc8/mm/slab.c @@ -110,6 +110,7 @@ #include <linux/fault-inject.h> #include <linux/rtmutex.h> #include <linux/reciprocal_div.h> +#include <linux/olh.h> #include <asm/cacheflush.h> #include <asm/tlbflush.h> @@ -2764,6 +2765,7 @@ static int cache_grow(struct kmem_cache size_t offset; gfp_t local_flags; struct kmem_list3 *l3; + int i; /* * Be lazy and only check for valid flags here, keeping it out of the @@ -2772,6 +2774,9 @@ static int cache_grow(struct kmem_cache BUG_ON(flags & GFP_SLAB_BUG_MASK); local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); + for (i=0;i<4;i++) + olh("cachep %p nodeid %d l3 %p",cachep,i,cachep->nodelists[nodeid]); + WARN_ON(1); /* Take the l3 list lock to change the colour_next on this node */ 
check_irq_off(); l3 = cachep->nodelists[nodeid]; ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 21:15 ` Olaf Hering @ 2008-01-18 6:56 ` Olaf Hering 2008-01-18 18:42 ` Christoph Lameter 2008-01-19 4:55 ` Christoph Lameter 2008-01-18 18:47 ` Christoph Lameter 2008-01-18 18:51 ` Christoph Lameter 2 siblings, 2 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-18 6:56 UTC (permalink / raw) To: Christoph Lameter; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, Jan 17, Olaf Hering wrote:

> On Thu, Jan 17, Christoph Lameter wrote:
>
> > On Thu, 17 Jan 2008, Olaf Hering wrote:
> >
> > > The patch does not help.
> >
> > Duh. We need to know more about the problem.
>
> cache_grow is called from 3 places. The third call has cleared l3 for
> some reason.

Typo in debug patch.

calls cache_grow with nodeid 0
> [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
calls cache_grow with nodeid 0
> [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
calls cache_grow with nodeid 1
> [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 6:56 ` Olaf Hering @ 2008-01-18 18:42 ` Christoph Lameter 2008-01-19 4:55 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 18:42 UTC (permalink / raw) To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Hmmm... fallback_alloc should not be called during bootstrap.

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 6:56 ` Olaf Hering 2008-01-18 18:42 ` Christoph Lameter @ 2008-01-19 4:55 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-19 4:55 UTC (permalink / raw) To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Okay that makes sense. You have no node 0 with normal memory but the node assigned to the executing processor is zero (correct?). Thus it needs to fallback to node 1 and that is not possible during bootstrap. You need to run kmem_cache_init() on a cpu on a processor with memory.

Or we need to revert the patch which would allocate control structures again for all online nodes regardless if they have memory or not.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the situation? (However, we tried this on the other thread without success).

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 21:15 ` Olaf Hering 2008-01-18 6:56 ` Olaf Hering @ 2008-01-18 18:47 ` Christoph Lameter 2008-01-18 21:30 ` Mel Gorman 2008-01-18 18:51 ` Christoph Lameter 2 siblings, 1 reply; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 18:47 UTC (permalink / raw) To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM

On Thu, 17 Jan 2008, Olaf Hering wrote:

> early_node_map[1] active PFN ranges
>     1:        0 ->   892928
> Could not find start_pfn for node 0

Corrupted min_pfn?

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 18:47 ` Christoph Lameter @ 2008-01-18 21:30 ` Mel Gorman 2008-01-18 21:43 ` Christoph Lameter 2008-01-18 22:16 ` Christoph Lameter 0 siblings, 2 replies; 61+ messages in thread From: Mel Gorman @ 2008-01-18 21:30 UTC (permalink / raw) To: Christoph Lameter Cc: linuxppc-dev, Olaf Hering, Pekka Enberg, linux-kernel, Linux MM

On (18/01/08 10:47), Christoph Lameter didst pronounce:
> On Thu, 17 Jan 2008, Olaf Hering wrote:
>
> > early_node_map[1] active PFN ranges
> >     1:        0 ->   892928
> > Could not find start_pfn for node 0
>
> Corrupted min_pfn?
>

Doubtful. Node 0 has no memory but it is still being initialised.

Still, I looked closer at what is going on when that message gets displayed and I see this in free_area_init_nodes()

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
		check_for_regular_memory(pgdat);
	}

This "Any memory on that node" thing is new and it says if there is any memory on the node, set N_HIGH_MEMORY. Fine I guess, I haven't tracked these changes closely. It calls check_for_regular_memory() which looks like

static void check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
	enum zone_type zone_type;

	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
		struct zone *zone = &pgdat->node_zones[zone_type];
		if (zone->present_pages)
			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	}
#endif
}

i.e. go through the other zones and if any of them have memory, set N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on POWER.... That sounds bad.

mel@arnold:~/git/linux-2.6/mm$ grep -n N_NORMAL_MEMORY slab.c
1593:           for_each_node_state(nid, N_NORMAL_MEMORY) {
1971:   for_each_node_state(node, N_NORMAL_MEMORY) {
2102:           for_each_node_state(node, N_NORMAL_MEMORY) {
3818:   for_each_node_state(node, N_NORMAL_MEMORY) {

and one of them is in kmem_cache_init(). That seems very significant. Christoph, can you think of possibilities of where N_NORMAL_MEMORY not being set would cause trouble for slab?

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 21:30 ` Mel Gorman @ 2008-01-18 21:43 ` Christoph Lameter 2008-01-18 22:16 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 21:43 UTC (permalink / raw) To: Mel Gorman Cc: linuxppc-dev, Olaf Hering, Pekka Enberg, linux-kernel, Linux MM

On Fri, 18 Jan 2008, Mel Gorman wrote:

> static void check_for_regular_memory(pg_data_t *pgdat)
> {
> #ifdef CONFIG_HIGHMEM
>         enum zone_type zone_type;
>
>         for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
>                 struct zone *zone = &pgdat->node_zones[zone_type];
>                 if (zone->present_pages)
>                         node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
>         }
> #endif
> }
>
> i.e. go through the other zones and if any of them have memory, set
> N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
> PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
> POWER.... That sounds bad.

Argh. We may need to do a

	node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY)

in the !HIGHMEM case.

> and one of them is in kmem_cache_init(). That seems very significant.
> Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
> being set would cause trouble for slab?

Yes. That results in the per node structures not being created and thus l3 == NULL. Explains our failures.

^ permalink raw reply [flat|nested] 61+ messages in thread
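The failure mode described in this reply (a node missing from the N_NORMAL_MEMORY mask, hence no per-node structure, hence l3 == NULL) can be modelled in userspace. The C sketch below is only an illustration under stated assumptions and is not the slab code: `model_cache`, `model_list3`, `model_setup_nodelists` and `MODEL_MAX_NODES` are hypothetical names, and a plain bitmask stands in for the kernel's node state mask.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4	/* hypothetical node count for this model */

/* Stand-in for struct kmem_list3; the real one holds slab lists and a lock. */
struct model_list3 {
	int initialised;
};

/* Stand-in for the nodelists[] array inside struct kmem_cache. */
struct model_cache {
	struct model_list3 *nodelists[MODEL_MAX_NODES];
};

/*
 * Allocate per-node structures only for nodes whose bit is set in
 * normal_memory_mask, mimicking a for_each_node_state(nid, N_NORMAL_MEMORY)
 * walk. Returns the number of nodes that received a structure.
 */
static int model_setup_nodelists(struct model_cache *cachep,
				 unsigned int normal_memory_mask)
{
	int nid, count = 0;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++) {
		cachep->nodelists[nid] = NULL;
		if (!(normal_memory_mask & (1u << nid)))
			continue;	/* node skipped: its slot stays NULL */
		cachep->nodelists[nid] = calloc(1, sizeof(struct model_list3));
		if (cachep->nodelists[nid]) {
			cachep->nodelists[nid]->initialised = 1;
			count++;
		}
	}
	return count;
}
```

With an empty mask every slot stays NULL, so any later `nodelists[nodeid]` lookup in this model yields a NULL pointer, which is the shape of the crash reported in the thread.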
* Re: crash in kmem_cache_init 2008-01-18 21:30 ` Mel Gorman 2008-01-18 21:43 ` Christoph Lameter @ 2008-01-18 22:16 ` Christoph Lameter 2008-01-18 22:19 ` Nish Aravamudan ` (2 more replies) 1 sibling, 3 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 22:16 UTC (permalink / raw) To: Olaf Hering Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, KAMEZAWA Hiroyuki

Could you try this patch?

Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

It seems that we only scan through zones to set N_NORMAL_MEMORY only if CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY in the !CONFIG_HIGHMEM case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2008-01-18 14:08:41.000000000 -0800
+++ linux-2.6/mm/page_alloc.c	2008-01-18 14:13:34.000000000 -0800
@@ -3812,7 +3812,6 @@ restart:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;

 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -3820,7 +3819,6 @@ static void check_for_regular_memory(pg_
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }

 /**

^ permalink raw reply [flat|nested] 61+ messages in thread
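As a rough illustration of what the zone scan in check_for_regular_memory() computes once the #ifdef is removed, here is a small userspace sketch. It is an assumption-laden model, not kernel code: `demo_zone`, `demo_pgdat`, `demo_zone_type` and `demo_has_regular_memory` are hypothetical names, and the zone list is reduced to three entries.

```c
/* Hypothetical, reduced zone list; the kernel's enum zone_type differs by config. */
enum demo_zone_type {
	DEMO_ZONE_DMA,
	DEMO_ZONE_NORMAL,
	DEMO_ZONE_MOVABLE,
	DEMO_MAX_NR_ZONES
};

struct demo_zone {
	unsigned long present_pages;
};

struct demo_pgdat {
	struct demo_zone node_zones[DEMO_MAX_NR_ZONES];
};

/*
 * Return 1 if any zone up to and including DEMO_ZONE_NORMAL has pages,
 * i.e. the condition under which the patched function would set
 * N_NORMAL_MEMORY for the node; return 0 otherwise.
 */
static int demo_has_regular_memory(const struct demo_pgdat *pgdat)
{
	int zt;

	for (zt = 0; zt <= DEMO_ZONE_NORMAL; zt++) {
		if (pgdat->node_zones[zt].present_pages)
			return 1;
	}
	return 0;
}
```

Fed with numbers shaped like the boot log earlier in the thread (all 892928 pages in the DMA zone of node 1, nothing on node 0), the model reports regular memory for node 1 and none for node 0.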
* Re: crash in kmem_cache_init 2008-01-18 22:16 ` Christoph Lameter @ 2008-01-18 22:19 ` Nish Aravamudan 2008-01-18 22:38 ` Christoph Lameter 2008-01-18 22:57 ` Olaf Hering 2 siblings, 0 replies; 61+ messages in thread From: Nish Aravamudan @ 2008-01-18 22:19 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, KAMEZAWA Hiroyuki

On 1/18/08, Christoph Lameter <clameter@sgi.com> wrote:
> Could you try this patch?
>
> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM
>
> It seems that we only scan through zones to set N_NORMAL_MEMORY only if
> CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
> in the !CONFIG_HIGHMEM case.

I'm testing this exact patch right now on the machine Mel saw the issues with.

Thanks,
Nish

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 22:16 ` Christoph Lameter 2008-01-18 22:19 ` Nish Aravamudan @ 2008-01-18 22:38 ` Christoph Lameter 2008-01-18 22:57 ` Olaf Hering 2 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 22:38 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, KAMEZAWA Hiroyuki

On Fri, 18 Jan 2008, Christoph Lameter wrote:

> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

If !CONFIG_HIGHMEM then

enum node_states {
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif

So

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);

			^^^ sets N_NORMAL_MEMORY

		check_for_regular_memory(pgdat);
	}

^ permalink raw reply [flat|nested] 61+ messages in thread
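The aliasing argument in this message can be checked with a small userspace model. This is a sketch under assumptions rather than the kernel's nodemask code: the `MODEL_`/`model_` names are hypothetical, the enum is cut down to the entries needed here, and a per-state bitmask replaces the real nodemask arrays. With `MODEL_CONFIG_HIGHMEM` left undefined, setting the "high memory" state for a node is the same operation as setting the "normal memory" state.

```c
/*
 * Model of the enum shape quoted above. Define MODEL_CONFIG_HIGHMEM before
 * this block to get the variant where the two states are distinct.
 */
enum model_node_states {
	MODEL_N_POSSIBLE,
	MODEL_N_ONLINE,
	MODEL_N_NORMAL_MEMORY,
#ifdef MODEL_CONFIG_HIGHMEM
	MODEL_N_HIGH_MEMORY,
#else
	MODEL_N_HIGH_MEMORY = MODEL_N_NORMAL_MEMORY,	/* alias, no new value */
#endif
	MODEL_NR_NODE_STATES
};

/* One bitmask of node ids per state; bit nid set means the node has the state. */
static unsigned int model_state_mask[MODEL_NR_NODE_STATES];

static void model_node_set_state(int nid, enum model_node_states state)
{
	model_state_mask[state] |= 1u << nid;
}

static int model_node_state(int nid, enum model_node_states state)
{
	return (model_state_mask[state] >> nid) & 1u;
}
```

In this model, a call such as `model_node_set_state(1, MODEL_N_HIGH_MEMORY)` makes `model_node_state(1, MODEL_N_NORMAL_MEMORY)` true, which mirrors the point that the #ifdef in check_for_regular_memory() should not matter when CONFIG_HIGHMEM is off.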
* Re: crash in kmem_cache_init 2008-01-18 22:16 ` Christoph Lameter 2008-01-18 22:19 ` Nish Aravamudan 2008-01-18 22:38 ` Christoph Lameter @ 2008-01-18 22:57 ` Olaf Hering 2008-01-22 19:54 ` Mel Gorman 2 siblings, 1 reply; 61+ messages in thread From: Olaf Hering @ 2008-01-18 22:57 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, KAMEZAWA Hiroyuki On Fri, Jan 18, Christoph Lameter wrote: > Could you try this patch? Does not help, same crash. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-18 22:57 ` Olaf Hering @ 2008-01-22 19:54 ` Mel Gorman 2008-01-22 20:11 ` Christoph Lameter 2008-01-22 21:45 ` Olaf Hering 0 siblings, 2 replies; 61+ messages in thread From: Mel Gorman @ 2008-01-22 19:54 UTC (permalink / raw) To: Olaf Hering Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter

On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
>
> > Could you try this patch?
>
> Does not help, same crash.
>

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem that was fixed for 2.6.24 by reverting a change. This fix can be found at http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch . Can you please check on your machine if it fixes your problem?

I am 99.9999% sure it will *not* fix your problem because there were two bugs, not one as previously believed. On two test machines here, this kmem_cache_init problem still happens even with the revert which fixed a third machine. I was delayed in testing because these boxen were unavailable from Friday until yesterday evening (a stellar display of timing). It was missed on TKO because it was SLAB-specific and those machines were testing SLUB. I found that the patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something more fundamental may be going wrong. The crash occurs because l3 is NULL and the cache is kmem_cache so this is early in the boot process. It is selecting l3 based on node 2 which is correct in terms of available memory but it initialises the lists on node 0 because that is the node the CPUs are located on. Hence later it uses an uninitialised nodelists entry and BLAM.
The relevant parts of the log, showing the memoryless nodes in relation to the CPUs, are:

early_node_map[1] active PFN ranges
    2:        0 ->  1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?

====

Recent changes to how slab operates mean a situation can occur on systems with memoryless nodes whereby the nodeid used when growing the slab does not map to the correct kmem_list3. The following patch adds the necessary check to the indicated preferred nodeid and if it is bogus, use numa_node_id() instead.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);

 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;

 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);

 retry:

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 61+ messages in thread
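The fallback that this patch adds in both hunks can be sketched outside the kernel. The following C fragment is a hedged illustration only: `sketch_list3`, `sketch_pick_l3` and `SKETCH_MAX_NODES` are hypothetical names, and the `local_node` parameter stands in for the value numa_node_id() would return.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 4	/* hypothetical node count for this sketch */

/* Stand-in for struct kmem_list3. */
struct sketch_list3 {
	int free_objects;
};

/*
 * Pick the per-node list for *nodeid. If that slot is NULL, retry with
 * local_node and update *nodeid to the node actually used. Returns NULL
 * when neither slot is populated, which is where the patch would hit
 * BUG_ON(!l3).
 */
static struct sketch_list3 *sketch_pick_l3(struct sketch_list3 *nodelists[],
					   int *nodeid, int local_node)
{
	struct sketch_list3 *l3 = nodelists[*nodeid];

	if (!l3) {
		*nodeid = local_node;
		l3 = nodelists[*nodeid];
	}
	return l3;
}
```

Using the topology from the log in this message (memory on node 2, CPUs on node 0, lists initialised only for node 0), a request for node 2 in this sketch falls back to node 0 and returns that node's list instead of a NULL pointer.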
* Re: crash in kmem_cache_init 2008-01-22 19:54 ` Mel Gorman @ 2008-01-22 20:11 ` Christoph Lameter 2008-01-22 21:26 ` Mel Gorman 2008-01-22 21:45 ` Olaf Hering 1 sibling, 1 reply; 61+ messages in thread From: Christoph Lameter @ 2008-01-22 20:11 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Tue, 22 Jan 2008, Mel Gorman wrote: > Christoph/Pekka, this patch is papering over the problem and something > more fundamental may be going wrong. The crash occurs because l3 is NULL > and the cache is kmem_cache so this is early in the boot process. It is > selecting l3 based on node 2 which is correct in terms of available memory > but it initialises the lists on node 0 because that is the node the CPUs are > located. Hence later it uses an uninitialised nodelists and BLAM. Relevant > parts of the log for seeing the memoryless nodes in relation to CPUs is; Would it be possible to run the bootstrap on a cpu that has a node with memory associated to it? I believe we had the same situation last year when GFP_THISNODE was introduced? After you reverted the slab memoryless node patch there should be per node structures created for node 0 unless the node is marked offline. Is it? If so then you are booting a cpu that is associated with an offline node. > Can you see a better solution than this? Well this means that bootstrap will work by introducing foreign objects into the per cpu queue (should only hold per cpu objects). They will later be consumed and then the queues will contain the right objects so the effect of the patch is minimal. I thought we fixed the similar situation last year by dropping GFP_THISNODE for some allocations? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 20:11 ` Christoph Lameter @ 2008-01-22 21:26 ` Mel Gorman 2008-01-22 21:34 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-22 21:26 UTC (permalink / raw)
To: Christoph Lameter
Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki

On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2 which is correct in terms of available memory
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located. Hence later it uses an uninitialised nodelists and BLAM. Relevant
> > parts of the log for seeing the memoryless nodes in relation to CPUs is;
>
> Would it be possible to run the bootstrap on a cpu that has a
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to be on a node with no memory. It's best to assume I cannot get the machine reconfigured (which just hides the bug anyway). Physically, it's thousands of miles away so I can't do the work. I can get lab support to do the job but that will take a fair while and at the end of the day, it doesn't tell us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation
> last year when GFP_THISNODE was introduced?
>

It feels vaguely familiar but I don't recall the details well enough to recognise if this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node
> structures created for node 0 unless the node is marked offline. Is it? If
> so then you are booting a cpu that is associated with an offline node.
>

I'll roll a patch that prints out the online states before startup and see what it looks like.

> > Can you see a better solution than this?
>
> Well this means that bootstrap will work by introducing foreign objects
> into the per cpu queue (should only hold per cpu objects). They will
> later be consumed and then the queues will contain the right objects so
> the effect of the patch is minimal.
>

By minimal, do you mean that you expect it to break in some other respect later, or minimal as in "this is bad but should not have any adverse impact"?

> I thought we fixed the similar situation last year by dropping
> GFP_THISNODE for some allocations?
>

Whether or not this was a problem fixed in the past, it's broken again now :( . It's possible that there is a __GFP_THISNODE that can be dropped early at boot-time that would also fix this problem in a way that doesn't affect runtime (like altering cache_grow in my patch does).

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 21:26 ` Mel Gorman @ 2008-01-22 21:34 ` Christoph Lameter 2008-01-22 22:50 ` Mel Gorman 0 siblings, 1 reply; 61+ messages in thread From: Christoph Lameter @ 2008-01-22 21:34 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Tue, 22 Jan 2008, Mel Gorman wrote: > > After you reverted the slab memoryless node patch there should be per node > > structures created for node 0 unless the node is marked offline. Is it? If > > so then you are booting a cpu that is associated with an offline node. > > > > I'll roll a patch that prints out the online states before startup and > see what it looks like. Ok. Great. > > > > Can you see a better solution than this? > > > > Well this means that bootstrap will work by introducing foreign objects > > into the per cpu queue (should only hold per cpu objects). They will > > later be consumed and then the queues will contain the right objects so > > the effect of the patch is minimal. > > > > By minimal, do you mean that you expect it to break in some other > respect later or minimal as in "this is bad but should not have no > adverse impact". Should not have any adverse impact after the objects from the cpu queue have been consumed. If the cache_reaper tries to shift objects back from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure you run the tests with full debugging please. > Whatever this was a problem fixed in the past or not, it's broken again now > :( . It's possible that there is a __GFP_THISNODE that can be dropped early > at boot-time that would also fix this problem in a way that doesn't > affect runtime (like altering cache_grow in my patch does). The dropping of GFP_THISNODE has the same effect as your patch. Objects from another node get into the per cpu queue. 
And on free we assume that per cpu queue objects are from the local node. If debug is on then we check that with BUG_ONs. ^ permalink raw reply [flat|nested] 61+ messages in thread
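The locality assumption described above can be modelled in a few lines of C. This is a toy sketch, not the slab code: toy_object, toy_cpu_queue and flush_queue are hypothetical names, and the counter only stands in for the places where a debug kernel would hit a BUG_ON when a queued object turns out to come from another node.

```c
#include <assert.h>

#define QUEUE_LEN 8	/* hypothetical queue size */

/* Toy object that remembers which node its memory came from. */
struct toy_object {
	int node;
};

/* Toy per-CPU queue; it is assumed to hold only local-node objects. */
struct toy_cpu_queue {
	int local_node;
	int avail;
	struct toy_object *entry[QUEUE_LEN];
};

/*
 * Drain the queue back towards the local node's slabs and count the
 * objects that violate the "local node only" assumption. In this model
 * each such object marks a spot where a debug check would fire.
 */
static int flush_queue(struct toy_cpu_queue *q)
{
	int foreign = 0;

	while (q->avail > 0) {
		struct toy_object *obj = q->entry[--q->avail];

		if (obj->node != q->local_node)
			foreign++;
	}
	return foreign;
}
```

In this model, objects that are consumed by allocations before any flush never reach the check, which is the sense in which the effect is described as minimal once the queue has been emptied.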
* Re: crash in kmem_cache_init 2008-01-22 21:34 ` Christoph Lameter @ 2008-01-22 22:50 ` Mel Gorman 2008-01-22 22:57 ` Christoph Lameter 2008-01-22 22:59 ` Pekka Enberg 0 siblings, 2 replies; 61+ messages in thread From: Mel Gorman @ 2008-01-22 22:50 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki [-- Attachment #1: Type: text/plain, Size: 8266 bytes --] On (22/01/08 13:34), Christoph Lameter didst pronounce: > On Tue, 22 Jan 2008, Mel Gorman wrote: > > > > After you reverted the slab memoryless node patch there should be per node > > > structures created for node 0 unless the node is marked offline. Is it? If > > > so then you are booting a cpu that is associated with an offline node. > > > > > > > I'll roll a patch that prints out the online states before startup and > > see what it looks like. > > Ok. Great. > The dmesg output is below. > > > > > > Can you see a better solution than this? > > > > > > Well this means that bootstrap will work by introducing foreign objects > > > into the per cpu queue (should only hold per cpu objects). They will > > > later be consumed and then the queues will contain the right objects so > > > the effect of the patch is minimal. > > > > > > > By minimal, do you mean that you expect it to break in some other > > respect later or minimal as in "this is bad but should not have no > > adverse impact". > > Should not have any adverse impact after the objects from the cpu queue > have been consumed. If the cache_reaper tries to shift objects back > from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure > you run the tests with full debugging please. > I am not running a full range of tests at the moment. Just getting boot first. I'll queue up a range of tests to run with DEBUG on now but it'll be the morning before I have the results. 
> > Whatever this was a problem fixed in the past or not, it's broken again now > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early > > at boot-time that would also fix this problem in a way that doesn't > > affect runtime (like altering cache_grow in my patch does). > > The dropping of GFP_THISNODE has the same effect as your patch. The dropping of it totally? If so, this patch might fix a boot but it'll potentially be a performance regression on NUMA machines that only have nodes with memory, right? > Objects from another node get into the per cpu queue. And on free we > assume that per cpu queue objects are from the local node. If debug is on > then we check that with BUG_ONs. > The interesting parts of the dmesg output are Online nodes o 0 o 2 Nodes with regular memory o 2 Current running CPU 0 is associated with node 0 Current node is 0 So node 2 has regular memory but it's trying to use node 0 at a glance. I've attached the patch I used against 2.6.24-rc8. It includes the revert. Here is the full output Please wait, loading kernel... Elf64 kernel loaded... Loading ramdisk... ramdisk loaded at 02400000, size: 1192 Kbytes OF stdout device is: /vdevice/vty@30000000 Hypertas detected, assuming LPAR ! command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8 memory layout at init: alloc_bottom : 000000000252a000 alloc_top : 0000000008000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000008000000 ram_top : 0000000100000000 Looking for displays instantiating rtas at 0x00000000077d9000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x000000000262b000 -> 0x000000000262c1d3 Device tree struct 0x000000000262d000 -> 0x0000000002635000 Calling quiesce ... returning from prom_init Partition configured for 4 cpus. 
Starting Linux PPC64 #1 SMP Tue Jan 22 17:15:48 EST 2008 ----------------------------------------------------- ppc64_pft_size = 0x1a physicalMemorySize = 0x100000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- Linux version 2.6.24-rc8-autokern1 (root@gekko-lp3.ltc.austin.ibm.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Tue Jan 22 17:15:48 EST 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Zone PFN ranges: DMA 0 -> 1048576 Normal 1048576 -> 1048576 Movable zone start PFN for each node early_node_map[1] active PFN ranges 2: 0 -> 1048576 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 1034240 Policy zone: DMA Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 238.059000 MHz time_init: processor frequency = 1904.472000 MHz clocksource: timebase mult[10cd746] shift[22] registered clockevent: decrementer mult[3cf1] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg0] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 2 Memory: 4105560k/4194304k available (5004k kernel code, 88744k reserved, 876k data, 559k bss, 272k init) Online nodes o 0 o 2 Nodes with regular memory o 2 Current running CPU 0 is associated with node 0 Current node is 0 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 0 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 1 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 2 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 3 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 4 o 
kmem_list3_init kmem_cache_init Setting kmem_cache NULL 5 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 6 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 7 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 8 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 9 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 10 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 11 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 12 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 13 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 14 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 15 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 16 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 17 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 18 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 19 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 20 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 21 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 22 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 23 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 24 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 25 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 26 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 27 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 28 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 29 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 30 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 31 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 32 kmem_cache_init Setting kmem_cache initkmem_list3 0 Unable to handle kernel paging request for data at address 0x00000040 Faulting instruction address: 0xc0000000003c8c00 cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840] pc: c0000000003c8c00: __lock_text_start+0x20/0x88 lr: 
c0000000000dadec: .cache_grow+0x7c/0x338 sp: c0000000005c3ac0 msr: 8000000000009032 dar: 40 dsisr: 40000000 current = 0xc000000000500f10 paca = 0xc000000000501b80 pid = 0, comm = swapper enter ? for help [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338 [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224 [c0000000005c3cb0] c0000000000db958 .kmem_cache_alloc+0xe0/0x14c [c0000000005c3d50] c0000000000dcccc .kmem_cache_create+0x230/0x4cc [c0000000005c3e30] c0000000004c05f4 .kmem_cache_init+0x310/0x640 [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0 0:mon> [-- Attachment #2: debug-slab-with-revert.diff --] [-- Type: text/x-diff, Size: 5708 bytes --] diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-005-debug-slab/mm/slab.c --- linux-2.6.24-rc8-clean/mm/slab.c 2008-01-16 04:22:48.000000000 +0000 +++ linux-2.6.24-rc8-005-debug-slab/mm/slab.c 2008-01-22 21:36:50.000000000 +0000 @@ -348,6 +348,7 @@ static int slab_early_init = 1; static void kmem_list3_init(struct kmem_list3 *parent) { + printk(" o kmem_list3_init\n"); INIT_LIST_HEAD(&parent->slabs_full); INIT_LIST_HEAD(&parent->slabs_partial); INIT_LIST_HEAD(&parent->slabs_free); @@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long * kmem_list3 and not this cpu's kmem_list3 */ + printk("cpuup_prepare %ld\n", cpu); list_for_each_entry(cachep, &cache_chain, next) { /* * Set up the size64 kmemlist for cpu before we can @@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long * node has not already allocated this */ if (!cachep->nodelists[node]) { + printk(" o allocing %s %d\n", cachep->name, node); l3 = kmalloc_node(memsize, GFP_KERNEL, node); if (!l3) goto bad; @@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long * protection here. 
*/ cachep->nodelists[node] = l3; + printk(" o l3 setup\n"); } spin_lock_irq(&cachep->nodelists[node]->list_lock); @@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long } return 0; bad: + printk(" o bad\n"); cpuup_canceled(cpu); return -ENOMEM; } @@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache spin_lock_init(&ptr->list_lock); MAKE_ALL_LISTS(cachep, ptr, nodeid); + printk("init_list RESETTING %s node %d\n", cachep->name, nodeid); cachep->nodelists[nodeid] = ptr; local_irq_enable(); } @@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void) numa_platform = 0; } + printk("Online nodes\n"); + for_each_online_node(node) + printk("o %d\n", node); + printk("Nodes with regular memory\n"); + for_each_node_state(node, N_NORMAL_MEMORY) + printk("o %d\n", node); + printk("Current running CPU %d is associated with node %d\n", + smp_processor_id(), + cpu_to_node(smp_processor_id())); + printk("Current node is %d\n", + numa_node_id()); + for (i = 0; i < NUM_INIT_LISTS; i++) { kmem_list3_init(&initkmem_list3[i]); if (i < MAX_NUMNODES) cache_cache.nodelists[i] = NULL; + printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i); } /* @@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void) cache_cache.colour_off = cache_line_size(); cache_cache.array[smp_processor_id()] = &initarray_cache.cache; cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE]; + printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node); + printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node); /* * struct kmem_cache size depends on nr_node_ids, which @@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void) /* Replace the static kmem_list3 structures for the boot cpu */ init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node); - for_each_node_state(nid, N_NORMAL_MEMORY) { + for_each_online_node(nid) { init_list(malloc_sizes[INDEX_AC].cs_cachep, &initkmem_list3[SIZE_AC + nid], nid); @@ -1968,11 +1989,13 @@ static void __init 
set_up_list3s(struct { int node; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("set_up_list3s %s index %d\n", cachep->name, index); + for_each_online_node(node) { cachep->nodelists[node] = &initkmem_list3[index + node]; cachep->nodelists[node]->next_reap = jiffies + REAPTIMEOUT_LIST3 + ((unsigned long)cachep) % REAPTIMEOUT_LIST3; + printk("set_up_list3s %s index %d\n", cachep->name, index); } } @@ -2099,11 +2122,13 @@ static int __init_refok setup_cpu_cache( g_cpucache_up = PARTIAL_L3; } else { int node; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("setup_cpu_cache %s\n", cachep->name); + for_each_online_node(node) { cachep->nodelists[node] = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node); BUG_ON(!cachep->nodelists[node]); + printk(" o allocated node %d\n", node); kmem_list3_init(cachep->nodelists[node]); } } @@ -3815,8 +3840,10 @@ static int alloc_kmemlist(struct kmem_ca struct array_cache *new_shared; struct array_cache **new_alien = NULL; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("alloc_kmemlist %s\n", cachep->name); + for_each_online_node(node) { + printk(" o node %d\n", node); if (use_alien_caches) { new_alien = alloc_alien_cache(node, cachep->limit); if (!new_alien) @@ -3837,6 +3864,7 @@ static int alloc_kmemlist(struct kmem_ca l3 = cachep->nodelists[node]; if (l3) { struct array_cache *shared = l3->shared; + printk(" o l3 exists\n"); spin_lock_irq(&l3->list_lock); @@ -3856,10 +3884,12 @@ static int alloc_kmemlist(struct kmem_ca free_alien_cache(new_alien); continue; } + printk(" o allocing l3\n"); l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node); if (!l3) { free_alien_cache(new_alien); kfree(new_shared); + printk(" o allocing l3 failed\n"); goto fail; } @@ -3871,6 +3901,7 @@ static int alloc_kmemlist(struct kmem_ca l3->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num; cachep->nodelists[node] = l3; + printk(" o setting node %d 0x%lX\n", node, (unsigned long)l3); } return 0; @@ 
-3886,6 +3917,7 @@ fail: free_alien_cache(l3->alien); kfree(l3); cachep->nodelists[node] = NULL; + printk(" o setting node %d FAIL NULL\n", node); } node--; } ^ permalink raw reply [flat|nested] 61+ messages in thread
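The difference between walking the online nodes and walking only the nodes with regular memory, which the debug patch above switches between, can be shown with plain bitmasks. A minimal sketch, assuming hypothetical names (setup_lists, MAX_NODES) and an unsigned long in place of the kernel's nodemask type:

```c
#include <assert.h>

#define MAX_NODES 4	/* hypothetical; stands in for MAX_NUMNODES */

/*
 * Record which nodes get a per-node list when the set-up loop visits
 * exactly the nodes whose bit is set in mask. Returns the number of
 * lists that were set up.
 */
static int setup_lists(unsigned long mask, int has_list3[MAX_NODES])
{
	int node, count = 0;

	for (node = 0; node < MAX_NODES; node++) {
		has_list3[node] = (int)((mask >> node) & 1UL);
		count += has_list3[node];
	}
	return count;
}
```

For the machine in the dmesg output, the online mask has bits 0 and 2 set while the regular-memory mask has only bit 2 set. Iterating over the latter leaves node 0, the node of the boot CPU, without a list; iterating over the former covers both nodes.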
* Re: crash in kmem_cache_init 2008-01-22 22:50 ` Mel Gorman @ 2008-01-22 22:57 ` Christoph Lameter 2008-01-22 23:10 ` Mel Gorman 2008-01-22 22:59 ` Pekka Enberg 1 sibling, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 22:57 UTC (permalink / raw)
To: Mel Gorman
Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> > > Whatever this was a problem fixed in the past or not, it's broken again now
> > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > at boot-time that would also fix this problem in a way that doesn't
> > > affect runtime (like altering cache_grow in my patch does).
> >
> > The dropping of GFP_THISNODE has the same effect as your patch.
>
> The dropping of it totally? If so, this patch might fix a boot but it'll
> potentially be a performance regression on NUMA machines that only have
> nodes with memory, right?

No, only the dropping during early allocations.

> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>
> So node 2 has regular memory but it's trying to use node 0 at a glance.
> I've attached the patch I used against 2.6.24-rc8. It includes the revert.

We need the current processor to be attached to a node that has memory. We cannot fall back that early because the structures for the other nodes do not exist yet.

> Online nodes
> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
> o kmem_list3_init

This needs to be node 2.

> [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338
> [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224

Fallback during bootstrap.

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 22:57 ` Christoph Lameter @ 2008-01-22 23:10 ` Mel Gorman 2008-01-22 23:14 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread From: Mel Gorman @ 2008-01-22 23:10 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki On (22/01/08 14:57), Christoph Lameter didst pronounce: > On Tue, 22 Jan 2008, Mel Gorman wrote: > > > > > Whatever this was a problem fixed in the past or not, it's broken again now > > > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early > > > > at boot-time that would also fix this problem in a way that doesn't > > > > affect runtime (like altering cache_grow in my patch does). > > > > > > The dropping of GFP_THISNODE has the same effect as your patch. > > > > The dropping of it totally? If so, this patch might fix a boot but it'll > > potentially be a performance regression on NUMA machines that only have > > nodes with memory, right? > > No the dropping during early allocations., > We can live with that if the machine otherwise survives during tests. They are kicked off at the moment with CONFIG_SLAB_DEBUG set but the point is moot if the patch doesn't work for Olaf. Am still waiting to hear if the two patches in combination work for him. > > o 0 > > o 2 > > Nodes with regular memory > > o 2 > > Current running CPU 0 is associated with node 0 > > Current node is 0 > > > > So node 2 has regular memory but it's trying to use node 0 at a glance. > > I've attached the patch I used against 2.6.24-rc8. It includes the revert. > > We need the current processor to be attached to a node that has > memory. We cannot fall back that early because the structures for the > other nodes do not exist yet. > Or bodge it early in the boot process so that a node with memory is always used. 
> > Online nodes > > o 0 > > o 2 > > Nodes with regular memory > > o 2 > > Current running CPU 0 is associated with node 0 > > Current node is 0 > > o kmem_list3_init > > This needs to be node 2. > Rather it should be 2. I'll admit the physical setup of this machine is .... less than ideal but clearly it's something that can happen even if it's a bad idea. > > [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338 > > [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224 > > Fallback during bootstrap. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 23:10 ` Mel Gorman @ 2008-01-22 23:14 ` Christoph Lameter 0 siblings, 0 replies; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:14 UTC (permalink / raw)
To: Mel Gorman
Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Mel Gorman wrote:

> Rather it should be 2. I'll admit the physical setup of this machine is
> .... less than ideal but clearly it's something that can happen even if
> it's a bad idea.

Ok. Let's hope that Pekka's find does the trick. But this would mean that fallback gets memory from node 2 from the page allocator. Then fallback alloc is going to try to insert it into the l3 of node 2 which is not there yet. So another oops. Sigh.

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 22:50 ` Mel Gorman 2008-01-22 22:57 ` Christoph Lameter @ 2008-01-22 22:59 ` Pekka Enberg 2008-01-22 23:12 ` Christoph Lameter 1 sibling, 1 reply; 61+ messages in thread From: Pekka Enberg @ 2008-01-22 22:59 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter Hi, Mel Gorman wrote: > Faulting instruction address: 0xc0000000003c8c00 > cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840] > pc: c0000000003c8c00: __lock_text_start+0x20/0x88 > lr: c0000000000dadec: .cache_grow+0x7c/0x338 > sp: c0000000005c3ac0 > msr: 8000000000009032 > dar: 40 > dsisr: 40000000 > current = 0xc000000000500f10 > paca = 0xc000000000501b80 > pid = 0, comm = swapper > enter ? for help > [c0000000005c3b40] c0000000000dadec .cache_grow+0x7c/0x338 > [c0000000005c3c00] c0000000000db54c .fallback_alloc+0x1c0/0x224 > [c0000000005c3cb0] c0000000000db958 .kmem_cache_alloc+0xe0/0x14c > [c0000000005c3d50] c0000000000dcccc .kmem_cache_create+0x230/0x4cc > [c0000000005c3e30] c0000000004c05f4 .kmem_cache_init+0x310/0x640 > [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc > [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0 > 0:mon> I mentioned this already but received no response (maybe I am missing something totally obvious here): When we call fallback_alloc() because the current node has ->nodelists set to NULL, we end up calling kmem_getpages() with -1 as the node id which is then translated to numa_node_id() by alloc_pages_node. But the reason we called fallback_alloc() in the first place is because numa_node_id() doesn't have a ->nodelist which makes cache_grow() oops. Pekka ^ permalink raw reply [flat|nested] 61+ messages in thread
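The sequence described above can be written down as a toy model. It is a sketch of the argument only, assuming hypothetical helpers (resolve_node, grow_would_oops) and a plain array in place of ->nodelists; it does not reproduce the real call chain.

```c
#include <assert.h>

/*
 * Model of the translation described above: a node id of -1 means
 * "whatever node the caller is currently running on".
 */
static int resolve_node(int nid, int current_node)
{
	return nid < 0 ? current_node : nid;
}

/*
 * has_list3[n] is non-zero when node n has its per-node list set up.
 * Returns 1 when growing the cache on the resolved node would touch a
 * missing list, which is the oops scenario from the message above.
 */
static int grow_would_oops(const int has_list3[], int nid, int current_node)
{
	return !has_list3[resolve_node(nid, current_node)];
}
```

If the fallback was entered because the current node has no list, passing -1 leads straight back to that same node, so in this model the result is 1 whenever has_list3[current_node] is 0.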
* Re: crash in kmem_cache_init 2008-01-22 22:59 ` Pekka Enberg @ 2008-01-22 23:12 ` Christoph Lameter 2008-01-22 23:18 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:12 UTC (permalink / raw)
To: Pekka Enberg
Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki

On Wed, 23 Jan 2008, Pekka Enberg wrote:

> When we call fallback_alloc() because the current node has ->nodelists set to
> NULL, we end up calling kmem_getpages() with -1 as the node id which is then
> translated to numa_node_id() by alloc_pages_node. But the reason we called
> fallback_alloc() in the first place is because numa_node_id() doesn't have a
> ->nodelist which makes cache_grow() oops.

Right, if nodeid == -1 then we need to call alloc_pages...

Essentially a revert of 50c85a19e7b3928b5b5188524c44ffcbacdd4e35 from 2005.

But I doubt that this is it. The fallback logic was added later and it worked fine.

---
 mm/slab.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-22 15:05:26.185452369 -0800
+++ linux-2.6/mm/slab.c	2008-01-22 15:05:59.301637009 -0800
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-22 23:12 ` Christoph Lameter @ 2008-01-22 23:18 ` Christoph Lameter 2008-01-23 8:19 ` Pekka Enberg 0 siblings, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 23:18 UTC (permalink / raw)
To: Pekka Enberg
Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Christoph Lameter wrote:

> But I doubt that this is it. The fallback logic was added later and it
> worked fine.

My patch is useless (fascinating history of the changelog there though). fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that alloc_pages_node() will try to allocate on the current node but fall back to a neighboring node if nothing is there....

^ permalink raw reply [flat|nested] 61+ messages in thread
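The behaviour with and without a "this node only" restriction can be sketched as a walk over a preference-ordered node list. This is a toy model under stated assumptions: toy_alloc_node, the free_pages array and the order array are hypothetical and only loosely stand in for the page allocator's zonelist walk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the node that satisfies the request, or -1 if none can.
 * order[] lists candidate nodes, preferred node first; free_pages[] is
 * indexed by node id. With thisnode set, only order[0] is tried, which
 * models an allocation that must not leave the preferred node.
 */
static int toy_alloc_node(const int free_pages[], const int order[],
			  int norder, bool thisnode)
{
	int i;

	for (i = 0; i < norder; i++) {
		if (free_pages[order[i]] > 0)
			return order[i];
		if (thisnode)
			break;
	}
	return -1;
}
```

With node 0 empty and node 2 holding all the memory, the unrestricted call returns pages from node 2, while the restricted call fails outright. The page then belongs to node 2, which is why the state of node 2's per-node list matters in the follow-ups.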
* Re: crash in kmem_cache_init 2008-01-22 23:18 ` Christoph Lameter @ 2008-01-23 8:19 ` Pekka Enberg 2008-01-23 8:40 ` Olaf Hering 0 siblings, 1 reply; 61+ messages in thread From: Pekka Enberg @ 2008-01-23 8:19 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki Hi Christoph, On Jan 23, 2008 1:18 AM, Christoph Lameter <clameter@sgi.com> wrote: > My patch is useless (fascinating history of the changelog there through). > fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that > alloc_pages_node() will try to allocate on the current node but fallback > to neighboring node if nothing is there.... Sure, but I was referring to the scenario where current node _has_ pages available but no ->nodelists. Olaf, did you try it? Pekka ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-23 8:19 ` Pekka Enberg @ 2008-01-23 8:40 ` Olaf Hering 0 siblings, 0 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-23 8:40 UTC (permalink / raw) To: Pekka Enberg Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On Wed, Jan 23, Pekka Enberg wrote: > Hi Christoph, > > On Jan 23, 2008 1:18 AM, Christoph Lameter <clameter@sgi.com> wrote: > > My patch is useless (fascinating history of the changelog there through). > > fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that > > alloc_pages_node() will try to allocate on the current node but fallback > > to neighboring node if nothing is there.... > > Sure, but I was referring to the scenario where current node _has_ > pages available but no ->nodelists. Olaf, did you try it? Does not help. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init
  2008-01-22 19:54 ` Mel Gorman
  2008-01-22 20:11 ` Christoph Lameter
@ 2008-01-22 21:45 ` Olaf Hering
  2008-01-22 22:12 ` Nish Aravamudan
  2008-01-22 22:23 ` Christoph Lameter
  1 sibling, 2 replies; 61+ messages in thread
From: Olaf Hering @ 2008-01-22 21:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg,
      Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki,
      Christoph Lameter

On Tue, Jan 22, Mel Gorman wrote:

> http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> .. Can you please check on your machine if it fixes your problem?

It does not fix or change the nature of the crash.

> Olaf, please confirm whether you need the patch below as well as the
> revert to make your machine boot.

It crashes now in a different way if the patch below is applied:

Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 -> 892928
  Normal 892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x00000058
Faulting instruction address: 0xc0000000000fe018
cpu 0x0: Vector: 300 (Data Access) at [c00000000075bac0]
    pc: c0000000000fe018: .setup_cpu_cache+0x184/0x1f4
    lr: c0000000000fdfa8: .setup_cpu_cache+0x114/0x1f4
    sp: c00000000075bd40
   msr: 8000000000009032
   dar: 58
 dsisr: 42000000
  current = 0xc000000000665a50
  paca = 0xc000000000666380
    pid = 0, comm = swapper
enter ? for help
[c00000000075bd40] c0000000000fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
[c00000000075be20] c0000000005e6780 .kmem_cache_init+0x284/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon>

0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
2106                            BUG_ON(!cachep->nodelists[node]);
2107                            kmem_list3_init(cachep->nodelists[node]);
2108                    }
2109            }
2110    }
2111    cachep->nodelists[numa_node_id()]->next_reap =
2112                    jiffies + REAPTIMEOUT_LIST3 +
2113                    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
2114
2115    cpu_cache_get(cachep)->avail = 0;

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init
  2008-01-22 21:45 ` Olaf Hering
@ 2008-01-22 22:12 ` Nish Aravamudan
  2008-01-22 22:23 ` Christoph Lameter
  1 sibling, 0 replies; 61+ messages in thread
From: Nish Aravamudan @ 2008-01-22 22:12 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev,
      Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
      KAMEZAWA Hiroyuki, Christoph Lameter

On 1/22/08, Olaf Hering <olaf@aepfle.de> wrote:
> On Tue, Jan 22, Mel Gorman wrote:
>
> > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> > .. Can you please check on your machine if it fixes your problem?
>
> It does not fix or change the nature of the crash.
>
> > Olaf, please confirm whether you need the patch below as well as the
> > revert to make your machine boot.
>
> It crashes now in a different way if the patch below is applied:

Was this with the revert Mel mentioned applied as well? I get the
feeling both patches are needed to fix up the memoryless SLAB issue.

> Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008

<snip>

> early_node_map[1] active PFN ranges
>   1: 0 -> 892928

<snip>

> Unable to handle kernel paging request for data at address 0x00000058
> Faulting instruction address: 0xc0000000000fe018
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075bac0]
>     pc: c0000000000fe018: .setup_cpu_cache+0x184/0x1f4
>     lr: c0000000000fdfa8: .setup_cpu_cache+0x114/0x1f4
>     sp: c00000000075bd40
>    msr: 8000000000009032
>    dar: 58
>  dsisr: 42000000
>   current = 0xc000000000665a50
>   paca = 0xc000000000666380
>     pid = 0, comm = swapper
> enter ? for help
> [c00000000075bd40] c0000000000fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
> [c00000000075be20] c0000000005e6780 .kmem_cache_init+0x284/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
> 0:mon>
>
> 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                            BUG_ON(!cachep->nodelists[node]);
> 2107                            kmem_list3_init(cachep->nodelists[node]);

I might be barking up the wrong tree, but this block above is supposed
to set up the cachep->nodelists[*] that are used immediately below.
But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or
whatever, you might get a bad access right below for node 0 that has
no memory, if that's the node we're running on...

> 2108                    }
> 2109            }
> 2110    }
> 2111    cachep->nodelists[numa_node_id()]->next_reap =
> 2112                    jiffies + REAPTIMEOUT_LIST3 +
> 2113                    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115    cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish

^ permalink raw reply [flat|nested] 61+ messages in thread
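The suspicion in the message above (lists set up only for nodes with memory, then dereferenced for the node the boot CPU runs on) can be modelled in userspace. This is a sketch under assumptions: `struct cache`, `set_up_lists`, and `node_has_list` are hypothetical names, and the bitmasks stand in for the kernel's node state masks. The scenario of a memoryless node 0 and a populated node 1 is taken from the quoted boot log.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 2

struct list3 { unsigned long next_reap; };

/* Hypothetical stand-in for cachep->nodelists[] plus its backing storage. */
struct cache {
    struct list3 *nodelists[MAX_NODES];
    struct list3 storage[MAX_NODES];
};

/* Give every node whose bit is set in `mask` a list; leave the rest NULL. */
static void set_up_lists(struct cache *c, unsigned int mask)
{
    for (int node = 0; node < MAX_NODES; node++) {
        if (mask & (1u << node))
            c->nodelists[node] = &c->storage[node];
        else
            c->nodelists[node] = NULL;
    }
}

/* Nonzero if dereferencing nodelists[node] would be safe. */
static int node_has_list(const struct cache *c, int node)
{
    return c->nodelists[node] != NULL;
}
```

In this model, iterating over a "has normal memory" mask of 0x2 leaves node 0 without a list, so an unconditional access through the current node's entry would hit a NULL pointer. Iterating over an "online" mask of 0x3 populates both entries.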
* Re: crash in kmem_cache_init
  2008-01-22 21:45 ` Olaf Hering
  2008-01-22 22:12 ` Nish Aravamudan
@ 2008-01-22 22:23 ` Christoph Lameter
  2008-01-23 7:58 ` Olaf Hering
  1 sibling, 1 reply; 61+ messages in thread
From: Christoph Lameter @ 2008-01-22 22:23 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev,
      Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
      KAMEZAWA Hiroyuki

On Tue, 22 Jan 2008, Olaf Hering wrote:

> It crashes now in a different way if the patch below is applied:

Yup no l3 structure for the current node. We are early in bootstrap. You
could just check if the l3 is there and if not just skip starting the
reaper? This will be redone later anyways.

Not sure if this will solve all your issues though. An l3 for the current
node that we are booting on needs to be created early on for SLAB
bootstrap to succeed. AFAICT SLUB doesn't care and simply uses whatever
the page allocator gives it for the cpu slab. We may have gotten there
because you only tested with SLUB recently and thus changes got in that
broke SLAB boot assumptions.

> 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                            BUG_ON(!cachep->nodelists[node]);
> 2107                            kmem_list3_init(cachep->nodelists[node]);
> 2108                    }
> 2109            }
> 2110    }

        if (!cachep->nodelists[numa_node_id()])
                return;

> 2111    cachep->nodelists[numa_node_id()]->next_reap =
> 2112                    jiffies + REAPTIMEOUT_LIST3 +
> 2113                    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115    cpu_cache_get(cachep)->avail = 0;
>
>

^ permalink raw reply [flat|nested] 61+ messages in thread
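The suggestion above, to skip starting the reaper when the current node has no l3 yet, can be sketched as a standalone function. This is only a userspace illustration: `schedule_first_reap` and `REAP_TIMEOUT` are hypothetical, the timeout value is arbitrary, and `now` stands in for `jiffies`.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary illustrative value; stands in for REAPTIMEOUT_LIST3. */
#define REAP_TIMEOUT 400UL

struct list3 { unsigned long next_reap; };

/*
 * Schedule the first reap for `node`, skipping nodes that have no list yet.
 * Returns 1 if a reap time was set, 0 if the node was skipped.
 */
static int schedule_first_reap(struct list3 **nodelists, int node,
                               unsigned long now, unsigned long cache_addr)
{
    struct list3 *l3 = nodelists[node];

    if (!l3)
        return 0;       /* nothing to schedule yet; a later pass may redo this */

    /* Stagger reap times per cache, mirroring the quoted expression. */
    l3->next_reap = now + REAP_TIMEOUT + cache_addr % REAP_TIMEOUT;
    return 1;
}
```

Whether skipping is sufficient depends on a later setup pass creating the missing list, which the message itself flags as uncertain.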
* Re: crash in kmem_cache_init
  2008-01-22 22:23 ` Christoph Lameter
@ 2008-01-23 7:58 ` Olaf Hering
  2008-01-23 10:50 ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-23 7:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: lee.schermerhorn, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev,
      Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
      KAMEZAWA Hiroyuki

On Tue, Jan 22, Christoph Lameter wrote:

> > 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > 2106                            BUG_ON(!cachep->nodelists[node]);
> > 2107                            kmem_list3_init(cachep->nodelists[node]);
> > 2108                    }
> > 2109            }
> > 2110    }
>
>         if (!cachep->nodelists[numa_node_id()])
>                 return;

Does not help.

Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #48 SMP Wed Jan 23 08:54:23 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 -> 892928
  Normal 892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32(DMA)'

Rebooting in 1 seconds..

---
 mm/slab.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
         /* Replace the static kmem_list3 structures for the boot cpu */
         init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);

-        for_each_node_state(nid, N_NORMAL_MEMORY) {
+        for_each_online_node(nid) {
                 init_list(malloc_sizes[INDEX_AC].cs_cachep,
                           &initkmem_list3[SIZE_AC + nid], nid);

@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct
 {
         int node;

-        for_each_node_state(node, N_NORMAL_MEMORY) {
+        for_each_online_node(node) {
                 cachep->nodelists[node] = &initkmem_list3[index + node];
                 cachep->nodelists[node]->next_reap = jiffies +
                     REAPTIMEOUT_LIST3 +
@@ -2108,6 +2108,8 @@ static int __init_refok setup_cpu_cache(
                         }
                 }
         }
+        if (!cachep->nodelists[numa_node_id()])
+                return -ENODEV;
         cachep->nodelists[numa_node_id()]->next_reap =
                         jiffies + REAPTIMEOUT_LIST3 +
                         ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -2775,6 +2777,11 @@ static int cache_grow(struct kmem_cache
         /* Take the l3 list lock to change the colour_next on this node */
         check_irq_off();
         l3 = cachep->nodelists[nodeid];
+        if (!l3) {
+                nodeid = numa_node_id();
+                l3 = cachep->nodelists[nodeid];
+        }
+        BUG_ON(!l3);
         spin_lock(&l3->list_lock);

         /* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3324,10 @@ static void *____cache_alloc_node(struct
         int x;

         l3 = cachep->nodelists[nodeid];
+        if (!l3) {
+                nodeid = numa_node_id();
+                l3 = cachep->nodelists[nodeid];
+        }
         BUG_ON(!l3);

 retry:
@@ -3815,7 +3826,7 @@ static int alloc_kmemlist(struct kmem_ca
         struct array_cache *new_shared;
         struct array_cache **new_alien = NULL;

-        for_each_node_state(node, N_NORMAL_MEMORY) {
+        for_each_online_node(node) {

                 if (use_alien_caches) {
                         new_alien = alloc_alien_cache(node, cachep->limit);

^ permalink raw reply [flat|nested] 61+ messages in thread
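The `!l3` hunks in the patch above retarget a lookup to the current node when the requested node has no list. A minimal userspace sketch of that pattern follows. `lookup_list` is a hypothetical helper, and `current_node` is passed in explicitly where the patch calls `numa_node_id()`.

```c
#include <assert.h>
#include <stddef.h>

struct list3 { int id; };

/*
 * Look up the list for *nodeid. If that node has none, retarget *nodeid to
 * current_node and use its list instead. The result may still be NULL when
 * the current node has no list either, so callers must check it.
 */
static struct list3 *lookup_list(struct list3 **nodelists, int *nodeid,
                                 int current_node)
{
    struct list3 *l3 = nodelists[*nodeid];

    if (!l3) {
        *nodeid = current_node;
        l3 = nodelists[*nodeid];
    }
    return l3;
}
```

The fallback only helps when the current node itself has a list. If the CPU runs on the memoryless node, the lookup still yields NULL, which is the case the `BUG_ON(!l3)` in the patch would catch.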
* Re: crash in kmem_cache_init
  2008-01-23 7:58 ` Olaf Hering
@ 2008-01-23 10:50 ` Mel Gorman
  2008-01-23 12:14 ` Olaf Hering
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 10:50 UTC (permalink / raw)
  To: Olaf Hering
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg,
      Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki,
      Christoph Lameter

On (23/01/08 08:58), Olaf Hering didst pronounce:
> On Tue, Jan 22, Christoph Lameter wrote:
>
> > > 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > > 2106                            BUG_ON(!cachep->nodelists[node]);
> > > 2107                            kmem_list3_init(cachep->nodelists[node]);
> > > 2108                    }
> > > 2109            }
> > > 2110    }
> >
> >         if (cachep->nodelists[numa_node_id()])
> >                 return;
>
> Does not help.
>

Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
following patch against 2.6.24-rc8 please? It contains the debug information
that helped me figure out what was going wrong on the PPC64 machine here,
the revert and the !l3 checks (i.e. the two patches that made machines I
have access to work).
Thanks diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-015_debug_slab/mm/slab.c --- linux-2.6.24-rc8-clean/mm/slab.c 2008-01-16 04:22:48.000000000 +0000 +++ linux-2.6.24-rc8-015_debug_slab/mm/slab.c 2008-01-23 10:44:36.000000000 +0000 @@ -348,6 +348,7 @@ static int slab_early_init = 1; static void kmem_list3_init(struct kmem_list3 *parent) { + printk(" o kmem_list3_init\n"); INIT_LIST_HEAD(&parent->slabs_full); INIT_LIST_HEAD(&parent->slabs_partial); INIT_LIST_HEAD(&parent->slabs_free); @@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long * kmem_list3 and not this cpu's kmem_list3 */ + printk("cpuup_prepare %ld\n", cpu); list_for_each_entry(cachep, &cache_chain, next) { /* * Set up the size64 kmemlist for cpu before we can @@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long * node has not already allocated this */ if (!cachep->nodelists[node]) { + printk(" o allocing %s %d\n", cachep->name, node); l3 = kmalloc_node(memsize, GFP_KERNEL, node); if (!l3) goto bad; @@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long * protection here. 
*/ cachep->nodelists[node] = l3; + printk(" o l3 setup\n"); } spin_lock_irq(&cachep->nodelists[node]->list_lock); @@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long } return 0; bad: + printk(" o bad\n"); cpuup_canceled(cpu); return -ENOMEM; } @@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache spin_lock_init(&ptr->list_lock); MAKE_ALL_LISTS(cachep, ptr, nodeid); + printk("init_list RESETTING %s node %d\n", cachep->name, nodeid); cachep->nodelists[nodeid] = ptr; local_irq_enable(); } @@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void) numa_platform = 0; } + printk("Online nodes\n"); + for_each_online_node(node) + printk("o %d\n", node); + printk("Nodes with regular memory\n"); + for_each_node_state(node, N_NORMAL_MEMORY) + printk("o %d\n", node); + printk("Current running CPU %d is associated with node %d\n", + smp_processor_id(), + cpu_to_node(smp_processor_id())); + printk("Current node is %d\n", + numa_node_id()); + for (i = 0; i < NUM_INIT_LISTS; i++) { kmem_list3_init(&initkmem_list3[i]); if (i < MAX_NUMNODES) cache_cache.nodelists[i] = NULL; + printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i); } /* @@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void) cache_cache.colour_off = cache_line_size(); cache_cache.array[smp_processor_id()] = &initarray_cache.cache; cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE]; + printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node); + printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node); /* * struct kmem_cache size depends on nr_node_ids, which @@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void) /* Replace the static kmem_list3 structures for the boot cpu */ init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node); - for_each_node_state(nid, N_NORMAL_MEMORY) { + for_each_online_node(nid) { init_list(malloc_sizes[INDEX_AC].cs_cachep, &initkmem_list3[SIZE_AC + nid], nid); @@ -1968,11 +1989,13 @@ static void __init 
set_up_list3s(struct { int node; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("set_up_list3s %s index %d\n", cachep->name, index); + for_each_online_node(node) { cachep->nodelists[node] = &initkmem_list3[index + node]; cachep->nodelists[node]->next_reap = jiffies + REAPTIMEOUT_LIST3 + ((unsigned long)cachep) % REAPTIMEOUT_LIST3; + printk("set_up_list3s %s index %d\n", cachep->name, index); } } @@ -2099,11 +2122,13 @@ static int __init_refok setup_cpu_cache( g_cpucache_up = PARTIAL_L3; } else { int node; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("setup_cpu_cache %s\n", cachep->name); + for_each_online_node(node) { cachep->nodelists[node] = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node); BUG_ON(!cachep->nodelists[node]); + printk(" o allocated node %d\n", node); kmem_list3_init(cachep->nodelists[node]); } } @@ -2775,6 +2800,11 @@ static int cache_grow(struct kmem_cache /* Take the l3 list lock to change the colour_next on this node */ check_irq_off(); l3 = cachep->nodelists[nodeid]; + if (!l3) { + nodeid = numa_node_id(); + l3 = cachep->nodelists[nodeid]; + } + BUG_ON(!l3); spin_lock(&l3->list_lock); /* Get colour for the slab, and cal the next value. 
*/ @@ -3317,6 +3347,10 @@ static void *____cache_alloc_node(struct int x; l3 = cachep->nodelists[nodeid]; + if (!l3) { + nodeid = numa_node_id(); + l3 = cachep->nodelists[nodeid]; + } BUG_ON(!l3); retry: @@ -3815,8 +3849,10 @@ static int alloc_kmemlist(struct kmem_ca struct array_cache *new_shared; struct array_cache **new_alien = NULL; - for_each_node_state(node, N_NORMAL_MEMORY) { + printk("alloc_kmemlist %s\n", cachep->name); + for_each_online_node(node) { + printk(" o node %d\n", node); if (use_alien_caches) { new_alien = alloc_alien_cache(node, cachep->limit); if (!new_alien) @@ -3837,6 +3873,7 @@ static int alloc_kmemlist(struct kmem_ca l3 = cachep->nodelists[node]; if (l3) { struct array_cache *shared = l3->shared; + printk(" o l3 exists\n"); spin_lock_irq(&l3->list_lock); @@ -3856,10 +3893,12 @@ static int alloc_kmemlist(struct kmem_ca free_alien_cache(new_alien); continue; } + printk(" o allocing l3\n"); l3 = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, node); if (!l3) { free_alien_cache(new_alien); kfree(new_shared); + printk(" o allocing l3 failed\n"); goto fail; } @@ -3871,6 +3910,7 @@ static int alloc_kmemlist(struct kmem_ca l3->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num; cachep->nodelists[node] = l3; + printk(" o setting node %d 0x%lX\n", node, (unsigned long)l3); } return 0; @@ -3886,6 +3926,7 @@ fail: free_alien_cache(l3->alien); kfree(l3); cachep->nodelists[node] = NULL; + printk(" o setting node %d FAIL NULL\n", node); } node--; } -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-23 10:50 ` Mel Gorman @ 2008-01-23 12:14 ` Olaf Hering 2008-01-23 12:52 ` Olaf Hering 2008-01-23 13:41 ` crash in kmem_cache_init Mel Gorman 0 siblings, 2 replies; 61+ messages in thread From: Olaf Hering @ 2008-01-23 12:14 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On Wed, Jan 23, Mel Gorman wrote: > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the > following patch against 2.6.24-rc8 please? It contains the debug information > that helped me figure out what was going wrong on the PPC64 machine here, > the revert and the !l3 checks (i.e. the two patches that made machines I > have access to work). Thanks It boots with your change. boot: x Please wait, loading kernel... Allocated 00a00000 bytes for kernel @ 00200000 Elf64 kernel loaded... OF stdout device is: /vdevice/vty@30000000 Hypertas detected, assuming LPAR ! command line: debug xmon=on panic=1 loglevel=8 memory layout at init: alloc_bottom : 0000000000ac1000 alloc_top : 0000000010000000 alloc_top_hi : 00000000da000000 rmo_top : 0000000010000000 ram_top : 00000000da000000 Looking for displays found display : /pci@800000020000002/pci@2/pci@1/display@0, opening ... done instantiating rtas at 0x000000000f6a1000 ... done 0000000000000000 : boot cpu 0000000000000000 0000000000000002 : starting cpu hw idx 0000000000000002... done 0000000000000004 : starting cpu hw idx 0000000000000004... done 0000000000000006 : starting cpu hw idx 0000000000000006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4 Device tree struct 0x0000000000cc4000 -> 0x0000000000cd6000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. 
Starting Linux PPC64 #52 SMP Wed Jan 23 13:05:38 CET 2008 ----------------------------------------------------- ppc64_pft_size = 0x1c physicalMemorySize = 0xda000000 htab_hash_mask = 0x1fffff ----------------------------------------------------- Linux version 2.6.24-rc8-ppc64 (olaf@lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #52 SMP Wed Jan 23 13:05:38 CET 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 892928 Normal 892928 -> 892928 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1: 0 -> 892928 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720 Policy zone: DMA Kernel command line: debug xmon=on panic=1 loglevel=8 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 275.070000 MHz time_init: processor frequency = 2197.800000 MHz clocksource: timebase mult[e8ab05] shift[22] registered clockevent: decrementer mult[466a] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) Online nodes o 0 o 1 Nodes with regular memory o 1 Current running CPU 0 is associated with node 0 Current node is 0 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 0 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 1 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 2 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 3 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 4 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 5 o 
kmem_list3_init kmem_cache_init Setting kmem_cache NULL 6 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 7 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 8 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 9 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 10 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 11 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 12 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 13 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 14 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 15 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 16 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 17 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 18 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 19 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 20 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 21 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 22 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 23 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 24 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 25 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 26 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 27 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 28 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 29 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 30 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 31 o kmem_list3_init kmem_cache_init Setting kmem_cache NULL 32 kmem_cache_init Setting kmem_cache NULL 0 kmem_cache_init Setting kmem_cache initkmem_list3 0 set_up_list3s size-32 index 1 set_up_list3s size-32 index 1 set_up_list3s size-32 index 1 set_up_list3s size-128 index 17 set_up_list3s size-128 index 17 set_up_list3s size-128 index 17 setup_cpu_cache size-32(DMA) o allocated node 0 o kmem_list3_init o allocated 
node 1 o kmem_list3_init setup_cpu_cache size-64 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-64(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-128(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-256 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-256(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-512 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-512(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-1024 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-1024(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-2048 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-2048(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-4096 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-4096(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-8192 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-8192(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-16384 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-16384(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-32768 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-32768(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-65536 o 
allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-65536(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-131072 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-131072(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-262144 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-262144(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-524288 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-524288(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-1048576 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-1048576(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-2097152 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-2097152(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-4194304 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-4194304(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-8388608 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-8388608(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-16777216 o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init setup_cpu_cache size-16777216(DMA) o allocated node 0 o kmem_list3_init o allocated node 1 o kmem_list3_init init_list RESETTING kmem_cache node 0 init_list RESETTING size-32 node 0 init_list RESETTING size-128 node 0 init_list RESETTING size-32 
node 1 init_list RESETTING size-128 node 1 alloc_kmemlist size-16777216(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-16777216 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-8388608(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-8388608 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-4194304(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-4194304 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-2097152(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-2097152 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-1048576(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-1048576 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-524288(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-524288 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-262144(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-262144 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-131072(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-131072 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-65536(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-65536 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-32768(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-32768 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-16384(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-16384 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-8192(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-8192 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-4096(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-4096 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-2048(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-2048 o 
node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-1024(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-1024 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-512(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-512 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-256(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-256 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-128(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-64(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-64 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-32(DMA) o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-128 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist size-32 o node 0 o l3 exists o node 1 o l3 exists alloc_kmemlist kmem_cache o node 0 o l3 exists o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D802FA00 alloc_kmemlist numa_policy o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D802FC00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D802FD80 alloc_kmemlist shared_policy_node o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D802FF00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D803E180 Calibrating delay loop... 
548.86 BogoMIPS (lpj=2744320) alloc_kmemlist pid_1 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D803E300 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D803E480 alloc_kmemlist pid_namespace o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D803E600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D803E780 alloc_kmemlist pgd_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D803E900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D803EA80 alloc_kmemlist pud_pmd_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D803EC00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D803ED80 alloc_kmemlist anon_vma o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D803EF00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D804C180 alloc_kmemlist task_struct o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D804C300 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D804C480 alloc_kmemlist sighand_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D804C600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D804C780 alloc_kmemlist signal_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D804C900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D804CA80 alloc_kmemlist files_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D804CC00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D804CD80 alloc_kmemlist fs_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D804CF00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8057180 alloc_kmemlist vm_area_struct o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8057300 o node 1 o allocing l3 o kmem_list3_init o setting node 1 
0xC0000000D8057480 alloc_kmemlist mm_struct o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8057600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8057780 alloc_kmemlist buffer_head o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8057900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8057A80 alloc_kmemlist idr_layer_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8057C80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8057E00 alloc_kmemlist key_jar o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8057F80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8066200 Security Framework initialized Capability LSM initialized Failure registering Root Plug module with the kernel Failure registering Root Plug module with primary security module. alloc_kmemlist names_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8066380 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8066500 alloc_kmemlist filp o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8066680 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8066800 alloc_kmemlist dentry o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8066980 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8066B00 alloc_kmemlist inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8066C80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8066E00 alloc_kmemlist mnt_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8066F80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8074200 Mount-cache hash table entries: 256 alloc_kmemlist sysfs_dir_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8074380 o node 1 o allocing l3 o kmem_list3_init o 
setting node 1 0xC0000000D8074500 alloc_kmemlist bdev_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8074700 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8074880 alloc_kmemlist radix_tree_node o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8074A00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8074B80 alloc_kmemlist sigqueue o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8074D00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8074E80 alloc_kmemlist proc_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D808E100 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D808E280 alloc_kmemlist taskstats o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D808E400 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D808E580 alloc_kmemlist task_delay_info o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D808E700 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D808E880 cpuup_prepare 1 clockevent: decrementer mult[466a] shift[16] cpu[1] Processor 1 found. cpuup_prepare 2 clockevent: decrementer mult[466a] shift[16] cpu[2] Processor 2 found. cpuup_prepare 3 clockevent: decrementer mult[466a] shift[16] cpu[3] Processor 3 found. cpuup_prepare 4 clockevent: decrementer mult[466a] shift[16] cpu[4] Processor 4 found. cpuup_prepare 5 clockevent: decrementer mult[466a] shift[16] cpu[5] Processor 5 found. cpuup_prepare 6 clockevent: decrementer mult[466a] shift[16] cpu[6] Processor 6 found. cpuup_prepare 7 clockevent: decrementer mult[466a] shift[16] cpu[7] Processor 7 found. 
Brought up 8 CPUs Node 0 CPUs: 0-3 Node 1 CPUs: 4-7 net_namespace: 120 bytes alloc_kmemlist file_lock_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D82C6680 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D82C6800 alloc_kmemlist skbuff_head_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D82C6980 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D82C6B00 alloc_kmemlist skbuff_fclone_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D82C6D00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D82C6E80 alloc_kmemlist sock_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8372180 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8372300 NET: Registered protocol family 16 IBM eBus Device Driver PCI: Probing PCI hardware IOMMU table initialized, virtual merging enabled PCI: Probing PCI hardware done Registering pmac pic with sysfs... 
alloc_kmemlist bio o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83E9580 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83E9700 alloc_kmemlist biovec-1 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83E9880 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83E9A00 alloc_kmemlist biovec-4 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83E9B80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83E9D00 alloc_kmemlist biovec-16 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83E9E80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83F4100 alloc_kmemlist biovec-64 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83F4300 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83F4480 alloc_kmemlist biovec-128 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83F4600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83F4780 alloc_kmemlist biovec-256 o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D83F4900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D83F4A80 alloc_kmemlist blkdev_requests o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8401580 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8401700 alloc_kmemlist blkdev_queue o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8401880 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8401A00 alloc_kmemlist blkdev_ioc o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8401B80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8401D00 usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb alloc_kmemlist eventpoll_epi o node 0 o allocing l3 o kmem_list3_init o 
setting node 0 0xC0000000D8439A80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8439C00 alloc_kmemlist eventpoll_pwq o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8439D80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8439F00 alloc_kmemlist TCP o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8486380 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8486500 alloc_kmemlist request_sock_TCP o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8486680 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8486800 alloc_kmemlist tw_sock_TCP o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8486980 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8486B00 alloc_kmemlist UDP o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8486D00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D8486E80 alloc_kmemlist RAW o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D849D180 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D849D300 NET: Registered protocol family 2 Time: timebase clocksource has been installed. 
Switched to high resolution mode on CPU 0 Switched to high resolution mode on CPU 1 Switched to high resolution mode on CPU 2 Switched to high resolution mode on CPU 3 Switched to high resolution mode on CPU 4 Switched to high resolution mode on CPU 5 Switched to high resolution mode on CPU 6 Switched to high resolution mode on CPU 7 alloc_kmemlist arp_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D849D480 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D849D600 alloc_kmemlist ip_dst_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D849DC80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D849DE00 IP route cache hash table entries: 131072 (order: 8, 1048576 bytes) alloc_kmemlist xfrm_dst_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84A8300 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84A8480 alloc_kmemlist secpath_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84A8600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84A8780 alloc_kmemlist inet_peer_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84A8900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84A8A80 alloc_kmemlist tcp_bind_bucket o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84A8C00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84A8D80 TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered alloc_kmemlist UDP-Lite o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84A8F80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84C6200 alloc_kmemlist ip_mrt_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84C6380 o 
node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84C6500 alloc_kmemlist rtas_flash_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D8294880 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84E5F80 alloc_kmemlist hugepte_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84EA800 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84EA680 alloc_kmemlist uid_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84EA400 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84EA280 alloc_kmemlist posix_timers_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D84EA100 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D84EAF00 alloc_kmemlist nsproxy o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85BB180 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85BB300 audit: initializing netlink socket (disabled) audit(1201090162.460:1): initialized RTAS daemon started RTAS: event: 88, Type: Platform Error, Severity: 2 Total HugeTLB memory allocated, 0 alloc_kmemlist shmem_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85BB680 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85BB800 alloc_kmemlist fasync_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85BBA00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85BBB80 alloc_kmemlist kiocb o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85BBD00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85BBE80 alloc_kmemlist kioctx o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85EF180 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85EF300 alloc_kmemlist inotify_watch_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 
0xC0000000D85EF880 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85EFA00 alloc_kmemlist inotify_event_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85EFB80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85EFD00 VFS: Disk quotas dquot_6.5.1 alloc_kmemlist dquot o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85EFE80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85FD100 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) alloc_kmemlist dnotify_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85FD280 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85FD400 alloc_kmemlist reiser_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85FD600 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85FD780 alloc_kmemlist ext3_xattr o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85FD980 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85FDB00 alloc_kmemlist ext3_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D85FDD00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D85FDE80 alloc_kmemlist revoke_record o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6036100 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6036280 alloc_kmemlist revoke_table o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6036400 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6036580 alloc_kmemlist journal_head o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6036700 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6036880 alloc_kmemlist journal_handle o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6036A00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 
0xC0000000D6036B80 alloc_kmemlist ext2_xattr o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6036D80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6036F00 alloc_kmemlist ext2_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6046200 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6046380 alloc_kmemlist hugetlbfs_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6046580 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6046700 alloc_kmemlist fat_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6046900 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6046A80 alloc_kmemlist fat_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6046C80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6046E00 alloc_kmemlist isofs_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6052100 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6052280 alloc_kmemlist mqueue_inode_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6052500 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6052680 alloc_kmemlist bsg_cmd o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6052880 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6052A00 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered alloc_kmemlist cfq_queue o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6052C00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D6052D80 alloc_kmemlist cfq_io_context o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D6052F00 o node 1 o allocing l3 o kmem_list3_init o setting 
node 1 0xC0000000D6070180 io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1 rpaphp: Slot [0001:00:02.0](PCI location=U7879.001.DQD04M6-P1-C3) registered rpaphp: Slot [0001:00:02.2](PCI location=U7879.001.DQD04M6-P1-C4) registered rpaphp: Slot [0001:00:02.4](PCI location=U7879.001.DQD04M6-P1-C5) registered rpaphp: Slot [0001:00:02.6](PCI location=U7879.001.DQD04M6-P1-C6) registered rpaphp: Slot [0002:00:02.0](PCI location=U7879.001.DQD04M6-P1-C1) registered rpaphp: Slot [0002:00:02.6](PCI location=U7879.001.DQD04M6-P1-C2) registered matroxfb: Matrox G450 detected PInS data found at offset 31168 PInS memtype = 5 matroxfb: 640x480x8bpp (virtual: 640x26214) matroxfb: framebuffer at 0x40178000000, mapped to 0xd000080080080000, size 33554432 Console: switching to colour frame buffer device 80x30 fb0: MATROX frame buffer device matroxfb_crtc2: secondary head of fb0 was registered as fb1 vio_register_driver: driver hvc_console registering HVSI: registered 0 devices Generic RTC Driver v1.07 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>) input: Macintosh mouse button emulation as /devices/virtual/input/input0 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ehci_hcd 0000:c8:01.2: EHCI Host Controller ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1 ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000 ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 5 ports detected ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver ohci_hcd 0000:c8:01.0: OHCI Host Controller ohci_hcd 0000:c8:01.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:c8:01.0: irq 85, io mem 
0x400a0001000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 3 ports detected ohci_hcd 0000:c8:01.1: OHCI Host Controller ohci_hcd 0000:c8:01.1: new USB bus registered, assigned bus number 3 ohci_hcd 0000:c8:01.1: irq 85, io mem 0x400a0000000 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected mice: PS/2 mouse device common for all mice EDAC MC: Ver: 2.1.0 Jan 23 2008 usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid /home/olaf/kernel/git/linux-2.6.24-rc8/drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver oprofile: using ppc64/power5+ performance monitoring. alloc_kmemlist flow_cache o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D612FA80 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D612FC00 alloc_kmemlist UNIX o node 0 o allocing l3 o kmem_list3_init o setting node 0 0xC0000000D612FE00 o node 1 o allocing l3 o kmem_list3_init o setting node 1 0xC0000000D612FF80 NET: Registered protocol family 1 NET: Registered protocol family 17 NET: Registered protocol family 15 registered taskstats version 1 md: Autodetecting RAID arrays. md: Scanned 0 and added 0 devices. md: autorun ... md: ... autorun DONE. VFS: Cannot open root device "<NULL>" or unknown-block(0,0) Please append a correct "root=" boot option; here are the available partitions: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) Rebooting in 1 seconds.. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init
  2008-01-23 12:14 ` Olaf Hering
@ 2008-01-23 12:52   ` Olaf Hering
  2008-01-23 13:55     ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
  2008-01-23 13:41   ` crash in kmem_cache_init Mel Gorman
  1 sibling, 1 reply; 61+ messages in thread
From: Olaf Hering @ 2008-01-23 12:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev,
	Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, Jan 23, Olaf Hering wrote:

> On Wed, Jan 23, Mel Gorman wrote:
>
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
>
> It boots with your change.

This version of the patch boots ok for me:
Maybe I made a mistake with earlier patches, no idea.

---
 mm/slab.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 	/* Replace the static kmem_list3 structures for the boot cpu */
 	init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
 
-	for_each_node_state(nid, N_NORMAL_MEMORY) {
+	for_each_online_node(nid) {
 		init_list(malloc_sizes[INDEX_AC].cs_cachep,
 			  &initkmem_list3[SIZE_AC + nid], nid);
 
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct
 {
 	int node;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2099,7 +2099,7 @@ static int __init_refok setup_cpu_cache(
 			g_cpucache_up = PARTIAL_L3;
 		} else {
 			int node;
-			for_each_node_state(node, N_NORMAL_MEMORY) {
+			for_each_online_node(node) {
 				cachep->nodelists[node] =
 				    kmalloc_node(sizeof(struct kmem_list3),
 						GFP_KERNEL, node);
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
@@ -3815,7 +3824,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
 
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 
 		if (use_alien_caches) {
 			new_alien = alloc_alien_cache(node, cachep->limit);

^ permalink raw reply	[flat|nested] 61+ messages in thread
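The effect of switching the iterator in the patch above can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: `struct list3`, `struct toy_cache`, `toy_set_up_list3s()` and the bitmask arguments are hypothetical stand-ins for `struct kmem_list3`, `struct kmem_cache`, `set_up_list3s()` and the kernel's node masks. It only shows that a node left out of the iteration keeps a NULL `nodelists[]` entry, which is the condition the `!l3` checks guard against.

```c
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int node;
};

/* Hypothetical stand-in for the nodelists[] part of struct kmem_cache. */
struct toy_cache {
	struct list3 *nodelists[MAX_NODES];
};

/* Hypothetical stand-in for the static initkmem_list3[] array. */
static struct list3 boot_lists[MAX_NODES];

/*
 * Give every node whose bit is set in 'mask' a list; every other
 * node is left with a NULL entry. Passing a "nodes with memory" mask
 * models for_each_node_state(node, N_NORMAL_MEMORY); passing an
 * "online nodes" mask models for_each_online_node(node).
 */
static void toy_set_up_list3s(struct toy_cache *c, unsigned int mask)
{
	for (int node = 0; node < MAX_NODES; node++) {
		c->nodelists[node] = NULL;
		if (mask & (1u << node)) {
			boot_lists[node].node = node;
			c->nodelists[node] = &boot_lists[node];
		}
	}
}
```

With nodes 0 and 2 online but memory only on node 2, the "nodes with memory" mask leaves `nodelists[0]` NULL, while the "online nodes" mask populates it. That matches the layout in the failing boot log later in the thread, where CPUs sit on node 0 and all memory is on node 2.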
* [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 12:52 ` Olaf Hering
@ 2008-01-23 13:55   ` Mel Gorman
  2008-01-23 14:18     ` Pekka J Enberg
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Mel Gorman @ 2008-01-23 13:55 UTC (permalink / raw)
  To: akpm, Christoph Lameter, Pekka Enberg
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan,
	KAMEZAWA Hiroyuki

This patch, in combination with a partial revert of commit
04231b3002ac53f8a64a7bd142fde3fa4b6808c6, fixes a regression between 2.6.23
and 2.6.24-rc8 where a PPC64 machine with all CPUs on a memoryless node fails
to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24.

With memoryless-node configurations, it is possible that all the CPUs are
associated with a node with no memory. Early in the boot process, the
nodelists that allow fallback_alloc to work are not yet set up, so an Oops
occurs and the machine fails to boot.

This patch adds the necessary checks to make sure a kmem_list3 exists for
the preferred node used when growing the cache. If the preferred node has
no nodelist then the currently running node is used instead. This
problem only affects the SLAB allocator; SLUB appears to work fine.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/slab.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 61+ messages in thread
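The check added by the patch above can be illustrated in isolation. The following is a minimal user-space sketch, assuming hypothetical names (`struct list3`, `struct toy_cache`, `toy_pick_list3()`) that do not exist in the kernel; `current_node` stands in for the value `numa_node_id()` would return. It mirrors the two-step lookup and makes explicit that the result can still be NULL when the current node has no list either, which is the case where the real code reaches `BUG_ON(!l3)`.

```c
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int node;
};

/* Hypothetical stand-in for the nodelists[] part of struct kmem_cache. */
struct toy_cache {
	struct list3 *nodelists[MAX_NODES];
};

/*
 * Look up the list for the preferred node. If that node has no list,
 * retarget *nodeid to the node we are running on and try again.
 * Returns NULL if neither node has a list; the kernel patch would
 * trigger BUG_ON(!l3) at that point instead of returning.
 */
static struct list3 *toy_pick_list3(const struct toy_cache *c,
				    int *nodeid, int current_node)
{
	struct list3 *l3 = c->nodelists[*nodeid];

	if (!l3) {
		*nodeid = current_node;
		l3 = c->nodelists[*nodeid];
	}
	return l3;
}
```

The sketch updates `*nodeid` in the same way the patch reassigns `nodeid`, so later code operates on the node whose list was actually chosen.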
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 13:55 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman
@ 2008-01-23 14:18   ` Pekka J Enberg
  2008-01-23 14:32     ` Pekka J Enberg
  2008-01-23 18:35     ` Christoph Lameter
  2008-01-23 14:27   ` Olaf Hering
  2008-01-23 18:41   ` Christoph Lameter
  2 siblings, 2 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi Mel,

On Wed, 23 Jan 2008, Mel Gorman wrote:
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
> --- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.000000000 +0000
> +++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.000000000 +0000
> @@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
>  	/* Take the l3 list lock to change the colour_next on this node */
>  	check_irq_off();
>  	l3 = cachep->nodelists[nodeid];
> +	if (!l3) {
> +		nodeid = numa_node_id();
> +		l3 = cachep->nodelists[nodeid];
> +	}
> +	BUG_ON(!l3);
>  	spin_lock(&l3->list_lock);
>
>  	/* Get colour for the slab, and cal the next value. */
> @@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
>  	int x;
>
>  	l3 = cachep->nodelists[nodeid];
> +	if (!l3) {
> +		nodeid = numa_node_id();
> +		l3 = cachep->nodelists[nodeid];
> +	}

What guarantees that current node ->nodelists is never NULL?

I still think Christoph's kmem_getpages() patch is correct (to fix
cache_grow() oops) but I overlooked the fact that none of the callers of
____cache_alloc_node() deal with bootstrapping (with the exception of
__cache_alloc_node() that even has a comment about it).

But what I am really wondering about is, why wasn't the N_NORMAL_MEMORY
revert enough? I assume this used to work before so what more do we need
to revert for 2.6.24?

			Pekka

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:18 ` Pekka J Enberg
@ 2008-01-23 14:32   ` Pekka J Enberg
  2008-01-23 14:49     ` Pekka J Enberg
  2008-01-23 18:35   ` Christoph Lameter
  1 sibling, 1 reply; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> I still think Christoph's kmem_getpages() patch is correct (to fix
> cache_grow() oops) but I overlooked the fact that none of the callers of
> ____cache_alloc_node() deal with bootstrapping (with the exception of
> __cache_alloc_node() that even has a comment about it).

So something like this (totally untested) patch on top of current git:

---
 mm/slab.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
 
@@ -2976,8 +2980,9 @@ retry:
 		batchcount = BATCHREFILL_LIMIT;
 	}
 	l3 = cachep->nodelists[node];
+	if (!l3)
+		return NULL;
 
-	BUG_ON(ac->avail > 0 || !l3);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3317,7 +3322,8 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
-	BUG_ON(!l3);
+	if (!l3)
+		return fallback_alloc(cachep, flags);
 
 retry:
 	check_irq_off();
@@ -3394,12 +3400,6 @@ __cache_alloc_node(struct kmem_cache *ca
 	if (unlikely(nodeid == -1))
 		nodeid = numa_node_id();
 
-	if (unlikely(!cachep->nodelists[nodeid])) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
 	if (nodeid == numa_node_id()) {
 		/*
 		 * Use the locally cached objects if possible.

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node
  2008-01-23 14:32 ` Pekka J Enberg
@ 2008-01-23 14:49   ` Pekka J Enberg
  2008-01-23 15:56     ` Mel Gorman
  2008-01-23 18:36     ` Christoph Lameter
  0 siblings, 2 replies; 61+ messages in thread
From: Pekka J Enberg @ 2008-01-23 14:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel,
	linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm,
	KAMEZAWA Hiroyuki, Christoph Lameter

Hi,

On Wed, 23 Jan 2008, Pekka J Enberg wrote:
> > I still think Christoph's kmem_getpages() patch is correct (to fix
> > cache_grow() oops) but I overlooked the fact that none of the callers of
> > ____cache_alloc_node() deal with bootstrapping (with the exception of
> > __cache_alloc_node() that even has a comment about it).
>
> So something like this (totally untested) patch on top of current git:

Sorry, removed a BUG_ON() from cache_alloc_refill() by mistake, here's a
better one:

[PATCH] slab: fix allocation on memoryless nodes
From: Pekka Enberg <penberg@cs.helsinki.fi>

As memoryless nodes do not have a nodelist, change cache_alloc_refill() to
bail out for those and let ____cache_alloc_node() always deal with that by
resorting to fallback_alloc().

Furthermore, don't let kmem_getpages() call alloc_pages_node() if the nodeid
passed to it is -1, as the latter will always translate that to
numa_node_id(), which might not have a ->nodelist; that missing nodelist is
what caused the invocation of fallback_alloc() in the first place (for
example, during bootstrap).

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> --- mm/slab.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) Index: linux-2.6/mm/slab.c =================================================================== --- linux-2.6.orig/mm/slab.c +++ linux-2.6/mm/slab.c @@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c if (cachep->flags & SLAB_RECLAIM_ACCOUNT) flags |= __GFP_RECLAIMABLE; - page = alloc_pages_node(nodeid, flags, cachep->gfporder); + if (nodeid == -1) + page = alloc_pages(flags, cachep->gfporder); + else + page = alloc_pages_node(nodeid, flags, cachep->gfporder); + if (!page) return NULL; @@ -2975,9 +2979,11 @@ retry: */ batchcount = BATCHREFILL_LIMIT; } + BUG_ON(ac->avail > 0); l3 = cachep->nodelists[node]; + if (!l3) + return NULL; - BUG_ON(ac->avail > 0 || !l3); spin_lock(&l3->list_lock); /* See if we can refill from the shared array */ @@ -3317,7 +3323,8 @@ static void *____cache_alloc_node(struct int x; l3 = cachep->nodelists[nodeid]; - BUG_ON(!l3); + if (!l3) + return fallback_alloc(cachep, flags); retry: check_irq_off(); @@ -3394,12 +3401,6 @@ __cache_alloc_node(struct kmem_cache *ca if (unlikely(nodeid == -1)) nodeid = numa_node_id(); - if (unlikely(!cachep->nodelists[nodeid])) { - /* Node not bootstrapped yet */ - ptr = fallback_alloc(cachep, flags); - goto out; - } - if (nodeid == numa_node_id()) { /* * Use the locally cached objects if possible. ^ permalink raw reply [flat|nested] 61+ messages in thread
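The two decisions described in the patch above can be modelled in user space. This is only a minimal sketch under stated assumptions: `toy_cache`, `toy_list3`, `toy_alloc_node_path()` and `toy_getpages_call()` are hypothetical names, not kernel symbols, and nothing is actually allocated. It shows that a missing per-node list now selects the fallback path instead of hitting BUG_ON(), and that a nodeid of -1 selects the node-agnostic page allocation call.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4		/* illustrative value, not from the thread */
#define TOY_ANY_NODE  (-1)	/* stands in for nodeid == -1 */

struct toy_list3 {
	int free_objects;
};

struct toy_cache {
	struct toy_list3 *nodelists[TOY_MAX_NODES];
};

enum toy_path { TOY_FROM_NODE, TOY_FALLBACK };
enum toy_page_call { TOY_ALLOC_PAGES, TOY_ALLOC_PAGES_NODE };

/* Models the patched ____cache_alloc_node(): no list for the node
 * means "use the fallback path", not BUG_ON(). */
static enum toy_path toy_alloc_node_path(const struct toy_cache *c, int nodeid)
{
	assert(nodeid >= 0 && nodeid < TOY_MAX_NODES);
	return c->nodelists[nodeid] ? TOY_FROM_NODE : TOY_FALLBACK;
}

/* Models the patched kmem_getpages(): -1 means no preferred node, so
 * the node-specific call is skipped. */
static enum toy_page_call toy_getpages_call(int nodeid)
{
	return nodeid == TOY_ANY_NODE ? TOY_ALLOC_PAGES : TOY_ALLOC_PAGES_NODE;
}
```

With only node 1 populated in `nodelists`, a request for node 0 takes `TOY_FALLBACK` and a request for node 1 takes `TOY_FROM_NODE`.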
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 14:49 ` Pekka J Enberg @ 2008-01-23 15:56 ` Mel Gorman 2008-01-23 17:29 ` Pekka J Enberg 2008-01-23 18:36 ` Christoph Lameter 1 sibling, 1 reply; 61+ messages in thread From: Mel Gorman @ 2008-01-23 15:56 UTC (permalink / raw) To: Pekka J Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On (23/01/08 16:49), Pekka J Enberg didst pronounce: > Hi, > > On Wed, 23 Jan 2008, Pekka J Enberg wrote: > > > I still think Christoph's kmem_getpages() patch is correct (to fix > > > cache_grow() oops) but I overlooked the fact that none the callers of > > > ____cache_alloc_node() deal with bootstrapping (with the exception of > > > __cache_alloc_node() that even has a comment about it). > > > > So something like this (totally untested) patch on top of current git: > > Sorry, removed a BUG_ON() from cache_alloc_refill() by mistake, here's a > better one: > Applied in combination with the N_NORMAL_MEMORY revert and it fails to boot. Console is as follows; Linux version 2.6.24-rc8-autokern1 (root@gekko-lp3.ltc.austin.ibm.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #2 SMP Wed Jan 23 10:37:36 EST 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 7168 bytes Zone PFN ranges: DMA 0 -> 1048576 Normal 1048576 -> 1048576 Movable zone start PFN for each node early_node_map[1] active PFN ranges 2: 0 -> 1048576 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. 
Total pages: 1034240 Policy zone: DMA Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201101591 loglevel=8 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 238.059000 MHz time_init: processor frequency = 1904.472000 MHz clocksource: timebase mult[10cd746] shift[22] registered clockevent: decrementer mult[3cf1] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg0] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 2 Memory: 4105560k/4194304k available (5004k kernel code, 88744k reserved, 876k data, 559k bss, 272k init) Unable to handle kernel paging request for data at address 0x00000040 Faulting instruction address: 0xc0000000003c8ae8 cpu 0x0: Vector: 300 (Data Access) at [c0000000005c3840] pc: c0000000003c8ae8: __lock_text_start+0x20/0x88 lr: c0000000000dadb4: .cache_grow+0x7c/0x338 sp: c0000000005c3ac0 msr: 8000000000009032 dar: 40 dsisr: 40000000 current = 0xc000000000500f10 paca = 0xc000000000501b80 pid = 0, comm = swapper enter ? for help [c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338 [c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224 [c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c [c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc [c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0 0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782). 
2777 local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); 2778 2779 /* Take the l3 list lock to change the colour_next on this node */ 2780 check_irq_off(); 2781 l3 = cachep->nodelists[nodeid]; 2782 spin_lock(&l3->list_lock); 2783 2784 /* Get colour for the slab, and cal the next value. */ 2785 offset = l3->colour_next; 2786 l3->colour_next++; -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 61+ messages in thread
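The fault address 0x40 in the oops above is consistent with `l3` being NULL at mm/slab.c:2782, since `&l3->list_lock` is then just the offset of `list_lock` inside the structure. A minimal user-space sketch, assuming a 64-bit (LP64) build and assuming the leading fields of struct kmem_list3 are three list heads, `free_objects`, `free_limit` and `colour_next` in that order (recalled from mm/slab.c of that era, not shown in this thread). `toy_spinlock_t` and `toy_kmem_list3` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel types (hypothetical, user-space only). */
struct list_head {
	struct list_head *next, *prev;
};

typedef struct {
	unsigned int slock;	/* real spinlock_t internals differ per arch/config */
} toy_spinlock_t;

/* Leading fields of struct kmem_list3, in the assumed order. */
struct toy_kmem_list3 {
	struct list_head slabs_partial;
	struct list_head slabs_full;
	struct list_head slabs_free;
	unsigned long free_objects;
	unsigned int free_limit;
	unsigned int colour_next;
	toy_spinlock_t list_lock;
};

/* Address that &l3->list_lock evaluates to when l3 == NULL. */
static size_t null_l3_lock_address(void)
{
	return offsetof(struct toy_kmem_list3, list_lock);
}
```

On LP64 this is 3 * 16 + 8 + 4 + 4 = 64 bytes, i.e. 0x40, which matches the `dar: 40` value in the console output under the assumptions above.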
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 15:56 ` Mel Gorman @ 2008-01-23 17:29 ` Pekka J Enberg 2008-01-23 17:42 ` Pekka J Enberg ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Pekka J Enberg @ 2008-01-23 17:29 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter Hi, On Wed, 23 Jan 2008, Mel Gorman wrote: > Applied in combination with the N_NORMAL_MEMORY revert and it fails to > boot. Console is as follows; Thanks for testing! On Wed, 23 Jan 2008, Mel Gorman wrote: > [c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338 > [c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224 > [c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c > [c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc > [c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c > [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc > [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0 > > 0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782). > 2777 local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); > 2778 > 2779 /* Take the l3 list lock to change the colour_next on this node */ > 2780 check_irq_off(); > 2781 l3 = cachep->nodelists[nodeid]; > 2782 spin_lock(&l3->list_lock); > 2783 > 2784 /* Get colour for the slab, and cal the next value. */ > 2785 offset = l3->colour_next; > 2786 l3->colour_next++; Ok, so it's too early to fallback_alloc() because in kmem_cache_init() we do: for (i = 0; i < NUM_INIT_LISTS; i++) { kmem_list3_init(&initkmem_list3[i]); if (i < MAX_NUMNODES) cache_cache.nodelists[i] = NULL; } Fine. But, why are we hitting fallback_alloc() in the first place? 
It's definitely not because of missing ->nodelists as we do: cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE]; before attempting to set up kmalloc caches. Now, if I understood correctly, we're booting off a memoryless node so kmem_getpages() will return NULL thus forcing us to fallback_alloc() which is unavailable at this point. As far as I can tell, there are two ways to fix this: (1) don't boot off a memoryless node (why are we doing this in the first place?) (2) initialize cache_cache.nodelists with initkmem_list3 equivalents for *each node that has normal memory* I am still wondering why this worked before, though. Pekka ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 17:29 ` Pekka J Enberg @ 2008-01-23 17:42 ` Pekka J Enberg 2008-01-23 18:51 ` Christoph Lameter 2008-01-23 19:52 ` Nishanth Aravamudan 2 siblings, 0 replies; 61+ messages in thread From: Pekka J Enberg @ 2008-01-23 17:42 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On Wed, 23 Jan 2008, Pekka J Enberg wrote: > As far as I can tell, there are two ways to fix this: [snip] > (2) initialize cache_cache.nodelists with initmem_list3 equivalents > for *each node hat has normal memory* An untested patch follows: --- mm/slab.c | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) Index: linux-2.6/mm/slab.c =================================================================== --- linux-2.6.orig/mm/slab.c +++ linux-2.6/mm/slab.c @@ -304,11 +304,11 @@ struct kmem_list3 { /* * Need this for bootstrapping a per node allocator. */ -#define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1) +#define NUM_INIT_LISTS (3 * MAX_NUMNODES) struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS]; #define CACHE_CACHE 0 -#define SIZE_AC 1 -#define SIZE_L3 (1 + MAX_NUMNODES) +#define SIZE_AC MAX_NUMNODES +#define SIZE_L3 (2 * MAX_NUMNODES) static int drain_freelist(struct kmem_cache *cache, struct kmem_list3 *l3, int tofree); @@ -1410,6 +1410,22 @@ static void init_list(struct kmem_cache } /* + * For setting up all the kmem_list3s for cache whose buffer_size is same as + * size of kmem_list3. 
+ */ +static void __init set_up_list3s(struct kmem_cache *cachep, int index) +{ + int node; + + for_each_node_state(node, N_NORMAL_MEMORY) { + cachep->nodelists[node] = &initkmem_list3[index + node]; + cachep->nodelists[node]->next_reap = jiffies + + REAPTIMEOUT_LIST3 + + ((unsigned long)cachep) % REAPTIMEOUT_LIST3; + } +} + +/* * Initialisation. Called after the page allocator have been initialised and * before smp_init(). */ @@ -1432,6 +1448,7 @@ void __init kmem_cache_init(void) if (i < MAX_NUMNODES) cache_cache.nodelists[i] = NULL; } + set_up_list3s(&cache_cache, CACHE_CACHE); /* * Fragmentation resistance on low memory - only use bigger @@ -1964,22 +1981,6 @@ static void slab_destroy(struct kmem_cac } } -/* - * For setting up all the kmem_list3s for cache whose buffer_size is same as - * size of kmem_list3. - */ -static void __init set_up_list3s(struct kmem_cache *cachep, int index) -{ - int node; - - for_each_node_state(node, N_NORMAL_MEMORY) { - cachep->nodelists[node] = &initkmem_list3[index + node]; - cachep->nodelists[node]->next_reap = jiffies + - REAPTIMEOUT_LIST3 + - ((unsigned long)cachep) % REAPTIMEOUT_LIST3; - } -} - static void __kmem_cache_destroy(struct kmem_cache *cachep) { int i; ^ permalink raw reply [flat|nested] 61+ messages in thread
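The index changes in the diff above follow from set_up_list3s() addressing `initkmem_list3[index + node]`: once cache_cache is set up through that helper too, each of the three bootstrap caches needs its own run of MAX_NUMNODES slots. A minimal sketch of that arithmetic, assuming an illustrative MAX_NUMNODES of 4; `init_layout` and `layout_collides()` are hypothetical helpers, not kernel code.

```c
#include <assert.h>

#define MAX_NUMNODES 4				/* illustrative value, not from the thread */
#define TOY_SLOTS (3 * MAX_NUMNODES + 1)	/* large enough for both layouts */

/* Base indices into initkmem_list3[] for the three bootstrap caches. */
struct init_layout {
	int cache_cache;
	int size_ac;
	int size_l3;
	int num_init_lists;
};

/* Values before the patch: CACHE_CACHE owns a single slot. */
static const struct init_layout old_layout = {
	0, 1, 1 + MAX_NUMNODES, 2 * MAX_NUMNODES + 1
};

/* Values after the patch: every cache owns MAX_NUMNODES slots. */
static const struct init_layout new_layout = {
	0, MAX_NUMNODES, 2 * MAX_NUMNODES, 3 * MAX_NUMNODES
};

/*
 * Returns 1 if two (cache, node) pairs map to the same slot, or a slot
 * falls outside the array, when every cache uses slot = base + node.
 */
static int layout_collides(const struct init_layout *l)
{
	int bases[3];
	int used[TOY_SLOTS] = { 0 };
	int c, node;

	bases[0] = l->cache_cache;
	bases[1] = l->size_ac;
	bases[2] = l->size_l3;

	for (c = 0; c < 3; c++) {
		for (node = 0; node < MAX_NUMNODES; node++) {
			int slot = bases[c] + node;

			if (slot < 0 || slot >= TOY_SLOTS ||
			    slot >= l->num_init_lists || used[slot])
				return 1;
			used[slot] = 1;
		}
	}
	return 0;
}
```

With the old values, cache_cache on node 1 and SIZE_AC on node 0 would both land in slot 1, so `layout_collides(&old_layout)` reports a collision; the new values give three disjoint ranges.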
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 17:29 ` Pekka J Enberg 2008-01-23 17:42 ` Pekka J Enberg @ 2008-01-23 18:51 ` Christoph Lameter 2008-01-23 19:52 ` Nishanth Aravamudan 2 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-23 18:51 UTC (permalink / raw) To: Pekka J Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Pekka J Enberg wrote: > Fine. But, why are we hitting fallback_alloc() in the first place? It's > definitely not because of missing ->nodelists as we do: > > cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE]; > > before attempting to set up kmalloc caches. Now, if I understood > correctly, we're booting off a memoryless node so kmem_getpages() will > return NULL thus forcing us to fallback_alloc() which is unavailable at > this point. > > As far as I can tell, there are two ways to fix this: > > (1) don't boot off a memoryless node (why are we doing this in the first > place?) Right. That is the solution that I would prefer. > (2) initialize cache_cache.nodelists with initmem_list3 equivalents > for *each node hat has normal memory* Or simply do it for all nodes. SLAB bootstrap is a very complex thing, though. > > I am still wondering why this worked before, though. I doubt it ever worked for SLAB. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 17:29 ` Pekka J Enberg 2008-01-23 17:42 ` Pekka J Enberg 2008-01-23 18:51 ` Christoph Lameter @ 2008-01-23 19:52 ` Nishanth Aravamudan 2008-01-23 21:02 ` Pekka Enberg 2 siblings, 1 reply; 61+ messages in thread From: Nishanth Aravamudan @ 2008-01-23 19:52 UTC (permalink / raw) To: Pekka J Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On 23.01.2008 [19:29:15 +0200], Pekka J Enberg wrote: > Hi, > > On Wed, 23 Jan 2008, Mel Gorman wrote: > > Applied in combination with the N_NORMAL_MEMORY revert and it fails to > > boot. Console is as follows; > > Thanks for testing! > > On Wed, 23 Jan 2008, Mel Gorman wrote: > > [c0000000005c3b40] c0000000000dadb4 .cache_grow+0x7c/0x338 > > [c0000000005c3c00] c0000000000db518 .fallback_alloc+0x1c0/0x224 > > [c0000000005c3cb0] c0000000000db920 .kmem_cache_alloc+0xe0/0x14c > > [c0000000005c3d50] c0000000000dcbd0 .kmem_cache_create+0x230/0x4cc > > [c0000000005c3e30] c0000000004c049c .kmem_cache_init+0x1ec/0x51c > > [c0000000005c3ee0] c00000000049f8d8 .start_kernel+0x304/0x3fc > > [c0000000005c3f90] c000000000008594 .start_here_common+0x54/0xc0 > > > > 0xc0000000000dadb4 is in cache_grow (mm/slab.c:2782). > > 2777 local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); > > 2778 > > 2779 /* Take the l3 list lock to change the colour_next on this node */ > > 2780 check_irq_off(); > > 2781 l3 = cachep->nodelists[nodeid]; > > 2782 spin_lock(&l3->list_lock); > > 2783 > > 2784 /* Get colour for the slab, and cal the next value. */ > > 2785 offset = l3->colour_next; > > 2786 l3->colour_next++; > > Ok, so it's too early to fallback_alloc() because in kmem_cache_init() we > do: > > for (i = 0; i < NUM_INIT_LISTS; i++) { > kmem_list3_init(&initkmem_list3[i]); > if (i < MAX_NUMNODES) > cache_cache.nodelists[i] = NULL; > } > > Fine. 
But, why are we hitting fallback_alloc() in the first place? It's > definitely not because of missing ->nodelists as we do: > > cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE]; > > before attempting to set up kmalloc caches. Now, if I understood > correctly, we're booting off a memoryless node so kmem_getpages() will > return NULL thus forcing us to fallback_alloc() which is unavailable at > this point. > > As far as I can tell, there are two ways to fix this: > > (1) don't boot off a memoryless node (why are we doing this in the first > place?) On at least one of the machines in question, wasn't it the case that node 0 had all the memory and node 1 had all the CPUs? In that case, you would have to boot off a memoryless node? And as long as that is a physically valid configuration, the kernel should handle it. > (2) initialize cache_cache.nodelists with initmem_list3 equivalents > for *each node hat has normal memory* > > I am still wondering why this worked before, though. I bet we didn't notice this breaking because SLUB became the default and SLAB isn't on in the test.kernel.org testing, for instance. Perhaps we should add a second set of runs for some of the boxes there to run with CONFIG_SLAB on? I'm curious if we know, for sure, of a kernel with CONFIG_SLAB=y that has booted all of the boxes reporting issues? That is, did they all work with 2.6.23? Thanks, Nish ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 19:52 ` Nishanth Aravamudan @ 2008-01-23 21:02 ` Pekka Enberg 2008-01-23 21:14 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread From: Pekka Enberg @ 2008-01-23 21:02 UTC (permalink / raw) To: Nishanth Aravamudan Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki, Christoph Lameter Hi, On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote: > On at least one of the machines in question, wasn't it the case that > node 0 had all the memory and node 1 had all the CPUs? In that case, you > would have to boot off a memoryless node? And as long as that is a > physically valid configuration, the kernel should handle it. Agreed. Here's the patch that should fix it: http://lkml.org/lkml/2008/1/23/332 On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote: > I bet we didn't notice this breaking because SLUB became the default and > SLAB isn't on in the test.kernel.org testing, for instance. Perhaps we > should add a second set of runs for some of the boxes there to run with > CONFIG_SLAB on? Sure. On Jan 23, 2008 9:52 PM, Nishanth Aravamudan <nacc@us.ibm.com> wrote: > I'm curious if we know, for sure, of a kernel with CONFIG_SLAB=y that > has booted all of the boxes reporting issues? That is, did they all work > with 2.6.23? I think Mel said that their configuration did work with 2.6.23 although I also wonder how that's possible. AFAIK there has been some changes in the page allocator that might explain this. That is, if kmem_getpages() returned pages for memoryless node before, bootstrap would have worked. Pekka ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 21:02 ` Pekka Enberg @ 2008-01-23 21:14 ` Christoph Lameter 2008-01-23 21:36 ` Nishanth Aravamudan 0 siblings, 1 reply; 61+ messages in thread From: Christoph Lameter @ 2008-01-23 21:14 UTC (permalink / raw) To: Pekka Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Pekka Enberg wrote: > I think Mel said that their configuration did work with 2.6.23 > although I also wonder how that's possible. AFAIK there has been some > changes in the page allocator that might explain this. That is, if > kmem_getpages() returned pages for memoryless node before, bootstrap > would have worked. Regular kmem_getpages is called with GFP_THISNODE set. There was some breakage in 2.6.22 and before with GFP_THISNODE returning pages from the wrong node if a node had no memory. So it may have worked accidentally and in an unsafe manner because the pages would have been associated with the wrong node which could trigger bug ons and locking troubles. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 21:14 ` Christoph Lameter @ 2008-01-23 21:36 ` Nishanth Aravamudan 2008-01-24 3:13 ` Christoph Lameter 0 siblings, 1 reply; 61+ messages in thread From: Nishanth Aravamudan @ 2008-01-23 21:36 UTC (permalink / raw) To: Christoph Lameter Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki On 23.01.2008 [13:14:26 -0800], Christoph Lameter wrote: > On Wed, 23 Jan 2008, Pekka Enberg wrote: > > > I think Mel said that their configuration did work with 2.6.23 > > although I also wonder how that's possible. AFAIK there has been some > > changes in the page allocator that might explain this. That is, if > > kmem_getpages() returned pages for memoryless node before, bootstrap > > would have worked. > > Regular kmem_getpages is called with GFP_THISNODE set. There was some > breakage in 2.6.22 and before with GFP_THISNODE returning pages from > the wrong node if a node had no memory. So it may have worked > accidentally and in an unsafe manner because the pages would have been > associated with the wrong node which could trigger bug ons and locking > troubles. Right, so it might have functioned before, but the correctness was wobbly at best... Certainly the memoryless patch series has tightened that up, but we missed these SLAB issues. I see that your patch fixed Olaf's machine, Pekka. Nice work on everyone's part tracking this stuff down. Thanks, Nish ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 21:36 ` Nishanth Aravamudan @ 2008-01-24 3:13 ` Christoph Lameter 0 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-24 3:13 UTC (permalink / raw) To: Nishanth Aravamudan Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Nishanth Aravamudan wrote: > Right, so it might have functioned before, but the correctness was > wobbly at best... Certainly the memoryless patch series has tightened > that up, but we missed these SLAB issues. > > I see that your patch fixed Olaf's machine, Pekka. Nice work on > everyone's part tracking this stuff down. Another important result is that I found that GFP_THISNODE is actually required for proper SLAB operation and not only an optimization. Fallback can lead to very bad results. I have two customer reported instances of SLAB corruption here that can be explained now due to fallback to another node. Foreign objects enter the per cpu queue. The wrong node lock is taken during cache_flusharray(). Fields in the struct slab can become corrupted. It typically hits the list field and the inuse field. ^ permalink raw reply [flat|nested] 61+ messages in thread
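The hazard described above (a page silently coming from a different node than the one SLAB files it under) can be illustrated with a toy allocator. This is a minimal sketch under stated assumptions: `toy_zones`, `toy_alloc_page()` and `slab_is_foreign()` are hypothetical, the `thisnode` argument only stands in for GFP_THISNODE, and the real page allocator's zonelist walk is far more involved.

```c
#include <assert.h>

#define NR_NODES 2
#define NO_PAGE  (-1)

/* Free page count per node; a memoryless node simply has zero. */
struct toy_zones {
	int free_pages[NR_NODES];
};

/*
 * Returns the node the page actually came from, or NO_PAGE.
 * thisnode != 0 models GFP_THISNODE: never fall back to another node.
 */
static int toy_alloc_page(struct toy_zones *z, int nid, int thisnode)
{
	int n;

	if (z->free_pages[nid] > 0) {
		z->free_pages[nid]--;
		return nid;
	}
	if (thisnode)
		return NO_PAGE;

	/* Fallback: take a page from the first node that has one. */
	for (n = 0; n < NR_NODES; n++) {
		if (z->free_pages[n] > 0) {
			z->free_pages[n]--;
			return n;
		}
	}
	return NO_PAGE;
}

/* A slab is foreign when it is filed under a node its page is not on. */
static int slab_is_foreign(int requested_nid, int page_nid)
{
	return page_nid != NO_PAGE && page_nid != requested_nid;
}
```

With node 0 memoryless, a non-strict request for node 0 returns a node 1 page that would then be tracked under node 0's lists and lock, while the strict request fails cleanly and leaves the caller to choose an explicit fallback.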
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 14:49 ` Pekka J Enberg 2008-01-23 15:56 ` Mel Gorman @ 2008-01-23 18:36 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-23 18:36 UTC (permalink / raw) To: Pekka J Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Pekka J Enberg wrote: > Furthermore, don't let kmem_getpages() call alloc_pages_node() if nodeid passed > to it is -1 as the latter will always translate that to numa_node_id() which > might not have ->nodelist that caused the invocation of fallback_alloc() in the > first place (for example, during bootstrap). kmem_getpages is called without GFP_THISNODE. This alloc_pages_node(numa_node_id(), ...) will fall back to the next node with memory. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 14:18 ` Pekka J Enberg 2008-01-23 14:32 ` Pekka J Enberg @ 2008-01-23 18:35 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-23 18:35 UTC (permalink / raw) To: Pekka J Enberg Cc: lee.schermerhorn, Olaf Hering, Linux MM, Mel Gorman, linux-kernel, linuxppc-dev, Aneesh Kumar K.V, Nishanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Pekka J Enberg wrote: > I still think Christoph's kmem_getpages() patch is correct (to fix > cache_grow() oops) but I overlooked the fact that none the callers of > ____cache_alloc_node() deal with bootstrapping (with the exception of > __cache_alloc_node() that even has a comment about it). My patch is useless. kmem_getpages called with nodeid == -1 falls back correctly to the available node. The problem is that the node structures for the page do not exist. > But what I am really wondering about is, why wasn't the > N_NORMAL_MEMORY revert enough? I assume this used to work before so what > more do we need to revert for 2.6.24? I think that is because SLUB relaxed the requirements on having regular memory on the boot node. Now the expectation is that SLAB can do the same. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 13:55 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman 2008-01-23 14:18 ` Pekka J Enberg @ 2008-01-23 14:27 ` Olaf Hering 2008-01-23 14:42 ` Mel Gorman 2008-01-23 18:41 ` Christoph Lameter 2 siblings, 1 reply; 61+ messages in thread From: Olaf Hering @ 2008-01-23 14:27 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, linuxppc-dev, linux-kernel, Linux MM, Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On Wed, Jan 23, Mel Gorman wrote: > This patch in combination with a partial revert of commit > 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 fixes a regression between 2.6.23 > and 2.6.24-rc8 where a PPC64 machine with all CPUS on a memoryless node fails > to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24. This change alone does not help; it's not the version I tested. Will all the changes below go into 2.6.24 as well, in a separate patch? - for_each_node_state(node, N_NORMAL_MEMORY) { + for_each_online_node(node) { ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 14:27 ` Olaf Hering @ 2008-01-23 14:42 ` Mel Gorman 0 siblings, 0 replies; 61+ messages in thread From: Mel Gorman @ 2008-01-23 14:42 UTC (permalink / raw) To: Olaf Hering Cc: lee.schermerhorn, linuxppc-dev, linux-kernel, Linux MM, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On (23/01/08 15:27), Olaf Hering didst pronounce: > On Wed, Jan 23, Mel Gorman wrote: > > > This patch in combination with a partial revert of commit > > 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 fixes a regression between 2.6.23 > > and 2.6.24-rc8 where a PPC64 machine with all CPUS on a memoryless node fails > > to boot. If approved by the SLAB maintainers, it should be merged for 2.6.24. > > This change alone does not help, its not the version I tested. > Will all the changes below go into 2.6.24 as well, in a seperate patch? > > - for_each_node_state(node, N_NORMAL_MEMORY) { > + for_each_online_node(node) { Those changes are already in a separate patch and have been sent. I don't see it in git yet but it should be on the way. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node 2008-01-23 13:55 ` [PATCH] Fix boot problem in situations where the boot CPU is running on a memoryless node Mel Gorman 2008-01-23 14:18 ` Pekka J Enberg 2008-01-23 14:27 ` Olaf Hering @ 2008-01-23 18:41 ` Christoph Lameter 2 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-23 18:41 UTC (permalink / raw) To: Mel Gorman Cc: lee.schermerhorn, Olaf Hering, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, hanth Aravamudan, akpm, KAMEZAWA Hiroyuki On Wed, 23 Jan 2008, Mel Gorman wrote: > This patch adds the necessary checks to make sure a kmem_list3 exists for > the preferred node used when growing the cache. If the preferred node has > no nodelist then the currently running node is used instead. This > problem only affects the SLAB allocator, SLUB appears to work fine. That is a dangerous thing to do. SLAB per cpu queues will contain foreign objects which may cause troubles when pushing the objects back. I think we may be lucky that these objects are consumed at boot. If all of the foreign objects are consumed at boot then we are fine. At least an explanation as to this issue should be added to the patch. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-23 12:14 ` Olaf Hering 2008-01-23 12:52 ` Olaf Hering @ 2008-01-23 13:41 ` Mel Gorman 1 sibling, 0 replies; 61+ messages in thread From: Mel Gorman @ 2008-01-23 13:41 UTC (permalink / raw) To: Olaf Hering Cc: lee.schermerhorn, Linux MM, linux-kernel, linuxppc-dev, Pekka Enberg, Aneesh Kumar K.V, Nishanth Aravamudan, akpm, KAMEZAWA Hiroyuki, Christoph Lameter On (23/01/08 13:14), Olaf Hering didst pronounce: > On Wed, Jan 23, Mel Gorman wrote: > > > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the > > following patch against 2.6.24-rc8 please? It contains the debug information > > that helped me figure out what was going wrong on the PPC64 machine here, > > the revert and the !l3 checks (i.e. the two patches that made machines I > > have access to work). Thanks > > It boots with your change. > ....... Nice one! As the only addition here is debugging output, I can only assume that the two patches were being booted in isolation instead of in combination earlier. The two threads have been a little confused with hand waving so that can easily happen. Looking at your log: > early_node_map[1] active PFN ranges > 1: 0 -> 892928 All memory is on node 1. > Online nodes > o 0 > o 1 > Nodes with regular memory > o 1 > Current running CPU 0 is associated with node 0 > Current node is 0 The running CPU is associated with node 0, so other than the memory being on node 1 instead of node 2, your machine is similar to the one I had the problem on in terms of memoryless nodes and CPU configuration. > VFS: Cannot open root device "<NULL>" or unknown-block(0,0) > Please append a correct "root=" boot option; here are the available partitions: > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) > Rebooting in 1 seconds.. > I see it failed to complete boot but I'm going to assume this is a relatively normal command-line, .config or initrd problem and not a regression of some type.
I'll post a patch suitable for pick-up shortly. With the two patches applied in combination and CONFIG_DEBUG_SLAB enabled, a compile-based stress test ran without difficulty, so hopefully there are no new surprises hiding in the corners. Thanks Olaf. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 21:15 ` Olaf Hering 2008-01-18 6:56 ` Olaf Hering 2008-01-18 18:47 ` Christoph Lameter @ 2008-01-18 18:51 ` Christoph Lameter 2 siblings, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-18 18:51 UTC (permalink / raw) To: Olaf Hering Cc: Mel Gorman, linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM On Thu, 17 Jan 2008, Olaf Hering wrote: > Normal 892928 -> 892928 > Movable zone start PFN for each node > early_node_map[1] active PFN ranges > 1: 0 -> 892928 > Could not find start_pfn for node 0 We only have a single node that is node 1? And then we initialize nodes 0 to 3? > Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) > cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0 ??? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: crash in kmem_cache_init 2008-01-17 18:12 ` Olaf Hering 2008-01-17 18:58 ` Christoph Lameter @ 2008-01-17 19:03 ` Christoph Lameter 1 sibling, 0 replies; 61+ messages in thread From: Christoph Lameter @ 2008-01-17 19:03 UTC (permalink / raw) To: Olaf Hering; +Cc: linuxppc-dev, Pekka Enberg, linux-kernel, Linux MM Could you try Pekka's suggestion of reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 ? ^ permalink raw reply [flat|nested] 61+ messages in thread