public inbox for linux-kernel@vger.kernel.org
* sparsemem panic in 2.6.17-rc5-mm1 and -mm2
@ 2006-06-06  0:51 Martin Bligh
  2006-06-06  3:07 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Martin Bligh @ 2006-06-06  0:51 UTC
  To: Andy Whitcroft; +Cc: LKML, Andrew Morton

http://test.kernel.org/abat/34264/debug/console.log

Only seems to happen on the sparsemem runs. Possibly a side-effect
of the page migration stuff, manifesting itself differently?
Or maybe not?

Out of Memory: Kill process 1 (idle) score 0 and children.
divide error: 0000 [#1]
SMP
last sysfs file:
CPU:    0
EIP:    0060:[<c013be6a>]    Not tainted VLI
EFLAGS: 00010246   (2.6.17-rc5-mm2-autokern1 #1)
EIP is at shrink_active_list+0x5b/0x382
eax: 00000000   ebx: 00000064   ecx: 00000000   edx: 00000000
esi: c0474500   edi: c03a8e64   ebp: c03a8dbc   esp: c03a8d9c
ds: 007b   es: 007b   ss: 0068
Process idle (pid: 1, threadinfo=c03a8000 task=c0769000)
Stack: 00000000 00000000 00000020 00000004 c03a8dac c03a8dac c03a8db4 c03a8db4
        c03a8dbc c03a8dbc c03a8dfc c0137a14 00000000 c0137a37 c03a8df8 c03a8df4
        c0474000 00000000 00000000 c03a8e64 c0137c5a 00000000 00028028 000dc0e0
Call Trace:
  <c0137a14> get_writeback_state+0x30/0x35  <c0137a37> get_dirty_limits+0x1e/0xc4
  <c0137c5a> throttle_vm_writeout+0x18/0x53  <c013c221> shrink_zone+0x90/0xc1
  <c013c29f> shrink_zones+0x4d/0x5e  <c013c39d> try_to_free_pages+0xed/0x1a8
  <c0136a91> __alloc_pages+0x16e/0x26a  <c014e6c9> kmem_getpages+0x5b/0xac
  <c014f42c> cache_grow+0xb5/0x147  <c014f655> cache_alloc_refill+0x197/0x1d3
  <c014fad0> kmem_cache_alloc+0x4f/0x5e  <c0276dd8> sk_alloc+0x15/0x63
  <c02bb9e0> inet_create+0xfb/0x21a  <c027546d> __sock_create+0xc0/0xea
  <c02754b0> sock_create_kern+0xb/0xe  <c03c413b> icmp_init+0x3a/0xc3
  <c03c445c> inet_init+0x12b/0x174  <c03aa7f6> do_initcalls+0x53/0xe4
  <c01320d8> register_irq_proc+0x6a/0x90  <c0180000> xlate_proc_name+0x87/0x90
  <c0100349> init+0x41/0xdc  <c0100308> init+0x0/0xdc
  <c01009d5> kernel_thread_helper+0x5/0xb
Code: 04 24 00 00 00 00 8d 44 24 10 89 44 24 10 89 44 24 14 83 79 10 00 74 38 8b 8a bc 01 00 00 6b 47 04 64 bb 64 00 00 00 31 d2 d3 fb <f7> 35 0c 6f 45 c0 ba 02 00 00 00 89 d1 99 f7 f9 01 d8 03 47 18
EIP: [<c013be6a>] shrink_active_list+0x5b/0x382 SS:ESP 0068:c03a8d9c

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  0:51 sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Martin Bligh
@ 2006-06-06  3:07 ` Andrew Morton
  2006-06-06  5:19   ` KAMEZAWA Hiroyuki
  2006-06-06 23:42 ` Andrew Morton
  2006-06-07 17:41 ` Andy Whitcroft
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2006-06-06  3:07 UTC
  To: Martin Bligh; +Cc: apw, linux-kernel

On Mon, 05 Jun 2006 17:51:00 -0700
Martin Bligh <mbligh@google.com> wrote:

> http://test.kernel.org/abat/34264/debug/console.log
> 
> Only seems to happen on the sparsemem runs. Possibly a side-effect
> of the page migration stuff, manifesting itself differently?
> Or maybe not?
> 
> Out of Memory: Kill process 1 (idle) score 0 and children.
> divide error: 0000 [#1]
> SMP
> last sysfs file:
> CPU:    0
> EIP:    0060:[<c013be6a>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.17-rc5-mm2-autokern1 #1)
> EIP is at shrink_active_list+0x5b/0x382
> eax: 00000000   ebx: 00000064   ecx: 00000000   edx: 00000000
> esi: c0474500   edi: c03a8e64   ebp: c03a8dbc   esp: c03a8d9c
> ds: 007b   es: 007b   ss: 0068
> Process idle (pid: 1, threadinfo=c03a8000 task=c0769000)
> Stack: 00000000 00000000 00000020 00000004 c03a8dac c03a8dac c03a8db4 c03a8db4
>         c03a8dbc c03a8dbc c03a8dfc c0137a14 00000000 c0137a37 c03a8df8 c03a8df4
>         c0474000 00000000 00000000 c03a8e64 c0137c5a 00000000 00028028 000dc0e0
> Call Trace:
>   <c0137a14> get_writeback_state+0x30/0x35  <c0137a37> get_dirty_limits+0x1e/0xc4
>   <c0137c5a> throttle_vm_writeout+0x18/0x53  <c013c221> shrink_zone+0x90/0xc1
>   <c013c29f> shrink_zones+0x4d/0x5e  <c013c39d> try_to_free_pages+0xed/0x1a8
>   <c0136a91> __alloc_pages+0x16e/0x26a  <c014e6c9> kmem_getpages+0x5b/0xac
>   <c014f42c> cache_grow+0xb5/0x147  <c014f655> cache_alloc_refill+0x197/0x1d3
>   <c014fad0> kmem_cache_alloc+0x4f/0x5e  <c0276dd8> sk_alloc+0x15/0x63
>   <c02bb9e0> inet_create+0xfb/0x21a  <c027546d> __sock_create+0xc0/0xea
>   <c02754b0> sock_create_kern+0xb/0xe  <c03c413b> icmp_init+0x3a/0xc3
>   <c03c445c> inet_init+0x12b/0x174  <c03aa7f6> do_initcalls+0x53/0xe4
>   <c01320d8> register_irq_proc+0x6a/0x90  <c0180000> xlate_proc_name+0x87/0x90
>   <c0100349> init+0x41/0xdc  <c0100308> init+0x0/0xdc
>   <c01009d5> kernel_thread_helper+0x5/0xb
> Code: 04 24 00 00 00 00 8d 44 24 10 89 44 24 10 89 44 24 14 83 79 10 00 74 38 8b 8a bc 01 00 00 6b 47 04 64 bb 64 00 00 00 31 d2 d3 fb <f7> 35 0c 6f 45 c0 ba 02 00 00 00 89 d1 99 f7 f9 01 d8 03 47 18
> EIP: [<c013be6a>] shrink_active_list+0x5b/0x382 SS:ESP 0068:c03a8d9c

rofl.  Certainly someone's broken something.  I assume the divide-by-zero
is due to total_memory being zero.

We shouldn't be running kswapd_init() as an initcall because sometimes when
things are broken we will run page reclaim during boot.

So I'd assume there's something wrong in the memory setup which is causing
us to enter page reclaim far too early.

Something like this should prevent the immediate oops...

diff -puN mm/vmscan.c~run-kswapd_init-earlier mm/vmscan.c
--- devel/mm/vmscan.c~run-kswapd_init-earlier	2006-06-05 20:02:53.000000000 -0700
+++ devel-akpm/mm/vmscan.c	2006-06-05 20:03:12.000000000 -0700
@@ -1346,7 +1346,7 @@ static int cpu_callback(struct notifier_
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
-static int __init kswapd_init(void)
+int __init kswapd_init(void)
 {
 	pg_data_t *pgdat;
 
@@ -1365,8 +1365,6 @@ static int __init kswapd_init(void)
 	return 0;
 }
 
-module_init(kswapd_init)
-
 #ifdef CONFIG_NUMA
 /*
  * Zone reclaim mode
diff -puN include/linux/swap.h~run-kswapd_init-earlier include/linux/swap.h
--- devel/include/linux/swap.h~run-kswapd_init-earlier	2006-06-05 20:02:53.000000000 -0700
+++ devel-akpm/include/linux/swap.h	2006-06-05 20:03:51.000000000 -0700
@@ -176,6 +176,7 @@ extern unsigned long try_to_free_pages(s
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
+extern int kswapd_init(void);
 
 /* possible outcome of pageout() */
 typedef enum {
diff -puN init/main.c~run-kswapd_init-earlier init/main.c
--- devel/init/main.c~run-kswapd_init-earlier	2006-06-05 20:02:53.000000000 -0700
+++ devel-akpm/init/main.c	2006-06-05 20:04:15.000000000 -0700
@@ -50,6 +50,7 @@
 #include <linux/mempolicy.h>
 #include <linux/key.h>
 #include <linux/root_dev.h>
+#include <linux/swap.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -523,6 +524,7 @@ asmlinkage void __init start_kernel(void
 	kmem_cache_init();
 	setup_per_cpu_pageset();
 	numa_policy_init();
+	kswapd_init();
 	if (late_time_init)
 		late_time_init();
 	calibrate_delay();
_
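
For illustration only, the ordering problem described above can be modelled in a few lines of userspace C: total_memory stays zero until the initcall that sets it has run, so any earlier initcall that ends up in page reclaim sees zero. Every name below, and the page count, is hypothetical; this is a sketch of the idea, not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's total_memory (zero until set). */
static long total_memory;

/* What the modelled reclaim path observed; -1 means "reclaim never ran". */
static long total_memory_seen_by_reclaim = -1;

typedef int (*initcall_t)(void);

/* Models an initcall (like inet_init in the trace) that falls into reclaim. */
static int early_initcall_enters_reclaim(void)
{
    total_memory_seen_by_reclaim = total_memory;
    return 0;
}

/* Models kswapd_init() assigning total_memory; the value is made up. */
static int late_initcall_sets_total_memory(void)
{
    total_memory = 224550;
    return 0;
}

/* Runs the calls strictly in table order. */
static void run_initcalls(const initcall_t *calls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        calls[i]();
}

/* Returns the total_memory value that reclaim observes for a given ordering. */
long observed_total_memory(int init_total_first)
{
    const initcall_t reclaim_first[] = {
        early_initcall_enters_reclaim, late_initcall_sets_total_memory
    };
    const initcall_t total_first[] = {
        late_initcall_sets_total_memory, early_initcall_enters_reclaim
    };

    total_memory = 0;
    total_memory_seen_by_reclaim = -1;
    run_initcalls(init_total_first ? total_first : reclaim_first, 2);
    return total_memory_seen_by_reclaim;
}
```

Calling observed_total_memory(0) models the failing boot (reclaim runs first and sees zero); observed_total_memory(1) models the ordering the patch is aiming for.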



* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
@ 2006-06-06  3:50 Chuck Ebbert
  0 siblings, 0 replies; 21+ messages in thread
From: Chuck Ebbert @ 2006-06-06  3:50 UTC
  To: Martin Bligh; +Cc: Andy Whitcroft, Andrew Morton, linux-kernel, Yasunori Goto

In-Reply-To: <4484D174.7080902@google.com>

On Mon, 05 Jun 2006 17:51:00 -0700, Martin Bligh wrote:

> http://test.kernel.org/abat/34264/debug/console.log
> 
> Only seems to happen on the sparsemem runs. Possibly a side-effect
> of the page migration stuff, manifesting itself differently?
> Or maybe not?
> 
> Out of Memory: Kill process 1 (idle) score 0 and children.
> divide error: 0000 [#1]
> SMP
> last sysfs file:
> CPU:    0
> EIP:    0060:[<c013be6a>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.17-rc5-mm2-autokern1 #1)
> EIP is at shrink_active_list+0x5b/0x382
> eax: 00000000   ebx: 00000064   ecx: 00000000   edx: 00000000
> esi: c0474500   edi: c03a8e64   ebp: c03a8dbc   esp: c03a8d9c
> ds: 007b   es: 007b   ss: 0068
> Process idle (pid: 1, threadinfo=c03a8000 task=c0769000)
> Stack: 00000000 00000000 00000020 00000004 c03a8dac c03a8dac c03a8db4 c03a8db4
>         c03a8dbc c03a8dbc c03a8dfc c0137a14 00000000 c0137a37 c03a8df8 c03a8df4
>         c0474000 00000000 00000000 c03a8e64 c0137c5a 00000000 00028028 000dc0e0
> Call Trace:
>   <c0137a14> get_writeback_state+0x30/0x35  <c0137a37> get_dirty_limits+0x1e/0xc4
>   <c0137c5a> throttle_vm_writeout+0x18/0x53  <c013c221> shrink_zone+0x90/0xc1
>   <c013c29f> shrink_zones+0x4d/0x5e  <c013c39d> try_to_free_pages+0xed/0x1a8
>   <c0136a91> __alloc_pages+0x16e/0x26a  <c014e6c9> kmem_getpages+0x5b/0xac
>   <c014f42c> cache_grow+0xb5/0x147  <c014f655> cache_alloc_refill+0x197/0x1d3
>   <c014fad0> kmem_cache_alloc+0x4f/0x5e  <c0276dd8> sk_alloc+0x15/0x63
>   <c02bb9e0> inet_create+0xfb/0x21a  <c027546d> __sock_create+0xc0/0xea
>   <c02754b0> sock_create_kern+0xb/0xe  <c03c413b> icmp_init+0x3a/0xc3
>   <c03c445c> inet_init+0x12b/0x174  <c03aa7f6> do_initcalls+0x53/0xe4
>   <c01320d8> register_irq_proc+0x6a/0x90  <c0180000> xlate_proc_name+0x87/0x90
>   <c0100349> init+0x41/0xdc  <c0100308> init+0x0/0xdc
>   <c01009d5> kernel_thread_helper+0x5/0xb
> Code: 04 24 00 00 00 00 8d 44 24 10 89 44 24 10 89 44 24 14 83 79 10 00 74 38 8b 8a bc 01 00 00 6b 47 04 64 bb 64 00 00 00 31 d2 d3 fb <f7> 35 0c 6f 45 c0 ba 02 00 00 00 89 d1 99 f7 f9 01 d8 03 47 18
> EIP: [<c013be6a>] shrink_active_list+0x5b/0x382 SS:ESP 0068:c03a8d9c


static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
                                struct scan_control *sc)
{
        unsigned long pgmoved;
        int pgdeactivate = 0;
        unsigned long pgscanned;
        LIST_HEAD(l_hold);      /* The pages which were snipped off */
        LIST_HEAD(l_inactive);  /* Pages to go onto the inactive_list */
        LIST_HEAD(l_active);    /* Pages to go onto the active_list */
        struct page *page;
        struct pagevec pvec;
        int reclaim_mapped = 0;

        if (sc->may_swap) {
                long mapped_ratio;
                long distress;
                long swap_tendency;

                /*
                 * `distress' is a measure of how much trouble we're having
                 * reclaiming pages.  0 -> no problems.  100 -> great trouble.
                 */
                distress = 100 >> zone->prev_priority;

                /*
                 * The point of this algorithm is to decide when to start
                 * reclaiming mapped memory instead of just pagecache.  Work out
                 * how much memory
                 * is mapped.
                 */
====>           mapped_ratio = (sc->nr_mapped * 100) / total_memory;


total_memory == 0 here
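
The bytes in the Code: line appear consistent with this. The faulting instruction starts at the byte marked <f7>, i.e. f7 35 0c 6f 45 c0. Opcode F7 with a ModRM reg field of 6 is an unsigned DIV, and mod=0/rm=5 selects an absolute 32-bit address, so the divisor is read from memory at 0xc0456f0c (presumably where total_memory lives; that mapping is an inference, not something the oops states). A minimal sketch of the decoding, with helper names that are purely illustrative:

```c
#include <stdint.h>

/* The three fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod;  /* bits 7..6: addressing mode */
    unsigned reg;  /* bits 5..3: opcode extension for the F7 group */
    unsigned rm;   /* bits 2..0: register/memory operand selector */
};

/* Splits a ModRM byte into its fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/* Reads the little-endian 32-bit displacement that follows the ModRM byte. */
uint32_t read_disp32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Feeding it the six bytes from the oops gives mod=0, reg=6, rm=5 and a displacement of 0xc0456f0c. With eax and edx both zero in the register dump, a dividend of zero divided by a zero divisor raises the divide error.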

-- 
Chuck



* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  3:07 ` Andrew Morton
@ 2006-06-06  5:19   ` KAMEZAWA Hiroyuki
  2006-06-06  5:36     ` Yasunori Goto
  0 siblings, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-06  5:19 UTC
  To: mbligh; +Cc: akpm, apw, linux-kernel

On Mon, 5 Jun 2006 20:07:27 -0700
Andrew Morton <akpm@osdl.org> wrote:

> On Mon, 05 Jun 2006 17:51:00 -0700
> Martin Bligh <mbligh@google.com> wrote:
> 
> > http://test.kernel.org/abat/34264/debug/console.log
> > 
> > Only seems to happen on the sparsemem runs. Possibly a side-effect
> > of the page migration stuff, manifesting itself differently?
> > Or maybe not?
> > 
> > Out of Memory: Kill process 1 (idle) score 0 and children.
> > divide error: 0000 [#1]
> > SMP
> > last sysfs file:
> > CPU:    0
> > EIP:    0060:[<c013be6a>]    Not tainted VLI
> > EFLAGS: 00010246   (2.6.17-rc5-mm2-autokern1 #1)
> > EIP is at shrink_active_list+0x5b/0x382
> > eax: 00000000   ebx: 00000064   ecx: 00000000   edx: 00000000
> > esi: c0474500   edi: c03a8e64   ebp: c03a8dbc   esp: c03a8d9c
> > ds: 007b   es: 007b   ss: 0068
> > Process idle (pid: 1, threadinfo=c03a8000 task=c0769000)
> > Stack: 00000000 00000000 00000020 00000004 c03a8dac c03a8dac c03a8db4 c03a8db4
> >         c03a8dbc c03a8dbc c03a8dfc c0137a14 00000000 c0137a37 c03a8df8 c03a8df4
> >         c0474000 00000000 00000000 c03a8e64 c0137c5a 00000000 00028028 000dc0e0
> > Call Trace:
> >   <c0137a14> get_writeback_state+0x30/0x35  <c0137a37> get_dirty_limits+0x1e/0xc4
> >   <c0137c5a> throttle_vm_writeout+0x18/0x53  <c013c221> shrink_zone+0x90/0xc1
> >   <c013c29f> shrink_zones+0x4d/0x5e  <c013c39d> try_to_free_pages+0xed/0x1a8
> >   <c0136a91> __alloc_pages+0x16e/0x26a  <c014e6c9> kmem_getpages+0x5b/0xac
> >   <c014f42c> cache_grow+0xb5/0x147  <c014f655> cache_alloc_refill+0x197/0x1d3
> >   <c014fad0> kmem_cache_alloc+0x4f/0x5e  <c0276dd8> sk_alloc+0x15/0x63
> >   <c02bb9e0> inet_create+0xfb/0x21a  <c027546d> __sock_create+0xc0/0xea
> >   <c02754b0> sock_create_kern+0xb/0xe  <c03c413b> icmp_init+0x3a/0xc3
> >   <c03c445c> inet_init+0x12b/0x174  <c03aa7f6> do_initcalls+0x53/0xe4
> >   <c01320d8> register_irq_proc+0x6a/0x90  <c0180000> xlate_proc_name+0x87/0x90
> >   <c0100349> init+0x41/0xdc  <c0100308> init+0x0/0xdc
> >   <c01009d5> kernel_thread_helper+0x5/0xb
> > Code: 04 24 00 00 00 00 8d 44 24 10 89 44 24 10 89 44 24 14 83 79 10 00 74 38 8b 8a bc 01 00 00 6b 47 04 64 bb 64 00 00 00 31 d2 d3 fb <f7> 35 0c 6f 45 c0 ba 02 00 00 00 89 d1 99 f7 f9 01 d8 03 47 18
> > EIP: [<c013be6a>] shrink_active_list+0x5b/0x382 SS:ESP 0068:c03a8d9c
> 
> rofl.  Certainly someone's broken something.  I assume the divide-by-zero
> is due to total_memory being zero.
> 
> We shouldn't be running kswapd_init() as an initcall because sometimes when
> things are broken we will run page reclaim during boot.
> 
> So I'd assume there's something wrong in the memory setup which is causing
> us to enter page reclaim far too early.
> 

I looked back at 2.6.15 and 2.6.16.
It looks like the point at which -mm initializes "total_memory" has not changed since then.
(Yes, Andrew's fix looks sane.)

I'm interested in the following text in the log.
==
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 DMA32: empty
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 HighMem: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 2*2048kB 3962*4096kB = 16233724kB
Node 1 DMA: empty
Node 1 DMA32: empty
Node 1 Normal: empty
Node 1 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
Node 2 DMA: empty
Node 2 DMA32: empty
Node 2 Normal: empty
Node 2 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
Node 3 DMA: empty
Node 3 DMA32: empty
Node 3 Normal: empty
Node 3 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 3811*4096kB = 15611532kB
==
It looks like 64GB of memory, but there is only HIGHMEM, with no NORMAL or DMA, so shrink_zone() ran.

Martin, could you show memory layout of this host ?
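
As a cross-check on the free-list lines quoted above, each per-zone total is just the sum of count * block size over the eleven orders (4kB up to 4096kB). A minimal sketch, assuming 4kB base pages as the log implies; the function name is illustrative only:

```c
#include <stddef.h>

#define NR_ORDERS 11  /* orders 0..10: 4kB .. 4096kB blocks, as in the log */

/* Total free kB for one zone, given the free block count at each order. */
unsigned long buddy_free_kb(const unsigned long counts[NR_ORDERS])
{
    unsigned long total = 0;

    for (size_t order = 0; order < NR_ORDERS; order++)
        total += counts[order] * (4UL << order);  /* block size in kB */
    return total;
}
```

Passing the Node 0 HighMem counts {1, 1, 1, 1, 1, 1, 0, 0, 1, 2, 3962} reproduces the 16233724kB figure printed in the log, and the other HighMem lines check out the same way.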

-Kame


* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  5:19   ` KAMEZAWA Hiroyuki
@ 2006-06-06  5:36     ` Yasunori Goto
  2006-06-06  7:27       ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Yasunori Goto @ 2006-06-06  5:36 UTC
  To: KAMEZAWA Hiroyuki; +Cc: mbligh, akpm, apw, linux-kernel, Chuck Ebbert


> I looked back at 2.6.15 and 2.6.16.
> It looks like the point at which -mm initializes "total_memory" has not changed since then.
> (Yes, Andrew's fix looks sane.)
> 
> I'm interested in the following text in the log.
> ==
> Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> Node 0 DMA32: empty
> Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> Node 0 HighMem: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 2*2048kB 3962*4096kB = 16233724kB
> Node 1 DMA: empty
> Node 1 DMA32: empty
> Node 1 Normal: empty
> Node 1 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
> Node 2 DMA: empty
> Node 2 DMA32: empty
> Node 2 Normal: empty
> Node 2 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
> Node 3 DMA: empty
> Node 3 DMA32: empty
> Node 3 Normal: empty
> Node 3 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 3811*4096kB = 15611532kB
> ==
> It looks like 64GB of memory, but there is only HIGHMEM, with no NORMAL or DMA, so shrink_zone() ran.

The log shows that there was some memory in DMA and NORMAL immediately
before that...

> Active:2 inactive:15 dirty:0 writeback:0 unstable:0 free:16287272 slab:1823 mapped:0 pagetables:0
> Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:385024kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0

It looks like something suddenly consumed all of the DMA (16MB) and
NORMAL (385MB) zones. Hmmm...

Bye.

-- 
Yasunori Goto 




* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  5:36     ` Yasunori Goto
@ 2006-06-06  7:27       ` Andrew Morton
  2006-06-07  0:43         ` KAMEZAWA Hiroyuki
  2006-06-07  9:26         ` Andy Whitcroft
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2006-06-06  7:27 UTC
  To: Yasunori Goto; +Cc: kamezawa.hiroyu, mbligh, apw, linux-kernel, 76306.1226

On Tue, 06 Jun 2006 14:36:14 +0900
Yasunori Goto <y-goto@jp.fujitsu.com> wrote:

> 
> > I looked back at 2.6.15 and 2.6.16.
> > It looks like the point at which -mm initializes "total_memory" has not changed since then.
> > (Yes, Andrew's fix looks sane.)
> > 
> > I'm interested in the following text in the log.
> > ==
> > Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > Node 0 DMA32: empty
> > Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
> > Node 0 HighMem: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 2*2048kB 3962*4096kB = 16233724kB
> > Node 1 DMA: empty
> > Node 1 DMA32: empty
> > Node 1 Normal: empty
> > Node 1 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
> > Node 2 DMA: empty
> > Node 2 DMA32: empty
> > Node 2 Normal: empty
> > Node 2 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
> > Node 3 DMA: empty
> > Node 3 DMA32: empty
> > Node 3 Normal: empty
> > Node 3 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 3811*4096kB = 15611532kB
> > ==
> > It looks like 64GB of memory, but there is only HIGHMEM, with no NORMAL or DMA, so shrink_zone() ran.
> 
> The log shows that there was some memory in DMA and NORMAL immediately
> before that...
> 
> > Active:2 inactive:15 dirty:0 writeback:0 unstable:0 free:16287272 slab:1823 mapped:0 pagetables:0
> > Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> > Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> > Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:385024kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> 
> It looks like something suddenly consumed all of the DMA (16MB) and
> NORMAL (385MB) zones. Hmmm...
> 

I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
kswapd_init() patch (below).

I'm wondering why I have 4k of highmem:

MemTotal:       898200 kB
MemFree:        832936 kB
Buffers:          8824 kB
Cached:          30140 kB
SwapCached:          0 kB
Active:          25052 kB
Inactive:        20800 kB
HighTotal:           4 kB
HighFree:            4 kB
LowTotal:       898196 kB
LowFree:        832932 kB
SwapTotal:     1020116 kB
SwapFree:      1020116 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:          10340 kB
Slab:            10252 kB
CommitLimit:   1469216 kB
Committed_AS:    15496 kB
PageTables:        528 kB
VmallocTotal:   114680 kB
VmallocUsed:       648 kB
VmallocChunk:   113980 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

The dmesg is at http://www.zip.com.au/~akpm/linux/patches/stuff/log-vmm. 
The machine has 900MB of memory (9*128M).


<enables UNALIGNED_ZONE_BOUNDARIES like the nice message says>
<http://www.zip.com.au/~akpm/linux/patches/stuff/log-vmm-2>

Nope, I still have a 4k highmem zone.



btw Andy, that UNALIGNED_ZONE_BOUNDARIES message is useless.  Only 0.1% of
users even have the knowledge how to recompile their kernel, let alone the
inclination.  Can we do something smarter here?

<goes off to use his one-page highmem zone for something>



--- devel/mm/vmscan.c~initialise-total_memory-earlier	2006-06-05 23:59:50.000000000 -0700
+++ devel-akpm/mm/vmscan.c	2006-06-06 00:00:59.000000000 -0700
@@ -111,7 +111,7 @@ struct shrinker {
  * From 0 .. 100.  Higher means more swappy.
  */
 int vm_swappiness = 60;
-static long total_memory;
+long total_memory;
 
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
@@ -1499,7 +1499,6 @@ static int __init kswapd_init(void)
 	for_each_online_node(nid)
  		kswapd_run(nid);
 
-	total_memory = nr_free_pagecache_pages();
 	hotcpu_notifier(cpu_callback, 0);
 	return 0;
 }
diff -puN mm/page_alloc.c~initialise-total_memory-earlier mm/page_alloc.c
--- devel/mm/page_alloc.c~initialise-total_memory-earlier	2006-06-06 00:00:13.000000000 -0700
+++ devel-akpm/mm/page_alloc.c	2006-06-06 00:01:28.000000000 -0700
@@ -1725,9 +1725,9 @@ void __meminit build_all_zonelists(void)
 		stop_machine_run(__build_all_zonelists, NULL, NR_CPUS);
 		/* cpuset refresh routine should be here */
 	}
-
-	printk("Built %i zonelists\n", num_online_nodes());
-
+	total_memory = nr_free_pagecache_pages();
+	printk("Built %i zonelists.  Total memory: %ld pages\n",
+			num_online_nodes(), total_memory);
 }
 
 /*
diff -puN include/linux/swap.h~initialise-total_memory-earlier include/linux/swap.h
--- devel/include/linux/swap.h~initialise-total_memory-earlier	2006-06-06 00:00:44.000000000 -0700
+++ devel-akpm/include/linux/swap.h	2006-06-06 00:00:56.000000000 -0700
@@ -185,6 +185,7 @@ extern unsigned long try_to_free_pages(s
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
+extern long total_memory;
 
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
_



* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  0:51 sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Martin Bligh
  2006-06-06  3:07 ` Andrew Morton
@ 2006-06-06 23:42 ` Andrew Morton
  2006-06-07  9:16   ` Mel Gorman
  2006-06-07 17:38   ` Andy Whitcroft
  2006-06-07 17:41 ` Andy Whitcroft
  2 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2006-06-06 23:42 UTC
  To: Martin Bligh; +Cc: apw, linux-kernel

On Mon, 05 Jun 2006 17:51:00 -0700
Martin Bligh <mbligh@google.com> wrote:

> http://test.kernel.org/abat/34264/debug/console.log

What sort of machine is this, anyway?

> WARNING: Not an IBM x440/NUMAQ and CONFIG_NUMA enabled!

And is it expected that ZONE_NORMAL only has 384MB?  That seems awfully low
for a 64GB x86 machine.  Could be that we went oom because we chose to
allocate really big hash tables, based on the total amount of memory?



* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  7:27       ` Andrew Morton
@ 2006-06-07  0:43         ` KAMEZAWA Hiroyuki
  2006-06-07  4:58           ` Andrew Morton
  2006-06-07  9:26         ` Andy Whitcroft
  1 sibling, 1 reply; 21+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-07  0:43 UTC
  To: Andrew Morton; +Cc: y-goto, mbligh, apw, linux-kernel, 76306.1226

On Tue, 6 Jun 2006 00:27:58 -0700
Andrew Morton <akpm@osdl.org> wrote:

> 
> I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
> kswapd_init() patch (below).
> 
> I'm wondering why I have 4k of highmem:
> 

Could you show /proc/iomem of your 4k HIGHMEM box ?
Does 4k HIGHMEM exist only when SPARSEMEM is selected ?

Thanks,
-Kame


* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07  0:43         ` KAMEZAWA Hiroyuki
@ 2006-06-07  4:58           ` Andrew Morton
  2006-06-07  5:36             ` Rusty Russell
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2006-06-07  4:58 UTC
  To: KAMEZAWA Hiroyuki
  Cc: y-goto, mbligh, apw, linux-kernel, 76306.1226, Ingo Molnar,
	Arjan van de Ven, Gerd Hoffmann, Rusty Russell, Zachary Amsden

On Wed, 7 Jun 2006 09:43:55 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 6 Jun 2006 00:27:58 -0700
> Andrew Morton <akpm@osdl.org> wrote:
> 
> > 
> > I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
> > kswapd_init() patch (below).
> > 
> > I'm wondering why I have 4k of highmem:
> > 
> 
> Could you show /proc/iomem of your 4k HIGHMEM box ?
> Does 4k HIGHMEM exist only when SPARSEMEM is selected ?

Turns out that my 4 kbyte highmem zone (at least, as reported in
/proc/meminfo) is due to

vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-tidy.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-arch_vma_name-fix.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386-2.patch

I don't think that was intended.

It'll be a screwup in the handling of MAXMEM.  That patch is doing strange
things with MAXMEM.  They are unchangelogged, uncommented and
possibly-hacky-looking things too, so I have no intention of fixing it.



* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07  4:58           ` Andrew Morton
@ 2006-06-07  5:36             ` Rusty Russell
  2006-06-07  5:50               ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Rusty Russell @ 2006-06-07  5:36 UTC
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, y-goto, mbligh, apw, linux-kernel, 76306.1226,
	Ingo Molnar, Arjan van de Ven, Gerd Hoffmann, Zachary Amsden

On Tue, 2006-06-06 at 21:58 -0700, Andrew Morton wrote:
> On Wed, 7 Jun 2006 09:43:55 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Tue, 6 Jun 2006 00:27:58 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> > > 
> > > I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
> > > kswapd_init() patch (below).
> > > 
> > > I'm wondering why I have 4k of highmem:
> > > 
> > 
> > Could you show /proc/iomem of your 4k HIGHMEM box ?
> > Does 4k HIGHMEM exist only when SPARSEMEM is selected ?
> 
> Turns out that my 4 kbyte highmem zone (at least, as reported in
> /proc/meminfo) is due to
> 
> vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma.patch

Thanks for this report, Andrew!

	Yes, MAXMEM is reduced by one page in the patch, taking into account
that kernel memory tops out at __FIXADDR_TOP, not 0xFFFFFFFF.  AFAICT
this is in fact a bugfix, which becomes more important when
__FIXADDR_TOP can be moved to create a larger memory hole (as for
hypervisors).

	You now have 1 page more memory available in your system.  Use it
wisely.

I'm sure Gerd will slap me if I'm wrong on this.  Here's the patch
fragment:

-#define MAXMEM                 (-__PAGE_OFFSET-__VMALLOC_RESERVE)
+#define MAXMEM                 (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)

Cheers,
Rusty.
-- 
 ccontrol: http://ccontrol.ozlabs.org


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07  5:36             ` Rusty Russell
@ 2006-06-07  5:50               ` Andrew Morton
  2006-06-07  6:49                 ` Rusty Russell
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2006-06-07  5:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kamezawa.hiroyu, y-goto, mbligh, apw, linux-kernel, 76306.1226,
	mingo, arjan, kraxel, zach

On Wed, 07 Jun 2006 15:36:25 +1000
Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Tue, 2006-06-06 at 21:58 -0700, Andrew Morton wrote:
> > On Wed, 7 Jun 2006 09:43:55 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > On Tue, 6 Jun 2006 00:27:58 -0700
> > > Andrew Morton <akpm@osdl.org> wrote:
> > > 
> > > > 
> > > > I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
> > > > kswapd_init() patch (below).
> > > > 
> > > > I'm wondering why I have 4k of highmem:
> > > > 
> > > 
> > > Could you show /proc/iomem of your 4k HIGHMEM box ?
> > > Does 4k HIGHMEM exist only when SPARSEMEM is selected ?
> > 
> > Turns out that my 4 kbyte highmem zone (at least, as reported in
> > /proc/meminfo) is due to
> > 
> > vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma.patch
> 
> Thanks for this report, Andrew!
> 
> 	Yes, MAXMEM is reduced by one page in the patch, taking into account
> that kernel memory tops out at __FIXADDR_TOP, not 0xFFFFFFFF.  AFAICT
> this is in fact a bugfix, which becomes more important when
> __FIXADDR_TOP can be moved to create a larger memory hole (as for
> hypervisors).
> 
> 	You now have 1 page more memory available in your system.

The kernel has differing opinions about that:

 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000038000000 (usable)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
896MB LOWMEM available.

>  Use it
> wisely.


> I'm sure Gerd will slap me if I'm wrong on this.  Here's the patch
> fragment:
> 
> -#define MAXMEM                 (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> +#define MAXMEM                 (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)

Well.  Applying this with `patch -R' would presumably restore the situation.
But not having a clue why this change was made, I didn't bother trying it.

From what you're saying, it appears that this patch is an unrelated change,
to fix the off-by-one?  And that if this machine had anything other than
exactly 7*128MB of physical memory, I wouldn't have noticed?
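
The 4 kB figure can be checked from the two MAXMEM definitions quoted
above.  A minimal sketch, assuming the usual i386 defaults of the period
(__PAGE_OFFSET 0xC0000000, __VMALLOC_RESERVE 128MB, __FIXADDR_TOP
0xfffff000); none of these three values is quoted in this thread, and the
helper names are made up for illustration:

```c
#include <stdint.h>

/* Assumed i386 defaults; not taken from this thread. */
#define PAGE_OFFSET_ASSUMED     UINT32_C(0xC0000000)
#define VMALLOC_RESERVE_ASSUMED (UINT32_C(128) << 20)
#define FIXADDR_TOP_ASSUMED     UINT32_C(0xFFFFF000)

/* Old definition: (-__PAGE_OFFSET - __VMALLOC_RESERVE), modulo 2^32. */
static inline uint32_t maxmem_old(void)
{
	return (uint32_t)(0u - PAGE_OFFSET_ASSUMED - VMALLOC_RESERVE_ASSUMED);
}

/* New definition: (__FIXADDR_TOP - __PAGE_OFFSET - __VMALLOC_RESERVE). */
static inline uint32_t maxmem_new(void)
{
	return (uint32_t)(FIXADDR_TOP_ASSUMED - PAGE_OFFSET_ASSUMED -
			  VMALLOC_RESERVE_ASSUMED);
}

/* Bytes of RAM that fall above the lowmem limit and so become highmem. */
static inline uint32_t highmem_bytes(uint32_t ram_end, uint32_t maxmem)
{
	return ram_end > maxmem ? ram_end - maxmem : 0;
}
```

Under those assumptions, and with usable RAM ending at 0x38000000 as in
the e820 map above, highmem_bytes() gives 0 for the old definition and
4096 for the new one, which would match a one-page highmem zone.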

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07  5:50               ` Andrew Morton
@ 2006-06-07  6:49                 ` Rusty Russell
  0 siblings, 0 replies; 21+ messages in thread
From: Rusty Russell @ 2006-06-07  6:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kamezawa.hiroyu, y-goto, mbligh, apw, linux-kernel, 76306.1226,
	mingo, arjan, kraxel, zach

On Tue, 2006-06-06 at 22:50 -0700, Andrew Morton wrote:
> On Wed, 07 Jun 2006 15:36:25 +1000
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> > 	You now have 1 page more memory available in your system.
> 
> The kernel has differing opinions about that:
> 
>  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 0000000038000000 (usable)
>  BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
>  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
> 0MB HIGHMEM available.
> 896MB LOWMEM available.

Sure, the pages are reserved, but we still map them into the kernel
address space.

> > I'm sure Gerd will slap me if I'm wrong on this.  Here's the patch
> > fragment:
> > 
> > -#define MAXMEM                 (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > +#define MAXMEM                 (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
> 
> Well.  Applying this with `patch -R' would presumably restore the situation.
> But not having a clue why this change was made, I didn't bother trying it.

Actually, the comment above __VMALLOC_RESERVE already says "This much
address space is reserved for vmalloc() and iomap() as well as fixmap
mappings.", so it should have already been taken into account there.

So, please revert this.  When we introduce an actual CONFIG_MEMORY hole,
we'll patch in an explicit "-MEMHOLE_SIZE" or something here.

> From what you're saying, it appears that this patch is an unrelated change,
> to fix the off-by-one?  And that if this machine had anything other than
> exactly 7*128MB of physical memory, I wouldn't have noticed?

You wouldn't have noticed, yes.  But I'm not convinced the "fix" was
right anyway.

Thanks for chasing this!
Rusty.
-- 
 ccontrol: http://ccontrol.ozlabs.org


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06 23:42 ` Andrew Morton
@ 2006-06-07  9:16   ` Mel Gorman
  2006-06-07 17:38   ` Andy Whitcroft
  1 sibling, 0 replies; 21+ messages in thread
From: Mel Gorman @ 2006-06-07  9:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Bligh, apw, linux-kernel

On (06/06/06 16:42), Andrew Morton didst pronounce:
> On Mon, 05 Jun 2006 17:51:00 -0700
> Martin Bligh <mbligh@google.com> wrote:
> 
> > http://test.kernel.org/abat/34264/debug/console.log
> 
> What sort of machine is this, anyway?
> 
> > WARNING: Not an IBM x440/NUMAQ and CONFIG_NUMA enabled!
> 

It's an IBM x440 that is not recognised by that check. Later in the log,
we see

IBM eserver xSeries 440 detected: force use of acpi=ht

> And is it expected that ZONE_NORMAL only has 384MB?  That seems awfully low
> for a 64GB x86 machine.  Could be that we went oom because we chose to
> allocate really big hash tables, based on the total amount of memory?
> 

The log reports 392MB LOWMEM available. As it is a 32-bit machine, it started
with about 896MB but consumes much of it with mem_maps.  The log shows
calculate_numa_remap_pages() reporting

Reserving 35328 pages of KVA for lmem_map of node 0
Shrinking node 0 from 4456448 pages to 4421120 pages
Reserving 31232 pages of KVA for lmem_map of node 1
Shrinking node 1 from 8650752 pages to 8619520 pages
Reserving 31232 pages of KVA for lmem_map of node 2
Shrinking node 2 from 12845056 pages to 12813824 pages
Reserving 29184 pages of KVA for lmem_map of node 3
Shrinking node 3 from 16777216 pages to 16748032 pages
Reserving total of 126976 pages for numa KVA remap
reserve_pages = 126976 find_max_low_pfn() ~ 229375

That is roughly 496MB of lowmem KVA gone already, leaving about 400MB. A
little later in the log, it says

min_low_pfn = 1140, max_low_pfn = 100352, highstart_pfn = 100352

so about 4MB is missing from the beginning (probably the kernel image),
so we're down to 396ish. Not sure where the last 4MB is exactly but you
get the idea.
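
The page counts quoted above can be turned into megabytes with a small
helper.  This is only a sketch of the arithmetic, assuming 4 kB pages and
treating find_max_low_pfn() ~ 229375 as 229376 pages; the helper name is
made up for illustration:

```c
#include <stdint.h>

#define PAGE_SIZE_KB 4u	/* assumed: 4 kB pages, as on i386 */

/* Convert a page count to whole megabytes (1 MB = 1024 kB), rounding down. */
static inline uint32_t pages_to_mb(uint32_t pages)
{
	return pages * PAGE_SIZE_KB / 1024u;
}
```

With this, the 126976 reserved pages come to 496MB, the 229376 lowmem
pages to 896MB, the difference of 102400 pages to 400MB, and
max_low_pfn = 100352 to 392MB, in line with the LOWMEM figure the log
reports.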

While it is possible we are going OOM due to the size of lowmem, that is
unlikely to be the only cause.
http://test.kernel.org/functional/index.html shows that elm3b67 (the
machine in question) has passed loads of tests in the past.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  7:27       ` Andrew Morton
  2006-06-07  0:43         ` KAMEZAWA Hiroyuki
@ 2006-06-07  9:26         ` Andy Whitcroft
  2006-06-07 16:29           ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-07  9:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yasunori Goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

Andrew Morton wrote:
> On Tue, 06 Jun 2006 14:36:14 +0900
> Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> 
> 
>>>I looked back into 2.6.15, 2.6.16. 
>>>It looks like the point at which -mm initializes "total_memory" has not changed since then.
>>>(yes, Andrew's fix looks sane.)
>>>
>>>I'm interested in the following text in the log.
>>>==
>>>Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>Node 0 DMA32: empty
>>>Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>Node 0 HighMem: 1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 2*2048kB 3962*4096kB = 16233724kB
>>>Node 1 DMA: empty
>>>Node 1 DMA32: empty
>>>Node 1 Normal: empty
>>>Node 1 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
>>>Node 2 DMA: empty
>>>Node 2 DMA32: empty
>>>Node 2 Normal: empty
>>>Node 2 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 4065*4096kB = 16651916kB
>>>Node 3 DMA: empty
>>>Node 3 DMA32: empty
>>>Node 3 Normal: empty
>>>Node 3 HighMem: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 3811*4096kB = 15611532kB
>>>==
>>>Looks like 64GB of memory, but there is only HIGHMEM, no NORMAL or DMA. So shrink_zone() ran.
>>
>>Its log shows there is some memory in DMA and NORMAL immediately
>>before that.....
>>
>>
>>>Active:2 inactive:15 dirty:0 writeback:0 unstable:0 free:16287272 slab:1823 mapped:0 pagetables:0
>>>Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? no
>>
>>lowmem_reserve[]: 0 0 0 0
>>
>>>Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
>>
>>lowmem_reserve[]: 0 0 0 0
>>
>>>Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:385024kB pages_scanned:0 all_unreclaimable? no
>>
>>lowmem_reserve[]: 0 0 0 0
>>
>>It looks like something suddenly consumed all of the DMA (16MB) and
>>NORMAL (385MB) zones. Hmmm...
>>
> 
> 
> I tried sparsemem on my little x86 box here.  Boots OK, after fixing up the
> kswapd_init() patch (below).
> 
> I'm wondering why I have 4k of highmem:
> 
> MemTotal:       898200 kB
> MemFree:        832936 kB
> Buffers:          8824 kB
> Cached:          30140 kB
> SwapCached:          0 kB
> Active:          25052 kB
> Inactive:        20800 kB
> HighTotal:           4 kB
> HighFree:            4 kB
> LowTotal:       898196 kB
> LowFree:        832932 kB
> SwapTotal:     1020116 kB
> SwapFree:      1020116 kB
> Dirty:               0 kB
> Writeback:           0 kB
> Mapped:          10340 kB
> Slab:            10252 kB
> CommitLimit:   1469216 kB
> Committed_AS:    15496 kB
> PageTables:        528 kB
> VmallocTotal:   114680 kB
> VmallocUsed:       648 kB
> VmallocChunk:   113980 kB
> HugePages_Total:     0
> HugePages_Free:      0
> HugePages_Rsvd:      0
> Hugepagesize:     4096 kB
> 
> The dmesg is at http://www.zip.com.au/~akpm/linux/patches/stuff/log-vmm. 
> The machine has 900MB of memory (7*128M).
> 
> 
> <enables UNALIGNED_ZONE_BOUNDARIES like the nice message says>
> <http://www.zip.com.au/~akpm/linux/patches/stuff/log-vmm-2>
> 
> Nope, I still have a 4k highmem zone.
> 
> 
> 
> btw Andy, that UNALIGNED_ZONE_BOUNDARIES message is useless.  Only 0.1% of
> users even have the knowledge how to recompile their kernel, let alone the
> inclination.  Can we do something smarter here?

Yes, valid point there.  The overall plan is that this should never come
out as the option should be on unless the architecture is ensuring
alignment.  Right now the only architecture which is so marked is x86.
I wonder if we should also be tainting the kernel at that point so it's
obvious to 'us' that a kernel has this problem?

The other option is to just turn the check on all the time.  It is two
shift-and-mask operations plus a compare, on two cache lines that we are
definitely examining anyhow to make the merge checks.

Hmmmm.

-apw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07  9:26         ` Andy Whitcroft
@ 2006-06-07 16:29           ` Andrew Morton
  2006-06-07 16:35             ` Andrew Morton
  2006-06-07 17:22             ` sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Andy Whitcroft
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2006-06-07 16:29 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: y-goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

On Wed, 07 Jun 2006 10:26:03 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> > btw Andy, that UNALIGNED_ZONE_BOUNDARIES message is useless.  Only 0.1% of
> > users even have the knowledge how to recompile their kernel, let alone the
> > inclination.  Can we do something smarter here?
> 
> Yes, valid point there.  The overall plan is that this should never come
> out as the option should be on unless the architecture is ensuring
> alignment.  Right now the only architecture which is so marked is x86.
> I wonder if we should also be tainting the kernel at that point so it's
> obvious to 'us' that a kernel has this problem?

Better to make things just work if we can.

> The other option is to just turn the check on all the time.  It is two
> shift and mask + a compare on two cache lines that we definitely are
> examining anyhow to make the merge checks.

Sounds OK to me.

Note that the code can be optimised:

	if (page_zone_id(page) != page_zone_id(buddy))

...

static inline int page_zone_id(struct page *page)
{
	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
}

We don't need to perform the shift to make that comparison.  If the
compiler's sufficiently smart it will be able to optimise that for us.

<checks>

        shrl    $30, %edx       #, <variable>.flags
        shrl    $30, %eax       #, <variable>.flags
        cmpl    %eax, %edx      # <variable>.flags, <variable>.flags

Nope, not smart enough.
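
For reference, the shift-free comparison can be written by hand.  A
minimal sketch with a made-up field layout (a 4-bit id at bit 20; the
real ZONETABLE_PGSHIFT and ZONETABLE_MASK values are not given in this
thread, and these names are hypothetical):

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define ZID_SHIFT 20u
#define ZID_MASK  0xFu

/* The straightforward form: extract both ids, then compare. */
static inline int same_zone_shifted(uint32_t a, uint32_t b)
{
	return ((a >> ZID_SHIFT) & ZID_MASK) == ((b >> ZID_SHIFT) & ZID_MASK);
}

/* Shift-free form: XOR the words and test only the id bits in place. */
static inline int same_zone_noshift(uint32_t a, uint32_t b)
{
	return ((a ^ b) & (ZID_MASK << ZID_SHIFT)) == 0;
}
```

Both functions answer the same question; the second needs one XOR, one
AND with a constant and a test, with no per-operand shift.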

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07 16:29           ` Andrew Morton
@ 2006-06-07 16:35             ` Andrew Morton
  2006-06-07 17:50               ` Andy Whitcroft
  2006-06-15 12:28               ` [PATCH] zone handle unaligned zone boundaries Andy Whitcroft
  2006-06-07 17:22             ` sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Andy Whitcroft
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2006-06-07 16:35 UTC (permalink / raw)
  To: apw, y-goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

On Wed, 7 Jun 2006 09:29:50 -0700
Andrew Morton <akpm@osdl.org> wrote:

> Note that the code can be optimised:
> 
> 	if (page_zone_id(page) != page_zone_id(buddy))
> 
> ...
> 
> static inline int page_zone_id(struct page *page)
> {
> 	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
> }
> 
> We don't need to perform the shift to make that comparison.  If the
> compiler's sufficiently smart it will be able to optimise that for us.
> 
> <checks>
> 
>         shrl    $30, %edx       #, <variable>.flags
>         shrl    $30, %eax       #, <variable>.flags
>         cmpl    %eax, %edx      # <variable>.flags, <variable>.flags
> 
> Nope, not smart enough.

I take it back:

.LFB856:
	.loc 1 314 0
.LVL540:
	pushl	%ebp	#
.LCFI419:
	movl	%esp, %ebp	#,
.LCFI420:
	pushl	%ebx	#
.LCFI421:
	.loc 1 314 0
	movl	%edx, %ebx	# buddy, buddy
	.loc 1 320 0
	movl	(%eax), %edx	# <variable>.flags, <variable>.flags
.LVL541:
	movl	(%ebx), %eax	# <variable>.flags, <variable>.flags
.LVL542:
	shrl	$30, %edx	#, <variable>.flags
	shrl	$30, %eax	#, <variable>.flags
	cmpl	%eax, %edx	# <variable>.flags, <variable>.flags
	jne	.L587	#,
.LBB1082:

The compiler's done something sneaky there and has omitted the masking.


Anyway.  It sure doesn't look like it's worth a config option.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07 16:29           ` Andrew Morton
  2006-06-07 16:35             ` Andrew Morton
@ 2006-06-07 17:22             ` Andy Whitcroft
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-07 17:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: y-goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

Andrew Morton wrote:
> On Wed, 07 Jun 2006 10:26:03 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
> 
>>>btw Andy, that UNALIGNED_ZONE_BOUNDARIES message is useless.  Only 0.1% of
>>>users even have the knowledge how to recompile their kernel, let alone the
>>>inclination.  Can we do something smarter here?
>>
>>Yes, valid point there.  The overall plan is that this should never come
>>out as the option should be on unless the architecture is ensuring
>>alignment.  Right now the only architecture which is so marked is x86.
>>I wonder if we should also be tainting the kernel at that point so it's
>>obvious to 'us' that a kernel has this problem?
> 
> 
> Better to make things just work if we can.
> 
> 
>>The other option is to just turn the check on all the time.  It is two
>>shift and mask + a compare on two cache lines that we definitely are
>>examining anyhow to make the merge checks.
> 
> 
> Sounds OK to me.
> 
> Note that the code can be optimised:
> 
> 	if (page_zone_id(page) != page_zone_id(buddy))
> 
> ...
> 
> static inline int page_zone_id(struct page *page)
> {
> 	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
> }
> 
> We don't need to perform the shift to make that comparison.  If the
> compiler's sufficiently smart it will be able to optimise that for us.
> 
> <checks>
> 
>         shrl    $30, %edx       #, <variable>.flags
>         shrl    $30, %eax       #, <variable>.flags
>         cmpl    %eax, %edx      # <variable>.flags, <variable>.flags
> 
> Nope, not smart enough.

Piece of junk compiler ...   Ok.  I'll put together the minimum check
without the shift and test that.  See if it's visible in the performance.

-apw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06 23:42 ` Andrew Morton
  2006-06-07  9:16   ` Mel Gorman
@ 2006-06-07 17:38   ` Andy Whitcroft
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-07 17:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Bligh, linux-kernel

Andrew Morton wrote:
> On Mon, 05 Jun 2006 17:51:00 -0700
> Martin Bligh <mbligh@google.com> wrote:
> 
> 
>>http://test.kernel.org/abat/34264/debug/console.log
> 
> 
> What sort of machine is this, anyway?
> 
> 
>>WARNING: Not an IBM x440/NUMAQ and CONFIG_NUMA enabled!

It's confusing that that message is there, because the machine in question
is an x440.  Then again, I thought we only needed that because of the
unaligned zones issue, which, if not nailed down, we at least have fixes for.

Sadly this machine has just bitten the dust and I am waiting for a mendy
person to mend it.

-apw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-06  0:51 sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Martin Bligh
  2006-06-06  3:07 ` Andrew Morton
  2006-06-06 23:42 ` Andrew Morton
@ 2006-06-07 17:41 ` Andy Whitcroft
  2 siblings, 0 replies; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-07 17:41 UTC (permalink / raw)
  To: Martin Bligh; +Cc: LKML, Andrew Morton

Martin Bligh wrote:
> http://test.kernel.org/abat/34264/debug/console.log
> 
> Only seems to happen on the sparsemem runs. Possibly a side-effect
> of the page migration stuff, manifesting itself differently?
> Or maybe not?
> 

Ok, this shouldn't be that issue as the sparsemem checks won't tickle
that puppy, not enough swap devices in use.  That said, I've just run a
full sweep of sparsemem and it's GOOD across the board on swap patch and
2222 deadlock.  Sadly this failure is on a machine which has just bitten
the dust and I'm waiting for it to be mended.  Not sure how long that
will take.

-apw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
  2006-06-07 16:35             ` Andrew Morton
@ 2006-06-07 17:50               ` Andy Whitcroft
  2006-06-15 12:28               ` [PATCH] zone handle unaligned zone boundaries Andy Whitcroft
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-07 17:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: y-goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

Andrew Morton wrote:
> On Wed, 7 Jun 2006 09:29:50 -0700
> Andrew Morton <akpm@osdl.org> wrote:
> 
> 
>>Note that the code can be optimised:
>>
>>	if (page_zone_id(page) != page_zone_id(buddy))
>>
>>...
>>
>>static inline int page_zone_id(struct page *page)
>>{
>>	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
>>}
>>
>>We don't need to perform the shift to make that comparison.  If the
>>compiler's sufficiently smart it will be able to optimise that for us.
>>
>><checks>
>>
>>        shrl    $30, %edx       #, <variable>.flags
>>        shrl    $30, %eax       #, <variable>.flags
>>        cmpl    %eax, %edx      # <variable>.flags, <variable>.flags
>>
>>Nope, not smart enough.
> 
> 
> I take it back:
> 
> .LFB856:
> 	.loc 1 314 0
> .LVL540:
> 	pushl	%ebp	#
> .LCFI419:
> 	movl	%esp, %ebp	#,
> .LCFI420:
> 	pushl	%ebx	#
> .LCFI421:
> 	.loc 1 314 0
> 	movl	%edx, %ebx	# buddy, buddy
> 	.loc 1 320 0
> 	movl	(%eax), %edx	# <variable>.flags, <variable>.flags
> .LVL541:
> 	movl	(%ebx), %eax	# <variable>.flags, <variable>.flags
> .LVL542:
> 	shrl	$30, %edx	#, <variable>.flags
> 	shrl	$30, %eax	#, <variable>.flags
> 	cmpl	%eax, %edx	# <variable>.flags, <variable>.flags
> 	jne	.L587	#,
> .LBB1082:
> 
> The compiler's done something sneaky there and has omitted the masking.
> 
> 
> Anyway.  It sure doesn't look like it's worth a config option.

Yep I forgot about that.  We fought hard to keep the NODEZONE zonetable
index at the very edge of the flags for this very reason.  The
optimisation is worth taking care to order the fields in this area.

So if we assume a barrel shift and a mask cost the processor the same
(fair I'd say) this comparison is as optimal as it can be.
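
Why the mask can be dropped when the index sits at the very top of the
flags word can be shown in a few lines.  A sketch assuming a 32-bit flags
word and the 30-bit shift seen in the assembly above, which leaves a
2-bit field; the mask value 0x3 is inferred from that, not quoted here,
and the function names are made up:

```c
#include <stdint.h>

/* Shift and mask, as page_zone_id() is written in the source. */
static inline uint32_t id_shift_and_mask(uint32_t flags)
{
	return (flags >> 30) & 0x3u;
}

/*
 * Shift only: a logical right shift by 30 of a 32-bit value already
 * clears everything above bit 1, so the mask has nothing left to remove.
 */
static inline uint32_t id_shift_only(uint32_t flags)
{
	return flags >> 30;
}
```

The two agree for every 32-bit input, which is consistent with the
compiler emitting shrl $30 with no following andl.  If the field were
lower in the word, the bits above it would survive the shift and the
mask would be needed again.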

I'll spin and test just an always on version and send you that to
replace the other patches.

-apw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] zone handle unaligned zone boundaries
  2006-06-07 16:35             ` Andrew Morton
  2006-06-07 17:50               ` Andy Whitcroft
@ 2006-06-15 12:28               ` Andy Whitcroft
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Whitcroft @ 2006-06-15 12:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: apw, y-goto, kamezawa.hiroyu, mbligh, linux-kernel, 76306.1226

This one has been through my tests ok.  It simply enables
the test unconditionally.  This should replace all four
unaligned-zones patches.

-apw

=== 8< ===
From: Andy Whitcroft <apw@shadowen.org>

The buddy allocator has a requirement that boundaries between
contiguous zones occur aligned with the MAX_ORDER ranges.  Where
they do not we will incorrectly merge pages across zone boundaries.
This can lead to pages from the wrong zone being handed out.

Originally the buddy allocator would check that buddies were in the
same zone by referencing the zone start and end page frame numbers.
This was removed as it became very expensive and the buddy allocator
already made the assumption that zone boundaries were aligned.

It is clear that not all configurations and architectures are
honouring this alignment requirement.  Therefore it seems safest
to reintroduce support for non-aligned zone boundaries.  This patch
introduces a new check when considering a page as a buddy: it compares
the zone_table index for the two pages and refuses to merge the
pages where they do not match.  The zone_table index is unique for
each node/zone combination when FLATMEM/DISCONTIGMEM is enabled
and for each section/zone combination when SPARSEMEM is enabled
(a SPARSEMEM section is at least a MAX_ORDER size).

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 include/linux/mm.h |    7 +++++--
 mm/page_alloc.c    |   17 +++++++++++------
 2 files changed, 16 insertions(+), 8 deletions(-)
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/include/linux/mm.h current/include/linux/mm.h
--- reference/include/linux/mm.h
+++ current/include/linux/mm.h
@@ -484,10 +484,13 @@ static inline unsigned long page_zonenum
 struct zone;
 extern struct zone *zone_table[];
 
+static inline int page_zone_id(struct page *page)
+{
+	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
+}
 static inline struct zone *page_zone(struct page *page)
 {
-	return zone_table[(page->flags >> ZONETABLE_PGSHIFT) &
-			ZONETABLE_MASK];
+	return zone_table[page_zone_id(page)];
 }
 
 static inline unsigned long page_to_nid(struct page *page)
diff -X /home/apw/brief/lib/vdiff.excl -rupN reference/mm/page_alloc.c current/mm/page_alloc.c
--- reference/mm/page_alloc.c
+++ current/mm/page_alloc.c
@@ -301,22 +301,27 @@ __find_combined_index(unsigned long page
  * we can do coalesce a page and its buddy if
  * (a) the buddy is not in a hole &&
  * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order.
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
  *
  * For recording whether a page is in the buddy system, we use PG_buddy.
  * Setting, clearing, and testing PG_buddy is serialized by zone->lock.
  *
  * For recording page's order, we use page_private(page).
  */
-static inline int page_is_buddy(struct page *page, int order)
+static inline int page_is_buddy(struct page *page, struct page *buddy,
+								int order)
 {
 #ifdef CONFIG_HOLES_IN_ZONE
-	if (!pfn_valid(page_to_pfn(page)))
+	if (!pfn_valid(page_to_pfn(buddy)))
 		return 0;
 #endif
 
-	if (PageBuddy(page) && page_order(page) == order) {
-		BUG_ON(page_count(page) != 0);
+	if (page_zone_id(page) != page_zone_id(buddy))
+		return 0;
+
+	if (PageBuddy(buddy) && page_order(buddy) == order) {
+		BUG_ON(page_count(buddy) != 0);
 		return 1;
 	}
 	return 0;
@@ -367,7 +372,7 @@ static inline void __free_one_page(struc
 		struct page *buddy;
 
 		buddy = __page_find_buddy(page, page_idx, order);
-		if (!page_is_buddy(buddy, order))
+		if (!page_is_buddy(page, buddy, order))
 			break;		/* Move the buddy up one level. */
 
 		list_del(&buddy->lru);
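
The effect of the added zone check can be modelled outside the kernel.
The following is a simplified user-space sketch, not the kernel code:
struct toy_page and both helpers are hypothetical stand-ins for
struct page, __page_find_buddy() and page_is_buddy(), and the
CONFIG_HOLES_IN_ZONE test is left out:

```c
/* Toy stand-in for the parts of struct page the check looks at. */
struct toy_page {
	int zone_id;	/* stands in for page_zone_id(page) */
	int is_buddy;	/* stands in for PageBuddy(page) */
	int order;	/* stands in for page_order(page) */
};

/* Index of the buddy of the block starting at page_idx, at this order. */
static inline unsigned long toy_find_buddy_idx(unsigned long page_idx,
					       unsigned int order)
{
	return page_idx ^ (1UL << order);
}

/* May 'page' be merged with 'buddy' at this order? */
static inline int toy_page_is_buddy(const struct toy_page *page,
				    const struct toy_page *buddy, int order)
{
	/* New condition (d): never merge across a zone boundary. */
	if (page->zone_id != buddy->zone_id)
		return 0;

	/* Existing conditions (b) and (c): free in the buddy system, same order. */
	return buddy->is_buddy && buddy->order == order;
}
```

If a zone boundary falls between a page and its computed buddy, the two
carry different zone ids and the merge is refused even when the buddy is
free and of the right order.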

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2006-06-15 12:29 UTC | newest]

Thread overview: 21+ messages
2006-06-06  0:51 sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Martin Bligh
2006-06-06  3:07 ` Andrew Morton
2006-06-06  5:19   ` KAMEZAWA Hiroyuki
2006-06-06  5:36     ` Yasunori Goto
2006-06-06  7:27       ` Andrew Morton
2006-06-07  0:43         ` KAMEZAWA Hiroyuki
2006-06-07  4:58           ` Andrew Morton
2006-06-07  5:36             ` Rusty Russell
2006-06-07  5:50               ` Andrew Morton
2006-06-07  6:49                 ` Rusty Russell
2006-06-07  9:26         ` Andy Whitcroft
2006-06-07 16:29           ` Andrew Morton
2006-06-07 16:35             ` Andrew Morton
2006-06-07 17:50               ` Andy Whitcroft
2006-06-15 12:28               ` [PATCH] zone handle unaligned zone boundaries Andy Whitcroft
2006-06-07 17:22             ` sparsemem panic in 2.6.17-rc5-mm1 and -mm2 Andy Whitcroft
2006-06-06 23:42 ` Andrew Morton
2006-06-07  9:16   ` Mel Gorman
2006-06-07 17:38   ` Andy Whitcroft
2006-06-07 17:41 ` Andy Whitcroft
  -- strict thread matches above, loose matches on Subject: below --
2006-06-06  3:50 Chuck Ebbert
