public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
       [not found] <200812250806.mBP86LPQ018399@imap1.linux-foundation.org>
@ 2009-02-08 20:47 ` Johannes Weiner
  2009-02-09  0:31   ` Benjamin Herrenschmidt
  2009-02-10 11:35   ` Chandru
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Weiner @ 2009-02-08 20:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: mm-commits, chandru, benh, chandru, paulus, akpm

On Thu, Dec 25, 2008 at 12:06:21AM -0800, akpm@linux-foundation.org wrote:
> 
> The patch titled
>      powerpc: fix code for reserved memory spanning across nodes
> has been added to the -mm tree.  Its filename is
>      powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
> out what to do about this
> 
> The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
> 
> ------------------------------------------------------
> Subject: powerpc: fix code for reserved memory spanning across nodes
> From: Chandru <chandru@in.ibm.com>
> 
> When booted with crashkernel=224M@32M or any memory size less than this,
> the system boots properly.  The following was the observation..  The
> system comes up with two nodes (0-256M and 256M-4GB).  The crashkernel
> memory reservation spans across these two nodes.  The
> mark_reserved_regions_for_nid() in arch/powerpc/mm/numa.c resizes the
> reserved part of the memory within it as:
> 
> 	if (end_pfn > node_ar.end_pfn)
> 		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
> 				- (start_pfn << PAGE_SHIFT);
> 
> 
> but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end 
> 
> 	end = PFN_UP(physaddr + size);
>
> This causes end to get a value past the last page in the 0-256M node. 
> Again when reserve_bootmem_node() returns, mark_reserved_regions_for_nid()
> loops around to set the rest of the crashkernel memory in the next node as
> reserved.  It references NODE_DATA(node_ar.nid) and this causes another
> 'Oops: kernel access of bad area' problem.  The following changes made the
> system to boot with any amount of crashkernel memory size.
> 
> Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  arch/powerpc/mm/numa.c |    7 ++++---
>  mm/bootmem.c           |    4 ++++
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff -puN arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes arch/powerpc/mm/numa.c
> --- a/arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
> +++ a/arch/powerpc/mm/numa.c
> @@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
>  				  start_pfn, end_pfn);
>  
>  		free_bootmem_with_active_regions(nid, end_pfn);
> +	}
> +
> +	for_each_online_node(nid) {
>  		/*
> -		 * Be very careful about moving this around.  Future
> -		 * calls to careful_allocation() depend on this getting
> -		 * done correctly.
> +		 * Be very careful about moving this around.
>  		 */
>  		mark_reserved_regions_for_nid(nid);
>  		sparse_memory_present_with_active_regions(nid);
> diff -puN mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes mm/bootmem.c
> --- a/mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
> +++ a/mm/bootmem.c
> @@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
>  				 unsigned long size, int flags)
>  {
>  	unsigned long start, end;
> +	bootmem_data_t *bdata = pgdat->bdata;
>  
>  	start = PFN_DOWN(physaddr);
>  	end = PFN_UP(physaddr + size);
>  
> +	if (end > bdata->node_low_pfn)
> +		end = bdata->node_low_pfn;

Chandru,

from your patch descriptions I read two reasons why your bootup fails:

1. Rounding a contained range so that it exceeds the node

This is impossible.  Either your range includes that page already or
it doesn't.  bootmem doesn't reserve more pages than requested, it
just rounds up to full pages if your range includes partials.  Your
size must already overflow the node boundary if the end pfn is too
high.

2. Node-spanning range

If you have ranges that span nodes you are using the wrong interface.
By passing a node descriptor to reserve_bootmem_node(), you expect the
specified range to be on that node.  If it is not, please use the
node-agnostic reserve_bootmem() interface.

The above patch brings a kludge back to bootmem that silently cuts
bogus ranges.  I don't agree, these bugs should be fixed, not worked
around.

What I can spot in the callsite is the following:

                unsigned long physbase = lmb.reserved.region[i].base;
                unsigned long size = lmb.reserved.region[i].size;
                unsigned long start_pfn = physbase >> PAGE_SHIFT;
                unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);

When physbase + size is not a multiple of PAGE_SIZE, the overlap is
cut off and end_pfn is too low.

                        if (end_pfn > node_ar.end_pfn)
                                reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
                                        - (start_pfn << PAGE_SHIFT);

The test could then fail even if the size should have been trimmed.
Since the overlap must be less than PAGE_SIZE for this to happen, this
would explain what you noticed: the bootmem range exceeds the node
boundary by exactly one PFN because of the rounding (if I got you
right).

Patch attached, can not test, sorry, eat carefully.

	Hannes

PS: Andrew, I wasn't CC'd but this patch was listed in this `patches
that might be from hannes@etcppp'-list from another mm-commit message.
It's magical!

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: powerpc: fix rounding error in teaching bootmem about LMB

If the reserved LMB does not exactly span complete pages, treating
(start + size) >> PAGE_SHIFT as the ending PFN is an off by one error.

The subsequent check for whether the region needs to be trimmed to fit
the underlying node can now fail if the range exceeds the node by 1 to
PAGE_SIZE - 1 bytes.  The excessive range is then passed to bootmem
which BUG()s out on it correctly.

Fix up the rounding to include all pages the LMB spans, even partial
ones.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 7393bd7..38016a2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -12,6 +12,7 @@
 #include <linux/bootmem.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/pfn.h>
 #include <linux/mmzone.h>
 #include <linux/module.h>
 #include <linux/nodemask.h>
@@ -881,8 +882,8 @@ static void mark_reserved_regions_for_nid(int nid)
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
-		unsigned long start_pfn = physbase >> PAGE_SHIFT;
-		unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
+		unsigned long start_pfn = PFN_DOWN(physbase);
+		unsigned long end_pfn = PFN_UP(physbase + size);
 		struct node_active_region node_ar;
 		unsigned long node_end_pfn = node->node_start_pfn +
 					     node->node_spanned_pages;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-08 20:47 ` + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree Johannes Weiner
@ 2009-02-09  0:31   ` Benjamin Herrenschmidt
  2009-02-10 17:51     ` Dave Hansen
  2009-02-10 11:35   ` Chandru
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2009-02-09  0:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, mm-commits, chandru, chandru, paulus, akpm,
	Johannes Weiner

Dave, you've had your hands in that code more than I did lately, do that
look ok to you ?

(BTW. On another note, do you still have pending bug fixes that didn't
got in .29 ?)

Cheers,
Ben.

On Sun, 2009-02-08 at 21:47 +0100, Johannes Weiner wrote:
> On Thu, Dec 25, 2008 at 12:06:21AM -0800, akpm@linux-foundation.org wrote:
> > 
> > The patch titled
> >      powerpc: fix code for reserved memory spanning across nodes
> > has been added to the -mm tree.  Its filename is
> >      powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch
> > 
> > Before you just go and hit "reply", please:
> >    a) Consider who else should be cc'ed
> >    b) Prefer to cc a suitable mailing list as well
> >    c) Ideally: find the original patch on the mailing list and do a
> >       reply-to-all to that, adding suitable additional cc's
> > 
> > *** Remember to use Documentation/SubmitChecklist when testing your code ***
> > 
> > See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
> > out what to do about this
> > 
> > The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/
> > 
> > ------------------------------------------------------
> > Subject: powerpc: fix code for reserved memory spanning across nodes
> > From: Chandru <chandru@in.ibm.com>
> > 
> > When booted with crashkernel=224M@32M or any memory size less than this,
> > the system boots properly.  The following was the observation..  The
> > system comes up with two nodes (0-256M and 256M-4GB).  The crashkernel
> > memory reservation spans across these two nodes.  The
> > mark_reserved_regions_for_nid() in arch/powerpc/mm/numa.c resizes the
> > reserved part of the memory within it as:
> > 
> > 	if (end_pfn > node_ar.end_pfn)
> > 		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
> > 				- (start_pfn << PAGE_SHIFT);
> > 
> > 
> > but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end 
> > 
> > 	end = PFN_UP(physaddr + size);
> >
> > This causes end to get a value past the last page in the 0-256M node. 
> > Again when reserve_bootmem_node() returns, mark_reserved_regions_for_nid()
> > loops around to set the rest of the crashkernel memory in the next node as
> > reserved.  It references NODE_DATA(node_ar.nid) and this causes another
> > 'Oops: kernel access of bad area' problem.  The following changes made the
> > system to boot with any amount of crashkernel memory size.
> > 
> > Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> > 
> >  arch/powerpc/mm/numa.c |    7 ++++---
> >  mm/bootmem.c           |    4 ++++
> >  2 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff -puN arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes arch/powerpc/mm/numa.c
> > --- a/arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
> > +++ a/arch/powerpc/mm/numa.c
> > @@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
> >  				  start_pfn, end_pfn);
> >  
> >  		free_bootmem_with_active_regions(nid, end_pfn);
> > +	}
> > +
> > +	for_each_online_node(nid) {
> >  		/*
> > -		 * Be very careful about moving this around.  Future
> > -		 * calls to careful_allocation() depend on this getting
> > -		 * done correctly.
> > +		 * Be very careful about moving this around.
> >  		 */
> >  		mark_reserved_regions_for_nid(nid);
> >  		sparse_memory_present_with_active_regions(nid);
> > diff -puN mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes mm/bootmem.c
> > --- a/mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
> > +++ a/mm/bootmem.c
> > @@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
> >  				 unsigned long size, int flags)
> >  {
> >  	unsigned long start, end;
> > +	bootmem_data_t *bdata = pgdat->bdata;
> >  
> >  	start = PFN_DOWN(physaddr);
> >  	end = PFN_UP(physaddr + size);
> >  
> > +	if (end > bdata->node_low_pfn)
> > +		end = bdata->node_low_pfn;
> 
> Chandru,
> 
> from your patch descriptions I read two reasons why your bootup fails:
> 
> 1. Rounding a contained range so that it exceeds the node
> 
> This is impossible.  Either your range includes that page already or
> it doesn't.  bootmem doesn't reserve more pages than requested, it
> just rounds up to full pages if your range includes partials.  Your
> size must already overflow the node boundary if the end pfn is too
> high.
> 
> 2. Node-spanning range
> 
> If you have ranges that span nodes you are using the wrong interface.
> By passing a node descriptor to reserve_bootmem_node(), you expect the
> specified range to be on that node.  If it is not, please use the
> node-agnostic reserve_bootmem() interface.
> 
> The above patch brings a kludge back to bootmem that silently cuts
> bogus ranges.  I don't agree, these bugs should be fixed, not worked
> around.
> 
> What I can spot in the callsite is the following:
> 
>                 unsigned long physbase = lmb.reserved.region[i].base;
>                 unsigned long size = lmb.reserved.region[i].size;
>                 unsigned long start_pfn = physbase >> PAGE_SHIFT;
>                 unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
> 
> When physbase + size is not a multiple of PAGE_SIZE, the overlap is
> cut off and end_pfn is too low.
> 
>                         if (end_pfn > node_ar.end_pfn)
>                                 reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
>                                         - (start_pfn << PAGE_SHIFT);
> 
> The test could then fail even if the size should have been trimmed.
> Since the overlap must be less than PAGE_SIZE for this to happen, this
> would explain what you noticed: the bootmem range exceeds the node
> boundary by exactly one PFN because of the rounding (if I got you
> right).
> 
> Patch attached, can not test, sorry, eat carefully.
> 
> 	Hannes
> 
> PS: Andrew, I wasn't CC'd but this patch was listed in this `patches
> that might be from hannes@etcppp'-list from another mm-commit message.
> It's magical!
> 
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: powerpc: fix rounding error in teaching bootmem about LMB
> 
> If the reserved LMB does not exactly span complete pages, treating
> (start + size) >> PAGE_SHIFT as the ending PFN is an off by one error.
> 
> The subsequent check for whether the region needs to be trimmed to fit
> the underlying node can now fail if the range exceeds the node by 1 to
> PAGE_SIZE - 1 bytes.  The excessive range is then passed to bootmem
> which BUG()s out on it correctly.
> 
> Fix up the rounding to include all pages the LMB spans, even partial
> ones.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 7393bd7..38016a2 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -12,6 +12,7 @@
>  #include <linux/bootmem.h>
>  #include <linux/init.h>
>  #include <linux/mm.h>
> +#include <linux/pfn.h>
>  #include <linux/mmzone.h>
>  #include <linux/module.h>
>  #include <linux/nodemask.h>
> @@ -881,8 +882,8 @@ static void mark_reserved_regions_for_nid(int nid)
>  	for (i = 0; i < lmb.reserved.cnt; i++) {
>  		unsigned long physbase = lmb.reserved.region[i].base;
>  		unsigned long size = lmb.reserved.region[i].size;
> -		unsigned long start_pfn = physbase >> PAGE_SHIFT;
> -		unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
> +		unsigned long start_pfn = PFN_DOWN(physbase);
> +		unsigned long end_pfn = PFN_UP(physbase + size);
>  		struct node_active_region node_ar;
>  		unsigned long node_end_pfn = node->node_start_pfn +
>  					     node->node_spanned_pages;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-08 20:47 ` + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree Johannes Weiner
  2009-02-09  0:31   ` Benjamin Herrenschmidt
@ 2009-02-10 11:35   ` Chandru
  2009-02-10 17:59     ` Johannes Weiner
  1 sibling, 1 reply; 8+ messages in thread
From: Chandru @ 2009-02-10 11:35 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-kernel, mm-commits, benh, chandru, paulus, akpm

Johannes Weiner wrote:
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: powerpc: fix rounding error in teaching bootmem about LMB
>
> If the reserved LMB does not exactly span complete pages, treating
> (start + size) >> PAGE_SHIFT as the ending PFN is an off by one error.
>
> The subsequent check for whether the region needs to be trimmed to fit
> the underlying node can now fail if the range exceeds the node by 1 to
> PAGE_SIZE - 1 bytes.  The excessive range is then passed to bootmem
> which BUG()s out on it correctly.
>
> Fix up the rounding to include all pages the LMB spans, even partial
> ones.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

Hello Hannes,

Dave Hansen gave a fix for this problem which looks similar
to the changes that you have made here , but has an additional
one line change too. Here is the patch ( probably may not apply
cleanly to the latest kernel ) . I don' t know if Dave
submitted this patch to lkml for it's inclusion into the
latest tree. Thanks for looking into this issue. We may also have
to remove the 
powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch
from the -mm tree.

Thanks,
Chandru

=================
Snippet from Dave's patch without change log

---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN arch/powerpc/mm/numa.c~reserve-over-fix arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~reserve-over-fix	2009-01-26 10:17:20.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2009-01-26 10:17:30.000000000 -0800
@@ -19,6 +19,7 @@
 #include <linux/notifier.h>
 #include <linux/lmb.h>
 #include <linux/of.h>
+#include <linux/pfn.h>
 #include <asm/sparsemem.h>
 #include <asm/prom.h>
 #include <asm/system.h>
@@ -882,7 +883,7 @@ static void mark_reserved_regions_for_ni
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
 		unsigned long start_pfn = physbase >> PAGE_SHIFT;
-		unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
+		unsigned long end_pfn = PFN_UP(physbase + size);
 		struct node_active_region node_ar;
 		unsigned long node_end_pfn = node->node_start_pfn +
 					     node->node_spanned_pages;
@@ -908,7 +909,7 @@ static void mark_reserved_regions_for_ni
 			 */
 			if (end_pfn > node_ar.end_pfn)
 				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
-					- (start_pfn << PAGE_SHIFT);
+					- physbase;
 			/*
 			 * Only worry about *this* node, others may not
 			 * yet have valid NODE_DATA().
diff -puN arch/powerpc/kernel/prom_init.c~reserve-over-fix arch/powerpc/kernel/prom_init.c



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-09  0:31   ` Benjamin Herrenschmidt
@ 2009-02-10 17:51     ` Dave Hansen
  2009-02-10 18:27       ` Chandru
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2009-02-10 17:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel, mm-commits, chandru, chandru, paulus, akpm,
	Johannes Weiner, Jon M. Tollefson [imap]

On Mon, 2009-02-09 at 11:31 +1100, Benjamin Herrenschmidt wrote:
> Dave, you've had your hands in that code more than I did lately, do
> that
> look ok to you ?

I still think that patch is bogus.  I don't know how it is still around.
I described the problem pretty completely here:

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a6186696e7f15b3ea4dafcdb64ee0703e0e4487

Chandru effectively reverted my bugfix from that patch in this one.

> (BTW. On another note, do you still have pending bug fixes that didn't
> got in .29 ?)

No, I don't think so.  There were some distro fixes but I couldn't
reproduce the bugs on mainline.

-- Dave


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-10 11:35   ` Chandru
@ 2009-02-10 17:59     ` Johannes Weiner
  2009-02-13 11:18       ` Chandru
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2009-02-10 17:59 UTC (permalink / raw)
  To: Chandru; +Cc: linux-kernel, mm-commits, benh, chandru, paulus, akpm

On Tue, Feb 10, 2009 at 05:05:45PM +0530, Chandru wrote:
> Johannes Weiner wrote:
> >---
> >From: Johannes Weiner <hannes@cmpxchg.org>
> >Subject: powerpc: fix rounding error in teaching bootmem about LMB
> >
> >If the reserved LMB does not exactly span complete pages, treating
> >(start + size) >> PAGE_SHIFT as the ending PFN is an off by one error.
> >
> >The subsequent check for whether the region needs to be trimmed to fit
> >the underlying node can now fail if the range exceeds the node by 1 to
> >PAGE_SIZE - 1 bytes.  The excessive range is then passed to bootmem
> >which BUG()s out on it correctly.
> >
> >Fix up the rounding to include all pages the LMB spans, even partial
> >ones.
> >
> >Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >---
> 
> Hello Hannes,
> 
> Dave Hansen gave a fix for this problem which looks similar
> to the changes that you have made here , but has an additional
> one line change too. Here is the patch ( probably may not apply
> cleanly to the latest kernel ) . I don' t know if Dave
> submitted this patch to lkml for it's inclusion into the
> latest tree. Thanks for looking into this issue. We may also have
> to remove the 
> powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch
> from the -mm tree.
> 
> Thanks,
> Chandru
> 
> =================
> Snippet from Dave's patch without change log
> 
> ---
> 
> linux-2.6.git-dave/arch/powerpc/mm/numa.c |    5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff -puN arch/powerpc/mm/numa.c~reserve-over-fix arch/powerpc/mm/numa.c
> --- linux-2.6.git/arch/powerpc/mm/numa.c~reserve-over-fix	2009-01-26 
> 10:17:20.000000000 -0800
> +++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2009-01-26 
> 10:17:30.000000000 -0800
> @@ -19,6 +19,7 @@
> #include <linux/notifier.h>
> #include <linux/lmb.h>
> #include <linux/of.h>
> +#include <linux/pfn.h>
> #include <asm/sparsemem.h>
> #include <asm/prom.h>
> #include <asm/system.h>
> @@ -882,7 +883,7 @@ static void mark_reserved_regions_for_ni
> 		unsigned long physbase = lmb.reserved.region[i].base;
> 		unsigned long size = lmb.reserved.region[i].size;
> 		unsigned long start_pfn = physbase >> PAGE_SHIFT;
> -		unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
> +		unsigned long end_pfn = PFN_UP(physbase + size);
> 		struct node_active_region node_ar;
> 		unsigned long node_end_pfn = node->node_start_pfn +
> 					     node->node_spanned_pages;
> @@ -908,7 +909,7 @@ static void mark_reserved_regions_for_ni
> 			 */
> 			if (end_pfn > node_ar.end_pfn)
> 				reserve_size = (node_ar.end_pfn << 
> 				PAGE_SHIFT)
> -					- (start_pfn << PAGE_SHIFT);
> +					- physbase;
> 			/*
> 			 * Only worry about *this* node, others may not
> 			 * yet have valid NODE_DATA().
> diff -puN arch/powerpc/kernel/prom_init.c~reserve-over-fix 
> arch/powerpc/kernel/prom_init.c

This is fine, too.  Subtracting physbase yields a smaller reserve_size
if not page aligned (bootmem will round it up again, no problem there)
and at the end of the loop, size is not zero.  This has no practical
impacts, though, as far as I can see.

The interesting question, however, is whether this patch actually
fixes the problem you encountered?  If so, I would be glad if we could
drop the workaround we currently have in -mm.

	Hannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-10 17:51     ` Dave Hansen
@ 2009-02-10 18:27       ` Chandru
  2009-02-13 11:01         ` Chandru
  0 siblings, 1 reply; 8+ messages in thread
From: Chandru @ 2009-02-10 18:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Benjamin Herrenschmidt, linux-kernel, mm-commits, chandru, paulus,
	akpm, Johannes Weiner, Jon M. Tollefson [imap]

Dave Hansen wrote:
> On Mon, 2009-02-09 at 11:31 +1100, Benjamin Herrenschmidt wrote:
>   
>> Dave, you've had your hands in that code more than I did lately, do
>> that
>> look ok to you ?
>>     
>
> I still think that patch is bogus.  I don't know how it is still around.
> I described the problem pretty completely here:
>
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a6186696e7f15b3ea4dafcdb64ee0703e0e4487
>
> Chandru effectively reverted my bugfix from that patch in this one.
>
>   
>> (BTW. On another note, do you still have pending bug fixes that didn't
>> got in .29 ?)
>>     
>
> No, I don't think so.  There were some distro fixes but I couldn't
> reproduce the bugs on mainline.
>
> -- Dave
>
>   
Dave,

If I remember,  the first mail that I sent on this subject said
2.6.28-rc9 panics. Unless something has changed in the way
lmb's are added or are reserved, this should be also
reproducible with the mainline kernel on the machines that
had exhibited this problem earlier. I will verify your patch
on these machines again and let you know the results.

Thanks,
Chandru

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-10 18:27       ` Chandru
@ 2009-02-13 11:01         ` Chandru
  0 siblings, 0 replies; 8+ messages in thread
From: Chandru @ 2009-02-13 11:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Benjamin Herrenschmidt, linux-kernel, mm-commits, chandru, paulus,
	akpm, Johannes Weiner, Jon M. Tollefson [imap]

Chandru wrote:
> Dave Hansen wrote:
>> On Mon, 2009-02-09 at 11:31 +1100, Benjamin Herrenschmidt wrote:
>>  
>>  
>>> (BTW. On another note, do you still have pending bug fixes that didn't
>>> got in .29 ?)
>>>     
>>
>> No, I don't think so.  There were some distro fixes but I couldn't
>> reproduce the bugs on mainline.
>>
>> -- Dave
>>
>>   
> Dave,
>
> If I remember,  the first mail that I sent on this subject said
> 2.6.28-rc9 panics. Unless something has changed in the way
> lmb's are added or are reserved, this should be also
> reproducible with the mainline kernel on the machines that
> had exhibited this problem earlier. I will verify your patch
> on these machines again and let you know the results.
>
> Thanks,
> Chandru

This still occurs with 2.6.29.rc4 on one of the machines that had
shown this problem earlier and Dave's patch solves the issue here.

The system comes up with two nodes...

<snip>
NUMA associativity depth for CPU/Memory: 3
Node 0 Memory: 0x0-0x40000000
Node 1 Memory: 0x40000000-0x80000000

I had to increase the crashkernel memory size to beyond 1G and
hence used crashkernel=1280M@32M. This caused the kernel
to hit the bug as with earlier kernels.

-Chandru


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree
  2009-02-10 17:59     ` Johannes Weiner
@ 2009-02-13 11:18       ` Chandru
  0 siblings, 0 replies; 8+ messages in thread
From: Chandru @ 2009-02-13 11:18 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-kernel, mm-commits, benh, chandru, paulus, akpm

Johannes Weiner wrote:
>
> This is fine, too.  Subtracting physbase yields a smaller reserve_size
> if not page aligned (bootmem will round it up again, no problem there)
> and at the end of the loop, size is not zero.  This has no practical
> impacts, though, as far as I can see.
>
> The interesting question, however, is whether this patch actually
> fixes the problem you encountered?  If so, I would be glad if we could
> drop the workaround we currently have in -mm.
>
> 	Hannes
>   
Hello Hannes,

Yes, the patch fixes the problem of hitting the BUG_ON()
in bootmem.  Andrew removed the workaround from the
-mm tree.  Thanks to you for running your eyes on this
issue and arriving at an almost workable patch.

thanks,
Chandru

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-02-13 11:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200812250806.mBP86LPQ018399@imap1.linux-foundation.org>
2009-02-08 20:47 ` + powerpc-fix-code-for-reserved-memory-spanning-across-nodes.patch added to -mm tree Johannes Weiner
2009-02-09  0:31   ` Benjamin Herrenschmidt
2009-02-10 17:51     ` Dave Hansen
2009-02-10 18:27       ` Chandru
2009-02-13 11:01         ` Chandru
2009-02-10 11:35   ` Chandru
2009-02-10 17:59     ` Johannes Weiner
2009-02-13 11:18       ` Chandru

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox