linuxppc-dev.lists.ozlabs.org archive mirror
From: Jon Tollefson <kniht@us.ibm.com>
To: benh@kernel.crashing.org
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@ozlabs.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: [PATCH v3] powerpc: properly reserve in bootmem the lmb reserved regions that cross NUMA nodes
Date: Thu, 16 Oct 2008 23:59:43 -0500
Message-ID: <48F81BBF.7050801@us.ibm.com>
In-Reply-To: <1223614516.8157.154.camel@pasglop>

Benjamin Herrenschmidt wrote:
> On Thu, 2008-10-09 at 15:18 -0500, Jon Tollefson wrote:
>   
>> If there are multiple memory blocks reserved via lmb_reserve() that have
>> contiguous addresses and lie on different NUMA nodes, we lose track of which
>> address ranges to reserve in bootmem on which node.  I discovered this
>> when I recently got to try 16GB huge pages on a system with more than 2 nodes.
>>     
>
> I'm going to apply it; however, could you double-check something for
> me? A cursory glance at the new version makes me wonder: what if the
> first call to get_node_active_region() ends up with the work_fn never
> hitting the if () case? I think in that case node_ar->end_pfn never
> gets initialized, right? Can that happen in practice? I suspect that
> isn't the case, but better safe than sorry...
>   
I have tested this on a few machines and it hasn't been a problem.  However,
I don't see anything in lmb_reserve() that would prevent reserving a
block outside of valid memory, in which case no active region would match
and end_pfn would be left uninitialized.  So, to be safe, I have attached a
patch that initializes end_pfn and checks for an empty active range.

I also noticed that, for a reserve spanning several nodes, the size passed
to bootmem for the second and later nodes did not subtract the amount
already reserved on the earlier nodes, so the patch fixes that too.  If you
would prefer this as a separate patch, let me know.
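The size bookkeeping can be sketched in userspace as follows.  This is an illustration under assumptions, not the numa.c code.  `struct sketch_range`, `struct sketch_piece`, `find_range()`, `split_reserve()` and the 64K page size in `SKETCH_PAGE_SHIFT` are hypothetical, and recording a piece in `out[]` stands in for the reserve_bootmem_node() call.  Each iteration trims `reserve_size` to the current node's range, and `size -= reserve_size` leaves only the remainder for the next node.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 16	/* hypothetical: 64K pages */

/* Hypothetical per-node active range, [start_pfn, end_pfn). */
struct sketch_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* One piece handed to bootmem: stands in for a reserve_bootmem_node() call. */
struct sketch_piece {
	int nid;
	unsigned long physbase;
	unsigned long size;
};

static const struct sketch_range *find_range(unsigned long pfn,
					     const struct sketch_range *ranges,
					     size_t nranges)
{
	size_t i;

	for (i = 0; i < nranges; i++)
		if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
			return &ranges[i];
	return NULL;	/* no active range: treat as empty */
}

/*
 * Splits the reserve [physbase, physbase + size) across node ranges in the
 * same shape as the patched do_init_bootmem() loop.  Returns the number of
 * pieces written to out[].
 */
static size_t split_reserve(unsigned long physbase, unsigned long size,
			    const struct sketch_range *ranges, size_t nranges,
			    struct sketch_piece *out, size_t max_out)
{
	unsigned long start_pfn = physbase >> SKETCH_PAGE_SHIFT;
	unsigned long end_pfn = (physbase + size) >> SKETCH_PAGE_SHIFT;
	const struct sketch_range *ar = find_range(start_pfn, ranges, nranges);
	size_t n = 0;

	while (start_pfn < end_pfn && ar != NULL && n < max_out) {
		unsigned long reserve_size = size;

		/* Trim this piece if the reserve extends past the node's range. */
		if (end_pfn > ar->end_pfn)
			reserve_size = (ar->end_pfn << SKETCH_PAGE_SHIFT)
				- (start_pfn << SKETCH_PAGE_SHIFT);
		assert(reserve_size <= size);

		out[n].nid = ar->nid;
		out[n].physbase = physbase;
		out[n].size = reserve_size;
		n++;

		/* The rest of the reserve fits in this node: done. */
		if (end_pfn <= ar->end_pfn)
			break;

		/* Move to the next node; only the remainder is left to reserve. */
		start_pfn = ar->end_pfn;
		physbase = start_pfn << SKETCH_PAGE_SHIFT;
		size -= reserve_size;
		ar = find_range(start_pfn, ranges, nranges);
	}
	return n;
}
```

Consider three hypothetical nodes covering pfns [0x0, 0x100), [0x100, 0x200) and [0x200, 0x300).  A reserve of pfns 0x80 through 0x27f splits into 0x80, 0x100 and 0x80 pages on nodes 0, 1 and 2, which sum to the original 0x200 pages.  Without the subtraction, the last node's piece would not be limited to what remains of the reservation.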

> If there's indeed a potential problem, please send a fixup patch.
>
> Cheers,
> Ben.
>   
For reserves spanning multiple nodes, reduce the amount to reserve on each
node by what was already reserved on the previous nodes.  Check whether the
node's active range is empty before passing the reserve to bootmem.  In
practice the range shouldn't be empty, but check to be sure.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
---

 arch/powerpc/mm/numa.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)


diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 6cf5c71..195bfcd 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -116,6 +116,7 @@ static int __init get_active_region_work_fn(unsigned long start_pfn,
 
 /*
  * get_node_active_region - Return active region containing start_pfn
+ * Active range returned is empty if none found.
  * @start_pfn: The page to return the region for.
  * @node_ar: Returned set to the active region containing start_pfn
  */
@@ -126,6 +127,7 @@ static void __init get_node_active_region(unsigned long start_pfn,
 
 	node_ar->nid = nid;
 	node_ar->start_pfn = start_pfn;
+	node_ar->end_pfn = start_pfn;
 	work_with_active_regions(nid, get_active_region_work_fn, node_ar);
 }
 
@@ -933,18 +935,20 @@ void __init do_init_bootmem(void)
 		struct node_active_region node_ar;
 
 		get_node_active_region(start_pfn, &node_ar);
-		while (start_pfn < end_pfn) {
+		while (start_pfn < end_pfn &&
+			node_ar.start_pfn < node_ar.end_pfn) {
+			unsigned long reserve_size = size;
 			/*
 			 * if reserved region extends past active region
 			 * then trim size to active region
 			 */
 			if (end_pfn > node_ar.end_pfn)
-				size = (node_ar.end_pfn << PAGE_SHIFT)
+				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
 					- (start_pfn << PAGE_SHIFT);
-			dbg("reserve_bootmem %lx %lx nid=%d\n", physbase, size,
-				node_ar.nid);
+			dbg("reserve_bootmem %lx %lx nid=%d\n", physbase,
+				reserve_size, node_ar.nid);
 			reserve_bootmem_node(NODE_DATA(node_ar.nid), physbase,
-						size, BOOTMEM_DEFAULT);
+						reserve_size, BOOTMEM_DEFAULT);
 			/*
 			 * if reserved region is contained in the active region
 			 * then done.
@@ -959,6 +963,7 @@ void __init do_init_bootmem(void)
 			 */
 			start_pfn = node_ar.end_pfn;
 			physbase = start_pfn << PAGE_SHIFT;
+			size = size - reserve_size;
 			get_node_active_region(start_pfn, &node_ar);
 		}
 


Thread overview: 6+ messages
2008-10-09 20:18 [PATCH v3] powerpc: properly reserve in bootmem the lmb reserved regions that cross NUMA nodes Jon Tollefson
2008-10-10  4:55 ` Benjamin Herrenschmidt
2008-10-17  4:59   ` Jon Tollefson [this message]
2009-02-11  3:17 ` problem with numa reserve bootmem Geoff Levand
2009-02-11  3:55   ` Michael Ellerman
2009-02-12 22:36   ` [patch] powerpc: fix numa reserve bootmem page selection Geoff Levand
