linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code
@ 2008-12-09 18:21 Dave Hansen
  2008-12-09 18:21 ` [PATCH 1/8] fix bootmem reservation on uninitialized node Dave Hansen
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn

The first patch in this series is a genuine bug fix.  The rest
are really just an RFC.

Jon introduced a bug a while ago.  I introduced another when
trying to fix Jon's bug.  I refuse to accept personal blame for
this and, instead, blame the code. :)

The rest of the series is "cleanups" that I think will help
clarify the code in numa.c and work to ensure that the next
bonehead like me can't muck up the code so easily. :)

The cleanups increase in impact and intrusiveness as the series
goes along, so please consider them an RFC.  But, what I really
want to figure out is a safer way to initialize NODE_DATA() and
start using it as we bring up bootmem on all the nodes.


* [PATCH 1/8] fix bootmem reservation on uninitialized node
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-10 22:14   ` Paul Mackerras
  2008-12-09 18:21 ` [PATCH 2/8] Add better comment on careful_allocation() Dave Hansen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


careful_allocation() was calling into the bootmem allocator for
nodes which had not been fully initialized, which caused a
previous bug: http://patchwork.ozlabs.org/patch/10528/
To fix it, I merged a few broken-out loops in do_init_bootmem(),
which changed the code ordering.

I think this bug is triggered by having reserved areas for one
node which are spanned by another node's contents.  In the
mark_reserved_regions_for_nid() code, we attempt to reserve the
area for a node before we have allocated NODE_DATA() for that
nid.  We have done this ever since I reordered that loop.  I suck.

This may only show up on systems that have 16GB pages reserved.
But it can probably happen on any system that is trying to
reserve large swaths of memory that happen to span other nodes'
contents.

This patch ensures that we do not touch bootmem for any node which
has not been initialized.
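
For illustration only, this is the shape of the fix in the hunk
below; the names (node_ar, physbase, reserve_size) are the ones
already used in mark_reserved_regions_for_nid():

	/*
	 * A reserved LMB region (a 16GB page, for instance) can overlap
	 * active ranges owned by several nodes.  Only the node currently
	 * being initialized is guaranteed to have a valid NODE_DATA(),
	 * so that is the only one we may hand to the bootmem allocator.
	 */
	if (node_ar.nid == nid)
		reserve_bootmem_node(NODE_DATA(nid), physbase,
				     reserve_size, BOOTMEM_DEFAULT);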

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff -puN arch/powerpc/mm/numa.c~fix-bad-node-reserve arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~fix-bad-node-reserve	2008-12-09 10:16:04.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:04.000000000 -0800
@@ -870,6 +870,7 @@ static void mark_reserved_regions_for_ni
 	struct pglist_data *node = NODE_DATA(nid);
 	int i;
 
+	dbg("mark_reserved_regions_for_nid(%d) NODE_DATA: %p\n", nid, node);
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
@@ -901,10 +902,14 @@ static void mark_reserved_regions_for_ni
 			if (end_pfn > node_ar.end_pfn)
 				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
 					- (start_pfn << PAGE_SHIFT);
-			dbg("reserve_bootmem %lx %lx nid=%d\n", physbase,
-				reserve_size, node_ar.nid);
-			reserve_bootmem_node(NODE_DATA(node_ar.nid), physbase,
-						reserve_size, BOOTMEM_DEFAULT);
+			/*
+			 * Only worry about *this* node, others may not
+			 * yet have valid NODE_DATA().
+			 */
+			if (node_ar.nid == nid)
+				reserve_bootmem_node(NODE_DATA(node_ar.nid),
+						physbase, reserve_size,
+						BOOTMEM_DEFAULT);
 			/*
 			 * if reserved region is contained in the active region
 			 * then done.
_


* [PATCH 2/8] Add better comment on careful_allocation()
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
  2008-12-09 18:21 ` [PATCH 1/8] fix bootmem reservation on uninitialized node Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-09 18:21 ` [PATCH 3/8] cleanup careful_allocation(): bootmem already panics Dave Hansen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


The behavior in careful_allocation() really confused me
at first.  Add a comment to hopefully make it easier
on the next doofus that looks at it.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff -puN arch/powerpc/mm/numa.c~cleanup-careful_allocation4 arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~cleanup-careful_allocation4	2008-12-09 10:16:05.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:05.000000000 -0800
@@ -840,8 +840,16 @@ static void __init *careful_allocation(i
 		      size, nid);
 
 	/*
-	 * If the memory came from a previously allocated node, we must
-	 * retry with the bootmem allocator.
+	 * We initialize the nodes in numeric order: 0, 1, 2...
+	 * and hand over control from the LMB allocator to the
+	 * bootmem allocator.  If this function is called for
+	 * node 5, then we know that all nodes <5 are using the
+	 * bootmem allocator instead of the LMB allocator.
+	 *
+	 * So, check the nid from which this allocation came
+	 * and double check to see if we need to use bootmem
+	 * instead of the LMB.  We don't free the LMB memory
+	 * since it would be useless.
 	 */
 	new_nid = early_pfn_to_nid(ret >> PAGE_SHIFT);
 	if (new_nid < nid) {
_


* [PATCH 3/8] cleanup careful_allocation(): bootmem already panics
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
  2008-12-09 18:21 ` [PATCH 1/8] fix bootmem reservation on uninitialized node Dave Hansen
  2008-12-09 18:21 ` [PATCH 2/8] Add better comment on careful_allocation() Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-09 18:21 ` [PATCH 4/8] make careful_allocation() return vaddrs Dave Hansen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


If we fail a bootmem allocation, the bootmem code itself
panics.  No need to redo it here.

Also change the wording of the other panic.  We don't
strictly have to allocate memory on the specified node.
It is just a hint and that node may not even *have* any
memory on it.  In that case we can and do fall back to
other nodes.
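
For illustration, this is the check being deleted below, annotated
with why it can never fire:

	ret = (unsigned long)__alloc_bootmem_node(NODE_DATA(new_nid),
			size, align, 0);
	/*
	 * Dead code: __alloc_bootmem_node() panics internally if it
	 * cannot satisfy the request, so ret is never zero here.
	 */
	if (!ret)
		panic("numa.c: cannot allocate %lu bytes on node %d",
		      size, new_nid);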


Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff -puN arch/powerpc/mm/numa.c~cleanup-careful_allocation arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~cleanup-careful_allocation	2008-12-09 10:16:05.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:05.000000000 -0800
@@ -836,7 +836,7 @@ static void __init *careful_allocation(i
 		ret = __lmb_alloc_base(size, align, lmb_end_of_DRAM());
 
 	if (!ret)
-		panic("numa.c: cannot allocate %lu bytes on node %d",
+		panic("numa.c: cannot allocate %lu bytes for node %d",
 		      size, nid);
 
 	/*
@@ -856,10 +856,6 @@ static void __init *careful_allocation(i
 		ret = (unsigned long)__alloc_bootmem_node(NODE_DATA(new_nid),
 				size, align, 0);
 
-		if (!ret)
-			panic("numa.c: cannot allocate %lu bytes on node %d",
-			      size, new_nid);
-
 		ret = __pa(ret);
 
 		dbg("alloc_bootmem %lx %lx\n", ret, size);
_


* [PATCH 4/8] make careful_allocation() return vaddrs
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
                   ` (2 preceding siblings ...)
  2008-12-09 18:21 ` [PATCH 3/8] cleanup careful_allocation(): bootmem already panics Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-09 18:21 ` [PATCH 5/8] cleanup careful_allocation(): consolidate memset() Dave Hansen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


Since we memset() the result in both of the uses here,
just make careful_allocation() return a virtual address.
Also, add a separate variable to store the physical
address that comes back from the lmb_alloc() functions.
This makes it less likely that someone will forget to
convert before returning and screw it up, since the vaddr
is always in a void * and the paddr is always in an
unsigned long.

I admit this is arbitrary since one of its users needs
a paddr and one a vaddr, but it does remove a good
number of casts.
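
For illustration, a rough sketch of the convention this ends up
establishing (the calls mirror the hunks below; this is not new
code):

	/* inside careful_allocation(): paddrs stay in unsigned long */
	unsigned long ret_paddr = __lmb_alloc_base(size, align,
						   end_pfn << PAGE_SHIFT);
	void *ret = __va(ret_paddr);	/* convert exactly once */

	/*
	 * in do_init_bootmem(): callers get a vaddr back and convert
	 * explicitly only where a pfn is really needed
	 */
	bootmem_vaddr = careful_allocation(nid, bootmap_pages << PAGE_SHIFT,
					   PAGE_SIZE, end_pfn);
	init_bootmem_node(NODE_DATA(nid), __pa(bootmem_vaddr) >> PAGE_SHIFT,
			  start_pfn, end_pfn);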

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   37 ++++++++++++++++--------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff -puN arch/powerpc/mm/numa.c~cleanup-careful_allocation1 arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~cleanup-careful_allocation1	2008-12-09 10:16:06.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:06.000000000 -0800
@@ -822,23 +822,28 @@ static void __init dump_numa_memory_topo
  * required. nid is the preferred node and end is the physical address of
  * the highest address in the node.
  *
- * Returns the physical address of the memory.
+ * Returns the virtual address of the memory.
  */
 static void __init *careful_allocation(int nid, unsigned long size,
 				       unsigned long align,
 				       unsigned long end_pfn)
 {
+	void *ret;
 	int new_nid;
-	unsigned long ret = __lmb_alloc_base(size, align, end_pfn << PAGE_SHIFT);
+	unsigned long ret_paddr;
+
+	ret_paddr = __lmb_alloc_base(size, align, end_pfn << PAGE_SHIFT);
 
 	/* retry over all memory */
-	if (!ret)
-		ret = __lmb_alloc_base(size, align, lmb_end_of_DRAM());
+	if (!ret_paddr)
+		ret_paddr = __lmb_alloc_base(size, align, lmb_end_of_DRAM());
 
-	if (!ret)
+	if (!ret_paddr)
 		panic("numa.c: cannot allocate %lu bytes for node %d",
 		      size, nid);
 
+	ret = __va(ret_paddr);
+
 	/*
 	 * We initialize the nodes in numeric order: 0, 1, 2...
 	 * and hand over control from the LMB allocator to the
@@ -851,17 +856,15 @@ static void __init *careful_allocation(i
 	 * instead of the LMB.  We don't free the LMB memory
 	 * since it would be useless.
 	 */
-	new_nid = early_pfn_to_nid(ret >> PAGE_SHIFT);
+	new_nid = early_pfn_to_nid(ret_paddr >> PAGE_SHIFT);
 	if (new_nid < nid) {
-		ret = (unsigned long)__alloc_bootmem_node(NODE_DATA(new_nid),
+		ret = __alloc_bootmem_node(NODE_DATA(new_nid),
 				size, align, 0);
 
-		ret = __pa(ret);
-
-		dbg("alloc_bootmem %lx %lx\n", ret, size);
+		dbg("alloc_bootmem %p %lx\n", ret, size);
 	}
 
-	return (void *)ret;
+	return ret;
 }
 
 static struct notifier_block __cpuinitdata ppc64_numa_nb = {
@@ -955,7 +958,7 @@ void __init do_init_bootmem(void)
 
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
-		unsigned long bootmem_paddr;
+		void *bootmem_vaddr;
 		unsigned long bootmap_pages;
 
 		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
@@ -970,7 +973,6 @@ void __init do_init_bootmem(void)
 		NODE_DATA(nid) = careful_allocation(nid,
 					sizeof(struct pglist_data),
 					SMP_CACHE_BYTES, end_pfn);
-		NODE_DATA(nid) = __va(NODE_DATA(nid));
 		memset(NODE_DATA(nid), 0, sizeof(struct pglist_data));
 
   		dbg("node %d\n", nid);
@@ -987,14 +989,15 @@ void __init do_init_bootmem(void)
   		dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT);
 
 		bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
-		bootmem_paddr = (unsigned long)careful_allocation(nid,
+		bootmem_vaddr = careful_allocation(nid,
 					bootmap_pages << PAGE_SHIFT,
 					PAGE_SIZE, end_pfn);
-		memset(__va(bootmem_paddr), 0, bootmap_pages << PAGE_SHIFT);
+		memset(bootmem_vaddr, 0, bootmap_pages << PAGE_SHIFT);
 
-		dbg("bootmap_paddr = %lx\n", bootmem_paddr);
+		dbg("bootmap_vaddr = %p\n", bootmem_vaddr);
 
-		init_bootmem_node(NODE_DATA(nid), bootmem_paddr >> PAGE_SHIFT,
+		init_bootmem_node(NODE_DATA(nid),
+				  __pa(bootmem_vaddr) >> PAGE_SHIFT,
 				  start_pfn, end_pfn);
 
 		free_bootmem_with_active_regions(nid, end_pfn);
_


* [PATCH 5/8] cleanup careful_allocation(): consolidate memset()
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
                   ` (3 preceding siblings ...)
  2008-12-09 18:21 ` [PATCH 4/8] make careful_allocation() return vaddrs Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-09 18:21 ` [PATCH 6/8] cleanup do_init_bootmem() Dave Hansen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


Both users of careful_allocation() immediately memset() the
result.  So, just do it in one place.

Also give careful_allocation() a 'z' prefix to bring it in
line with kzalloc() and friends.
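
For illustration, the callers in do_init_bootmem() go from:

	NODE_DATA(nid) = careful_allocation(nid,
				sizeof(struct pglist_data),
				SMP_CACHE_BYTES, end_pfn);
	memset(NODE_DATA(nid), 0, sizeof(struct pglist_data));

to simply:

	NODE_DATA(nid) = careful_zallocation(nid,
				sizeof(struct pglist_data),
				SMP_CACHE_BYTES, end_pfn);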

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff -puN arch/powerpc/mm/numa.c~cleanup-careful_allocation2 arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~cleanup-careful_allocation2	2008-12-09 10:16:06.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:06.000000000 -0800
@@ -824,7 +824,7 @@ static void __init dump_numa_memory_topo
  *
  * Returns the virtual address of the memory.
  */
-static void __init *careful_allocation(int nid, unsigned long size,
+static void __init *careful_zallocation(int nid, unsigned long size,
 				       unsigned long align,
 				       unsigned long end_pfn)
 {
@@ -864,6 +864,7 @@ static void __init *careful_allocation(i
 		dbg("alloc_bootmem %p %lx\n", ret, size);
 	}
 
+	memset(ret, 0, size);
 	return ret;
 }
 
@@ -970,10 +971,9 @@ void __init do_init_bootmem(void)
 		 * previous nodes' bootmem to be initialized and have
 		 * all reserved areas marked.
 		 */
-		NODE_DATA(nid) = careful_allocation(nid,
+		NODE_DATA(nid) = careful_zallocation(nid,
 					sizeof(struct pglist_data),
 					SMP_CACHE_BYTES, end_pfn);
-		memset(NODE_DATA(nid), 0, sizeof(struct pglist_data));
 
   		dbg("node %d\n", nid);
 		dbg("NODE_DATA() = %p\n", NODE_DATA(nid));
@@ -989,10 +989,9 @@ void __init do_init_bootmem(void)
   		dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT);
 
 		bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
-		bootmem_vaddr = careful_allocation(nid,
+		bootmem_vaddr = careful_zallocation(nid,
 					bootmap_pages << PAGE_SHIFT,
 					PAGE_SIZE, end_pfn);
-		memset(bootmem_vaddr, 0, bootmap_pages << PAGE_SHIFT);
 
 		dbg("bootmap_vaddr = %p\n", bootmem_vaddr);
 
@@ -1003,7 +1002,7 @@ void __init do_init_bootmem(void)
 		free_bootmem_with_active_regions(nid, end_pfn);
 		/*
 		 * Be very careful about moving this around.  Future
-		 * calls to careful_allocation() depend on this getting
+		 * calls to careful_zallocation() depend on this getting
 		 * done correctly.
 		 */
 		mark_reserved_regions_for_nid(nid);
_


* [PATCH 6/8] cleanup do_init_bootmem()
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
                   ` (4 preceding siblings ...)
  2008-12-09 18:21 ` [PATCH 5/8] cleanup careful_allocation(): consolidate memset() Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-09 21:54   ` Serge E. Hallyn
  2008-12-16  5:06   ` Paul Mackerras
  2008-12-09 18:21 ` [PATCH 7/8] less use of NODE_DATA() Dave Hansen
  2008-12-09 18:21 ` [PATCH 8/8] make free_bootmem_with_active_regions() take pgdat Dave Hansen
  7 siblings, 2 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


I'm debating whether this is worth it.  It makes this code a bit
cleaner-looking, but doesn't seriously enhance readability.  But, I
do think it helps a bit.

Thoughts?

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |  104 +++++++++++++++---------------
 1 file changed, 55 insertions(+), 49 deletions(-)

diff -puN arch/powerpc/mm/numa.c~cleanup-careful_allocation3 arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~cleanup-careful_allocation3	2008-12-09 10:16:07.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:07.000000000 -0800
@@ -938,6 +938,59 @@ static void mark_reserved_regions_for_ni
 	}
 }
 
+void do_init_bootmem_node(int node)
+{
+	unsigned long start_pfn, end_pfn;
+	void *bootmem_vaddr;
+	unsigned long bootmap_pages;
+
+	dbg("node %d is online\n", nid);
+	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+
+	/*
+	 * Allocate the node structure node local if possible
+	 *
+	 * Be careful moving this around, as it relies on all
+	 * previous nodes' bootmem to be initialized and have
+	 * all reserved areas marked.
+	 */
+	NODE_DATA(nid) = careful_zallocation(nid,
+				sizeof(struct pglist_data),
+				SMP_CACHE_BYTES, end_pfn);
+
+	dbg("node %d\n", nid);
+	dbg("NODE_DATA() = %p\n", NODE_DATA(nid));
+
+	NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
+	NODE_DATA(nid)->node_start_pfn = start_pfn;
+	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
+
+	if (NODE_DATA(nid)->node_spanned_pages == 0)
+		return;
+
+	dbg("start_paddr = %lx\n", start_pfn << PAGE_SHIFT);
+	dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT);
+
+	bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
+	bootmem_vaddr = careful_zallocation(nid,
+				bootmap_pages << PAGE_SHIFT,
+				PAGE_SIZE, end_pfn);
+
+	dbg("bootmap_vaddr = %p\n", bootmem_vaddr);
+
+	init_bootmem_node(NODE_DATA(nid),
+			  __pa(bootmem_vaddr) >> PAGE_SHIFT,
+			  start_pfn, end_pfn);
+
+	free_bootmem_with_active_regions(nid, end_pfn);
+	/*
+	 * Be very careful about moving this around.  Future
+	 * calls to careful_zallocation() depend on this getting
+	 * done correctly.
+	 */
+	mark_reserved_regions_for_nid(nid);
+	sparse_memory_present_with_active_regions(nid);
+}
 
 void __init do_init_bootmem(void)
 {
@@ -958,55 +1011,8 @@ void __init do_init_bootmem(void)
 			  (void *)(unsigned long)boot_cpuid);
 
 	for_each_online_node(nid) {
-		unsigned long start_pfn, end_pfn;
-		void *bootmem_vaddr;
-		unsigned long bootmap_pages;
-
-		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
-
-		/*
-		 * Allocate the node structure node local if possible
-		 *
-		 * Be careful moving this around, as it relies on all
-		 * previous nodes' bootmem to be initialized and have
-		 * all reserved areas marked.
-		 */
-		NODE_DATA(nid) = careful_zallocation(nid,
-					sizeof(struct pglist_data),
-					SMP_CACHE_BYTES, end_pfn);
-
-  		dbg("node %d\n", nid);
-		dbg("NODE_DATA() = %p\n", NODE_DATA(nid));
-
-		NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
-		NODE_DATA(nid)->node_start_pfn = start_pfn;
-		NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
-
-		if (NODE_DATA(nid)->node_spanned_pages == 0)
-  			continue;
-
-  		dbg("start_paddr = %lx\n", start_pfn << PAGE_SHIFT);
-  		dbg("end_paddr = %lx\n", end_pfn << PAGE_SHIFT);
-
-		bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
-		bootmem_vaddr = careful_zallocation(nid,
-					bootmap_pages << PAGE_SHIFT,
-					PAGE_SIZE, end_pfn);
-
-		dbg("bootmap_vaddr = %p\n", bootmem_vaddr);
-
-		init_bootmem_node(NODE_DATA(nid),
-				  __pa(bootmem_vaddr) >> PAGE_SHIFT,
-				  start_pfn, end_pfn);
-
-		free_bootmem_with_active_regions(nid, end_pfn);
-		/*
-		 * Be very careful about moving this around.  Future
-		 * calls to careful_zallocation() depend on this getting
-		 * done correctly.
-		 */
-		mark_reserved_regions_for_nid(nid);
-		sparse_memory_present_with_active_regions(nid);
+		dbg("node %d: marked online, initializing bootmem\n", nid);
+		do_init_bootmem_node(nid);
 	}
 }
 
_


* [PATCH 7/8] less use of NODE_DATA()
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
                   ` (5 preceding siblings ...)
  2008-12-09 18:21 ` [PATCH 6/8] cleanup do_init_bootmem() Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  2008-12-16  5:16   ` Paul Mackerras
  2008-12-09 18:21 ` [PATCH 8/8] make free_bootmem_with_active_regions() take pgdat Dave Hansen
  7 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


The use of NODE_DATA() in the ppc init code is fragile.  We use
it for some nodes while we are initializing others.  As the loop
initializing them has gotten more complex and been broken out
into several functions, it gets harder and harder to remember
how this goes.

This was recently the cause of a bug

	http://patchwork.ozlabs.org/patch/10528/

in which I also created a new regression for machines with large
memory reservations in the LMB structures (most likely 16GB
pages).

This patch reduces the references to NODE_DATA() and also keeps
it uninitialized for as long as possible.  Hopefully, the delay
in initialization will keep its use from spreading too much,
reducing the chances of future bugs.
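
For illustration, delaying the NODE_DATA() assignment turns it into
a readiness flag; this is the shape of the new check in
careful_zallocation() below, not additional code:

	/* NODE_DATA(nid) stays NULL until that node's bootmem is usable */
	new_nid = early_pfn_to_nid(ret_paddr >> PAGE_SHIFT);
	if (NODE_DATA(new_nid))
		/* bootmem is up on that node, so re-allocate through it */
		ret = __alloc_bootmem_node(NODE_DATA(new_nid),
				size, align, 0);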

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   63 ++++++++++++++----------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff -puN arch/powerpc/mm/numa.c~less-use-of-NODE_DATA arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~less-use-of-NODE_DATA	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:08.000000000 -0800
@@ -847,17 +847,16 @@ static void __init *careful_zallocation(
 	/*
 	 * We initialize the nodes in numeric order: 0, 1, 2...
 	 * and hand over control from the LMB allocator to the
-	 * bootmem allocator.  If this function is called for
-	 * node 5, then we know that all nodes <5 are using the
-	 * bootmem allocator instead of the LMB allocator.
+	 * bootmem allocator.
 	 *
-	 * So, check the nid from which this allocation came
-	 * and double check to see if we need to use bootmem
-	 * instead of the LMB.  We don't free the LMB memory
-	 * since it would be useless.
+	 * We must not call into the bootmem allocator for any node
+	 * which has not had bootmem initialized and had all of the
+	 * reserved areas set up.  In do_init_bootmem_node(), we do
+	 * not set NODE_DATA(nid) up until that is done.  Use that
+	 * property here.
 	 */
 	new_nid = early_pfn_to_nid(ret_paddr >> PAGE_SHIFT);
-	if (new_nid < nid) {
+	if (NODE_DATA(new_nid)) {
 		ret = __alloc_bootmem_node(NODE_DATA(new_nid),
 				size, align, 0);
 
@@ -873,12 +872,12 @@ static struct notifier_block __cpuinitda
 	.priority = 1 /* Must run before sched domains notifier. */
 };
 
-static void mark_reserved_regions_for_nid(int nid)
+static void mark_reserved_regions_for_node(struct pglist_data *node)
 {
-	struct pglist_data *node = NODE_DATA(nid);
+	int nid = node->node_id;
 	int i;
 
-	dbg("mark_reserved_regions_for_nid(%d) NODE_DATA: %p\n", nid, node);
+	dbg("%s(%d) NODE_DATA: %p\n", __func__, nid, node);
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
@@ -915,9 +914,8 @@ static void mark_reserved_regions_for_ni
 			 * yet have valid NODE_DATA().
 			 */
 			if (node_ar.nid == nid)
-				reserve_bootmem_node(NODE_DATA(node_ar.nid),
-						physbase, reserve_size,
-						BOOTMEM_DEFAULT);
+				reserve_bootmem_node(node, physbase,
+						reserve_size, BOOTMEM_DEFAULT);
 			/*
 			 * if reserved region is contained in the active region
 			 * then done.
@@ -938,8 +936,9 @@ static void mark_reserved_regions_for_ni
 	}
 }
 
-void do_init_bootmem_node(int node)
+void do_init_bootmem_node(int nid)
 {
+	struct pglist_data *node;
 	unsigned long start_pfn, end_pfn;
 	void *bootmem_vaddr;
 	unsigned long bootmap_pages;
@@ -954,18 +953,16 @@ void do_init_bootmem_node(int node)
 	 * previous nodes' bootmem to be initialized and have
 	 * all reserved areas marked.
 	 */
-	NODE_DATA(nid) = careful_zallocation(nid,
-				sizeof(struct pglist_data),
-				SMP_CACHE_BYTES, end_pfn);
-
-	dbg("node %d\n", nid);
-	dbg("NODE_DATA() = %p\n", NODE_DATA(nid));
-
-	NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
-	NODE_DATA(nid)->node_start_pfn = start_pfn;
-	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
+	node = careful_zallocation(nid, sizeof(struct pglist_data),
+				   SMP_CACHE_BYTES, end_pfn);
 
-	if (NODE_DATA(nid)->node_spanned_pages == 0)
+	dbg("node %d pglist_data: %p\n", nid, node);
+
+	node->bdata = &bootmem_node_data[nid];
+	node->node_start_pfn = start_pfn;
+	node->node_spanned_pages = end_pfn - start_pfn;
+
+	if (node->node_spanned_pages == 0)
 		return;
 
 	dbg("start_paddr = %lx\n", start_pfn << PAGE_SHIFT);
@@ -978,17 +975,19 @@ void do_init_bootmem_node(int node)
 
 	dbg("bootmap_vaddr = %p\n", bootmem_vaddr);
 
-	init_bootmem_node(NODE_DATA(nid),
-			  __pa(bootmem_vaddr) >> PAGE_SHIFT,
+	init_bootmem_node(node, __pa(bootmem_vaddr) >> PAGE_SHIFT,
 			  start_pfn, end_pfn);
 
+	NODE_DATA(nid) = node;
+	/* this call needs NODE_DATA(), so initialize it above */
 	free_bootmem_with_active_regions(nid, end_pfn);
+	mark_reserved_regions_for_node(node);
 	/*
-	 * Be very careful about moving this around.  Future
-	 * calls to careful_zallocation() depend on this getting
-	 * done correctly.
+	 * We wait to set NODE_DATA() until after the bootmem
+	 * allocator on this node is up and ready to go.
+	 * careful_zallocation() depends on this getting set
+	 * now to tell from which nodes it must use bootmem.
 	 */
-	mark_reserved_regions_for_nid(nid);
 	sparse_memory_present_with_active_regions(nid);
 }
 
_


* [PATCH 8/8] make free_bootmem_with_active_regions() take pgdat
  2008-12-09 18:21 [PATCH 0/8] Fix a bug and cleanup NUMA boot-time code Dave Hansen
                   ` (6 preceding siblings ...)
  2008-12-09 18:21 ` [PATCH 7/8] less use of NODE_DATA() Dave Hansen
@ 2008-12-09 18:21 ` Dave Hansen
  7 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-09 18:21 UTC (permalink / raw)
  To: paulus
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn


As I said earlier, I'm trying to restrict the use of NODE_DATA()
since it can easily be referenced too early otherwise.

free_bootmem_with_active_regions() does not in practice need to
deal with multiple nodes.  I already audited all of its callers.

This patch makes it take a pgdat instead of doing the NODE_DATA()
lookup internally.
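
For illustration, each caller changes from passing a nid to passing
the pgdat it already has (or NODE_DATA(nid) where it does not):

	/* before */
	free_bootmem_with_active_regions(nid, end_pfn);

	/* after */
	free_bootmem_with_active_regions(NODE_DATA(nid), end_pfn);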

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/mips/sgi-ip27/ip27-memory.c |    2 +-
 linux-2.6.git-dave/arch/powerpc/mm/mem.c            |    5 +++--
 linux-2.6.git-dave/arch/powerpc/mm/numa.c           |    3 +--
 linux-2.6.git-dave/arch/s390/kernel/setup.c         |    2 +-
 linux-2.6.git-dave/arch/sh/mm/numa.c                |    2 +-
 linux-2.6.git-dave/arch/sparc64/mm/init.c           |    6 +++---
 linux-2.6.git-dave/arch/x86/mm/init_32.c            |    2 +-
 linux-2.6.git-dave/arch/x86/mm/init_64.c            |    2 +-
 linux-2.6.git-dave/arch/x86/mm/numa_64.c            |    2 +-
 linux-2.6.git-dave/include/linux/mm.h               |    2 +-
 linux-2.6.git-dave/mm/page_alloc.c                  |    8 ++++----
 11 files changed, 18 insertions(+), 18 deletions(-)

diff -puN arch/mips/sgi-ip27/ip27-memory.c~make-free_bootmem_with_active_regions-take-pgdat arch/mips/sgi-ip27/ip27-memory.c
--- linux-2.6.git/arch/mips/sgi-ip27/ip27-memory.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/mips/sgi-ip27/ip27-memory.c	2008-12-09 10:16:08.000000000 -0800
@@ -412,7 +412,7 @@ static void __init node_mem_init(cnodeid
 
   	bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
 					start_pfn, end_pfn);
-	free_bootmem_with_active_regions(node, end_pfn);
+	free_bootmem_with_active_regions(NODE_DATA(node), end_pfn);
 	reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
 		((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
 		BOOTMEM_DEFAULT);
diff -puN arch/powerpc/mm/mem.c~make-free_bootmem_with_active_regions-take-pgdat arch/powerpc/mm/mem.c
--- linux-2.6.git/arch/powerpc/mm/mem.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/mem.c	2008-12-09 10:16:08.000000000 -0800
@@ -212,7 +212,8 @@ void __init do_init_bootmem(void)
 	 * present.
 	 */
 #ifdef CONFIG_HIGHMEM
-	free_bootmem_with_active_regions(0, lowmem_end_addr >> PAGE_SHIFT);
+	free_bootmem_with_active_regions(NODE_DATA(0),
+					 lowmem_end_addr >> PAGE_SHIFT);
 
 	/* reserve the sections we're already using */
 	for (i = 0; i < lmb.reserved.cnt; i++) {
@@ -230,7 +231,7 @@ void __init do_init_bootmem(void)
 		}
 	}
 #else
-	free_bootmem_with_active_regions(0, max_pfn);
+	free_bootmem_with_active_regions(NODE_DATA(0), max_pfn);
 
 	/* reserve the sections we're already using */
 	for (i = 0; i < lmb.reserved.cnt; i++)
diff -puN arch/powerpc/mm/numa.c~make-free_bootmem_with_active_regions-take-pgdat arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:08.000000000 -0800
@@ -978,8 +978,6 @@ void do_init_bootmem_node(int nid)
 	init_bootmem_node(node, __pa(bootmem_vaddr) >> PAGE_SHIFT,
 			  start_pfn, end_pfn);
 
-	NODE_DATA(nid) = node;
-	/* this call needs NODE_DATA(), so initialize it above */
 	free_bootmem_with_active_regions(nid, end_pfn);
 	mark_reserved_regions_for_node(node);
 	/*
@@ -988,6 +986,7 @@ void do_init_bootmem_node(int nid)
 	 * careful_zallocation() depends on this getting set
 	 * now to tell from which nodes it must use bootmem.
 	 */
+	NODE_DATA(nid) = node;
 	sparse_memory_present_with_active_regions(nid);
 }
 
diff -puN arch/s390/kernel/setup.c~make-free_bootmem_with_active_regions-take-pgdat arch/s390/kernel/setup.c
--- linux-2.6.git/arch/s390/kernel/setup.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/s390/kernel/setup.c	2008-12-09 10:16:08.000000000 -0800
@@ -616,7 +616,7 @@ setup_memory(void)
 
 	psw_set_key(PAGE_DEFAULT_KEY);
 
-	free_bootmem_with_active_regions(0, max_pfn);
+	free_bootmem_with_active_regions(NODE_DATA(0), max_pfn);
 
 	/*
 	 * Reserve memory used for lowcore/command line/kernel image.
diff -puN arch/sh/mm/numa.c~make-free_bootmem_with_active_regions-take-pgdat arch/sh/mm/numa.c
--- linux-2.6.git/arch/sh/mm/numa.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/sh/mm/numa.c	2008-12-09 10:16:08.000000000 -0800
@@ -75,7 +75,7 @@ void __init setup_bootmem_node(int nid, 
 	bootmap_size = init_bootmem_node(NODE_DATA(nid), free_pfn, start_pfn,
 				    end_pfn);
 
-	free_bootmem_with_active_regions(nid, end_pfn);
+	free_bootmem_with_active_regions(NODE_DATA(nid), end_pfn);
 
 	/* Reserve the pgdat and bootmap space with the bootmem allocator */
 	reserve_bootmem_node(NODE_DATA(nid), start_pfn << PAGE_SHIFT,
diff -puN arch/sparc64/mm/init.c~make-free_bootmem_with_active_regions-take-pgdat arch/sparc64/mm/init.c
--- linux-2.6.git/arch/sparc64/mm/init.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/sparc64/mm/init.c	2008-12-09 10:16:08.000000000 -0800
@@ -1353,9 +1353,9 @@ static void __init bootmem_init_one_node
 		init_bootmem_node(p, paddr >> PAGE_SHIFT,
 				  p->node_start_pfn, end_pfn);
 
-		numadbg("  free_bootmem_with_active_regions(%d, %lx)\n",
-			nid, end_pfn);
-		free_bootmem_with_active_regions(nid, end_pfn);
+		numadbg("  free_bootmem_with_active_regions(%p, %lx)\n",
+			p, end_pfn);
+		free_bootmem_with_active_regions(p, end_pfn);
 
 		trim_reserved_in_node(nid);
 
diff -puN arch/x86/mm/init_32.c~make-free_bootmem_with_active_regions-take-pgdat arch/x86/mm/init_32.c
--- linux-2.6.git/arch/x86/mm/init_32.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/x86/mm/init_32.c	2008-12-09 10:16:08.000000000 -0800
@@ -768,7 +768,7 @@ void __init setup_bootmem_allocator(void
 	printk(KERN_INFO "  bootmap %08lx - %08lx\n",
 		 bootmap, bootmap + bootmap_size);
 	for_each_online_node(i)
-		free_bootmem_with_active_regions(i, max_low_pfn);
+		free_bootmem_with_active_regions(NODE_DATA(i), max_low_pfn);
 	early_res_to_bootmem(0, max_low_pfn<<PAGE_SHIFT);
 
 	after_init_bootmem = 1;
diff -puN arch/x86/mm/init_64.c~make-free_bootmem_with_active_regions-take-pgdat arch/x86/mm/init_64.c
--- linux-2.6.git/arch/x86/mm/init_64.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/x86/mm/init_64.c	2008-12-09 10:16:08.000000000 -0800
@@ -817,7 +817,7 @@ void __init initmem_init(unsigned long s
 	bootmap_size = init_bootmem_node(NODE_DATA(0), bootmap >> PAGE_SHIFT,
 					 0, end_pfn);
 	e820_register_active_regions(0, start_pfn, end_pfn);
-	free_bootmem_with_active_regions(0, end_pfn);
+	free_bootmem_with_active_regions(NODE_DATA(0), end_pfn);
 	early_res_to_bootmem(0, end_pfn<<PAGE_SHIFT);
 	reserve_bootmem(bootmap, bootmap_size, BOOTMEM_DEFAULT);
 }
diff -puN arch/x86/mm/numa_64.c~make-free_bootmem_with_active_regions-take-pgdat arch/x86/mm/numa_64.c
--- linux-2.6.git/arch/x86/mm/numa_64.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/arch/x86/mm/numa_64.c	2008-12-09 10:16:08.000000000 -0800
@@ -235,7 +235,7 @@ void __init setup_node_bootmem(int nodei
 		 bootmap_start, bootmap_start + bootmap_size - 1,
 		 bootmap_pages);
 
-	free_bootmem_with_active_regions(nodeid, end);
+	free_bootmem_with_active_regions(NODE_DATA(nodeid), end);
 
 	/*
 	 * convert early reserve to bootmem reserve earlier
diff -puN include/linux/mm.h~make-free_bootmem_with_active_regions-take-pgdat include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/include/linux/mm.h	2008-12-09 10:16:08.000000000 -0800
@@ -1023,7 +1023,7 @@ extern unsigned long absent_pages_in_ran
 extern void get_pfn_range_for_nid(unsigned int nid,
 			unsigned long *start_pfn, unsigned long *end_pfn);
 extern unsigned long find_min_pfn_with_active_regions(void);
-extern void free_bootmem_with_active_regions(int nid,
+extern void free_bootmem_with_active_regions(struct pglist_data *node,
 						unsigned long max_low_pfn);
 typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
 extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
diff -puN mm/page_alloc.c~make-free_bootmem_with_active_regions-take-pgdat mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~make-free_bootmem_with_active_regions-take-pgdat	2008-12-09 10:16:08.000000000 -0800
+++ linux-2.6.git-dave/mm/page_alloc.c	2008-12-09 10:16:08.000000000 -0800
@@ -2997,16 +2997,17 @@ int __meminit early_pfn_to_nid(unsigned 
 
 /**
  * free_bootmem_with_active_regions - Call free_bootmem_node for each active range
- * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed.
+ * @node: The node on which to free memory.
  * @max_low_pfn: The highest PFN that will be passed to free_bootmem_node
  *
  * If an architecture guarantees that all ranges registered with
  * add_active_ranges() contain no holes and may be freed, this
  * this function may be used instead of calling free_bootmem() manually.
  */
-void __init free_bootmem_with_active_regions(int nid,
+void __init free_bootmem_with_active_regions(struct pglist_data *node,
 						unsigned long max_low_pfn)
 {
+	int nid = node->node_id;
 	int i;
 
 	for_each_active_range_index_in_nid(i, nid) {
@@ -3020,8 +3021,7 @@ void __init free_bootmem_with_active_reg
 			end_pfn = max_low_pfn;
 
 		size_pages = end_pfn - early_node_map[i].start_pfn;
-		free_bootmem_node(NODE_DATA(early_node_map[i].nid),
-				PFN_PHYS(early_node_map[i].start_pfn),
+		free_bootmem_node(node,	PFN_PHYS(early_node_map[i].start_pfn),
 				size_pages << PAGE_SHIFT);
 	}
 }
_


* Re: [PATCH 6/8] cleanup do_init_bootmem()
  2008-12-09 18:21 ` [PATCH 6/8] cleanup do_init_bootmem() Dave Hansen
@ 2008-12-09 21:54   ` Serge E. Hallyn
  2008-12-16  5:06   ` Paul Mackerras
  1 sibling, 0 replies; 15+ messages in thread
From: Serge E. Hallyn @ 2008-12-09 21:54 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Jon Tollefson, Mel Gorman, linuxppc-dev, paulus

Quoting Dave Hansen (dave@linux.vnet.ibm.com):
> 
> I'm debating whether this is worth it. It makes this a bit more clean
> looking, but doesn't seriously enhance readability.  But, I do think
> it helps a bit.
> 
> Thoughts?

Absolutely.  do_init_bootmem_node() is *still* a bit largish,
but far better broken out.


* Re: [PATCH 1/8] fix bootmem reservation on uninitialized node
  2008-12-09 18:21 ` [PATCH 1/8] fix bootmem reservation on uninitialized node Dave Hansen
@ 2008-12-10 22:14   ` Paul Mackerras
  2008-12-10 22:30     ` Jon Tollefson
  2008-12-10 22:54     ` Dave Hansen
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Mackerras @ 2008-12-10 22:14 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Jon Tollefson, Mel Gorman, linuxppc-dev, Serge E. Hallyn

Dave Hansen writes:

> This patch ensures that we do not touch bootmem for any node which
> has not been initialized.
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>

So, should I be sending this to Linus for 2.6.28?

I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
prefer not to have that in unless necessary.

Jon, does this patch fix the problem on your machine with 16G pages?

Paul.


* Re: [PATCH 1/8] fix bootmem reservation on uninitialized node
  2008-12-10 22:14   ` Paul Mackerras
@ 2008-12-10 22:30     ` Jon Tollefson
  2008-12-10 22:54     ` Dave Hansen
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Tollefson @ 2008-12-10 22:30 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Jon Tollefson, Mel Gorman, Dave Hansen, linuxppc-dev,
	Serge E. Hallyn

Paul Mackerras wrote:
> Dave Hansen writes:
>
>   
>> This patch ensures that we do not touch bootmem for any node which
>> has not been initialized.
>>
>> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
>>     
>
> So, should I be sending this to Linus for 2.6.28?
>
> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.
>
> Jon, does this patch fix the problem on your machine with 16G pages?
>   
It worked on a machine with one page; I am awaiting access to another
with more pages.

> Paul.
>   
Jon


* Re: [PATCH 1/8] fix bootmem reservation on uninitialized node
  2008-12-10 22:14   ` Paul Mackerras
  2008-12-10 22:30     ` Jon Tollefson
@ 2008-12-10 22:54     ` Dave Hansen
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2008-12-10 22:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Jon Tollefson, Mel Gorman, linuxppc-dev, Serge E. Hallyn

On Thu, 2008-12-11 at 09:14 +1100, Paul Mackerras wrote:
> Dave Hansen writes:
> > This patch ensures that we do not touch bootmem for any node which
> > has not been initialized.
> > 
> > Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> 
> So, should I be sending this to Linus for 2.6.28?

Yes, this is 2.6.28 material.

> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.

It isn't necessary and probably snuck in from the others.  I'll respin
and send this one out again.  If you want the others, I'll send
separately.

-- Dave


* Re: [PATCH 6/8] cleanup do_init_bootmem()
  2008-12-09 18:21 ` [PATCH 6/8] cleanup do_init_bootmem() Dave Hansen
  2008-12-09 21:54   ` Serge E. Hallyn
@ 2008-12-16  5:06   ` Paul Mackerras
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2008-12-16  5:06 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Jon Tollefson, Mel Gorman, linuxppc-dev, Serge E. Hallyn

Dave Hansen writes:

> I'm debating whether this is worth it. It makes this a bit more clean
> looking, but doesn't seriously enhance readability.  But, I do think
> it helps a bit.

I get this when compiling a pseries config (with the patches up to
this point applied but not 7/8 or 8/8):

  CC      arch/powerpc/mm/numa.o
/home/paulus/kernel/powerpc/arch/powerpc/mm/numa.c: In function 'do_init_bootmem_node':
/home/paulus/kernel/powerpc/arch/powerpc/mm/numa.c:949: error: 'nid' undeclared (first use in this function)
/home/paulus/kernel/powerpc/arch/powerpc/mm/numa.c:949: error: (Each undeclared identifier is reported only once
/home/paulus/kernel/powerpc/arch/powerpc/mm/numa.c:949: error: for each function it appears in.)
make[2]: *** [arch/powerpc/mm/numa.o] Error 1

Paul.


* Re: [PATCH 7/8] less use of NODE_DATA()
  2008-12-09 18:21 ` [PATCH 7/8] less use of NODE_DATA() Dave Hansen
@ 2008-12-16  5:16   ` Paul Mackerras
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2008-12-16  5:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Jon Tollefson, Mel Gorman, linuxppc-dev, Serge E. Hallyn

Dave Hansen writes:

> The use of NODE_DATA() in the ppc init code is fragile.  We use
> it for some nodes as we are initializing others.  As the loop
> initializing them has gotten more complex and broken out into
> several functions it gets harder and harder to remember how
> this goes.

With this one applied (it needed some adjustment because of
differences in dbg() lines), I get a hang on boot on a POWER6
partition after printing:

Linux version 2.6.28-rc8-test (paulus@drongo) (gcc version 4.3.2 (Debian 4.3.2-1) ) #64 SMP Tue Dec 16 16:14:45 EST 2008
[boot]0012 Setup Arch
Node 0 Memory:
Node 1 Memory: 0x0-0x100000000

It appears that this partition has all its memory on node 1.  It did
boot OK with your patches up to 5/8 applied.

Paul.

