linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
@ 2006-05-09  7:03 Michael Ellerman
  2006-05-09  8:32 ` Andy Whitcroft
  2006-05-09 13:34 ` Andy Whitcroft
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Ellerman @ 2006-05-09  7:03 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

I can't believe I'm the first person to see this, so I imagine I'm missing
something. Perhaps it's only an issue on powerpc?

I have a machine with some memory at 0, then a hole, and then some more memory
which doesn't start on a section boundary. This is causing the following
crash:

add_region nid 1 start_pfn 0x77c0 pages 0x840
add_region nid 1 start_pfn 0x0 pages 0x6000

...

Unable to handle kernel paging request for data at address 0x00002430
Faulting instruction address: 0xc0000000004f2940
cpu 0x4: Vector: 300 (Data Access) at [c000000000737aa0]
    pc: c0000000004f2940: .__alloc_bootmem_node+0x28/0x7c
    lr: c0000000000a47a0: .sparse_init+0xa8/0x138
    sp: c000000000737d20
   msr: 8000000000001032
   dar: 2430
 dsisr: 40000000
  current = 0xc000000000538410
  paca    = 0xc000000000539780
    pid   = 0, comm = swapper
enter ? for help
4:mon> r
R00 = c0000000000a47a0   R16 = 0000000005ff5000
R01 = c000000000737d20   R17 = 0000000000000004
R02 = c0000000007331e0   R18 = 00000000100d0000
R03 = 0000000000000000   R19 = 00000000100b0000
R04 = 0000000000038000   R20 = 00000000100d0000
R05 = 0000000000000080   R21 = 0000000010070000

The root cause is that we have no memory at pfn 7000 and so early_pfn_to_nid()
is giving us back -1 in sparse_early_mem_map_alloc(). We then pass -1 to
NODE_DATA() which gets us NULL, and hence __alloc_bootmem_node() explodes.

AFAICT there's no logic to prevent us creating sections with no zeroth page,
and in fact my box is doing it. Therefore it's not valid to assume we can
get the nid from the zeroth page in a section. All we know is that there's
one or more pages in that section for which early_pfn_to_nid() will work.

So I came up with this hack. Loop through all pages in the section until
we get a valid nid, this should always work.

We also call early_pfn_to_nid() in node_memmap_size_bytes(), but I didn't
touch that because it's not used on powerpc so I can't test it.

With this patch my machine boots and seems to be happy.

cheers

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---

 mm/sparse.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Index: to-merge/mm/sparse.c
===================================================================
--- to-merge.orig/mm/sparse.c
+++ to-merge/mm/sparse.c
@@ -172,10 +172,26 @@ static int sparse_init_one_section(struc
 	return 1;
 }
 
+static int sparse_section_nr_to_nid(unsigned long pnum)
+{
+	unsigned long pfn = section_nr_to_pfn(pnum);
+	int i, nid;
+
+	for (i = 0; i < PAGES_PER_SECTION; i++) {
+		nid = early_pfn_to_nid(pfn + i);
+		if (nid != -1)
+			break;
+	}
+
+	BUG_ON(nid == -1);
+
+	return nid;
+}
+
 static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
-	int nid = early_pfn_to_nid(section_nr_to_pfn(pnum));
+	int nid = sparse_section_nr_to_nid(pnum);
 	struct mem_section *ms = __nr_to_section(pnum);
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09  7:03 [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? Michael Ellerman
@ 2006-05-09  8:32 ` Andy Whitcroft
  2006-05-09  9:11   ` Michael Ellerman
  2006-05-09 13:34 ` Andy Whitcroft
  1 sibling, 1 reply; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09  8:32 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

Michael Ellerman wrote:
> I can't believe I'm the first person to see this, so I imagine I'm missing
> something. Perhaps it's only an issue on powerpc?
> 
> I have a machine with some memory at 0, then a hole, and then some more memory
> which doesn't start on a section boundary. This is causing the following
> crash:
> 
> add_region nid 1 start_pfn 0x77c0 pages 0x840
> add_region nid 1 start_pfn 0x0 pages 0x6000

Nasty, could you send me your full boot log and your config and I'll
have a look at it.  I can say this code has been booted on a lot of
power boxes and I've never seen that before!  :)

Anyhow, will take a look and see if we can avoid iterating over the area
to find it.

-apw

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09  8:32 ` Andy Whitcroft
@ 2006-05-09  9:11   ` Michael Ellerman
  2006-05-09 10:24     ` Andy Whitcroft
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Ellerman @ 2006-05-09  9:11 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

[-- Attachment #1: Type: text/plain, Size: 1505 bytes --]

On Tue, 2006-05-09 at 09:32 +0100, Andy Whitcroft wrote:
> Michael Ellerman wrote:
> > I can't believe I'm the first person to see this, so I imagine I'm missing
> > something. Perhaps it's only an issue on powerpc?
> > 
> > I have a machine with some memory at 0, then a hole, and then some more memory
> > which doesn't start on a section boundary. This is causing the following
> > crash:
> > 
> > add_region nid 1 start_pfn 0x77c0 pages 0x840
> > add_region nid 1 start_pfn 0x0 pages 0x6000
> 
> Nasty, could you send me your full boot log and your config and I'll
> have a look at it.  I can say this code has been booted on a lot of
> power boxes and I've never seen that before!  :)

Ah yeah, I seem to have neglected to mention it's a kdump boot :} Sorry.
That's why we get the strangely aligned memory sections. The section
starting at 0 is for the kdump kernel, the bit at 0x77c0 covers some
firmware that's allocated there by the first kernel.

I can still give you a .config and log if you want it, but not 'til
tomorrow morning.

> Anyhow, will take a look and see if we can avoid iterating over the area
> to find it.

Yeah it's a bit nasty having the loop. I figure on most configs it'll
pop on the first iteration, but still.

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09  9:11   ` Michael Ellerman
@ 2006-05-09 10:24     ` Andy Whitcroft
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09 10:24 UTC (permalink / raw)
  To: michael; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

Michael Ellerman wrote:
> On Tue, 2006-05-09 at 09:32 +0100, Andy Whitcroft wrote:
> 
>>Michael Ellerman wrote:
>>
>>>I can't believe I'm the first person to see this, so I imagine I'm missing
>>>something. Perhaps it's only an issue on powerpc?
>>>
>>>I have a machine with some memory at 0, then a hole, and then some more memory
>>>which doesn't start on a section boundary. This is causing the following
>>>crash:
>>>
>>>add_region nid 1 start_pfn 0x77c0 pages 0x840
>>>add_region nid 1 start_pfn 0x0 pages 0x6000
>>
>>Nasty, could you send me your full boot log and your config and I'll
>>have a look at it.  I can say this code has been booted on a lot of
>>power boxes and I've never seen that before!  :)
> 
> 
> Ah yeah, I seem to have neglected to mention it's a kdump boot :} Sorry.
> That's why we get the strangely aligned memory sections. The section
> starting at 0 is for the kdump kernel, the bit at 0x77c0 covers some
> firmware that's allocated there by the first kernel.
> 
> I can still give you a .config and log if you want it, but not 'til
> tomorrow morning.

Heh.  Well at least that makes more sense.  It would be good to have
them generally.  But no I'm not scared of where this odd alignment is
coming from I can look at what we can do about it.

Thanks.

-apw

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09  7:03 [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? Michael Ellerman
  2006-05-09  8:32 ` Andy Whitcroft
@ 2006-05-09 13:34 ` Andy Whitcroft
  2006-05-09 14:28   ` Andy Whitcroft
  2006-05-09 16:05   ` [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? mike kravetz
  1 sibling, 2 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09 13:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

Michael Ellerman wrote:
> I can't believe I'm the first person to see this, so I imagine I'm missing
> something. Perhaps it's only an issue on powerpc?
> 
> I have a machine with some memory at 0, then a hole, and then some more memory
> which doesn't start on a section boundary. This is causing the following
> crash:
> 
> add_region nid 1 start_pfn 0x77c0 pages 0x840
> add_region nid 1 start_pfn 0x0 pages 0x6000
> 
> ...
> 
> Unable to handle kernel paging request for data at address 0x00002430
> Faulting instruction address: 0xc0000000004f2940
> cpu 0x4: Vector: 300 (Data Access) at [c000000000737aa0]
>     pc: c0000000004f2940: .__alloc_bootmem_node+0x28/0x7c
>     lr: c0000000000a47a0: .sparse_init+0xa8/0x138
>     sp: c000000000737d20
>    msr: 8000000000001032
>    dar: 2430
>  dsisr: 40000000
>   current = 0xc000000000538410
>   paca    = 0xc000000000539780
>     pid   = 0, comm = swapper
> enter ? for help
> 4:mon> r
> R00 = c0000000000a47a0   R16 = 0000000005ff5000
> R01 = c000000000737d20   R17 = 0000000000000004
> R02 = c0000000007331e0   R18 = 00000000100d0000
> R03 = 0000000000000000   R19 = 00000000100b0000
> R04 = 0000000000038000   R20 = 00000000100d0000
> R05 = 0000000000000080   R21 = 0000000010070000
> 
> The root cause is that we have no memory at pfn 7000 and so early_pfn_to_nid()
> is giving us back -1 in sparse_early_mem_map_alloc(). We then pass -1 to
> NODE_DATA() which gets us NULL, and hence __alloc_bootmem_node() explodes.
> 
> AFAICT there's no logic to prevent us creating sections with no zeroth page,
> and in fact my box is doing it. Therefore it's not valid to assume we can
> get the nid from the zeroth page in a section. All we know is that there's
> one or more pages in that section for which early_pfn_to_nid() will work.
> 
> So I came up with this hack. Loop through all pages in the section until
> we get a valid nid, this should always work.
> 
> We also call early_pfn_to_nid() in node_memmap_size_bytes(), but I didn't
> touch that because it's not used on powerpc so I can't test it.
> 
> With this patch my machine boots and seems to be happy.

We see to have a few options:

1) default the nid to something -- right now most of the non powerpc
architectures return 0 when the nid cannot be found thus avoiding this
panic.

2) nid search -- we can do pretty much what is in this patch probabally
with some optimisations like trying the first and last pfn in the
section first.

3) record the nid -- when we record the memory present in the system we
are passed the nid.

Somehow the last of these seems the most logical given we have the
correct information at the time we record that we need to instantiate
the section.  So I had a quick go at something which seems to have come
out pretty clean.  Attached is a completly untested patch to show what I
am proposing.

-apw

[-- Attachment #2: sparsemem-record-nid-during-memory-present --]
[-- Type: text/plain, Size: 1764 bytes --]

sparsemem record nid during memory present

Record the node id as we mark sections for instantiation.  Use this
nid during instantiation to direct allocations.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 sparse.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)
diff -upN reference/mm/sparse.c current/mm/sparse.c
--- reference/mm/sparse.c
+++ current/mm/sparse.c
@@ -102,6 +102,20 @@ int __section_nr(struct mem_section* ms)
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
 
+/*
+ * During early boot we need to record the nid from which we will
+ * later allocate the section mem_map.  Encode this into the section
+ * pointer.  Overload the section_mem_map with this information.
+ */
+static inline unsigned long sparse_encode_early_nid(int nid)
+	return (nid << SECTION_MAP_LAST_BIT);
+}
+
+static inline int sparse_early_nid(struct mem_section *section)
+	unsigned long nid = section->section_mem_map;
+	return (nid >> SECTION_MAP_LAST_BIT);
+}
+
 /* Record a memory area against a node. */
 void memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -116,7 +130,8 @@ void memory_present(int nid, unsigned lo
 
 		ms = __nr_to_section(section);
 		if (!ms->section_mem_map)
-			ms->section_mem_map = SECTION_MARKED_PRESENT;
+			ms->section_mem_map = sparse_encode_early_nid(nid) |
+							SECTION_MARKED_PRESENT;
 	}
 }
 
@@ -175,8 +190,8 @@ static int sparse_init_one_section(struc
 static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
-	int nid = early_pfn_to_nid(section_nr_to_pfn(pnum));
 	struct mem_section *ms = __nr_to_section(pnum);
+	int nid = sparse_early_nid(ms);
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 13:34 ` Andy Whitcroft
@ 2006-05-09 14:28   ` Andy Whitcroft
  2006-05-09 16:26     ` Dave Hansen
  2006-05-10  6:06     ` Michael Ellerman
  2006-05-09 16:05   ` [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? mike kravetz
  1 sibling, 2 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09 14:28 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Michael Ellerman, Andrew Morton, linux-kernel, haveblue, kravetz

[-- Attachment #1: Type: text/plain, Size: 935 bytes --]

Andy Whitcroft wrote:

> We see to have a few options:
> 
> 1) default the nid to something -- right now most of the non powerpc
> architectures return 0 when the nid cannot be found thus avoiding this
> panic.
> 
> 2) nid search -- we can do pretty much what is in this patch probabally
> with some optimisations like trying the first and last pfn in the
> section first.
> 
> 3) record the nid -- when we record the memory present in the system we
> are passed the nid.
> 
> Somehow the last of these seems the most logical given we have the
> correct information at the time we record that we need to instantiate
> the section.  So I had a quick go at something which seems to have come
> out pretty clean.  Attached is a completly untested patch to show what I
> am proposing.

Ok.  Attached is a version which builds and boots.  The patch looks
pretty simple.  Michael could you give it a spin on the broken machine
for me.

-apw

[-- Attachment #2: sparsemem-record-nid-during-memory-present --]
[-- Type: text/plain, Size: 2002 bytes --]

sparsemem record nid during memory present

Record the node id as we mark sections for instantiation.  Use this
nid during instantiation to direct allocations.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 sparse.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)
diff -upN reference/mm/sparse.c current/mm/sparse.c
--- reference/mm/sparse.c
+++ current/mm/sparse.c
@@ -102,6 +102,22 @@ int __section_nr(struct mem_section* ms)
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
 
+/*
+ * During early boot we need to record the nid from which we will
+ * later allocate the section mem_map.  Encode this into the section
+ * pointer.  Overload the section_mem_map with this information.
+ */
+static inline unsigned long sparse_encode_early_nid(int nid)
+{
+	return (nid << SECTION_MAP_LAST_BIT);
+}
+
+static inline int sparse_early_nid(struct mem_section *section)
+{
+	unsigned long nid = section->section_mem_map;
+	return (nid >> SECTION_MAP_LAST_BIT);
+}
+
 /* Record a memory area against a node. */
 void memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -116,7 +132,8 @@ void memory_present(int nid, unsigned lo
 
 		ms = __nr_to_section(section);
 		if (!ms->section_mem_map)
-			ms->section_mem_map = SECTION_MARKED_PRESENT;
+			ms->section_mem_map = sparse_encode_early_nid(nid) |
+							SECTION_MARKED_PRESENT;
 	}
 }
 
@@ -167,6 +184,7 @@ static int sparse_init_one_section(struc
 	if (!valid_section(ms))
 		return -EINVAL;
 
+	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum);
 
 	return 1;
@@ -175,8 +193,8 @@ static int sparse_init_one_section(struc
 static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
-	int nid = early_pfn_to_nid(section_nr_to_pfn(pnum));
 	struct mem_section *ms = __nr_to_section(pnum);
+	int nid = sparse_early_nid(ms);
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 13:34 ` Andy Whitcroft
  2006-05-09 14:28   ` Andy Whitcroft
@ 2006-05-09 16:05   ` mike kravetz
  2006-05-09 16:14     ` Andy Whitcroft
  1 sibling, 1 reply; 12+ messages in thread
From: mike kravetz @ 2006-05-09 16:05 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Michael Ellerman, Andrew Morton, linux-kernel, haveblue

On Tue, May 09, 2006 at 02:34:51PM +0100, Andy Whitcroft wrote:
> 3) record the nid -- when we record the memory present in the system we
> are passed the nid.
> 
> Somehow the last of these seems the most logical given we have the
> correct information at the time we record that we need to instantiate
> the section.  So I had a quick go at something which seems to have come
> out pretty clean.  Attached is a completly untested patch to show what I
> am proposing.

Looks sane to me.  I've always wanted to encode the nid in the section.
But, never had a compelling reason to do so.

With this code in place, we could optimize the pfn_to_nid() routines to
now obtain the nid from the section (rather than page struct).  However,
I'm not sure this is worth the effort.

-- 
Mike

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 16:05   ` [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? mike kravetz
@ 2006-05-09 16:14     ` Andy Whitcroft
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09 16:14 UTC (permalink / raw)
  To: mike kravetz; +Cc: Michael Ellerman, Andrew Morton, linux-kernel, haveblue

mike kravetz wrote:
> On Tue, May 09, 2006 at 02:34:51PM +0100, Andy Whitcroft wrote:
> 
>>3) record the nid -- when we record the memory present in the system we
>>are passed the nid.
>>
>>Somehow the last of these seems the most logical given we have the
>>correct information at the time we record that we need to instantiate
>>the section.  So I had a quick go at something which seems to have come
>>out pretty clean.  Attached is a completly untested patch to show what I
>>am proposing.
> 
> 
> Looks sane to me.  I've always wanted to encode the nid in the section.
> But, never had a compelling reason to do so.
> 
> With this code in place, we could optimize the pfn_to_nid() routines to
> now obtain the nid from the section (rather than page struct).  However,
> I'm not sure this is worth the effort.

Well its only in there temporarily during init, its not in there once we
have allocated the section mem_map.

-apw

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 14:28   ` Andy Whitcroft
@ 2006-05-09 16:26     ` Dave Hansen
  2006-05-09 16:31       ` Andy Whitcroft
  2006-05-10  6:06     ` Michael Ellerman
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2006-05-09 16:26 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Michael Ellerman, Andrew Morton, linux-kernel, kravetz

On Tue, 2006-05-09 at 15:28 +0100, Andy Whitcroft wrote:		
> +/*
> + * During early boot we need to record the nid from which we will
> + * later allocate the section mem_map.  Encode this into the section
> + * pointer.  Overload the section_mem_map with this information.
> + */ 

Andy, this all looks pretty good.  Although, it might be nice to
document this a bit more. 

First, can you update the mem_section definition comment?  It has a nice
explanation of how we use section_mem_map, and it would be a shame to
miss this use.

Also, your comment says when we _record_ the nid information, but not
that it is only _used_ during early boot.  I think this is what Mike K.
missed, and it might be good to clarify.

How about something like this:

/*
 * During early boot, before section_mem_map is used for an actual
 * mem_map, we use section_mem_map to store the section's NUMA
 * node.  This keeps us from having to use another data structure.  The
 * node information is cleared just before we store the real mem_map.
 */

-- Dave


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 16:26     ` Dave Hansen
@ 2006-05-09 16:31       ` Andy Whitcroft
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-09 16:31 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Michael Ellerman, Andrew Morton, linux-kernel, kravetz

Dave Hansen wrote:
> On Tue, 2006-05-09 at 15:28 +0100, Andy Whitcroft wrote:		
> 
>>+/*
>>+ * During early boot we need to record the nid from which we will
>>+ * later allocate the section mem_map.  Encode this into the section
>>+ * pointer.  Overload the section_mem_map with this information.
>>+ */ 
> 
> 
> Andy, this all looks pretty good.  Although, it might be nice to
> document this a bit more. 
> 
> First, can you update the mem_section definition comment?  It has a nice
> explanation of how we use section_mem_map, and it would be a shame to
> miss this use.
> 
> Also, your comment says when we _record_ the nid information, but not
> that it is only _used_ during early boot.  I think this is what Mike K.
> missed, and it might be good to clarify.
> 
> How about something like this:
> 
> /*
>  * During early boot, before section_mem_map is used for an actual
>  * mem_map, we use section_mem_map to store the section's NUMA
>  * node.  This keeps us from having to use another data structure.  The
>  * node information is cleared just before we store the real mem_map.
>  */

Yep sounds very sane.  Will update and resend -- best wait and see if it
fixes Michael find its works for him too :).

Thanks.

-apw

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions?
  2006-05-09 14:28   ` Andy Whitcroft
  2006-05-09 16:26     ` Dave Hansen
@ 2006-05-10  6:06     ` Michael Ellerman
  2006-05-10 10:51       ` [PATCH] sparsemem record nid during memory present Andy Whitcroft
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Ellerman @ 2006-05-10  6:06 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-kernel, haveblue, kravetz

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

On Tue, 2006-05-09 at 15:28 +0100, Andy Whitcroft wrote:
> Andy Whitcroft wrote:
> > 3) record the nid -- when we record the memory present in the system we
> > are passed the nid.
> > 
> > Somehow the last of these seems the most logical given we have the
> > correct information at the time we record that we need to instantiate
> > the section.  So I had a quick go at something which seems to have come
> > out pretty clean.  Attached is a completly untested patch to show what I
> > am proposing.
> 
> Ok.  Attached is a version which builds and boots.  The patch looks
> pretty simple.  Michael could you give it a spin on the broken machine
> for me.

Hey, it's not broken, it's just special :D

That works a treat, boots and otherwise seems fine.

> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

> +/*
> + * During early boot we need to record the nid from which we will
> + * later allocate the section mem_map.  Encode this into the section
> + * pointer.  Overload the section_mem_map with this information.
> + */
> +static inline unsigned long sparse_encode_early_nid(int nid)
> +{
> +	return (nid << SECTION_MAP_LAST_BIT);
> +}
> +
> +static inline int sparse_early_nid(struct mem_section *section)
> +{
> +	unsigned long nid = section->section_mem_map;
> +	return (nid >> SECTION_MAP_LAST_BIT);
> +}

What about just for readability: (in linux/mmzone.h)

#define SECTION_MARKED_PRESENT  (1UL<<0)
#define SECTION_HAS_MEM_MAP     (1UL<<1)
#define SECTION_MAP_LAST_BIT    (1UL<<2)
#define SECTION_NID_SHIFT       SECTION_MAP_LAST_BIT

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] sparsemem record nid during memory present
  2006-05-10  6:06     ` Michael Ellerman
@ 2006-05-10 10:51       ` Andy Whitcroft
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Whitcroft @ 2006-05-10 10:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andy Whitcroft, Andrew Morton, linux-kernel, haveblue, kravetz

Ok, here is the updated version.  Better commentry on what we are doing
and how long the data is kept (suggested by Dave Hansen).  Also added a 
SECTION_NID_SHIFT to better describe its use.  This showed a small
buglett as we were shifting by 4 (1<<2) instead of 2; now fixed.

-apw
=== 8< ===
sparsemem record nid during memory present

Record the node id as we mark sections for instantiation.  Use this
nid during instantiation to direct allocations.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 include/linux/mmzone.h |    5 +++++
 mm/sparse.c            |   22 ++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)
diff -upN reference/include/linux/mmzone.h current/include/linux/mmzone.h
--- reference/include/linux/mmzone.h
+++ current/include/linux/mmzone.h
@@ -508,6 +508,10 @@ struct mem_section {
 	 * pages.  However, it is stored with some other magic.
 	 * (see sparse.c::sparse_init_one_section())
 	 *
+	 * Additionally during early boot we encode node id of
+	 * the location of the section here to guide allocation.
+	 * (see sparse.c::memory_present())
+	 *
 	 * Making it a UL at least makes someone do a cast
 	 * before using it wrong.
 	 */
@@ -547,6 +551,7 @@ extern int __section_nr(struct mem_secti
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
 #define SECTION_MAP_LAST_BIT	(1UL<<2)
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
+#define SECTION_NID_SHIFT	2
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
diff -upN reference/mm/sparse.c current/mm/sparse.c
--- reference/mm/sparse.c
+++ current/mm/sparse.c
@@ -102,6 +102,22 @@ int __section_nr(struct mem_section* ms)
 	return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
 }
 
+/*
+ * During early boot, before section_mem_map is used for an actual
+ * mem_map, we use section_mem_map to store the section's NUMA
+ * node.  This keeps us from having to use another data structure.  The
+ * node information is cleared just before we store the real mem_map.
+ */
+static inline unsigned long sparse_encode_early_nid(int nid)
+{
+	return (nid << SECTION_NID_SHIFT);
+}
+
+static inline int sparse_early_nid(struct mem_section *section)
+{
+	return (section->section_mem_map >> SECTION_NID_SHIFT);
+}
+
 /* Record a memory area against a node. */
 void memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -116,7 +132,8 @@ void memory_present(int nid, unsigned lo
 
 		ms = __nr_to_section(section);
 		if (!ms->section_mem_map)
-			ms->section_mem_map = SECTION_MARKED_PRESENT;
+			ms->section_mem_map = sparse_encode_early_nid(nid) |
+							SECTION_MARKED_PRESENT;
 	}
 }
 
@@ -167,6 +184,7 @@ static int sparse_init_one_section(struc
 	if (!valid_section(ms))
 		return -EINVAL;
 
+	ms->section_mem_map &= ~SECTION_MAP_MASK;
 	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum);
 
 	return 1;
@@ -175,8 +193,8 @@ static int sparse_init_one_section(struc
 static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
-	int nid = early_pfn_to_nid(section_nr_to_pfn(pnum));
 	struct mem_section *ms = __nr_to_section(pnum);
+	int nid = sparse_early_nid(ms);
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-05-10 10:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-09  7:03 [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? Michael Ellerman
2006-05-09  8:32 ` Andy Whitcroft
2006-05-09  9:11   ` Michael Ellerman
2006-05-09 10:24     ` Andy Whitcroft
2006-05-09 13:34 ` Andy Whitcroft
2006-05-09 14:28   ` Andy Whitcroft
2006-05-09 16:26     ` Dave Hansen
2006-05-09 16:31       ` Andy Whitcroft
2006-05-10  6:06     ` Michael Ellerman
2006-05-10 10:51       ` [PATCH] sparsemem record nid during memory present Andy Whitcroft
2006-05-09 16:05   ` [PATCH] SPARSEMEM + NUMA can't handle unaligned memory regions? mike kravetz
2006-05-09 16:14     ` Andy Whitcroft

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).