* [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
@ 2007-07-20 6:03 Paul Mundt
2007-07-20 7:21 ` KAMEZAWA Hiroyuki
2007-07-20 11:20 ` Mel Gorman
0 siblings, 2 replies; 8+ messages in thread
From: Paul Mundt @ 2007-07-20 6:03 UTC (permalink / raw)
To: Mel Gorman; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
zone_movable_pfn is presently marked as __initdata and referenced
from adjust_zone_range_for_zone_movable(), which in turn is
referenced by zone_spanned_pages_in_node(). Both of these are
__meminit annotated. When memory hotplug is enabled, this will oops
on a hot-add, due to zone_movable_pfn having been freed.
__meminitdata annotation gives the desired behaviour.
This will only impact platforms that enable both memory hotplug
and ARCH_POPULATES_NODE_MAP.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
--
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 43cb3b3..40954fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -138,7 +138,7 @@ static unsigned long __meminitdata dma_reserve;
#endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
unsigned long __initdata required_kernelcore;
unsigned long __initdata required_movablecore;
- unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
+ unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 6:03 [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes Paul Mundt
@ 2007-07-20 7:21 ` KAMEZAWA Hiroyuki
2007-07-20 7:30 ` Paul Mundt
2007-07-20 11:20 ` Mel Gorman
1 sibling, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-20 7:21 UTC (permalink / raw)
To: Paul Mundt; +Cc: Mel Gorman, Linus Torvalds, Andrew Morton, linux-kernel
On Fri, 20 Jul 2007 15:03:42 +0900
Paul Mundt <lethal@linux-sh.org> wrote:
> zone_movable_pfn is presently marked as __initdata and referenced
> from adjust_zone_range_for_zone_movable(), which in turn is
> referenced by zone_spanned_pages_in_node(). Both of these are
> __meminit annotated. When memory hotplug is enabled, this will oops
> on a hot-add, due to zone_movable_pfn having been freed.
>
> __meminitdata annotation gives the desired behaviour.
>
> This will only impact platforms that enable both memory hotplug
> and ARCH_POPULATES_NODE_MAP.
>
> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
>
Thank you for catching.
It seems that this route will called at node-hotplug.
We tested node hotplug with our machine but haven't seen this.
The reason is maybe...
==
Currently, our firmware shows *to-be-added* nodes in SRAT at boot and
ia64 creates empty node against possible node. So, when we add node,
pgdat is already allocated and free_area_init_core() is not called at hotplug.
==
We need to use dummy firmware to test this route, sorry.
just thinking of:
I'm now considering to post patches for "specifies a zone which all hotaddes pages are
added to, MOVABLE, NORMAL, DMA..."If we do that, we may be able to avoid calling
adjust_zone_range_for_zone_movable() (which touches this __meminit data) at hot-add.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 7:21 ` KAMEZAWA Hiroyuki
@ 2007-07-20 7:30 ` Paul Mundt
0 siblings, 0 replies; 8+ messages in thread
From: Paul Mundt @ 2007-07-20 7:30 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Mel Gorman, Linus Torvalds, Andrew Morton, linux-kernel
On Fri, Jul 20, 2007 at 04:21:52PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 20 Jul 2007 15:03:42 +0900
> Paul Mundt <lethal@linux-sh.org> wrote:
>
> > zone_movable_pfn is presently marked as __initdata and referenced
> > from adjust_zone_range_for_zone_movable(), which in turn is
> > referenced by zone_spanned_pages_in_node(). Both of these are
> > __meminit annotated. When memory hotplug is enabled, this will oops
> > on a hot-add, due to zone_movable_pfn having been freed.
> >
> > __meminitdata annotation gives the desired behaviour.
> >
> > This will only impact platforms that enable both memory hotplug
> > and ARCH_POPULATES_NODE_MAP.
> >
> > Signed-off-by: Paul Mundt <lethal@linux-sh.org>
> >
> Thank you for catching.
> It seems that this route will called at node-hotplug.
> We tested node hotplug with our machine but haven't seen this.
>
> The reason is maybe...
> ==
> Currently, our firmware shows *to-be-added* nodes in SRAT at boot and
> ia64 creates empty node against possible node. So, when we add node,
> pgdat is already allocated and free_area_init_core() is not called at hotplug.
> ==
> We need to use dummy firmware to test this route, sorry.
>
Correct, if the pgdat is already allocated on the node you're adding,
then you shouldn't hit this path. I've been toying with the hotplug code
for handoffs of small SRAM blocks (transitioning between exclusive
application usage and letting the kernel manage it as a node), which can
reliably follow this path. It's managed to find a couple of oopses in
this area at least, this not being the first :-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 6:03 [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes Paul Mundt
2007-07-20 7:21 ` KAMEZAWA Hiroyuki
@ 2007-07-20 11:20 ` Mel Gorman
2007-07-20 11:42 ` Paul Mundt
1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2007-07-20 11:20 UTC (permalink / raw)
To: Paul Mundt, Linus Torvalds, Andrew Morton, linux-kernel
On (20/07/07 15:03), Paul Mundt didst pronounce:
> zone_movable_pfn is presently marked as __initdata and referenced
> from adjust_zone_range_for_zone_movable(), which in turn is
> referenced by zone_spanned_pages_in_node(). Both of these are
> __meminit annotated. When memory hotplug is enabled, this will oops
> on a hot-add, due to zone_movable_pfn having been freed.
>
Ouch. Thanks for catching, your patch looks good. Before I add the ack though,
I would like to get more details of the error in case there are other gremlins
I haven't thought of and the fix for memory-hotadd is more complex.
First, can you confirm this is node hot-add please? I'm haven't looked at
the memory hot-add code in a while so I would like to be sure this problem
path only exists on node hot-add. If it is node hot-add, then everything
should be ok. The memory should get added to the same zone as it did in
older kernels.
Even if this is node hot-add, can you confirm that "ordinary" memory hot-add
is working as expected when ZONE_MOVABLE exists please? To test, add a boot
parameter kernelcore=N where N == 80% of memory and hot-add some memory to
an existing node.
I expect that the memory gets added to the same zone as historically but
when ZONE_MOVABLE is set, you'll see a situation where zones are overlapping
after memory hot-add. i.e. Before memory hot-add, you'd see
DDDDMM
for ZONE_DMA and ZONE_MOVABLE and after hotadd, you'd see something like
DDDDMMDDDD
so /proc/zoneinfo will look unusual. I'd like to be sure the memory exists
where you expect it to exist and that there are no problems after hot-add. To
test, a simple memory hot-add followed by a dd of a file the size of all
physical memory followed by a delete should do the trick. Also make sure
files like /proc/zoneinfo and /proc/meminfo are ok and particularly that
sysrq+m produces sensible output.
The only in-kernel user that should notice is one that is trying to walk the
whole of memmap and is using page_zone() changing to detect boundaries. The
existing users I am aware of only walk within a MAX_ORDER_NR_PAGES boundary
usually and a memory section boundary at most. Those users will be ok even
if zones overlap.
Thanks a lot.
If other memory-hotadd users are watching, I'd appreciate a test and a report
with a recent -git kernel to confirm you're ok. Is there a way currently
of simulating memory-hotadd so I can try this out? Ages ago, one could boot
with mem= and "add" the remaining memory at run-time.
> __meminitdata annotation gives the desired behaviour.
>
> This will only impact platforms that enable both memory hotplug
> and ARCH_POPULATES_NODE_MAP.
>
> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
>
> --
>
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 43cb3b3..40954fb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -138,7 +138,7 @@ static unsigned long __meminitdata dma_reserve;
> #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
> unsigned long __initdata required_kernelcore;
> unsigned long __initdata required_movablecore;
> - unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
> + unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
>
> /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> int movable_zone;
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 11:20 ` Mel Gorman
@ 2007-07-20 11:42 ` Paul Mundt
2007-07-20 12:56 ` Mel Gorman
2007-07-20 13:39 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 8+ messages in thread
From: Paul Mundt @ 2007-07-20 11:42 UTC (permalink / raw)
To: Mel Gorman; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On Fri, Jul 20, 2007 at 12:20:25PM +0100, Mel Gorman wrote:
> On (20/07/07 15:03), Paul Mundt didst pronounce:
> > zone_movable_pfn is presently marked as __initdata and referenced
> > from adjust_zone_range_for_zone_movable(), which in turn is
> > referenced by zone_spanned_pages_in_node(). Both of these are
> > __meminit annotated. When memory hotplug is enabled, this will oops
> > on a hot-add, due to zone_movable_pfn having been freed.
> >
>
> First, can you confirm this is node hot-add please? I'm haven't looked at
> the memory hot-add code in a while so I would like to be sure this problem
> path only exists on node hot-add. If it is node hot-add, then everything
> should be ok. The memory should get added to the same zone as it did in
> older kernels.
>
> Even if this is node hot-add, can you confirm that "ordinary" memory hot-add
> is working as expected when ZONE_MOVABLE exists please? To test, add a boot
> parameter kernelcore=N where N == 80% of memory and hot-add some memory to
> an existing node.
>
Normal memory hot-add does work as advertised. The problem in this case
is really a corner case, as I pointed out to Kamezawa-san. The only way
to enter the problematic code path (namely zone_spanned_pages_in_node())
is via free_area_init_node(), which in this case is only triggered by
hotadd_new_pgdat() if there's no existing pgdat.
Kamezawa-san wasn't able to reproduce this particular problem due to the
pgdat being preallocated, and I imagine that this is the common case for
most users. I have a slightly different use case where we might tear down
the node entirely, either to hand it off to another application, kernel,
or explicit power gating. In which case we have to start over via the new
pgdat path.
> I expect that the memory gets added to the same zone as historically but
> when ZONE_MOVABLE is set, you'll see a situation where zones are overlapping
> after memory hot-add. i.e. Before memory hot-add, you'd see
>
> DDDDMM
>
> for ZONE_DMA and ZONE_MOVABLE and after hotadd, you'd see something like
>
> DDDDMMDDDD
>
> so /proc/zoneinfo will look unusual. I'd like to be sure the memory exists
> where you expect it to exist and that there are no problems after hot-add. To
> test, a simple memory hot-add followed by a dd of a file the size of all
> physical memory followed by a delete should do the trick. Also make sure
> files like /proc/zoneinfo and /proc/meminfo are ok and particularly that
> sysrq+m produces sensible output.
>
It'll be Monday before I'm by a system I can test this out on, so perhaps
someone else will beat me to it. If not, I'll give it some more testing
and follow up then.
> If other memory-hotadd users are watching, I'd appreciate a test and a report
> with a recent -git kernel to confirm you're ok. Is there a way currently
> of simulating memory-hotadd so I can try this out? Ages ago, one could boot
> with mem= and "add" the remaining memory at run-time.
>
That's probably something worth resurrecting if it's broken in current
kernels. It should be pretty trivial to hook something like that up via
sparsemem anyways.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 11:42 ` Paul Mundt
@ 2007-07-20 12:56 ` Mel Gorman
2007-07-20 13:39 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2007-07-20 12:56 UTC (permalink / raw)
To: Paul Mundt, Linus Torvalds, Andrew Morton, linux-kernel
On (20/07/07 20:42), Paul Mundt didst pronounce:
> On Fri, Jul 20, 2007 at 12:20:25PM +0100, Mel Gorman wrote:
> > On (20/07/07 15:03), Paul Mundt didst pronounce:
> > > zone_movable_pfn is presently marked as __initdata and referenced
> > > from adjust_zone_range_for_zone_movable(), which in turn is
> > > referenced by zone_spanned_pages_in_node(). Both of these are
> > > __meminit annotated. When memory hotplug is enabled, this will oops
> > > on a hot-add, due to zone_movable_pfn having been freed.
> > >
> >
> > First, can you confirm this is node hot-add please? I'm haven't looked at
> > the memory hot-add code in a while so I would like to be sure this problem
> > path only exists on node hot-add. If it is node hot-add, then everything
> > should be ok. The memory should get added to the same zone as it did in
> > older kernels.
> >
> > Even if this is node hot-add, can you confirm that "ordinary" memory hot-add
> > is working as expected when ZONE_MOVABLE exists please? To test, add a boot
> > parameter kernelcore=N where N == 80% of memory and hot-add some memory to
> > an existing node.
> >
> Normal memory hot-add does work as advertised. The problem in this case
> is really a corner case, as I pointed out to Kamezawa-san. The only way
> to enter the problematic code path (namely zone_spanned_pages_in_node())
> is via free_area_init_node(), which in this case is only triggered by
> hotadd_new_pgdat() if there's no existing pgdat.
>
Right, all sounds fair and right. In that case for your patch
Acked-by: Mel Gorman <mel@csn.ul.ie>
Point is moot as Andrew already picked it up but what harm :)
> Kamezawa-san wasn't able to reproduce this particular problem due to the
> pgdat being preallocated, and I imagine that this is the common case for
> most users.
Probably. I suspect that Kamezawa-san's usecases for node-hotadd are the
only cases currently out there.
> I have a slightly different use case where we might tear down
> the node entirely, either to hand it off to another application, kernel,
> or explicit power gating. In which case we have to start over via the new
> pgdat path.
>
Fun stuff!
> > I expect that the memory gets added to the same zone as historically but
> > when ZONE_MOVABLE is set, you'll see a situation where zones are overlapping
> > after memory hot-add. i.e. Before memory hot-add, you'd see
> >
> > DDDDMM
> >
> > for ZONE_DMA and ZONE_MOVABLE and after hotadd, you'd see something like
> >
> > DDDDMMDDDD
> >
> > so /proc/zoneinfo will look unusual. I'd like to be sure the memory exists
> > where you expect it to exist and that there are no problems after hot-add. To
> > test, a simple memory hot-add followed by a dd of a file the size of all
> > physical memory followed by a delete should do the trick. Also make sure
> > files like /proc/zoneinfo and /proc/meminfo are ok and particularly that
> > sysrq+m produces sensible output.
> >
> It'll be Monday before I'm by a system I can test this out on, so perhaps
> someone else will beat me to it. If not, I'll give it some more testing
> and follow up then.
>
Thanks.
> > If other memory-hotadd users are watching, I'd appreciate a test and a report
> > with a recent -git kernel to confirm you're ok. Is there a way currently
> > of simulating memory-hotadd so I can try this out? Ages ago, one could boot
> > with mem= and "add" the remaining memory at run-time.
> >
> That's probably something worth resurrecting if it's broken in current
> kernels. It should be pretty trivial to hook something like that up via
> sparsemem anyways.
I'll take a look around and see can I come up with a basic testcase that
uses "normal" hardware to test the hot-add paths if no one else comes up
with a generally approved approach.
Thanks Paul.
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 11:42 ` Paul Mundt
2007-07-20 12:56 ` Mel Gorman
@ 2007-07-20 13:39 ` KAMEZAWA Hiroyuki
2007-07-20 14:00 ` Mel Gorman
1 sibling, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-20 13:39 UTC (permalink / raw)
To: Paul Mundt; +Cc: mel, torvalds, akpm, linux-kernel
On Fri, 20 Jul 2007 20:42:46 +0900
Paul Mundt <lethal@linux-sh.org> wrote:
> > I expect that the memory gets added to the same zone as historically but
> > when ZONE_MOVABLE is set, you'll see a situation where zones are overlapping
> > after memory hot-add. i.e. Before memory hot-add, you'd see
> >
> > DDDDMM
> >
> > for ZONE_DMA and ZONE_MOVABLE and after hotadd, you'd see something like
> >
> > DDDDMMDDDD
> >
As similar case, I hear powerpc has followig memory layout
(1)(1)(0)(0)(0)(1)(1)(1) # (0) is node 0, (1) is node 1. all zones are ZONE_DMA.
So zone is overlapped without memory hotplug.
But I agree we have to take care of this kind of corner cases.
Thanks,
-Kame
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
2007-07-20 13:39 ` KAMEZAWA Hiroyuki
@ 2007-07-20 14:00 ` Mel Gorman
0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2007-07-20 14:00 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Paul Mundt, torvalds, akpm, linux-kernel
On (20/07/07 22:39), KAMEZAWA Hiroyuki didst pronounce:
> On Fri, 20 Jul 2007 20:42:46 +0900
> Paul Mundt <lethal@linux-sh.org> wrote:
>
> > > I expect that the memory gets added to the same zone as historically but
> > > when ZONE_MOVABLE is set, you'll see a situation where zones are overlapping
> > > after memory hot-add. i.e. Before memory hot-add, you'd see
> > >
> > > DDDDMM
> > >
> > > for ZONE_DMA and ZONE_MOVABLE and after hotadd, you'd see something like
> > >
> > > DDDDMMDDDD
> > >
>
> As similar case, I hear powerpc has followig memory layout
>
> (1)(1)(0)(0)(0)(1)(1)(1) # (0) is node 0, (1) is node 1. all zones are ZONE_DMA.
>
Not always, but it happens. We used to have two test machines available
that exhibited this behaviour although they went missing at some point
which is unfortunate.
> So zone is overlapped without memory hotplug.
Zones overlapping have some subtle differences to nodes but the place this is
really noticeable is during memory init when a range of PFNs are initialised
by memmap_init_zone(). If zones overlap there, they can initialise the
struct page fields twice but that is not an issue with memory hot-add. The
potential issue I am thinking of is users that manually walk the memmap but
no functionality outside of debugging should be doing anything like that.
> But I agree we have to take care of this kind of corner cases.
>
Indeed. I think we're ok, but it should be checked out anyway.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-07-20 14:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-20 6:03 [PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes Paul Mundt
2007-07-20 7:21 ` KAMEZAWA Hiroyuki
2007-07-20 7:30 ` Paul Mundt
2007-07-20 11:20 ` Mel Gorman
2007-07-20 11:42 ` Paul Mundt
2007-07-20 12:56 ` Mel Gorman
2007-07-20 13:39 ` KAMEZAWA Hiroyuki
2007-07-20 14:00 ` Mel Gorman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox