From: Dave Hansen <haveblue@us.ibm.com>
To: Andy Whitcroft <apw@shadowen.org>
Cc: akpm@osdl.org, linux-mm@kvack.org,
Keith Mannthey <kmannth@us.ibm.com>,
linux-kernel@vger.kernel.org, hch@infradead.org,
linuxppc-dev@ozlabs.org, paulus@samba.org, mkravetz@us.ibm.com,
gone@us.ibm.com, cbe-oss-dev@ozlabs.org
Subject: Re: [PATCH] Fix sparsemem on Cell
Date: Fri, 15 Dec 2006 09:24:00 -0800 [thread overview]
Message-ID: <1166203440.8105.22.camel@localhost.localdomain> (raw)
In-Reply-To: <4582D756.7090702@shadowen.org>
On Fri, 2006-12-15 at 17:11 +0000, Andy Whitcroft wrote:
> ie. that if hotplug is pushing this memory into a zone on a node it
> really does know what its doing, and its putting it in the right place.
> Obviously that code needs to be 'overlap' aware but thats ok for this
> interface.
I'm not sure if this, especially if we're only doing one section at a
time.
> > system_state = SYSTEM_RUNNING;
> > numa_default_policy();
> >
> > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800
> > +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800
> > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
> >
> > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
> >
> > +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> > +{
>
> __meminit ?
Yup. I'll get that.
> > + /*
> > + * There are two things that make this work:
> > + * 1. The early_pfn...() functions are __init and
> > + * use __initdata. If the system is < SYSTEM_RUNNING,
> > + * those functions and their data will still exist.
> > + * 2. We also assume that all actual memory hotplug
> > + * (as opposed to boot-time) calls to this are only
> > + * for contiguous memory regions. With sparsemem,
> > + * this guaranteed is easy because all sections are
> > + * contiguous and we never online more than one
> > + * section at a time. Boot-time memory can have holes
> > + * anywhere.
> > + */
> > + if (system_state >= SYSTEM_RUNNING)
> > + return 1;
>
> Is there any value in codifying the assumption here, as a safety net?
>
> if (system_state >= SYSTEM_RUNNING)
> #ifdef CONFIG_SPARSEMEM
> return 1;
> #else
> return 0;
> #endif
Dunno. The normal case is that it isn't even called without memory
hotplug. The only non-sparsemem memory hotplug is Keith's baby, and
they lay out all of their mem_map at boot-time anyway, so I don't think
this gets used.
Keith, do you know whether you even do memmap_init_zone() at runtime,
and if you ever have holes if/when you do?
> > + if (!early_pfn_valid(pfn))
> > + return 0;
> > + if (!early_pfn_in_nid(pfn, nid))
> > + return 0;
> > + return 1;
> > +}
> > +
> > /*
> > * Initially all pages are reserved - free ones are freed
> > * up by free_all_bootmem() once the early boot process is
> > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
> > unsigned long pfn;
> >
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > - if (!early_pfn_valid(pfn))
> > - continue;
> > - if (!early_pfn_in_nid(pfn, nid))
> > + if (!can_online_pfn_into_nid(pfn))
>
> We're not passing nid here?
No, because my ppc64 cross compiler has magically broken. :(
Fixed patch appended.
-- Dave
I think the comments added say it pretty well, but I'll repeat it here.
This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.
The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early. Those functions stop working as soon as
free_initmem() is called. system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.
I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.
The only other assumption is that all memory-hotplug-time pages
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running. The "valid" part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory. Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.
As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
---
lxc-dave/init/main.c | 4 ++++
lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++---
2 files changed, 29 insertions(+), 3 deletions(-)
diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800
+++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+ /*
+ * Memory hotplug requires that this system_state transition
+ * happen after free_initmem(). (see memmap_init_zone())
+ */
system_state = SYSTEM_RUNNING;
numa_default_policy();
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800
+++ lxc-dave/mm/page_alloc.c 2006-12-15 09:15:00.000000000 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
#define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
+static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid)
+{
+ /*
+ * There are two things that make this work:
+ * 1. The early_pfn...() functions are __init and
+ * use __initdata. If the system is < SYSTEM_RUNNING,
+ * those functions and their data will still exist.
+ * 2. We also assume that all actual memory hotplug
+ * (as opposed to boot-time) calls to this are only
+ * for contiguous memory regions. With sparsemem,
+ * this guaranteed is easy because all sections are
+ * contiguous and we never online more than one
+ * section at a time. Boot-time memory can have holes
+ * anywhere.
+ */
+ if (system_state >= SYSTEM_RUNNING)
+ return 1;
+ if (!early_pfn_valid(pfn))
+ return 0;
+ if (!early_pfn_in_nid(pfn, nid))
+ return 0;
+ return 1;
+}
+
/*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
- if (!early_pfn_valid(pfn))
- continue;
- if (!early_pfn_in_nid(pfn, nid))
+ if (!can_online_pfn_into_nid(pfn, nid))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_
next prev parent reply other threads:[~2006-12-15 17:24 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen
2006-12-15 17:11 ` Andy Whitcroft
2006-12-15 17:24 ` Dave Hansen [this message]
2006-12-15 19:45 ` Andrew Morton
2006-12-16 8:03 ` KAMEZAWA Hiroyuki
2006-12-18 21:13 ` Dave Hansen
2006-12-18 22:15 ` Christoph Hellwig
2006-12-18 22:54 ` Arnd Bergmann
2006-12-18 23:16 ` Dave Hansen
2006-12-19 0:16 ` KAMEZAWA Hiroyuki
2006-12-19 8:59 ` Arnd Bergmann
2006-12-19 19:34 ` Dave Hansen
2007-01-06 1:10 ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen
2007-01-06 4:52 ` John Rose
2007-01-07 8:58 ` Dave Hansen
2007-01-07 12:07 ` Arnd Bergmann
2007-01-08 6:31 ` Tim Pepper
2007-01-08 6:47 ` Tim Pepper
2006-12-15 17:22 ` [PATCH] Fix sparsemem on Cell Michael Buesch
2006-12-15 17:57 ` Dave Hansen
2006-12-15 20:21 ` Benjamin Herrenschmidt
-- strict thread matches above, loose matches on Subject: below --
2006-12-15 17:14 Dave Hansen
2006-12-17 23:02 ` Arnd Bergmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1166203440.8105.22.camel@localhost.localdomain \
--to=haveblue@us.ibm.com \
--cc=akpm@osdl.org \
--cc=apw@shadowen.org \
--cc=cbe-oss-dev@ozlabs.org \
--cc=gone@us.ibm.com \
--cc=hch@infradead.org \
--cc=kmannth@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mkravetz@us.ibm.com \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).