* [PATCH] Fix sparsemem on Cell @ 2006-12-15 16:53 Dave Hansen 2006-12-15 17:11 ` Andy Whitcroft ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Dave Hansen @ 2006-12-15 16:53 UTC (permalink / raw) To: cbe-oss-dev Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/init/main.c | 4 ++++ lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* + * Memory hotplug requires that this system_state transition + * happer after free_initmem(). (see memmap_init_zone()) + */ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* + * There are two things that make this work: + * 1. The early_pfn...() functions are __init and + * use __initdata. If the system is < SYSTEM_RUNNING, + * those functions and their data will still exist. + * 2. We also assume that all actual memory hotplug + * (as opposed to boot-time) calls to this are only + * for contiguous memory regions. With sparsemem, + * this guaranteed is easy because all sections are + * contiguous and we never online more than one + * section at a time. Boot-time memory can have holes + * anywhere. 
+ */ + if (system_state >= SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ ^ permalink raw reply [flat|nested] 23+ messages in thread
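For reference, the section-granularity guarantee relied on in the description above is what sparsemem's pfn_valid() encodes: validity is tracked per section, not per page, so a hotplugged section's mem_map is present for every pfn in it. The snippet below is an approximate paraphrase of the include/linux/mmzone.h helpers of that era, shown for illustration only; it is not part of the patch.

/* Approximate sparsemem definitions (illustrative paraphrase). */
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)

static inline int pfn_valid(unsigned long pfn)
{
	/* Validity is recorded for a whole section at once, so every pfn
	 * inside a hot-added section has a struct page behind it. */
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}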
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen @ 2006-12-15 17:11 ` Andy Whitcroft 2006-12-15 17:24 ` Dave Hansen 2006-12-15 17:22 ` [PATCH] Fix sparsemem on Cell Michael Buesch 2006-12-15 20:21 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 23+ messages in thread From: Andy Whitcroft @ 2006-12-15 17:11 UTC (permalink / raw) To: Dave Hansen Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev Dave Hansen wrote: > I think the comments added say it pretty well, but I'll repeat it here. > > This fix is pretty similar in concept to the one that Arnd posted > as a temporary workaround, but I've added a few comments explaining > what the actual assumptions are, and improved it a wee little bit. > > The end goal here is to simply avoid calling the early_*() functions > when it is _not_ early. Those functions stop working as soon as > free_initmem() is called. system_state is set to SYSTEM_RUNNING > just after free_initmem() is called, so it seems appropriate to use > here. > > I did think twice about actually using SYSTEM_RUNNING because we > moved away from it in other parts of memory hotplug, but those > were actually for _allocations_ in favor of slab_is_available(), > and we really don't care about the slab here. > > The only other assumption is that all memory-hotplug-time pages > given to memmap_init_zone() are valid and able to be onlined into > any any zone after the system is running. The "valid" part is > really just a question of whether or not a 'struct page' is there > for the pfn, and *not* whether there is actual memory. Since > all sparsemem sections have contiguous mem_map[]s within them, > and we only memory hotplug entire sparsemem sections, we can > be confident that this assumption will hold. ie. that if hotplug is pushing this memory into a zone on a node it really does know what its doing, and its putting it in the right place. Obviously that code needs to be 'overlap' aware but thats ok for this interface. > > As for the memory being in the right node, we'll assume tha > memory hotplug is putting things in the right node. > > Signed-off-by: Dave Hansen <haveblue@us.ibm.com> > > --- > > lxc-dave/init/main.c | 4 ++++ > lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- > 2 files changed, 29 insertions(+), 3 deletions(-) > > diff -puN init/main.c~sparsemem-fix init/main.c > --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 > @@ -770,6 +770,10 @@ static int init(void * unused) > free_initmem(); > unlock_kernel(); > mark_rodata_ro(); > + /* > + * Memory hotplug requires that this system_state transition > + * happer after free_initmem(). (see memmap_init_zone()) > + */ typo: happen > system_state = SYSTEM_RUNNING; > numa_default_policy(); > > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800 > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b > > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) > > +static int can_online_pfn_into_nid(unsigned long pfn, int nid) > +{ __meminit ? > + /* > + * There are two things that make this work: > + * 1. The early_pfn...() functions are __init and > + * use __initdata. If the system is < SYSTEM_RUNNING, > + * those functions and their data will still exist. > + * 2. 
We also assume that all actual memory hotplug > + * (as opposed to boot-time) calls to this are only > + * for contiguous memory regions. With sparsemem, > + * this guaranteed is easy because all sections are > + * contiguous and we never online more than one > + * section at a time. Boot-time memory can have holes > + * anywhere. > + */ > + if (system_state >= SYSTEM_RUNNING) > + return 1; Is there any value in codifying the assumption here, as a safety net? if (system_state >= SYSTEM_RUNNING) #ifdef CONFIG_SPARSEMEM return 1; #else return 0; #endif > + if (!early_pfn_valid(pfn)) > + return 0; > + if (!early_pfn_in_nid(pfn, nid)) > + return 0; > + return 1; > +} > + > /* > * Initially all pages are reserved - free ones are freed > * up by free_all_bootmem() once the early boot process is > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned > unsigned long pfn; > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > - if (!early_pfn_valid(pfn)) > - continue; > - if (!early_pfn_in_nid(pfn, nid)) > + if (!can_online_pfn_into_nid(pfn)) We're not passing nid here? > continue; > page = pfn_to_page(pfn); > set_page_links(page, zone, nid, pfn); > _ Looks like this would do the job to me. -apw ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 17:11 ` Andy Whitcroft @ 2006-12-15 17:24 ` Dave Hansen 2006-12-15 19:45 ` Andrew Morton 0 siblings, 1 reply; 23+ messages in thread From: Dave Hansen @ 2006-12-15 17:24 UTC (permalink / raw) To: Andy Whitcroft Cc: akpm, linux-mm, Keith Mannthey, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev On Fri, 2006-12-15 at 17:11 +0000, Andy Whitcroft wrote: > ie. that if hotplug is pushing this memory into a zone on a node it > really does know what its doing, and its putting it in the right place. > Obviously that code needs to be 'overlap' aware but thats ok for this > interface. I'm not sure if this, especially if we're only doing one section at a time. > > system_state = SYSTEM_RUNNING; > > numa_default_policy(); > > > > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c > > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > > +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800 > > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b > > > > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) > > > > +static int can_online_pfn_into_nid(unsigned long pfn, int nid) > > +{ > > __meminit ? Yup. I'll get that. > > + /* > > + * There are two things that make this work: > > + * 1. The early_pfn...() functions are __init and > > + * use __initdata. If the system is < SYSTEM_RUNNING, > > + * those functions and their data will still exist. > > + * 2. We also assume that all actual memory hotplug > > + * (as opposed to boot-time) calls to this are only > > + * for contiguous memory regions. With sparsemem, > > + * this guaranteed is easy because all sections are > > + * contiguous and we never online more than one > > + * section at a time. Boot-time memory can have holes > > + * anywhere. > > + */ > > + if (system_state >= SYSTEM_RUNNING) > > + return 1; > > Is there any value in codifying the assumption here, as a safety net? > > if (system_state >= SYSTEM_RUNNING) > #ifdef CONFIG_SPARSEMEM > return 1; > #else > return 0; > #endif Dunno. The normal case is that it isn't even called without memory hotplug. The only non-sparsemem memory hotplug is Keith's baby, and they lay out all of their mem_map at boot-time anyway, so I don't think this gets used. Keith, do you know whether you even do memmap_init_zone() at runtime, and if you ever have holes if/when you do? > > + if (!early_pfn_valid(pfn)) > > + return 0; > > + if (!early_pfn_in_nid(pfn, nid)) > > + return 0; > > + return 1; > > +} > > + > > /* > > * Initially all pages are reserved - free ones are freed > > * up by free_all_bootmem() once the early boot process is > > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned > > unsigned long pfn; > > > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > > - if (!early_pfn_valid(pfn)) > > - continue; > > - if (!early_pfn_in_nid(pfn, nid)) > > + if (!can_online_pfn_into_nid(pfn)) > > We're not passing nid here? No, because my ppc64 cross compiler has magically broken. :( Fixed patch appended. -- Dave I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. 
system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/init/main.c | 4 ++++ lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* + * Memory hotplug requires that this system_state transition + * happen after free_initmem(). (see memmap_init_zone()) + */ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-15 09:15:00.000000000 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* + * There are two things that make this work: + * 1. The early_pfn...() functions are __init and + * use __initdata. If the system is < SYSTEM_RUNNING, + * those functions and their data will still exist. + * 2. We also assume that all actual memory hotplug + * (as opposed to boot-time) calls to this are only + * for contiguous memory regions. With sparsemem, + * this guaranteed is easy because all sections are + * contiguous and we never online more than one + * section at a time. Boot-time memory can have holes + * anywhere. + */ + if (system_state >= SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn, nid)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 17:24 ` Dave Hansen @ 2006-12-15 19:45 ` Andrew Morton 2006-12-16 8:03 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2006-12-15 19:45 UTC (permalink / raw) To: Dave Hansen Cc: linuxppc-dev, Keith Mannthey, linux-kernel, hch, linux-mm, paulus, mkravetz, gone, cbe-oss-dev On Fri, 15 Dec 2006 09:24:00 -0800 Dave Hansen <haveblue@us.ibm.com> wrote: > > ... > > I think the comments added say it pretty well, but I'll repeat it here. > > This fix is pretty similar in concept to the one that Arnd posted > as a temporary workaround, but I've added a few comments explaining > what the actual assumptions are, and improved it a wee little bit. > > The end goal here is to simply avoid calling the early_*() functions > when it is _not_ early. Those functions stop working as soon as > free_initmem() is called. system_state is set to SYSTEM_RUNNING > just after free_initmem() is called, so it seems appropriate to use > here. Would really prefer not to do this. system_state is evil. Its semantics are poorly-defined and if someone changes them a bit, or changes memory initialisation order, you get whacked. I think an mm-private flag with /*documented*/ semantics would be better. It's only a byte. > +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid) I spent some time trying to work out what "can_online_pfn_into_nid" can possibly mean and failed. "We can bring a pfn online then turn it into a NID"? Don't think so. "We can bring this page online and allocate it to this node"? Maybe. Perhaps if the function's role in the world was commented it would be clearer. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 19:45 ` Andrew Morton @ 2006-12-16 8:03 ` KAMEZAWA Hiroyuki 2006-12-18 21:13 ` Dave Hansen 2006-12-18 22:54 ` Arnd Bergmann 0 siblings, 2 replies; 23+ messages in thread From: KAMEZAWA Hiroyuki @ 2006-12-16 8:03 UTC (permalink / raw) To: Andrew Morton Cc: linuxppc-dev, kmannth, linux-kernel, hch, linux-mm, paulus, mkravetz, gone, cbe-oss-dev On Fri, 15 Dec 2006 11:45:36 -0800 Andrew Morton <akpm@osdl.org> wrote: > Perhaps if the function's role in the world was commented it would be clearer. > How about patch like this ? (this one is not tested.) Already-exisiting-more-generic-flag is available ? -Kame == include/linux/memory_hotplug.h | 8 ++++++++ mm/memory_hotplug.c | 14 ++++++++++++++ mm/page_alloc.c | 11 +++++++---- 3 files changed, 29 insertions(+), 4 deletions(-) Index: linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h =================================================================== --- linux-2.6.20-rc1-mm1.orig/include/linux/memory_hotplug.h 2006-11-30 06:57:37.000000000 +0900 +++ linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h 2006-12-16 16:42:40.000000000 +0900 @@ -133,6 +133,10 @@ #endif /* CONFIG_NUMA */ #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ +extern int under_memory_hotadd(void); + + + #else /* ! CONFIG_MEMORY_HOTPLUG */ /* * Stub functions for when hotplug is off @@ -159,6 +163,10 @@ dump_stack(); return -ENOSYS; } +static inline int under_memory_hotadd() +{ + return 0; +} #endif /* ! CONFIG_MEMORY_HOTPLUG */ static inline int __remove_pages(struct zone *zone, unsigned long start_pfn, Index: linux-2.6.20-rc1-mm1/mm/memory_hotplug.c =================================================================== --- linux-2.6.20-rc1-mm1.orig/mm/memory_hotplug.c 2006-12-16 14:24:10.000000000 +0900 +++ linux-2.6.20-rc1-mm1/mm/memory_hotplug.c 2006-12-16 16:51:08.000000000 +0900 @@ -26,6 +26,17 @@ #include <asm/tlbflush.h> +/* + * Because memory hotplug shares some codes for initilization with boot, + * we sometimes have to check what we are doing ? + */ +static atomic_t memory_hotadd_count; + +int under_memory_hotadd(void) +{ + return atomic_read(&memory_hotadd_count); +} + /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(&memory_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(&memory_hotadd_count); + if (ret < 0) goto error; Index: linux-2.6.20-rc1-mm1/mm/page_alloc.c =================================================================== --- linux-2.6.20-rc1-mm1.orig/mm/page_alloc.c 2006-12-16 14:24:38.000000000 +0900 +++ linux-2.6.20-rc1-mm1/mm/page_alloc.c 2006-12-16 16:53:02.000000000 +0900 @@ -2069,10 +2069,13 @@ unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (!under_memory_hotadd()) { + /* we are in boot */ + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-16 8:03 ` KAMEZAWA Hiroyuki @ 2006-12-18 21:13 ` Dave Hansen 2006-12-18 22:15 ` Christoph Hellwig 2006-12-18 22:54 ` Arnd Bergmann 1 sibling, 1 reply; 23+ messages in thread From: Dave Hansen @ 2006-12-18 21:13 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, linux-mm, kmannth, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: > /* add this memory to iomem resource */ > static struct resource *register_memory_resource(u64 start, u64 size) > { > @@ -273,10 +284,13 @@ > if (ret) > goto error; > } > + atomic_inc(&memory_hotadd_count); > > /* call arch's memory hotadd */ > ret = arch_add_memory(nid, start, size); > > + atomic_dec(&memory_hotadd_count); I'd be willing to be that this will work just fine. But, I think we can do it without any static state at all, if we just pass a runtime-or-not flag down into the arch_add_memory() call chain. I'll code that up so we can compare to yours. -- Dave ^ permalink raw reply [flat|nested] 23+ messages in thread
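Roughly what that plumbing could look like, as a sketch only: the extra hotadd parameter is made up for illustration (and __add_pages() would have to grow it too), the zone choice is arch-specific, and the fix that was eventually posted passes the flag to memmap_init_zone()/init_currently_empty_zone() rather than changing arch_add_memory().

/* include/linux/memory_hotplug.h -- sketched prototype change */
extern int arch_add_memory(int nid, u64 start, u64 size, int hotadd);

/* mm/memory_hotplug.c -- the runtime hot-add path passes hotadd = 1 ... */
	ret = arch_add_memory(nid, start, size, 1);

/* ... and each architecture forwards the flag down toward
 * __add_pages()/memmap_init_zone(), e.g.: */
int arch_add_memory(int nid, u64 start, u64 size, int hotadd)
{
	unsigned long start_pfn = start >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
	struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_NORMAL];

	return __add_pages(zone, start_pfn, nr_pages, hotadd);
}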
* Re: [PATCH] Fix sparsemem on Cell 2006-12-18 21:13 ` Dave Hansen @ 2006-12-18 22:15 ` Christoph Hellwig 0 siblings, 0 replies; 23+ messages in thread From: Christoph Hellwig @ 2006-12-18 22:15 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, linux-mm, kmannth, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev, KAMEZAWA Hiroyuki On Mon, Dec 18, 2006 at 01:13:57PM -0800, Dave Hansen wrote: > On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: > > /* add this memory to iomem resource */ > > static struct resource *register_memory_resource(u64 start, u64 size) > > { > > @@ -273,10 +284,13 @@ > > if (ret) > > goto error; > > } > > + atomic_inc(&memory_hotadd_count); > > > > /* call arch's memory hotadd */ > > ret = arch_add_memory(nid, start, size); > > > > + atomic_dec(&memory_hotadd_count); > > I'd be willing to be that this will work just fine. But, I think we can > do it without any static state at all, if we just pass a runtime-or-not > flag down into the arch_add_memory() call chain. > > I'll code that up so we can compare to yours. Yes, I stronly concur that passing an explicit flag is much much better than any hack involving global state. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-16 8:03 ` KAMEZAWA Hiroyuki 2006-12-18 21:13 ` Dave Hansen @ 2006-12-18 22:54 ` Arnd Bergmann 2006-12-18 23:16 ` Dave Hansen 1 sibling, 1 reply; 23+ messages in thread From: Arnd Bergmann @ 2006-12-18 22:54 UTC (permalink / raw) To: linuxppc-dev Cc: Andrew Morton, mkravetz, linux-kernel, hch, linux-mm, paulus, kmannth, gone, cbe-oss-dev, KAMEZAWA Hiroyuki On Saturday 16 December 2006 09:03, KAMEZAWA Hiroyuki wrote: > @@ -273,10 +284,13 @@ > if (ret) > goto error; > } > + atomic_inc(&memory_hotadd_count); > > /* call arch's memory hotadd */ > ret = arch_add_memory(nid, start, size); > > + atomic_dec(&memory_hotadd_count); > + > if (ret < 0) > goto error; > This also doesn't fix the problem on cell, since at the time the bug happens we're not calling through this function, or arch_add_memory, at all, but rather invoke __add_pages directly. As BenH already mentioned, we shouldn't really call __add_pages at all. Let me attempt another fix that might address all cases. This is completely untested as of now, but also addresses Dave's latest comment. Arnd <>< diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 1a3d8a2..723d220 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -543,7 +543,7 @@ virtual_memmap_init (u64 start, u64 end, void *arg) if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - args->nid, args->zone, page_to_pfn(map_start)); + args->nid, args->zone, page_to_pfn(map_start), 1); return 0; } diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 7f2944d..1e52cd1 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -61,7 +61,7 @@ void memmap_init(unsigned long size, int nid, unsigned long zone, if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - nid, zone, page_to_pfn(map_start)); + nid, zone, page_to_pfn(map_start), 1); } } diff --git a/include/linux/mm.h b/include/linux/mm.h index a17b147..6d85068 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -978,7 +978,7 @@ extern int early_pfn_to_nid(unsigned long pfn); #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, int); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0c055a0..16c9930 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -71,7 +71,7 @@ static int __add_zone(struct zone *zone, unsigned long phys_start_pfn) if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn, 0); return 0; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8c1a116..60d1ac8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1953,17 +1953,19 @@ static inline unsigned long wait_table_bits(unsigned long size) * done. Non-atomic initialization, single-pass.
*/ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, int early) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (early) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -1990,7 +1992,7 @@ void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone, #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) #endif static int __cpuinit zone_batchsize(struct zone *zone) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-18 22:54 ` Arnd Bergmann @ 2006-12-18 23:16 ` Dave Hansen 2006-12-19 0:16 ` KAMEZAWA Hiroyuki 2006-12-19 8:59 ` Arnd Bergmann 0 siblings, 2 replies; 23+ messages in thread From: Dave Hansen @ 2006-12-18 23:16 UTC (permalink / raw) To: Arnd Bergmann Cc: Andrew Morton, mkravetz, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, kmannth, gone, cbe-oss-dev, KAMEZAWA Hiroyuki On Mon, 2006-12-18 at 23:54 +0100, Arnd Bergmann wrote: > #ifndef __HAVE_ARCH_MEMMAP_INIT > #define memmap_init(size, nid, zone, start_pfn) \ > - memmap_init_zone((size), (nid), (zone), (start_pfn)) > + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) > #endif This is what I was thinking of. Sometimes I find these kinds of calls a bit annoying: foo(0, 1, 1, 0, 99, 22) It only takes a minute to look up what all of the numbers do, but that is one minute too many. :) How about an enum, or a pair of #defines? enum context { EARLY, HOTPLUG }; extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, enum context); ... So, the call I quoted above would become: memmap_init_zone((size), (nid), (zone), (start_pfn), EARLY) -- Dave ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-18 23:16 ` Dave Hansen @ 2006-12-19 0:16 ` KAMEZAWA Hiroyuki 2006-12-19 8:59 ` Arnd Bergmann 1 sibling, 0 replies; 23+ messages in thread From: KAMEZAWA Hiroyuki @ 2006-12-19 0:16 UTC (permalink / raw) To: Dave Hansen Cc: akpm, mkravetz, arnd, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, kmannth, gone, cbe-oss-dev On Mon, 18 Dec 2006 15:16:20 -0800 Dave Hansen <haveblue@us.ibm.com> wrote: > enum context > { > EARLY, > HOTPLUG > }; I like this :) Thanks, -Kame ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-18 23:16 ` Dave Hansen 2006-12-19 0:16 ` KAMEZAWA Hiroyuki @ 2006-12-19 8:59 ` Arnd Bergmann 2006-12-19 19:34 ` Dave Hansen 2007-01-06 1:10 ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen 1 sibling, 2 replies; 23+ messages in thread From: Arnd Bergmann @ 2006-12-19 8:59 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, mkravetz, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, kmannth, gone, cbe-oss-dev, KAMEZAWA Hiroyuki On Tuesday 19 December 2006 00:16, Dave Hansen wrote: > How about an enum, or a pair of #defines? > > enum context > { > EARLY, > HOTPLUG > }; Sounds good, but since this is in a global header file, it needs to be in an appropriate name space, like enum memmap_context { MEMMAP_EARLY, MEMMAP_HOTPLUG, }; Arnd <>< ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-19 8:59 ` Arnd Bergmann @ 2006-12-19 19:34 ` Dave Hansen 2007-01-06 1:10 ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen 1 sibling, 0 replies; 23+ messages in thread From: Dave Hansen @ 2006-12-19 19:34 UTC (permalink / raw) To: Arnd Bergmann Cc: Andrew Morton, mkravetz, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, kmannth, gone, KAMEZAWA Hiroyuki Pulling cbe... list off the cc because it is giving annoying 'too many recipients' warnings. On Tue, 2006-12-19 at 09:59 +0100, Arnd Bergmann wrote: > On Tuesday 19 December 2006 00:16, Dave Hansen wrote: > > How about an enum, or a pair of #defines? > > > > enum context > > { > > EARLY, > > HOTPLUG > > }; This patch should do the trick. Arnd, can you try this to make sure it solves the issue? Also, if you get a chance, could you do a quick s390 boot of it? It was the only arch touched by this whole thing. It compiles on all of my i386 .configs. -- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/arch/s390/mm/vmem.c | 3 ++- lxc-dave/include/linux/mm.h | 7 ++++++- lxc-dave/include/linux/mmzone.h | 3 ++- lxc-dave/mm/memory_hotplug.c | 6 ++++-- lxc-dave/mm/page_alloc.c | 25 +++++++++++++++++-------- 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-19 11:18:24.000000000 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* + * There can be holes in boot-time mem_map[]s + * handed to this function. They do not + * exist on hotplugged memory. 
+ */ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone->zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.000000000 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.000000000 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.000000000 -0800 +++ lxc-dave/mm/memory_hotplug.c 2006-12-19 09:50:24.000000000 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat->node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, + phys_start_pfn, MEMMAP_HOTPLUG); return 0; } diff -puN arch/s390/mm/vmem.c~sparsemem-enum1 arch/s390/mm/vmem.c --- lxc/arch/s390/mm/vmem.c~sparsemem-enum1 2006-12-19 09:42:26.000000000 -0800 +++ lxc-dave/arch/s390/mm/vmem.c 2006-12-19 11:17:49.000000000 -0800 @@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - nid, zone, page_to_pfn(map_start)); + nid, zone, page_to_pfn(map_start), + context); } } diff -puN include/linux/mmzone.h~sparsemem-enum1 include/linux/mmzone.h --- lxc/include/linux/mmzone.h~sparsemem-enum1 2006-12-19 09:48:58.000000000 -0800 +++ lxc-dave/include/linux/mmzone.h 2006-12-19 09:53:42.000000000 -0800 @@ -474,7 +474,8 @@ int zone_watermark_ok(struct zone *z, in int classzone_idx, int alloc_flags); extern int init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, - unsigned long size); + unsigned long size, + enum memmap_context context); #ifdef CONFIG_HAVE_MEMORY_PRESENT void memory_present(int nid, unsigned 
long start, unsigned long end); _ -- Dave ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Fix sparsemem on Cell (take 3) 2006-12-19 8:59 ` Arnd Bergmann 2006-12-19 19:34 ` Dave Hansen @ 2007-01-06 1:10 ` Dave Hansen 2007-01-06 4:52 ` John Rose 1 sibling, 1 reply; 23+ messages in thread From: Dave Hansen @ 2007-01-06 1:10 UTC (permalink / raw) To: Andrew Morton Cc: Andrew Morton, mkravetz, Arnd Bergmann, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, Keith Mannthey, gone, cbe-oss-dev, KAMEZAWA Hiroyuki I dropped this on the floor over Christmas. This has had a few smoke tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/arch/s390/mm/vmem.c | 3 ++- lxc-dave/include/linux/mm.h | 7 ++++++- lxc-dave/include/linux/mmzone.h | 3 ++- lxc-dave/mm/memory_hotplug.c | 6 ++++-- lxc-dave/mm/page_alloc.c | 25 +++++++++++++++++-------- 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-19 11:18:24.000000000 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* + * There can be holes in boot-time mem_map[]s + * handed to this function. They do not + * exist on hotplugged memory. 
+ */ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone->zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.000000000 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.000000000 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.000000000 -0800 +++ lxc-dave/mm/memory_hotplug.c 2006-12-19 09:50:24.000000000 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat->node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, + phys_start_pfn, MEMMAP_HOTPLUG); return 0; } diff -puN arch/s390/mm/vmem.c~sparsemem-enum1 arch/s390/mm/vmem.c --- lxc/arch/s390/mm/vmem.c~sparsemem-enum1 2006-12-19 09:42:26.000000000 -0800 +++ lxc-dave/arch/s390/mm/vmem.c 2006-12-19 11:17:49.000000000 -0800 @@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - nid, zone, page_to_pfn(map_start)); + nid, zone, page_to_pfn(map_start), + context); } } diff -puN include/linux/mmzone.h~sparsemem-enum1 include/linux/mmzone.h --- lxc/include/linux/mmzone.h~sparsemem-enum1 2006-12-19 09:48:58.000000000 -0800 +++ lxc-dave/include/linux/mmzone.h 2006-12-19 09:53:42.000000000 -0800 @@ -474,7 +474,8 @@ int zone_watermark_ok(struct zone *z, in int classzone_idx, int alloc_flags); extern int init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, - unsigned long size); + unsigned long size, + enum memmap_context context); #ifdef CONFIG_HAVE_MEMORY_PRESENT void memory_present(int nid, unsigned 
long start, unsigned long end); _ -- Dave ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell (take 3) 2007-01-06 1:10 ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen @ 2007-01-06 4:52 ` John Rose 2007-01-07 8:58 ` Dave Hansen 0 siblings, 1 reply; 23+ messages in thread From: John Rose @ 2007-01-06 4:52 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, mkravetz, Arnd Bergmann, linux-mm, lkml, hch, External List, Paul Mackerras, kmannth, gone, cbe-oss-dev, KAMEZAWA Hiroyuki > I dropped this on the floor over Christmas. This has had a few smoke > tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. Could this break ia64, given that it uses memmap_init_zone()? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell (take 3) 2007-01-06 4:52 ` John Rose @ 2007-01-07 8:58 ` Dave Hansen 2007-01-07 12:07 ` Arnd Bergmann ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Dave Hansen @ 2007-01-07 8:58 UTC (permalink / raw) To: John Rose Cc: Andrew Morton, mkravetz, Arnd Bergmann, linux-mm, lkml, hch, External List, Paul Mackerras, kmannth, gone, KAMEZAWA Hiroyuki On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: > > I dropped this on the floor over Christmas. This has had a few smoke > > tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. > > Could this break ia64, given that it uses memmap_init_zone()? You are right, I think it does. Here's an updated patch to replace the earlier one. I had to move the enum definition over to a different header because ia64 evidently has a different include order. --- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ia64, s390, i386 and x86_64. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/arch/ia64/mm/init.c | 5 +++-- lxc-dave/arch/s390/mm/vmem.c | 3 ++- lxc-dave/include/linux/mm.h | 3 ++- lxc-dave/include/linux/mmzone.h | 8 ++++++-- lxc-dave/mm/memory_hotplug.c | 6 ++++-- lxc-dave/mm/page_alloc.c | 25 +++++++++++++++++-------- 6 files changed, 34 insertions(+), 16 deletions(-) diff -puN arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell arch/s390/mm/vmem.c --- lxc/arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.000000000 -0800 +++ lxc-dave/arch/s390/mm/vmem.c 2007-01-07 00:47:02.000000000 -0800 @@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - nid, zone, page_to_pfn(map_start)); + nid, zone, page_to_pfn(map_start), + MEMMAP_EARLY); } } diff -puN include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mm.h --- lxc/include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.000000000 -0800 +++ lxc-dave/include/linux/mm.h 2007-01-06 23:57:59.000000000 -0800 @@ -979,7 +979,8 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mmzone.h --- lxc/include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.000000000 -0800 +++ lxc-dave/include/linux/mmzone.h 2007-01-06 23:58:15.000000000 -0800 @@ -471,9 +471,13 @@ void build_all_zonelists(void); void wakeup_kswapd(struct zone *zone, int order); int zone_watermark_ok(struct zone *z, int order, unsigned long mark, int classzone_idx, int alloc_flags); - +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; extern int init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, - unsigned long size); + unsigned long size, + enum memmap_context context); #ifdef CONFIG_HAVE_MEMORY_PRESENT void memory_present(int nid, 
unsigned long start, unsigned long end); diff -puN mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.000000000 -0800 +++ lxc-dave/mm/memory_hotplug.c 2007-01-05 15:38:23.000000000 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat->node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, + phys_start_pfn, MEMMAP_HOTPLUG); return 0; } diff -puN mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/page_alloc.c --- lxc/mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2007-01-07 00:35:27.000000000 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* + * There can be holes in boot-time mem_map[]s + * handed to this function. They do not + * exist on hotplugged memory. + */ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone->zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN arch/ia64/mm/init.c~Re-_PATCH_Fix_sparsemem_on_Cell arch/ia64/mm/init.c --- lxc/arch/ia64/mm/init.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-06 23:58:55.000000000 -0800 +++ lxc-dave/arch/ia64/mm/init.c 2007-01-07 00:08:01.000000000 -0800 @@ -541,7 +541,8 @@ virtual_memmap_init (u64 start, u64 end, if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), - args->nid, args->zone, page_to_pfn(map_start)); + args->nid, args->zone, page_to_pfn(map_start), + MEMMAP_EARLY); return 0; } @@ -550,7 +551,7 @@ memmap_init (unsigned long size, int nid unsigned long start_pfn) { if (!vmem_map) - memmap_init_zone(size, nid, zone, start_pfn); + memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY); else { struct page *start; struct memmap_init_callback_data args; _ -- Dave ^ permalink raw reply [flat|nested] 23+ 
messages in thread
* Re: [PATCH] Fix sparsemem on Cell (take 3) 2007-01-07 8:58 ` Dave Hansen @ 2007-01-07 12:07 ` Arnd Bergmann 2007-01-08 6:31 ` Tim Pepper 2007-01-08 6:47 ` Tim Pepper 2 siblings, 0 replies; 23+ messages in thread From: Arnd Bergmann @ 2007-01-07 12:07 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, mkravetz, linux-mm, lkml, hch, External List, Paul Mackerras, kmannth, gone, KAMEZAWA Hiroyuki On Sunday 07 January 2007 09:58, Dave Hansen wrote: > The following patch fixes an oops experienced on the Cell architecture > when init-time functions, early_*(), are called at runtime. It alters > the call paths to make sure that the callers explicitly say whether the > call is being made on behalf of a hotplug even, or happening at > boot-time. > > It has been compile tested on ia64, s390, i386 and x86_64. I can't test it here, since I'm travelling at the moment, but this version looks good to me. Thanks for picking it up again! > Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Arnd Bergmann <arndb@de.ibm.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell (take 3) 2007-01-07 8:58 ` Dave Hansen 2007-01-07 12:07 ` Arnd Bergmann @ 2007-01-08 6:31 ` Tim Pepper 2007-01-08 6:47 ` Tim Pepper 2 siblings, 0 replies; 23+ messages in thread From: Tim Pepper @ 2007-01-08 6:31 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, mkravetz, Arnd Bergmann, linux-mm, lkml, hch, External List, Paul Mackerras, kmannth, gone, KAMEZAWA Hiroyuki [-- Attachment #1: Type: text/plain, Size: 457 bytes --] On 1/7/07, Dave Hansen <haveblue@us.ibm.com> wrote: > > On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: > > > Could this break ia64, given that it uses memmap_init_zone()? > > You are right, I think it does. > > Here's an updated patch to replace the earlier one. I had to move the > enum definition over to a different header because ia64 evidently has a > different include order. Boot tested OK on ia64 with this latest version of the patch. Tim [-- Attachment #2: Type: text/html, Size: 773 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell (take 3) 2007-01-07 8:58 ` Dave Hansen 2007-01-07 12:07 ` Arnd Bergmann 2007-01-08 6:31 ` Tim Pepper @ 2007-01-08 6:47 ` Tim Pepper 2 siblings, 0 replies; 23+ messages in thread From: Tim Pepper @ 2007-01-08 6:47 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Morton, mkravetz, Arnd Bergmann, linux-mm, lkml, hch, External List, Paul Mackerras, kmannth, gone, KAMEZAWA Hiroyuki On 1/7/07, Dave Hansen <haveblue@us.ibm.com> wrote: > On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: > > Could this break ia64, given that it uses memmap_init_zone()? > > You are right, I think it does. Boot tested OK on ia64 with this latest version of the patch. (forgot to click plain text on gmail the first time..sorry if you got html mail or repeat) Tim ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen 2006-12-15 17:11 ` Andy Whitcroft @ 2006-12-15 17:22 ` Michael Buesch 2006-12-15 17:57 ` Dave Hansen 2006-12-15 20:21 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 23+ messages in thread From: Michael Buesch @ 2006-12-15 17:22 UTC (permalink / raw) To: Dave Hansen Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev On Friday 15 December 2006 17:53, Dave Hansen wrote: > > I think the comments added say it pretty well, but I'll repeat it here. > > This fix is pretty similar in concept to the one that Arnd posted > as a temporary workaround, but I've added a few comments explaining > what the actual assumptions are, and improved it a wee little bit. > > The end goal here is to simply avoid calling the early_*() functions > when it is _not_ early. Those functions stop working as soon as > free_initmem() is called. system_state is set to SYSTEM_RUNNING > just after free_initmem() is called, so it seems appropriate to use > here. > > I did think twice about actually using SYSTEM_RUNNING because we > moved away from it in other parts of memory hotplug, but those > were actually for _allocations_ in favor of slab_is_available(), > and we really don't care about the slab here. > > The only other assumption is that all memory-hotplug-time pages > given to memmap_init_zone() are valid and able to be onlined into > any any zone after the system is running. The "valid" part is > really just a question of whether or not a 'struct page' is there > for the pfn, and *not* whether there is actual memory. Since > all sparsemem sections have contiguous mem_map[]s within them, > and we only memory hotplug entire sparsemem sections, we can > be confident that this assumption will hold. > > As for the memory being in the right node, we'll assume tha > memory hotplug is putting things in the right node. > > Signed-off-by: Dave Hansen <haveblue@us.ibm.com> > > --- > > lxc-dave/init/main.c | 4 ++++ > lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- > 2 files changed, 29 insertions(+), 3 deletions(-) > > diff -puN init/main.c~sparsemem-fix init/main.c > --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 > @@ -770,6 +770,10 @@ static int init(void * unused) > free_initmem(); > unlock_kernel(); > mark_rodata_ro(); > + /* > + * Memory hotplug requires that this system_state transition > + * happer after free_initmem(). (see memmap_init_zone()) s/happer/happens/ Other than that, can't this possibly race and crash here? I mean, it's not a big race window, but it can happen, no? > + */ > system_state = SYSTEM_RUNNING; > numa_default_policy(); > > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800 > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b > > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) > > +static int can_online_pfn_into_nid(unsigned long pfn, int nid) > +{ > + /* > + * There are two things that make this work: > + * 1. The early_pfn...() functions are __init and > + * use __initdata. If the system is < SYSTEM_RUNNING, > + * those functions and their data will still exist. > + * 2. 
We also assume that all actual memory hotplug > + * (as opposed to boot-time) calls to this are only > + * for contiguous memory regions. With sparsemem, > + * this guaranteed is easy because all sections are > + * contiguous and we never online more than one > + * section at a time. Boot-time memory can have holes > + * anywhere. > + */ > + if (system_state >= SYSTEM_RUNNING) > + return 1; > + if (!early_pfn_valid(pfn)) > + return 0; > + if (!early_pfn_in_nid(pfn, nid)) > + return 0; > + return 1; > +} > + > /* > * Initially all pages are reserved - free ones are freed > * up by free_all_bootmem() once the early boot process is > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned > unsigned long pfn; > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > - if (!early_pfn_valid(pfn)) > - continue; > - if (!early_pfn_in_nid(pfn, nid)) > + if (!can_online_pfn_into_nid(pfn)) > continue; > page = pfn_to_page(pfn); > set_page_links(page, zone, nid, pfn); > _ > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > -- Greetings Michael. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 17:22 ` [PATCH] Fix sparsemem on Cell Michael Buesch @ 2006-12-15 17:57 ` Dave Hansen 0 siblings, 0 replies; 23+ messages in thread From: Dave Hansen @ 2006-12-15 17:57 UTC (permalink / raw) To: Michael Buesch Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev, naveen.b.s On Fri, 2006-12-15 at 18:22 +0100, Michael Buesch wrote: > On Friday 15 December 2006 17:53, Dave Hansen wrote: > > lxc-dave/init/main.c | 4 ++++ > > lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- > > 2 files changed, 29 insertions(+), 3 deletions(-) > > > > diff -puN init/main.c~sparsemem-fix init/main.c > > --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 > > +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 > > @@ -770,6 +770,10 @@ static int init(void * unused) > > free_initmem(); > > unlock_kernel(); > > mark_rodata_ro(); > > + /* > > + * Memory hotplug requires that this system_state transition > > + * happer after free_initmem(). (see memmap_init_zone()) > > s/happer/happens/ > > Other than that, can't this possibly race and crash here? > I mean, it's not a big race window, but it can happen, no? That's a good point. Nice eye. There are three routes in here: boot-time init, an ACPI call, and a write to a sysfs file. Bootmem is taken care of. The write to a sysfs file can't happen yet because userspace isn't up. The only question would be about ACPI. I _guess_ an ACPI event could come in at any time, and could hit this race window. One other thought I had was to add an argument to memmap_init_zone() to indicate that the memory being fed to it was contiguous and did not need the validation checks. Anybody have thoughts on that? -- Dave ^ permalink raw reply [flat|nested] 23+ messages in thread
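To make the suggestion at the end of that reply concrete, here is a minimal sketch of an extra-argument variant of memmap_init_zone(). The enum name, its values, and the exact parameter layout are illustrative assumptions, not taken from any patch in this thread; the per-page initialization body is elided.

/* Sketch only: the caller states whether the range may contain holes. */
enum memmap_context {
	MEMMAP_EARLY,	/* boot-time init: holes possible, early_*() data still valid */
	MEMMAP_HOTPLUG,	/* hotplug: one contiguous sparsemem section at a time */
};

void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
				unsigned long start_pfn, enum memmap_context context)
{
	unsigned long end_pfn = start_pfn + size;
	unsigned long pfn;
	struct page *page;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* Only boot-time ranges need the early_*() validation. */
		if (context == MEMMAP_EARLY) {
			if (!early_pfn_valid(pfn))
				continue;
			if (!early_pfn_in_nid(pfn, nid))
				continue;
		}
		page = pfn_to_page(pfn);
		set_page_links(page, zone, nid, pfn);
		/* ... rest of the existing per-page initialization ... */
	}
}

With something along these lines, each caller states its context explicitly, and both the system_state test and the race Michael points out go away.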
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen 2006-12-15 17:11 ` Andy Whitcroft 2006-12-15 17:22 ` [PATCH] Fix sparsemem on Cell Michael Buesch @ 2006-12-15 20:21 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 23+ messages in thread From: Benjamin Herrenschmidt @ 2006-12-15 20:21 UTC (permalink / raw) To: Dave Hansen Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone, cbe-oss-dev > The only other assumption is that all memory-hotplug-time pages > given to memmap_init_zone() are valid and able to be onlined into > any any zone after the system is running. The "valid" part is > really just a question of whether or not a 'struct page' is there > for the pfn, and *not* whether there is actual memory. Since > all sparsemem sections have contiguous mem_map[]s within them, > and we only memory hotplug entire sparsemem sections, we can > be confident that this assumption will hold. > > As for the memory being in the right node, we'll assume tha > memory hotplug is putting things in the right node. BTW, just so people know, what we are adding isn't even memory :-) We are calling __add_pages() to create struct page for the SPE local stores and register space as we use them later from a nopage() handler (and no, we can't use no_pfn just yet for various reasons, notably we need to handle races with unmap_mapping_range() and thus have the truncate logic in). Those pages, thus, must never be onlined. Ever. It might make sense to create a way to inform memory hotplug of that fact, but on the other hand, I wouldn't bother as I have a plan to get rid of those __add_pages() completely and work without struct page, maybe in a 2.6.21 timeframe. Ben. ^ permalink raw reply [flat|nested] 23+ messages in thread
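For readers unfamiliar with the usage Ben describes, the pattern is roughly the one sketched below: give the SPE local store and register area struct pages via __add_pages() so that pfn_to_page() works for them, then hand those pages out from a nopage() handler, without ever onlining them. This is an illustrative sketch under stated assumptions, not the spufs code; the zone choice, the helper names, and the use of vm_pgoff to carry the base pfn are all assumptions.

/* Illustrative only -- not the actual spufs implementation. */
static int sketch_add_device_pages(int nid, unsigned long phys, unsigned long size)
{
	unsigned long start_pfn = phys >> PAGE_SHIFT;
	unsigned long nr_pages = size >> PAGE_SHIFT;
	/* which zone these hang off of is an assumption made for the sketch */
	struct zone *zone = &NODE_DATA(nid)->node_zones[ZONE_DMA];

	/* creates mem_map entries so pfn_to_page() becomes valid for the range */
	return __add_pages(zone, start_pfn, nr_pages);
}

static struct page *sketch_nopage(struct vm_area_struct *vma,
				  unsigned long address, int *type)
{
	/* assume mmap() stashed the base pfn of the area in vm_pgoff */
	unsigned long pfn = vma->vm_pgoff +
			    ((address - vma->vm_start) >> PAGE_SHIFT);
	struct page *page = pfn_to_page(pfn);	/* backed by __add_pages() above */

	get_page(page);
	if (type)
		*type = VM_FAULT_MINOR;
	return page;
}

Because these pages are never passed to online_pages(), they never reach the page allocator; the only thing __add_pages() buys here is a valid struct page for the fault path.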
* [PATCH] Fix sparsemem on Cell @ 2006-12-15 17:14 Dave Hansen 2006-12-17 23:02 ` Arnd Bergmann 0 siblings, 1 reply; 23+ messages in thread From: Dave Hansen @ 2006-12-15 17:14 UTC (permalink / raw) To: cbe-oss-dev Cc: akpm, linux-mm, linux-kernel, hch, linuxppc-dev, paulus, mkravetz, gone I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> --- lxc-dave/init/main.c | 4 ++++ lxc-dave/mm/page_alloc.c | 28 +++++++++++++++++++++++++--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/init/main.c 2006-12-15 08:49:53.000000000 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* + * Memory hotplug requires that this system_state transition + * happer after free_initmem(). (see memmap_init_zone()) + */ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.000000000 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.000000000 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* + * There are two things that make this work: + * 1. The early_pfn...() functions are __init and + * use __initdata. If the system is < SYSTEM_RUNNING, + * those functions and their data will still exist. + * 2. We also assume that all actual memory hotplug + * (as opposed to boot-time) calls to this are only + * for contiguous memory regions. With sparsemem, + * this guaranteed is easy because all sections are + * contiguous and we never online more than one + * section at a time. Boot-time memory can have holes + * anywhere. 
+ */ + if (system_state >= SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ ^ permalink raw reply [flat|nested] 23+ messages in thread
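One detail worth noting: can_online_pfn_into_nid() above takes both a pfn and a node id, but the new call in the memmap_init_zone() hunk passes only the pfn, so the patch as posted would not compile. Presumably the loop is meant to forward nid as well, along these lines (a sketch of the intended call site, not a quote from any posted version):

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!can_online_pfn_into_nid(pfn, nid))
			continue;
		page = pfn_to_page(pfn);
		set_page_links(page, zone, nid, pfn);
		/* ... */
	}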
* Re: [PATCH] Fix sparsemem on Cell 2006-12-15 17:14 Dave Hansen @ 2006-12-17 23:02 ` Arnd Bergmann 0 siblings, 0 replies; 23+ messages in thread From: Arnd Bergmann @ 2006-12-17 23:02 UTC (permalink / raw) To: linuxppc-dev Cc: akpm, linux-kernel, hch, linux-mm, paulus, mkravetz, gone, cbe-oss-dev On Friday 15 December 2006 18:14, Dave Hansen wrote: > + if (system_state >= SYSTEM_RUNNING) > + return 1; > + if (!early_pfn_valid(pfn)) > + return 0; > + if (!early_pfn_in_nid(pfn, nid)) > + return 0; I haven't tried it, but I assume this is still wrong. On cell, we didn't actually hit the case where the init sections have been overwritten, since we call __add_pages from an initcall. However, the pages we add are not part of the early_node_map, so early_pfn_in_nid() returns a bogus result, causing some page structs not to get initialized. I believe your patch is going in the right direction, but it does not solve the bug we have... Arnd <>< ^ permalink raw reply [flat|nested] 23+ messages in thread
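The failure mode Arnd describes follows from how early_pfn_in_nid() is answered: by scanning the boot-time early_node_map[], which only ever contains ranges registered with add_active_range() during boot. A simplified sketch of that lookup (paraphrased, not a literal copy of the generic implementation of that era; names are approximate) shows why a pfn added later by __add_pages() gets a bogus answer:

/* Simplified sketch -- structure and names approximate the real code. */
static int sketch_early_pfn_to_nid(unsigned long pfn)
{
	int i;

	for (i = 0; i < nr_nodemap_entries; i++) {
		if (early_node_map[i].start_pfn <= pfn &&
		    pfn < early_node_map[i].end_pfn)
			return early_node_map[i].nid;
	}
	/*
	 * A pfn that was never registered at boot falls through to a
	 * default answer, so early_pfn_in_nid(pfn, nid) can disagree
	 * with the node the hotplug caller actually asked for.
	 */
	return 0;
}

#define sketch_early_pfn_in_nid(pfn, nid) \
	(sketch_early_pfn_to_nid(pfn) == (nid))

When that comparison fails, memmap_init_zone() skips the pfn and its struct page is never initialized, which is exactly the symptom Arnd reports.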
end of thread, other threads:[~2007-01-08 6:47 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen 2006-12-15 17:11 ` Andy Whitcroft 2006-12-15 17:24 ` Dave Hansen 2006-12-15 19:45 ` Andrew Morton 2006-12-16 8:03 ` KAMEZAWA Hiroyuki 2006-12-18 21:13 ` Dave Hansen 2006-12-18 22:15 ` Christoph Hellwig 2006-12-18 22:54 ` Arnd Bergmann 2006-12-18 23:16 ` Dave Hansen 2006-12-19 0:16 ` KAMEZAWA Hiroyuki 2006-12-19 8:59 ` Arnd Bergmann 2006-12-19 19:34 ` Dave Hansen 2007-01-06 1:10 ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen 2007-01-06 4:52 ` John Rose 2007-01-07 8:58 ` Dave Hansen 2007-01-07 12:07 ` Arnd Bergmann 2007-01-08 6:31 ` Tim Pepper 2007-01-08 6:47 ` Tim Pepper 2006-12-15 17:22 ` [PATCH] Fix sparsemem on Cell Michael Buesch 2006-12-15 17:57 ` Dave Hansen 2006-12-15 20:21 ` Benjamin Herrenschmidt -- strict thread matches above, loose matches on Subject: below -- 2006-12-15 17:14 Dave Hansen 2006-12-17 23:02 ` Arnd Bergmann