linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michael Buesch <mb@bu3sch.de>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: akpm@osdl.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hch@infradead.org, linuxppc-dev@ozlabs.org, paulus@samba.org,
	mkravetz@us.ibm.com, gone@us.ibm.com, cbe-oss-dev@ozlabs.org
Subject: Re: [PATCH] Fix sparsemem on Cell
Date: Fri, 15 Dec 2006 18:22:23 +0100	[thread overview]
Message-ID: <200612151822.23764.mb@bu3sch.de> (raw)
In-Reply-To: <20061215165335.61D9F775@localhost.localdomain>

On Friday 15 December 2006 17:53, Dave Hansen wrote:
> 
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.
> 
> I did think twice about actually using SYSTEM_RUNNING because we
> moved away from it in other parts of memory hotplug, but those
> were actually for _allocations_ in favor of slab_is_available(),
> and we really don't care about the slab here.
> 
> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.
> 
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> 
> ---
> 
>  lxc-dave/init/main.c     |    4 ++++
>  lxc-dave/mm/page_alloc.c |   28 +++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff -puN init/main.c~sparsemem-fix init/main.c
> --- lxc/init/main.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/init/main.c	2006-12-15 08:49:53.000000000 -0800
> @@ -770,6 +770,10 @@ static int init(void * unused)
>  	free_initmem();
>  	unlock_kernel();
>  	mark_rodata_ro();
> +	/*
> +	 * Memory hotplug requires that this system_state transition
> +	 * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

> +	 */
>  	system_state = SYSTEM_RUNNING;
>  	numa_default_policy();
>  
> diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> --- lxc/mm/page_alloc.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/mm/page_alloc.c	2006-12-15 08:49:53.000000000 -0800
> @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
>  
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
> +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> +{
> +	/*
> +	 * There are two things that make this work:
> +	 * 1. The early_pfn...() functions are __init and
> +	 *    use __initdata.  If the system is < SYSTEM_RUNNING,
> +	 *    those functions and their data will still exist.
> +	 * 2. We also assume that all actual memory hotplug
> +	 *    (as opposed to boot-time) calls to this are only
> +	 *    for contiguous memory regions.  With sparsemem,
> +	 *    this guaranteed is easy because all sections are
> +	 *    contiguous and we never online more than one
> +	 *    section at a time.  Boot-time memory can have holes
> +	 *    anywhere.
> +	 */
> +	if (system_state >= SYSTEM_RUNNING)
> +		return 1;
> +	if (!early_pfn_valid(pfn))
> +		return 0;
> +	if (!early_pfn_in_nid(pfn, nid))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Initially all pages are reserved - free ones are freed
>   * up by free_all_bootmem() once the early boot process is
> @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
>  	unsigned long pfn;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> -		if (!early_pfn_valid(pfn))
> -			continue;
> -		if (!early_pfn_in_nid(pfn, nid))
> +		if (!can_online_pfn_into_nid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
>  		set_page_links(page, zone, nid, pfn);
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

-- 
Greetings Michael.

  parent reply	other threads:[~2006-12-15 17:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen
2006-12-15 17:11 ` Andy Whitcroft
2006-12-15 17:24   ` Dave Hansen
2006-12-15 19:45     ` Andrew Morton
2006-12-16  8:03       ` KAMEZAWA Hiroyuki
2006-12-18 21:13         ` Dave Hansen
2006-12-18 22:15           ` Christoph Hellwig
2006-12-18 22:54         ` Arnd Bergmann
2006-12-18 23:16           ` Dave Hansen
2006-12-19  0:16             ` KAMEZAWA Hiroyuki
2006-12-19  8:59             ` Arnd Bergmann
2006-12-19 19:34               ` Dave Hansen
2007-01-06  1:10               ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen
2007-01-06  4:52                 ` John Rose
2007-01-07  8:58                   ` Dave Hansen
2007-01-07 12:07                     ` Arnd Bergmann
2007-01-08  6:31                     ` Tim Pepper
2007-01-08  6:47                     ` Tim Pepper
2006-12-15 17:22 ` Michael Buesch [this message]
2006-12-15 17:57   ` [PATCH] Fix sparsemem on Cell Dave Hansen
2006-12-15 20:21 ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2006-12-15 17:14 Dave Hansen
2006-12-17 23:02 ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200612151822.23764.mb@bu3sch.de \
    --to=mb@bu3sch.de \
    --cc=akpm@osdl.org \
    --cc=cbe-oss-dev@ozlabs.org \
    --cc=gone@us.ibm.com \
    --cc=haveblue@us.ibm.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mkravetz@us.ibm.com \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).