All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Buesch <mb@bu3sch.de>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: akpm@osdl.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hch@infradead.org, linuxppc-dev@ozlabs.org, paulus@samba.org,
	mkravetz@us.ibm.com, gone@us.ibm.com, cbe-oss-dev@ozlabs.org
Subject: Re: [PATCH] Fix sparsemem on Cell
Date: Fri, 15 Dec 2006 18:22:23 +0100	[thread overview]
Message-ID: <200612151822.23764.mb@bu3sch.de> (raw)
In-Reply-To: <20061215165335.61D9F775@localhost.localdomain>

On Friday 15 December 2006 17:53, Dave Hansen wrote:
> 
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.
> 
> I did think twice about actually using SYSTEM_RUNNING because we
> moved away from it in other parts of memory hotplug, but those
> were actually for _allocations_ in favor of slab_is_available(),
> and we really don't care about the slab here.
> 
> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.
> 
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> 
> ---
> 
>  lxc-dave/init/main.c     |    4 ++++
>  lxc-dave/mm/page_alloc.c |   28 +++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff -puN init/main.c~sparsemem-fix init/main.c
> --- lxc/init/main.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/init/main.c	2006-12-15 08:49:53.000000000 -0800
> @@ -770,6 +770,10 @@ static int init(void * unused)
>  	free_initmem();
>  	unlock_kernel();
>  	mark_rodata_ro();
> +	/*
> +	 * Memory hotplug requires that this system_state transition
> +	 * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

> +	 */
>  	system_state = SYSTEM_RUNNING;
>  	numa_default_policy();
>  
> diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> --- lxc/mm/page_alloc.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/mm/page_alloc.c	2006-12-15 08:49:53.000000000 -0800
> @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
>  
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
> +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> +{
> +	/*
> +	 * There are two things that make this work:
> +	 * 1. The early_pfn...() functions are __init and
> +	 *    use __initdata.  If the system is < SYSTEM_RUNNING,
> +	 *    those functions and their data will still exist.
> +	 * 2. We also assume that all actual memory hotplug
> +	 *    (as opposed to boot-time) calls to this are only
> +	 *    for contiguous memory regions.  With sparsemem,
> +	 *    this guaranteed is easy because all sections are
> +	 *    contiguous and we never online more than one
> +	 *    section at a time.  Boot-time memory can have holes
> +	 *    anywhere.
> +	 */
> +	if (system_state >= SYSTEM_RUNNING)
> +		return 1;
> +	if (!early_pfn_valid(pfn))
> +		return 0;
> +	if (!early_pfn_in_nid(pfn, nid))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Initially all pages are reserved - free ones are freed
>   * up by free_all_bootmem() once the early boot process is
> @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
>  	unsigned long pfn;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> -		if (!early_pfn_valid(pfn))
> -			continue;
> -		if (!early_pfn_in_nid(pfn, nid))
> +		if (!can_online_pfn_into_nid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
>  		set_page_links(page, zone, nid, pfn);
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

-- 
Greetings Michael.

WARNING: multiple messages have this Message-ID (diff)
From: Michael Buesch <mb@bu3sch.de>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linuxppc-dev@ozlabs.org, linux-mm@kvack.org, apw@shadowen.org,
	mkravetz@us.ibm.com, hch@infradead.org, jk@ozlabs.org,
	linux-kernel@vger.kernel.org, akpm@osdl.org, paulus@samba.org,
	benh@kernel.crashing.org, gone@us.ibm.com,
	Dave Hansen <haveblue@us.ibm.com>,
	cbe-oss-dev@ozlabs.org
Subject: Re: [PATCH] Fix sparsemem on Cell
Date: Fri, 15 Dec 2006 18:22:23 +0100	[thread overview]
Message-ID: <200612151822.23764.mb@bu3sch.de> (raw)
In-Reply-To: <20061215165335.61D9F775@localhost.localdomain>

On Friday 15 December 2006 17:53, Dave Hansen wrote:
> 
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.
> 
> I did think twice about actually using SYSTEM_RUNNING because we
> moved away from it in other parts of memory hotplug, but those
> were actually for _allocations_ in favor of slab_is_available(),
> and we really don't care about the slab here.
> 
> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.
> 
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> 
> ---
> 
>  lxc-dave/init/main.c     |    4 ++++
>  lxc-dave/mm/page_alloc.c |   28 +++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff -puN init/main.c~sparsemem-fix init/main.c
> --- lxc/init/main.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/init/main.c	2006-12-15 08:49:53.000000000 -0800
> @@ -770,6 +770,10 @@ static int init(void * unused)
>  	free_initmem();
>  	unlock_kernel();
>  	mark_rodata_ro();
> +	/*
> +	 * Memory hotplug requires that this system_state transition
> +	 * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

> +	 */
>  	system_state = SYSTEM_RUNNING;
>  	numa_default_policy();
>  
> diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> --- lxc/mm/page_alloc.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/mm/page_alloc.c	2006-12-15 08:49:53.000000000 -0800
> @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
>  
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
> +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> +{
> +	/*
> +	 * There are two things that make this work:
> +	 * 1. The early_pfn...() functions are __init and
> +	 *    use __initdata.  If the system is < SYSTEM_RUNNING,
> +	 *    those functions and their data will still exist.
> +	 * 2. We also assume that all actual memory hotplug
> +	 *    (as opposed to boot-time) calls to this are only
> +	 *    for contiguous memory regions.  With sparsemem,
> +	 *    this guaranteed is easy because all sections are
> +	 *    contiguous and we never online more than one
> +	 *    section at a time.  Boot-time memory can have holes
> +	 *    anywhere.
> +	 */
> +	if (system_state >= SYSTEM_RUNNING)
> +		return 1;
> +	if (!early_pfn_valid(pfn))
> +		return 0;
> +	if (!early_pfn_in_nid(pfn, nid))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Initially all pages are reserved - free ones are freed
>   * up by free_all_bootmem() once the early boot process is
> @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
>  	unsigned long pfn;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> -		if (!early_pfn_valid(pfn))
> -			continue;
> -		if (!early_pfn_in_nid(pfn, nid))
> +		if (!can_online_pfn_into_nid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
>  		set_page_links(page, zone, nid, pfn);
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

-- 
Greetings Michael.

WARNING: multiple messages have this Message-ID (diff)
From: Michael Buesch <mb@bu3sch.de>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linuxppc-dev@ozlabs.org, linux-mm@kvack.org, apw@shadowen.org,
	mkravetz@us.ibm.com, hch@infradead.org, jk@ozlabs.org,
	linux-kernel@vger.kernel.org, akpm@osdl.org, paulus@samba.org,
	benh@kernel.crashing.org, gone@us.ibm.com,
	cbe-oss-dev@ozlabs.org
Subject: Re: [PATCH] Fix sparsemem on Cell
Date: Fri, 15 Dec 2006 18:22:23 +0100	[thread overview]
Message-ID: <200612151822.23764.mb@bu3sch.de> (raw)
In-Reply-To: <20061215165335.61D9F775@localhost.localdomain>

On Friday 15 December 2006 17:53, Dave Hansen wrote:
> 
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.
> 
> I did think twice about actually using SYSTEM_RUNNING because we
> moved away from it in other parts of memory hotplug, but those
> were actually for _allocations_ in favor of slab_is_available(),
> and we really don't care about the slab here.
> 
> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.
> 
> Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> 
> ---
> 
>  lxc-dave/init/main.c     |    4 ++++
>  lxc-dave/mm/page_alloc.c |   28 +++++++++++++++++++++++++---
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff -puN init/main.c~sparsemem-fix init/main.c
> --- lxc/init/main.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/init/main.c	2006-12-15 08:49:53.000000000 -0800
> @@ -770,6 +770,10 @@ static int init(void * unused)
>  	free_initmem();
>  	unlock_kernel();
>  	mark_rodata_ro();
> +	/*
> +	 * Memory hotplug requires that this system_state transition
> +	 * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

> +	 */
>  	system_state = SYSTEM_RUNNING;
>  	numa_default_policy();
>  
> diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> --- lxc/mm/page_alloc.c~sparsemem-fix	2006-12-15 08:49:53.000000000 -0800
> +++ lxc-dave/mm/page_alloc.c	2006-12-15 08:49:53.000000000 -0800
> @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
>  
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
> +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> +{
> +	/*
> +	 * There are two things that make this work:
> +	 * 1. The early_pfn...() functions are __init and
> +	 *    use __initdata.  If the system is < SYSTEM_RUNNING,
> +	 *    those functions and their data will still exist.
> +	 * 2. We also assume that all actual memory hotplug
> +	 *    (as opposed to boot-time) calls to this are only
> +	 *    for contiguous memory regions.  With sparsemem,
> +	 *    this guaranteed is easy because all sections are
> +	 *    contiguous and we never online more than one
> +	 *    section at a time.  Boot-time memory can have holes
> +	 *    anywhere.
> +	 */
> +	if (system_state >= SYSTEM_RUNNING)
> +		return 1;
> +	if (!early_pfn_valid(pfn))
> +		return 0;
> +	if (!early_pfn_in_nid(pfn, nid))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Initially all pages are reserved - free ones are freed
>   * up by free_all_bootmem() once the early boot process is
> @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
>  	unsigned long pfn;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> -		if (!early_pfn_valid(pfn))
> -			continue;
> -		if (!early_pfn_in_nid(pfn, nid))
> +		if (!can_online_pfn_into_nid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
>  		set_page_links(page, zone, nid, pfn);
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

-- 
Greetings Michael.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-12-15 17:22 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-15 16:53 [PATCH] Fix sparsemem on Cell Dave Hansen
2006-12-15 16:53 ` Dave Hansen
2006-12-15 16:53 ` Dave Hansen
2006-12-15 17:11 ` Andy Whitcroft
2006-12-15 17:11   ` Andy Whitcroft
2006-12-15 17:11   ` Andy Whitcroft
2006-12-15 17:24   ` Dave Hansen
2006-12-15 17:24     ` Dave Hansen
2006-12-15 17:24     ` Dave Hansen
2006-12-15 19:45     ` Andrew Morton
2006-12-15 19:45       ` Andrew Morton
2006-12-15 19:45       ` Andrew Morton
2006-12-16  8:03       ` KAMEZAWA Hiroyuki
2006-12-16  8:03         ` KAMEZAWA Hiroyuki
2006-12-16  8:03         ` KAMEZAWA Hiroyuki
2006-12-18 21:13         ` Dave Hansen
2006-12-18 21:13           ` Dave Hansen
2006-12-18 21:13           ` Dave Hansen
2006-12-18 22:15           ` Christoph Hellwig
2006-12-18 22:15             ` Christoph Hellwig
2006-12-18 22:15             ` Christoph Hellwig
2006-12-18 22:54         ` Arnd Bergmann
2006-12-18 22:54           ` Arnd Bergmann
2006-12-18 22:54           ` Arnd Bergmann
2006-12-18 23:16           ` Dave Hansen
2006-12-18 23:16             ` Dave Hansen
2006-12-18 23:16             ` Dave Hansen
2006-12-19  0:16             ` KAMEZAWA Hiroyuki
2006-12-19  0:16               ` KAMEZAWA Hiroyuki
2006-12-19  0:16               ` KAMEZAWA Hiroyuki
2006-12-19  8:59             ` Arnd Bergmann
2006-12-19  8:59               ` Arnd Bergmann
2006-12-19  8:59               ` Arnd Bergmann
2006-12-19 19:34               ` Dave Hansen
2006-12-19 19:34                 ` Dave Hansen
2006-12-19 19:34                 ` Dave Hansen
2007-01-06  1:10               ` [PATCH] Fix sparsemem on Cell (take 3) Dave Hansen
2007-01-06  1:10                 ` Dave Hansen
2007-01-06  1:10                 ` Dave Hansen
2007-01-06  4:52                 ` John Rose
2007-01-06  4:52                   ` John Rose
2007-01-06  4:52                   ` John Rose
2007-01-07  8:58                   ` Dave Hansen
2007-01-07  8:58                     ` Dave Hansen
2007-01-07  8:58                     ` Dave Hansen
2007-01-07 12:07                     ` Arnd Bergmann
2007-01-07 12:07                       ` Arnd Bergmann
2007-01-07 12:07                       ` Arnd Bergmann
2007-01-08  6:31                     ` Tim Pepper
2007-01-08  6:31                       ` Tim Pepper
2007-01-08  6:47                     ` Tim Pepper
2007-01-08  6:47                       ` Tim Pepper
2007-01-08  6:47                       ` Tim Pepper
2006-12-15 17:22 ` Michael Buesch [this message]
2006-12-15 17:22   ` [PATCH] Fix sparsemem on Cell Michael Buesch
2006-12-15 17:22   ` Michael Buesch
2006-12-15 17:57   ` Dave Hansen
2006-12-15 17:57     ` Dave Hansen
2006-12-15 17:57     ` Dave Hansen
2006-12-15 20:21 ` Benjamin Herrenschmidt
2006-12-15 20:21   ` Benjamin Herrenschmidt
2006-12-15 20:21   ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2006-12-15 17:14 Dave Hansen
2006-12-15 17:14 ` Dave Hansen
2006-12-15 17:14 ` Dave Hansen
2006-12-17 23:02 ` Arnd Bergmann
2006-12-17 23:02   ` Arnd Bergmann
2006-12-17 23:02   ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200612151822.23764.mb@bu3sch.de \
    --to=mb@bu3sch.de \
    --cc=akpm@osdl.org \
    --cc=cbe-oss-dev@ozlabs.org \
    --cc=gone@us.ibm.com \
    --cc=haveblue@us.ibm.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mkravetz@us.ibm.com \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.