All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Tang Chen <tangchen@cn.fujitsu.com>,
	akpm@linux-foundation.org, santosh.shilimkar@ti.com,
	grygorii.strashko@ti.com, yinghai@kernel.org,
	isimatu.yasuaki@jp.fujitsu.co, fabf@skynet.be, nzimmer@sgi.com,
	wangnan0@huawei.com, vdavydov@parallels.com, toshi.kani@hp.com,
	phacht@linux.vnet.ibm.com, tj@kernel.org,
	kirill.shutemov@linux.intel.com, riel@redhat.com,
	luto@amacapital.net, hpa@linux.intel.com, aarcange@redhat.com,
	qiuxishi@huawei.com, mgorman@suse.de, rientjes@google.com,
	hannes@cmpxchg.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Gu Zheng <guz.fnst@cn.fujitsu.com>
Subject: Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages().
Date: Tue, 4 Nov 2014 17:10:53 +0900	[thread overview]
Message-ID: <54588A0D.1060207@jp.fujitsu.com> (raw)
In-Reply-To: <1414748812-22610-3-git-send-email-tangchen@cn.fujitsu.com>

(2014/10/31 18:46), Tang Chen wrote:
> When we are doing memory hot-add, the following functions are called:
> 
> add_memory()
> |--> hotadd_new_pgdat()
>       |--> free_area_init_node()
>            |--> free_area_init_core()
>                 |--> zone->present_pages = realsize;           /* 1. zone is populated */
>                      |--> zone_pcp_init()
>                           |--> zone->pageset = &boot_pageset;  /* 2. zone->pageset is set to boot_pageset */
> 
> There are two problems here:
> 1. Zones could be populated before any memory is onlined.
> 2. All the zones on a newly added node have the same pageset pointing to boot_pageset.
> 
> The above two problems will result in the following problem:
> When we online memory on one node, e.g node2, the following code is executed:
> 
> online_pages()
> {
>          ......
>          if (!populated_zone(zone)) {
>                  need_zonelists_rebuild = 1;
>                  build_all_zonelists(NULL, zone);
>          }
>          ......
> }
> 
> Because of problem 1, the zone has been populated, and the build_all_zonelists()
>                        will never called. zone->pageset won't be updated.
> Because of problem 2, All the zones on a newly added node have the same pageset
>                        pointing to boot_pageset.
> And as a result, when we online memory on node2, node3's meminfo will corrupt.
> Pages on node2 may be freed to node3.
> 
> # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done
> # cat /sys/devices/system/node/node2/meminfo
> Node 2 MemTotal:       33554432 kB
> Node 2 MemFree:        33549092 kB
> Node 2 MemUsed:            5340 kB
> ......
> # cat /sys/devices/system/node/node3/meminfo
> Node 3 MemTotal:              0 kB
> Node 3 MemFree:               248 kB                    /* corrupted */
> Node 3 MemUsed:               0 kB
> ......
> 
> We have to populate some zones before onlining memory, otherwise no memory could be onlined.
> So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset.
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>   include/linux/mm.h  | 1 +
>   mm/memory_hotplug.c | 6 +++++-
>   mm/page_alloc.c     | 5 +++++
>   3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 02d11ee..83e6505 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...);
>   
>   extern void setup_per_cpu_pageset(void);
>   
> +extern bool zone_pcp_initialized(struct zone *zone);
>   extern void zone_pcp_update(struct zone *zone);
>   extern void zone_pcp_reset(struct zone *zone);
>   
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3ab01b2..bc0de0f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>   	 * If this zone is not populated, then it is not in zonelist.
>   	 * This means the page allocator ignores this zone.
>   	 * So, zonelist must be updated after online.
> +	 *
> +	 * If this zone is populated, zone->pageset could be initialized
> +	 * to boot_pageset for the first time a node is added. If so,
> +	 * zone->pageset should be allocated.
>   	 */
>   	mutex_lock(&zonelists_mutex);
> -	if (!populated_zone(zone)) {

> +	if (!populated_zone(zone) || !zone_pcp_initialized(zone)) {
>   		need_zonelists_rebuild = 1;
>   		build_all_zonelists(NULL, zone);
>   	}

Why does zone->present_pages of the hot-added memroy have valid value?
In my understading, the present_pages is incremented/decremented by memory
online/offline. So when hot adding memory, the zone->present_pages of the
memory should be 0.

Thanks,
Yasuaki Ishimatsu


> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 736d8e1..4ff1540 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone)
>   }
>   #endif
>   
> +bool zone_pcp_initialized(struct zone *zone)
> +{
> +	return (zone->pageset != &boot_pageset);
> +}
> +
>   void zone_pcp_reset(struct zone *zone)
>   {
>   	unsigned long flags;
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Tang Chen <tangchen@cn.fujitsu.com>, <akpm@linux-foundation.org>,
	<santosh.shilimkar@ti.com>, <grygorii.strashko@ti.com>,
	<yinghai@kernel.org>, <isimatu.yasuaki@jp.fujitsu.co>,
	<fabf@skynet.be>, <nzimmer@sgi.com>, <wangnan0@huawei.com>,
	<vdavydov@parallels.com>, <toshi.kani@hp.com>,
	<phacht@linux.vnet.ibm.com>, <tj@kernel.org>,
	<kirill.shutemov@linux.intel.com>, <riel@redhat.com>,
	<luto@amacapital.net>, <hpa@linux.intel.com>,
	<aarcange@redhat.com>, <qiuxishi@huawei.com>, <mgorman@suse.de>,
	<rientjes@google.com>, <hannes@cmpxchg.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	Gu Zheng <guz.fnst@cn.fujitsu.com>
Subject: Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages().
Date: Tue, 4 Nov 2014 17:10:53 +0900	[thread overview]
Message-ID: <54588A0D.1060207@jp.fujitsu.com> (raw)
In-Reply-To: <1414748812-22610-3-git-send-email-tangchen@cn.fujitsu.com>

(2014/10/31 18:46), Tang Chen wrote:
> When we are doing memory hot-add, the following functions are called:
> 
> add_memory()
> |--> hotadd_new_pgdat()
>       |--> free_area_init_node()
>            |--> free_area_init_core()
>                 |--> zone->present_pages = realsize;           /* 1. zone is populated */
>                      |--> zone_pcp_init()
>                           |--> zone->pageset = &boot_pageset;  /* 2. zone->pageset is set to boot_pageset */
> 
> There are two problems here:
> 1. Zones could be populated before any memory is onlined.
> 2. All the zones on a newly added node have the same pageset pointing to boot_pageset.
> 
> The above two problems will result in the following problem:
> When we online memory on one node, e.g node2, the following code is executed:
> 
> online_pages()
> {
>          ......
>          if (!populated_zone(zone)) {
>                  need_zonelists_rebuild = 1;
>                  build_all_zonelists(NULL, zone);
>          }
>          ......
> }
> 
> Because of problem 1, the zone has been populated, and the build_all_zonelists()
>                        will never called. zone->pageset won't be updated.
> Because of problem 2, All the zones on a newly added node have the same pageset
>                        pointing to boot_pageset.
> And as a result, when we online memory on node2, node3's meminfo will corrupt.
> Pages on node2 may be freed to node3.
> 
> # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done
> # cat /sys/devices/system/node/node2/meminfo
> Node 2 MemTotal:       33554432 kB
> Node 2 MemFree:        33549092 kB
> Node 2 MemUsed:            5340 kB
> ......
> # cat /sys/devices/system/node/node3/meminfo
> Node 3 MemTotal:              0 kB
> Node 3 MemFree:               248 kB                    /* corrupted */
> Node 3 MemUsed:               0 kB
> ......
> 
> We have to populate some zones before onlining memory, otherwise no memory could be onlined.
> So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset.
> 
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>   include/linux/mm.h  | 1 +
>   mm/memory_hotplug.c | 6 +++++-
>   mm/page_alloc.c     | 5 +++++
>   3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 02d11ee..83e6505 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...);
>   
>   extern void setup_per_cpu_pageset(void);
>   
> +extern bool zone_pcp_initialized(struct zone *zone);
>   extern void zone_pcp_update(struct zone *zone);
>   extern void zone_pcp_reset(struct zone *zone);
>   
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3ab01b2..bc0de0f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
>   	 * If this zone is not populated, then it is not in zonelist.
>   	 * This means the page allocator ignores this zone.
>   	 * So, zonelist must be updated after online.
> +	 *
> +	 * If this zone is populated, zone->pageset could be initialized
> +	 * to boot_pageset for the first time a node is added. If so,
> +	 * zone->pageset should be allocated.
>   	 */
>   	mutex_lock(&zonelists_mutex);
> -	if (!populated_zone(zone)) {

> +	if (!populated_zone(zone) || !zone_pcp_initialized(zone)) {
>   		need_zonelists_rebuild = 1;
>   		build_all_zonelists(NULL, zone);
>   	}

Why does zone->present_pages of the hot-added memroy have valid value?
In my understading, the present_pages is incremented/decremented by memory
online/offline. So when hot adding memory, the zone->present_pages of the
memory should be 0.

Thanks,
Yasuaki Ishimatsu


> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 736d8e1..4ff1540 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone)
>   }
>   #endif
>   
> +bool zone_pcp_initialized(struct zone *zone)
> +{
> +	return (zone->pageset != &boot_pageset);
> +}
> +
>   void zone_pcp_reset(struct zone *zone)
>   {
>   	unsigned long flags;
> 



  reply	other threads:[~2014-11-04  8:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-31  9:46 [PATCH 0/2] Fix node meminfo corruption Tang Chen
2014-10-31  9:46 ` Tang Chen
2014-10-31  9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen
2014-10-31  9:46   ` Tang Chen
2014-11-04  1:50   ` Xishi Qiu
2014-11-04  1:50     ` Xishi Qiu
2014-10-31  9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen
2014-10-31  9:46   ` Tang Chen
2014-11-04  8:10   ` Yasuaki Ishimatsu [this message]
2014-11-04  8:10     ` Yasuaki Ishimatsu
2014-11-04  8:46     ` Tang Chen
2014-11-04  8:46       ` Tang Chen
2014-11-05  1:01   ` Kamezawa Hiroyuki
2014-11-05  1:01     ` Kamezawa Hiroyuki
2014-11-05  2:17     ` Tang Chen
2014-11-05  2:17       ` Tang Chen
2014-11-04  1:10 ` [PATCH 0/2] Fix node meminfo corruption Tang Chen
2014-11-04  1:10   ` Tang Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54588A0D.1060207@jp.fujitsu.com \
    --to=isimatu.yasuaki@jp.fujitsu.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=fabf@skynet.be \
    --cc=grygorii.strashko@ti.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@linux.intel.com \
    --cc=isimatu.yasuaki@jp.fujitsu.co \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mgorman@suse.de \
    --cc=nzimmer@sgi.com \
    --cc=phacht@linux.vnet.ibm.com \
    --cc=qiuxishi@huawei.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=santosh.shilimkar@ti.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tj@kernel.org \
    --cc=toshi.kani@hp.com \
    --cc=vdavydov@parallels.com \
    --cc=wangnan0@huawei.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.