All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Yang <richard.weiyang@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Reza Arbab <arbab@linux.vnet.ibm.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	qiuxishi@huawei.com, Kani Toshimitsu <toshi.kani@hpe.com>,
	slaoub@gmail.com, Joonsoo Kim <js1304@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Daniel Kiper <daniel.kiper@oracle.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
Date: Thu, 22 Jun 2017 10:32:43 +0800	[thread overview]
Message-ID: <20170622023243.GA1242@WeideMacBook-Pro.local> (raw)
In-Reply-To: <20170601083746.4924-3-mhocko@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 6116 bytes --]

On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
>From: Michal Hocko <mhocko@suse.com>
>
>Heiko Carstens has noticed that he can generate overlapping zones for
>ZONE_DMA and ZONE_NORMAL:
>DMA      [mem 0x0000000000000000-0x000000007fffffff]
>Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
>$ cat /sys/devices/system/memory/block_size_bytes
>10000000
>$ cat /sys/devices/system/memory/memory5/valid_zones
>DMA
>$ echo 0 > /sys/devices/system/memory/memory5/online
>$ cat /sys/devices/system/memory/memory5/valid_zones
>Normal
>$ echo 1 > /sys/devices/system/memory/memory5/online
>Normal
>
>$ cat /proc/zoneinfo
>Node 0, zone      DMA
>spanned  524288        <-----
>present  458752
>managed  455078
>start_pfn:           0 <-----
>
>Node 0, zone   Normal
>spanned  720896
>present  589824
>managed  571648
>start_pfn:           327680 <-----
>
>The reason is that we assume that the default zone for kernel onlining
>is ZONE_NORMAL. This was a simplification introduced by the memory
>hotplug rework and it is easily fixable by checking the range overlap in
>the zone order and considering the first matching zone as the default
>one. If there is no such zone then assume ZONE_NORMAL as we have been
>doing so far.
>
>Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
> drivers/base/memory.c          |  2 +-
> include/linux/memory_hotplug.h |  2 ++
> mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index b86fda30ce62..c7c4e0325cdb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
> 
> 	nid = pfn_to_nid(start_pfn);
> 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
>-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
>+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> 		append = true;
> 	}
> 
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 9e0249d0f5e4..ed167541e4fc 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> 					  unsigned long pnum);
> extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 		int online_type);
>+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
>+		unsigned long nr_pages);
> #endif /* __LINUX_MEMORY_HOTPLUG_H */
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b3895fd609f4..a0348de3e18c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-	struct zone *normal_zone =  &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> 
> 	/*
> 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
>@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 			return true;
> 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>-		return zone_end_pfn(normal_zone) <= pfn;
>+		return zone_end_pfn(default_zone) <= pfn;
> 	}
> 
> 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
>@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> }
> 
> /*
>+ * Returns a default kernel memory zone for the given pfn range.
>+ * If no kernel zone covers this pfn range it will automatically go
>+ * to the ZONE_NORMAL.
>+ */
>+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>+		unsigned long nr_pages)
>+{
>+	struct pglist_data *pgdat = NODE_DATA(nid);
>+	int zid;
>+
>+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>+		struct zone *zone = &pgdat->node_zones[zid];
>+
>+		if (zone_intersects(zone, start_pfn, nr_pages))
>+			return zone;
>+	}
>+
>+	return &pgdat->node_zones[ZONE_NORMAL];
>+}

Hmm... a corner case jumped into my mind which may invalidate this
calculation.

The case is:


       Zone:         | DMA   | DMA32      | NORMAL       |
                     v       v            v              v
       
       Phy mem:      [           ]     [                  ]
       
                     ^           ^     ^                  ^
       Node:         |   Node0   |     |      Node1       |
                             A   B     C  D


The key point is
1. There is a hole between Node0 and Node1
2. The hole sits in a non-normal zone

Let's mark the boundary as A, B, C, D. Then we would have
node0->zone[dma21] = [A, B]
node1->zone[dma32] = [C, D]

If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
if we want to hotplug a range in [B, C] on node1, it will introduce the
overlapped zone. Because the range [B, C] intersects none of the existing
zones on node1.

Do you think this is possible?

>+
>+/*
>  * Associates the given pfn range with the given node and the zone appropriate
>  * for the given online type.
>  */
>@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> 		unsigned long start_pfn, unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
>-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> 
> 	if (online_type == MMOP_ONLINE_KEEP) {
> 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-- 
>2.11.0

-- 
Wei Yang
Help you, Help me

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2017-06-22  2:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-01  8:37 [PATCH 0/2] memory hotplug follow up fixes Michal Hocko
2017-06-01  8:37 ` Michal Hocko
2017-06-01  8:37 ` [PATCH 1/2] mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior Michal Hocko
2017-06-01  8:37   ` Michal Hocko
2017-06-01 12:32   ` Vlastimil Babka
2017-06-01 12:32     ` Vlastimil Babka
2017-06-01 12:40     ` Michal Hocko
2017-06-01 12:40       ` Michal Hocko
2017-06-01  8:37 ` [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone Michal Hocko
2017-06-01  8:37   ` Michal Hocko
2017-06-01 12:57   ` Vlastimil Babka
2017-06-01 12:57     ` Vlastimil Babka
2017-06-22  2:32   ` Wei Yang [this message]
2017-06-22 18:16     ` Michal Hocko
2017-06-22 18:16       ` Michal Hocko
2017-06-23  1:37       ` Wei Yang
  -- strict thread matches above, loose matches on Subject: below --
2017-05-31  6:24 [-next] memory hotplug regression Michal Hocko
2017-05-31  6:26 ` [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone Michal Hocko
2017-05-31  6:26   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170622023243.GA1242@WeideMacBook-Pro.local \
    --to=richard.weiyang@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arbab@linux.vnet.ibm.com \
    --cc=daniel.kiper@oracle.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=imammedo@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=js1304@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=qiuxishi@huawei.com \
    --cc=rientjes@google.com \
    --cc=slaoub@gmail.com \
    --cc=toshi.kani@hpe.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=yasu.isimatu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.