From: Wei Yang <richard.weiyang@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
Vlastimil Babka <vbabka@suse.cz>,
Andrea Arcangeli <aarcange@redhat.com>,
Reza Arbab <arbab@linux.vnet.ibm.com>,
Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
qiuxishi@huawei.com, Kani Toshimitsu <toshi.kani@hpe.com>,
slaoub@gmail.com, Joonsoo Kim <js1304@gmail.com>,
Andi Kleen <ak@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Daniel Kiper <daniel.kiper@oracle.com>,
Igor Mammedov <imammedo@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm, memory_hotplug: support movable_node for hotplugable nodes
Date: Thu, 15 Jun 2017 11:29:27 +0800
Message-ID: <20170615032927.GA17971@WeideMacBook-Pro.local>
In-Reply-To: <20170608122318.31598-1-mhocko@kernel.org>

On Thu, Jun 08, 2017 at 02:23:18PM +0200, Michal Hocko wrote:
>From: Michal Hocko <mhocko@suse.com>
>
>movable_node kernel parameter allows to make hotplugable NUMA
>nodes to put all the hotplugable memory into movable zone which
>allows more or less reliable memory hotremove. At least this
>is the case for the NUMA nodes present during the boot (see
>find_zone_movable_pfns_for_nodes).
>
>This is not the case for the memory hotplug, though.
>
> echo online > /sys/devices/system/memory/memoryXYZ/status
>
>will default to a kernel zone (usually ZONE_NORMAL) unless the
>particular memblock is already in the movable zone range which is not
>the case normally when onlining the memory from the udev rule context
>for a freshly hotadded NUMA node. The only option currently is to have a
>special udev rule to echo online_movable to all memblocks belonging to
>such a node which is rather clumsy. Not the mention this is inconsistent
>as well because what ended up in the movable zone during the boot will
>end up in a kernel zone after hotremove & hotadd without special care.
>
>It would be nice to reuse memblock_is_hotpluggable but the runtime
>hotplug doesn't have that information available because the boot and
>hotplug paths are not shared and it would be really non trivial to
>make them use the same code path because the runtime hotplug doesn't
>play with the memblock allocator at all.
>
>Teach move_pfn_range that MMOP_ONLINE_KEEP can use the movable zone if
>movable_node is enabled and the range doesn't overlap with the existing
>normal zone. This should provide a reasonable default onlining strategy.
>
>Strictly speaking the semantic is not identical with the boot time
>initialization because find_zone_movable_pfns_for_nodes covers only the
>hotplugable range as described by the BIOS/FW. From my experience this
>is usually a full node though (except for Node0 which is special and
>never goes away completely). If this turns out to be a problem in the
>real life we can tweak the code to store hotplug flag into memblocks
>but let's keep this simple now.
>
>Signed-off-by: Michal Hocko <mhocko@suse.com>
>---
>
>Hi Andrew,
>I've posted this as an RFC previously [1] and there haven't been any
>objections to the approach so I've dropped the RFC and sending it for
>inclusion. The only change since the last time is the update of the
>documentation to clarify the semantic as suggested by Reza Arbab.
>
>[1] http://lkml.kernel.org/r/20170601122004.32732-1-mhocko@kernel.org
>
> Documentation/memory-hotplug.txt | 12 +++++++++---
> mm/memory_hotplug.c | 19 ++++++++++++++++---
> 2 files changed, 25 insertions(+), 6 deletions(-)
>
>diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
>index 670f3ded0802..5c628e19d6cd 100644
>--- a/Documentation/memory-hotplug.txt
>+++ b/Documentation/memory-hotplug.txt
>@@ -282,20 +282,26 @@ offlined it is possible to change the individual block's state by writing to the
> % echo online > /sys/devices/system/memory/memoryXXX/state
>
> This onlining will not change the ZONE type of the target memory block,
>-If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
>+If the memory block doesn't belong to any zone an appropriate kernel zone
>+(usually ZONE_NORMAL) will be used unless movable_node kernel command line
>+option is specified when ZONE_MOVABLE will be used.
>+
>+You can explicitly request to associate it with ZONE_MOVABLE by
>
> % echo online_movable > /sys/devices/system/memory/memoryXXX/state
> (NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)
>
>-And if the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
>+Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:
>
> % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
> (NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)
>
>+An explicit zone onlining can fail (e.g. when the range is already within
>+and existing and incompatible zone already).
>+
> After this, memory block XXX's state will be 'online' and the amount of
> available memory will be increased.
>
>-Currently, newly added memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA).
> This may be changed in future.
>
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b98fb0b3ae11..74d75583736c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -943,6 +943,19 @@ struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> return &pgdat->node_zones[ZONE_NORMAL];
> }
>
>+static inline bool movable_pfn_range(int nid, struct zone *default_zone,
>+ unsigned long start_pfn, unsigned long nr_pages)
>+{
>+ if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
>+ MMOP_ONLINE_KERNEL))
>+ return true;
>+
>+ if (!movable_node_is_enabled())
>+ return false;
>+
>+ return !zone_intersects(default_zone, start_pfn, nr_pages);
>+}
>+

To be honest, I don't understand this clearly.

move_pfn_range() chooses a zone based on the online_type and moves the range
into it. There are two cases:

  1. MMOP_ONLINE_MOVABLE -> ZONE_MOVABLE is chosen.
  2. MMOP_ONLINE_KEEP    -> ZONE_NORMAL is the default, while ZONE_MOVABLE is
                            chosen if movable_pfn_range() returns true.

There are three conditions in movable_pfn_range():

  1. The range is not allowed in a kernel zone: return true.
  2. movable_node is not enabled: return false.
  3. The range [start_pfn, start_pfn + nr_pages) doesn't intersect with
     default_zone: return true.

The first one is inherited from the original code, so let's look at the other
two.

Number 3 is easy to understand: if the hot-added range is already part of
ZONE_NORMAL, use it.

Number 2 confuses me. If movable_node is not enabled, ZONE_NORMAL will be
chosen. If movable_node is enabled, the result still depends on the other two
conditions. So how does a memory_block get onlined to ZONE_MOVABLE because
movable_node is enabled? What I see is that you forbid a memory_block from
being onlined to ZONE_MOVABLE when movable_node is not enabled, rather than
onlining a memory_block to ZONE_MOVABLE when movable_node is enabled, which
is what your change log implies.
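
To check my reading, here is a minimal userspace sketch of the MMOP_ONLINE_KEEP
decision as I understand it from the hunk above. It is not kernel code: the
struct, its three boolean fields, the enum and the *_model() function names
are stand-ins I made up to replace allow_online_pfn_range(),
movable_node_is_enabled() and zone_intersects(), so that the three checks can
be tried in isolation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
struct range_state {
	bool kernel_zone_allowed;  /* models allow_online_pfn_range(.., MMOP_ONLINE_KERNEL) */
	bool movable_node_enabled; /* models movable_node_is_enabled() */
	bool intersects_default;   /* models zone_intersects(default_zone, ..) */
};

enum zone_choice { CHOOSE_NORMAL, CHOOSE_MOVABLE };

/* Same three checks, in the same order, as movable_pfn_range() in the patch. */
static bool movable_pfn_range_model(const struct range_state *s)
{
	if (!s->kernel_zone_allowed)
		return true;
	if (!s->movable_node_enabled)
		return false;
	return !s->intersects_default;
}

/* The MMOP_ONLINE_KEEP branch: default to the kernel zone unless overridden. */
static enum zone_choice online_keep_model(const struct range_state *s)
{
	return movable_pfn_range_model(s) ? CHOOSE_MOVABLE : CHOOSE_NORMAL;
}

/* Walk the four combinations that matter for the three conditions. */
static void check_models(void)
{
	struct range_state s;

	/* Condition 1: kernel zone not allowed, the other inputs are ignored. */
	s = (struct range_state){ .kernel_zone_allowed = false };
	assert(online_keep_model(&s) == CHOOSE_MOVABLE);

	/* Condition 2: movable_node disabled, kernel zone allowed. */
	s = (struct range_state){ .kernel_zone_allowed = true };
	assert(online_keep_model(&s) == CHOOSE_NORMAL);

	/* Condition 3: movable_node enabled, range intersects the default zone. */
	s = (struct range_state){ true, true, true };
	assert(online_keep_model(&s) == CHOOSE_NORMAL);

	/* Condition 3: movable_node enabled, range does not intersect it. */
	s = (struct range_state){ true, true, false };
	assert(online_keep_model(&s) == CHOOSE_MOVABLE);
}
```

Calling check_models() from a small main() runs all four cases. The sketch
only models the order of the checks; whether a freshly hot-added range really
does not intersect the default zone is the part I cannot tell from the patch
alone.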

BTW, would you mind giving me two pieces of information?

  1. Which branch is your code based on? I have cloned your git tree
     (//git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git), but I still
     see some differences.
  2. Is there an example or test case I could use to try your patch and see
     the difference? It would be better if it could run in qemu+kvm.

> /*
> * Associates the given pfn range with the given node and the zone appropriate
> * for the given online type.
>@@ -958,10 +971,10 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> /*
> * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
> * movable zone if that is not possible (e.g. we are within
>- * or past the existing movable zone)
>+ * or past the existing movable zone). movable_node overrides
>+ * this default and defaults to movable zone
> */
>- if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
>- MMOP_ONLINE_KERNEL))
>+ if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
> zone = movable_zone;
> } else if (online_type == MMOP_ONLINE_MOVABLE) {
> zone = &pgdat->node_zones[ZONE_MOVABLE];
>--
>2.11.0
--
Wei Yang
Help you, Help me