From: Tang Chen <tangchen@cn.fujitsu.com>
To: rjw@sisk.pl, lenb@kernel.org, tglx@linutronix.de, mingo@elte.hu,
hpa@zytor.com, akpm@linux-foundation.org, tj@kernel.org,
toshi.kani@hp.com, zhangyanfei@cn.fujitsu.com,
liwanp@linux.vnet.ibm.com, trenn@suse.de, yinghai@kernel.org,
jiang.liu@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com,
mgorman@suse.de, minchan@kernel.org, mina86@mina86.com,
gong.chen@linux.intel.com, vasilis.liaskovitis@profitbricks.com,
lwoodman@redhat.com, riel@redhat.com, jweiner@redhat.com,
prarit@redhat.com
Cc: x86@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-acpi@vger.kernel.org
Subject: [PATCH v3 5/5] mem-hotplug: Introduce movablenode boot option to control memblock allocation direction.
Date: Fri, 13 Sep 2013 17:30:55 +0800 [thread overview]
Message-ID: <1379064655-20874-6-git-send-email-tangchen@cn.fujitsu.com> (raw)
In-Reply-To: <1379064655-20874-1-git-send-email-tangchen@cn.fujitsu.com>
The Hot-Pluggable fired in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel,
it cannot be hot-removed. So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.
But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This will cause NUMA
performance down. And other users may be unhappy.
So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movablenode boot option to allow users to
choose to reserve hotpluggable memory and set it as ZONE_MOVABLE or not.
Users can specify "movablenode" in kernel commandline to enable this
functionality. For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything. The kernel
will work as before.
After memblock is ready, before SRAT is parsed, we should allocate memory
near the kernel image. So this patch does the following:
1. After memblock is ready, make memblock allocate memory from low address
to high.
2. After SRAT is parsed, make memblock behave as default, allocate memory
from high address to low.
This behavior is controlled by movablenode boot option.
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
Documentation/kernel-parameters.txt | 15 ++++++++++++++
arch/x86/kernel/setup.c | 36 +++++++++++++++++++++++++++++++++++
include/linux/memory_hotplug.h | 5 ++++
mm/memory_hotplug.c | 9 ++++++++
4 files changed, 65 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1a036cd..8c056c4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1769,6 +1769,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
+ movablenode [KNL,X86] This parameter enables/disables the
+ kernel to arrange hotpluggable memory ranges recorded
+ in ACPI SRAT(System Resource Affinity Table) as
+ ZONE_MOVABLE. And these memory can be hot-removed when
+ the system is up.
+ By specifying this option, all the hotpluggable memory
+ will be in ZONE_MOVABLE, which the kernel cannot use.
+ This will cause NUMA performance down. For users who
+ care about NUMA performance, just don't use it.
+ If all the memory ranges in the system are hotpluggable,
+ then the ones used by the kernel at early time, such as
+ kernel code and data segments, initrd file and so on,
+ won't be set as ZONE_MOVABLE, and won't be hotpluggable.
+ Otherwise the kernel won't have enough memory to boot.
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 36cfce3..617af9a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1094,6 +1094,31 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
+#ifdef CONFIG_MOVABLE_NODE
+ if (movablenode_enable_srat) {
+ /*
+ * Memory used by the kernel cannot be hot-removed because Linux
+ * cannot migrate the kernel pages. When memory hotplug is
+ * enabled, we should prevent memblock from allocating memory
+ * for the kernel.
+ *
+ * ACPI SRAT records all hotpluggable memory ranges. But before
+ * SRAT is parsed, we don't know about it.
+ *
+ * The kernel image is loaded into memory at very early time. We
+ * cannot prevent this anyway. So on NUMA system, we set any
+ * node the kernel resides in as un-hotpluggable.
+ *
+ * Since on modern servers, one node could have double-digit
+ * gigabytes memory, we can assume the memory around the kernel
+ * image is also un-hotpluggable. So before SRAT is parsed, just
+ * allocate memory near the kernel image to try the best to keep
+ * the kernel away from hotpluggable memory.
+ */
+ memblock_set_current_direction(MEMBLOCK_DIRECTION_LOW_TO_HIGH);
+ }
+#endif /* CONFIG_MOVABLE_NODE */
+
init_mem_mapping();
early_trap_pf_init();
@@ -1132,6 +1157,17 @@ void __init setup_arch(char **cmdline_p)
early_acpi_boot_init();
initmem_init();
+
+#ifdef CONFIG_MOVABLE_NODE
+ if (movablenode_enable_srat) {
+ /*
+ * When ACPI SRAT is parsed, which is done in initmem_init(),
+ * set memblock back to the default behavior.
+ */
+ memblock_set_current_direction(MEMBLOCK_DIRECTION_DEFAULT);
+ }
+#endif /* CONFIG_MOVABLE_NODE */
+
memblock_find_dma_reserve();
/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index dd38e62..5d2c07b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,11 @@ enum {
ONLINE_MOVABLE,
};
+#ifdef CONFIG_MOVABLE_NODE
+/* Enable/disable SRAT in movablenode boot option */
+extern bool movablenode_enable_srat;
+#endif /* CONFIG_MOVABLE_NODE */
+
/*
* pgdat resizing functions
*/
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0eb1a1d..8a4c8ff 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1390,6 +1390,15 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
{
return true;
}
+
+bool __initdata movablenode_enable_srat;
+
+static int __init cmdline_parse_movablenode(char *p)
+{
+ movablenode_enable_srat = true;
+ return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
#else /* CONFIG_MOVABLE_NODE */
/* ensure the node has NORMAL memory if it is still online */
static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
--
1.7.1
next prev parent reply other threads:[~2013-09-13 9:30 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-13 9:30 [PATCH v3 0/5] x86, memblock: Allocate memory near kernel image before SRAT parsed Tang Chen
2013-09-13 9:30 ` [PATCH v3 1/5] memblock: Introduce allocation direction to memblock Tang Chen
2013-09-14 2:42 ` Jianguo Wu
2013-09-15 13:23 ` chen tang
2013-09-23 15:38 ` Tejun Heo
2013-09-23 16:36 ` Zhang Yanfei
2013-09-13 9:30 ` [PATCH v3 2/5] memblock: Improve memblock to support allocation from lower address Tang Chen
2013-09-13 21:53 ` Toshi Kani
2013-09-16 1:28 ` Zhang Yanfei
2013-09-23 15:50 ` Tejun Heo
2013-09-23 16:44 ` Zhang Yanfei
2013-09-23 18:07 ` Zhang Yanfei
2013-09-23 20:21 ` Tejun Heo
2013-09-24 2:41 ` Zhang Yanfei
2013-09-24 2:46 ` Tejun Heo
2013-09-13 9:30 ` [PATCH v3 3/5] x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is parsed Tang Chen
2013-09-13 9:30 ` [PATCH v3 4/5] x86, mem-hotplug: Support initialize page tables from low to high Tang Chen
2013-09-23 15:53 ` Tejun Heo
2013-09-23 16:46 ` Zhang Yanfei
2013-09-13 9:30 ` Tang Chen [this message]
2013-09-23 15:57 ` [PATCH v3 5/5] mem-hotplug: Introduce movablenode boot option to control memblock allocation direction Tejun Heo
2013-09-23 16:58 ` Zhang Yanfei
2013-09-23 17:11 ` Tejun Heo
2013-09-16 11:00 ` [PATCH v3 0/5] x86, memblock: Allocate memory near kernel image before SRAT parsed Zhang Yanfei
2013-09-19 16:57 ` Yanfei Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1379064655-20874-6-git-send-email-tangchen@cn.fujitsu.com \
--to=tangchen@cn.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=gong.chen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=jiang.liu@huawei.com \
--cc=jweiner@redhat.com \
--cc=laijs@cn.fujitsu.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liwanp@linux.vnet.ibm.com \
--cc=lwoodman@redhat.com \
--cc=mgorman@suse.de \
--cc=mina86@mina86.com \
--cc=minchan@kernel.org \
--cc=mingo@elte.hu \
--cc=prarit@redhat.com \
--cc=riel@redhat.com \
--cc=rjw@sisk.pl \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=toshi.kani@hp.com \
--cc=trenn@suse.de \
--cc=vasilis.liaskovitis@profitbricks.com \
--cc=wency@cn.fujitsu.com \
--cc=x86@kernel.org \
--cc=yinghai@kernel.org \
--cc=zhangyanfei@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).