* [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE.
@ 2013-04-19 9:31 Tang Chen
2013-04-19 9:31 ` [PATCH v1 01/12] x86: get pg_data_t's memory from other node Tang Chen
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
In a memory hotplug situation, hotpluggable memory should be
arranged in ZONE_MOVABLE, because memory in ZONE_NORMAL may be
used by the kernel, and Linux cannot migrate pages used by the kernel.
So we need a way to specify hotpluggable memory as movable. It
should be as easy as possible.
According to ACPI spec 5.0, the SRAT table has a memory affinity
structure, and the structure has a Hot Pluggable field.
See "5.2.16.2 Memory Affinity Structure".
If we use this information, firmware can specify which memory is
hotpluggable. For example, if the Hot Pluggable field is set, the
kernel arranges the memory as movable memory.
To achieve this goal, we need to do the following:
1. Prevent memblock from allocating hotpluggable memory for the kernel.
This is done by reserving hotpluggable memory in memblock, in the
following steps:
1) Parse SRAT early enough so that memblock knows which memory
is hotpluggable.
2) Add a "flags" member to memblock so that it is able to tell
which memory is hotpluggable when freeing it to buddy.
2. Free hotpluggable memory to buddy system when memory initialization
is done.
3. Arrange hotpluggable memory in ZONE_MOVABLE.
(This will decrease NUMA performance.)
4. Provide a user interface to enable/disable this functionality.
(This is useful for those who don't use memory hotplug and who don't
want to lose their NUMA performance.)
This patch-set does the following:
patch1: Let setup_node_data() fall back to other nodes when allocating pg_data_t.
patch2: Have Hot-Pluggable Field in SRAT printed when parsing SRAT.
patch4,5: Introduce hotpluggable field to numa_meminfo.
patch6,7: Introduce flags to memblock, and keep the public APIs' prototypes
unmodified.
patch8: Reserve node-life-cycle memory as MEMBLK_LOCAL_NODE with memblock.
patch9,10: Reserve hotpluggable memory as MEMBLK_HOTPLUGGABLE with memblock,
and free it to buddy when memory initialization is done.
patch3,11,12: Improve "movablecore" boot option to support "movablecore=acpi".
This patch-set is based on Yinghai's
"x86, ACPI, numa: Parse numa info early" patch-set.
Please refer to:
v1: https://lkml.org/lkml/2013/3/7/642
v2: https://lkml.org/lkml/2013/3/10/47
v3: https://lkml.org/lkml/2013/4/4/639
v4: https://lkml.org/lkml/2013/4/11/829
Yinghai's patch-set does the following things:
1) Parse SRAT early enough.
2) Allocate pagetable pages in local node.
Tang Chen (11):
acpi: Print Hot-Pluggable Field in SRAT.
page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using
SRAT.
x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct
numa_meminfo.
x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup
numa_meminfo.
memblock, numa: Introduce flag into memblock.
x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve
node-life-cycle data.
x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark
and reserve hotpluggable memory.
x86, memblock, mem-hotplug: Free hotpluggable memory reserved by
memblock.
x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher
priority.
doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot
option.
Yasuaki Ishimatsu (1):
x86: get pg_data_t's memory from other node
Documentation/kernel-parameters.txt | 8 ++
arch/x86/include/asm/numa.h | 3 +-
arch/x86/kernel/apic/numaq_32.c | 2 +-
arch/x86/mm/amdtopology.c | 3 +-
arch/x86/mm/init.c | 16 +++-
arch/x86/mm/numa.c | 60 ++++++++++++++---
arch/x86/mm/numa_internal.h | 1 +
arch/x86/mm/srat.c | 11 ++-
include/linux/memblock.h | 16 +++++
include/linux/memory_hotplug.h | 3 +
mm/memblock.c | 127 ++++++++++++++++++++++++++++++----
mm/nobootmem.c | 3 +
mm/page_alloc.c | 37 ++++++++++-
13 files changed, 253 insertions(+), 37 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH v1 01/12] x86: get pg_data_t's memory from other node
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 02/12] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
If the system can create a movable node, in which all memory of the
node is allocated as ZONE_MOVABLE, setup_node_data() cannot
allocate memory for the node's pg_data_t from that node.
So use memblock_alloc_try_nid() instead of memblock_alloc_nid(),
so that the allocation is retried on other nodes when the node-local
attempt fails.
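The fallback can be pictured with a small userspace sketch. It is not
the memblock implementation; node_free[], NR_NODES, alloc_on_node() and
alloc_try_node() are hypothetical names used only for illustration, and
the "first node with room" fallback order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */
#define NO_NODE  (-1)	/* returned when no node can satisfy the request */

/* Free bytes available for early allocations on each node (toy model). */
static size_t node_free[NR_NODES];

/* Node-local attempt: returns @nid on success, NO_NODE on failure. */
static int alloc_on_node(size_t size, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	if (node_free[nid] < size)
		return NO_NODE;
	node_free[nid] -= size;
	return nid;
}

/* Try @nid first, then fall back to the first node with enough room. */
static int alloc_try_node(size_t size, int nid)
{
	int got = alloc_on_node(size, nid);

	for (int i = 0; got == NO_NODE && i < NR_NODES; i++)
		got = alloc_on_node(size, i);
	return got;
}
```

With node 1 empty (as for a node whose memory is all movable), a
request for node 1 is served from another node instead of failing.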
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
arch/x86/mm/numa.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 11acdf6..4f754e6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -214,10 +214,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
* Allocate node data. Try node-local memory and then any node.
* Never allocate in DMA zone.
*/
- nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+ nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
- pr_err("Cannot find %zu bytes in node %d\n",
- nd_size, nid);
+ pr_err("Cannot find %zu bytes in any node\n", nd_size);
return;
}
nd = __va(nd_pa);
--
1.7.1
* [PATCH v1 02/12] acpi: Print Hot-Pluggable Field in SRAT.
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
2013-04-19 9:31 ` [PATCH v1 01/12] x86: get pg_data_t's memory from other node Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 03/12] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
The Hot-Pluggable field in SRAT indicates whether the memory can be
hotplugged while the system is running. Printing it as well when
parsing SRAT helps users know which memory is hotpluggable.
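As a rough userspace illustration of the flag test and the resulting
log line: the bit positions follow the Memory Affinity Structure in
ACPI 5.0, while the SRAT_MEM_* macros and both helper names below are
made up for this sketch and are not the kernel's identifiers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Flag bits of the SRAT Memory Affinity Structure (ACPI 5.0, 5.2.16.2). */
#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE (1u << 1)

/* A range is only reported as hotpluggable if it is also enabled. */
static bool srat_range_hotpluggable(uint32_t flags)
{
	return (flags & SRAT_MEM_ENABLED) && (flags & SRAT_MEM_HOT_PLUGGABLE);
}

/* Print one range [start, end) in the same layout as the patched printk. */
static void srat_print_range(unsigned int node, unsigned int pxm,
			     uint64_t start, uint64_t end, uint32_t flags)
{
	printf("SRAT: Node %u PXM %u [mem %#010llx-%#010llx] %s\n",
	       node, pxm, (unsigned long long)start,
	       (unsigned long long)(end - 1),
	       srat_range_hotpluggable(flags) ? "Hot Pluggable" : "");
}
```

A range without the Hot Pluggable bit prints the same line with an
empty trailing string, so existing log parsers only see an extra space.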
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/srat.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 443f9ef..5055fa7 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -146,6 +146,7 @@ int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
u64 start, end;
+ u32 hotpluggable;
int node, pxm;
if (srat_disabled())
@@ -154,7 +155,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
goto out_err_bad_srat;
if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
goto out_err;
- if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+ hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
+ if (hotpluggable && !save_add_info())
goto out_err;
start = ma->base_address;
@@ -174,9 +176,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
node_set(node, numa_nodes_parsed);
- printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
+ printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
node, pxm,
- (unsigned long long) start, (unsigned long long) end - 1);
+ (unsigned long long) start, (unsigned long long) end - 1,
+ hotpluggable ? "Hot Pluggable" : "");
return 0;
out_err_bad_srat:
--
1.7.1
* [PATCH v1 03/12] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT.
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
2013-04-19 9:31 ` [PATCH v1 01/12] x86: get pg_data_t's memory from other node Tang Chen
2013-04-19 9:31 ` [PATCH v1 02/12] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 04/12] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo Tang Chen
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
The Hot-Pluggable field in SRAT specifies which memory ranges are hotpluggable.
We will arrange hotpluggable memory as ZONE_MOVABLE for users who want to use
memory hotplug functionality. But this will decrease NUMA performance,
because the kernel cannot use ZONE_MOVABLE for its own allocations.
So we improve the movablecore boot option so that those who want to use memory
hotplug functionality can enable using SRAT info to arrange movable memory.
Users can specify "movablecore=acpi" on the kernel command line to enable this
functionality.
For those who don't use memory hotplug or who don't want to lose their NUMA
performance, just don't specify anything. The kernel will work as before.
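The option handling can be sketched in userspace as follows. This is
only an illustration under assumptions: parse_movablecore() and the two
variables are hypothetical names, and the K/M/G suffix handling is a
simplified stand-in for the kernel's size parsing.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static bool movablecore_use_srat;		/* set by "movablecore=acpi" */
static unsigned long long movablecore_bytes;	/* set by "movablecore=<size>" */

/* Parse the value of a movablecore= option; 0 on success, -1 on error. */
static int parse_movablecore(const char *val)
{
	unsigned long long n;
	char *end;

	if (!val)
		return -1;

	/* "acpi" selects SRAT-based placement and skips size parsing. */
	if (strcmp(val, "acpi") == 0) {
		movablecore_use_srat = true;
		return 0;
	}

	n = strtoull(val, &end, 0);
	if (end == val)
		return -1;

	switch (*end) {
	case 'G': case 'g':
		n <<= 10;	/* fall through */
	case 'M': case 'm':
		n <<= 10;	/* fall through */
	case 'K': case 'k':
		n <<= 10;
		end++;
		break;
	default:
		break;
	}
	if (*end != '\0')
		return -1;

	movablecore_bytes = n;
	return 0;
}
```

Keeping "acpi" as a special value of the existing option means a size
such as "movablecore=512M" continues to behave as before.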
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
include/linux/memory_hotplug.h | 3 +++
mm/page_alloc.c | 13 +++++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index b6a3be7..18fe2a3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -33,6 +33,9 @@ enum {
ONLINE_MOVABLE,
};
+/* Enable/disable SRAT in movablecore boot option */
+extern bool movablecore_enable_srat;
+
/*
* pgdat resizing functions
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f368db4..b9ea143 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,8 @@ static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+bool __initdata movablecore_enable_srat = false;
+
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
EXPORT_SYMBOL(movable_zone);
@@ -5025,6 +5027,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
}
}
+static void __init cmdline_movablecore_srat(char *p)
+{
+ if (p && !strcmp(p, "acpi"))
+ movablecore_enable_srat = true;
+}
+
static int __init cmdline_parse_core(char *p, unsigned long *core)
{
unsigned long long coremem;
@@ -5055,6 +5063,11 @@ static int __init cmdline_parse_kernelcore(char *p)
*/
static int __init cmdline_parse_movablecore(char *p)
{
+ cmdline_movablecore_srat(p);
+
+ if (movablecore_enable_srat)
+ return 0;
+
return cmdline_parse_core(p, &required_movablecore);
}
--
1.7.1
* [PATCH v1 04/12] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo.
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
` (2 preceding siblings ...)
2013-04-19 9:31 ` [PATCH v1 03/12] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 05/12] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo Tang Chen
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
Since Yinghai has implemented "Allocate pagetable pages in local node", for a
node with hotpluggable memory, we have to allocate pagetable pages first, and
then reserve the rest as hotpluggable memory in memblock.
But the kernel parses SRAT first, and then initializes the memory mapping. So we
have to remember which memory ranges are hotpluggable for future use.
When parsing SRAT, we add each memory range to numa_meminfo. So we can store
hotpluggable info in numa_meminfo.
This patch introduces a "bool hotpluggable" member into struct numa_memblk,
the per-range entry stored in struct numa_meminfo.
And modifies the prototypes of the following APIs to support it:
- numa_add_memblk()
- numa_add_memblk_to()
And the following callers:
- numaq_register_node()
- dummy_numa_init()
- amd_numa_init()
- acpi_numa_memory_affinity_init() in x86
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/include/asm/numa.h | 3 ++-
arch/x86/kernel/apic/numaq_32.c | 2 +-
arch/x86/mm/amdtopology.c | 3 ++-
arch/x86/mm/numa.c | 10 +++++++---
arch/x86/mm/numa_internal.h | 1 +
arch/x86/mm/srat.c | 2 +-
6 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 1b99ee5..73096b2 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -31,7 +31,8 @@ extern int numa_off;
extern s16 __apicid_to_node[MAX_LOCAL_APIC];
extern nodemask_t numa_nodes_parsed __initdata;
-extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+extern int __init numa_add_memblk(int nodeid, u64 start, u64 end,
+ bool hotpluggable);
extern void __init numa_set_distance(int from, int to, int distance);
static inline void set_apicid_to_node(int apicid, s16 node)
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index d661ee9..7a9c542 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -82,7 +82,7 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
int ret;
node_set(node, numa_nodes_parsed);
- ret = numa_add_memblk(node, start, end);
+ ret = numa_add_memblk(node, start, end, false);
BUG_ON(ret < 0);
}
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..d521471 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -167,7 +167,8 @@ int __init amd_numa_init(void)
nodeid, base, limit);
prevbase = base;
- numa_add_memblk(nodeid, base, limit);
+ /* Do not support memory hotplug for AMD cpu. */
+ numa_add_memblk(nodeid, base, limit, false);
node_set(nodeid, numa_nodes_parsed);
}
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 4f754e6..ecf37fd 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -134,6 +134,7 @@ void __init setup_node_to_cpumask_map(void)
}
static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+ bool hotpluggable,
struct numa_meminfo *mi)
{
/* ignore zero length blks */
@@ -155,6 +156,7 @@ static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
mi->blk[mi->nr_blks].start = start;
mi->blk[mi->nr_blks].end = end;
mi->blk[mi->nr_blks].nid = nid;
+ mi->blk[mi->nr_blks].hotpluggable = hotpluggable;
mi->nr_blks++;
return 0;
}
@@ -179,15 +181,17 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
* @nid: NUMA node ID of the new memblk
* @start: Start address of the new memblk
* @end: End address of the new memblk
+ * @hotpluggable: True if memblk is hotpluggable
*
* Add a new memblk to the default numa_meminfo.
*
* RETURNS:
* 0 on success, -errno on failure.
*/
-int __init numa_add_memblk(int nid, u64 start, u64 end)
+int __init numa_add_memblk(int nid, u64 start, u64 end,
+ bool hotpluggable)
{
- return numa_add_memblk_to(nid, start, end, &numa_meminfo);
+ return numa_add_memblk_to(nid, start, end, hotpluggable, &numa_meminfo);
}
/* Initialize NODE_DATA for a node on the local memory */
@@ -631,7 +635,7 @@ static int __init dummy_numa_init(void)
0LLU, PFN_PHYS(max_pfn) - 1);
node_set(0, numa_nodes_parsed);
- numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+ numa_add_memblk(0, 0, PFN_PHYS(max_pfn), false);
return 0;
}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index bb2fbcc..1ce4e6b 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -8,6 +8,7 @@ struct numa_memblk {
u64 start;
u64 end;
int nid;
+ bool hotpluggable;
};
struct numa_meminfo {
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 5055fa7..f7f6fd4 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -171,7 +171,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
goto out_err_bad_srat;
}
- if (numa_add_memblk(node, start, end) < 0)
+ if (numa_add_memblk(node, start, end, hotpluggable) < 0)
goto out_err_bad_srat;
node_set(node, numa_nodes_parsed);
--
1.7.1
* [PATCH v1 05/12] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo.
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
` (3 preceding siblings ...)
2013-04-19 9:31 ` [PATCH v1 04/12] x86, numa, acpi, memory-hotplug: Introduce hotplug info into struct numa_meminfo Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 06/12] memblock, numa: Introduce flag into memblock Tang Chen
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
Since we have introduced hotplug info into struct numa_meminfo, we need
to consider it when cleaning up numa_meminfo.
The original logic in numa_cleanup_meminfo() is:
Merge blocks on the same node, holes between which don't overlap with
memory on other nodes.
This patch modifies numa_cleanup_meminfo() logic like this:
Merge blocks with the same hotpluggable type on the same node, holes
between which don't overlap with memory on other nodes.
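The modified merge condition can be illustrated with a standalone
sketch. struct blk and can_merge() are hypothetical names for this
example; the real code works on struct numa_memblk inside
numa_cleanup_meminfo() and performs the merge in place.

```c
#include <stdbool.h>
#include <stdint.h>

struct blk {
	uint64_t start, end;	/* [start, end) */
	int nid;
	bool hotpluggable;
};

/*
 * True if blocks i and j may be joined: same node, same hotpluggable
 * type, and the joined range does not overlap any block that differs
 * in node or hotpluggable type.
 */
static bool can_merge(const struct blk *b, int nr, int i, int j)
{
	uint64_t start, end;

	if (b[i].nid != b[j].nid || b[i].hotpluggable != b[j].hotpluggable)
		return false;

	start = b[i].start < b[j].start ? b[i].start : b[j].start;
	end = b[i].end > b[j].end ? b[i].end : b[j].end;

	for (int k = 0; k < nr; k++) {
		if (b[k].nid == b[i].nid &&
		    b[k].hotpluggable == b[i].hotpluggable)
			continue;
		if (start < b[k].end && end > b[k].start)
			return false;
	}
	return true;
}
```

So two non-hotpluggable ranges on one node are no longer joined across
a hotpluggable range that sits between them on the same node.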
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/numa.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index ecf37fd..26d1800 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -296,18 +296,22 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
}
/*
- * Join together blocks on the same node, holes
- * between which don't overlap with memory on other
- * nodes.
+ * Join together blocks on the same node, with the same
+ * hotpluggable flags, holes between which don't overlap
+ * with memory on other nodes.
*/
if (bi->nid != bj->nid)
continue;
+ if (bi->hotpluggable != bj->hotpluggable)
+ continue;
+
start = min(bi->start, bj->start);
end = max(bi->end, bj->end);
for (k = 0; k < mi->nr_blks; k++) {
struct numa_memblk *bk = &mi->blk[k];
- if (bi->nid == bk->nid)
+ if (bi->nid == bk->nid &&
+ bi->hotpluggable == bk->hotpluggable)
continue;
if (start < bk->end && end > bk->start)
break;
@@ -327,6 +331,7 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
mi->blk[i].start = mi->blk[i].end = 0;
mi->blk[i].nid = NUMA_NO_NODE;
+ mi->blk[i].hotpluggable = false;
}
return 0;
--
1.7.1
* [PATCH v1 06/12] memblock, numa: Introduce flag into memblock.
2013-04-19 9:31 [PATCH v1 00/12] Arrange hotpluggable memory in SRAT as ZONE_MOVABLE Tang Chen
` (4 preceding siblings ...)
2013-04-19 9:31 ` [PATCH v1 05/12] x86, numa, acpi, memory-hotplug: Consider hotplug info when cleanup numa_meminfo Tang Chen
@ 2013-04-19 9:31 ` Tang Chen
2013-04-19 9:31 ` [PATCH v1 07/12] x86, numa, mem-hotplug: Mark nodes which the kernel resides in Tang Chen
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
There is no flag in memblock to describe what type the memory is.
Sometimes, we may use memblock to reserve some memory for special usage.
For example, as Yinghai did in his patch, allocate pagetables on the local
node before all the memory on the node is mapped.
Please refer to Yinghai's patch:
v1: https://lkml.org/lkml/2013/3/7/642
v2: https://lkml.org/lkml/2013/3/10/47
v3: https://lkml.org/lkml/2013/4/4/639
v4: https://lkml.org/lkml/2013/4/11/829
In a hotplug environment, this could cause problems when we hot-remove
memory. Pagetable pages are kernel memory, which we cannot
migrate. But we can put them on the local node because their life-cycle is
the same as the node's. So we need to free them all before hot-removing the memory.
Actually, data whose life cycle is the same as a node's, such as pagetable
pages, vmemmap pages, and page_cgroup pages, could all be put on the local node.
They can be freed when we hot-remove a whole node.
In order to do so, we need to mark these special pages in memblock.
In this patch, we introduce a new "flags" member into memblock_region:
struct memblock_region {
phys_addr_t base;
phys_addr_t size;
unsigned long flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int nid;
#endif
};
This patch does the following things:
1) Add "flags" member to memblock_region, and MEMBLK_FLAGS_DEFAULT for common usage.
2) Modify the prototypes of the following APIs:
memblock_add_region()
memblock_insert_region()
3) Add memblock_reserve_region() to support reserving memory with flags, and keep
memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototypes unmodified.
The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.
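One consequence of the new member is that neighbouring regions may only
be merged when their flags match. A minimal userspace sketch of that
rule, with struct region and merge_regions() as hypothetical stand-ins
for memblock_region and memblock_merge_regions():

```c
#include <stddef.h>
#include <string.h>

struct region {
	unsigned long long base, size;
	int nid;
	unsigned long flags;
};

/*
 * Merge neighbouring regions of a sorted, non-overlapping array.
 * Regions are joined only if they are adjacent and agree in both
 * node id and flags. Returns the new number of regions.
 */
static size_t merge_regions(struct region *r, size_t cnt)
{
	size_t i = 0;

	while (i + 1 < cnt) {
		struct region *cur = &r[i], *next = &r[i + 1];

		if (cur->base + cur->size != next->base ||
		    cur->nid != next->nid || cur->flags != next->flags) {
			i++;
			continue;
		}

		cur->size += next->size;
		/* Close the gap left by @next, which sits at index i + 1. */
		memmove(next, next + 1, (cnt - (i + 2)) * sizeof(*r));
		cnt--;
	}
	return cnt;
}
```

Without the flags comparison, a flagged reservation would silently be
absorbed into an adjacent default reservation and the mark would be lost.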
Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Suggested-by: Liu Jiang <jiang.liu@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
include/linux/memblock.h | 8 ++++++
mm/memblock.c | 56 +++++++++++++++++++++++++++++++++------------
2 files changed, 49 insertions(+), 15 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f388203..c63a66e 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -19,9 +19,17 @@
#define INIT_MEMBLOCK_REGIONS 128
+#define MEMBLK_FLAGS_DEFAULT 0
+
+/* Definition of memblock flags. */
+enum memblock_flags {
+ __NR_MEMBLK_FLAGS, /* number of flags */
+};
+
struct memblock_region {
phys_addr_t base;
phys_addr_t size;
+ unsigned long flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int nid;
#endif
diff --git a/mm/memblock.c b/mm/memblock.c
index 16eda3d..63924ae 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -157,6 +157,7 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
type->cnt = 1;
type->regions[0].base = 0;
type->regions[0].size = 0;
+ type->regions[0].flags = 0;
memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
}
}
@@ -307,7 +308,8 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
if (this->base + this->size != next->base ||
memblock_get_region_node(this) !=
- memblock_get_region_node(next)) {
+ memblock_get_region_node(next) ||
+ this->flags != next->flags) {
BUG_ON(this->base + this->size > next->base);
i++;
continue;
@@ -327,13 +329,15 @@ static void __init_memblock memblock_merge_regions(struct memblock_type *type)
* @base: base address of the new region
* @size: size of the new region
* @nid: node id of the new region
+ * @flags: flags of the new region
*
* Insert new memblock region [@base,@base+@size) into @type at @idx.
* @type must already have extra room to accomodate the new region.
*/
static void __init_memblock memblock_insert_region(struct memblock_type *type,
int idx, phys_addr_t base,
- phys_addr_t size, int nid)
+ phys_addr_t size,
+ int nid, unsigned long flags)
{
struct memblock_region *rgn = &type->regions[idx];
@@ -341,6 +345,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
rgn->base = base;
rgn->size = size;
+ rgn->flags = flags;
memblock_set_region_node(rgn, nid);
type->cnt++;
type->total_size += size;
@@ -352,6 +357,7 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
* @base: base address of the new region
* @size: size of the new region
* @nid: nid of the new region
+ * @flags: flags of the new region
*
* Add new memblock region [@base,@base+@size) into @type. The new region
* is allowed to overlap with existing ones - overlaps don't affect already
@@ -362,7 +368,8 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
* 0 on success, -errno on failure.
*/
static int __init_memblock memblock_add_region(struct memblock_type *type,
- phys_addr_t base, phys_addr_t size, int nid)
+ phys_addr_t base, phys_addr_t size,
+ int nid, unsigned long flags)
{
bool insert = false;
phys_addr_t obase = base;
@@ -377,6 +384,7 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
WARN_ON(type->cnt != 1 || type->total_size);
type->regions[0].base = base;
type->regions[0].size = size;
+ type->regions[0].flags = flags;
memblock_set_region_node(&type->regions[0], nid);
type->total_size = size;
return 0;
@@ -407,7 +415,8 @@ repeat:
nr_new++;
if (insert)
memblock_insert_region(type, i++, base,
- rbase - base, nid);
+ rbase - base, nid,
+ flags);
}
/* area below @rend is dealt with, forget about it */
base = min(rend, end);
@@ -417,7 +426,8 @@ repeat:
if (base < end) {
nr_new++;
if (insert)
- memblock_insert_region(type, i, base, end - base, nid);
+ memblock_insert_region(type, i, base, end - base,
+ nid, flags);
}
/*
@@ -439,12 +449,14 @@ repeat:
int __init_memblock memblock_add_node(phys_addr_t base, phys_addr_t size,
int nid)
{
- return memblock_add_region(&memblock.memory, base, size, nid);
+ return memblock_add_region(&memblock.memory, base, size,
+ nid, MEMBLK_FLAGS_DEFAULT);
}
int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
{
- return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
+ return memblock_add_region(&memblock.memory, base, size,
+ MAX_NUMNODES, MEMBLK_FLAGS_DEFAULT);
}
/**
@@ -499,7 +511,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
rgn->size -= base - rbase;
type->total_size -= base - rbase;
memblock_insert_region(type, i, rbase, base - rbase,
- memblock_get_region_node(rgn));
+ memblock_get_region_node(rgn),
+ rgn->flags);
} else if (rend > end) {
/*
* @rgn intersects from above. Split and redo the
@@ -509,7 +522,8 @@ static int __init_memblock memblock_isolate_range(struct memblock_type *type,
rgn->size -= end - rbase;
type->total_size -= end - rbase;
memblock_insert_region(type, i--, rbase, end - rbase,
- memblock_get_region_node(rgn));
+ memblock_get_region_node(rgn),
+ rgn->flags);
} else {
/* @rgn is fully contained, record it */
if (!*end_rgn)
@@ -551,16 +565,25 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
return __memblock_remove(&memblock.reserved, base, size);
}
-int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+static int __init_memblock memblock_reserve_region(phys_addr_t base,
+ phys_addr_t size,
+ int nid,
+ unsigned long flags)
{
struct memblock_type *_rgn = &memblock.reserved;
- memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
+ memblock_dbg("memblock_reserve: [%#016llx-%#016llx] with flags %#016lx %pF\n",
(unsigned long long)base,
(unsigned long long)base + size,
- (void *)_RET_IP_);
+ flags, (void *)_RET_IP_);
+
+ return memblock_add_region(_rgn, base, size, nid, flags);
+}
- return memblock_add_region(_rgn, base, size, MAX_NUMNODES);
+int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
+{
+ return memblock_reserve_region(base, size, MAX_NUMNODES,
+ MEMBLK_FLAGS_DEFAULT);
}
/**
@@ -982,6 +1005,7 @@ void __init_memblock memblock_set_current_limit(phys_addr_t limit)
static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
{
unsigned long long base, size;
+ unsigned long flags;
int i;
pr_info(" %s.cnt = 0x%lx\n", name, type->cnt);
@@ -992,13 +1016,15 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
base = rgn->base;
size = rgn->size;
+ flags = rgn->flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
if (memblock_get_region_node(rgn) != MAX_NUMNODES)
snprintf(nid_buf, sizeof(nid_buf), " on node %d",
memblock_get_region_node(rgn));
#endif
- pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s\n",
- name, i, base, base + size - 1, size, nid_buf);
+ pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s "
+ "flags: %#lx\n",
+ name, i, base, base + size - 1, size, nid_buf, flags);
}
}
--
1.7.1
* [PATCH v1 07/12] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
If all the memory ranges in SRAT are hotpluggable, we should not
arrange them all in ZONE_MOVABLE. Otherwise the kernel won't have
enough memory to boot.
This patch introduces a global variable, memblock_kernel_nodemask, to
mark all the nodes the kernel resides in. Whether or not SRAT reports
these nodes as hotpluggable, we arrange them as un-hotpluggable.
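A minimal user-space sketch of the marking step described above, not the kernel code itself. All `sketch_` names are hypothetical, the default flag value of 0 is an assumption, and the range check on the node id is an addition of this sketch (the posted loop calls node_set() without one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAGS_DEFAULT 0UL /* assumed value of MEMBLK_FLAGS_DEFAULT */
#define SKETCH_MAX_NODES 64      /* illustrative cap so the mask fits in 64 bits */

/* Hypothetical stand-in for one reserved memblock region. */
struct sketch_region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
	int nid;
};

/* Return a bitmask with one bit per node holding a default-flag reservation. */
static uint64_t sketch_mark_kernel_nodes(const struct sketch_region *reserved,
					 size_t cnt)
{
	uint64_t mask = 0;
	size_t i;

	assert(reserved != NULL || cnt == 0);

	for (i = 0; i < cnt; i++) {
		int nid = reserved[i].nid;

		/* Only plain kernel reservations count; flagged ones are skipped. */
		if (reserved[i].flags != SKETCH_FLAGS_DEFAULT)
			continue;
		/* Ignore regions without a usable node id. */
		if (nid < 0 || nid >= SKETCH_MAX_NODES)
			continue;
		mask |= UINT64_C(1) << nid;
	}
	return mask;
}

/* True if the node was marked as holding kernel data. */
static bool sketch_is_kernel_node(uint64_t mask, int nid)
{
	return nid >= 0 && nid < SKETCH_MAX_NODES && ((mask >> nid) & 1) != 0;
}
```

With reservations on nodes 0 and 2 carrying the default flags and a flagged reservation on node 1, only nodes 0 and 2 end up in the mask.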
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/numa.c | 6 ++++++
include/linux/memblock.h | 1 +
mm/memblock.c | 20 ++++++++++++++++++++
3 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 26d1800..105b092 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -658,6 +658,12 @@ static bool srat_used __initdata;
*/
static void __init early_x86_numa_init(void)
{
+ /*
+ * Need to find out which nodes the kernel resides in, and arrange
+ * them as un-hotpluggable when parsing SRAT.
+ */
+ memblock_mark_kernel_nodes();
+
if (!numa_off) {
#ifdef CONFIG_X86_NUMAQ
if (!numa_init(numaq_numa_init))
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index c63a66e..5064eed 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -66,6 +66,7 @@ int memblock_remove(phys_addr_t base, phys_addr_t size);
int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
+void memblock_mark_kernel_nodes(void);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 63924ae..1b93a5d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -35,6 +35,9 @@ struct memblock memblock __initdata_memblock = {
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};
+/* Mark which nodes the kernel resides in. */
+static nodemask_t memblock_kernel_nodemask __initdata_memblock;
+
int memblock_debug __initdata_memblock;
static int memblock_can_resize __initdata_memblock;
static int memblock_memory_in_slab __initdata_memblock = 0;
@@ -787,6 +790,23 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
memblock_merge_regions(type);
return 0;
}
+
+void __init_memblock memblock_mark_kernel_nodes()
+{
+ int i, nid;
+ struct memblock_type *reserved = &memblock.reserved;
+
+ for (i = 0; i < reserved->cnt; i++)
+ if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
+ nid = memblock_get_region_node(&reserved->regions[i]);
+ node_set(nid, memblock_kernel_nodemask);
+ }
+}
+#else
+void __init_memblock memblock_mark_kernel_nodes()
+{
+ node_set(0, memblock_kernel_nodemask);
+}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
--
1.7.1
* [PATCH v1 08/12] x86, numa, memblock: Introduce MEMBLK_LOCAL_NODE to mark and reserve node-life-cycle data.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
Node-life-cycle data (data whose life cycle is the same as its node's)
allocated by memblock should be marked so that we can skip it when
we free usable memory to the buddy system.
This patch introduces a flag, MEMBLK_LOCAL_NODE, for memblock to reserve
node-life-cycle data. For now, this covers only the kernel direct mapping
pagetable pages, based on Yinghai's patch.
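The flag convention can be illustrated with a small stand-alone sketch: the enum holds bit positions, and a region's flags word holds the shifted masks. The `sketch_` names are hypothetical, and the default value of 0 (no bits set) is an assumption about MEMBLK_FLAGS_DEFAULT. The sketch shifts `1UL` rather than `1` so the mask has the same type as the flags word.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit positions, mirroring the enum layout used in the patch. */
enum sketch_flag_bit {
	SKETCH_LOCAL_NODE,	/* node-life-cycle data */
	SKETCH_NR_FLAGS,	/* number of flags */
};

#define SKETCH_FLAGS_DEFAULT 0UL /* assumed: no flag bits set */

/* Every bit position must fit in the unsigned long flags word. */
static_assert((size_t)SKETCH_NR_FLAGS <= sizeof(unsigned long) * CHAR_BIT,
	      "too many flag bits for unsigned long");

/* Turn a bit position into a mask. */
static unsigned long sketch_flag_mask(enum sketch_flag_bit bit)
{
	return 1UL << bit;
}

/* True if the region holds node-life-cycle data and must be skipped
 * when usable memory is handed to the page allocator. */
static bool sketch_is_node_life_cycle(unsigned long flags)
{
	return (flags & sketch_flag_mask(SKETCH_LOCAL_NODE)) != 0;
}
```

Testing with a bitwise AND rather than equality keeps the check valid if a region ever carries more than one flag.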
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/init.c | 16 ++++++++++++----
include/linux/memblock.h | 2 ++
mm/memblock.c | 7 +++++++
3 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8d0007a..1261e2e 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -62,14 +62,22 @@ __ref void *alloc_low_pages(unsigned int num)
low_min_pfn_mapped << PAGE_SHIFT,
low_max_pfn_mapped << PAGE_SHIFT,
PAGE_SIZE * num , PAGE_SIZE);
- } else
+ if (!ret)
+ panic("alloc_low_page: can not alloc memory");
+
+ memblock_reserve(ret, PAGE_SIZE * num);
+ } else {
ret = memblock_find_in_range(
local_min_pfn_mapped << PAGE_SHIFT,
local_max_pfn_mapped << PAGE_SHIFT,
PAGE_SIZE * num , PAGE_SIZE);
- if (!ret)
- panic("alloc_low_page: can not alloc memory");
- memblock_reserve(ret, PAGE_SIZE * num);
+ if (!ret)
+ panic("alloc_low_page: can not alloc memory");
+
+ memblock_reserve_local_node(ret, PAGE_SIZE * num,
+ MAX_NUMNODES);
+ }
+
pfn = ret >> PAGE_SHIFT;
} else {
pfn = pgt_buf_end;
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5064eed..3b2d1c4 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -23,6 +23,7 @@
/* Definition of memblock flags. */
enum memblock_flags {
+ MEMBLK_LOCAL_NODE, /* node-life-cycle data */
__NR_MEMBLK_FLAGS, /* number of flags */
};
@@ -65,6 +66,7 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
int memblock_remove(phys_addr_t base, phys_addr_t size);
int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
+int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
void memblock_trim_memory(phys_addr_t align);
void memblock_mark_kernel_nodes(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index 1b93a5d..edde4c2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -589,6 +589,13 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
MEMBLK_FLAGS_DEFAULT);
}
+int __init_memblock memblock_reserve_local_node(phys_addr_t base,
+ phys_addr_t size, int nid)
+{
+ unsigned long flags = 1 << MEMBLK_LOCAL_NODE;
+ return memblock_reserve_region(base, size, nid, flags);
+}
+
/**
* __next_free_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
--
1.7.1
* [PATCH v1 09/12] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to mark and reserve hotpluggable memory.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
We mark out movable memory ranges and reserve them in memblock.reserved
with the MEMBLK_HOTPLUGGABLE flag. This should be done after the memory
mapping is initialized, because the kernel now supports allocating
pagetable pages on the local node, and those are kernel pages.
The reserved hotpluggable memory will be freed to the buddy system when
memory initialization is done.
This idea is from Wen Congyang <wency@cn.fujitsu.com> and Jiang Liu <jiang.liu@huawei.com>.
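A hedged user-space sketch of the reservation walk described above. It makes two assumptions that the diff below does not show: each block carries its own `start` and `end` (end exclusive), which supply the range to reserve, and blocks on nodes that hold kernel data are skipped. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one numa_meminfo block. */
struct sketch_blk {
	uint64_t start;
	uint64_t end;		/* exclusive */
	int nid;
	bool hotpluggable;
};

/* A range the caller would pass to the reserve-as-hotpluggable step. */
struct sketch_range {
	uint64_t base;
	uint64_t size;
	int nid;
};

/* Collect hotpluggable ranges that are not on kernel nodes.
 * kernel_nodes is a bitmask with bit n set if node n holds kernel data.
 * Returns the number of ranges written to out (at most out_cap). */
static size_t sketch_collect_hotpluggable(const struct sketch_blk *blk,
					  size_t nr, uint64_t kernel_nodes,
					  struct sketch_range *out,
					  size_t out_cap)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < nr && n < out_cap; i++) {
		if (!blk[i].hotpluggable)
			continue;
		/* Nodes holding kernel data stay un-hotpluggable. */
		if (blk[i].nid >= 0 && blk[i].nid < 64 &&
		    ((kernel_nodes >> blk[i].nid) & 1) != 0)
			continue;

		assert(blk[i].end >= blk[i].start);
		out[n].base = blk[i].start;
		out[n].size = blk[i].end - blk[i].start;
		out[n].nid = blk[i].nid;
		n++;
	}
	return n;
}
```

Each collected range corresponds to one memblock_reserve_hotpluggable(base, size, nid) call in the patch.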
Suggested-by: Jiang Liu <jiang.liu@huawei.com>
Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/mm/numa.c | 26 ++++++++++++++++++++++++++
include/linux/memblock.h | 3 +++
mm/memblock.c | 19 +++++++++++++++++++
3 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 105b092..6f61691 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -731,6 +731,30 @@ static void __init early_x86_numa_init_mapping(void)
}
#endif
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+static void __init early_mem_hotplug_init()
+{
+ int i, nid;
+ phys_addr_t start, end;
+
+ if (!movablecore_enable_srat)
+ return;
+
+ for (i = 0; i < numa_meminfo.nr_blks; i++) {
+ if (!numa_meminfo.blk[i].hotpluggable)
+ continue;
+
+ nid = numa_meminfo.blk[i].nid;
+
+ memblock_reserve_hotpluggable(start, end - start, nid);
+ }
+}
+#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+static inline void early_mem_hotplug_init()
+{
+}
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
void __init early_initmem_init(void)
{
early_x86_numa_init();
@@ -740,6 +764,8 @@ void __init early_initmem_init(void)
load_cr3(swapper_pg_dir);
__flush_tlb_all();
+ early_mem_hotplug_init();
+
early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);
}
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 3b2d1c4..0f01930 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -24,6 +24,7 @@
/* Definition of memblock flags. */
enum memblock_flags {
MEMBLK_LOCAL_NODE, /* node-life-cycle data */
+ MEMBLK_HOTPLUGGABLE, /* hotpluggable region */
__NR_MEMBLK_FLAGS, /* number of flags */
};
@@ -67,8 +68,10 @@ int memblock_remove(phys_addr_t base, phys_addr_t size);
int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
+int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
void memblock_trim_memory(phys_addr_t align);
void memblock_mark_kernel_nodes(void);
+bool memblock_is_kernel_node(int nid);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index edde4c2..0c55588 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -596,6 +596,13 @@ int __init_memblock memblock_reserve_local_node(phys_addr_t base,
return memblock_reserve_region(base, size, nid, flags);
}
+int __init_memblock memblock_reserve_hotpluggable(phys_addr_t base,
+ phys_addr_t size, int nid)
+{
+ unsigned long flags = 1 << MEMBLK_HOTPLUGGABLE;
+ return memblock_reserve_region(base, size, nid, flags);
+}
+
/**
* __next_free_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
@@ -809,11 +816,23 @@ void __init_memblock memblock_mark_kernel_nodes()
node_set(nid, memblock_kernel_nodemask);
}
}
+
+bool __init_memblock memblock_is_kernel_node(int nid)
+{
+ if (node_isset(nid, memblock_kernel_nodemask))
+ return true;
+ return false;
+}
#else
void __init_memblock memblock_mark_kernel_nodes()
{
node_set(0, memblock_kernel_nodemask);
}
+
+bool __init_memblock memblock_is_kernel_node(int nid)
+{
+ return true;
+}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
--
1.7.1
* [PATCH v1 10/12] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
We reserved hotpluggable memory in memblock. When memory initialization
is done, we have to free it to the buddy system.
This patch frees the memory reserved by memblock with the MEMBLK_HOTPLUGGABLE flag.
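A small user-space sketch of removing every region that carries a given flags value from an array-backed list. It is an illustration under assumptions, not the patch's implementation: the `sketch_` names are hypothetical, and the sketch compacts the array in place instead of calling a per-index removal helper. If removal shifts later entries down, advancing the index after a removal would skip the entry that moved into the freed slot; compaction avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved memblock region. */
struct sketch_region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;
};

/* Drop every region whose flags equal `flags`; return the new count.
 * Surviving regions keep their relative order. */
static size_t sketch_free_flags(struct sketch_region *regions, size_t cnt,
				unsigned long flags)
{
	size_t kept = 0;
	size_t i;

	assert(regions != NULL || cnt == 0);

	for (i = 0; i < cnt; i++) {
		if (regions[i].flags == flags)
			continue;		/* released: not copied forward */
		regions[kept++] = regions[i];	/* survivor moves down */
	}
	return kept;
}
```

Two adjacent hotpluggable regions are the case worth checking: both must be gone after a single pass.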
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 20 ++++++++++++++++++++
mm/nobootmem.c | 3 +++
3 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0f01930..08c761d 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -69,6 +69,7 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
+void memblock_free_hotpluggable(void);
void memblock_trim_memory(phys_addr_t align);
void memblock_mark_kernel_nodes(void);
bool memblock_is_kernel_node(int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 0c55588..54de398 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -568,6 +568,26 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
return __memblock_remove(&memblock.reserved, base, size);
}
+static void __init_memblock memblock_free_flags(unsigned long flags)
+{
+ int i;
+ struct memblock_type *reserved = &memblock.reserved;
+
+ for (i = 0; i < reserved->cnt; i++) {
+ if (reserved->regions[i].flags == flags)
+ memblock_remove_region(reserved, i);
+ }
+}
+
+void __init_memblock memblock_free_hotpluggable()
+{
+ unsigned long flags = 1 << MEMBLK_HOTPLUGGABLE;
+
+ memblock_dbg("memblock: free all hotpluggable memory");
+
+ memblock_free_flags(flags);
+}
+
static int __init_memblock memblock_reserve_region(phys_addr_t base,
phys_addr_t size,
int nid,
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 5e07d36..cd85604 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -165,6 +165,9 @@ unsigned long __init free_all_bootmem(void)
for_each_online_pgdat(pgdat)
reset_node_lowmem_managed_pages(pgdat);
+ /* Hotpluggable memory reserved by memblock should also be freed. */
+ memblock_free_hotpluggable();
+
/*
* We need to use MAX_NUMNODES instead of NODE_DATA(0)->node_id
* because in some case like Node0 doesn't have RAM installed
--
1.7.1
* [PATCH v1 11/12] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
Arranging hotpluggable memory as ZONE_MOVABLE will decrease NUMA performance
because the kernel cannot use movable memory.
Users who don't use memory hotplug and who don't want to lose their NUMA
performance need a way to disable this functionality.
So, if users specify "movablecore=acpi" on the kernel command line, the kernel
will use SRAT to arrange ZONE_MOVABLE, and this has higher priority than the
original movablecore and kernelcore boot options.
Those who don't want this can simply leave the option unspecified.
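A user-space sketch of how the per-node start of ZONE_MOVABLE could be derived from the hotpluggable reservations: for each node, take the lowest starting page frame among its hotpluggable regions. The `sketch_` names, the 4 KiB page size, and the flag mask value are assumptions. Unlike the diff below, which stores the region base directly, this sketch converts the physical base address to a page frame number first. As in the patch, 0 means "not set yet", so a region starting at frame 0 would be indistinguishable from an unset entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumed 4 KiB pages */
#define SKETCH_HOTPLUGGABLE (1UL << 1)	/* assumed mask for MEMBLK_HOTPLUGGABLE */

/* Hypothetical stand-in for one reserved memblock region. */
struct sketch_region {
	uint64_t base;		/* physical address */
	uint64_t size;
	unsigned long flags;
	int nid;
};

/* Fill zone_movable_pfn[nid] with the lowest start pfn of that node's
 * hotpluggable regions. Entries must be zero-initialized by the caller. */
static void sketch_find_movable_pfns(const struct sketch_region *reserved,
				     size_t cnt, uint64_t *zone_movable_pfn,
				     size_t nr_nodes)
{
	size_t i;

	assert(zone_movable_pfn != NULL || nr_nodes == 0);

	for (i = 0; i < cnt; i++) {
		int nid = reserved[i].nid;
		uint64_t start_pfn, cur;

		if (!(reserved[i].flags & SKETCH_HOTPLUGGABLE))
			continue;
		if (nid < 0 || (size_t)nid >= nr_nodes)
			continue;

		start_pfn = reserved[i].base >> SKETCH_PAGE_SHIFT;
		cur = zone_movable_pfn[nid];
		/* Keep the minimum; 0 means no value recorded so far. */
		if (cur == 0 || start_pfn < cur)
			zone_movable_pfn[nid] = start_pfn;
	}
}
```

Nodes with no hotpluggable region keep the value 0, which leaves them without a ZONE_MOVABLE start in this model.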
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 5 +++++
mm/page_alloc.c | 24 +++++++++++++++++++++++-
3 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 08c761d..5528e8f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -69,6 +69,7 @@ int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
int memblock_reserve_local_node(phys_addr_t base, phys_addr_t size, int nid);
int memblock_reserve_hotpluggable(phys_addr_t base, phys_addr_t size, int nid);
+bool memblock_is_hotpluggable(struct memblock_region *region);
void memblock_free_hotpluggable(void);
void memblock_trim_memory(phys_addr_t align);
void memblock_mark_kernel_nodes(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index 54de398..8b9a13c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -623,6 +623,11 @@ int __init_memblock memblock_reserve_hotpluggable(phys_addr_t base,
return memblock_reserve_region(base, size, nid, flags);
}
+bool __init_memblock memblock_is_hotpluggable(struct memblock_region *region)
+{
+ return region->flags & (1 << MEMBLK_HOTPLUGGABLE);
+}
+
/**
* __next_free_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9ea143..2fe9ebf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4793,9 +4793,31 @@ static void __init find_zone_movable_pfns_for_nodes(void)
nodemask_t saved_node_state = node_states[N_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
int usable_nodes = nodes_weight(node_states[N_MEMORY]);
+ struct memblock_type *reserved = &memblock.reserved;
/*
- * If movablecore was specified, calculate what size of
+ * If movablecore=acpi was specified, then zone_movable_pfn[] has been
+ * initialized, and no more work needs to do.
+ * NOTE: In this case, we ignore kernelcore option.
+ */
+ if (movablecore_enable_srat) {
+ for (i = 0; i < reserved->cnt; i++) {
+ if (!memblock_is_hotpluggable(&reserved->regions[i]))
+ continue;
+
+ nid = reserved->regions[i].nid;
+
+ usable_startpfn = reserved->regions[i].base;
+ zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
+ min(usable_startpfn, zone_movable_pfn[nid]) :
+ usable_startpfn;
+ }
+
+ goto out;
+ }
+
+ /*
+ * If movablecore=nn[KMG] was specified, calculate what size of
* kernelcore that corresponds so that memory usable for
* any allocation type is evenly spread. If both kernelcore
* and movablecore are specified, then the value of kernelcore
--
1.7.1
* [PATCH v1 12/12] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option.
From: Tang Chen @ 2013-04-19 9:31 UTC (permalink / raw)
To: rob, tglx, mingo, hpa, akpm, paulmck, dhowells, davej, agordeev,
suresh.b.siddha, mst, yinghai, penberg, jacob.shin, wency, trenn,
liwanp, isimatu.yasuaki, rientjes, tj, laijs, hannes, davem,
mgorman, minchan, m.szyprowski, mina86
Cc: x86, linux-doc, linux-kernel, linux-mm
Since we modify the movablecore boot option to support
"movablecore=acpi", this patch adds documentation for it.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
Documentation/kernel-parameters.txt | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..a1c515b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1649,6 +1649,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
+ movablecore=acpi [KNL,X86] This parameter will enable the
+ kernel to arrange ZONE_MOVABLE with the help of
+ Hot-Pluggable Field in SRAT. All the hotpluggable
+ memory will be arranged in ZONE_MOVABLE.
+ NOTE: Any node which the kernel resides in will
+ always be un-hotpluggable so that the kernel
+ will always have enough memory to boot.
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
--
1.7.1