* [PATCH v5 1/5] x86: get pg_data_t's memory from other node
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
@ 2013-01-14 9:15 ` Tang Chen
2013-01-14 9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
` (4 subsequent siblings)
5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14 9:15 UTC (permalink / raw)
To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer
Cc: linux-kernel, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
If the system creates a movable node, in which all of the node's memory
is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory
for that node's pg_data_t from the node itself.
So use memblock_alloc_try_nid() instead of memblock_alloc_nid(); it
retries on other nodes when the first allocation fails.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
arch/x86/mm/numa.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be..db939b6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -222,10 +222,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
nd_pa = __pa(nd);
remapped = true;
} else {
- nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+ nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
- pr_err("Cannot find %zu bytes in node %d\n",
- nd_size, nid);
+ pr_err("Cannot find %zu bytes in any node\n", nd_size);
return;
}
nd = __va(nd_pa);
--
1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
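The fallback that patch 1/5 relies on can be pictured with a small userspace sketch. This is not kernel code: node_free[], alloc_on_node() and alloc_try_node() are hypothetical names invented for the illustration, and the only behaviour taken from the patch description is "try the requested node first, then retry elsewhere".

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4	/* hypothetical node count for the sketch */
#define NO_NODE  (-1)

/* Hypothetical per-node pool of memory the kernel may use, in bytes. */
static size_t node_free[NR_NODES] = { 4096, 0, 8192, 0 };

/* Take size bytes from one node; return 1 on success, 0 on failure. */
static int alloc_on_node(int nid, size_t size)
{
	assert(nid >= 0 && nid < NR_NODES);
	if (node_free[nid] < size)
		return 0;
	node_free[nid] -= size;
	return 1;
}

/*
 * Prefer nid, then fall back to any other node.
 * Returns the node that satisfied the request, or NO_NODE.
 */
static int alloc_try_node(int nid, size_t size)
{
	int i;

	if (alloc_on_node(nid, size))
		return nid;
	for (i = 0; i < NR_NODES; i++)
		if (i != nid && alloc_on_node(i, size))
			return i;
	return NO_NODE;
}
```

In this toy setup, nodes 1 and 3 stand in for movable-only nodes: they have nothing the kernel can use, so a request made on their behalf is served from node 0 or node 2.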
* [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
2013-01-14 9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
@ 2013-01-14 9:15 ` Tang Chen
2013-01-14 22:35 ` Andrew Morton
2013-01-14 9:15 ` [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
` (3 subsequent siblings)
5 siblings, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-14 9:15 UTC (permalink / raw)
To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer
Cc: linux-kernel, linux-mm
This patch adds functions to parse the movablecore_map boot option. Since the
option can be specified more than once, all the maps are stored in
the global movablecore_map.map array.
The array is kept sorted by start_pfn in monotonically increasing order,
and all overlapping ranges are merged.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
Documentation/kernel-parameters.txt | 17 +++++
include/linux/mm.h | 11 +++
mm/page_alloc.c | 126 +++++++++++++++++++++++++++++++++++
3 files changed, 154 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 363e348..f02aa4c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1637,6 +1637,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
+ movablecore_map=nn[KMG]@ss[KMG]
+ [KNL,X86,IA-64,PPC] This parameter is similar to
+ memmap except it specifies the memory map of
+ ZONE_MOVABLE.
+ If several areas lie within one node, the range from the
+ lowest ss to the end of that node will be ZONE_MOVABLE.
+ If an area covers two or more nodes, the area from
+ ss to the end of the 1st node will be ZONE_MOVABLE,
+ and all the remaining nodes will only have ZONE_MOVABLE.
+ If memmap is specified at the same time, the
+ movablecore_map will be limited within the memmap
+ areas. If kernelcore or movablecore is also specified,
+ movablecore_map will have higher priority to be
+ satisfied. So the administrator should be careful that
+ the total size of the movablecore_map areas is not too large.
+ Otherwise the kernel won't have enough memory to start.
+
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66e2f7c..12f5a09 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1359,6 +1359,17 @@ extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
extern void sparse_memory_present_with_active_regions(int nid);
+#define MOVABLECORE_MAP_MAX MAX_NUMNODES
+struct movablecore_entry {
+ unsigned long start_pfn; /* start pfn of memory segment */
+ unsigned long end_pfn; /* end pfn of memory segment */
+};
+
+struct movablecore_map {
+ int nr_map;
+ struct movablecore_entry map[MOVABLECORE_MAP_MAX];
+};
+
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
#if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df2022f..d1a7a88 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -201,6 +201,9 @@ static unsigned long __meminitdata nr_all_pages;
static unsigned long __meminitdata dma_reserve;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+/* Movable memory ranges, will also be used by memblock subsystem. */
+struct movablecore_map movablecore_map;
+
static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
static unsigned long __initdata required_kernelcore;
@@ -5070,6 +5073,129 @@ static int __init cmdline_parse_movablecore(char *p)
early_param("kernelcore", cmdline_parse_kernelcore);
early_param("movablecore", cmdline_parse_movablecore);
+/**
+ * insert_movablecore_map - Insert a memory range into movablecore_map.map.
+ * @start_pfn: start pfn of the range
+ * @end_pfn: end pfn of the range
+ *
+ * This function will also merge the overlapped ranges, and sort the array
+ * by start_pfn in monotonic increasing order.
+ */
+static void __init insert_movablecore_map(unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ int pos, overlap;
+
+ /*
+ * pos will be at the 1st overlapped range, or the position
+ * where the element should be inserted.
+ */
+ for (pos = 0; pos < movablecore_map.nr_map; pos++)
+ if (start_pfn <= movablecore_map.map[pos].end_pfn)
+ break;
+
+ /* If there is no overlapped range, just insert the element. */
+ if (pos == movablecore_map.nr_map ||
+ end_pfn < movablecore_map.map[pos].start_pfn) {
+ /*
+ * If pos is not the end of array, we need to move all
+ * the rest elements backward.
+ */
+ if (pos < movablecore_map.nr_map)
+ memmove(&movablecore_map.map[pos+1],
+ &movablecore_map.map[pos],
+ sizeof(struct movablecore_entry) *
+ (movablecore_map.nr_map - pos));
+ movablecore_map.map[pos].start_pfn = start_pfn;
+ movablecore_map.map[pos].end_pfn = end_pfn;
+ movablecore_map.nr_map++;
+ return;
+ }
+
+ /* overlap will be at the last overlapped range */
+ for (overlap = pos + 1; overlap < movablecore_map.nr_map; overlap++)
+ if (end_pfn < movablecore_map.map[overlap].start_pfn)
+ break;
+
+ /*
+ * If there are more ranges overlapped, we need to merge them,
+ * and move the rest elements forward.
+ */
+ overlap--;
+ movablecore_map.map[pos].start_pfn = min(start_pfn,
+ movablecore_map.map[pos].start_pfn);
+ movablecore_map.map[pos].end_pfn = max(end_pfn,
+ movablecore_map.map[overlap].end_pfn);
+
+ if (pos != overlap && overlap + 1 != movablecore_map.nr_map)
+ memmove(&movablecore_map.map[pos+1],
+ &movablecore_map.map[overlap+1],
+ sizeof(struct movablecore_entry) *
+ (movablecore_map.nr_map - overlap - 1));
+
+ movablecore_map.nr_map -= overlap - pos;
+}
+
+/**
+ * movablecore_map_add_region - Add a memory range into movablecore_map.
+ * @start: physical start address of range
+ * @size: size of the range in bytes
+ *
+ * This function transforms the physical addresses into pfns, and then adds the
+ * range into movablecore_map by calling insert_movablecore_map().
+ */
+static void __init movablecore_map_add_region(u64 start, u64 size)
+{
+ unsigned long start_pfn, end_pfn;
+
+ /* In case size == 0 or start + size overflows */
+ if (start + size <= start)
+ return;
+
+ if (movablecore_map.nr_map >= ARRAY_SIZE(movablecore_map.map)) {
+ pr_err("movable_memory_map: too many entries;"
+ " ignoring [mem %#010llx-%#010llx]\n",
+ (unsigned long long) start,
+ (unsigned long long) (start + size - 1));
+ return;
+ }
+
+ start_pfn = PFN_DOWN(start);
+ end_pfn = PFN_UP(start + size);
+ insert_movablecore_map(start_pfn, end_pfn);
+}
+
+/*
+ * movablecore_map=nn[KMG]@ss[KMG] sets the region of memory to be used as
+ * movable memory.
+ */
+static int __init cmdline_parse_movablecore_map(char *p)
+{
+ char *oldp;
+ u64 start_at, mem_size;
+
+ if (!p)
+ goto err;
+
+ oldp = p;
+ mem_size = memparse(p, &p);
+ if (p == oldp)
+ goto err;
+
+ if (*p == '@') {
+ oldp = ++p;
+ start_at = memparse(p, &p);
+ if (p == oldp || *p != '\0')
+ goto err;
+
+ movablecore_map_add_region(start_at, mem_size);
+ return 0;
+ }
+err:
+ return -EINVAL;
+}
+early_param("movablecore_map", cmdline_parse_movablecore_map);
+
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/**
--
1.7.1
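The nn[KMG]@ss[KMG] syntax handled by cmdline_parse_movablecore_map() can be tried out in userspace with the sketch below. It is an approximation under stated assumptions: parse_size() is a hypothetical stand-in for the kernel's memparse() built on strtoull() and limited to the K, M and G suffixes, and parse_movable_range() is an invented name that mirrors the error handling of the patch (both parts required, nothing allowed after the start address).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the kernel's memparse(): parse a
 * number with an optional K, M or G suffix and report where parsing stopped.
 */
static uint64_t parse_size(const char *s, const char **end)
{
	char *p;
	uint64_t val = strtoull(s, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		p++;
		break;
	default:
		break;
	}
	*end = p;
	return val;
}

/* Parse "nn[KMG]@ss[KMG]" into size and start; 0 on success, -EINVAL otherwise. */
static int parse_movable_range(const char *arg, uint64_t *size, uint64_t *start)
{
	const char *p;

	assert(size && start);
	if (!arg)
		return -EINVAL;

	/* The size comes first and must be followed by '@'. */
	*size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -EINVAL;

	/* The start address must consume the rest of the string. */
	arg = p + 1;
	*start = parse_size(arg, &p);
	if (p == arg || *p != '\0')
		return -EINVAL;

	return 0;
}
```

For example, "4G@8G" yields a size of 4 GiB starting at 8 GiB, while "512M" (no '@') and "1G@" (no start) are rejected, as in the patch.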
* Re: [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter
2013-01-14 9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
@ 2013-01-14 22:35 ` Andrew Morton
0 siblings, 0 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:35 UTC (permalink / raw)
To: Tang Chen
Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer, linux-kernel, linux-mm
On Mon, 14 Jan 2013 17:15:22 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:
> This patch adds functions to parse movablecore_map boot option. Since the
> option could be specified more than once, all the maps will be stored in
> the global variable movablecore_map.map array.
>
> And also, we keep the array in monotonic increasing order by start_pfn.
> And merge all overlapped ranges.
>
> ...
>
> +#define MOVABLECORE_MAP_MAX MAX_NUMNODES
> +struct movablecore_entry {
> + unsigned long start_pfn; /* start pfn of memory segment */
> + unsigned long end_pfn; /* end pfn of memory segment */
It is important to tell readers whether an "end" is inclusive or
exclusive. ie: does it point at the last byte, or one beyond it?
By reading the code I see it is exclusive, so...
--- a/include/linux/mm.h~page_alloc-add-movable_memmap-kernel-parameter-fix
+++ a/include/linux/mm.h
@@ -1362,7 +1362,7 @@ extern void sparse_memory_present_with_a
#define MOVABLECORE_MAP_MAX MAX_NUMNODES
struct movablecore_entry {
unsigned long start_pfn; /* start pfn of memory segment */
- unsigned long end_pfn; /* end pfn of memory segment */
+ unsigned long end_pfn; /* end pfn of memory segment (exclusive) */
};
struct movablecore_map {
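Because end_pfn is exclusive, two ranges where one ends exactly where the next begins can be treated as a single range, which is what the "<=" and "<" comparisons in insert_movablecore_map() amount to. The userspace sketch below restates the sorted insert-and-merge under that convention. It is a simplified rewrite, not the patch's code: struct range, map[], MAP_MAX and insert_range() are names made up for the example.

```c
#include <assert.h>
#include <string.h>

#define MAP_MAX 8	/* hypothetical capacity; the patch uses MAX_NUMNODES */

struct range {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

static struct range map[MAP_MAX];
static int nr_map;

/*
 * Insert [start_pfn, end_pfn) so that map[] stays sorted by start_pfn and
 * holds no overlapping or touching ranges. Returns 0, or -1 if map[] is full.
 */
static int insert_range(unsigned long start_pfn, unsigned long end_pfn)
{
	int pos, last;

	assert(start_pfn < end_pfn);

	/* First range that overlaps or touches the new one, or the insert slot. */
	for (pos = 0; pos < nr_map; pos++)
		if (start_pfn <= map[pos].end_pfn)
			break;

	/* No overlap: open a slot at pos and store the new range. */
	if (pos == nr_map || end_pfn < map[pos].start_pfn) {
		if (nr_map == MAP_MAX)
			return -1;
		memmove(&map[pos + 1], &map[pos],
			sizeof(map[0]) * (nr_map - pos));
		map[pos].start_pfn = start_pfn;
		map[pos].end_pfn = end_pfn;
		nr_map++;
		return 0;
	}

	/* last is the final range that overlaps or touches the new one. */
	for (last = pos; last + 1 < nr_map; last++)
		if (end_pfn < map[last + 1].start_pfn)
			break;

	/* Fold map[pos..last] and the new range into map[pos]. */
	if (start_pfn < map[pos].start_pfn)
		map[pos].start_pfn = start_pfn;
	if (end_pfn > map[last].end_pfn)
		map[pos].end_pfn = end_pfn;
	else
		map[pos].end_pfn = map[last].end_pfn;

	/* Close the gap left by the merged entries. */
	memmove(&map[pos + 1], &map[last + 1],
		sizeof(map[0]) * (nr_map - last - 1));
	nr_map -= last - pos;
	return 0;
}
```

Starting from an empty map, inserting [10, 20), [40, 50) and [30, 35) gives three sorted entries; inserting [15, 42) then collapses all three into [10, 50), and a later [50, 60) extends that entry to [10, 60) because the two ranges touch.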
* [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
2013-01-14 9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
2013-01-14 9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
@ 2013-01-14 9:15 ` Tang Chen
2013-01-14 9:15 ` [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
` (2 subsequent siblings)
5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14 9:15 UTC (permalink / raw)
To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer
Cc: linux-kernel, linux-mm
This patch introduces a new array zone_movable_limit[] to store the
ZONE_MOVABLE limit from movablecore_map boot option for all nodes.
The function sanitize_zone_movable_limit() will find out to which
node each range in movablecore_map.map[] belongs, and calculate the
low boundary of ZONE_MOVABLE for each node.
change log:
Do find_usable_zone_for_movable() to initialize movable_zone
so that sanitize_zone_movable_limit() could use it.
Reported-by: Wu Jianguo <wujianguo@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Liu Jiang <jiang.liu@huawei.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
mm/page_alloc.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 78 insertions(+), 1 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1a7a88..093b953 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
@@ -4370,6 +4371,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
}
+/**
+ * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
+ *
+ * zone_movable_limit is initialized as 0. This function will try to get
+ * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
+ * assign them to zone_movable_limit.
+ * zone_movable_limit[nid] == 0 means no limit for the node.
+ *
+ * Note: Each range is represented as [start_pfn, end_pfn)
+ */
+static void __meminit sanitize_zone_movable_limit(void)
+{
+ int map_pos = 0, i, nid;
+ unsigned long start_pfn, end_pfn;
+
+ if (!movablecore_map.nr_map)
+ return;
+
+ /* Iterate all ranges from minimum to maximum */
+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+ /*
+ * If we have found lowest pfn of ZONE_MOVABLE of the node
+ * specified by user, just go on to check next range.
+ */
+ if (zone_movable_limit[nid])
+ continue;
+
+#ifdef CONFIG_ZONE_DMA
+ /* Skip DMA memory. */
+ if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
+ start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
+#endif
+
+#ifdef CONFIG_ZONE_DMA32
+ /* Skip DMA32 memory. */
+ if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
+ start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
+#endif
+
+#ifdef CONFIG_HIGHMEM
+ /* Skip lowmem if ZONE_MOVABLE is highmem. */
+ if (zone_movable_is_highmem() &&
+ start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
+ start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
+#endif
+
+ if (start_pfn >= end_pfn)
+ continue;
+
+ while (map_pos < movablecore_map.nr_map) {
+ if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
+ break;
+
+ if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
+ map_pos++;
+ continue;
+ }
+
+ /*
+ * The start_pfn of ZONE_MOVABLE is either the minimum
+ * pfn specified by movablecore_map, or 0, which means
+ * the node has no ZONE_MOVABLE.
+ */
+ zone_movable_limit[nid] = max(start_pfn,
+ movablecore_map.map[map_pos].start_pfn);
+
+ break;
+ }
+ }
+}
+
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
unsigned long zone_type,
@@ -4388,6 +4460,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
return zholes_size[zone_type];
}
+static void __meminit sanitize_zone_movable_limit(void)
+{
+}
+
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@@ -4831,7 +4907,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
goto out;
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
- find_usable_zone_for_movable();
usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
restart:
@@ -4990,6 +5065,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Find the PFNs that ZONE_MOVABLE begins at in each node */
memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
+ find_usable_zone_for_movable();
+ sanitize_zone_movable_limit();
find_zone_movable_pfns_for_nodes();
/* Print out the zone ranges */
--
1.7.1
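The core of sanitize_zone_movable_limit() is a single pass over two sorted lists: the memory ranges of all nodes and the movablecore_map ranges. A userspace sketch of that pass follows. It leaves out the DMA, DMA32 and highmem clamping done in the patch, and struct mem_range, find_movable_limits() and the fixed NR_NODES are assumptions made for the example rather than kernel interfaces.

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical node count for the sketch */

struct range {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

struct mem_range {
	struct range r;
	int nid;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * mem[] and movable[] must both be sorted by start_pfn and non-overlapping.
 * On return, limit[nid] is the lowest pfn of node nid that lies inside a
 * movable range, or 0 if the node has none.
 */
static void find_movable_limits(const struct mem_range *mem, int nr_mem,
				const struct range *movable, int nr_movable,
				unsigned long limit[NR_NODES])
{
	int i, pos = 0;

	for (i = 0; i < NR_NODES; i++)
		limit[i] = 0;

	for (i = 0; i < nr_mem; i++) {
		int nid = mem[i].nid;

		assert(nid >= 0 && nid < NR_NODES);
		if (limit[nid])
			continue;	/* lowest movable pfn already found */

		/* Skip movable ranges that end before this memory range. */
		while (pos < nr_movable &&
		       movable[pos].end_pfn <= mem[i].r.start_pfn)
			pos++;

		/* The next movable range starts inside this memory range. */
		if (pos < nr_movable &&
		    movable[pos].start_pfn < mem[i].r.end_pfn)
			limit[nid] = max_ul(mem[i].r.start_pfn,
					    movable[pos].start_pfn);
	}
}
```

With node 0 at [0, 100), node 1 at [100, 300) in two pieces, node 2 at [300, 400), and movable ranges [150, 250) and [300, 400), the limits come out as 0, 150, 300 and 0: node 1 becomes movable from pfn 150 upward and node 2 is movable in full.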
* [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
` (2 preceding siblings ...)
2013-01-14 9:15 ` [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
@ 2013-01-14 9:15 ` Tang Chen
2013-01-14 9:15 ` [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14 9:15 UTC (permalink / raw)
To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer
Cc: linux-kernel, linux-mm
If kernelcore or movablecore is specified at the same time
as movablecore_map, movablecore_map takes priority.
This patch makes find_zone_movable_pfns_for_nodes()
calculate zone_movable_pfn[] with the limit from
zone_movable_limit[].
change log:
Move find_usable_zone_for_movable() to free_area_init_nodes()
so that sanitize_zone_movable_limit() in patch 3 could use
initialized movable_zone.
Reported-by: Wu Jianguo <wujianguo@huawei.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
mm/page_alloc.c | 28 +++++++++++++++++++++++++---
1 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 093b953..00037a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4902,9 +4902,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore = max(required_kernelcore, corepages);
}
- /* If kernelcore was not specified, there is no ZONE_MOVABLE */
- if (!required_kernelcore)
+ /*
+ * If neither kernelcore/movablecore nor movablecore_map is specified,
+ * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
+ * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
+ */
+ if (!required_kernelcore) {
+ if (movablecore_map.nr_map)
+ memcpy(zone_movable_pfn, zone_movable_limit,
+ sizeof(zone_movable_pfn));
goto out;
+ }
/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
@@ -4934,10 +4942,24 @@ restart:
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
unsigned long size_pages;
+ /*
+ * Find more memory for kernelcore in
+ * [zone_movable_pfn[nid], zone_movable_limit[nid]).
+ */
start_pfn = max(start_pfn, zone_movable_pfn[nid]);
if (start_pfn >= end_pfn)
continue;
+ if (zone_movable_limit[nid]) {
+ end_pfn = min(end_pfn, zone_movable_limit[nid]);
+ /* No range left for kernelcore in this node */
+ if (start_pfn >= end_pfn) {
+ zone_movable_pfn[nid] =
+ zone_movable_limit[nid];
+ break;
+ }
+ }
+
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
unsigned long kernel_pages;
@@ -4997,12 +5019,12 @@ restart:
if (usable_nodes && required_kernelcore > usable_nodes)
goto restart;
+out:
/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
for (nid = 0; nid < MAX_NUMNODES; nid++)
zone_movable_pfn[nid] =
roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
-out:
/* restore the node_state */
node_states[N_MEMORY] = saved_node_state;
}
--
1.7.1
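The priority rule of patch 4/5 can be reduced to a few lines for the simplest case, a node that is one contiguous pfn range. The sketch below is a deliberate simplification and not the kernel algorithm: movable_start() is a hypothetical helper, kernelcore_pages stands for whatever share of kernelcore ends up on this node, and ALIGN_PAGES is an assumed stand-in for MAX_ORDER_NR_PAGES.

```c
#include <assert.h>

#define ALIGN_PAGES 1024UL	/* assumed stand-in for MAX_ORDER_NR_PAGES */

static unsigned long round_up_pages(unsigned long pfn)
{
	return (pfn + ALIGN_PAGES - 1) / ALIGN_PAGES * ALIGN_PAGES;
}

/*
 * Start pfn of ZONE_MOVABLE for a node that is one contiguous range
 * [node_start, node_end). kernelcore_pages is this node's share of
 * kernelcore (0 if kernelcore was not given); limit is the node's
 * zone_movable_limit (0 if movablecore_map does not touch the node).
 * A return value of 0 means the node gets no ZONE_MOVABLE.
 */
static unsigned long movable_start(unsigned long node_start,
				   unsigned long node_end,
				   unsigned long kernelcore_pages,
				   unsigned long limit)
{
	unsigned long pfn;

	assert(node_start < node_end);

	/* Only movablecore_map was given: the limit is the answer. */
	if (!kernelcore_pages)
		return round_up_pages(limit);

	/* kernelcore takes pages from the bottom of the node ... */
	if (kernelcore_pages < node_end - node_start)
		pfn = node_start + kernelcore_pages;
	else
		pfn = node_end;

	/* ... but never beyond the start of the movablecore_map area. */
	if (limit && pfn > limit)
		pfn = limit;

	return round_up_pages(pfn);
}
```

For a node spanning [0, 10000) with a limit of 4096, a kernelcore share of 8000 pages is cut back so that ZONE_MOVABLE still starts at 4096, which is the "higher priority" behaviour the commit message describes.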
* [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
` (3 preceding siblings ...)
2013-01-14 9:15 ` [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
@ 2013-01-14 9:15 ` Tang Chen
2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14 9:15 UTC (permalink / raw)
To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer
Cc: linux-kernel, linux-mm
This patch makes sure bootmem will not allocate memory from areas that
may be ZONE_MOVABLE. The map info comes from the movablecore_map boot option.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 18 +++++++++++++++++-
2 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..6e25597 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
extern struct memblock memblock;
extern int memblock_debug;
+extern struct movablecore_map movablecore_map;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 88adc8a..1e48774 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -101,6 +101,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
{
phys_addr_t this_start, this_end, cand;
u64 i;
+ int curr = movablecore_map.nr_map - 1;
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
@@ -114,13 +115,28 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
this_start = clamp(this_start, start, end);
this_end = clamp(this_end, start, end);
- if (this_end < size)
+restart:
+ if (this_end <= this_start || this_end < size)
continue;
+ for (; curr >= 0; curr--) {
+ if ((movablecore_map.map[curr].start_pfn << PAGE_SHIFT)
+ < this_end)
+ break;
+ }
+
cand = round_down(this_end - size, align);
+ if (curr >= 0 &&
+ cand < movablecore_map.map[curr].end_pfn << PAGE_SHIFT) {
+ this_end = movablecore_map.map[curr].start_pfn
+ << PAGE_SHIFT;
+ goto restart;
+ }
+
if (cand >= this_start)
return cand;
}
+
return 0;
}
--
1.7.1
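The change to memblock_find_in_range_node() in patch 5/5 is a top-down search that lowers its ceiling whenever the candidate would land in a movable range. A userspace sketch of the same idea for a single free region follows. It works on byte addresses rather than pfns, and struct span and find_top_down() are hypothetical names; like the kernel function, it uses 0 to mean "nothing found".

```c
#include <assert.h>
#include <stdint.h>

struct span {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Search [start, end) from the top for size bytes aligned to align (a power
 * of two), skipping every span in avoid[], which must be sorted by start and
 * non-overlapping. Returns the chosen address, or 0 if nothing fits.
 */
static uint64_t find_top_down(uint64_t start, uint64_t end, uint64_t size,
			      uint64_t align, const struct span *avoid,
			      int nr_avoid)
{
	int cur = nr_avoid - 1;

	assert(align && (align & (align - 1)) == 0);
	assert(size);

	while (end > start && end - start >= size) {
		uint64_t cand = (end - size) & ~(align - 1);

		if (cand < start)
			break;

		/* Drop spans that begin at or above the current ceiling. */
		while (cur >= 0 && avoid[cur].start >= end)
			cur--;

		/* Candidate is clear of the highest remaining span: done. */
		if (cur < 0 || cand >= avoid[cur].end)
			return cand;

		/* Otherwise lower the ceiling to that span's start and retry. */
		end = avoid[cur].start;
	}
	return 0;
}
```

With a free region of [0x1000, 0x10000) and movable spans [0x4000, 0x8000) and [0xC000, 0x10000), a 0x1000-byte request is placed at 0xB000, just below the upper movable span, instead of at 0xF000.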
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
` (4 preceding siblings ...)
2013-01-14 9:15 ` [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
@ 2013-01-14 17:31 ` H. Peter Anvin
2013-01-14 22:34 ` Andrew Morton
5 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-14 17:31 UTC (permalink / raw)
To: Tang Chen
Cc: akpm, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer, linux-kernel, linux-mm
On 01/14/2013 01:15 AM, Tang Chen wrote:
>
> For now, users can disable this functionality by not specifying the boot option.
> Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> to use SRAT.
>
I still think the option "movablecore_map" is uglier than hell. "core"
could just as easily refer to CPU cores there, but it is a memory mem.
"movablemem" seems more appropriate.
Again, without SRAT I consider this patchset to be largely useless for
anything other than prototyping work.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
@ 2013-01-14 22:34 ` Andrew Morton
2013-01-14 22:41 ` Luck, Tony
2013-01-15 0:05 ` Toshi Kani
0 siblings, 2 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:34 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Tang Chen, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
glommer, linux-kernel, linux-mm
On Mon, 14 Jan 2013 09:31:33 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:
> On 01/14/2013 01:15 AM, Tang Chen wrote:
> >
> > For now, users can disable this functionality by not specifying the boot option.
> > Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> > to use SRAT.
> >
>
> I still think the option "movablecore_map" is uglier than hell. "core"
> could just as easily refer to CPU cores there, but it is a memory mem.
> "movablemem" seems more appropriate.
>
> Again, without SRAT I consider this patchset to be largely useless for
> anything other than prototyping work.
>
hm, why. Obviously SRAT support will improve things, but is it
actually unusable/unuseful with the command line configuration?
Also, "But even if we can use SRAT, users still need an interface to
enable/disable this functionality if they don't want to lose their
NUMA performance. So I think, a user interface is always needed."
There's also the matter of other architectures. Has any thought been
given to how (eg) powerpc would hook into here?
And what about VMs (xen, KVM)? I wonder if there is a case for those
to implement memory hotplug.
* RE: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 22:34 ` Andrew Morton
@ 2013-01-14 22:41 ` Luck, Tony
2013-01-14 22:46 ` Andrew Morton
2013-01-15 1:23 ` Yasuaki Ishimatsu
2013-01-15 0:05 ` Toshi Kani
1 sibling, 2 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-14 22:41 UTC (permalink / raw)
To: Andrew Morton, H. Peter Anvin
Cc: Tang Chen, jiang.liu@huawei.com, wujianguo@huawei.com,
wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org,
isimatu.yasuaki@jp.fujitsu.com, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
> hm, why. Obviously SRAT support will improve things, but is it
> actually unusable/unuseful with the command line configuration?
Users will want to set these moveable zones along node boundaries
(the whole purpose is to be able to remove a node by making sure
the kernel won't allocate anything tricky in it, right?) So raw addresses
are usable ... but to get them right the user will have to go parse the
SRAT table manually to come up with the addresses. Any time you
make the user go off and do some tedious calculation that the computer
should have done for them is user-abuse.
-Tony
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 22:41 ` Luck, Tony
@ 2013-01-14 22:46 ` Andrew Morton
2013-01-16 6:25 ` Yasuaki Ishimatsu
2013-01-15 1:23 ` Yasuaki Ishimatsu
1 sibling, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:46 UTC (permalink / raw)
To: Luck, Tony
Cc: H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org,
isimatu.yasuaki@jp.fujitsu.com, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
On Mon, 14 Jan 2013 22:41:03 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:
> > hm, why. Obviously SRAT support will improve things, but is it
> > actually unusable/unuseful with the command line configuration?
>
> Users will want to set these moveable zones along node boundaries
> (the whole purpose is to be able to remove a node by making sure
> the kernel won't allocate anything tricky in it, right?) So raw addresses
> are usable ... but to get them right the user will have to go parse the
> SRAT table manually to come up with the addresses. Any time you
> make the user go off and do some tedious calculation that the computer
> should have done for them is user-abuse.
>
Sure. But SRAT configuration is in progress and the boot option is
better than nothing?
Things I'm wondering:
- is there *really* a case for retaining the boot option if/when
SRAT support is available?
- will the boot option be needed for other architectures, presumably
because they don't provide sufficient layout information to the
kernel?
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 22:46 ` Andrew Morton
@ 2013-01-16 6:25 ` Yasuaki Ishimatsu
2013-01-16 21:29 ` Andrew Morton
0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-16 6:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Luck, Tony, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
2013/01/15 7:46, Andrew Morton wrote:
> On Mon, 14 Jan 2013 22:41:03 +0000
> "Luck, Tony" <tony.luck@intel.com> wrote:
>
>>> hm, why. Obviously SRAT support will improve things, but is it
>>> actually unusable/unuseful with the command line configuration?
>>
>> Users will want to set these moveable zones along node boundaries
>> (the whole purpose is to be able to remove a node by making sure
>> the kernel won't allocate anything tricky in it, right?) So raw addresses
>> are usable ... but to get them right the user will have to go parse the
>> SRAT table manually to come up with the addresses. Any time you
>> make the user go off and do some tedious calculation that the computer
>> should have done for them is user-abuse.
>>
>
> Sure. But SRAT configuration is in progress and the boot option is
> better than nothing?
Yes. I think a boot option that specifies memory ranges is necessary.
>
> Things I'm wondering:
>
> - is there *really* a case for retaining the boot option if/when
> SRAT support is available?
Yes. If SRAT support is available, all memory that has the hotpluggable
bit set is managed by ZONE_MOVABLE. But NUMA performance may degrade,
because only anonymous pages and page cache can be allocated
from that memory.
In this case, if the user cannot change the SRAT information, the user needs
a way to select/set removable memory manually.
Thanks,
Yasuaki Ishimatsu
>
> - will the boot option be needed for other architectures, presumably
> because they don't provide sufficient layout information to the
> kernel?
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 6:25 ` Yasuaki Ishimatsu
@ 2013-01-16 21:29 ` Andrew Morton
2013-01-16 22:01 ` KOSAKI Motohiro
2013-01-16 22:52 ` H. Peter Anvin
0 siblings, 2 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-16 21:29 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: Luck, Tony, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
On Wed, 16 Jan 2013 15:25:44 +0900
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
> >
> > Things I'm wondering:
> >
> > - is there *really* a case for retaining the boot option if/when
> > SRAT support is available?
>
> Yes. If SRAT support is available, all memory which enabled hotpluggable
> bit are managed by ZONE_MOVABLE. But performance degradation may
> occur by NUMA because we can only allocate anonymous page and page-cache
> from these memory.
>
> In this case, if user cannot change SRAT information, user needs a way to
> select/set removable memory manually.
If I understand this correctly you mean that once SRAT parsing is
implemented, the user can use movablecore_map to override that SRAT
parsing, yes? That movablecore_map will take precedence over SRAT?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 21:29 ` Andrew Morton
@ 2013-01-16 22:01 ` KOSAKI Motohiro
2013-01-16 23:00 ` H. Peter Anvin
2013-01-16 22:52 ` H. Peter Anvin
1 sibling, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-16 22:01 UTC (permalink / raw)
To: akpm
Cc: isimatu.yasuaki, tony.luck, hpa, tangchen, jiang.liu, wujianguo,
wency, laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
glommer, linux-kernel, linux-mm
On 1/16/2013 4:29 PM, Andrew Morton wrote:
> On Wed, 16 Jan 2013 15:25:44 +0900
> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>>>
>>> Things I'm wondering:
>>>
>>> - is there *really* a case for retaining the boot option if/when
>>> SRAT support is available?
>>
>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>> bit are managed by ZONE_MOVABLE. But performance degradation may
>> occur by NUMA because we can only allocate anonymous page and page-cache
>> from these memory.
>>
>> In this case, if user cannot change SRAT information, user needs a way to
>> select/set removable memory manually.
>
> If I understand this correctly you mean that once SRAT parsing is
> implemented, the user can use movablecore_map to override that SRAT
> parsing, yes? That movablecore_map will take precedence over SRAT?
I think movablecore_map (I would prefer the name movablemem, btw) should behave
that way, because for the past three years almost all memory hotplug bugs have
been handled only by me and kamezawa-san and, afaik, neither of us has hardware
that supports hot-remove.
So if the new feature requires specific hardware, we can't maintain this area
any more.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 22:01 ` KOSAKI Motohiro
@ 2013-01-16 23:00 ` H. Peter Anvin
2013-01-17 20:27 ` KOSAKI Motohiro
0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-16 23:00 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: akpm, isimatu.yasuaki, tony.luck, tangchen, jiang.liu, wujianguo,
wency, laijs, linfeng, yinghai, rob, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, glommer,
linux-kernel, linux-mm
On 01/16/2013 02:01 PM, KOSAKI Motohiro wrote:
>>>>
>>>> Things I'm wondering:
>>>>
>>>> - is there *really* a case for retaining the boot option if/when
>>>> SRAT support is available?
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONE_MOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes? That movablecore_map will take precedence over SRAT?
>
> I think movablecore_map (I prefer movablemem than it, btw) should behave so.
> because of, for past three years, almost all memory hotplug bug was handled
> only I and kamezawa-san and, afaik, both don't have hotremove aware specific
> hardware.
>
> So, if the new feature require specific hardware, we can't maintain this area
> any more.
>
It is more so than that: the design principle should always be that
lower-level directives, if present, take precedence over higher-level
directives. The reason for that should be pretty obvious: one of the
main uses of the low-level directives is to override the high-level
directives due to bugs or debugging needs.
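As a rough sketch of that principle (an illustration only, not code from this
patch set): struct range_list and pick_movable_source() are hypothetical
names, and "present" is assumed to mean a non-empty list parsed from the
command line.

```c
#include <stddef.h>

/* Physical address range; the layout is an assumption for this sketch. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;
};

/* One source of movable-memory ranges (boot option or firmware table). */
struct range_list {
	const struct mem_range *ranges;
	size_t count;	/* 0 means this source supplied nothing */
};

/*
 * Hypothetical selection rule: the low-level directive (boot option),
 * when present, takes precedence over the high-level one (SRAT).
 */
static const struct range_list *
pick_movable_source(const struct range_list *boot_opt,
		    const struct range_list *srat)
{
	if (boot_opt && boot_opt->count > 0)
		return boot_opt;
	return srat;
}
```

With this shape, a user who hits a firmware bug only has to pass the boot
option; nothing else needs to know which source was used.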
-hpa
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 23:00 ` H. Peter Anvin
@ 2013-01-17 20:27 ` KOSAKI Motohiro
0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:27 UTC (permalink / raw)
To: hpa
Cc: kosaki.motohiro, akpm, isimatu.yasuaki, tony.luck, tangchen,
jiang.liu, wujianguo, wency, laijs, linfeng, yinghai, rob,
minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
jaegeuk.hanse, glommer, linux-kernel, linux-mm
On 1/16/2013 6:00 PM, H. Peter Anvin wrote:
> On 01/16/2013 02:01 PM, KOSAKI Motohiro wrote:
>>>>>
>>>>> Things I'm wondering:
>>>>>
>>>>> - is there *really* a case for retaining the boot option if/when
>>>>> SRAT support is available?
>>>>
>>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>>> bit are managed by ZONE_MOVABLE. But performance degradation may
>>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>>> from these memory.
>>>>
>>>> In this case, if user cannot change SRAT information, user needs a way to
>>>> select/set removable memory manually.
>>>
>>> If I understand this correctly you mean that once SRAT parsing is
>>> implemented, the user can use movablecore_map to override that SRAT
>>> parsing, yes? That movablecore_map will take precedence over SRAT?
>>
>> I think movablecore_map (I prefer movablemem than it, btw) should behave so.
>> because of, for past three years, almost all memory hotplug bug was handled
>> only I and kamezawa-san and, afaik, both don't have hotremove aware specific
>> hardware.
>>
>> So, if the new feature require specific hardware, we can't maintain this area
>> any more.
>>
>
> It is more so than that: the design principle should always be that
> lower-level directives, if present, take precedence over higher-level
> directives. The reason for that should be pretty obvious: one of the
> main uses of the low-level directives is to override the high-level
> directives due to bugs or debugging needs.
My opinion is close to that of Kani-san@HP: automatic configuration (i.e. reading
firmware information) is best for regular users, and a low-level tunable parameter
is best for developers and for working around firmware bugs.
A higher-level interface may help in some corner cases, or it may not. I mean,
I have no objection to creating a higher-level interface. I only said that I myself
haven't observed such a use case, so I have no opinion about it. So I won't
join the interface discussion, even though I don't dislike the idea.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 21:29 ` Andrew Morton
2013-01-16 22:01 ` KOSAKI Motohiro
@ 2013-01-16 22:52 ` H. Peter Anvin
2013-01-17 1:49 ` Tang Chen
2013-01-17 5:08 ` Yasuaki Ishimatsu
1 sibling, 2 replies; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-16 22:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Yasuaki Ishimatsu, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>
>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>> bit are managed by ZONE_MOVABLE. But performance degradation may
>> occur by NUMA because we can only allocate anonymous page and page-cache
>> from these memory.
>>
>> In this case, if user cannot change SRAT information, user needs a way to
>> select/set removable memory manually.
>
> If I understand this correctly you mean that once SRAT parsing is
> implemented, the user can use movablecore_map to override that SRAT
> parsing, yes? That movablecore_map will take precedence over SRAT?
>
Yes, but we still need a higher-level user interface which specifies
which nodes, not which memory ranges, should be movable. That is the
policy granularity that is actually appropriate for the administrator
(trading off performance vs reliability.)
-hpa
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 22:52 ` H. Peter Anvin
@ 2013-01-17 1:49 ` Tang Chen
2013-01-17 20:20 ` KOSAKI Motohiro
2013-01-17 5:08 ` Yasuaki Ishimatsu
1 sibling, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-17 1:49 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andrew Morton, Yasuaki Ishimatsu, Luck, Tony,
jiang.liu@huawei.com, wujianguo@huawei.com, wency@cn.fujitsu.com,
laijs@cn.fujitsu.com, linfeng@cn.fujitsu.com, yinghai@kernel.org,
rob@landley.net, kosaki.motohiro@jp.fujitsu.com,
minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
jaegeuk.hanse@gmail.com, glommer@parallels.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
On 01/17/2013 06:52 AM, H. Peter Anvin wrote:
> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONE_MOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes? That movablecore_map will take precedence over SRAT?
>>
>
> Yes,
Hi HPA, Andrew,
No, I don't think so. In my [PATCH v4 3/6], I checked whether users had specified
any unhotpluggable memory ranges and removed those from movablecore_map.map[].
So in v4 this option does not override SRAT.
It works like this:
hotpluggable ranges: |-----------------|
unhotpluggable ranges: |-----| |--------|
user specified ranges: |---| |--------------------|
movablecore_map.map[]: |------------|
Please refer to https://lkml.org/lkml/2012/12/19/53.
But in this v5 patch-set, I removed all the SRAT-related code. So in v5 the
user's option will override SRAT.
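For illustration, the v4 clipping described above can be sketched as a plain
interval intersection. This is only a sketch under assumptions, not the code
from [PATCH v4 3/6]: struct mem_range and clip_to_hotpluggable() are
hypothetical names, ranges are taken to be half-open, and the caller provides
the output array.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: intersect every user-specified range with every
 * hotpluggable range and store the non-empty pieces in out[].
 * Returns the number of pieces written, never more than out_cap.
 */
static size_t clip_to_hotpluggable(const struct mem_range *user, size_t n_user,
				   const struct mem_range *hot, size_t n_hot,
				   struct mem_range *out, size_t out_cap)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_user; i++) {
		for (size_t j = 0; j < n_hot; j++) {
			uint64_t start = user[i].start > hot[j].start ?
					 user[i].start : hot[j].start;
			uint64_t end = user[i].end < hot[j].end ?
				       user[i].end : hot[j].end;

			if (start >= end)
				continue;	/* no overlap */
			if (n_out == out_cap)
				return n_out;	/* output array is full */
			out[n_out].start = start;
			out[n_out].end = end;
			n_out++;
		}
	}
	return n_out;
}
```

The v5 behaviour would correspond to skipping this step and using the
user-specified ranges as they are.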
Thanks. :)
>but we still need a higher-level user interface which specifies
> which nodes, not which memory ranges, should be movable. That is the
> policy granularity that is actually appropriate for the administrator
> (trading off performance vs reliability.)
>
> -hpa
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-17 1:49 ` Tang Chen
@ 2013-01-17 20:20 ` KOSAKI Motohiro
0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:20 UTC (permalink / raw)
To: tangchen
Cc: hpa, akpm, isimatu.yasuaki, tony.luck, jiang.liu, wujianguo,
wency, laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
glommer, linux-kernel, linux-mm
On 1/16/2013 8:49 PM, Tang Chen wrote:
> On 01/17/2013 06:52 AM, H. Peter Anvin wrote:
>> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>>
>>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>>> bit are managed by ZONE_MOVABLE. But performance degradation may
>>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>>> from these memory.
>>>>
>>>> In this case, if user cannot change SRAT information, user needs a way to
>>>> select/set removable memory manually.
>>>
>>> If I understand this correctly you mean that once SRAT parsing is
>>> implemented, the user can use movablecore_map to override that SRAT
>>> parsing, yes? That movablecore_map will take precedence over SRAT?
>>>
>>
>> Yes,
>
> Hi HPA, Andrew,
>
> No, I don't think so. In my [PATCH v4 3/6], I checked if users specified the
> unhotpluggable memory ranges, I will remove them from movablecore_map.map[].
> So this option will not override SRAT.
>
> It works like this:
>
> hotpluggable ranges: |-----------------|
> unhotpluggable ranges: |-----| |--------|
> user specified ranges: |---| |--------------------|
> movablecore_map.map[]: |------------|
>
> Please refer to https://lkml.org/lkml/2012/12/19/53.
>
> But in this v5 patch-set, I remove all SRAT related code. So this v5 users'
> option will override SRAT.
Again, boot options are often used to work around firmware bugs. So if you
add a boot option, it should override the firmware info.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-16 22:52 ` H. Peter Anvin
2013-01-17 1:49 ` Tang Chen
@ 2013-01-17 5:08 ` Yasuaki Ishimatsu
2013-01-17 6:03 ` H. Peter Anvin
1 sibling, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-17 5:08 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andrew Morton, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
2013/01/17 7:52, H. Peter Anvin wrote:
> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONE_MOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes? That movablecore_map will take precedence over SRAT?
>>
>
> Yes, but we still need a higher-level user interface which specifies
> which nodes, not which memory ranges, should be movable.
I thought about specifying the node instead. But I think that method
is inconvenient. The node number is decided by the OS, so it can
change easily.
For example:
o example 1:
System has 3 nodes:
node0, node1, node2
When the user removes node1, the system has:
node0, node2
But after rebooting, the system has:
node0, node1
So node2 becomes node1.
o example 2:
System has 2 nodes:
0x40000000 - 0x7fffffff : node0
0xc0000000 - 0xffffffff : node1
When the user adds a node whose memory range is [0x80000000 - 0xbfffffff],
the system has:
0x40000000 - 0x7fffffff : node0
0xc0000000 - 0xffffffff : node1
0x80000000 - 0xbfffffff : node2
But after rebooting, the system's nodes may become:
0x40000000 - 0x7fffffff : node0
0x80000000 - 0xbfffffff : node1
0xc0000000 - 0xffffffff : node2
So the node numbers change.
Specifying a node number may be easier than specifying a memory
range. But if the user specifies removable memory by node number,
the user has to check at every hotplug operation whether the node
numbers have changed.
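Example 2 can be modelled with a small sketch. It assumes, purely for
illustration, that node ids are handed out in ascending address order at
boot; how ids are really assigned depends on the firmware and the OS.
struct node_mem, renumber_at_boot() and nid_of() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct node_mem {
	uint64_t start;
	uint64_t end;	/* inclusive, as in the example above */
	int nid;
};

/* qsort() comparator: order nodes by ascending start address. */
static int cmp_by_start(const void *a, const void *b)
{
	const struct node_mem *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/* Assumed boot-time policy: ids follow ascending address order. */
static void renumber_at_boot(struct node_mem *nodes, size_t n)
{
	qsort(nodes, n, sizeof(*nodes), cmp_by_start);
	for (size_t i = 0; i < n; i++)
		nodes[i].nid = (int)i;
}

/* Returns the id of the node that contains addr, or -1 if none does. */
static int nid_of(const struct node_mem *nodes, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= nodes[i].start && addr <= nodes[i].end)
			return nodes[i].nid;
	return -1;
}
```

Under that assumption, the range starting at 0xc0000000 is node1 before the
reboot and node2 after it, while its addresses stay the same. That is why a
memory range stays valid where a node number may not.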
Thanks,
Yasuaki Ishimatsu
> That is the
> policy granularity that is actually appropriate for the administrator
> (trading off performance vs reliability.)
>
> -hpa
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-17 5:08 ` Yasuaki Ishimatsu
@ 2013-01-17 6:03 ` H. Peter Anvin
2013-01-17 16:30 ` Luck, Tony
0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-17 6:03 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: Andrew Morton, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
On 01/16/2013 09:08 PM, Yasuaki Ishimatsu wrote:
>
> I thought about the method of specifying the node. But I think
> this method is inconvenience. Node number is decided by OS.
> So the number is changed easily.
>
> for example:
>
> o example 1
> System has 3 nodes:
> node0, node1, node2
>
> When user remove node1, the system has:
> node0, node2
>
> But after rebooting the system, the system has:
> node0, node1
>
> So node2 becomes node1.
>
> o example 2:
> System has 2 nodes:
> 0x40000000 - 0x7fffffff : node0
> 0xc0000000 - 0xffffffff : node1
>
> When user add a node which memory range is [0x80000000 - 0xbfffffff],
> system has:
> 0x40000000 - 0x7fffffff : node0
> 0xc0000000 - 0xffffffff : node1
> 0x80000000 - 0xbfffffff : node2
>
> But after rebooting the system, the system's node may become:
> 0x40000000 - 0x7fffffff : node0
> 0x80000000 - 0xbfffffff : node1
> 0xc0000000 - 0xffffffff : node2
>
> So node number is changed.
>
> Specifying node number may be easy method than specifying memory
> range. But if user uses node number for specifying removable memory,
> user always need to care whether node number is changed or not at
> every hotplug operation.
>
Well, there are only two options:
1. The user doesn't care which nodes are movable. In that case, the
user may just want to specify a target as a percentage of memory to make
movable -- effectively a "slider" on the performance vs. reliability
spectrum. The kernel can then assign nodes arbitrarily.
2. If the user *does* care which nodes are movable, then the user needs
to be able to specify that *in a way that makes sense to the user*.
This may mean involving the DMI information as well as SRAT in order to
get "silk screen" type information out.
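Option 1 could look roughly like the sketch below. It is an illustration
under stated assumptions, not a proposed implementation: struct node_info and
pick_movable_nodes() are hypothetical names, nodes are picked from the
highest id downwards, and node 0 is never picked so that the kernel keeps
some non-movable memory.

```c
#include <stddef.h>
#include <stdint.h>

struct node_info {
	uint64_t bytes;	/* amount of memory on this node */
	int movable;	/* set to 1 if the node is chosen as movable */
};

/*
 * Hypothetical "slider": mark whole nodes movable until at least
 * target_pct percent of all memory is movable, or only node 0 is left.
 * Returns the number of movable bytes. Assumes total * 100 fits in
 * 64 bits.
 */
static uint64_t pick_movable_nodes(struct node_info *nodes, size_t n,
				   unsigned int target_pct)
{
	uint64_t total = 0, movable = 0;

	for (size_t i = 0; i < n; i++) {
		total += nodes[i].bytes;
		nodes[i].movable = 0;
	}

	/* Visit nodes n-1 down to 1; node 0 is deliberately skipped. */
	for (size_t i = n; i-- > 1; ) {
		/* movable / total >= target_pct / 100, without dividing */
		if (movable * 100 >= total * target_pct)
			break;
		nodes[i].movable = 1;
		movable += nodes[i].bytes;
	}
	return movable;
}
```

Because whole nodes are chosen, the result can overshoot the target; with
four equal nodes, a 50% target marks the two highest nodes movable.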
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-17 6:03 ` H. Peter Anvin
@ 2013-01-17 16:30 ` Luck, Tony
2013-01-17 20:28 ` KOSAKI Motohiro
0 siblings, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2013-01-17 16:30 UTC (permalink / raw)
To: H. Peter Anvin, Yasuaki Ishimatsu
Cc: Andrew Morton, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
> 2. If the user *does* care which nodes are movable, then the user needs
> to be able to specify that *in a way that makes sense to the user*.
> This may mean involving the DMI information as well as SRAT in order to
> get "silk screen" type information out.
One reason they might care would be which I/O devices are connected
to each node. DMI might be a good way to get an invariant name for the
node, but they might also want to specify in terms of what they actually
want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
mark both these nodes as removable". Though this is almost certainly not
a job for kernel options, but for some user configuration tool that would
spit out the DMI names.
-Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-17 16:30 ` Luck, Tony
@ 2013-01-17 20:28 ` KOSAKI Motohiro
2013-01-18 6:05 ` Yasuaki Ishimatsu
0 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:28 UTC (permalink / raw)
To: tony.luck
Cc: hpa, isimatu.yasuaki, akpm, tangchen, jiang.liu, wujianguo, wency,
laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
glommer, linux-kernel, linux-mm
On 1/17/2013 11:30 AM, Luck, Tony wrote:
>> 2. If the user *does* care which nodes are movable, then the user needs
>> to be able to specify that *in a way that makes sense to the user*.
>> This may mean involving the DMI information as well as SRAT in order to
>> get "silk screen" type information out.
>
> One reason they might care would be which I/O devices are connected
> to each node. DMI might be a good way to get an invariant name for the
> node, but they might also want to specify in terms of what they actually
> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
> mark both these nodes as removable". Though this is almost certainly not
> a job for kernel options, but for some user configuration tool that would
> spit out the DMI names.
I agree DMI parsing should be done in userland if we really need DMI parsing.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-17 20:28 ` KOSAKI Motohiro
@ 2013-01-18 6:05 ` Yasuaki Ishimatsu
2013-01-18 6:25 ` H. Peter Anvin
0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-18 6:05 UTC (permalink / raw)
To: KOSAKI Motohiro, tony.luck, hpa
Cc: akpm, tangchen, jiang.liu, wujianguo, wency, laijs, linfeng,
yinghai, rob, minchan.kim, mgorman, rientjes, guz.fnst, rusty,
lliubbo, jaegeuk.hanse, glommer, linux-kernel, linux-mm
2013/01/18 5:28, KOSAKI Motohiro wrote:
> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>> 2. If the user *does* care which nodes are movable, then the user needs
>>> to be able to specify that *in a way that makes sense to the user*.
>>> This may mean involving the DMI information as well as SRAT in order to
>>> get "silk screen" type information out.
>>
>> One reason they might care would be which I/O devices are connected
>> to each node. DMI might be a good way to get an invariant name for the
>> node, but they might also want to specify in terms of what they actually
>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
>> mark both these nodes as removable". Though this is almost certainly not
>> a job for kernel options, but for some user configuration tool that would
>> spit out the DMI names.
>
> I agree DMI parsing should be done in userland if we really need DMI parsing.
>
If users use the boot parameter to work around bugs or for debugging, they
need a way to set the range of movable memory in detail. Specifying a node
number is not enough for that, because all of that node's memory becomes movable.
For this, we are discussing two other ways: memory ranges and DMI information.
Using DMI information, users may get an invariant name. But is it
really a user-friendly interface? I don't think so.
You may think a memory range is not a user-friendly interface either.
But I think a memory range is friendlier than DMI
information, since the memory range is easy to obtain. So from the developer's
side, using memory ranges is good.
Of course, a solution based on SRAT information is necessary too. So we are
developing it now.
Thanks,
Yasuaki Ishimatsu
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 6:05 ` Yasuaki Ishimatsu
@ 2013-01-18 6:25 ` H. Peter Anvin
2013-01-18 7:38 ` Yasuaki Ishimatsu
0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-18 6:25 UTC (permalink / raw)
To: Yasuaki Ishimatsu, KOSAKI Motohiro, tony.luck
Cc: akpm, tangchen, jiang.liu, wujianguo, wency, laijs, linfeng,
yinghai, rob, minchan.kim, mgorman, rientjes, guz.fnst, rusty,
lliubbo, jaegeuk.hanse, glommer, linux-kernel, linux-mm
We already do DMI parsing in the kernel...
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>2013/01/18 5:28, KOSAKI Motohiro wrote:
>> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>>> 2. If the user *does* care which nodes are movable, then the user needs
>>>> to be able to specify that *in a way that makes sense to the user*.
>>>> This may mean involving the DMI information as well as SRAT in order to
>>>> get "silk screen" type information out.
>>>
>>> One reason they might care would be which I/O devices are connected
>>> to each node. DMI might be a good way to get an invariant name for the
>>> node, but they might also want to specify in terms of what they actually
>>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
>>> mark both these nodes as removable". Though this is almost certainly not
>>> a job for kernel options, but for some user configuration tool that would
>>> spit out the DMI names.
>>
>> I agree DMI parsing should be done in userland if we really need DMI parsing.
>>
>>
>
>If users use the boot parameter for bugs or debugging, users need
>a method which sets in detail range of movable memory. So specifying
>node number is not enough because whole memory becomes movable memory.
>
>For this, we are discussing other ways, memory range and DMI information.
>By using DMI information, users may get an invariant name. But is it
>really user friendly interface? I don't think so.
>
>You will think using memory range is not user friendly interface too.
>But I think that using memory range is friendlier than using DMI
>information since we can get easily memory range. So from developer
>side, using memory range is good.
>
>Of course, using SRAT information is necessary solution. So we are
>developing it now.
>
>Thanks,
>Yasuaki Ishimatsu
--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 6:25 ` H. Peter Anvin
@ 2013-01-18 7:38 ` Yasuaki Ishimatsu
2013-01-18 8:08 ` Tang Chen
0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-18 7:38 UTC (permalink / raw)
To: H. Peter Anvin
Cc: KOSAKI Motohiro, tony.luck, akpm, tangchen, jiang.liu, wujianguo,
wency, laijs, linfeng, yinghai, rob, minchan.kim, mgorman,
rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, glommer,
linux-kernel, linux-mm
2013/01/18 15:25, H. Peter Anvin wrote:
> We already do DMI parsing in the kernel...
Thank you for the information.
Are you referring to /sys/firmware/dmi/entries?
If so, my box does not have memory information there.
My box has only types 0, 1, 2, 3, 4, 7, 8, 9, 38, and 127 in DMI.
At least, my box cannot use the information...
If users use the boot parameter for investigating firmware bugs
or for debugging, they cannot use DMI information on boxes like mine.
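The check described above (does the firmware expose any memory-related DMI structures?) can be sketched in a few lines. This is only an illustration, not something from the patch set: the helper names are hypothetical, and it assumes the sysfs entries are named "<type>-<instance>" (for example "17-0"). The type numbers 16, 17, 19 and 20 are the memory-related structure types from the SMBIOS specification.

```python
# SMBIOS structure types that describe memory.
MEMORY_DMI_TYPES = {
    16: "Physical Memory Array",
    17: "Memory Device",
    19: "Memory Array Mapped Address",
    20: "Memory Device Mapped Address",
}


def dmi_types(entry_names):
    """Collect the DMI type numbers from names such as "17-0" (type-instance)."""
    types = set()
    for name in entry_names:
        type_str, sep, _instance = name.partition("-")
        if sep and type_str.isdigit():
            types.add(int(type_str))
    return types


def missing_memory_types(entry_names):
    """Return the memory-related DMI types that the firmware did not provide."""
    return sorted(set(MEMORY_DMI_TYPES) - dmi_types(entry_names))
```

On a live system the entry names could come from os.listdir("/sys/firmware/dmi/entries"). For a box that reports only types 0, 1, 2, 3, 4, 7, 8, 9, 38 and 127, all four memory types come back as missing.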
Thanks,
Yasuaki Ishimatsu
>
> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> 2013/01/18 5:28, KOSAKI Motohiro wrote:
>>> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>>>> 2. If the user *does* care which nodes are movable, then the user needs
>>>>> to be able to specify that *in a way that makes sense to the user*.
>>>>> This may mean involving the DMI information as well as SRAT in order to
>>>>> get "silk screen" type information out.
>>>>
>>>> One reason they might care would be which I/O devices are connected
>>>> to each node. DMI might be a good way to get an invariant name for the
>>>> node, but they might also want to specify in terms of what they actually
>>>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
>>>> mark both these nodes as removable". Though this is almost certainly not
>>>> a job for kernel options, but for some user configuration tool that would
>>>> spit out the DMI names.
>>>
>>> I agree DMI parsing should be done in userland if we really need DMI parsing.
>>>
>>>
>>
>> If users use the boot parameter for bugs or debugging, users need
>> a method which sets in detail range of movable memory. So specifying
>> node number is not enough because whole memory becomes movable memory.
>>
>> For this, we are discussing other ways, memory range and DMI
>> information.
>> By using DMI information, users may get an invariant name. But is it
>> really user friendly interface? I don't think so.
>>
>> You will think using memory range is not user friendly interface too.
>> But I think that using memory range is friendlier than using DMI
>> information since we can get easily memory range. So from developer
>> side, using memory range is good.
>>
>> Of course, using SRAT information is necessary solution. So we are
>> developing it now.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 7:38 ` Yasuaki Ishimatsu
@ 2013-01-18 8:08 ` Tang Chen
2013-01-18 9:23 ` li guang
0 siblings, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-18 8:08 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: H. Peter Anvin, KOSAKI Motohiro, tony.luck, akpm, jiang.liu,
wujianguo, wency, laijs, linfeng, yinghai, rob, minchan.kim,
mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
glommer, linux-kernel, linux-mm
On 01/18/2013 03:38 PM, Yasuaki Ishimatsu wrote:
> 2013/01/18 15:25, H. Peter Anvin wrote:
>> We already do DMI parsing in the kernel...
>
> Thank you for giving the information.
>
> Is your mention /sys/firmware/dmi/entries?
>
> If so, my box does not have memory information.
> My box has only type 0, 1, 2, 3, 4, 7, 8, 9, 38, 127 in DMI.
> At least, my box cannot use the information...
>
> If users use the boot parameter for investigating firmware bugs
> or debugging, users cannot use DMI information on like my box.
And according to Documentation/ABI/testing/sysfs-firmware-dmi:
    The kernel itself does not rely on the majority of the
    information in these tables being correct. It equally
    cannot ensure that the data as exported to userland is
    without error either.
So when users are debugging, they should not rely on this info.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 8:08 ` Tang Chen
@ 2013-01-18 9:23 ` li guang
2013-01-18 18:29 ` Luck, Tony
0 siblings, 1 reply; 36+ messages in thread
From: li guang @ 2013-01-18 9:23 UTC (permalink / raw)
To: Tang Chen
Cc: Yasuaki Ishimatsu, H. Peter Anvin, KOSAKI Motohiro, tony.luck,
akpm, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai, rob,
minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
jaegeuk.hanse, glommer, linux-kernel, linux-mm
On Fri, 2013-01-18 at 16:08 +0800, Tang Chen wrote:
> On 01/18/2013 03:38 PM, Yasuaki Ishimatsu wrote:
> > 2013/01/18 15:25, H. Peter Anvin wrote:
> >> We already do DMI parsing in the kernel...
> >
> > Thank you for giving the information.
> >
> > Is your mention /sys/firmware/dmi/entries?
> >
> > If so, my box does not have memory information.
> > My box has only type 0, 1, 2, 3, 4, 7, 8, 9, 38, 127 in DMI.
> > At least, my box cannot use the information...
> >
> > If users use the boot parameter for investigating firmware bugs
> > or debugging, users cannot use DMI information on like my box.
>
> And seeing from Documentation/ABI/testing/sysfs-firmware-dmi,
>
> The kernel itself does not rely on the majority of the
> information in these tables being correct. It equally
> cannot ensure that the data as exported to userland is
> without error either.
>
> So when users are doing debug, they should not rely on this info.
The kernel absolutely should not care much about SMBIOS (DMI info).
AFAIK, BIOS vendors do not fill accurate info into SMBIOS; they mostly
do so only on demand, when OEMs require SMBIOS to report some
specific info.
Furthermore, SMBIOS is so old and benefits nobody (in my personal
opinion), so maybe let's forget it.
--
regards!
li guang
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 9:23 ` li guang
@ 2013-01-18 18:29 ` Luck, Tony
2013-01-19 1:06 ` Jiang Liu
2013-01-21 7:36 ` Yasuaki Ishimatsu
0 siblings, 2 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-18 18:29 UTC (permalink / raw)
To: li guang, Tang Chen
Cc: Yasuaki Ishimatsu, H. Peter Anvin, KOSAKI Motohiro,
akpm@linux-foundation.org, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
jaegeuk.hanse@gmail.com, glommer@parallels.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
> kernel absolutely should not care much about SMBIOS(DMI info),
> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
> mostly only on demand when OEMs required SMBIOS to report some
> specific info.
> furthermore, SMBIOS is so old and benefits nobody (in my personal
> opinion), so maybe let's forget it.
The "not having right information" flaw could be fixed by OEMs selling
systems on which it is important for system functionality that it be right.
They could use monetary incentives, contractual obligations, or sharp
pointy sticks to make their BIOS vendor get the table right.
BUT there is a bigger flaw - SMBIOS is a static table with no way to
update it in response to hotplug events. So it could in theory have the
right information at boot time ... there is no possible way for it to be
right as soon as somebody adds, removes or replaces hardware.
-Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 18:29 ` Luck, Tony
@ 2013-01-19 1:06 ` Jiang Liu
2013-01-19 7:52 ` Chen Gong
2013-01-21 7:36 ` Yasuaki Ishimatsu
1 sibling, 1 reply; 36+ messages in thread
From: Jiang Liu @ 2013-01-19 1:06 UTC (permalink / raw)
To: Luck, Tony
Cc: li guang, Tang Chen, Yasuaki Ishimatsu, H. Peter Anvin,
KOSAKI Motohiro, akpm@linux-foundation.org, wujianguo@huawei.com,
wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
jaegeuk.hanse@gmail.com, glommer@parallels.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
On 2013-1-19 2:29, Luck, Tony wrote:
>> kernel absolutely should not care much about SMBIOS(DMI info),
>> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
>> mostly only on demand when OEMs required SMBIOS to report some
>> specific info.
>> furthermore, SMBIOS is so old and benefits nobody (in my personal
>> opinion), so maybe let's forget it.
>
> The "not having right information" flaw could be fixed by OEMs selling
> systems on which it is important for system functionality that it be right.
> They could use monetary incentives, contractual obligations, or sharp
> pointy sticks to make their BIOS vendor get the table right.
>
> BUT there is a bigger flaw - SMBIOS is a static table with no way to
> update it in response to hotplug events. So it could in theory have the
> right information at boot time ... there is no possible way for it to be
> right as soon as somebody adds, removes or replaces hardware.
SMBIOS plays an important role when we are trying to do hardware fault
management, because the OS needs information from SMBIOS to physically
identify a component/FRU. I also remember there were efforts to extend
the SMBIOS specification to dynamically update the SMBIOS table when
hotplug happens.
Regards!
Gerry
>
> -Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-19 1:06 ` Jiang Liu
@ 2013-01-19 7:52 ` Chen Gong
0 siblings, 0 replies; 36+ messages in thread
From: Chen Gong @ 2013-01-19 7:52 UTC (permalink / raw)
To: Jiang Liu
Cc: Luck, Tony, li guang, Tang Chen, Yasuaki Ishimatsu,
H. Peter Anvin, KOSAKI Motohiro, akpm@linux-foundation.org,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
jaegeuk.hanse@gmail.com, glommer@parallels.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
On Sat, Jan 19, 2013 at 09:06:14AM +0800, Jiang Liu wrote:
>
> On 2013-1-19 2:29, Luck, Tony wrote:
> >> kernel absolutely should not care much about SMBIOS(DMI info),
> >> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
> >> mostly only on demand when OEMs required SMBIOS to report some
> >> specific info.
> >> furthermore, SMBIOS is so old and benefits nobody (in my personal
> >> opinion), so maybe let's forget it.
> >
> > The "not having right information" flaw could be fixed by OEMs selling
> > systems on which it is important for system functionality that it be right.
> > They could use monetary incentives, contractual obligations, or sharp
> > pointy sticks to make their BIOS vendor get the table right.
> >
> > BUT there is a bigger flaw - SMBIOS is a static table with no way to
> > update it in response to hotplug events. So it could in theory have the
> > right information at boot time ... there is no possible way for it to be
> > right as soon as somebody adds, removes or replaces hardware.
>
> SMBIOS plays an important role when we are trying to do hardware fault
> management, because OS needs information from SMBIOS to physically
> identify a component/FRU. I also remember there were efforts to extend
> SMBIOS specification to dynamically update the SMBIOS table when hotplug
> happens.
Really? How would that be done? Can you describe it in more detail? BTW, if my
understanding is right, the new Platform Memory Topology Table (PMTT) in ACPI 5
should be for this purpose, but it doesn't exist on older systems, so I want to
know if there is a workaround for older platforms.
>
> Regards!
> Gerry
>
> >
> > -Tony
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-18 18:29 ` Luck, Tony
2013-01-19 1:06 ` Jiang Liu
@ 2013-01-21 7:36 ` Yasuaki Ishimatsu
1 sibling, 0 replies; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-21 7:36 UTC (permalink / raw)
To: Luck, Tony
Cc: li guang, Tang Chen, H. Peter Anvin, KOSAKI Motohiro,
akpm@linux-foundation.org, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
jaegeuk.hanse@gmail.com, glommer@parallels.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
2013/01/19 3:29, Luck, Tony wrote:
>> kernel absolutely should not care much about SMBIOS(DMI info),
>> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
>> mostly only on demand when OEMs required SMBIOS to report some
>> specific info.
>> furthermore, SMBIOS is so old and benefits nobody (in my personal
>> opinion), so maybe let's forget it.
>
> The "not having right information" flaw could be fixed by OEMs selling
> systems on which it is important for system functionality that it be right.
> They could use monetary incentives, contractual obligations, or sharp
> pointy sticks to make their BIOS vendor get the table right.
>
> BUT there is a bigger flaw - SMBIOS is a static table with no way to
> update it in response to hotplug events. So it could in theory have the
> right information at boot time ... there is no possible way for it to be
> right as soon as somebody adds, removes or replaces hardware.
Using DMI information depends strongly on the firmware. So even if we
implement a boot option that uses DMI information to specify a memory
range as the Movable zone, we cannot use it on our box. Other users may
hit the same problem.
So we want to keep the current boot option, which specifies a memory range,
since users can find the memory addresses on every box.
Thanks,
Yasuaki Ishimatsu
>
> -Tony
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 22:41 ` Luck, Tony
2013-01-14 22:46 ` Andrew Morton
@ 2013-01-15 1:23 ` Yasuaki Ishimatsu
2013-01-15 3:44 ` H. Peter Anvin
1 sibling, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-15 1:23 UTC (permalink / raw)
To: Luck, Tony
Cc: Andrew Morton, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
2013/01/15 7:41, Luck, Tony wrote:
>> hm, why. Obviously SRAT support will improve things, but is it
>> actually unusable/unuseful with the command line configuration?
>
> Users will want to set these moveable zones along node boundaries
> (the whole purpose is to be able to remove a node by making sure
> the kernel won't allocate anything tricky in it, right?)
Yes
> So raw addresses
> are usable ... but to get them right the user will have to go parse the
> SRAT table manually to come up with the addresses.
I don't think so, because the user can easily get the raw addresses
from the kernel messages on x86.
Here are the kernel messages from an x86 machine.
---
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
[ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
[ 0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff]
[ 0.000000] SRAT: Node 3 PXM 4 [mem 0x2000000000-0x27ffffffff]
[ 0.000000] SRAT: Node 4 PXM 5 [mem 0x2800000000-0x2fffffffff]
[ 0.000000] SRAT: Node 5 PXM 6 [mem 0x3000000000-0x37ffffffff]
[ 0.000000] SRAT: Node 6 PXM 7 [mem 0x3800000000-0x3fffffffff]
[ 0.000000] SRAT: Node 7 PXM 1 [mem 0x800000000-0xfffffffff]
---
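To illustrate how mechanical this step is, here is a minimal sketch that pulls per-node ranges out of boot messages in the format shown above. It is a userland illustration only, not part of the patch set: the function names are hypothetical, and the "size@start" output layout is an assumption about how a range would be written on the command line, since this message does not spell out the option syntax.

```python
import re

# Matches boot messages such as:
#   [ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
SRAT_LINE = re.compile(
    r"SRAT: Node (\d+) PXM (\d+) \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]"
)

# Three of the lines from the log above, used as sample input.
SAMPLE = """\
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
[ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
"""


def parse_srat_ranges(log_text):
    """Return {node: [(start, size), ...]} parsed from SRAT boot messages."""
    ranges = {}
    for match in SRAT_LINE.finditer(log_text):
        node = int(match.group(1))
        start = int(match.group(3), 16)
        end = int(match.group(4), 16)  # the logged end address is inclusive
        ranges.setdefault(node, []).append((start, end - start + 1))
    return ranges


def format_ranges(ranges, node):
    """Format one node's ranges as size@start strings (assumed layout)."""
    return ["0x%x@0x%x" % (size, start) for start, size in ranges[node]]
```

With the sample input, format_ranges(parse_srat_ranges(SAMPLE), 1) yields a single entry, "0x800000000@0x1000000000", that is, 32 GiB starting at 64 GiB.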
Thanks,
Yasuaki Ishimatsu
> Any time you
> make the user go off and do some tedious calculation that the computer
> should have done for them is user-abuse.
>
> -Tony
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-15 1:23 ` Yasuaki Ishimatsu
@ 2013-01-15 3:44 ` H. Peter Anvin
2013-01-15 4:04 ` Luck, Tony
0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-15 3:44 UTC (permalink / raw)
To: Yasuaki Ishimatsu, Luck, Tony
Cc: Andrew Morton, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
That *is* user abuse.
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>2013/01/15 7:41, Luck, Tony wrote:
>>> hm, why. Obviously SRAT support will improve things, but is it
>>> actually unusable/unuseful with the command line configuration?
>>
>
>> Users will want to set these moveable zones along node boundaries
>> (the whole purpose is to be able to remove a node by making sure
>> the kernel won't allocate anything tricky in it, right?)
>
>Yes
>
>> So raw addresses
>> are usable ... but to get them right the user will have to go parse the
>> SRAT table manually to come up with the addresses.
>
>I don't think so because user can easily get raw address by kernel
>message in x86.
>
>Here are kernel messages of x86 architecture.
>---
>[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
>[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
>[ 0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
>[ 0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff]
>[ 0.000000] SRAT: Node 3 PXM 4 [mem 0x2000000000-0x27ffffffff]
>[ 0.000000] SRAT: Node 4 PXM 5 [mem 0x2800000000-0x2fffffffff]
>[ 0.000000] SRAT: Node 5 PXM 6 [mem 0x3000000000-0x37ffffffff]
>[ 0.000000] SRAT: Node 6 PXM 7 [mem 0x3800000000-0x3fffffffff]
>[ 0.000000] SRAT: Node 7 PXM 1 [mem 0x800000000-0xfffffffff]
>---
>
>Thanks,
>Yasuaki Ishimatsu
>
>> Any time you
>> make the user go off and do some tedious calculation that the computer
>> should have done for them is user-abuse.
>>
>> -Tony
>>
--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-15 3:44 ` H. Peter Anvin
@ 2013-01-15 4:04 ` Luck, Tony
0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-15 4:04 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, jiang.liu@huawei.com,
wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
glommer@parallels.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
>>
>> I don't think so because user can easily get raw address by kernel
>> message in x86.
>>
Which will fail if, on some subsequent boot, a DIMM fails BIST and is
removed from the memory map by the BIOS, which will then change all the
node boundaries for those above the failed DIMM.
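A small sketch of that failure mode, with made-up numbers: a range saved from one boot is compared against the node ranges the firmware reports on a later boot. The function name and both address maps are hypothetical and only illustrate how a hard-coded address range can silently stop matching a node boundary.

```python
def stale_ranges(configured, current_nodes):
    """Return configured (start, size) ranges that no longer match any node range.

    configured:    list of (start, size) tuples saved from an earlier boot
    current_nodes: {node: [(start, size), ...]} as reported on this boot
    """
    current = {rng for node_ranges in current_nodes.values() for rng in node_ranges}
    return [rng for rng in configured if rng not in current]


# Range written on the command line after the first boot (node 1 at 64 GiB).
configured = [(0x1000000000, 0x800000000)]

# First boot: node 1 really does start at 64 GiB.
first_boot = {1: [(0x1000000000, 0x800000000)]}

# Hypothetical later boot: a 4 GiB DIMM below node 1 was dropped from the
# memory map, so node 1 now starts 4 GiB lower.
later_boot = {1: [(0xF00000000, 0x800000000)]}
```

Here stale_ranges(configured, first_boot) is empty, while stale_ranges(configured, later_boot) returns the configured range, because it now straddles a node boundary instead of covering exactly one node.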
-Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v5 0/5] Add movablecore_map boot option
2013-01-14 22:34 ` Andrew Morton
2013-01-14 22:41 ` Luck, Tony
@ 2013-01-15 0:05 ` Toshi Kani
1 sibling, 0 replies; 36+ messages in thread
From: Toshi Kani @ 2013-01-15 0:05 UTC (permalink / raw)
To: Andrew Morton
Cc: H. Peter Anvin, Tang Chen, jiang.liu, wujianguo, wency, laijs,
linfeng, yinghai, isimatu.yasuaki, rob, kosaki.motohiro,
minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
jaegeuk.hanse, tony.luck, glommer, linux-kernel, linux-mm
On Mon, 2013-01-14 at 14:34 -0800, Andrew Morton wrote:
> On Mon, 14 Jan 2013 09:31:33 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> > On 01/14/2013 01:15 AM, Tang Chen wrote:
> > >
> > > For now, users can disable this functionality by not specifying the boot option.
> > > Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> > > to using SRAT.
> > >
> >
> > I still think the option "movablecore_map" is uglier than hell. "core"
> > could just as easily refer to CPU cores there, but it is a memory mem.
> > "movablemem" seems more appropriate.
> >
> > Again, without SRAT I consider this patchset to be largely useless for
> > anything other than prototyping work.
> >
>
> hm, why. Obviously SRAT support will improve things, but is it
> actually unusable/unuseful with the command line configuration?
I think it is useful for prototyping and testing. I do not think it is
suitable for regular users.
> Also, "But even if we can use SRAT, users still need an interface to
> enable/disable this functionality if they don't want to lose their
> NUMA performance. So I think, an user interface is always needed."
Yes, but such a user interface could be provided through the management
interface (GUI/CLI) of the platforms (or VMs). If the user configures for
performance, SRAT could be generated with no hot-pluggable memory. If the
user sets node N to be hot-removable, SRAT could be generated in such a
way that all memory ranges in node N are hot-pluggable.
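As a rough model of that idea, the sketch below builds SRAT-style memory affinity entries from a node layout and a user-chosen set of hot-removable nodes. It is not firmware code: the class and function names are hypothetical, it uses node numbers where a real SRAT carries proximity domains, and only the two flag bits (bit 0 Enabled, bit 1 Hot Pluggable) are taken from the ACPI definition of the Memory Affinity Structure.

```python
from dataclasses import dataclass

SRAT_MEM_ENABLED = 1 << 0        # Memory Affinity flags, bit 0: Enabled
SRAT_MEM_HOT_PLUGGABLE = 1 << 1  # Memory Affinity flags, bit 1: Hot Pluggable


@dataclass
class MemAffinity:
    """Simplified stand-in for one SRAT Memory Affinity Structure."""
    node: int
    start: int
    length: int
    flags: int


def build_memory_affinity(node_ranges, hot_removable_nodes):
    """Mark every range of a hot-removable node as hot-pluggable.

    node_ranges:         {node: [(start, length), ...]}
    hot_removable_nodes: set of node numbers chosen by the user
    """
    entries = []
    for node in sorted(node_ranges):
        for start, length in node_ranges[node]:
            flags = SRAT_MEM_ENABLED
            if node in hot_removable_nodes:
                flags |= SRAT_MEM_HOT_PLUGGABLE
            entries.append(MemAffinity(node, start, length, flags))
    return entries
```

Passing an empty set corresponds to the "performance" setting (no entry is hot-pluggable), while passing {N} marks all of node N's ranges and leaves the other nodes untouched.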
Thanks,
-Toshi
> There's also the matter of other architectures. Has any thought been
> given to how (eg) powerpc would hook into here?
>
> And what about VMs (xen, KVM)? I wonder if there is a case for those
> to implement memory hotplug.
^ permalink raw reply [flat|nested] 36+ messages in thread