* [PATCH v5 0/5] Add movablecore_map boot option
@ 2013-01-14  9:15 Tang Chen
  2013-01-14  9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

Hi Andrew, all,

Here is the movablecore_map patch-set, based on 3.8-rc3.

While implementing SRAT support, we ran into a problem.
setup_arch() performs the following steps in this order:

1) memblock is ready;
2) some functions use memblock to allocate memory;
3) parse ACPI tables, such as SRAT.

Before 3), we don't know which memory is hotpluggable, and as a result, we cannot
prevent memblock from allocating hotpluggable memory. So, in 2), there could be
some hotpluggable memory allocated by memblock.

We are now trying to parse SRAT earlier, before memblock is ready. But I think
this topic needs more investigation. So in v5 I dropped all the SRAT support;
v5 is the same as v3, rebased on 3.8-rc3.

As planned, we will eventually support getting this information from SRAT
without user involvement, and we will post a separate patch-set to do so.


Also, I think that for now we can add this boot option as the first step toward
supporting movable nodes. Since Linux cannot migrate direct-mapped pages,
the only way for now is to restrict a whole node to contain only movable memory.

Using SRAT is one way. But even if we can use SRAT, users still need an interface
to enable/disable this functionality if they don't want to lose NUMA performance.

So I think a user interface is always needed.

For now, users can disable this functionality by not specifying the boot option.
Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
to use SRAT.

Thanks. :)

============================


[What we are doing]
This patchset provides a boot option that lets the user specify the
ZONE_MOVABLE memory map for each node in the system.

movablecore_map=nn[KMG]@ss[KMG]

This option makes sure the memory range from ss to ss+nn is movable memory.


[Why we do this]
If we hot remove memory, that memory must not contain kernel memory,
because Linux currently cannot migrate kernel memory. Therefore,
we have to guarantee that the memory to be hot removed contains only
movable memory.

Linux has two boot options, kernelcore= and movablecore=, for
creating movable memory. These boot options specify the amount
of memory to use as kernel or movable memory. Using them, we can
create ZONE_MOVABLE, which contains only movable memory.

But they do not fulfill the requirements of memory hot remove, because
even if we specify these boot options, movable memory is distributed
evenly across the nodes. So when we want to hot remove the memory
range 0x80000000-0xc0000000, we have no way to specify that range
as movable memory.

So we propose a new feature which specifies the memory ranges to use as
movable memory.


[Ways to do this]
There may be 2 ways to specify movable memory.
 1. use firmware information
 2. use boot option

1. use firmware information
  According to ACPI spec 5.0, the SRAT table has a memory affinity structure
  and the structure has a Hot Pluggable field. See "5.2.16.2 Memory
  Affinity Structure". If we use this information, we might be able to
  specify movable memory via firmware. For example, if the Hot Pluggable
  field is set, Linux sets the memory as movable memory.

2. use boot option
  This is our proposal. A new boot option specifies the memory ranges to use
  as movable memory.


[How we do this]
We chose the second way, because with the first way users cannot easily
change the memory ranges used as movable memory. We think that creating
movable memory may cause a NUMA performance regression. In that case,
the user can easily turn the feature off if we provide the boot option.
And with the boot option, the user can also easily select which memory
to use as movable memory.


[How to use]
Specify the following boot option:
movablecore_map=nn[KMG]@ss[KMG]

That means physical address range from ss to ss+nn will be allocated as
ZONE_MOVABLE.
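
To illustrate the syntax only, here is a minimal userspace sketch of how such
an argument could be turned into a page frame range. It is not the kernel code
from patch 2: parse_size(), parse_movable_range(), struct pfn_range and the
4 KiB PAGE_SHIFT are assumptions made for this example, and parse_size() is
only a rough stand-in for the kernel's memparse().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct pfn_range {
	uint64_t start_pfn;	/* inclusive */
	uint64_t end_pfn;	/* exclusive */
};

/* Hypothetical userspace stand-in for the kernel's memparse(). */
static uint64_t parse_size(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);	/* base 0: accepts 0x... too */

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/* Parse "nn[KMG]@ss[KMG]"; return 0 on success, -1 on bad input. */
static int parse_movable_range(const char *arg, struct pfn_range *out)
{
	const char *p;
	uint64_t size, start;

	assert(arg && out);

	size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;

	arg = p + 1;
	start = parse_size(arg, &p);
	if (p == arg || *p != '\0')
		return -1;

	/* Reject size == 0 and ranges that wrap around. */
	if (start + size <= start)
		return -1;

	out->start_pfn = start >> PAGE_SHIFT;			/* round down */
	out->end_pfn = ((start + size - 1) >> PAGE_SHIFT) + 1;	/* round up */
	return 0;
}
```

Under these assumptions, "1G@4G" would describe the pfn range
[0x100000, 0x140000). Unlike this sketch, the patch silently ignores an empty
or overflowing range instead of reporting an error.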

The following points should also be considered:

1) If the range lies within a single node, then from ss to the end of
   the node will be ZONE_MOVABLE.
2) If the range covers two or more nodes, then from ss to the end of
   the first node will be ZONE_MOVABLE, and all the other nodes the
   range covers will have only ZONE_MOVABLE.
3) If no range is in the node, then the node will have no ZONE_MOVABLE
   unless kernelcore or movablecore is specified.
4) This option can be specified at most MAX_NUMNODES times.
5) If kernelcore or movablecore is also specified, movablecore_map takes
   priority.
6) This option does not conflict with the memmap option.
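
Points 1) to 3) can be sketched as a small userspace model. This is only an
illustration under simplifying assumptions, not the code in patch 3: each node
is taken to be one contiguous range, the DMA/DMA32/lowmem checks are left out,
and struct range and movable_start_per_node() are names made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * For every node, compute where ZONE_MOVABLE would begin: the lowest
 * address inside the node that is covered by any movable range.
 * zone_start[n] == 0 means "no ZONE_MOVABLE in node n", so a movable
 * range that begins exactly at address 0 cannot be represented here.
 */
static void movable_start_per_node(const struct range *nodes, size_t nr_nodes,
				   const struct range *mov, size_t nr_mov,
				   uint64_t *zone_start)
{
	assert(nodes && zone_start);

	for (size_t n = 0; n < nr_nodes; n++) {
		zone_start[n] = 0;

		for (size_t m = 0; m < nr_mov; m++) {
			uint64_t s;

			/* Skip movable ranges that do not touch this node. */
			if (mov[m].end <= nodes[n].start ||
			    mov[m].start >= nodes[n].end)
				continue;

			/* The zone starts at ss, or at the node start. */
			s = mov[m].start > nodes[n].start ?
			    mov[m].start : nodes[n].start;

			if (zone_start[n] == 0 || s < zone_start[n])
				zone_start[n] = s;
		}
	}
}
```

For example, with four nodes of 4 units each and one movable range [2, 9), the
model gives start 2 for node 0, start 4 for node 1 (whole node), start 8 for
node 2 (movable up to the end of the node, although the range ends at 9), and
no ZONE_MOVABLE for node 3.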



Change log:

v4 -> v5:
1) remove all SRAT support. v5 is now the same as v3.

v3 -> v4:
1) patch2: Add new function remove_movablecore_map() to remove a range from
           movablecore_map.map[].
2) patch2: Add movablecore_map=acpi logic to allow the user to skip the physical
           address config. If this option is specified, movablecore_map.map[]
           is cleared first, and all the hotpluggable memory ranges are added
           to it when parsing SRAT.
3) patch3: New patch, add logic to check the Hot Pluggable bit when parsing SRAT.
           If the user also specifies a memory range, the logic checks whether
           it is hotpluggable and removes it from movablecore_map.map[] if not.

v2 -> v3:
1) Use memblock_alloc_try_nid() instead of memblock_alloc_nid() so that the
   allocation is retried if a whole node is ZONE_MOVABLE.
2) Add DMA, DMA32 address checks to make sure ZONE_MOVABLE won't use these addresses.
   Suggested by Wu Jianguo <wujianguo@huawei.com>
3) Add lowmem address check: when the system has highmem, make sure ZONE_MOVABLE
   won't use lowmem. Suggested by Liu Jiang <jiang.liu@huawei.com>
4) Fix misuse of pfns in movablecore_map.map[] as physical addresses.


Tang Chen (4):
  page_alloc: add movable_memmap kernel parameter
  page_alloc: Introduce zone_movable_limit[] to keep movable limit for
    nodes
  page_alloc: Make movablecore_map has higher priority
  page_alloc: Bootmem limit with movablecore_map

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt |   17 +++
 arch/x86/mm/numa.c                  |    5 +-
 include/linux/memblock.h            |    1 +
 include/linux/mm.h                  |   11 ++
 mm/memblock.c                       |   18 +++-
 mm/page_alloc.c                     |  233 ++++++++++++++++++++++++++++++++++-
 6 files changed, 277 insertions(+), 8 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH v5 1/5] x86: get pg_data_t's memory from other node
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
@ 2013-01-14  9:15 ` Tang Chen
  2013-01-14  9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If the system creates a movable node, in which all memory of the
node is allocated as ZONE_MOVABLE, setup_node_data() cannot
allocate memory for the node's pg_data_t from that node.
So, use memblock_alloc_try_nid() instead of memblock_alloc_nid()
to retry when the first allocation fails.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 arch/x86/mm/numa.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be..db939b6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -222,10 +222,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 		nd_pa = __pa(nd);
 		remapped = true;
 	} else {
-		nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+		nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 		if (!nd_pa) {
-			pr_err("Cannot find %zu bytes in node %d\n",
-			       nd_size, nid);
+			pr_err("Cannot find %zu bytes in any node\n", nd_size);
 			return;
 		}
 		nd = __va(nd_pa);
-- 
1.7.1


* [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
  2013-01-14  9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
@ 2013-01-14  9:15 ` Tang Chen
  2013-01-14 22:35   ` Andrew Morton
  2013-01-14  9:15 ` [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

This patch adds functions to parse the movablecore_map boot option. Since the
option can be specified more than once, all the maps are stored in the
global movablecore_map.map array.

We also keep the array sorted by start_pfn in monotonically increasing order,
and merge all overlapping ranges.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 Documentation/kernel-parameters.txt |   17 +++++
 include/linux/mm.h                  |   11 +++
 mm/page_alloc.c                     |  126 +++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+), 0 deletions(-)
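
Note for reviewers: the sorted insert with merging described in the changelog
above can be modelled in userspace roughly as follows. This is a simplified
sketch, not the code in the diff below: struct entry, struct map, MAP_MAX and
map_insert() are hypothetical names, and the capacity check is done inside the
helper rather than in the caller. Like the patch, it also merges ranges that
only touch each other.

```c
#include <assert.h>
#include <string.h>

#define MAP_MAX 8	/* hypothetical capacity, stands in for MAX_NUMNODES */

struct entry {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

struct map {
	int nr;
	struct entry e[MAP_MAX];
};

/* Insert [start, end), keeping entries sorted and merged. 0 on success. */
static int map_insert(struct map *m, unsigned long start, unsigned long end)
{
	int first, last;

	assert(m && start < end);

	/* first: first entry that ends at or after the new start */
	for (first = 0; first < m->nr; first++)
		if (start <= m->e[first].end_pfn)
			break;

	/* last: one past the last entry that begins at or before the new end */
	for (last = first; last < m->nr; last++)
		if (end < m->e[last].start_pfn)
			break;

	if (first == last) {
		/* Nothing to merge: open a slot at position 'first'. */
		if (m->nr == MAP_MAX)
			return -1;
		memmove(&m->e[first + 1], &m->e[first],
			(m->nr - first) * sizeof(m->e[0]));
		m->nr++;
	} else {
		/* Absorb entries [first, last) into the new range. */
		if (m->e[first].start_pfn < start)
			start = m->e[first].start_pfn;
		if (m->e[last - 1].end_pfn > end)
			end = m->e[last - 1].end_pfn;
		memmove(&m->e[first + 1], &m->e[last],
			(m->nr - last) * sizeof(m->e[0]));
		m->nr -= last - first - 1;
	}

	m->e[first].start_pfn = start;
	m->e[first].end_pfn = end;
	return 0;
}
```

Starting from an empty map, inserting [10, 20), [30, 40) and [0, 5) leaves
three sorted entries; inserting [15, 35) afterwards collapses the last two
into [10, 40).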

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 363e348..f02aa4c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1637,6 +1637,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablecore_map=nn[KMG]@ss[KMG]
+			[KNL,X86,IA-64,PPC] This parameter is similar to
+			memmap except it specifies the memory map of
+			ZONE_MOVABLE.
+			If several areas are within one node, then from the
+			lowest ss to the end of the node will be ZONE_MOVABLE.
+			If an area covers two or more nodes, the area from
+			ss to the end of the 1st node will be ZONE_MOVABLE,
+			and the remaining nodes will have only ZONE_MOVABLE.
+			If memmap is specified at the same time,
+			movablecore_map will be limited to the memmap
+			areas. If kernelcore or movablecore is also specified,
+			movablecore_map takes priority. The administrator
+			should therefore make sure that the total size of the
+			movablecore_map areas is not too large; otherwise the
+			kernel won't have enough memory to start.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 66e2f7c..12f5a09 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1359,6 +1359,17 @@ extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
 
+#define MOVABLECORE_MAP_MAX MAX_NUMNODES
+struct movablecore_entry {
+	unsigned long start_pfn;    /* start pfn of memory segment */
+	unsigned long end_pfn;      /* end pfn of memory segment */
+};
+
+struct movablecore_map {
+	int nr_map;
+	struct movablecore_entry map[MOVABLECORE_MAP_MAX];
+};
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df2022f..d1a7a88 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -201,6 +201,9 @@ static unsigned long __meminitdata nr_all_pages;
 static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+/* Movable memory ranges, will also be used by memblock subsystem. */
+struct movablecore_map movablecore_map;
+
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
@@ -5070,6 +5073,129 @@ static int __init cmdline_parse_movablecore(char *p)
 early_param("kernelcore", cmdline_parse_kernelcore);
 early_param("movablecore", cmdline_parse_movablecore);
 
+/**
+ * insert_movablecore_map - Insert a memory range into movablecore_map.map.
+ * @start_pfn: start pfn of the range
+ * @end_pfn: end pfn of the range
+ *
+ * This function will also merge the overlapped ranges, and sort the array
+ * by start_pfn in monotonic increasing order.
+ */
+static void __init insert_movablecore_map(unsigned long start_pfn,
+					  unsigned long end_pfn)
+{
+	int pos, overlap;
+
+	/*
+	 * pos will be at the 1st overlapped range, or the position
+	 * where the element should be inserted.
+	 */
+	for (pos = 0; pos < movablecore_map.nr_map; pos++)
+		if (start_pfn <= movablecore_map.map[pos].end_pfn)
+			break;
+
+	/* If there is no overlapped range, just insert the element. */
+	if (pos == movablecore_map.nr_map ||
+	    end_pfn < movablecore_map.map[pos].start_pfn) {
+		/*
+		 * If pos is not the end of array, we need to move all
+		 * the rest elements backward.
+		 */
+		if (pos < movablecore_map.nr_map)
+			memmove(&movablecore_map.map[pos+1],
+				&movablecore_map.map[pos],
+				sizeof(struct movablecore_entry) *
+				(movablecore_map.nr_map - pos));
+		movablecore_map.map[pos].start_pfn = start_pfn;
+		movablecore_map.map[pos].end_pfn = end_pfn;
+		movablecore_map.nr_map++;
+		return;
+	}
+
+	/* overlap will be at the last overlapped range */
+	for (overlap = pos + 1; overlap < movablecore_map.nr_map; overlap++)
+		if (end_pfn < movablecore_map.map[overlap].start_pfn)
+			break;
+
+	/*
+	 * If there are more ranges overlapped, we need to merge them,
+	 * and move the rest elements forward.
+	 */
+	overlap--;
+	movablecore_map.map[pos].start_pfn = min(start_pfn,
+					movablecore_map.map[pos].start_pfn);
+	movablecore_map.map[pos].end_pfn = max(end_pfn,
+					movablecore_map.map[overlap].end_pfn);
+
+	if (pos != overlap && overlap + 1 != movablecore_map.nr_map)
+		memmove(&movablecore_map.map[pos+1],
+			&movablecore_map.map[overlap+1],
+			sizeof(struct movablecore_entry) *
+			(movablecore_map.nr_map - overlap - 1));
+
+	movablecore_map.nr_map -= overlap - pos;
+}
+
+/**
+ * movablecore_map_add_region - Add a memory range into movablecore_map.
+ * @start: physical start address of range
+ * @size: size of the range in bytes
+ *
+ * This function converts the physical addresses into pfns, and then adds the
+ * range to movablecore_map by calling insert_movablecore_map().
+ */
+static void __init movablecore_map_add_region(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+
+	/* In case size == 0 or start + size overflows */
+	if (start + size <= start)
+		return;
+
+	if (movablecore_map.nr_map >= ARRAY_SIZE(movablecore_map.map)) {
+		pr_err("movable_memory_map: too many entries;"
+			" ignoring [mem %#010llx-%#010llx]\n",
+			(unsigned long long) start,
+			(unsigned long long) (start + size - 1));
+		return;
+	}
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(start + size);
+	insert_movablecore_map(start_pfn, end_pfn);
+}
+
+/*
+ * movablecore_map=nn[KMG]@ss[KMG] sets the region of memory to be used as
+ * movable memory.
+ */
+static int __init cmdline_parse_movablecore_map(char *p)
+{
+	char *oldp;
+	u64 start_at, mem_size;
+
+	if (!p)
+		goto err;
+
+	oldp = p;
+	mem_size = memparse(p, &p);
+	if (p == oldp)
+		goto err;
+
+	if (*p == '@') {
+		oldp = ++p;
+		start_at = memparse(p, &p);
+		if (p == oldp || *p != '\0')
+			goto err;
+
+		movablecore_map_add_region(start_at, mem_size);
+		return 0;
+	}
+err:
+	return -EINVAL;
+}
+early_param("movablecore_map", cmdline_parse_movablecore_map);
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
-- 
1.7.1


* [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
  2013-01-14  9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
  2013-01-14  9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
@ 2013-01-14  9:15 ` Tang Chen
  2013-01-14  9:15 ` [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

This patch introduces a new array zone_movable_limit[] to store the
ZONE_MOVABLE limit from the movablecore_map boot option for all nodes.
The function sanitize_zone_movable_limit() finds out which node each
range in movablecore_map.map[] belongs to, and calculates the
low boundary of ZONE_MOVABLE for each node.

change log:
Call find_usable_zone_for_movable() to initialize movable_zone
so that sanitize_zone_movable_limit() can use it.

Reported-by: Wu Jianguo <wujianguo@huawei.com>

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Liu Jiang <jiang.liu@huawei.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1a7a88..093b953 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __initdata required_kernelcore;
 static unsigned long __initdata required_movablecore;
 static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES];
 
 /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
@@ -4370,6 +4371,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
 }
 
+/**
+ * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array.
+ *
+ * zone_movable_limit is initialized as 0. This function will try to get
+ * the first ZONE_MOVABLE pfn of each node from movablecore_map, and
+ * assign them to zone_movable_limit.
+ * zone_movable_limit[nid] == 0 means no limit for the node.
+ *
+ * Note: Each range is represented as [start_pfn, end_pfn)
+ */
+static void __meminit sanitize_zone_movable_limit(void)
+{
+	int map_pos = 0, i, nid;
+	unsigned long start_pfn, end_pfn;
+
+	if (!movablecore_map.nr_map)
+		return;
+
+	/* Iterate all ranges from minimum to maximum */
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		/*
+		 * If we have found lowest pfn of ZONE_MOVABLE of the node
+		 * specified by user, just go on to check next range.
+		 */
+		if (zone_movable_limit[nid])
+			continue;
+
+#ifdef CONFIG_ZONE_DMA
+		/* Skip DMA memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA];
+#endif
+
+#ifdef CONFIG_ZONE_DMA32
+		/* Skip DMA32 memory. */
+		if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32])
+			start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32];
+#endif
+
+#ifdef CONFIG_HIGHMEM
+		/* Skip lowmem if ZONE_MOVABLE is highmem. */
+		if (zone_movable_is_highmem() &&
+		    start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM])
+			start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
+#endif
+
+		if (start_pfn >= end_pfn)
+			continue;
+
+		while (map_pos < movablecore_map.nr_map) {
+			if (end_pfn <= movablecore_map.map[map_pos].start_pfn)
+				break;
+
+			if (start_pfn >= movablecore_map.map[map_pos].end_pfn) {
+				map_pos++;
+				continue;
+			}
+
+			/*
+			 * The start_pfn of ZONE_MOVABLE is either the minimum
+			 * pfn specified by movablecore_map, or 0, which means
+			 * the node has no ZONE_MOVABLE.
+			 */
+			zone_movable_limit[nid] = max(start_pfn,
+					movablecore_map.map[map_pos].start_pfn);
+
+			break;
+		}
+	}
+}
+
 #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long zone_type,
@@ -4388,6 +4460,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
 	return zholes_size[zone_type];
 }
 
+static void __meminit sanitize_zone_movable_limit(void)
+{
+}
+
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
@@ -4831,7 +4907,6 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	find_usable_zone_for_movable();
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
@@ -4990,6 +5065,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 
 	/* Find the PFNs that ZONE_MOVABLE begins at in each node */
 	memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
+	find_usable_zone_for_movable();
+	sanitize_zone_movable_limit();
 	find_zone_movable_pfns_for_nodes();
 
 	/* Print out the zone ranges */
-- 
1.7.1


* [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
                   ` (2 preceding siblings ...)
  2013-01-14  9:15 ` [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
@ 2013-01-14  9:15 ` Tang Chen
  2013-01-14  9:15 ` [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
  2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
  5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

If kernelcore or movablecore is specified together with
movablecore_map, movablecore_map takes priority.
This patch makes find_zone_movable_pfns_for_nodes()
calculate zone_movable_pfn[] with the limit from
zone_movable_limit[].

change log:
Move find_usable_zone_for_movable() to free_area_init_nodes()
so that sanitize_zone_movable_limit() in patch 3 can use the
initialized movable_zone.

Reported-by: Wu Jianguo <wujianguo@huawei.com>

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 mm/page_alloc.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 093b953..00037a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4902,9 +4902,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		required_kernelcore = max(required_kernelcore, corepages);
 	}
 
-	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
-	if (!required_kernelcore)
+	/*
+	 * If neither kernelcore/movablecore nor movablecore_map is specified,
+	 * there is no ZONE_MOVABLE. But if movablecore_map is specified, the
+	 * start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[].
+	 */
+	if (!required_kernelcore) {
+		if (movablecore_map.nr_map)
+			memcpy(zone_movable_pfn, zone_movable_limit,
+				sizeof(zone_movable_pfn));
 		goto out;
+	}
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
 	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
@@ -4934,10 +4942,24 @@ restart:
 		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
 			unsigned long size_pages;
 
+			/*
+			 * Find more memory for kernelcore in
+			 * [zone_movable_pfn[nid], zone_movable_limit[nid]).
+			 */
 			start_pfn = max(start_pfn, zone_movable_pfn[nid]);
 			if (start_pfn >= end_pfn)
 				continue;
 
+			if (zone_movable_limit[nid]) {
+				end_pfn = min(end_pfn, zone_movable_limit[nid]);
+				/* No range left for kernelcore in this node */
+				if (start_pfn >= end_pfn) {
+					zone_movable_pfn[nid] =
+							zone_movable_limit[nid];
+					break;
+				}
+			}
+
 			/* Account for what is only usable for kernelcore */
 			if (start_pfn < usable_startpfn) {
 				unsigned long kernel_pages;
@@ -4997,12 +5019,12 @@ restart:
 	if (usable_nodes && required_kernelcore > usable_nodes)
 		goto restart;
 
+out:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
-out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;
 }
-- 
1.7.1


* [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
                   ` (3 preceding siblings ...)
  2013-01-14  9:15 ` [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
@ 2013-01-14  9:15 ` Tang Chen
  2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
  5 siblings, 0 replies; 36+ messages in thread
From: Tang Chen @ 2013-01-14  9:15 UTC (permalink / raw)
  To: akpm, jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer
  Cc: linux-kernel, linux-mm

This patch makes sure bootmem will not allocate memory from areas that
may be ZONE_MOVABLE. The map info comes from the movablecore_map boot option.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)
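
Note for reviewers: the idea of the change to memblock_find_in_range_node(),
a top-down search that steps below any movable range it runs into, can be
modelled in userspace roughly as follows. This is a sketch under simplifying
assumptions and not the code in the diff below: it searches one free region
[lo, hi) instead of iterating memblock regions, the excluded ranges are byte
addresses that are sorted and non-overlapping, and struct range and
find_top_down() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Return the highest 'align'-aligned address cand in [lo, hi) such that
 * [cand, cand + size) avoids every range in excl[], or 0 if none exists.
 * excl[] must be sorted by start and must not overlap.
 */
static uint64_t find_top_down(uint64_t lo, uint64_t hi, uint64_t size,
			      uint64_t align, const struct range *excl, int nr)
{
	int cur = nr - 1;

	/* align must be a power of two for the mask below to round down */
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	while (hi > lo && hi >= size) {
		uint64_t cand = (hi - size) & ~(align - 1);

		if (cand < lo)
			break;

		/* Skip excluded ranges that start at or above the current top. */
		while (cur >= 0 && excl[cur].start >= hi)
			cur--;

		if (cur >= 0 && cand < excl[cur].end) {
			/* Candidate may overlap an excluded range: retry below it. */
			hi = excl[cur].start;
			continue;
		}

		return cand;
	}

	return 0;
}
```

With a free region [0x1000, 0x10000) and an excluded range
[0x8000, 0x10000), a page-sized, page-aligned request lands at 0x7000 instead
of 0xf000.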

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..6e25597 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
+extern struct movablecore_map movablecore_map;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 88adc8a..1e48774 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -101,6 +101,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 {
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
+	int curr = movablecore_map.nr_map - 1;
 
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
@@ -114,13 +115,28 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 		this_start = clamp(this_start, start, end);
 		this_end = clamp(this_end, start, end);
 
-		if (this_end < size)
+restart:
+		if (this_end <= this_start || this_end < size)
 			continue;
 
+		for (; curr >= 0; curr--) {
+			if ((movablecore_map.map[curr].start_pfn << PAGE_SHIFT)
+			    < this_end)
+				break;
+		}
+
 		cand = round_down(this_end - size, align);
+		if (curr >= 0 &&
+		    cand < movablecore_map.map[curr].end_pfn << PAGE_SHIFT) {
+			this_end = movablecore_map.map[curr].start_pfn
+				   << PAGE_SHIFT;
+			goto restart;
+		}
+
 		if (cand >= this_start)
 			return cand;
 	}
+
 	return 0;
 }
 
-- 
1.7.1


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
                   ` (4 preceding siblings ...)
  2013-01-14  9:15 ` [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
@ 2013-01-14 17:31 ` H. Peter Anvin
  2013-01-14 22:34   ` Andrew Morton
  5 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-14 17:31 UTC (permalink / raw)
  To: Tang Chen
  Cc: akpm, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On 01/14/2013 01:15 AM, Tang Chen wrote:
>
> For now, users can disable this functionality by not specifying the boot option.
> Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> to using SRAT.
>

I still think the option "movablecore_map" is uglier than hell.  "core" 
could just as easily refer to CPU cores there, but it is a memory mem. 
"movablemem" seems more appropriate.

Again, without SRAT I consider this patchset to be largely useless for 
anything other than prototyping work.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
@ 2013-01-14 22:34   ` Andrew Morton
  2013-01-14 22:41     ` Luck, Tony
  2013-01-15  0:05     ` Toshi Kani
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tang Chen, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Mon, 14 Jan 2013 09:31:33 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 01/14/2013 01:15 AM, Tang Chen wrote:
> >
> > For now, users can disable this functionality by not specifying the boot option.
> > Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> > to using SRAT.
> >
> 
> I still think the option "movablecore_map" is uglier than hell.  "core" 
> could just as easily refer to CPU cores there, but it is a memory mem. 
> "movablemem" seems more appropriate.
> 
> Again, without SRAT I consider this patchset to be largely useless for 
> anything other than prototyping work.
> 

hm, why.  Obviously SRAT support will improve things, but is it
actually unusable/unuseful with the command line configuration?

Also, "But even if we can use SRAT, users still need an interface to
enable/disable this functionality if they don't want to loose their
NUMA performance.  So I think, an user interface is always needed."


There's also the matter of other architectures.  Has any thought been
given to how (eg) powerpc would hook into here?

And what about VMs (xen, KVM)?  I wonder if there is a case for those
to implement memory hotplug.  


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter
  2013-01-14  9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
@ 2013-01-14 22:35   ` Andrew Morton
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:35 UTC (permalink / raw)
  To: Tang Chen
  Cc: jiang.liu, wujianguo, hpa, wency, laijs, linfeng, yinghai,
	isimatu.yasuaki, rob, kosaki.motohiro, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, tony.luck,
	glommer, linux-kernel, linux-mm

On Mon, 14 Jan 2013 17:15:22 +0800
Tang Chen <tangchen@cn.fujitsu.com> wrote:

> This patch adds functions to parse movablecore_map boot option. Since the
> option could be specified more then once, all the maps will be stored in
> the global variable movablecore_map.map array.
> 
> And also, we keep the array in monotonic increasing order by start_pfn.
> And merge all overlapped ranges.
> 
> ...
>
> +#define MOVABLECORE_MAP_MAX MAX_NUMNODES
> +struct movablecore_entry {
> +	unsigned long start_pfn;    /* start pfn of memory segment */
> +	unsigned long end_pfn;      /* end pfn of memory segment */

It is important to tell readers whether an "end" is inclusive or
exclusive.  ie: does it point at the last byte, or one beyond it?

By reading the code I see it is exclusive, so...

--- a/include/linux/mm.h~page_alloc-add-movable_memmap-kernel-parameter-fix
+++ a/include/linux/mm.h
@@ -1362,7 +1362,7 @@ extern void sparse_memory_present_with_a
 #define MOVABLECORE_MAP_MAX MAX_NUMNODES
 struct movablecore_entry {
 	unsigned long start_pfn;    /* start pfn of memory segment */
-	unsigned long end_pfn;      /* end pfn of memory segment */
+	unsigned long end_pfn;      /* end pfn of memory segment (exclusive) */
 };
 
 struct movablecore_map {
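
The exclusive-end convention noted above, together with the "sorted by
start_pfn, overlaps merged" rule from the patch description, can be sketched
as follows. This is only an illustration, not the code from the patch: the
names pfn_range, pfn_map, range_pages, map_insert and the capacity MAP_MAX
are hypothetical, and the sketch also merges ranges that merely touch, which
the patch description does not promise.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_MAX 8	/* hypothetical capacity; the patch uses MAX_NUMNODES */

/* Half-open range [start_pfn, end_pfn): end_pfn is one past the last page. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

struct pfn_map {
	size_t nr;
	struct pfn_range r[MAP_MAX];
};

/* With an exclusive end, the page count needs no "+ 1". */
static unsigned long range_pages(const struct pfn_range *r)
{
	return r->end_pfn - r->start_pfn;
}

/*
 * Insert [start, end), keeping the array sorted by start_pfn and merging
 * ranges that overlap or touch. Returns 0 on success, -1 if the map is full.
 */
static int map_insert(struct pfn_map *m, unsigned long start, unsigned long end)
{
	size_t first, last, i, removed;

	assert(start < end);

	/* Skip entries that end strictly before the new range starts. */
	for (first = 0; first < m->nr; first++)
		if (m->r[first].end_pfn >= start)
			break;

	/* Absorb every following entry that overlaps or touches the range. */
	for (last = first; last < m->nr; last++) {
		if (m->r[last].start_pfn > end)
			break;
		if (m->r[last].start_pfn < start)
			start = m->r[last].start_pfn;
		if (m->r[last].end_pfn > end)
			end = m->r[last].end_pfn;
	}

	if (first == last) {
		/* Nothing to merge: open a slot at position first. */
		if (m->nr == MAP_MAX)
			return -1;
		for (i = m->nr; i > first; i--)
			m->r[i] = m->r[i - 1];
		m->nr++;
	} else {
		/* Entries [first, last) collapse into a single entry. */
		removed = last - first - 1;
		for (i = last; i < m->nr; i++)
			m->r[i - removed] = m->r[i];
		m->nr -= removed;
	}

	m->r[first].start_pfn = start;
	m->r[first].end_pfn = end;
	return 0;
}
```

With this convention, two entries are adjacent exactly when one entry's
end_pfn equals the next entry's start_pfn, so no off-by-one adjustment is
needed when comparing or merging.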


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 22:34   ` Andrew Morton
@ 2013-01-14 22:41     ` Luck, Tony
  2013-01-14 22:46       ` Andrew Morton
  2013-01-15  1:23       ` Yasuaki Ishimatsu
  2013-01-15  0:05     ` Toshi Kani
  1 sibling, 2 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-14 22:41 UTC (permalink / raw)
  To: Andrew Morton, H. Peter Anvin
  Cc: Tang Chen, jiang.liu@huawei.com, wujianguo@huawei.com,
	wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org,
	isimatu.yasuaki@jp.fujitsu.com, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

> hm, why.  Obviously SRAT support will improve things, but is it
> actually unusable/unuseful with the command line configuration?

Users will want to set these moveable zones along node boundaries
(the whole purpose is to be able to remove a node by making sure
the kernel won't allocate anything tricky in it, right?)  So raw addresses
are usable ... but to get them right the user will have to go parse the
SRAT table manually to come up with the addresses. Any time you
make the user go off and do some tedious calculation that the computer
should have done for them is user-abuse.

-Tony 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 22:41     ` Luck, Tony
@ 2013-01-14 22:46       ` Andrew Morton
  2013-01-16  6:25         ` Yasuaki Ishimatsu
  2013-01-15  1:23       ` Yasuaki Ishimatsu
  1 sibling, 1 reply; 36+ messages in thread
From: Andrew Morton @ 2013-01-14 22:46 UTC (permalink / raw)
  To: Luck, Tony
  Cc: H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org,
	isimatu.yasuaki@jp.fujitsu.com, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

On Mon, 14 Jan 2013 22:41:03 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

> > hm, why.  Obviously SRAT support will improve things, but is it
> > actually unusable/unuseful with the command line configuration?
> 
> Users will want to set these moveable zones along node boundaries
> (the whole purpose is to be able to remove a node by making sure
> the kernel won't allocate anything tricky in it, right?)  So raw addresses
> are usable ... but to get them right the user will have to go parse the
> SRAT table manually to come up with the addresses. Any time you
> make the user go off and do some tedious calculation that the computer
> should have done for them is user-abuse.
> 

Sure.  But SRAT configuration is in progress and the boot option is
better than nothing?

Things I'm wondering:

- is there *really* a case for retaining the boot option if/when
  SRAT support is available?

- will the boot option be needed for other architectures, presumably
  because they don't provide sufficient layout information to the
  kernel?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 22:34   ` Andrew Morton
  2013-01-14 22:41     ` Luck, Tony
@ 2013-01-15  0:05     ` Toshi Kani
  1 sibling, 0 replies; 36+ messages in thread
From: Toshi Kani @ 2013-01-15  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: H. Peter Anvin, Tang Chen, jiang.liu, wujianguo, wency, laijs,
	linfeng, yinghai, isimatu.yasuaki, rob, kosaki.motohiro,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, tony.luck, glommer, linux-kernel, linux-mm

On Mon, 2013-01-14 at 14:34 -0800, Andrew Morton wrote:
> On Mon, 14 Jan 2013 09:31:33 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
> > On 01/14/2013 01:15 AM, Tang Chen wrote:
> > >
> > > For now, users can disable this functionality by not specifying the boot option.
> > > Later, we will post SRAT support, and add another option value "movablecore_map=acpi"
> > > to using SRAT.
> > >
> > 
> > I still think the option "movablecore_map" is uglier than hell.  "core" 
> > could just as easily refer to CPU cores there, but it is a memory mem. 
> > "movablemem" seems more appropriate.
> > 
> > Again, without SRAT I consider this patchset to be largely useless for 
> > anything other than prototyping work.
> > 
> 
> hm, why.  Obviously SRAT support will improve things, but is it
> actually unusable/unuseful with the command line configuration?

I think it is useful for prototyping and testing.  I do not think it is
suitable for regular users.

> Also, "But even if we can use SRAT, users still need an interface to
> enable/disable this functionality if they don't want to loose their
> NUMA performance.  So I think, an user interface is always needed."

Yes, but such user interface could be provided through the management
interface (GUI/CLI) of the platforms (or VMs).  If user sets for
performance, SRAT could be generated with no hot-pluggable memory.  If
user sets node N to be hot-removable, SRAT could be generated in such
way that all memory ranges in node N are hot-pluggable.

Thanks,
-Toshi


> There's also the matter of other architectures.  Has any thought been
> given to how (eg) powerpc would hook into here?
> 
> And what about VMs (xen, KVM)?  I wonder if there is a case for those
> to implement memory hotplug.  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 22:41     ` Luck, Tony
  2013-01-14 22:46       ` Andrew Morton
@ 2013-01-15  1:23       ` Yasuaki Ishimatsu
  2013-01-15  3:44         ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-15  1:23 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

2013/01/15 7:41, Luck, Tony wrote:
>> hm, why.  Obviously SRAT support will improve things, but is it
>> actually unusable/unuseful with the command line configuration?
>

> Users will want to set these moveable zones along node boundaries
> (the whole purpose is to be able to remove a node by making sure
> the kernel won't allocate anything tricky in it, right?)

Yes

> So raw addresses
> are usable ... but to get them right the user will have to go parse the
> SRAT table manually to come up with the addresses.

I don't think so, because the user can easily get the raw addresses from the
kernel messages on x86.

Here are kernel messages from an x86 system.
---
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[    0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
[    0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
[    0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff]
[    0.000000] SRAT: Node 3 PXM 4 [mem 0x2000000000-0x27ffffffff]
[    0.000000] SRAT: Node 4 PXM 5 [mem 0x2800000000-0x2fffffffff]
[    0.000000] SRAT: Node 5 PXM 6 [mem 0x3000000000-0x37ffffffff]
[    0.000000] SRAT: Node 6 PXM 7 [mem 0x3800000000-0x3fffffffff]
[    0.000000] SRAT: Node 7 PXM 1 [mem 0x800000000-0xfffffffff]
---
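
As a rough illustration of the manual step being discussed, the sketch below
turns one such SRAT line into a boot option string. It rests on assumptions
not stated in this part of the thread: that the option takes the form
movablecore_map=nn[KMG]@ss[KMG] (size@start), and that the end address
printed in the log is inclusive. The helper names srat_line_to_range and
format_option are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "SRAT: Node N PXM P [mem 0xSTART-0xEND]" log line.
 * The printed END is taken to be inclusive, so size = END - START + 1.
 * Returns 0 on success, -1 if the line does not match.
 */
static int srat_line_to_range(const char *line, int *node,
			      unsigned long long *start,
			      unsigned long long *size)
{
	const char *p;
	unsigned long long end;
	int pxm;

	assert(line && node && start && size);

	/* Skip the "[    0.000000] " timestamp prefix, if any. */
	p = strstr(line, "SRAT: Node");
	if (!p)
		return -1;
	if (sscanf(p, "SRAT: Node %d PXM %d [mem %llx-%llx]",
		   node, &pxm, start, &end) != 4)
		return -1;
	if (end < *start)
		return -1;

	*size = end - *start + 1;
	return 0;
}

/*
 * Format the range as "movablecore_map=<size>M@<start>M".
 * ASSUMPTION: size@start syntax with an M suffix; both values must be
 * MiB-aligned here. Returns 0 on success, -1 on misalignment or truncation.
 */
static int format_option(char *buf, size_t len,
			 unsigned long long start, unsigned long long size)
{
	const unsigned long long mib = 1ULL << 20;
	int n;

	if (start % mib || size % mib)
		return -1;
	n = snprintf(buf, len, "movablecore_map=%lluM@%lluM",
		     size / mib, start / mib);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under these assumptions, the "Node 1" line above yields a 32 GiB range
starting at 64 GiB. The result is only valid for the memory map of the boot
the messages came from.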

Thanks,
Yasuaki Ishimatsu

> Any time you
> make the user go off and do some tedious calculation that the computer
> should have done for them is user-abuse.
>
> -Tony
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-15  1:23       ` Yasuaki Ishimatsu
@ 2013-01-15  3:44         ` H. Peter Anvin
  2013-01-15  4:04           ` Luck, Tony
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-15  3:44 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, Luck, Tony
  Cc: Andrew Morton, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

That *is* user abuse.

Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

>2013/01/15 7:41, Luck, Tony wrote:
>>> hm, why.  Obviously SRAT support will improve things, but is it
>>> actually unusable/unuseful with the command line configuration?
>>
>
>> Users will want to set these moveable zones along node boundaries
>> (the whole purpose is to be able to remove a node by making sure
>> the kernel won't allocate anything tricky in it, right?)
>
>Yes
>
>> So raw addresses
>> are usable ... but to get them right the user will have to go parse
>the
>> SRAT table manually to come up with the addresses.
>
>I don't think so because user can easily get raw address by kernel
>message in x86.
>
>Here are kernel messages of x86 architecture.
>---
>[    0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
>[    0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x7ffffffff]
>[    0.000000] SRAT: Node 1 PXM 2 [mem 0x1000000000-0x17ffffffff]
>[    0.000000] SRAT: Node 2 PXM 3 [mem 0x1800000000-0x1fffffffff]
>[    0.000000] SRAT: Node 3 PXM 4 [mem 0x2000000000-0x27ffffffff]
>[    0.000000] SRAT: Node 4 PXM 5 [mem 0x2800000000-0x2fffffffff]
>[    0.000000] SRAT: Node 5 PXM 6 [mem 0x3000000000-0x37ffffffff]
>[    0.000000] SRAT: Node 6 PXM 7 [mem 0x3800000000-0x3fffffffff]
>[    0.000000] SRAT: Node 7 PXM 1 [mem 0x800000000-0xfffffffff]
>---
>
>Thanks,
>Yasuaki Ishimatsu
>
>> Any time you
>> make the user go off and do some tedious calculation that the
>computer
>> should have done for them is user-abuse.
>>
>> -Tony
>>

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-15  3:44         ` H. Peter Anvin
@ 2013-01-15  4:04           ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-15  4:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org


>> 
>> I don't think so because user can easily get raw address by kernel
>> message in x86.
>> 

Which will fail if, on some subsequent boot, a DIMM fails BIST and is removed from the memory map by the BIOS, which will then change all the node boundaries for the nodes above the failed DIMM.

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-14 22:46       ` Andrew Morton
@ 2013-01-16  6:25         ` Yasuaki Ishimatsu
  2013-01-16 21:29           ` Andrew Morton
  0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-16  6:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Luck, Tony, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

2013/01/15 7:46, Andrew Morton wrote:
> On Mon, 14 Jan 2013 22:41:03 +0000
> "Luck, Tony" <tony.luck@intel.com> wrote:
>
>>> hm, why.  Obviously SRAT support will improve things, but is it
>>> actually unusable/unuseful with the command line configuration?
>>
>> Users will want to set these moveable zones along node boundaries
>> (the whole purpose is to be able to remove a node by making sure
>> the kernel won't allocate anything tricky in it, right?)  So raw addresses
>> are usable ... but to get them right the user will have to go parse the
>> SRAT table manually to come up with the addresses. Any time you
>> make the user go off and do some tedious calculation that the computer
>> should have done for them is user-abuse.
>>
>
> Sure.  But SRAT configuration is in progress and the boot option is
> better than nothing?

Yes. I think boot option which specifies memory range is necessary.

>
> Things I'm wondering:
>
> - is there *really* a case for retaining the boot option if/when
>    SRAT support is available?

Yes. If SRAT support is available, all memory with the hotpluggable bit
set is managed by ZONE_MOVABLE. But NUMA performance may degrade, because
only anonymous pages and page cache can be allocated from this memory.

In this case, if the user cannot change the SRAT information, the user needs
a way to select/set removable memory manually.

Thanks,
Yasuaki Ishimatsu

>
> - will the boot option be needed for other architectures, presumably
>    because they don't provide sufficient layout information to the
>    kernel?
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16  6:25         ` Yasuaki Ishimatsu
@ 2013-01-16 21:29           ` Andrew Morton
  2013-01-16 22:01             ` KOSAKI Motohiro
  2013-01-16 22:52             ` H. Peter Anvin
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Morton @ 2013-01-16 21:29 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Luck, Tony, H. Peter Anvin, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

On Wed, 16 Jan 2013 15:25:44 +0900
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

> >
> > Things I'm wondering:
> >
> > - is there *really* a case for retaining the boot option if/when
> >    SRAT support is available?
> 
> Yes. If SRAT support is available, all memory which enabled hotpluggable
> bit are managed by ZONEMOVABLE. But performance degradation may
> occur by NUMA because we can only allocate anonymous page and page-cache
> from these memory.
> 
> In this case, if user cannot change SRAT information, user needs a way to
> select/set removable memory manually.

If I understand this correctly you mean that once SRAT parsing is
implemented, the user can use movablecore_map to override that SRAT
parsing, yes?  That movablecore_map will take precedence over SRAT?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 21:29           ` Andrew Morton
@ 2013-01-16 22:01             ` KOSAKI Motohiro
  2013-01-16 23:00               ` H. Peter Anvin
  2013-01-16 22:52             ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-16 22:01 UTC (permalink / raw)
  To: akpm
  Cc: isimatu.yasuaki, tony.luck, hpa, tangchen, jiang.liu, wujianguo,
	wency, laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

On 1/16/2013 4:29 PM, Andrew Morton wrote:
> On Wed, 16 Jan 2013 15:25:44 +0900
> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
> 
>>>
>>> Things I'm wondering:
>>>
>>> - is there *really* a case for retaining the boot option if/when
>>>    SRAT support is available?
>>
>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>> bit are managed by ZONEMOVABLE. But performance degradation may
>> occur by NUMA because we can only allocate anonymous page and page-cache
>> from these memory.
>>
>> In this case, if user cannot change SRAT information, user needs a way to
>> select/set removable memory manually.
> 
> If I understand this correctly you mean that once SRAT parsing is
> implemented, the user can use movablecore_map to override that SRAT
> parsing, yes?  That movablecore_map will take precedence over SRAT?

I think movablecore_map (I prefer movablemem, btw) should behave so,
because for the past three years almost all memory hotplug bugs have been
handled only by kamezawa-san and me and, afaik, neither of us has hardware
that supports hot-remove.

So, if the new feature requires specific hardware, we can't maintain this area
any more.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 21:29           ` Andrew Morton
  2013-01-16 22:01             ` KOSAKI Motohiro
@ 2013-01-16 22:52             ` H. Peter Anvin
  2013-01-17  1:49               ` Tang Chen
  2013-01-17  5:08               ` Yasuaki Ishimatsu
  1 sibling, 2 replies; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-16 22:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yasuaki Ishimatsu, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>
>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>> bit are managed by ZONEMOVABLE. But performance degradation may
>> occur by NUMA because we can only allocate anonymous page and page-cache
>> from these memory.
>>
>> In this case, if user cannot change SRAT information, user needs a way to
>> select/set removable memory manually.
> 
> If I understand this correctly you mean that once SRAT parsing is
> implemented, the user can use movablecore_map to override that SRAT
> parsing, yes?  That movablecore_map will take precedence over SRAT?
> 

Yes, but we still need a higher-level user interface which specifies
which nodes, not which memory ranges, should be movable.  That is the
policy granularity that is actually appropriate for the administrator
(trading off performance vs reliability.)

	-hpa


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 22:01             ` KOSAKI Motohiro
@ 2013-01-16 23:00               ` H. Peter Anvin
  2013-01-17 20:27                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-16 23:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: akpm, isimatu.yasuaki, tony.luck, tangchen, jiang.liu, wujianguo,
	wency, laijs, linfeng, yinghai, rob, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, glommer,
	linux-kernel, linux-mm

On 01/16/2013 02:01 PM, KOSAKI Motohiro wrote:
>>>>
>>>> Things I'm wondering:
>>>>
>>>> - is there *really* a case for retaining the boot option if/when
>>>>    SRAT support is available?
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONEMOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes?  That movablecore_map will take precedence over SRAT?
> 
> I think movablecore_map (I prefer movablemem than it, btw) should behave so.
> because of, for past three years, almost all memory hotplug bug was handled
> only I and kamezawa-san and, afaik, both don't have hotremove aware specific
> hardware.
> 
> So, if the new feature require specific hardware, we can't maintain this area
> any more.
>  

It is more so than that: the design principle should always be that
lower-level directives, if present, take precedence over higher-level
directives.  The reason for that should be pretty obvious: one of the
main uses of the low-level directives is to override the high-level
directives due to bugs or debugging needs.

	-hpa


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 22:52             ` H. Peter Anvin
@ 2013-01-17  1:49               ` Tang Chen
  2013-01-17 20:20                 ` KOSAKI Motohiro
  2013-01-17  5:08               ` Yasuaki Ishimatsu
  1 sibling, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-17  1:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Yasuaki Ishimatsu, Luck, Tony,
	jiang.liu@huawei.com, wujianguo@huawei.com, wency@cn.fujitsu.com,
	laijs@cn.fujitsu.com, linfeng@cn.fujitsu.com, yinghai@kernel.org,
	rob@landley.net, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
	guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
	jaegeuk.hanse@gmail.com, glommer@parallels.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On 01/17/2013 06:52 AM, H. Peter Anvin wrote:
> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONEMOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes?  That movablecore_map will take precedence over SRAT?
>>
>
> Yes,

Hi HPA, Andrew,

No, I don't think so. In my [PATCH v4 3/6], I check whether users specified
any unhotpluggable memory ranges and, if so, remove them from
movablecore_map.map[]. So this option will not override SRAT.

It works like this:

    hotpluggable ranges:            |-----------------|
    unhotpluggable ranges:  |-----|                      |--------|
    user specified ranges:   |---|       |--------------------|
    movablecore_map.map[]:               |------------|

Please refer to https://lkml.org/lkml/2012/12/19/53.
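
The diagram above amounts to intersecting each user-specified range with the
hotpluggable ranges. A minimal sketch of that clipping follows; it is not the
code from the v4 patch, and the names range and clip_to_hotpluggable are
hypothetical. Ranges are treated as half-open [start, end).

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Keep only the parts of the user ranges that fall inside a hotpluggable
 * range. Writes at most out_max results and returns how many were written.
 */
static size_t clip_to_hotpluggable(const struct range *user, size_t nr_user,
				   const struct range *hot, size_t nr_hot,
				   struct range *out, size_t out_max)
{
	size_t i, j, n = 0;

	assert(user && hot && out);

	for (i = 0; i < nr_user; i++) {
		for (j = 0; j < nr_hot; j++) {
			unsigned long s = max_ul(user[i].start, hot[j].start);
			unsigned long e = min_ul(user[i].end, hot[j].end);

			/* An empty intersection means s >= e; skip it. */
			if (s < e && n < out_max) {
				out[n].start = s;
				out[n].end = e;
				n++;
			}
		}
	}
	return n;
}
```

A user range lying entirely in unhotpluggable memory produces no output, and
one that straddles a boundary is trimmed to the hotpluggable part, matching
the last row of the diagram.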

But in this v5 patch-set, I removed all SRAT-related code. So in v5 the users'
option will override SRAT.


Thanks. :)

>but we still need a higher-level user interface which specifies
> which nodes, not which memory ranges, should be movable.  That is the
> policy granularity that is actually appropriate for the administrator
> (trading off performance vs reliability.)
>
> 	-hpa
>
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 22:52             ` H. Peter Anvin
  2013-01-17  1:49               ` Tang Chen
@ 2013-01-17  5:08               ` Yasuaki Ishimatsu
  2013-01-17  6:03                 ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-17  5:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

2013/01/17 7:52, H. Peter Anvin wrote:
> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>
>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>> bit are managed by ZONEMOVABLE. But performance degradation may
>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>> from these memory.
>>>
>>> In this case, if user cannot change SRAT information, user needs a way to
>>> select/set removable memory manually.
>>
>> If I understand this correctly you mean that once SRAT parsing is
>> implemented, the user can use movablecore_map to override that SRAT
>> parsing, yes?  That movablecore_map will take precedence over SRAT?
>>
>
> Yes, but we still need a higher-level user interface which specifies
> which nodes, not which memory ranges, should be movable.

I thought about specifying the node. But I think this method is
inconvenient. The node number is decided by the OS, so it can
change easily.

For example:

o example 1:
   System has 3 nodes:
   node0, node1, node2

   When the user removes node1, the system has:
   node0, node2

   But after rebooting the system, the system has:
   node0, node1

   So node2 becomes node1.

o example 2:
   System has 2 nodes:
   0x40000000 - 0x7fffffff : node0
   0xc0000000 - 0xffffffff : node1

   When the user adds a node whose memory range is [0x80000000 - 0xbfffffff],
   the system has:
   0x40000000 - 0x7fffffff : node0
   0xc0000000 - 0xffffffff : node1
   0x80000000 - 0xbfffffff : node2

   But after rebooting the system, the system's nodes may become:
   0x40000000 - 0x7fffffff : node0
   0x80000000 - 0xbfffffff : node1
   0xc0000000 - 0xffffffff : node2

   So the node numbers have changed.

Specifying a node number may be easier than specifying a memory
range. But if the user specifies removable memory by node number,
the user always needs to check whether the node numbers have changed
after every hotplug operation.
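
Example 2 can be reproduced with a small Python sketch. It assumes, purely
for illustration, that node IDs are handed out in ascending address order at
boot and that a hot-added node takes the next free ID; the real numbering is
up to the firmware and the OS, and the helper names are hypothetical.

```python
def number_nodes_at_boot(ranges):
    """Assign node IDs in ascending start-address order (assumed policy)."""
    return {rng: node_id for node_id, rng in enumerate(sorted(ranges))}


def hot_add(node_map, new_range):
    """Give a hot-added range the next free node ID (assumed policy)."""
    updated = dict(node_map)
    updated[new_range] = max(updated.values()) + 1
    return updated


boot_ranges = [(0x40000000, 0x7fffffff), (0xc0000000, 0xffffffff)]
added = (0x80000000, 0xbfffffff)

before_reboot = hot_add(number_nodes_at_boot(boot_ranges), added)
after_reboot = number_nodes_at_boot(boot_ranges + [added])

# Ranges whose node ID differs across the reboot.
renumbered = sorted(r for r in before_reboot if before_reboot[r] != after_reboot[r])
```

Under these assumptions two of the three ranges end up with a different node
number after the reboot, while the memory ranges themselves stay the same.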

Thanks,
Yasuaki Ishimatsu


> That is the
> policy granularity that is actually appropriate for the administrator
> (trading off performance vs reliability.)
>
> 	-hpa
>



* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-17  5:08               ` Yasuaki Ishimatsu
@ 2013-01-17  6:03                 ` H. Peter Anvin
  2013-01-17 16:30                   ` Luck, Tony
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-17  6:03 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Andrew Morton, Luck, Tony, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

On 01/16/2013 09:08 PM, Yasuaki Ishimatsu wrote:
>
> I thought about the method of specifying the node. But I think
> this method is inconvenience. Node number is decided by OS.
> So the number is changed easily.
>
> for example:
>
> o example 1:
>    System has 3 nodes:
>    node0, node1, node2
>
>    When user remove node1, the system has:
>    node0, node2
>
>    But after rebooting the system, the system has:
>    node0, node1
>
>    So node2 becomes node1.
>
> o example 2:
>    System has 2 nodes:
>    0x40000000 - 0x7fffffff : node0
>    0xc0000000 - 0xffffffff : node1
>
>    When user add a node which memory range is [0x80000000 - 0xbfffffff],
>    system has:
>    0x40000000 - 0x7fffffff : node0
>    0xc0000000 - 0xffffffff : node1
>    0x80000000 - 0xbfffffff : node2
>
>    But after rebooting the system, the system's node may become:
>    0x40000000 - 0x7fffffff : node0
>    0x80000000 - 0xbfffffff : node1
>    0xc0000000 - 0xffffffff : node2
>
>    So node number is changed.
>
> Specifying node number may be easy method than specifying memory
> range. But if user uses node number for specifying removable memory,
> user always need to care whether node number is changed or not at
> every hotplug operation.
>


Well, there are only two options:

1. The user doesn't care which nodes are movable.  In that case, the 
user may just want to specify a target as a percentage of memory to make 
movable -- effectively a "slider" on the performance vs. reliability 
spectrum.  The kernel can then assign nodes arbitrarily.

2. If the user *does* care which nodes are movable, then the user needs 
to be able to specify that *in a way that makes sense to the user*. 
This may mean involving the DMI information as well as SRAT in order to 
get "silk screen" type information out.
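
Option 1 could be sketched roughly as follows in Python. This is only an
illustration of the "slider" idea: the function name `pick_movable_nodes`,
the choice to walk nodes from the highest ID downwards, and the rule that
node 0 always stays non-movable are assumptions, not part of any patch.

```python
def pick_movable_nodes(node_sizes, target_percent):
    """Pick nodes to make movable until target_percent of memory is covered.

    node_sizes maps node ID -> size in bytes. Node 0 is never picked here,
    on the assumption that the kernel needs some non-movable memory.
    Hypothetical policy: walk nodes from the highest ID downwards.
    """
    total = sum(node_sizes.values())
    target = total * target_percent / 100
    movable, covered = [], 0
    for node in sorted(node_sizes, reverse=True):
        if node == 0 or covered >= target:
            continue
        movable.append(node)
        covered += node_sizes[node]
    return movable, covered / total * 100


GIB = 1 << 30
nodes, percent = pick_movable_nodes({0: 4 * GIB, 1: 4 * GIB, 2: 4 * GIB, 3: 4 * GIB}, 50)
```

Because whole nodes are selected, the achieved percentage can overshoot the
requested target when node sizes do not divide evenly.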

	-hpa



-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* RE: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-17  6:03                 ` H. Peter Anvin
@ 2013-01-17 16:30                   ` Luck, Tony
  2013-01-17 20:28                     ` KOSAKI Motohiro
  0 siblings, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2013-01-17 16:30 UTC (permalink / raw)
  To: H. Peter Anvin, Yasuaki Ishimatsu
  Cc: Andrew Morton, Tang Chen, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	kosaki.motohiro@jp.fujitsu.com, minchan.kim@gmail.com,
	mgorman@suse.de, rientjes@google.com, guz.fnst@cn.fujitsu.com,
	rusty@rustcorp.com.au, lliubbo@gmail.com, jaegeuk.hanse@gmail.com,
	glommer@parallels.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org

> 2. If the user *does* care which nodes are movable, then the user needs 
> to be able to specify that *in a way that makes sense to the user*. 
> This may mean involving the DMI information as well as SRAT in order to 
> get "silk screen" type information out.

One reason they might care would be which I/O devices are connected
to each node.  DMI might be a good way to get an invariant name for the
node, but they might also want to specify in terms of what they actually
want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
mark both these nodes as removable".  Though this is almost certainly not
a job for kernel options, but for some user configuration tool that would
spit out the DMI names.
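
A check of that kind in a user configuration tool might look like the
following Python sketch. Everything here is hypothetical: the function name,
the device-to-node mapping and the group list are illustrative inputs that
such a tool would have to gather itself.

```python
def violated_redundancy_groups(device_node, redundant_groups, removable_nodes):
    """Return the groups whose devices all sit on removable nodes.

    device_node maps a device name to the node it is attached to;
    redundant_groups is a list of device-name tuples. All names are
    illustrative inputs, not read from a real system.
    """
    removable = set(removable_nodes)
    return [
        group
        for group in redundant_groups
        if all(device_node[dev] in removable for dev in group)
    ]


device_node = {"eth0": 1, "eth4": 2}
bonded = [("eth0", "eth4")]
problems = violated_redundancy_groups(device_node, bonded, removable_nodes=[1, 2])
```

An empty result means every redundant group keeps at least one device on a
node that is not marked removable.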

-Tony


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-17  1:49               ` Tang Chen
@ 2013-01-17 20:20                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:20 UTC (permalink / raw)
  To: tangchen
  Cc: hpa, akpm, isimatu.yasuaki, tony.luck, jiang.liu, wujianguo,
	wency, laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

On 1/16/2013 8:49 PM, Tang Chen wrote:
> On 01/17/2013 06:52 AM, H. Peter Anvin wrote:
>> On 01/16/2013 01:29 PM, Andrew Morton wrote:
>>>>
>>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>>> bit are managed by ZONEMOVABLE. But performance degradation may
>>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>>> from these memory.
>>>>
>>>> In this case, if user cannot change SRAT information, user needs a way to
>>>> select/set removable memory manually.
>>>
>>> If I understand this correctly you mean that once SRAT parsing is
>>> implemented, the user can use movablecore_map to override that SRAT
>>> parsing, yes?  That movablecore_map will take precedence over SRAT?
>>>
>>
>> Yes,
> 
> Hi HPA, Andrew,
> 
> No, I don't think so. In my [PATCH v4 3/6], I checked if users specified the
> unhotpluggable memory ranges, I will remove them from movablecore_map.map[].
> So this option will not override SRAT.
> 
> It works like this:
> 
>     hotpluggable ranges:            |-----------------|
>     unhotpluggable ranges:  |-----|                      |--------|
>     user specified ranges:   |---|       |--------------------|
>     movablecore_map.map[]:               |------------|
> 
> Please refer to https://lkml.org/lkml/2012/12/19/53.
> 
> But in this v5 patch-set, I remove all SRAT related code. So this v5 users'
> option will override SRAT.

Again, boot options are often used to work around firmware bugs. So, if you
add a boot option, it should override the firmware info.




* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-16 23:00               ` H. Peter Anvin
@ 2013-01-17 20:27                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:27 UTC (permalink / raw)
  To: hpa
  Cc: kosaki.motohiro, akpm, isimatu.yasuaki, tony.luck, tangchen,
	jiang.liu, wujianguo, wency, laijs, linfeng, yinghai, rob,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, glommer, linux-kernel, linux-mm

On 1/16/2013 6:00 PM, H. Peter Anvin wrote:
> On 01/16/2013 02:01 PM, KOSAKI Motohiro wrote:
>>>>>
>>>>> Things I'm wondering:
>>>>>
>>>>> - is there *really* a case for retaining the boot option if/when
>>>>>    SRAT support is available?
>>>>
>>>> Yes. If SRAT support is available, all memory which enabled hotpluggable
>>>> bit are managed by ZONEMOVABLE. But performance degradation may
>>>> occur by NUMA because we can only allocate anonymous page and page-cache
>>>> from these memory.
>>>>
>>>> In this case, if user cannot change SRAT information, user needs a way to
>>>> select/set removable memory manually.
>>>
>>> If I understand this correctly you mean that once SRAT parsing is
>>> implemented, the user can use movablecore_map to override that SRAT
>>> parsing, yes?  That movablecore_map will take precedence over SRAT?
>>
>> I think movablecore_map (I prefer movablemem, btw) should behave so,
>> because for the past three years almost all memory hotplug bugs were handled
>> only by me and kamezawa-san and, afaik, neither of us has hot-remove-capable
>> hardware.
>>
>> So, if the new feature requires specific hardware, we can't maintain this area
>> any more.
>>  
> 
> It is more so than that: the design principle should always be that
> lower-level directives, if present, take precedence over higher-level
> directives.  The reason for that should be pretty obvious: one of the
> main uses of the low-level directives is to override the high-level
> directives due to bugs or debugging needs.

My opinion is close to that of Kani-san@HP: automatic configuration (i.e. reading
firmware information) is best for regular users, and a low-level tunable parameter
is best for developers and for working around firmware bugs.

A higher-level interface may help in some corner cases, or it may not. I mean,
I have no objection to creating a higher-level interface. I only said that I myself
haven't observed such a use case, so I have no opinion about it. So I won't
join the interface discussion, even though I don't dislike the idea.



* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-17 16:30                   ` Luck, Tony
@ 2013-01-17 20:28                     ` KOSAKI Motohiro
  2013-01-18  6:05                       ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 36+ messages in thread
From: KOSAKI Motohiro @ 2013-01-17 20:28 UTC (permalink / raw)
  To: tony.luck
  Cc: hpa, isimatu.yasuaki, akpm, tangchen, jiang.liu, wujianguo, wency,
	laijs, linfeng, yinghai, rob, kosaki.motohiro, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

On 1/17/2013 11:30 AM, Luck, Tony wrote:
>> 2. If the user *does* care which nodes are movable, then the user needs 
>> to be able to specify that *in a way that makes sense to the user*. 
>> This may mean involving the DMI information as well as SRAT in order to 
>> get "silk screen" type information out.
> 
> One reason they might care would be which I/O devices are connected
> to each node.  DMI might be a good way to get an invariant name for the
> node, but they might also want to specify in terms of what they actually
> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
> mark both these nodes as removable".  Though this is almost certainly not
> a job for kernel options, but for some user configuration tool that would
> spit out the DMI names.

I agree DMI parsing should be done in userland if we really need DMI parsing.



* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-17 20:28                     ` KOSAKI Motohiro
@ 2013-01-18  6:05                       ` Yasuaki Ishimatsu
  2013-01-18  6:25                         ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-18  6:05 UTC (permalink / raw)
  To: KOSAKI Motohiro, tony.luck, hpa
  Cc: akpm, tangchen, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, rob, minchan.kim, mgorman, rientjes, guz.fnst, rusty,
	lliubbo, jaegeuk.hanse, glommer, linux-kernel, linux-mm

2013/01/18 5:28, KOSAKI Motohiro wrote:
> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>> 2. If the user *does* care which nodes are movable, then the user needs
>>> to be able to specify that *in a way that makes sense to the user*.
>>> This may mean involving the DMI information as well as SRAT in order to
>>> get "silk screen" type information out.
>>
>> One reason they might care would be which I/O devices are connected
>> to each node.  DMI might be a good way to get an invariant name for the
>> node, but they might also want to specify in terms of what they actually
>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs - don't
>> mark both these nodes as removable".  Though this is almost certainly not
>> a job for kernel options, but for some user configuration tool that would
>> spit out the DMI names.
>
> I agree DMI parsing should be done in userland if we really need DMI parsing.
>

If users use the boot parameter to work around bugs or for debugging, they
need a way to set the range of movable memory in detail. So specifying a
node number is not enough, because the node's whole memory becomes movable
memory.

For this, we are discussing two other ways: memory ranges and DMI information.
By using DMI information, users may get an invariant name. But is it
really a user-friendly interface? I don't think so.

You may think that using a memory range is not a user-friendly interface
either. But I think a memory range is friendlier than DMI information,
since we can easily get the memory range. So from the developer side,
using a memory range is good.

Of course, using SRAT information is a necessary solution, so we are
developing it now.

Thanks,
Yasuaki Ishimatsu





* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18  6:05                       ` Yasuaki Ishimatsu
@ 2013-01-18  6:25                         ` H. Peter Anvin
  2013-01-18  7:38                           ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2013-01-18  6:25 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, KOSAKI Motohiro, tony.luck
  Cc: akpm, tangchen, jiang.liu, wujianguo, wency, laijs, linfeng,
	yinghai, rob, minchan.kim, mgorman, rientjes, guz.fnst, rusty,
	lliubbo, jaegeuk.hanse, glommer, linux-kernel, linux-mm

We already do DMI parsing in the kernel...

Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

>2013/01/18 5:28, KOSAKI Motohiro wrote:
>> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>>> 2. If the user *does* care which nodes are movable, then the user
>needs
>>>> to be able to specify that *in a way that makes sense to the user*.
>>>> This may mean involving the DMI information as well as SRAT in
>order to
>>>> get "silk screen" type information out.
>>>
>>> One reason they might care would be which I/O devices are connected
>>> to each node.  DMI might be a good way to get an invariant name for
>the
>>> node, but they might also want to specify in terms of what they
>actually
>>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs -
>don't
>>> mark both these nodes as removable".  Though this is almost
>certainly not
>>> a job for kernel options, but for some user configuration tool that
>would
>>> spit out the DMI names.
>>
>> I agree DMI parsing should be done in userland if we really need DMI
>parsing.
>>
>
>If users use the boot parameter for bugs or debugging,  users need
>a method which sets in detail range of movable memory. So specifying
>node number is not enough because whole memory becomes movable memory.
>
>For this, we are discussing other ways, memory range and DMI
>information.
>By using DMI information, users may get an invariant name. But is it
>really user friendly interface? I don't think so.
>
>You will think using memory range is not user friendly interface too.
>But I think that using memory range is friendlier than using DMI
>information since we can get easily memory range. So from developer
>side, using memory range is good.
>
>Of course, using SRAT information is necessary solution. So we are
>developing it now.
>
>Thanks,
>Yasuaki Ishimatsu

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18  6:25                         ` H. Peter Anvin
@ 2013-01-18  7:38                           ` Yasuaki Ishimatsu
  2013-01-18  8:08                             ` Tang Chen
  0 siblings, 1 reply; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-18  7:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: KOSAKI Motohiro, tony.luck, akpm, tangchen, jiang.liu, wujianguo,
	wency, laijs, linfeng, yinghai, rob, minchan.kim, mgorman,
	rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse, glommer,
	linux-kernel, linux-mm

2013/01/18 15:25, H. Peter Anvin wrote:
> We already do DMI parsing in the kernel...

Thank you for the information.

Do you mean /sys/firmware/dmi/entries?

If so, my box does not have memory information there.
It has only types 0, 1, 2, 3, 4, 7, 8, 9, 38 and 127 in DMI.
At least, my box cannot use that information...

If users use the boot parameter for investigating firmware bugs
or for debugging, they cannot use DMI information on a box like mine.
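
Whether a box exposes memory information through DMI can be checked with a
short Python sketch like the one below. It relies on the sysfs entry
directories being named "<type>-<instance>" and on the SMBIOS type numbers
for memory structures (16, 17, 19, 20); the function name is made up for
this example.

```python
# SMBIOS structure types that describe memory (from the SMBIOS specification).
MEMORY_TYPES = {
    16: "Physical Memory Array",
    17: "Memory Device",
    19: "Memory Array Mapped Address",
    20: "Memory Device Mapped Address",
}


def memory_types_present(entry_names):
    """Return the memory-related SMBIOS types found among sysfs entry names.

    entry_names are directory names from /sys/firmware/dmi/entries, which
    have the form "<type>-<instance>", e.g. "17-0".
    """
    types = {int(name.split("-", 1)[0]) for name in entry_names}
    return sorted(types & set(MEMORY_TYPES))


# Entry names matching the types listed above for this box (instances assumed).
box_entries = ["0-0", "1-0", "2-0", "3-0", "4-0", "7-0", "8-0", "9-0", "38-0", "127-0"]
found = memory_types_present(box_entries)
```

On a live system the names could be taken from
os.listdir("/sys/firmware/dmi/entries") instead of the hard-coded list.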

Thanks,
Yasuaki Ishimatsu

>
> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> 2013/01/18 5:28, KOSAKI Motohiro wrote:
>>> On 1/17/2013 11:30 AM, Luck, Tony wrote:
>>>>> 2. If the user *does* care which nodes are movable, then the user
>> needs
>>>>> to be able to specify that *in a way that makes sense to the user*.
>>>>> This may mean involving the DMI information as well as SRAT in
>> order to
>>>>> get "silk screen" type information out.
>>>>
>>>> One reason they might care would be which I/O devices are connected
>>>> to each node.  DMI might be a good way to get an invariant name for
>> the
>>>> node, but they might also want to specify in terms of what they
>> actually
>>>> want. E.g. "eth0 and eth4 are a redundant bonded pair of NICs -
>> don't
>>>> mark both these nodes as removable".  Though this is almost
>> certainly not
>>>> a job for kernel options, but for some user configuration tool that
>> would
>>>> spit out the DMI names.
>>>
>>> I agree DMI parsing should be done in userland if we really need DMI
>> parsing.
>>>
>>
>> If users use the boot parameter for bugs or debugging,  users need
>> a method which sets in detail range of movable memory. So specifying
>> node number is not enough because whole memory becomes movable memory.
>>
>> For this, we are discussing other ways, memory range and DMI
>> information.
>> By using DMI information, users may get an invariant name. But is it
>> really user friendly interface? I don't think so.
>>
>> You will think using memory range is not user friendly interface too.
>> But I think that using memory range is friendlier than using DMI
>> information since we can get easily memory range. So from developer
>> side, using memory range is good.
>>
>> Of course, using SRAT information is necessary solution. So we are
>> developing it now.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>



* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18  7:38                           ` Yasuaki Ishimatsu
@ 2013-01-18  8:08                             ` Tang Chen
  2013-01-18  9:23                               ` li guang
  0 siblings, 1 reply; 36+ messages in thread
From: Tang Chen @ 2013-01-18  8:08 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: H. Peter Anvin, KOSAKI Motohiro, tony.luck, akpm, jiang.liu,
	wujianguo, wency, laijs, linfeng, yinghai, rob, minchan.kim,
	mgorman, rientjes, guz.fnst, rusty, lliubbo, jaegeuk.hanse,
	glommer, linux-kernel, linux-mm

On 01/18/2013 03:38 PM, Yasuaki Ishimatsu wrote:
> 2013/01/18 15:25, H. Peter Anvin wrote:
>> We already do DMI parsing in the kernel...
>
> Thank you for giving the information.
>
> Is your mention /sys/firmware/dmi/entries?
>
> If so, my box does not have memory information.
> My box has only type 0, 1, 2, 3, 4, 7, 8, 9, 38, 127 in DMI.
> At least, my box cannot use the information...
>
> If users use the boot parameter for investigating firmware bugs
> or debugging, users cannot use DMI information on like my box.

And according to Documentation/ABI/testing/sysfs-firmware-dmi:

	The kernel itself does not rely on the majority of the
	information in these tables being correct.  It equally
	cannot ensure that the data as exported to userland is
	without error either.

So when users are debugging, they should not rely on this info.


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18  8:08                             ` Tang Chen
@ 2013-01-18  9:23                               ` li guang
  2013-01-18 18:29                                 ` Luck, Tony
  0 siblings, 1 reply; 36+ messages in thread
From: li guang @ 2013-01-18  9:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: Yasuaki Ishimatsu, H. Peter Anvin, KOSAKI Motohiro, tony.luck,
	akpm, jiang.liu, wujianguo, wency, laijs, linfeng, yinghai, rob,
	minchan.kim, mgorman, rientjes, guz.fnst, rusty, lliubbo,
	jaegeuk.hanse, glommer, linux-kernel, linux-mm

On Fri, 2013-01-18 at 16:08 +0800, Tang Chen wrote:
> On 01/18/2013 03:38 PM, Yasuaki Ishimatsu wrote:
> > 2013/01/18 15:25, H. Peter Anvin wrote:
> >> We already do DMI parsing in the kernel...
> >
> > Thank you for giving the information.
> >
> > Is your mention /sys/firmware/dmi/entries?
> >
> > If so, my box does not have memory information.
> > My box has only type 0, 1, 2, 3, 4, 7, 8, 9, 38, 127 in DMI.
> > At least, my box cannot use the information...
> >
> > If users use the boot parameter for investigating firmware bugs
> > or debugging, users cannot use DMI information on like my box.
> 
> And seeing from Documentation/ABI/testing/sysfs-firmware-dmi,
> 
> 	The kernel itself does not rely on the majority of the
> 	information in these tables being correct.  It equally
> 	cannot ensure that the data as exported to userland is
> 	without error either.
> 
> So when users are doing debug, they should not rely on this info.

The kernel absolutely should not care much about SMBIOS (DMI info).
AFAIK, BIOS vendors do not fill in accurate info in SMBIOS; they
mostly do so only on demand, when OEMs require SMBIOS to report some
specific info.
Furthermore, SMBIOS is so old and benefits nobody (in my personal
opinion), so maybe let's forget it.

> 

-- 
regards!
li guang


* RE: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18  9:23                               ` li guang
@ 2013-01-18 18:29                                 ` Luck, Tony
  2013-01-19  1:06                                   ` Jiang Liu
  2013-01-21  7:36                                   ` Yasuaki Ishimatsu
  0 siblings, 2 replies; 36+ messages in thread
From: Luck, Tony @ 2013-01-18 18:29 UTC (permalink / raw)
  To: li guang, Tang Chen
  Cc: Yasuaki Ishimatsu, H. Peter Anvin, KOSAKI Motohiro,
	akpm@linux-foundation.org, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
	guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
	jaegeuk.hanse@gmail.com, glommer@parallels.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

> kernel absolutely should not care much about SMBIOS(DMI info),
> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
> mostly only on demand when OEMs required SMBIOS to report some
> specific info.
> furthermore, SMBIOS is so old and benefits nobody (in my personal
> opinion), so maybe let's forget it.

The "not having right information" flaw could be fixed by OEMs selling
systems on which it is important for system functionality that it be right.
They could use monetary incentives, contractual obligations, or sharp
pointy sticks to make their BIOS vendor get the table right.

BUT there is a bigger flaw - SMBIOS is a static table with no way to
update it in response to hotplug events.  So it could in theory have the
right information at boot time ... there is no possible way for it to be
right as soon as somebody adds, removes or replaces hardware.

-Tony


* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18 18:29                                 ` Luck, Tony
@ 2013-01-19  1:06                                   ` Jiang Liu
  2013-01-19  7:52                                     ` Chen Gong
  2013-01-21  7:36                                   ` Yasuaki Ishimatsu
  1 sibling, 1 reply; 36+ messages in thread
From: Jiang Liu @ 2013-01-19  1:06 UTC (permalink / raw)
  To: Luck, Tony
  Cc: li guang, Tang Chen, Yasuaki Ishimatsu, H. Peter Anvin,
	KOSAKI Motohiro, akpm@linux-foundation.org, wujianguo@huawei.com,
	wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
	guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
	jaegeuk.hanse@gmail.com, glommer@parallels.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On 2013-1-19 2:29, Luck, Tony wrote:
>> kernel absolutely should not care much about SMBIOS(DMI info),
>> AFAIK, every BIOS vendor did not fill accurate info in SMBIOS,
>> mostly only on demand when OEMs required SMBIOS to report some
>> specific info.
>> furthermore, SMBIOS is so old and benefits nobody (in my personal
>> opinion), so maybe let's forget it.
> 
> The "not having right information" flaw could be fixed by OEMs selling
> systems on which it is important for system functionality that it be right.
> They could use monetary incentives, contractual obligations, or sharp
> pointy sticks to make their BIOS vendor get the table right.
> 
> BUT there is a bigger flaw - SMBIOS is a static table with no way to
> update it in response to hotplug events.  So it could in theory have the
> right information at boot time ... there is no possible way for it to be
> right as soon as somebody adds, removes or replaces hardware.

SMBIOS plays an important role when we are trying to do hardware fault
management, because the OS needs information from SMBIOS to physically
identify a component/FRU. I also remember there were efforts to extend
the SMBIOS specification to dynamically update the SMBIOS table when
hotplug happens.

Regards!
Gerry

> 
> -Tony



* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-19  1:06                                   ` Jiang Liu
@ 2013-01-19  7:52                                     ` Chen Gong
  0 siblings, 0 replies; 36+ messages in thread
From: Chen Gong @ 2013-01-19  7:52 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Luck, Tony, li guang, Tang Chen, Yasuaki Ishimatsu,
	H. Peter Anvin, KOSAKI Motohiro, akpm@linux-foundation.org,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
	guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
	jaegeuk.hanse@gmail.com, glommer@parallels.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Sat, Jan 19, 2013 at 09:06:14AM +0800, Jiang Liu wrote:
> On 2013-1-19 2:29, Luck, Tony wrote:
> >> The kernel absolutely should not care much about SMBIOS (DMI info).
> >> AFAIK, BIOS vendors do not fill in accurate info in SMBIOS by
> >> default; they mostly do so only on demand, when OEMs require SMBIOS
> >> to report some specific info.
> >> Furthermore, SMBIOS is so old and benefits nobody (in my personal
> >> opinion), so maybe let's forget it.
> > 
> > The "not having right information" flaw could be fixed by OEMs selling
> > systems on which it is important for system functionality that it be right.
> > They could use monetary incentives, contractual obligations, or sharp
> > pointy sticks to make their BIOS vendor get the table right.
> > 
> > BUT there is a bigger flaw - SMBIOS is a static table with no way to
> > update it in response to hotplug events.  So it could in theory have the
> > right information at boot time ... there is no possible way for it to be
> > right as soon as somebody adds, removes or replaces hardware.
> 
> SMBIOS plays an important role when we are trying to do hardware fault
> management, because the OS needs information from SMBIOS to physically
> identify a component/FRU. I also remember there were efforts to extend
> the SMBIOS specification so that the SMBIOS table is updated dynamically
> when hotplug happens.

Really? How would that be done? Can you describe it more clearly? BTW, if
my understanding is right, the new Platform Memory Topology Table (PMTT) in
ACPI 5.0 is meant for this purpose, but it doesn't exist on older systems,
so I want to know whether there is a workaround for older platforms.

> 
> Regards!
> Gerry
> 
> > 
> > -Tony
> 
> 

* Re: [PATCH v5 0/5] Add movablecore_map boot option
  2013-01-18 18:29                                 ` Luck, Tony
  2013-01-19  1:06                                   ` Jiang Liu
@ 2013-01-21  7:36                                   ` Yasuaki Ishimatsu
  1 sibling, 0 replies; 36+ messages in thread
From: Yasuaki Ishimatsu @ 2013-01-21  7:36 UTC (permalink / raw)
  To: Luck, Tony
  Cc: li guang, Tang Chen, H. Peter Anvin, KOSAKI Motohiro,
	akpm@linux-foundation.org, jiang.liu@huawei.com,
	wujianguo@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	linfeng@cn.fujitsu.com, yinghai@kernel.org, rob@landley.net,
	minchan.kim@gmail.com, mgorman@suse.de, rientjes@google.com,
	guz.fnst@cn.fujitsu.com, rusty@rustcorp.com.au, lliubbo@gmail.com,
	jaegeuk.hanse@gmail.com, glommer@parallels.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

2013/01/19 3:29, Luck, Tony wrote:
>> The kernel absolutely should not care much about SMBIOS (DMI info).
>> AFAIK, BIOS vendors do not fill in accurate info in SMBIOS by
>> default; they mostly do so only on demand, when OEMs require SMBIOS
>> to report some specific info.
>> Furthermore, SMBIOS is so old and benefits nobody (in my personal
>> opinion), so maybe let's forget it.
>
> The "not having right information" flaw could be fixed by OEMs selling
> systems on which it is important for system functionality that it be right.
> They could use monetary incentives, contractual obligations, or sharp
> pointy sticks to make their BIOS vendor get the table right.
>
> BUT there is a bigger flaw - SMBIOS is a static table with no way to
> update it in response to hotplug events.  So it could in theory have the
> right information at boot time ... there is no possible way for it to be
> right as soon as somebody adds, removes or replaces hardware.

Using DMI information depends strongly on the firmware. So even if we
implement a boot option that uses DMI information to specify which memory
range becomes the Movable zone, we cannot use it on our box. Other users
may hit the same problem.

So we want to keep the current boot option, which specifies a memory range
directly, since users can find out the memory addresses on every box.

Thanks,
Yasuaki Ishimatsu

>
> -Tony
>



end of thread, other threads:[~2013-01-21  7:36 UTC | newest]

Thread overview: 36+ messages
2013-01-14  9:15 [PATCH v5 0/5] Add movablecore_map boot option Tang Chen
2013-01-14  9:15 ` [PATCH v5 1/5] x86: get pg_data_t's memory from other node Tang Chen
2013-01-14  9:15 ` [PATCH v5 2/5] page_alloc: add movable_memmap kernel parameter Tang Chen
2013-01-14 22:35   ` Andrew Morton
2013-01-14  9:15 ` [PATCH v5 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes Tang Chen
2013-01-14  9:15 ` [PATCH v5 4/5] page_alloc: Make movablecore_map has higher priority Tang Chen
2013-01-14  9:15 ` [PATCH v5 5/5] page_alloc: Bootmem limit with movablecore_map Tang Chen
2013-01-14 17:31 ` [PATCH v5 0/5] Add movablecore_map boot option H. Peter Anvin
2013-01-14 22:34   ` Andrew Morton
2013-01-14 22:41     ` Luck, Tony
2013-01-14 22:46       ` Andrew Morton
2013-01-16  6:25         ` Yasuaki Ishimatsu
2013-01-16 21:29           ` Andrew Morton
2013-01-16 22:01             ` KOSAKI Motohiro
2013-01-16 23:00               ` H. Peter Anvin
2013-01-17 20:27                 ` KOSAKI Motohiro
2013-01-16 22:52             ` H. Peter Anvin
2013-01-17  1:49               ` Tang Chen
2013-01-17 20:20                 ` KOSAKI Motohiro
2013-01-17  5:08               ` Yasuaki Ishimatsu
2013-01-17  6:03                 ` H. Peter Anvin
2013-01-17 16:30                   ` Luck, Tony
2013-01-17 20:28                     ` KOSAKI Motohiro
2013-01-18  6:05                       ` Yasuaki Ishimatsu
2013-01-18  6:25                         ` H. Peter Anvin
2013-01-18  7:38                           ` Yasuaki Ishimatsu
2013-01-18  8:08                             ` Tang Chen
2013-01-18  9:23                               ` li guang
2013-01-18 18:29                                 ` Luck, Tony
2013-01-19  1:06                                   ` Jiang Liu
2013-01-19  7:52                                     ` Chen Gong
2013-01-21  7:36                                   ` Yasuaki Ishimatsu
2013-01-15  1:23       ` Yasuaki Ishimatsu
2013-01-15  3:44         ` H. Peter Anvin
2013-01-15  4:04           ` Luck, Tony
2013-01-15  0:05     ` Toshi Kani
