* [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks
@ 2010-11-30 7:13 shaohui.zheng
2010-11-30 7:13 ` [1/8, v6] NUMA Hotplug Emulator: documentation shaohui.zheng
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh
* PATCHSET INTRODUCTION
patch 1: Documentation.
patch 2: Add a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug.
patch 3: Add node hotplug emulation; introduce the debugfs mem_hotplug/add_node interface.
patch 4: Abstract the CPU register functions to make these interfaces friendly to CPU
hotplug emulation.
patch 5: Support CPU probe/release on x86; this provides a software method to hot
add/remove CPUs through a sysfs interface.
patch 6: Fake a CPU socket with a logical CPU on x86, to prevent the scheduling
domain from building an incorrect hierarchy.
patch 7: Extend the memory probe interface to support NUMA, so that memory can be added to
a specified node through the interface.
patch 8: Implement the memory probe interface with debugfs.
* FEEDBACK & RESPONSES
v5:
David: Suggested a more flexible method for node hotplug emulation. After
reviewing our two emulator implementations, David provided a better solution
that addresses both the flexibility and the memory-wasting issues:
add a numa=possible=<N> command line option and provide a sysfs interface,
/sys/devices/system/node/add_node. The interface was then moved to debugfs
(/sys/kernel/debug/hotplug/add_node) after feedback from the community.
Greg KH: Move the interface from hotplug/add_node to node/add_node.
Response: We accepted David's numa=possible=<N> command line option. After talking
with David, he agreed to add his patch to our patchset; thanks to David for his solution (patch 1).
David's original interface, /sys/kernel/debug/hotplug/add_node, is not very clear for
node hotplug emulation, so we accepted Greg's suggestion and moved the interface to node/add_node
(patch 2).
Dave Hansen: For memory hotplug, Dave recalled Greg KH's advice and suggested that we use configfs instead of
sysfs. After Dave learned that the interface is for test purposes only, he concluded that debugfs would
be the best choice.
Response: The memory probe sysfs interface already exists; I would like to keep it and extend it
to support adding memory to a specified node (patch 6).
We accepted Dave's suggestion and implemented the memory probe interface with debugfs (patch 7).
Randy Dunlap: Corrected many grammatical errors in our documentation (patch 8).
Response: Thanks to Randy for his careful review; we have corrected them.
v6:
Greg KH: Suggested using the interface mem_hotplug/add_node.
David: Agreed with Greg's suggestion.
Response: We moved the interface from node/add_node to mem_hotplug/add_node, and we also moved the
memory/probe interface to mem_hotplug/probe, since both are related to memory hotplug.
Kletnieks Valdis: Suggested renumbering the patch series and moving patch 8/8 to patch 1/8.
Response: We moved patch 8/8 to patch 1/8, and we will include the full description in 0/8 when
we send patches in the future.
* WHAT IS HOTPLUG EMULATOR
NUMA hotplug emulator is the collective name for the hotplug emulation features.
It is able to emulate NUMA node hotplug purely in software. It
is intended to help people easily debug and test node/CPU/memory hotplug
code on a machine without NUMA hotplug support, even a UMA machine.
The emulator provides a mechanism to emulate the process of physical CPU/memory
hot-add, which makes it possible for kernel developers to debug CPU and memory
hotplug on machines without NUMA support. It offers an interface for CPU and
memory hotplug testing.
* WHY DO WE USE HOTPLUG EMULATOR
We have been focusing on hotplug emulation for a few months. The emulator helps
our team reproduce all the major hotplug bugs. It plays an important role in
hotplug code quality assurance. Because of the hotplug emulator, we have already
moved most of the debugging work to a virtual environment.
* PRINCIPLES & USAGE
The NUMA hotplug emulator includes 3 different parts: node/CPU/memory hotplug
emulation.
1) Node hotplug emulation:
Adds a numa=possible=<N> command line option to set an additional N nodes as
being possible for memory hotplug. This set of possible nodes controls
nr_node_ids and the sizes of several dynamically allocated node arrays.
This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.
For emulation on x86, it would be possible to set aside memory for hotplugged
nodes (say, anything above 2G) and to add an additional four nodes as being
possible on boot with
mem=2G numa=possible=4
and then creating a new 128M node at runtime:
# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
On node 1 totalpages: 0
init_memory_mapping: 0000000080000000-0000000088000000
0080000000 - 0088000000 page 2M
Once the new node has been added, its memory can be onlined. If this
memory represents memory section 16, for example:
# echo online > /sys/devices/system/memory/memory16/state
Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
Policy zone: Normal
[ The memory section(s) mapped to a particular node are visible via
/sys/devices/system/node/node1, in this example. ]
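The "memory section 16" in the example above follows from the start address and the memory block size. A minimal sketch of that arithmetic, assuming a 128M block size (an assumption about the x86_64 configuration; the helper name memory_block_index is hypothetical and not part of the patchset):

```shell
# Print the index N of the memoryN block that contains a physical address.
# $1: physical address (decimal or 0x-prefixed hex)
# $2: memory block size in bytes (decimal or 0x-prefixed hex)
memory_block_index() {
    echo $(( $1 / $2 ))
}

# Assumed 128M block size: 0x80000000 / 0x8000000
memory_block_index 0x80000000 0x8000000
```

On a running system the block size can usually be read from /sys/devices/system/memory/block_size_bytes, which reports a hex value without the 0x prefix, so it would be passed as "0x$(cat /sys/devices/system/memory/block_size_bytes)".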
2) CPU hotplug emulation:
The emulator reserves CPUs through a grub parameter; the reserved CPUs can be
hot-added/hot-removed in software.
When hotplugging a CPU with the emulator, we use a logical CPU to emulate the CPU
hotplug process. On CPUs that support SMT, some logical CPUs are in the same
socket, but they may be located in different NUMA nodes under the emulator. We
put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds only one logical CPU.
- to hide CPUs
- Use boot option "maxcpus=N" to hide CPUs;
N is the number of CPUs to initialize, and the rest are hidden
- Use boot option "cpu_hpe=on" to enable CPU hotplug emulation;
when cpu_hpe is enabled, the remaining CPUs are not initialized
- to hot-add CPU to node
$ echo nid > cpu/probe
- to hot-remove CPU
$ echo cpu > cpu/release
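The probe/release steps above can be wrapped in two small helpers. This is only a sketch: the function names and the CPU_SYSFS override (used for dry runs against a scratch directory) are hypothetical, while the probe and release paths are the ones given in patch 5.

```shell
# Directory holding the probe/release files; override CPU_SYSFS for dry runs.
cpu_sysfs() {
    echo "${CPU_SYSFS:-/sys/devices/system/cpu}"
}

# Hot-add one reserved CPU to NUMA node $1.
hot_add_cpu() {
    echo "$1" > "$(cpu_sysfs)/probe"
}

# Hot-remove CPU number $1. CPU 0 is refused, mirroring the check in patch 5.
hot_remove_cpu() {
    if [ "$1" -eq 0 ]; then
        echo "cpu 0 cannot be released" >&2
        return 1
    fi
    echo "$1" > "$(cpu_sysfs)/release"
}
```

For example, "hot_add_cpu 1" asks the emulator to add a reserved CPU to node 1, and "hot_remove_cpu 3" releases CPU 3.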
3) Memory hotplug emulation:
The emulator reserves memory before the OS boots; the reserved memory region
is removed from the e820 table and can be hot-added via the probe interface.
This interface was extended to support adding memory to a specified node; it
maintains backwards compatibility.
Memory release is well known to be difficult; we have no plan for it at present.
- reserve memory through a grub parameter
mem=1024m
- add a memory section to node 3
$ echo 0x40000000,3 > memory/probe
OR
$ echo 1024m,3 > memory/probe
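The "address,nid" string written to the probe file can be built with a helper that also checks alignment first. This is a sketch under assumptions: the helper name probe_arg is hypothetical, and rejecting addresses that are not aligned to the section size is a precaution added here, not a behavior stated in this patchset.

```shell
# Build the "address,nid" string for the extended probe interface.
# $1: start address, $2: node id, $3: section size in bytes
probe_arg() {
    if [ $(( $1 % $3 )) -ne 0 ]; then
        echo "address $1 is not aligned to section size $3" >&2
        return 1
    fi
    printf '0x%x,%d\n' "$(( $1 ))" "$2"
}

# Assumed 128M section size; prints the string used in the example above.
probe_arg 0x40000000 3 0x8000000
```

The output can then be redirected to memory/probe as in the commands above.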
* ACKNOWLEDGMENT
The hotplug emulator is the result of a team effort; thanks to all of them.
They are:
Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes and
Yongkang You
--
Thanks & Regards,
Shaohui
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [1/8, v6] NUMA Hotplug Emulator: documentation
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-12-01 0:19 ` David Rientjes
2010-11-30 7:13 ` [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Haicheng Li, Shaohui Zheng
[-- Attachment #1: 001-hotplug-emulator-doc-x86_64-of-numa-hotplug-emulator.patch --]
[-- Type: text/plain, Size: 4574 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
Add a text file, Documentation/x86/x86_64/numa_hotplug_emulator.txt,
to explain the usage of the hotplug emulator.
Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt 2010-11-30 09:48:52.257622002 +0800
@@ -0,0 +1,104 @@
+NUMA Hotplug Emulator for x86_64
+---------------------------------------------------
+
+The NUMA hotplug emulator is able to emulate NUMA node hotplug
+purely in software. It is intended to help people easily debug
+and test node/CPU/memory hotplug code on a machine without
+NUMA hotplug support, even a UMA machine or a virtual
+environment.
+
+1) Node hotplug emulation:
+
+Adds a numa=possible=<N> command line option to set an additional N nodes
+as being possible for memory hotplug. This set of possible nodes
+controls nr_node_ids and the sizes of several dynamically allocated node
+arrays.
+
+This allows memory hotplug to create new nodes for newly added memory
+rather than binding it to existing nodes.
+
+For emulation on x86, it would be possible to set aside memory for hotplugged
+nodes (say, anything above 2G) and to add an additional four nodes as being
+possible on boot with
+
+ mem=2G numa=possible=4
+
+and then creating a new 128M node at runtime:
+
+ # echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
+ On node 1 totalpages: 0
+ init_memory_mapping: 0000000080000000-0000000088000000
+ 0080000000 - 0088000000 page 2M
+
+Once the new node has been added, its memory can be onlined. If this
+memory represents memory section 16, for example:
+
+ # echo online > /sys/devices/system/memory/memory16/state
+ Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
+ Policy zone: Normal
+ [ The memory section(s) mapped to a particular node are visible via
+ /sys/devices/system/node/node1, in this example. ]
+
+2) CPU hotplug emulation:
+
+The emulator reserves CPUs through a grub parameter; the reserved CPUs can be
+hot-added/hot-removed in software, which emulates the process of physical
+CPU hotplug.
+
+When hotplugging a CPU with the emulator, we use a logical CPU to emulate the CPU
+socket hotplug process. On CPUs that support SMT, some logical CPUs are in the
+same socket, but they may be located in different NUMA nodes under the emulator.
+We put the logical CPU into a fake CPU socket and assign it a unique
+phys_proc_id. Each fake socket holds only one logical CPU.
+
+ - to hide CPUs
+ - Use boot option "maxcpus=N" to hide CPUs
+ N is the number of CPUs to initialize; the rest will be hidden.
+ - Use boot option "cpu_hpe=on" to enable CPU hotplug emulation
+ when cpu_hpe is enabled, the remaining CPUs will not be initialized
+
+ - to hot-add CPU to node
+ $ echo nid > cpu/probe
+
+ - to hot-remove CPU
+ $ echo cpu > cpu/release
+
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region is
+removed from the e820 table and can be hot-added via the probe interface.
+This interface was extended to support adding memory to a specified node. It
+maintains backwards compatibility.
+
+Memory release is well known to be difficult; we have no plan for it at present.
+
+ - reserve memory through a kernel boot parameter
+ mem=1024m
+
+ - add a memory section to node 3
+ $ echo 0x40000000,3 > memory/probe
+ OR
+ $ echo 1024m,3 > memory/probe
+ OR
+ $ echo "physical_address=0x40000000 numa_node=3" > memory/probe
+
+4) Script for hotplug testing
+
+These scripts are convenient when hot-adding memory/CPUs in batch.
+
+- Online all memory sections:
+for m in /sys/devices/system/memory/memory*;
+do
+ echo online > $m/state;
+done
+
+- CPU Online:
+for c in /sys/devices/system/cpu/cpu*;
+do
+ echo 1 > $c/online;
+done
+
+- David Rientjes <rientjes@google.com>
+- Haicheng Li <haicheng.li@intel.com>
+- Shaohui Zheng <shaohui.zheng@intel.com>
+ Nov 2010
--
Thanks & Regards,
Shaohui
* [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
2010-11-30 7:13 ` [1/8, v6] NUMA Hotplug Emulator: documentation shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-12-02 1:06 ` David Rientjes
2010-11-30 7:13 ` [3/8, v6] NUMA Hotplug Emulator: Add node hotplug emulation shaohui.zheng
` (5 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Shaohui Zheng, Haicheng Li
[-- Attachment #1: 002-add-node-possible-option.patch --]
[-- Type: text/plain, Size: 3626 bytes --]
From: David Rientjes <rientjes@google.com>
Adds a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug. This set of possible nodes
controls nr_node_ids and the sizes of several dynamically allocated node
arrays.
This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.
The first use-case for this will be node hotplug emulation which will use
these possible nodes to create new nodes to test the memory hotplug
callbacks and surrounding memory hotplug code.
CC: Shaohui Zheng <shaohui.zheng@intel.com>
CC: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
Documentation/x86/x86_64/boot-options.txt | 4 ++++
arch/x86/mm/numa_64.c | 18 +++++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -174,6 +174,10 @@ NUMA
If given as an integer, fills all system RAM with N fake nodes
interleaved over physical nodes.
+ numa=possible=<N>
+ Sets an additional N nodes as being possible for memory
+ hotplug.
+
ACPI
acpi=off Don't enable ACPI
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -33,6 +33,7 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
int numa_off __initdata;
static unsigned long __initdata nodemap_addr;
static unsigned long __initdata nodemap_size;
+static unsigned long __initdata numa_possible_nodes;
/*
* Map cpu index to node index
@@ -611,7 +612,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
#ifdef CONFIG_NUMA_EMU
if (cmdline && !numa_emulation(start_pfn, last_pfn, acpi, k8))
- return;
+ goto out;
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
#endif
@@ -619,14 +620,14 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
#ifdef CONFIG_ACPI_NUMA
if (!numa_off && acpi && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
last_pfn << PAGE_SHIFT))
- return;
+ goto out;
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
#endif
#ifdef CONFIG_K8_NUMA
if (!numa_off && k8 && !k8_scan_nodes())
- return;
+ goto out;
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
#endif
@@ -646,6 +647,15 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
numa_set_node(i, 0);
memblock_x86_register_active_regions(0, start_pfn, last_pfn);
setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+out: __maybe_unused
+ for (i = 0; i < numa_possible_nodes; i++) {
+ int nid;
+
+ nid = first_unset_node(node_possible_map);
+ if (nid == MAX_NUMNODES)
+ break;
+ node_set(nid, node_possible_map);
+ }
}
unsigned long __init numa_free_all_bootmem(void)
@@ -675,6 +685,8 @@ static __init int numa_setup(char *opt)
if (!strncmp(opt, "noacpi", 6))
acpi_numa = -1;
#endif
+ if (!strncmp(opt, "possible=", 9))
+ numa_possible_nodes = simple_strtoul(opt + 9, NULL, 0);
return 0;
}
early_param("numa", numa_setup);
--
Thanks & Regards,
Shaohui
* [3/8, v6] NUMA Hotplug Emulator: Add node hotplug emulation
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
2010-11-30 7:13 ` [1/8, v6] NUMA Hotplug Emulator: documentation shaohui.zheng
2010-11-30 7:13 ` [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-11-30 7:13 ` [4/8, v6] NUMA Hotplug Emulation: Abstract cpu register functions shaohui.zheng
` (4 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Shaohui Zheng, Haicheng Li
[-- Attachment #1: 003-node-hotpluge-emulation.patch --]
[-- Type: text/plain, Size: 5768 bytes --]
From: David Rientjes <rientjes@google.com>
Add an interface to allow new nodes to be added when performing memory
hot-add. This provides a convenient interface to test memory hotplug
notifier callbacks and surrounding hotplug code when new nodes are
onlined without actually having a machine with such hotpluggable SRAT
entries.
This adds a new debugfs interface at /sys/kernel/debug/mem_hotplug/add_node
that behaves in a similar way to the memory hot-add "probe" interface.
Its format is size@start, where "size" is the size of the new node to be
added and "start" is the physical address of the new memory.
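The size@start format can be checked in userspace before writing it to add_node. The following is a minimal sketch with hypothetical helper names (size_to_bytes, add_node_range); it handles only the K, M and G suffixes and simply prints the physical range that the request would cover.

```shell
# Convert a size such as "128M" or "1G" to bytes (K, M, G suffixes only).
size_to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print the physical address range covered by a "size@start" request.
add_node_range() {
    size=$(size_to_bytes "${1%@*}")
    start=$(( ${1#*@} ))
    printf '0x%x-0x%x\n' "$start" "$(( start + size ))"
}

add_node_range 128M@0x80000000
```

For 128M@0x80000000 the printed range ends at 0x88000000, which agrees with the init_memory_mapping line in the example output of this message.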
The new node id is a currently offline, but possible, node. The bit must
be set in node_possible_map so that nr_node_ids is sized appropriately.
For emulation on x86, for example, it would be possible to set aside
memory for hotplugged nodes (say, anything above 2G) and to add an
additional four nodes as being possible on boot with
mem=2G numa=possible=4
and then creating a new 128M node at runtime:
# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
On node 1 totalpages: 0
init_memory_mapping: 0000000080000000-0000000088000000
0080000000 - 0088000000 page 2M
Once the new node has been added, its memory can be onlined. If this
memory represents memory section 16, for example:
# echo online > /sys/devices/system/memory/memory16/state
Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
Policy zone: Normal
[ The memory section(s) mapped to a particular node are visible via
/sys/devices/system/node/node1, in this example. ]
The new node is now hotplugged and ready for testing.
CC: Shaohui Zheng <shaohui.zheng@intel.com>
CC: Haicheng Li <haicheng.li@intel.com>
CC: Greg KH <gregkh@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
---
Documentation/memory-hotplug.txt | 24 +++++++++++++++
mm/memory_hotplug.c | 59 ++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+), 0 deletions(-)
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-11-30 12:40:43.527622001 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt 2010-11-30 14:11:11.827622000 +0800
@@ -18,6 +18,7 @@
4. Physical memory hot-add phase
4.1 Hardware(Firmware) Support
4.2 Notify memory hot-add event by hand
+ 4.3 Node hotplug emulation
5. Logical Memory hot-add phase
5.1. State of memory
5.2. How to online memory
@@ -215,6 +216,29 @@
Please see "How to online memory" in this text.
+4.3 Node hotplug emulation
+------------
+With debugfs, it is possible to test node hotplug by assigning the newly
+added memory to a new node id when using a different interface with a similar
+behavior to "probe" described in section 4.2. If a node id is possible
+(there are bits in /sys/devices/system/memory/possible that are not online),
+then it may be used to emulate a newly added node as the result of memory
+hotplug by using the debugfs "add_node" interface.
+
+The add_node interface is located at "mem_hotplug/add_node" at the debugfs
+mount point.
+
+You can create a new node of a specified size starting at the physical
+address of new memory by
+
+% echo size@start_address_of_new_memory > /sys/kernel/debug/mem_hotplug/add_node
+
+Where "size" can be represented in megabytes or gigabytes (for example,
+"128M" or "1G"). The minimum size is that of a memory section.
+
+Once the new node has been added, it is possible to online the memory by
+toggling the "state" of its memory section(s) as described in section 5.1.
+
------------------------------
5. Logical Memory hot-add phase
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c 2010-11-30 12:40:43.757622001 +0800
+++ linux-hpe4/mm/memory_hotplug.c 2010-11-30 14:02:33.877622002 +0800
@@ -924,3 +924,63 @@
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
EXPORT_SYMBOL_GPL(remove_memory);
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *memhp_debug_root;
+
+static ssize_t add_node_store(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ nodemask_t mask;
+ u64 start, size;
+ char buffer[64];
+ char *p;
+ int nid;
+ int ret;
+
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ return -EFAULT;
+
+ size = memparse(buffer, &p);
+ if (size < (PAGES_PER_SECTION << PAGE_SHIFT))
+ return -EINVAL;
+ if (*p != '@')
+ return -EINVAL;
+
+ start = simple_strtoull(p + 1, NULL, 0);
+
+ nodes_andnot(mask, node_possible_map, node_online_map);
+ nid = first_node(mask);
+ if (nid == MAX_NUMNODES)
+ return -ENOMEM;
+
+ ret = add_memory(nid, start, size);
+ return ret ? ret : count;
+}
+
+static const struct file_operations add_node_file_ops = {
+ .write = add_node_store,
+ .llseek = generic_file_llseek,
+};
+
+static int __init node_debug_init(void)
+{
+ if (!memhp_debug_root)
+ memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+ if (!memhp_debug_root)
+ return -ENOMEM;
+
+ if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
+ NULL, &add_node_file_ops))
+ return -ENOMEM;
+
+ return 0;
+}
+
+module_init(node_debug_init);
+#endif /* CONFIG_DEBUG_FS */
--
Thanks & Regards,
Shaohui
* [4/8, v6] NUMA Hotplug Emulation: Abstract cpu register functions
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
` (2 preceding siblings ...)
2010-11-30 7:13 ` [3/8, v6] NUMA Hotplug Emulator: Add node hotplug emulation shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-11-30 7:13 ` [5/8, v6] NUMA Hotplug Emulator: support cpu probe/release in x86_64 shaohui.zheng
` (3 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Shaohui Zheng
[-- Attachment #1: 004-hotplug-emulator-x86-abstract-cpu-register-functions.patch --]
[-- Type: text/plain, Size: 3655 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
Abstract the CPU register functions and provide a more flexible interface,
register_cpu_node. The new interface makes it convenient to add a CPU
to a specified node; we can use it to add a CPU to a fake node.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/cpu.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/cpu.h 2010-11-17 09:00:59.742608402 +0800
+++ linux-hpe4/arch/x86/include/asm/cpu.h 2010-11-17 09:01:10.192838977 +0800
@@ -27,6 +27,7 @@
#ifdef CONFIG_HOTPLUG_CPU
extern int arch_register_cpu(int num);
+extern int arch_register_cpu_node(int num, int nid);
extern void arch_unregister_cpu(int);
#endif
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c 2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c 2010-11-17 10:05:32.934085248 +0800
@@ -52,6 +52,15 @@
}
EXPORT_SYMBOL(arch_register_cpu);
+int __ref arch_register_cpu_node(int num, int nid)
+{
+ if (num)
+ per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
+
+ return register_cpu_node(&per_cpu(cpu_devices, num).cpu, num, nid);
+}
+EXPORT_SYMBOL(arch_register_cpu_node);
+
void arch_unregister_cpu(int num)
{
unregister_cpu(&per_cpu(cpu_devices, num).cpu);
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c 2010-11-17 09:01:01.053461766 +0800
+++ linux-hpe4/drivers/base/cpu.c 2010-11-17 10:05:32.943465010 +0800
@@ -208,17 +208,18 @@
static SYSDEV_CLASS_ATTR(offline, 0444, print_cpus_offline, NULL);
/*
- * register_cpu - Setup a sysfs device for a CPU.
+ * register_cpu_node - Setup a sysfs device for a CPU.
* @cpu - cpu->hotpluggable field set to 1 will generate a control file in
* sysfs for this CPU.
* @num - CPU number to use when creating the device.
+ * @nid - Node ID to use, if any.
*
* Initialize and register the CPU device.
*/
-int __cpuinit register_cpu(struct cpu *cpu, int num)
+int __cpuinit register_cpu_node(struct cpu *cpu, int num, int nid)
{
int error;
- cpu->node_id = cpu_to_node(num);
+ cpu->node_id = nid;
cpu->sysdev.id = num;
cpu->sysdev.cls = &cpu_sysdev_class;
@@ -229,7 +230,7 @@
if (!error)
per_cpu(cpu_sys_devices, num) = &cpu->sysdev;
if (!error)
- register_cpu_under_node(num, cpu_to_node(num));
+ register_cpu_under_node(num, nid);
#ifdef CONFIG_KEXEC
if (!error)
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h 2010-11-17 09:00:59.772898926 +0800
+++ linux-hpe4/include/linux/cpu.h 2010-11-17 10:05:32.954085309 +0800
@@ -30,7 +30,13 @@
struct sys_device sysdev;
};
-extern int register_cpu(struct cpu *cpu, int num);
+extern int register_cpu_node(struct cpu *cpu, int num, int nid);
+
+static inline int register_cpu(struct cpu *cpu, int num)
+{
+ return register_cpu_node(cpu, num, cpu_to_node(num));
+}
+
extern struct sys_device *get_cpu_sysdev(unsigned cpu);
extern int cpu_add_sysdev_attr(struct sysdev_attribute *attr);
--
Thanks & Regards,
Shaohui
* [5/8, v6] NUMA Hotplug Emulator: support cpu probe/release in x86_64
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
` (3 preceding siblings ...)
2010-11-30 7:13 ` [4/8, v6] NUMA Hotplug Emulation: Abstract cpu register functions shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-11-30 7:13 ` [6/8, v6] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86 shaohui.zheng
` (2 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Ingo Molnar, Len Brown, Yinghai Lu, Shaohui Zheng,
Haicheng Li
[-- Attachment #1: 005-hotplug-emulator-x86-support-cpu-probe-release-in-x86.patch --]
[-- Type: text/plain, Size: 10983 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
CPU physical hot-add/hot-remove is supported on some hardware, and it
is already supported by the current Linux kernel. The CPU hotplug emulator provides
a mechanism to emulate the process in software. It can be used for
testing or debugging purposes.
CPU physical hotplug is different from logical CPU online/offline. Logical
online/offline is controlled by the interface /sys/devices/system/cpu/cpuX/online. The CPU
hotplug emulator uses the probe/release interface. It becomes possible to do CPU
hotplug automation and stress testing.
Add the CPU probe/release interface under sysfs for x86_64. Users can use this
interface to emulate the CPU hot-add and hot-remove process.
Directions:
*) Reserve CPUs through a grub parameter such as:
maxcpus=4
The remaining CPUs will not be initialized.
*) Probe a CPU
We can use the probe interface to hot-add new CPUs:
echo nid > /sys/devices/system/cpu/probe
*) Release a CPU
echo cpu > /sys/devices/system/cpu/release
A reserved CPU will be hot-added to the specified node.
1) nid == 0: the CPU is added to the real node that the CPU
belongs to
2) nid != 0: the CPU is added to node nid even though it is a fake node.
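Before writing to the probe file, a script can pre-check the node id in the same spirit as the checks in arch_cpu_probe below (valid id, node online). This is a sketch only: check_probe_nid and the NODE_SYSFS override are hypothetical names, and treating the presence of a nodeN directory under /sys/devices/system/node as "online" is an assumption.

```shell
# Succeed only if $1 is a non-negative integer naming a node that has a
# sysfs directory (assumed here to mean that the node is online).
check_probe_nid() {
    case $1 in
        ''|*[!0-9]*)
            echo "invalid node id: $1" >&2
            return 1
            ;;
    esac
    if [ ! -d "${NODE_SYSFS:-/sys/devices/system/node}/node$1" ]; then
        echo "node $1 is not online" >&2
        return 1
    fi
}
```

A test script could call "check_probe_nid 1" and write 1 to the probe file only when the check succeeds, which avoids relying on the kernel's -EPERM return to detect a bad node id.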
CC: Ingo Molnar <mingo@elte.hu>
CC: Len Brown <len.brown@intel.com>
CC: Yinghai Lu <Yinghai.Lu@Sun.COM>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
---
Index: linux-hpe4/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/acpi/boot.c 2010-11-26 09:24:40.287725018 +0800
+++ linux-hpe4/arch/x86/kernel/acpi/boot.c 2010-11-26 09:24:53.277724996 +0800
@@ -647,8 +647,44 @@
}
EXPORT_SYMBOL(acpi_map_lsapic);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static void acpi_map_cpu2node_emu(int cpu, int physid, int nid)
+{
+#ifdef CONFIG_ACPI_NUMA
+#ifdef CONFIG_X86_64
+ apicid_to_node[physid] = nid;
+ numa_set_node(cpu, nid);
+#else /* CONFIG_X86_32 */
+ apicid_2_node[physid] = nid;
+ cpu_to_node_map[cpu] = nid;
+#endif
+#endif
+}
+
+static u16 cpu_to_apicid_saved[CONFIG_NR_CPUS];
+int __ref acpi_map_lsapic_emu(int pcpu, int nid)
+{
+ /* backup cpu apicid to array cpu_to_apicid_saved */
+ if (cpu_to_apicid_saved[pcpu] == 0 &&
+ per_cpu(x86_cpu_to_apicid, pcpu) != BAD_APICID)
+ cpu_to_apicid_saved[pcpu] = per_cpu(x86_cpu_to_apicid, pcpu);
+
+ per_cpu(x86_cpu_to_apicid, pcpu) = cpu_to_apicid_saved[pcpu];
+ acpi_map_cpu2node_emu(pcpu, per_cpu(x86_cpu_to_apicid, pcpu), nid);
+
+ return pcpu;
+}
+EXPORT_SYMBOL(acpi_map_lsapic_emu);
+#endif
+
int acpi_unmap_lsapic(int cpu)
{
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ /* backup cpu apicid to array cpu_to_apicid_saved */
+ if (cpu_to_apicid_saved[cpu] == 0 &&
+ per_cpu(x86_cpu_to_apicid, cpu) != BAD_APICID)
+ cpu_to_apicid_saved[cpu] = per_cpu(x86_cpu_to_apicid, cpu);
+#endif
per_cpu(x86_cpu_to_apicid, cpu) = -1;
set_cpu_present(cpu, false);
num_processors--;
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c 2010-11-26 09:24:40.297724969 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c 2010-11-26 12:48:58.977725001 +0800
@@ -107,8 +107,6 @@
mutex_unlock(&x86_cpu_hotplug_driver_mutex);
}
-ssize_t arch_cpu_probe(const char *buf, size_t count) { return -1; }
-ssize_t arch_cpu_release(const char *buf, size_t count) { return -1; }
#else
static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
#define get_idle_for_cpu(x) (idle_thread_array[(x)])
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c 2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c 2010-11-26 12:48:58.987725001 +0800
@@ -30,6 +30,9 @@
#include <linux/init.h>
#include <linux/smp.h>
#include <asm/cpu.h>
+#include <linux/cpu.h>
+#include <linux/topology.h>
+#include <linux/acpi.h>
static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
@@ -66,6 +69,74 @@
unregister_cpu(&per_cpu(cpu_devices, num).cpu);
}
EXPORT_SYMBOL(arch_unregister_cpu);
+
+ssize_t arch_cpu_probe(const char *buf, size_t count)
+{
+ int nid = 0;
+ int num = 0, selected = 0;
+
+ /* check parameters */
+ if (!buf || count < 2)
+ return -EPERM;
+
+ nid = simple_strtoul(buf, NULL, 0);
+ printk(KERN_DEBUG "Add a cpu to node : %d\n", nid);
+
+ if (nid < 0 || nid > nr_node_ids - 1) {
+ printk(KERN_ERR "Invalid NUMA node id: %d (0 <= nid < %d).\n",
+ nid, nr_node_ids);
+ return -EPERM;
+ }
+
+ if (!node_online(nid)) {
+ printk(KERN_ERR "NUMA node %d is not online, give up.\n", nid);
+ return -EPERM;
+ }
+
+ /* find first uninitialized cpu */
+ for_each_present_cpu(num) {
+ if (per_cpu(cpu_sys_devices, num) == NULL) {
+ selected = num;
+ break;
+ }
+ }
+
+ if (selected >= num_possible_cpus()) {
+ printk(KERN_ERR "No free cpu, give up cpu probing.\n");
+ return -EPERM;
+ }
+
+ /* register cpu */
+ arch_register_cpu_node(selected, nid);
+ acpi_map_lsapic_emu(selected, nid);
+
+ return count;
+}
+EXPORT_SYMBOL(arch_cpu_probe);
+
+ssize_t arch_cpu_release(const char *buf, size_t count)
+{
+ int cpu = 0;
+
+ cpu = simple_strtoul(buf, NULL, 0);
+ /* cpu 0 is not hot-pluggable */
+ if (cpu == 0) {
+ printk(KERN_ERR "can not release cpu 0.\n");
+ return -EPERM;
+ }
+
+ if (cpu_online(cpu)) {
+ printk(KERN_DEBUG "offline cpu %d.\n", cpu);
+ cpu_down(cpu);
+ }
+
+ arch_unregister_cpu(cpu);
+ acpi_unmap_lsapic(cpu);
+
+ return count;
+}
+EXPORT_SYMBOL(arch_cpu_release);
+
#else /* CONFIG_HOTPLUG_CPU */
static int __init arch_register_cpu(int num)
@@ -83,8 +154,14 @@
register_one_node(i);
#endif
- for_each_present_cpu(i)
- arch_register_cpu(i);
+ /*
+ * When cpu hotplug emulation is enabled, register only the online cpus;
+ * the rest are reserved for cpu probe.
+ */
+ for_each_present_cpu(i) {
+ if ((cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on)
+ arch_register_cpu(i);
+ }
return 0;
}
Index: linux-hpe4/arch/x86/mm/numa_64.c
===================================================================
--- linux-hpe4.orig/arch/x86/mm/numa_64.c 2010-11-26 09:24:40.317724965 +0800
+++ linux-hpe4/arch/x86/mm/numa_64.c 2010-11-26 09:24:53.297725001 +0800
@@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/nodemask.h>
#include <linux/sched.h>
+#include <linux/cpu.h>
#include <asm/e820.h>
#include <asm/proto.h>
@@ -785,6 +786,19 @@
}
#endif
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+static __init int cpu_hpe_setup(char *opt)
+{
+ if (!opt)
+ return -EINVAL;
+
+ if (!strncmp(opt, "on", 2) || !strncmp(opt, "1", 1))
+ cpu_hpe_on = 1;
+
+ return 0;
+}
+early_param("cpu_hpe", cpu_hpe_setup);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
void __cpuinit numa_set_node(int cpu, int node)
{
Index: linux-hpe4/drivers/acpi/processor_driver.c
===================================================================
--- linux-hpe4.orig/drivers/acpi/processor_driver.c 2010-11-26 09:24:40.327725004 +0800
+++ linux-hpe4/drivers/acpi/processor_driver.c 2010-11-26 09:24:53.297725001 +0800
@@ -530,6 +530,14 @@
goto err_free_cpumask;
sysdev = get_cpu_sysdev(pr->id);
+ /*
+ * Reserve cpu for hotplug emulation; the reserved cpu can be hot-added
+ * through the cpu probe interface. Return directly.
+ */
+ if (sysdev == NULL) {
+ goto out;
+ }
+
if (sysfs_create_link(&device->dev.kobj, &sysdev->kobj, "sysdev")) {
result = -EFAULT;
goto err_remove_fs;
@@ -570,6 +578,7 @@
goto err_remove_sysfs;
}
+out:
return 0;
err_remove_sysfs:
Index: linux-hpe4/drivers/base/cpu.c
===================================================================
--- linux-hpe4.orig/drivers/base/cpu.c 2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/drivers/base/cpu.c 2010-11-26 09:24:53.297725001 +0800
@@ -22,9 +22,15 @@
};
EXPORT_SYMBOL(cpu_sysdev_class);
-static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
+DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * cpu_hpe_on is a switch to enable/disable cpu hotplug emulation. It is
+ * disabled by default; enable it with the boot parameter cpu_hpe=on.
+ */
+int cpu_hpe_on;
+
static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
char *buf)
{
Index: linux-hpe4/include/linux/acpi.h
===================================================================
--- linux-hpe4.orig/include/linux/acpi.h 2010-11-26 09:24:40.347725041 +0800
+++ linux-hpe4/include/linux/acpi.h 2010-11-26 09:24:53.297725001 +0800
@@ -102,6 +102,7 @@
#ifdef CONFIG_ACPI_HOTPLUG_CPU
/* Arch dependent functions for cpu hotplug support */
int acpi_map_lsapic(acpi_handle handle, int *pcpu);
+int acpi_map_lsapic_emu(int pcpu, int nid);
int acpi_unmap_lsapic(int cpu);
#endif /* CONFIG_ACPI_HOTPLUG_CPU */
Index: linux-hpe4/include/linux/cpu.h
===================================================================
--- linux-hpe4.orig/include/linux/cpu.h 2010-11-26 09:24:52.477725000 +0800
+++ linux-hpe4/include/linux/cpu.h 2010-11-26 09:24:53.297725001 +0800
@@ -30,6 +30,8 @@
struct sys_device sysdev;
};
+DECLARE_PER_CPU(struct sys_device *, cpu_sys_devices);
+
extern int register_cpu_node(struct cpu *cpu, int num, int nid);
static inline int register_cpu(struct cpu *cpu, int num)
@@ -149,6 +151,7 @@
#define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
int cpu_down(unsigned int cpu);
+extern int cpu_hpe_on;
#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
extern void cpu_hotplug_driver_lock(void);
@@ -171,6 +174,7 @@
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
#define unregister_hotcpu_notifier(nb) ({ (void)(nb); })
+static int cpu_hpe_on;
#endif /* CONFIG_HOTPLUG_CPU */
#ifdef CONFIG_PM_SLEEP_SMP
Index: linux-hpe4/Documentation/x86/x86_64/boot-options.txt
===================================================================
--- linux-hpe4.orig/Documentation/x86/x86_64/boot-options.txt 2010-11-26 12:49:44.847725099 +0800
+++ linux-hpe4/Documentation/x86/x86_64/boot-options.txt 2010-11-26 12:55:50.527724999 +0800
@@ -316,3 +316,9 @@
Do not use GB pages for kernel direct mappings.
gbpages
Use GB pages for kernel direct mappings.
+ cpu_hpe=on/off
+ Enable/disable software emulation of CPU hotplug. When cpu_hpe=on,
+ sysfs provides probe/release interfaces to hot-add/remove CPUs dynamically.
+ Use maxcpus=<N> to reserve CPUs for hotplug.
+ This option is disabled by default.
+
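The registration filter in the topology.c hunk above, `(cpu_hpe_on && cpu_online(i)) || !cpu_hpe_on`, can be written more compactly. The following is only a userspace sketch with illustrative function names (none of them exist in the kernel); it checks all four input combinations to show that the shorter form is equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch. */
static bool should_register_orig(bool hpe_on, bool online)
{
    return (hpe_on && online) || !hpe_on;
}

/* Shorter form: register everything unless emulation is on and the cpu is offline. */
static bool should_register(bool hpe_on, bool online)
{
    return !hpe_on || online;
}

/* Exhaustively confirm the two forms agree; returns the number of cases checked. */
static int check_equivalence(void)
{
    int checked = 0;

    for (int h = 0; h <= 1; h++) {
        for (int o = 0; o <= 1; o++) {
            assert(should_register_orig(h, o) == should_register(h, o));
            checked++;
        }
    }
    return checked;
}
```

With emulation off every present cpu is registered; with emulation on only the online ones are, which leaves the offline ones available to the probe interface.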
--
Thanks & Regards,
Shaohui
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [6/8, v6] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
` (4 preceding siblings ...)
2010-11-30 7:13 ` [5/8, v6] NUMA Hotplug Emulator: support cpu probe/release in x86_64 shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-11-30 7:13 ` [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA shaohui.zheng
2010-11-30 7:13 ` [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe shaohui.zheng
7 siblings, 0 replies; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Sam Ravnborg, Haicheng Li, Shaohui Zheng
[-- Attachment #1: 006-hotplug-emulator-fake_socket_with_logic_cpu_on_x86.patch --]
[-- Type: text/plain, Size: 7989 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
When hotplugging a CPU with the emulator, we use a logical CPU to emulate the
CPU hotplug process. On CPUs that support SMT, some logical CPUs share a
socket, but with the emulator they may end up in different NUMA nodes.
This misleads the scheduler into building an incorrect domain hierarchy, and
causes the following call trace when rebalancing the scheduling domains:
divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu8/online
CPU 0
Modules linked in: fbcon tileblit font bitblit softcursor radeon ttm drm_kms_helper e1000e usbhid via_rhine mii drm i2c_algo_bit igb dca
Pid: 0, comm: swapper Not tainted 2.6.32hpe #78 X8DTN
RIP: 0010:[<ffffffff81051da5>] [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP: 0018:ffff880028203c30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000015ac0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880277e8cfa0 RDI: 0000000000000000
RBP: ffff880028203dc0 R08: ffff880277e8cfa0 R09: 0000000000000040
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f16cfc85770 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81822000, task ffffffff8184a600)
Stack:
ffff880028203d60 ffff880028203cd0 ffff8801c204ff08 ffff880028203e38
<0> 0101ffff81018c59 ffff880028203e44 00000001810806bd ffff8801c204fe00
<0> 0000000528200000 ffffffff00000000 0000000000000018 0000000000015ac0
Call Trace:
<IRQ>
[<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
[<ffffffff81053b2c>] rebalance_domains+0x17c/0x570
[<ffffffff81018c89>] ? read_tsc+0x9/0x20
[<ffffffff81088ee0>] ? tick_dev_program_event+0x40/0xd0
[<ffffffff810569ed>] run_rebalance_domains+0xbd/0xf0
[<ffffffff8106471f>] __do_softirq+0xaf/0x1e0
[<ffffffff810b7d18>] ? handle_IRQ_event+0x58/0x160
[<ffffffff810130ac>] call_softirq+0x1c/0x30
[<ffffffff81014a85>] do_softirq+0x65/0xa0
[<ffffffff810645cd>] irq_exit+0x7d/0x90
[<ffffffff81013ff0>] do_IRQ+0x70/0xe0
[<ffffffff810128d3>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff8133387f>] ? acpi_idle_enter_bm+0x281/0x2b5
[<ffffffff81333878>] ? acpi_idle_enter_bm+0x27a/0x2b5
[<ffffffff8145dc8f>] ? cpuidle_idle_call+0x9f/0x130
[<ffffffff81010e2b>] ? cpu_idle+0xab/0x100
[<ffffffff8158aee6>] ? rest_init+0x66/0x70
[<ffffffff81905d90>] ? start_kernel+0x3e3/0x3ef
[<ffffffff8190533a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81905438>] ? x86_64_start_kernel+0xfa/0x109
Code: 00 00 e9 4c fb ff ff 0f 1f 80 00 00 00 00 48 8b b5 d8 fe ff ff 48 8b 45 a8 4d 29 ef 8b 56 08 48 c1 e0 0a 49 89 f0 48 89 d7 31 d2 <48> f7 f7 31 d2 48 89 45 a0 8b 76 08 4c 89 f0 48 c1 e0 0a 48 f7
RIP [<ffffffff81051da5>] find_busiest_group+0x6c5/0xa10
RSP <ffff880028203c30>
Solution:
We put the logical CPU into a fake CPU socket and assign it a unique
phys_proc_id. Each fake socket holds exactly one logical CPU. This
fixes the above bug.
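A minimal userspace sketch of this idea, assuming a simplified `struct cpu_topo` that stands in for the topology fields of the kernel's `struct cpuinfo_x86` (the struct layout and the function name `fake_cpu_socket` are illustrative only): the hot-added logical CPU receives a socket id one greater than the largest id in use, and is pinned to core 0.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the topology fields of struct cpuinfo_x86. */
struct cpu_topo {
    int present;       /* nonzero if this slot holds a present CPU */
    int phys_proc_id;  /* socket (package) id */
    int cpu_core_id;   /* core id within the socket */
    int cpu_probe_on;  /* set for CPUs added through the emulator */
};

/*
 * Move CPU `cpu` into a fake socket of its own: choose an id one greater
 * than the largest phys_proc_id among present CPUs, and force core 0.
 * Returns the fake socket id.
 */
static int fake_cpu_socket(struct cpu_topo *cpus, size_t ncpus, size_t cpu)
{
    int max_id = 0;
    size_t i;

    assert(cpu < ncpus);

    for (i = 0; i < ncpus; i++) {
        if (cpus[i].present && cpus[i].phys_proc_id > max_id)
            max_id = cpus[i].phys_proc_id;
    }

    cpus[cpu].phys_proc_id = max_id + 1;
    cpus[cpu].cpu_core_id = 0;
    cpus[cpu].cpu_probe_on = 1;
    return cpus[cpu].phys_proc_id;
}
```

Because no other CPU carries the new id, the sibling and core maps for the probed CPU contain only that CPU, so it no longer appears to share a socket with CPUs on another node.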
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/arch/x86/include/asm/processor.h
===================================================================
--- linux-hpe4.orig/arch/x86/include/asm/processor.h 2010-11-17 09:00:51.354100239 +0800
+++ linux-hpe4/arch/x86/include/asm/processor.h 2010-11-17 09:01:10.222837594 +0800
@@ -113,6 +113,15 @@
/* Index into per_cpu list: */
u16 cpu_index;
#endif
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ /*
+ * Use a logical cpu to emulate a physical cpu's hotplug. We put the
+ * logical cpu into a fake socket, assign a fake physical id to it,
+ * and create a fake core.
+ */
+ __u8 cpu_probe_on; /* A flag to enable cpu probe/release */
+#endif
} __attribute__((__aligned__(SMP_CACHE_BYTES)));
#define X86_VENDOR_INTEL 0
Index: linux-hpe4/arch/x86/kernel/smpboot.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/smpboot.c 2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/smpboot.c 2010-11-17 09:01:10.222837594 +0800
@@ -97,6 +97,7 @@
*/
static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
void cpu_hotplug_driver_lock()
{
mutex_lock(&x86_cpu_hotplug_driver_mutex);
@@ -106,6 +107,7 @@
{
mutex_unlock(&x86_cpu_hotplug_driver_mutex);
}
+#endif
#else
static struct task_struct *idle_thread_array[NR_CPUS] __cpuinitdata ;
@@ -198,6 +200,8 @@
{
int cpuid, phys_id;
unsigned long timeout;
+ u8 cpu_probe_on = 0;
+ struct cpuinfo_x86 *c;
/*
* If waken up by an INIT in an 82489DX configuration
@@ -277,7 +281,20 @@
/*
* Save our processor parameters
*/
+ c = &cpu_data(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ cpu_probe_on = c->cpu_probe_on;
+ phys_id = c->phys_proc_id;
+#endif
+
smp_store_cpu_info(cpuid);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ if (cpu_probe_on) {
+ c->phys_proc_id = phys_id; /* restore the fake phys_proc_id */
+ c->cpu_core_id = 0; /* force the logical cpu to core 0 */
+ c->cpu_probe_on = cpu_probe_on;
+ }
+#endif
notify_cpu_starting(cpuid);
@@ -400,6 +417,11 @@
{
int i;
struct cpuinfo_x86 *c = &cpu_data(cpu);
+ int cpu_probe_on = 0;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+ cpu_probe_on = c->cpu_probe_on;
+#endif
cpumask_set_cpu(cpu, cpu_sibling_setup_mask);
@@ -431,7 +453,8 @@
for_each_cpu(i, cpu_sibling_setup_mask) {
if (per_cpu(cpu_llc_id, cpu) != BAD_APICID &&
- per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i)) {
+ per_cpu(cpu_llc_id, cpu) == per_cpu(cpu_llc_id, i) &&
+ cpu_probe_on == 0) {
cpumask_set_cpu(i, c->llc_shared_map);
cpumask_set_cpu(cpu, cpu_data(i).llc_shared_map);
}
Index: linux-hpe4/arch/x86/kernel/topology.c
===================================================================
--- linux-hpe4.orig/arch/x86/kernel/topology.c 2010-11-17 09:01:10.202837209 +0800
+++ linux-hpe4/arch/x86/kernel/topology.c 2010-11-17 09:01:10.222837594 +0800
@@ -70,6 +70,36 @@
}
EXPORT_SYMBOL(arch_unregister_cpu);
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
+/*
+ * Put the logical cpu into a new socket, and encapsulate it into core 0.
+ */
+static void fake_cpu_socket_info(int cpu)
+{
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+ int i, phys_id = 0;
+
+ /* calculate the max phys_id */
+ for_each_present_cpu(i) {
+ struct cpuinfo_x86 *c = &cpu_data(i);
+ if (phys_id < c->phys_proc_id)
+ phys_id = c->phys_proc_id;
+ }
+
+ c->phys_proc_id = phys_id + 1; /* pick an unused phys_proc_id */
+ c->cpu_core_id = 0; /* always put the logical cpu to core 0 */
+ c->cpu_probe_on = 1;
+}
+
+static void clear_cpu_socket_info(int cpu)
+{
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+ c->phys_proc_id = 0;
+ c->cpu_core_id = 0;
+ c->cpu_probe_on = 0;
+}
+
+
ssize_t arch_cpu_probe(const char *buf, size_t count)
{
int nid = 0;
@@ -109,6 +139,7 @@
/* register cpu */
arch_register_cpu_node(selected, nid);
acpi_map_lsapic_emu(selected, nid);
+ fake_cpu_socket_info(selected);
return count;
}
@@ -132,10 +163,13 @@
arch_unregister_cpu(cpu);
acpi_unmap_lsapic(cpu);
+ clear_cpu_socket_info(cpu);
+ set_cpu_present(cpu, true);
return count;
}
EXPORT_SYMBOL(arch_cpu_release);
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
#else /* CONFIG_HOTPLUG_CPU */
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
` (5 preceding siblings ...)
2010-11-30 7:13 ` [6/8, v6] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86 shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-12-02 0:55 ` David Rientjes
2010-11-30 7:13 ` [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe shaohui.zheng
7 siblings, 1 reply; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Shaohui Zheng, Haicheng Li, Wu Fengguang
[-- Attachment #1: 007-hotplug-emulator-extend-memory-probe-interface-to-support-numa.patch --]
[-- Type: text/plain, Size: 5889 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
Extend the memory probe interface to accept an extra parameter, nid;
the reserved memory is added to that node if the node exists.
Add a memory section (128M) to node 3 (booted with mem=1024m):
echo 0x40000000,3 > memory/probe
For convenience, the address may also be given with a size suffix:
echo 3g > memory/probe
echo 1024m,3 > memory/probe
The interface remains backwards compatible.
Another format, suggested by Dave Hansen:
echo physical_address=0x40000000 numa_node=3 > memory/probe
which makes the meaning of each parameter explicit.
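The accepted formats can be illustrated with a small userspace parser. This is a sketch under stated assumptions, not the code from the patch: `parse_size` is a hypothetical stand-in for the kernel's `memparse()`, `parse_probe` is an illustrative name, and unlike the patch it never writes into the input buffer. A node id of -1 means "not specified".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse "0x40000000", "1024m" or "3g" (userspace stand-in for memparse()). */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    if (*end == s)
        return 0; /* no digits: the caller detects this via *end == s */

    switch (**end) {
    case 'g': case 'G': v <<= 10; /* fall through */
    case 'm': case 'M': v <<= 10; /* fall through */
    case 'k': case 'K': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/*
 * Parse a probe request without modifying `buf`.
 * Returns 0 and fills *addr and *nid on success, -1 on malformed input.
 */
static int parse_probe(const char *buf, unsigned long long *addr, int *nid)
{
    static const char key_addr[] = "physical_address=";
    static const char key_nid[] = "numa_node=";
    int have_addr = 0;
    char *end;

    *addr = 0;
    *nid = -1;

    if (strchr(buf, '=') == NULL) {
        /* Format 1: physical_address[,numa_node] */
        *addr = parse_size(buf, &end);
        if (end == buf)
            return -1;
        if (*end == ',') {
            const char *p = end + 1;

            *nid = (int)strtol(p, &end, 0);
            if (end == p)
                return -1;
        }
        return (*end == '\0' || *end == '\n') ? 0 : -1;
    }

    /* Format 2: space-separated key=value pairs, in either order. */
    while (*buf != '\0' && *buf != '\n') {
        if (strncmp(buf, key_addr, sizeof(key_addr) - 1) == 0) {
            const char *p = buf + sizeof(key_addr) - 1;

            *addr = parse_size(p, &end);
            if (end == p)
                return -1;
            have_addr = 1;
        } else if (strncmp(buf, key_nid, sizeof(key_nid) - 1) == 0) {
            const char *p = buf + sizeof(key_nid) - 1;

            *nid = (int)strtol(p, &end, 0);
            if (end == p)
                return -1;
        } else {
            return -1; /* unknown key */
        }
        buf = end;
        while (*buf == ' ')
            buf++;
    }
    return have_addr ? 0 : -1;
}
```

Rejecting unknown keys and a missing address up front keeps a typo in the key name from silently probing address 0.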
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
Index: linux-hpe4/arch/x86/Kconfig
===================================================================
--- linux-hpe4.orig/arch/x86/Kconfig 2010-11-30 12:03:49.747622002 +0800
+++ linux-hpe4/arch/x86/Kconfig 2010-11-30 12:40:52.317621999 +0800
@@ -1276,10 +1276,6 @@
def_bool y
depends on ARCH_SPARSEMEM_ENABLE
-config ARCH_MEMORY_PROBE
- def_bool X86_64
- depends on MEMORY_HOTPLUG
-
config ILLEGAL_POINTER_VALUE
hex
default 0 if X86_32
Index: linux-hpe4/drivers/base/memory.c
===================================================================
--- linux-hpe4.orig/drivers/base/memory.c 2010-11-30 12:40:43.737622001 +0800
+++ linux-hpe4/drivers/base/memory.c 2010-11-30 12:42:15.467621626 +0800
@@ -329,26 +329,76 @@
* will not need to do it from userspace. The fake hot-add code
* as well as ppc64 will do all of their discovery in userspace
* and will require this interface.
+ *
+ * Parameter format 1: physical_address,numa_node
+ * Parameter format 2: physical_address=0x40000000 numa_node=3
*/
#ifdef CONFIG_ARCH_MEMORY_PROBE
-static ssize_t
-memory_probe_store(struct class *class, struct class_attribute *attr,
- const char *buf, size_t count)
+ssize_t parse_memory_probe_store(const char *buf, size_t count)
{
- u64 phys_addr;
- int nid;
+ u64 phys_addr = 0;
+ int nid = 0;
int ret;
+ char *p = NULL, *q = NULL;
+ /* format: physical_address=0x40000000 numa_node=3 */
+ p = strchr(buf, '=');
+ if (p != NULL) {
+ *p = '\0';
+ q = strchr(buf, ' ');
+ if (q == NULL) {
+ if (strcmp(buf, "physical_address") != 0)
+ ret = -EPERM;
+ else
+ phys_addr = memparse(p+1, NULL);
+ } else {
+ *q++ = '\0';
+ p = strchr(q, '=');
+ if (strcmp(buf, "physical_address") == 0)
+ phys_addr = memparse(p+1, NULL);
+ if (strcmp(buf, "numa_node") == 0)
+ nid = simple_strtoul(p+1, NULL, 0);
+ if (strcmp(q, "physical_address") == 0)
+ phys_addr = memparse(p+1, NULL);
+ if (strcmp(q, "numa_node") == 0)
+ nid = simple_strtoul(p+1, NULL, 0);
+ }
+ } else { /* physical_address,numa_node */
+ p = strchr(buf, ',');
+ if (p != NULL && strlen(p+1) > 0) {
+ /* nid specified */
+ *p++ = '\0';
+ nid = simple_strtoul(p, NULL, 0);
+ phys_addr = memparse(buf, NULL);
+ } else {
+ phys_addr = memparse(buf, NULL);
+ nid = memory_add_physaddr_to_nid(phys_addr);
+ }
+ }
- phys_addr = simple_strtoull(buf, NULL, 0);
-
- nid = memory_add_physaddr_to_nid(phys_addr);
- ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+ if (nid < 0 || nid > nr_node_ids - 1) {
+ printk(KERN_ERR "Invalid node id %d(0<=nid<%d).\n", nid, nr_node_ids);
+ ret = -EPERM;
+ } else {
+ printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
+ ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+ if (ret)
+ count = ret;
+ }
if (ret)
count = ret;
return count;
}
+EXPORT_SYMBOL(parse_memory_probe_store);
+
+static ssize_t
+memory_probe_store(struct class *class, struct class_attribute *attr,
+ const char *buf, size_t count)
+{
+ return parse_memory_probe_store(buf, count);
+}
+
static CLASS_ATTR(probe, S_IWUSR, NULL, memory_probe_store);
static int memory_probe_init(void)
Index: linux-hpe4/mm/Kconfig
===================================================================
--- linux-hpe4.orig/mm/Kconfig 2010-11-30 12:03:49.747622002 +0800
+++ linux-hpe4/mm/Kconfig 2010-11-30 12:40:52.327621999 +0800
@@ -174,6 +174,17 @@
default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
default "4"
+config ARCH_MEMORY_PROBE
+ def_bool y
+ bool "Memory hotplug emulation"
+ depends on MEMORY_HOTPLUG
+ ---help---
+ Enable memory hotplug emulation. Reserve memory with the boot parameter
+ "mem=N" (such as mem=1024M), where N is the initial memory size; the
+ remaining physical memory is removed from the e820 table. The memory probe
+ interface can then hot-add that memory to a specified node in software.
+ This is for debugging and testing purposes.
+
#
# support for memory compaction
config COMPACTION
Index: linux-hpe4/include/linux/memory_hotplug.h
===================================================================
--- linux-hpe4.orig/include/linux/memory_hotplug.h 2010-11-30 12:40:43.737622001 +0800
+++ linux-hpe4/include/linux/memory_hotplug.h 2010-11-30 12:40:52.337622000 +0800
@@ -211,5 +211,13 @@
extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
unsigned long pnum);
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+extern ssize_t parse_memory_probe_store(const char *buf, size_t count);
+#else
+static inline ssize_t parse_memory_probe_store(const char *buf, size_t count)
+{
+ return 0;
+}
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
#endif /* __LINUX_MEMORY_HOTPLUG_H */
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
` (6 preceding siblings ...)
2010-11-30 7:13 ` [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA shaohui.zheng
@ 2010-11-30 7:13 ` shaohui.zheng
2010-12-02 0:57 ` David Rientjes
7 siblings, 1 reply; 22+ messages in thread
From: shaohui.zheng @ 2010-11-30 7:13 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Shaohui Zheng, Haicheng Li
[-- Attachment #1: 008-hotplug-emulator-implement-memory-probe-debugfs-interface.patch --]
[-- Type: text/plain, Size: 4059 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
Implement a debugfs interface /sys/kernel/debug/mem_hotplug/probe for memory hotplug
emulation. It accepts the same parameters as
/sys/devices/system/memory/probe.
Document the interface usage in Documentation/memory-hotplug.txt.
CC: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
--
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c 2010-11-30 14:15:23.587622002 +0800
+++ linux-hpe4/mm/memory_hotplug.c 2010-11-30 14:16:45.447622001 +0800
@@ -983,4 +983,35 @@
}
module_init(node_debug_init);
+
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+
+static ssize_t debug_memory_probe_store(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ return parse_memory_probe_store(buf, count);
+}
+
+static const struct file_operations memory_probe_file_ops = {
+ .write = debug_memory_probe_store,
+ .llseek = generic_file_llseek,
+};
+
+static int __init memory_debug_init(void)
+{
+ if (!memhp_debug_root)
+ memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+ if (!memhp_debug_root)
+ return -ENOMEM;
+
+ if (!debugfs_create_file("probe", S_IWUSR, memhp_debug_root,
+ NULL, &memory_probe_file_ops))
+ return -ENOMEM;
+
+ return 0;
+}
+
+module_init(memory_debug_init);
+
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
#endif /* CONFIG_DEBUG_FS */
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-11-30 14:15:23.587622002 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt 2010-11-30 14:40:27.267622000 +0800
@@ -198,23 +198,41 @@
In some environments, especially virtualized environment, firmware will not
notify memory hotplug event to the kernel. For such environment, "probe"
interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
+It can also be used for physical memory hotplug emulation.
-Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
-contain highly architecture codes. Please add config if you need "probe"
+Now, CONFIG_ARCH_MEMORY_PROBE is supported by powerpc and x86_64, but it does
+not contain highly architecture-specific code. Please add config if you need "probe"
interface.
-Probe interface is located at
-/sys/devices/system/memory/probe
+We have both sysfs and debugfs interfaces for memory probe. They are located at
+/sys/devices/system/memory/probe (sysfs) and /sys/kernel/debug/mem_hotplug/probe
+(debugfs). Either can be used; they accept the same parameters.
You can tell the physical address of new memory to the kernel by
-% echo start_address_of_new_memory > /sys/devices/system/memory/probe
+% echo start_address_of_new_memory > memory/probe
Then, [start_address_of_new_memory, start_address_of_new_memory + section_size)
memory range is hot-added. In this case, hotplug script is not called (in
current implementation). You'll have to online memory by yourself.
Please see "How to online memory" in this text.
+The probe interface can accept flexible parameters, for example:
+
+Add a memory section (128M) to node 3 (booted with mem=1024m):
+
+ echo 0x40000000,3 > memory/probe
+
+For convenience, the address may also be given with a size suffix:
+
+ echo 3g > memory/probe
+ echo 1024m,3 > memory/probe
+
+Another format suggested by Dave Hansen:
+
+ echo physical_address=0x40000000 numa_node=3 > memory/probe
+
+You can also use mem_hotplug/probe(debugfs) interface in the above examples.
4.3 Node hotplug emulation
------------
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [1/8, v6] NUMA Hotplug Emulator: documentation
2010-11-30 7:13 ` [1/8, v6] NUMA Hotplug Emulator: documentation shaohui.zheng
@ 2010-12-01 0:19 ` David Rientjes
2010-12-01 0:36 ` Shaohui Zheng
0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2010-12-01 0:19 UTC (permalink / raw)
To: Shaohui Zheng
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
> From: Shaohui Zheng <shaohui.zheng@intel.com>
>
> add a text file Documentation/x86/x86_64/numa_hotplug_emulator.txt
> to explain the usage for the hotplug emulator.
>
> Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [1/8, v6] NUMA Hotplug Emulator: documentation
2010-12-01 0:19 ` David Rientjes
@ 2010-12-01 0:36 ` Shaohui Zheng
0 siblings, 0 replies; 22+ messages in thread
From: Shaohui Zheng @ 2010-12-01 0:36 UTC (permalink / raw)
To: David Rientjes
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Tue, Nov 30, 2010 at 04:19:00PM -0800, David Rientjes wrote:
> Signed-off-by: David Rientjes <rientjes@google.com>
Resend this patch after adding David's sign-off.
Subject: NUMA Hotplug Emulator: documentation
From: Shaohui Zheng <shaohui.zheng@intel.com>
add a text file Documentation/x86/x86_64/numa_hotplug_emulator.txt
to explain the usage for the hotplug emulator.
Reviewed-By: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt 2010-11-30 09:48:52.257622002 +0800
@@ -0,0 +1,104 @@
+NUMA Hotplug Emulator for x86_64
+---------------------------------------------------
+
+NUMA hotplug emulator is able to emulate NUMA node hotplug
+purely in software. It is intended to help people easily debug
+and test node/CPU/memory hotplug related code on a machine
+without NUMA hotplug support, even a UMA machine or a virtual
+environment.
+
+1) Node hotplug emulation:
+
+Adds a numa=possible=<N> command line option to set an additional N nodes
+as being possible for memory hotplug. This set of possible nodes
+controls nr_node_ids and the sizes of several dynamically allocated node
+arrays.
+
+This allows memory hotplug to create new nodes for newly added memory
+rather than binding it to existing nodes.
+
+For emulation on x86, it would be possible to set aside memory for hotplugged
+nodes (say, anything above 2G) and to add an additional four nodes as being
+possible on boot with
+
+ mem=2G numa=possible=4
+
+and then creating a new 128M node at runtime:
+
+ # echo 128M@0x80000000 > /sys/kernel/debug/node/add_node
+ On node 1 totalpages: 0
+ init_memory_mapping: 0000000080000000-0000000088000000
+ 0080000000 - 0088000000 page 2M
+
+Once the new node has been added, its memory can be onlined. If this
+memory represents memory section 16, for example:
+
+ # echo online > /sys/devices/system/memory/memory16/state
+ Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
+ Policy zone: Normal
+ [ The memory section(s) mapped to a particular node are visible via
+ /sys/devices/system/node/node1, in this example. ]
+
+2) CPU hotplug emulation:
+
+The emulator reserves CPUs through a boot parameter; the reserved CPUs can be
+hot-added/hot-removed in software, which emulates the process of physical
+cpu hotplug.
+
+When hotplugging a CPU with the emulator, we use a logical CPU to emulate the CPU
+socket hotplug process. On CPUs that support SMT, some logical CPUs share a
+socket, but with the emulator they may end up in different NUMA nodes.
+We put the logical CPU into a fake CPU socket, and assign it a unique
+phys_proc_id. Each fake socket holds exactly one logical CPU.
+
+ - to hide CPUs
+ - Use the boot option "maxcpus=N" to hide CPUs.
+ N is the number of CPUs to initialize; the rest will be hidden.
+ - Use the boot option "cpu_hpe=on" to enable CPU hotplug emulation.
+ When cpu_hpe is enabled, the remaining CPUs will not be initialized.
+
+ - to hot-add CPU to node
+ $ echo nid > cpu/probe
+
+ - to hot-remove CPU
+ $ echo nid > cpu/release
+
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots. The reserved memory region is
+removed from the e820 table and can be hot-added via the probe interface.
+This interface has been extended to support adding memory to a specified node,
+and it maintains backwards compatibility.
+
+Memory release is well known to be difficult; there is no plan to support it yet.
+
+ - reserve memory through a kernel boot parameter
+ mem=1024m
+
+ - add a memory section to node 3
+ $ echo 0x40000000,3 > memory/probe
+ OR
+ $ echo 1024m,3 > memory/probe
+ OR
+ $ echo "physical_address=0x40000000 numa_node=3" > memory/probe
+
+4) Script for hotplug testing
+
+These scripts make it convenient to hot-add memory and CPUs in batches.
+
+- Online all memory sections:
+for m in /sys/devices/system/memory/memory*;
+do
+ echo online > $m/state;
+done
+
+- CPU Online:
+for c in /sys/devices/system/cpu/cpu[0-9]*;
+do
+ echo 1 > $c/online;
+done
+
+- David Rientjes <rientjes@google.com>
+- Haicheng Li <haicheng.li@intel.com>
+- Shaohui Zheng <shaohui.zheng@intel.com>
+ Nov 2010
--
Thanks & Regards,
Shaohui
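
As a worked check of the node hotplug example in the documentation above: the
new node starts at physical address 0x80000000, and with 128M memory sections
that address falls in section 16, which is why memory16 is the block brought
online. A minimal sketch of the arithmetic follows. The helper name
section_index is hypothetical, and the 128M section size is taken from the
example rather than read from the running system.

```shell
#!/bin/sh
# Hypothetical helper: map a physical address to its memory section number.
# Arguments: physical address, section size in bytes (hex or decimal).
section_index() {
    echo $(( $1 / $2 ))
}

# 128M sections, as in the example above (128 * 1024 * 1024 = 0x8000000).
SECTION_SIZE=0x8000000

section_index 0x80000000 "$SECTION_SIZE"   # prints 16
```

On a running system the section size should be checked rather than assumed,
since it can differ between architectures and kernel configurations.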
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-02 0:57 ` David Rientjes
@ 2010-12-01 23:45 ` Shaohui Zheng
2010-12-02 1:21 ` David Rientjes
0 siblings, 1 reply; 22+ messages in thread
From: Shaohui Zheng @ 2010-12-01 23:45 UTC (permalink / raw)
To: David Rientjes
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Wed, Dec 01, 2010 at 04:57:35PM -0800, David Rientjes wrote:
> On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
>
> > From: Shaohui Zheng <shaohui.zheng@intel.com>
> >
> > Implement a debugfs interface /sys/kernel/debug/mem_hotplug/probe for memory hotplug
> > emulation. It accepts the same parameters as
> > /sys/devices/system/memory/probe.
> >
>
> NACK, we don't need two interfaces to do the same thing.
You may not know the background: the sysfs memory/probe interface is a general
interface. Even though we have a debugfs interface, we should still keep it.
For testing purposes, the sysfs interface is enough; we created the debugfs
interface in response to the comments from Greg and Dave.
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option
2010-12-02 1:06 ` David Rientjes
@ 2010-12-01 23:48 ` Shaohui Zheng
0 siblings, 0 replies; 22+ messages in thread
From: Shaohui Zheng @ 2010-12-01 23:48 UTC (permalink / raw)
To: David Rientjes
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Wed, Dec 01, 2010 at 05:06:02PM -0800, David Rientjes wrote:
> On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
>
> > From: David Rientjes <rientjes@google.com>
> >
> > Adds a numa=possible=<N> command line option to set an additional N nodes
> > as being possible for memory hotplug. This set of possible nodes
> > controls nr_node_ids and the sizes of several dynamically allocated node
> > arrays.
> >
> > This allows memory hotplug to create new nodes for newly added memory
> > rather than binding it to existing nodes.
> >
> > The first use-case for this will be node hotplug emulation which will use
> > these possible nodes to create new nodes to test the memory hotplug
> > callbacks and surrounding memory hotplug code.
> >
> > CC: Shaohui Zheng <shaohui.zheng@intel.com>
> > CC: Haicheng Li <haicheng.li@intel.com>
> > Signed-off-by: David Rientjes <rientjes@google.com>
>
> You're going to need to add your Signed-off-by line immediately after mine
> if you're pushing these to a maintainer, you're along the submission
> chain.
I did not add my Signed-off-by because you are the patch author. I will
add it. Thanks, David.
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
[not found] <A24AE1FFE7AEC5489F83450EE98351BF288D88D224@shsmsx502.ccr.corp.intel.com>
@ 2010-12-02 0:27 ` Shaohui Zheng
2010-12-02 2:13 ` David Rientjes
0 siblings, 1 reply; 22+ messages in thread
From: Shaohui Zheng @ 2010-12-02 0:27 UTC (permalink / raw)
To: shaohui.zheng@linux.intel.com, David Rientjes
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak, dave,
gregkh, Haicheng Li
>
> I doubt either Greg or Dave suggested adding duplicate interfaces for the
> same functionality.
>
> The difference is that we needed to add the add_node interface in a new
> mem_hotplug debugfs directory because it's only useful for debugging
> kernel code and, thus, doesn't really have an appropriate place in sysfs.
> Nobody is going to use add_node unless they lack hotpluggable memory
> sections in their SRAT and want to debug the memory hotplug callers. For
> example, I already wrote all of this node hotplug emulation stuff when I
> wrote the node hotplug support for SLAB.
>
> Memory hotplug, however, does serve a non-debugging function and is
> appropriate in sysfs since this is how people hotplug memory. It's an ABI
> that we can't simply remove without deprecation over a substantial period
> of time and in this case it doesn't seem to have a clear advantage. We
> need not add special emulation support for something that is already
> possible for real systems, so adding a duplicate interface in debugfs is
> inappropriate.
So we should keep the sysfs memory/probe interface without any modifications,
while the debugfs mem_hotplug/probe interface lets us add the memory region
to a desired node. It is an extension of the sysfs memory/probe interface and
can be used for memory hotplug emulation. Do I understand correctly?
--
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA
2010-11-30 7:13 ` [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA shaohui.zheng
@ 2010-12-02 0:55 ` David Rientjes
0 siblings, 0 replies; 22+ messages in thread
From: David Rientjes @ 2010-12-02 0:55 UTC (permalink / raw)
To: Shaohui Zheng
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li, Wu Fengguang
On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
> From: Shaohui Zheng <shaohui.zheng@intel.com>
>
> Extend the memory probe interface to support an extra parameter, nid;
> the reserved memory can be added to this node if the node exists.
>
> Add a memory section (128M) to node 3 (booted with mem=1024m):
>
> echo 0x40000000,3 > memory/probe
>
> To make it friendlier, memory can also be added with:
>
> echo 3g > memory/probe
> echo 1024m,3 > memory/probe
>
> It maintains backwards compatibility.
>
> Another format suggested by Dave Hansen:
>
> echo physical_address=0x40000000 numa_node=3 > memory/probe
>
> It shows the meaning of the parameters more explicitly.
>
I don't like this interface; I think it would be much better to map the
memory region to the desired node id prior to using probe, as an extension
to debugfs.
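
The forms quoted above, 0x40000000,3 and 1024m,3, name the same physical
address, because 1024 * 1024 * 1024 bytes is 0x40000000. A small sketch of
that conversion is shown below. The helper name to_bytes is hypothetical, and
it assumes the size suffixes follow the usual kernel command-line convention
of powers of 1024; the patch's own parsing code is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as "1024m" or "3g" to bytes.
# Plain numbers (decimal or 0x-prefixed hex) are printed as decimal.
to_bytes() {
    case "$1" in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

to_bytes 1024m                         # prints 1073741824
printf '0x%x\n' "$(to_bytes 1024m)"    # prints 0x40000000
```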
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-11-30 7:13 ` [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe shaohui.zheng
@ 2010-12-02 0:57 ` David Rientjes
2010-12-01 23:45 ` Shaohui Zheng
0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2010-12-02 0:57 UTC (permalink / raw)
To: Shaohui Zheng
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
> From: Shaohui Zheng <shaohui.zheng@intel.com>
>
> Implement a debugfs interface /sys/kernel/debug/mem_hotplug/probe for memory hotplug
> emulation. It accepts the same parameters as
> /sys/devices/system/memory/probe.
>
NACK, we don't need two interfaces to do the same thing.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option
2010-11-30 7:13 ` [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
@ 2010-12-02 1:06 ` David Rientjes
2010-12-01 23:48 ` Shaohui Zheng
0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2010-12-02 1:06 UTC (permalink / raw)
To: Shaohui Zheng
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak,
shaohui.zheng, dave, gregkh, Haicheng Li
On Tue, 30 Nov 2010, shaohui.zheng@intel.com wrote:
> From: David Rientjes <rientjes@google.com>
>
> Adds a numa=possible=<N> command line option to set an additional N nodes
> as being possible for memory hotplug. This set of possible nodes
> controls nr_node_ids and the sizes of several dynamically allocated node
> arrays.
>
> This allows memory hotplug to create new nodes for newly added memory
> rather than binding it to existing nodes.
>
> The first use-case for this will be node hotplug emulation which will use
> these possible nodes to create new nodes to test the memory hotplug
> callbacks and surrounding memory hotplug code.
>
> CC: Shaohui Zheng <shaohui.zheng@intel.com>
> CC: Haicheng Li <haicheng.li@intel.com>
> Signed-off-by: David Rientjes <rientjes@google.com>
You're going to need to add your Signed-off-by line immediately after mine
if you're pushing these to a maintainer, you're along the submission
chain.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-01 23:45 ` Shaohui Zheng
@ 2010-12-02 1:21 ` David Rientjes
0 siblings, 0 replies; 22+ messages in thread
From: David Rientjes @ 2010-12-02 1:21 UTC (permalink / raw)
To: Shaohui Zheng
Cc: Andrew Morton, linux-mm, linux-kernel, lethal, Andi Kleen, dave,
Greg KH, Haicheng Li
On Thu, 2 Dec 2010, Shaohui Zheng wrote:
> > > From: Shaohui Zheng <shaohui.zheng@intel.com>
> > >
> > > Implement a debugfs interface /sys/kernel/debug/mem_hotplug/probe for memory hotplug
> > > emulation. It accepts the same parameters as
> > > /sys/devices/system/memory/probe.
> > >
> >
> > NACK, we don't need two interfaces to do the same thing.
>
> You may not know the background: the sysfs memory/probe interface is a general
> interface. Even though we have a debugfs interface, we should still keep it.
>
> For testing purposes, the sysfs interface is enough; we created the debugfs
> interface in response to the comments from Greg and Dave.
>
I doubt either Greg or Dave suggested adding duplicate interfaces for the
same functionality.
The difference is that we needed to add the add_node interface in a new
mem_hotplug debugfs directory because it's only useful for debugging
kernel code and, thus, doesn't really have an appropriate place in sysfs.
Nobody is going to use add_node unless they lack hotpluggable memory
sections in their SRAT and want to debug the memory hotplug callers. For
example, I already wrote all of this node hotplug emulation stuff when I
wrote the node hotplug support for SLAB.
Memory hotplug, however, does serve a non-debugging function and is
appropriate in sysfs since this is how people hotplug memory. It's an ABI
that we can't simply remove without deprecation over a substantial period
of time and in this case it doesn't seem to have a clear advantage. We
need not add special emulation support for something that is already
possible for real systems, so adding a duplicate interface in debugfs is
inappropriate.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-02 0:27 ` Shaohui Zheng
@ 2010-12-02 2:13 ` David Rientjes
2010-12-02 2:35 ` Zheng, Shaohui
0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2010-12-02 2:13 UTC (permalink / raw)
To: Shaohui Zheng
Cc: Andrew Morton, linux-mm, linux-kernel, lethal, Andi Kleen,
Dave Hansen, Greg KH, Haicheng Li
On Thu, 2 Dec 2010, Shaohui Zheng wrote:
> so we should still keep the sysfs memory/probe interface without any modifications,
> but for the debugfs mem_hotplug/probe interface, we can add the memory region
> to a desired node.
This feature would be distinct from the add_node interface already
provided: instead of hotplugging a new node to test the memory hotplug
callbacks, this new interface would only be hotadding new memory to a node
other than the one it has physical affinity with. For that support, I'd
suggest new probe files in debugfs for each online node:
/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/node0/add_memory
/sys/kernel/debug/mem_hotplug/node1/add_memory
...
and then you can offline and remove that memory with the existing hotplug
support (CONFIG_MEMORY_HOTPLUG and CONFIG_MEMORY_HOTREMOVE, respectively).
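
To make the proposed layout concrete, here is a minimal sketch of how a test
script might drive it. The per-node add_memory files are only a proposal in
this message, not an existing kernel interface; the function names and the
DRY_RUN and DEBUGFS_ROOT variables are hypothetical.

```shell
#!/bin/sh
# Sketch of the proposed per-node layout. The add_memory files are a
# proposal from this thread, so nothing here is a confirmed kernel ABI.
DEBUGFS_ROOT=${DEBUGFS_ROOT:-/sys/kernel/debug}

# Print the proposed add_memory trigger path for a node id.
add_memory_path() {
    echo "$DEBUGFS_ROOT/mem_hotplug/node$1/add_memory"
}

# Write a physical address to the node's trigger. With DRY_RUN=1 the
# command is only printed, which keeps the sketch safe to run anywhere.
add_memory_to_node() {
    path=$(add_memory_path "$1")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "echo $2 > $path"
    else
        echo "$2" > "$path"
    fi
}

DRY_RUN=1
add_memory_to_node 1 0x80000000
```

The node id is carried by the path and the written value is just an address,
so no combined address-and-node string has to be parsed.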
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-02 2:13 ` David Rientjes
@ 2010-12-02 2:35 ` Zheng, Shaohui
2010-12-02 23:34 ` David Rientjes
0 siblings, 1 reply; 22+ messages in thread
From: Zheng, Shaohui @ 2010-12-02 2:35 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
lethal@linux-sh.org, Andi Kleen, Dave Hansen, Greg KH,
Li, Haicheng, shaohui.zheng@linux.intel.com
Why should we add so many interfaces for memory hotplug emulation? If so, we
would have to create both sysfs and debugfs entries for an online node, which
adds redundant code logic.
We need not make a simple thing so complicated. Simple is beautiful; I'd
prefer to rename the mem_hotplug/probe interface to mem_hotplug/add_memory.
/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/add_memory (rename probe as add_memory)
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-02 2:35 ` Zheng, Shaohui
@ 2010-12-02 23:34 ` David Rientjes
2010-12-06 1:22 ` Zheng, Shaohui
0 siblings, 1 reply; 22+ messages in thread
From: David Rientjes @ 2010-12-02 23:34 UTC (permalink / raw)
To: Zheng, Shaohui
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
lethal@linux-sh.org, Andi Kleen, Dave Hansen, Greg KH,
Li, Haicheng
On Thu, 2 Dec 2010, Zheng, Shaohui wrote:
> Why should we add so many interfaces for memory hotplug emulation?
Because they are functionally different from real memory hotplug and we
want to support different configurations such as mapping memory to a
different node id or onlining physical nodes that don't exist.
They are in debugfs because the emulation, unlike real memory hotplug, is
used only for testing and debugging.
> If so, we would have to create both sysfs and debugfs
> entries for an online node, which adds redundant code logic.
>
We do not need sysfs triggers for onlining a node, that already happens
automatically if the memory that is being onlined has a hotpluggable node
entry in the SRAT that has an offline node id.
> We need not make a simple thing so complicated. Simple is beautiful; I'd prefer to rename the mem_hotplug/probe
> interface to mem_hotplug/add_memory.
>
> /sys/kernel/debug/mem_hotplug/add_node (already exists)
> /sys/kernel/debug/mem_hotplug/add_memory (rename probe as add_memory)
>
No, add_memory would then require these bizarre lines that you've been
parsing like
echo 'physical_addr=0x80000000 node_id=3' > /sys/kernel/debug/mem_hotplug/add_memory
which is unnecessary if you introduce my proposal for per-node debugfs
directories similar to that under /sys/devices/system/node that is
extendable later if we add additional per-node triggers under
CONFIG_DEBUG_FS.
Adding /sys/kernel/debug/mem_hotplug/node2/add_memory that you write a
physical address to is a much more robust, simple, and extendable
interface.
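
The parsing burden mentioned above can be illustrated with a short sketch that
translates a combined line into the equivalent per-node write. The function
name to_per_node is hypothetical; the key names physical_addr and node_id are
the ones used in the example line in this message, and the per-node path is
the proposed layout, not an existing interface.

```shell
#!/bin/sh
# Hypothetical translator: turn a combined line such as
#   physical_addr=0x80000000 node_id=3
# into the equivalent write under the proposed per-node layout.
to_per_node() {
    addr=
    nid=
    for field in $1; do
        case "$field" in
            physical_addr=*) addr=${field#physical_addr=} ;;
            node_id=*)       nid=${field#node_id=} ;;
            *) echo "unknown field: $field" >&2; return 1 ;;
        esac
    done
    # Both keys are required; reject incomplete input.
    [ -n "$addr" ] && [ -n "$nid" ] || { echo "missing field" >&2; return 1; }
    echo "echo $addr > /sys/kernel/debug/mem_hotplug/node$nid/add_memory"
}

to_per_node 'physical_addr=0x80000000 node_id=3'
```

Every branch of the case statement is validation that a single combined file
would need and that the per-node files avoid.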
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe
2010-12-02 23:34 ` David Rientjes
@ 2010-12-06 1:22 ` Zheng, Shaohui
0 siblings, 0 replies; 22+ messages in thread
From: Zheng, Shaohui @ 2010-12-06 1:22 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
lethal@linux-sh.org, Andi Kleen, Dave Hansen, Greg KH,
Li, Haicheng, shaohui.zheng@linux.intel.com
After introducing the per-node interface, the following directives can be avoided:
echo '0x80000000,3' > /sys/kernel/debug/mem_hotplug/add_memory
echo 'physical_addr=0x80000000 node_id=3' > /sys/kernel/debug/mem_hotplug/add_memory
I have already posted a draft implementation in another thread and am waiting for comments. Thanks for the proposal.
Thanks & Regards,
Shaohui
^ permalink raw reply [flat|nested] 22+ messages in thread
Thread overview: 22+ messages
2010-11-30 7:13 [0/8, v6] NUMA Hotplug Emulator(v6) - Introduction & Feedbacks shaohui.zheng
2010-11-30 7:13 ` [1/8, v6] NUMA Hotplug Emulator: documentation shaohui.zheng
2010-12-01 0:19 ` David Rientjes
2010-12-01 0:36 ` Shaohui Zheng
2010-11-30 7:13 ` [2/8, v6] NUMA Hotplug Emulator: Add numa=possible option shaohui.zheng
2010-12-02 1:06 ` David Rientjes
2010-12-01 23:48 ` Shaohui Zheng
2010-11-30 7:13 ` [3/8, v6] NUMA Hotplug Emulator: Add node hotplug emulation shaohui.zheng
2010-11-30 7:13 ` [4/8, v6] NUMA Hotplug Emulation: Abstract cpu register functions shaohui.zheng
2010-11-30 7:13 ` [5/8, v6] NUMA Hotplug Emulator: support cpu probe/release in x86_64 shaohui.zheng
2010-11-30 7:13 ` [6/8, v6] NUMA Hotplug Emulator: Fake CPU socket with logical CPU on x86 shaohui.zheng
2010-11-30 7:13 ` [7/8, v6] NUMA Hotplug Emulator: extend memory probe interface to support NUMA shaohui.zheng
2010-12-02 0:55 ` David Rientjes
2010-11-30 7:13 ` [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe shaohui.zheng
2010-12-02 0:57 ` David Rientjes
2010-12-01 23:45 ` Shaohui Zheng
2010-12-02 1:21 ` David Rientjes
[not found] <A24AE1FFE7AEC5489F83450EE98351BF288D88D224@shsmsx502.ccr.corp.intel.com>
2010-12-02 0:27 ` Shaohui Zheng
2010-12-02 2:13 ` David Rientjes
2010-12-02 2:35 ` Zheng, Shaohui
2010-12-02 23:34 ` David Rientjes
2010-12-06 1:22 ` Zheng, Shaohui