* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
[not found] <A24AE1FFE7AEC5489F83450EE98351BF2A40FED20A@shsmsx502.ccr.corp.intel.com>
@ 2010-12-09 1:21 ` Shaohui Zheng
2010-12-09 21:29 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: Shaohui Zheng @ 2010-12-09 1:21 UTC (permalink / raw)
To: rientjes
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak, gregkh,
shaohui.zheng, shaohui.zheng
>
> > From: Shaohui Zheng <shaohui.zheng@intel.com>
> >
> > Add an add_memory interface under debugfs for each online node to support
> > memory hotplug emulation. The reserved memory can be added to the desired
> > node with this interface.
> >
> > The layout on debugfs:
> > mem_hotplug/node0/add_memory
> > mem_hotplug/node1/add_memory
> > mem_hotplug/node2/add_memory
> > ...
> >
> > Add a memory section (128M) to node 3 (booted with mem=1024m):
> >
> > echo 0x40000000 > mem_hotplug/node3/add_memory
> >
> > To make it friendlier, memory can also be added with:
> >
> > echo 1024m > mem_hotplug/node3/add_memory
> >
>
> I don't think you should be using memparse() to support this type of
> interface, the standard way of writing memory locations is by writing
> address in hex as the first example does. The idea is to not try to make
> things simpler by introducing multiple ways of doing the same thing but
> rather to standardize on a single interface.
Undoubtedly, hex is the best way to represent a physical address. With memparse(),
though, an address can be written in a much simpler form. It is not the official
way, but it is very convenient when we just want to do some simple tests.
When we reserve memory, memparse() is used to parse the mem=XXX parameter, so using
it here too avoids a complicated translation when adding memory through the
add_memory interface. How about keeping memparse() here, but removing it from the
documentation since it is just for simple testing?
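To make the convenience argument concrete: memparse() accepts a number followed by an optional size suffix, so "1024m" and "0x40000000" name the same address. The userspace function below is a hypothetical re-implementation for illustration only (it handles just K/M/G), not the kernel's source.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical memparse()-style parser: a number (base detected from a
 * 0x/0 prefix, as strtoull(..., 0) does) followed by an optional K, M or
 * G suffix in either case. Illustrative sketch, not the kernel code.
 */
unsigned long long memparse_sketch(const char *s, char **retptr)
{
	char *end;
	unsigned long long val = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;		/* consume the suffix */
		break;
	default:
		break;		/* plain number, no suffix */
	}
	if (retptr)
		*retptr = end;
	return val;
}
```

The same value reaches add_memory() either way; the disagreement in this thread is only about whether a public interface should accept both spellings.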
>
> > CC: David Rientjes <rientjes@google.com>
> > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> > Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> > ---
> > Index: linux-hpe4/mm/memory_hotplug.c
> > ===================================================================
> > --- linux-hpe4.orig/mm/memory_hotplug.c 2010-12-02 12:35:31.557622002 +0800
> > +++ linux-hpe4/mm/memory_hotplug.c 2010-12-06 07:30:36.067622001 +0800
> > @@ -930,6 +930,80 @@
> >
> > static struct dentry *memhp_debug_root;
> >
> > +#ifdef CONFIG_ARCH_MEMORY_PROBE
> > +
> > +static ssize_t add_memory_store(struct file *file, const char __user *buf,
> > + size_t count, loff_t *ppos)
> > +{
> > + u64 phys_addr = 0;
> > + int nid = file->private_data - NULL;
> > + int ret;
> > +
> > + phys_addr = simple_strtoull(buf, NULL, 0);
>
> This isn't doing anything.
>
It should be removed.
> > + printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> > + phys_addr = memparse(buf, NULL);
> > + ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
>
> Does the add_memory() call handle memoryless nodes such that they
> appropriately transition to N_HIGH_MEMORY when memory is added?
For memoryless nodes, this caused OOM issues on older kernel versions, but
memoryless nodes are supported now, and our test results are consistent with
that. The emulator is a tool for reproducing the OOM issue on early kernels.
>
> > +
> > + if (ret)
> > + count = ret;
> > +
> > + return count;
> > +}
> > +
> > +static int add_memory_open(struct inode *inode, struct file *file)
> > +{
> > + file->private_data = inode->i_private;
> > + return 0;
> > +}
> > +
> > +static const struct file_operations add_memory_file_ops = {
> > + .open = add_memory_open,
> > + .write = add_memory_store,
> > + .llseek = generic_file_llseek,
> > +};
> > +
> > +/*
> > + * Create add_memory debugfs entry under specified node
> > + */
> > +static int debugfs_create_add_memory_entry(int nid)
> > +{
> > + char buf[32];
> > + static struct dentry *node_debug_root;
> > +
> > + snprintf(buf, sizeof(buf), "node%d", nid);
> > + node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
>
> This can fail, and if it does then the subsequent debugfs_create_file()
> will be added to root while we don't want, so this needs error handling.
>
I will add error handling code for it.
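A minimal userspace model of the requested error handling, assuming (as in 2010-era kernels) that debugfs_create_dir() reports failure by returning NULL, and that a NULL parent would place the file in the debugfs root. The stub functions and the fail_dir flag are hypothetical; the point is that the file is only created once the per-node directory exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for debugfs objects and calls. */
struct dentry {
	char name[32];
	struct dentry *parent;
};

static struct dentry dirs[8];
static int ndirs;
int fail_dir;                    /* set to 1 to simulate dir creation failing */
struct dentry *last_file_parent; /* where the last "file" was placed */

static struct dentry *stub_create_dir(const char *name, struct dentry *parent)
{
	if (fail_dir || ndirs == 8)
		return NULL;
	snprintf(dirs[ndirs].name, sizeof(dirs[ndirs].name), "%s", name);
	dirs[ndirs].parent = parent;
	return &dirs[ndirs++];
}

static int stub_create_file(const char *name, struct dentry *parent)
{
	(void)name;
	last_file_parent = parent;
	return 1;
}

/* Per-node helper with the missing check on the directory added. */
int create_add_memory_entry(int nid, struct dentry *root)
{
	char buf[32];
	struct dentry *node_dir;

	snprintf(buf, sizeof(buf), "node%d", nid);
	node_dir = stub_create_dir(buf, root);
	if (!node_dir)
		return -ENOMEM;	/* never fall back to creating it in the root */
	if (!stub_create_file("add_memory", node_dir))
		return -ENOMEM;
	return 0;
}
```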
> > +
> > + /* the nid information was represented by the offset of pointer(NULL+nid) */
> > + if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
> > + NULL + nid, &add_memory_file_ops))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +static int __init memory_debug_init(void)
> > +{
> > + int nid;
> > +
> > + if (!memhp_debug_root)
> > + memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> > + if (!memhp_debug_root)
> > + return -ENOMEM;
> > +
> > + for_each_online_node(nid)
> > + debugfs_create_add_memory_entry(nid);
> > +
> > + return 0;
> > +}
> > +
> > +module_init(memory_debug_init);
> > +#else
> > +static int debugfs_create_add_memory_entry(int nid)
> > +{
> > + return 0;
> > +}
> > +#endif /* CONFIG_ARCH_MEMORY_PROBE */
> > +
> > static ssize_t add_node_store(struct file *file, const char __user *buf,
> > size_t count, loff_t *ppos)
> > {
> > @@ -960,6 +1034,8 @@
> > return -ENOMEM;
> >
> > ret = add_memory(nid, start, size);
> > +
> > + debugfs_create_add_memory_entry(nid);
> > return ret ? ret : count;
> > }
> >
> > Index: linux-hpe4/Documentation/memory-hotplug.txt
> > ===================================================================
> > --- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-12-02 12:35:31.557622002 +0800
> > +++ linux-hpe4/Documentation/memory-hotplug.txt 2010-12-06 07:39:36.007622000 +0800
> > @@ -19,6 +19,7 @@
> > 4.1 Hardware(Firmware) Support
> > 4.2 Notify memory hot-add event by hand
> > 4.3 Node hotplug emulation
> > + 4.4 Memory hotplug emulation
> > 5. Logical Memory hot-add phase
> > 5.1. State of memory
> > 5.2. How to online memory
> > @@ -239,6 +240,29 @@
> > Once the new node has been added, it is possible to online the memory by
> > toggling the "state" of its memory section(s) as described in section 5.1.
> >
> > +4.4 Memory hotplug emulation
> > +------------
> > +With debugfs, it is possible to test memory hotplug in software: a memory
> > +section can be added to the desired node with the add_memory interface. It is a
> > +much more powerful interface than the "probe" interface described in section 4.2.
> > +
> > +There is an add_memory interface for each online node at the debugfs mount
> > +point.
> > + mem_hotplug/node0/add_memory
> > + mem_hotplug/node1/add_memory
> > + mem_hotplug/node2/add_memory
> > + ...
> > +
> > +Add a memory section (128M) to node 3 (booted with mem=1024m):
> > +
> > + echo 0x40000000 > mem_hotplug/node3/add_memory
> > +
> > +To make it friendlier, memory can also be added with:
> > +
> > + echo 1024m > mem_hotplug/node3/add_memory
> > +
> > +Once the new memory section has been added, its memory can be onlined by
> > +toggling the "state" of the memory section as described in section 5.1.
> >
> > ------------------------------
> > 5. Logical Memory hot-add phase
> >
> > --
> > Thanks & Regards,
> > Shaohui
> >
> >
> >
--
Thanks & Regards,
Shaohui
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-09 1:21 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface Shaohui Zheng
@ 2010-12-09 21:29 ` David Rientjes
2010-12-09 23:57 ` Shaohui Zheng
0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2010-12-09 21:29 UTC (permalink / raw)
To: Shaohui Zheng
Cc: akpm, linux-mm, linux-kernel, haicheng.li, lethal, ak, gregkh,
shaohui.zheng
On Thu, 9 Dec 2010, Shaohui Zheng wrote:
> > I don't think you should be using memparse() to support this type of
> > interface, the standard way of writing memory locations is by writing
> > address in hex as the first example does. The idea is to not try to make
> > things simpler by introducing multiple ways of doing the same thing but
> > rather to standardize on a single interface.
>
> Undoubtedly, hex is the best way to represent a physical address. With memparse(),
> though, an address can be written in a much simpler form. It is not the official
> way, but it is very convenient when we just want to do some simple tests.
>
Testing code should be removed from the patch prior to proposal.
> When we reserve memory, memparse() is used to parse the mem=XXX parameter, so using
> it here too avoids a complicated translation when adding memory through the
> add_memory interface. How about keeping memparse() here, but removing it from the
> documentation since it is just for simple testing?
>
We really don't want a public interface to have undocumented behavior, so
it would be much better to retain the documentation if you choose to keep
the memparse(). I disagree that converting the mem= parameter to hex is
"complicated," however, so I'd prefer that the interface is similar to
that of add_node.
> > > + printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> > > + phys_addr = memparse(buf, NULL);
> > > + ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
> >
> > Does the add_memory() call handle memoryless nodes such that they
> > appropriately transition to N_HIGH_MEMORY when memory is added?
>
> For memoryless nodes, this caused OOM issues on older kernel versions, but
> memoryless nodes are supported now, and our test results are consistent with
> that. The emulator is a tool for reproducing the OOM issue on early kernels.
>
That doesn't address the question. My question is whether or not adding
memory to a memoryless node in this way transitions its state to
N_HIGH_MEMORY in the VM?
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-09 21:29 ` David Rientjes
@ 2010-12-09 23:57 ` Shaohui Zheng
2010-12-10 23:30 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: Shaohui Zheng @ 2010-12-09 23:57 UTC (permalink / raw)
To: David Rientjes
Cc: Shaohui Zheng, akpm, linux-mm, linux-kernel, haicheng.li, lethal,
ak, gregkh
On Thu, Dec 09, 2010 at 01:29:28PM -0800, David Rientjes wrote:
> On Thu, 9 Dec 2010, Shaohui Zheng wrote:
>
> > > I don't think you should be using memparse() to support this type of
> > > interface, the standard way of writing memory locations is by writing
> > > address in hex as the first example does. The idea is to not try to make
> > > things simpler by introducing multiple ways of doing the same thing but
> > > rather to standardize on a single interface.
> >
> > Undoubtedly, hex is the best way to represent a physical address. With memparse(),
> > though, an address can be written in a much simpler form. It is not the official
> > way, but it is very convenient when we just want to do some simple tests.
> >
>
> Testing code should be removed from the patch prior to proposal.
>
> > When we reserve memory, memparse() is used to parse the mem=XXX parameter, so using
> > it here too avoids a complicated translation when adding memory through the
> > add_memory interface. How about keeping memparse() here, but removing it from the
> > documentation since it is just for simple testing?
> >
>
> We really don't want a public interface to have undocumented behavior, so
> it would be much better to retain the documentation if you choose to keep
> the memparse(). I disagree that converting the mem= parameter to hex is
> "complicated," however, so I'd prefer that the interface is similar to
> that of add_node.
>
Okay, I will keep the interface accepting a hex address, similar to add_node.
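A sketch of what a hex-only parser for this interface could look like. The function name, the accepted format (optional 0x prefix, one trailing newline as written by echo) and the section-alignment check are assumptions for illustration, not part of the posted patch; the section size assumes 128M sections (SECTION_SIZE_BITS = 27, as on x86_64).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_SECTION_SIZE (1ULL << 27) /* assumed 128M sections (x86_64) */

/*
 * Hypothetical strict parser: accepts only a hexadecimal address and
 * rejects size suffixes such as "1024m" as well as unaligned addresses.
 */
int parse_hex_addr(const char *buf, unsigned long long *addr)
{
	char *end;
	unsigned long long val;

	/* reject leading whitespace, signs and empty input up front */
	if (!isxdigit((unsigned char)buf[0]))
		return -EINVAL;
	errno = 0;
	val = strtoull(buf, &end, 16);	/* base 16 also accepts a 0x prefix */
	if (errno)
		return -EINVAL;
	if (*end == '\n')		/* tolerate the newline added by echo */
		end++;
	if (*end != '\0')		/* e.g. the 'm' in "1024m" */
		return -EINVAL;
	if (val & (SKETCH_SECTION_SIZE - 1))
		return -EINVAL;		/* not section aligned */
	*addr = val;
	return 0;
}
```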
> > > > + printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> > > > + phys_addr = memparse(buf, NULL);
> > > > + ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
> > >
> > > Does the add_memory() call handle memoryless nodes such that they
> > > appropriately transition to N_HIGH_MEMORY when memory is added?
> >
> > For memoryless nodes, this caused OOM issues on older kernel versions, but
> > memoryless nodes are supported now, and our test results are consistent with
> > that. The emulator is a tool for reproducing the OOM issue on early kernels.
> >
>
> That doesn't address the question. My question is whether or not adding
> memory to a memoryless node in this way transitions its state to
> N_HIGH_MEMORY in the VM?
I guess you are talking about memory hotplug on x86_32. Memory hotplug is NOT
well supported on x86_32, and add_memory() does not consider this situation.
On 64-bit, N_HIGH_MEMORY == N_NORMAL_MEMORY, so no transition is needed.
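For reference, the 2010-era enum node_states in include/linux/nodemask.h aliases the two states when CONFIG_HIGHMEM is off; a trimmed userspace copy is below, with a simplified per-node bit array added for illustration. The alias only means both names refer to the same bit. A memoryless node still has that bit clear, which is what the follow-up question is about.

```c
#include <assert.h>

/* Trimmed copy of the 2010-era enum node_states (comments shortened). */
enum node_states {
	N_POSSIBLE,		/* the node could become online at some point */
	N_ONLINE,		/* the node is online */
	N_NORMAL_MEMORY,	/* the node has regular memory */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* the node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
	N_CPU,			/* the node has one or more cpus */
	NR_NODE_STATES
};

/* Simplified per-node state bits (hypothetical, one word per node). */
unsigned int node_state_bits[4];

int node_has_state(int nid, enum node_states s)
{
	return (node_state_bits[nid] >> s) & 1u;
}
```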
--
Thanks & Regards,
Shaohui
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-09 23:57 ` Shaohui Zheng
@ 2010-12-10 23:30 ` David Rientjes
2010-12-13 2:09 ` Shaohui Zheng
0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2010-12-10 23:30 UTC (permalink / raw)
To: Shaohui Zheng
Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
Andi Kleen, Greg Kroah-Hartman
On Fri, 10 Dec 2010, Shaohui Zheng wrote:
> > That doesn't address the question. My question is whether or not adding
> > memory to a memoryless node in this way transitions its state to
> > N_HIGH_MEMORY in the VM?
> I guess you are talking about memory hotplug on x86_32. Memory hotplug is NOT
> well supported on x86_32, and add_memory() does not consider this situation.
>
> On 64-bit, N_HIGH_MEMORY == N_NORMAL_MEMORY, so no transition is needed.
>
One more time :) Memoryless nodes do not have their bit set in
N_HIGH_MEMORY. When memory is added to a memoryless node with this new
interface, does the bit get set?
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-10 23:30 ` David Rientjes
@ 2010-12-13 2:09 ` Shaohui Zheng
2010-12-13 20:56 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: Shaohui Zheng @ 2010-12-13 2:09 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
Andi Kleen, Greg Kroah-Hartman
On Fri, Dec 10, 2010 at 03:30:38PM -0800, David Rientjes wrote:
> On Fri, 10 Dec 2010, Shaohui Zheng wrote:
>
> > > That doesn't address the question. My question is whether or not adding
> > > memory to a memoryless node in this way transitions its state to
> > > N_HIGH_MEMORY in the VM?
> > I guess you are talking about memory hotplug on x86_32. Memory hotplug is NOT
> > well supported on x86_32, and add_memory() does not consider this situation.
> >
> > On 64-bit, N_HIGH_MEMORY == N_NORMAL_MEMORY, so no transition is needed.
> >
>
> One more time :) Memoryless nodes do not have their bit set in
> N_HIGH_MEMORY. When memory is added to a memoryless node with this new
> interface, does the bit get set?
When the debugfs add_node interface is used to add a fake node, the node and its
memory sections are created, but the memory sections are still __offline__, so
the newly added node is still a memoryless node. The debugfs add_memory interface
does something similar to add_node; it just adds memory to an existing node.

The transition to N_HIGH_MEMORY does not happen in either of these two
interfaces. It happens when the memory is onlined through the sysfs interface
/sys/devices/system/memory/memoryXX/state.

This is the code path:

  store_mem_state
    -> memory_block_change_state
      -> memory_block_action
        -> online_pages

  if (onlined_pages) {
          kswapd_run(zone_to_nid(zone));
          node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
  }

Does this address your question? Thanks.
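A toy userspace model of the sequence described above: adding a section only registers offline memory, and the node gains N_HIGH_MEMORY (and kswapd) only when pages are actually onlined. All names here are hypothetical simplifications, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Toy per-node bookkeeping; the real kernel tracks far more. */
struct toy_node {
	unsigned long present_offline_pages;	/* added but still offline */
	unsigned long online_pages;
	bool has_high_memory;			/* stands in for N_HIGH_MEMORY */
	bool kswapd_running;
};

struct toy_node toy_nodes[MAX_NODES];

/* Like add_memory(): registers the memory, but it stays offline. */
void toy_add_memory(int nid, unsigned long nr_pages)
{
	toy_nodes[nid].present_offline_pages += nr_pages;
}

/* Like online_pages(): the node state changes only if pages came online. */
void toy_online_pages(int nid, unsigned long nr_pages)
{
	struct toy_node *n = &toy_nodes[nid];
	unsigned long onlined_pages = nr_pages < n->present_offline_pages ?
				      nr_pages : n->present_offline_pages;

	n->present_offline_pages -= onlined_pages;
	n->online_pages += onlined_pages;
	if (onlined_pages) {
		n->kswapd_running = true;
		n->has_high_memory = true;
	}
}
```

With 4K pages, 32768 pages is one 128M section.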
--
Thanks & Regards,
Shaohui
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-13 2:09 ` Shaohui Zheng
@ 2010-12-13 20:56 ` David Rientjes
0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2010-12-13 20:56 UTC (permalink / raw)
To: Shaohui Zheng
Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
Andi Kleen, Greg Kroah-Hartman
On Mon, 13 Dec 2010, Shaohui Zheng wrote:
> The transition to N_HIGH_MEMORY does not happen in either of these two
> interfaces. It happens when the memory is onlined through the sysfs interface
> /sys/devices/system/memory/memoryXX/state.
>
> This is the code path:
>
>   store_mem_state
>     -> memory_block_change_state
>       -> memory_block_action
>         -> online_pages
>
>   if (onlined_pages) {
>           kswapd_run(zone_to_nid(zone));
>           node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
>   }
>
> Does this address your question? Thanks.
>
Ok, thanks!
* [0/7,v8] NUMA Hotplug Emulator (v8)
@ 2010-12-07 1:00 shaohui.zheng
2010-12-07 1:00 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface shaohui.zheng
0 siblings, 1 reply; 8+ messages in thread
From: shaohui.zheng @ 2010-12-07 1:00 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh
* PATCHSET INTRODUCTION
patch 1: Documentation.
patch 2: Adds a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug.
patch 3: Add node hotplug emulation, introduce debugfs node/add_node interface
patch 4: Abstract the CPU register functions and make these interfaces friendly to
CPU hotplug emulation.
patch 5: Support CPU probe/release on x86, providing a software method to hot-add
and hot-remove CPUs through a sysfs interface.
patch 6: Put each logical CPU in a fake CPU socket on x86, to prevent the scheduling
domains from building an incorrect hierarchy.
patch 7: Implement per-node add_memory debugfs interface
* FEEDBACK & RESPONSES
v8:
Reconsidered David's proposal and accepted the per-node add_memory interface in
debugfs (patch 7).
v7:
David: We don't need two different interfaces, one in sysfs and one in debugfs,
to hotplug memory.
Response: We use debugfs for memory hotplug emulation only and make no
modifications to the sysfs memory probe interface, so we removed the original
patch 7 from the patchset.
David: Suggests new probe files in debugfs for each online node:
/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/node0/add_memory
/sys/kernel/debug/mem_hotplug/node1/add_memory
Response: There is no need to make a simple thing so complicated; we'd prefer to
rename the mem_hotplug/probe interface to mem_hotplug/add_memory.
/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/add_memory (rename probe as add_memory)
v6:
Greg KH: Suggests using the interface mem_hotplug/add_node
David: Agrees with Greg's suggestion
Response: We moved the interface from node/add_node to mem_hotplug/add_node, and we
also moved the memory/probe interface to mem_hotplug/probe since both are related
to memory hotplug.
Kletnieks Valdis: Suggests renumbering the patch series and moving patch 8/8 to
patch 1/8.
Response: Moved patch 8/8 to patch 1/8; we will include the full description in 0/8
when we send patches in the future.
v5:
David: Suggests a more flexible method for node hotplug emulation. After
reviewing our two versions of the emulator implementation, David provided a better
solution that addresses both the flexibility and the memory-wasting issues:
add a numa=possible=<N> command line option, provide the sysfs interface
/sys/devices/system/node/add_node, and move the interface to debugfs as
/sys/kernel/debug/hotplug/add_node after feedback from the community.
Greg KH: Suggests moving the interface from hotplug/add_node to node/add_node
Response: Accepted David's numa=possible=<N> command line option. After some
discussion, David agreed to add his patch to our patchset; thanks for David's
solution (patch 1).
David's original interface /sys/kernel/debug/hotplug/add_node is not very clear for
node hotplug emulation, so we accepted Greg's suggestion and moved the interface to
node/add_node (patch 2).
Dave Hansen: For memory hotplug, Dave recalled Greg KH's advice and suggested using
configfs instead of sysfs. After learning that it is just for testing purposes,
Dave thought debugfs would be best.
Response: The memory probe sysfs interface already exists; I'd like to keep it and
extend it to support adding memory to a specified node (patch 6).
We accepted Dave's suggestion and implemented the memory probe interface in
debugfs (patch 7).
Randy Dunlap: Corrected many grammatical errors in our documentation (patch 8).
Response: Thanks to Randy for the careful review; they have been corrected.
v4:
Split out the CPU hotplug emulation code since David has sent a patchset for node hotplug emulation.
v3 & v2:
1) Patch 0
Balbir & Greg: Suggest using git/quilt to manage and send the patchset.
Response: Thanks for the recommendation. With help from Fengguang, I got quilt
working; it is a great tool.
2) Patch 2
Jaswinder Singh: if (hidden_num) is not required in patch 2
Response: Good catch; it was removed in v2.
3) Patch 3
Dave Hansen: Suggests creating a dedicated sysfs file for each possible node.
Greg: How big would this "list" be? What will it look like exactly?
Haicheng: It should follow "one value per file". It is intended to show the
acceptable parameters.
For example, if we have 4 fake offline nodes, nodes 2-5, then:
$ cat /sys/devices/system/node/probe
2-5
Then user hotadds node3 to system:
$ echo 3 > /sys/devices/system/node/probe
$ cat /sys/devices/system/node/probe
2,4-5
Greg: As you are trying to add a new sysfs file, please create the matching
Documentation/ABI/ file as well.
Response: We missed it; it was added in v2.
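The range notation in Haicheng's example ("2-5", then "2,4-5") comes from collapsing consecutive node IDs. The helper below is a hypothetical userspace sketch of that formatting, similar in spirit to the kernel's bitmap list printing but not its code; it assumes node IDs fit in a 64-bit mask.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the set bits of mask (node IDs 0..63)
 * as a comma-separated list of ranges, e.g. 0x3c -> "2-5".
 */
void format_node_list(unsigned long long mask, char *out, size_t len)
{
	size_t pos = 0;
	int nid = 0;

	out[0] = '\0';
	while (nid < 64) {
		if (!(mask >> nid & 1ULL)) {
			nid++;
			continue;
		}
		int start = nid;
		/* extend the run while the next node is also set */
		while (nid + 1 < 64 && (mask >> (nid + 1) & 1ULL))
			nid++;
		int n = (start == nid)
			? snprintf(out + pos, len - pos, "%s%d",
				   pos ? "," : "", start)
			: snprintf(out + pos, len - pos, "%s%d-%d",
				   pos ? "," : "", start, nid);
		if (n < 0 || (size_t)n >= len - pos)
			return;		/* output truncated; stop here */
		pos += (size_t)n;
		nid++;
	}
}
```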
Patch 4 & 5:
Paul Mundt: This looks like an incredibly painful interface. How about scrapping all
of this _emu() mess and just reworking the register_cpu() interface?
Response: Accepted Paul's suggestion and removed the cpu _emu functions.
Patch 7:
Dave Hansen: If we're going to put multiple values into the file now and
add to the ABI, can we be more explicit about it?
echo "physical_address=0x40000000 numa_node=3" > memory/probe
Response: Dave's new interface was accepted, and we still keep the old
format for compatibility. These interfaces were documented in
Documentation/ABI in v2.
Greg: Suggests using configfs instead of sysfs for the memory probe interface
Andi: This is a debugging interface. It doesn't need to have the
most pretty interface in the world, because it will be only used for
QA by a few people. It's just a QA interface, not the next generation
of POSIX.
Response: We still keep it as a sysfs interface since the node/cpu/memory probe
interfaces are all in sysfs; we can create another group of patches to support
configfs if there is a strong requirement in the future.
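Dave's explicit key=value format is straightforward to parse strictly. A minimal userspace sketch using sscanf() follows; the function name and the choice to require both keys in this fixed order are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical parser for "physical_address=<hex> numa_node=<int>".
 * Returns 0 on success, -1 if a key is missing, malformed, or followed
 * by anything other than trailing spaces/newline.
 */
int parse_probe_line(const char *line, unsigned long long *addr, int *nid)
{
	int consumed = 0;

	if (sscanf(line, "physical_address=%llx numa_node=%d%n",
		   addr, nid, &consumed) != 2 || *nid < 0)
		return -1;
	/* allow only trailing whitespace, e.g. the newline from echo */
	while (line[consumed] == ' ' || line[consumed] == '\n')
		consumed++;
	return line[consumed] == '\0' ? 0 : -1;
}
```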
v1:
The RFC version of the NUMA Hotplug Emulator.
* WHAT IS HOTPLUG EMULATOR
The NUMA hotplug emulator is the collective name for this hotplug emulation: it
emulates NUMA node hotplug in a purely software way. It is intended to help
people easily debug and test node/CPU/memory hotplug related code on a machine
without NUMA hotplug support, even a UMA machine.
The emulator provides a mechanism to emulate the process of physical CPU/memory
hot-add, which makes it possible for kernel developers to debug CPU and memory
hotplug on machines without NUMA support. It offers interfaces for CPU and memory
hotplug testing.
* WHY DO WE USE HOTPLUG EMULATOR
We have been focusing on hotplug emulation for a few months. The emulator has
helped our team reproduce all the major hotplug bugs, and it plays an important
role in hotplug code quality assurance. Thanks to the hotplug emulator, we have
already moved most of our debugging work to a virtual environment.
* Principles & Usages
The NUMA hotplug emulator includes three parts: node, CPU and memory hotplug emulation.
1) Node hotplug emulation:
Adds a numa=possible=<N> command line option to set an additional N nodes as
being possible for memory hotplug. This set of possible nodes controls
nr_node_ids and the sizes of several dynamically allocated node arrays.
This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.
For emulation on x86, it would be possible to set aside memory for hotplugged
nodes (say, anything above 2G) and to add an additional four nodes as being
possible on boot with
mem=2G numa=possible=4
and then creating a new 128M node at runtime:
# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
On node 1 totalpages: 0
init_memory_mapping: 0000000080000000-0000000088000000
0080000000 - 0088000000 page 2M
Once the new node has been added, its memory can be onlined. If this
memory represents memory section 16, for example:
# echo online > /sys/devices/system/memory/memory16/state
Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
Policy zone: Normal
[ The memory section(s) mapped to a particular node are visible via
/sys/devices/system/node/node1, in this example. ]
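The arithmetic behind "memory section 16" in this example: with 128M sections (SECTION_SIZE_BITS = 27 on x86_64, assumed here), the section number is the physical start address shifted right by 27, and 0x80000000 >> 27 = 16. The userspace sketch below splits a size@start argument and derives the sections it covers; the helper is hypothetical and only accepts a decimal size in megabytes with an M suffix.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_SECTION_SIZE_BITS 27	/* assumed: 128M sections, x86_64 */

/*
 * Hypothetical parser for an add_node-style argument "<size>M@<hex start>".
 * On success stores the first section number and the number of sections.
 */
int sections_for_add_node(const char *arg, unsigned long *first,
			  unsigned long *count)
{
	unsigned long long size_mb, start;

	if (sscanf(arg, "%lluM@%llx", &size_mb, &start) != 2)
		return -1;
	unsigned long long size = size_mb << 20;
	unsigned long long mask = (1ULL << SKETCH_SECTION_SIZE_BITS) - 1;

	if (size == 0 || (size & mask) || (start & mask))
		return -1;	/* must cover whole, aligned sections */
	*first = (unsigned long)(start >> SKETCH_SECTION_SIZE_BITS);
	*count = (unsigned long)(size >> SKETCH_SECTION_SIZE_BITS);
	return 0;
}
```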
2) CPU hotplug emulation:
The emulator reserves CPUs through a kernel boot parameter (set in GRUB); the
reserved CPUs can then be hot-added/hot-removed in software.
When a CPU is hotplugged with the emulator, a logical CPU is used to emulate the
CPU hotplug process. On CPUs that support SMT, several logical CPUs share one
socket, but with the emulator they may end up in different NUMA nodes. We
therefore put such a logical CPU into a fake CPU socket and assign it a unique
phys_proc_id; each fake socket contains only one logical CPU.
  - to hide CPUs
      - use the boot option "maxcpus=N" to hide CPUs;
        N is the number of CPUs initialized at boot
      - use the boot option "cpu_hpe=on" to enable CPU hotplug emulation;
        when cpu_hpe is enabled, the remaining CPUs will not be initialized
  - to hot-add a CPU to a node
        # echo nid > cpu/probe
  - to hot-remove a CPU
        # echo nid > cpu/release
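One way to picture the fake-socket scheme: each hot-added logical CPU gets a phys_proc_id one past the largest ID already in use, so no two emulated CPUs share a socket. This is a hypothetical userspace model; the allocation strategy in the actual patch may differ.

```c
#include <assert.h>

#define MAX_CPUS 16

int phys_proc_id[MAX_CPUS];	/* socket ID per logical CPU */
int nr_cpus;

/* Register a boot CPU with the socket ID reported by the hardware. */
void boot_cpu(int socket)
{
	phys_proc_id[nr_cpus++] = socket;
}

/* Hot-add an emulated CPU into its own, previously unused, fake socket. */
int hotadd_cpu_fake_socket(void)
{
	int max_id = -1;

	if (nr_cpus == MAX_CPUS)
		return -1;
	for (int i = 0; i < nr_cpus; i++)
		if (phys_proc_id[i] > max_id)
			max_id = phys_proc_id[i];
	phys_proc_id[nr_cpus] = max_id + 1;	/* one CPU per fake socket */
	return nr_cpus++;
}
```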
3) Memory hotplug emulation:
The emulator reserves memory before the OS boots; the reserved memory region is
removed from the e820 table. Each online node has an add_memory interface, and
memory can be hot-added via the per-node add_memory debugfs interface.
Memory release is well known to be difficult, and we have no plan for it yet.
  - reserve memory through a kernel boot parameter
        mem=1024m
  - add a memory section to node 3
        # echo 0x40000000 > mem_hotplug/node3/add_memory
    OR
        # echo 1024m > mem_hotplug/node3/add_memory
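The size the patch passes to add_memory() is PAGES_PER_SECTION << PAGE_SHIFT, i.e. exactly one memory section. With the x86_64 values assumed below (4K pages, SECTION_SIZE_BITS = 27), that is 128M, and 0x40000000, the first address above mem=1024m, is the start of section 8. The SKETCH_ macros are local stand-ins for the kernel constants.

```c
#include <assert.h>

/* Assumed x86_64 values; other architectures differ. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_SECTION_SIZE_BITS 27
#define SKETCH_PFN_SECTION_SHIFT (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT)
#define SKETCH_PAGES_PER_SECTION (1UL << SKETCH_PFN_SECTION_SHIFT)

/* Bytes added by one write to add_memory: one section. */
unsigned long long section_bytes(void)
{
	return (unsigned long long)SKETCH_PAGES_PER_SECTION << SKETCH_PAGE_SHIFT;
}

/* Section number that a physical address falls into. */
unsigned long section_nr(unsigned long long phys_addr)
{
	return (unsigned long)(phys_addr >> SKETCH_SECTION_SIZE_BITS);
}
```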
* ACKNOWLEDGMENT
NUMA Hotplug Emulator is the result of a team effort; thanks to all of them.
They are:
Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes and
Yongkang You
Thanks & Regards,
Shaohui
* [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-07 1:00 [0/7,v8] NUMA Hotplug Emulator (v8) shaohui.zheng
@ 2010-12-07 1:00 ` shaohui.zheng
2010-12-08 21:31 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: shaohui.zheng @ 2010-12-07 1:00 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, haicheng.li, lethal, ak, shaohui.zheng, rientjes,
dave, gregkh, Haicheng Li, Shaohui Zheng
[-- Attachment #1: 007-hotplug-emulator-add-memory-debugfs-interface.patch --]
[-- Type: text/plain, Size: 4969 bytes --]
From: Shaohui Zheng <shaohui.zheng@intel.com>
Add an add_memory interface under debugfs for each online node to support memory
hotplug emulation. The reserved memory can be added to the desired node with
this interface.
The layout on debugfs:
mem_hotplug/node0/add_memory
mem_hotplug/node1/add_memory
mem_hotplug/node2/add_memory
...
Add a memory section (128M) to node 3 (booted with mem=1024m):
echo 0x40000000 > mem_hotplug/node3/add_memory
To make it friendlier, memory can also be added with:
echo 1024m > mem_hotplug/node3/add_memory
CC: David Rientjes <rientjes@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
---
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c 2010-12-02 12:35:31.557622002 +0800
+++ linux-hpe4/mm/memory_hotplug.c 2010-12-06 07:30:36.067622001 +0800
@@ -930,6 +930,80 @@
static struct dentry *memhp_debug_root;
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+
+static ssize_t add_memory_store(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ u64 phys_addr = 0;
+ int nid = file->private_data - NULL;
+ int ret;
+
+ phys_addr = simple_strtoull(buf, NULL, 0);
+ printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
+ phys_addr = memparse(buf, NULL);
+ ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+
+ if (ret)
+ count = ret;
+
+ return count;
+}
+
+static int add_memory_open(struct inode *inode, struct file *file)
+{
+ file->private_data = inode->i_private;
+ return 0;
+}
+
+static const struct file_operations add_memory_file_ops = {
+ .open = add_memory_open,
+ .write = add_memory_store,
+ .llseek = generic_file_llseek,
+};
+
+/*
+ * Create add_memory debugfs entry under specified node
+ */
+static int debugfs_create_add_memory_entry(int nid)
+{
+ char buf[32];
+ static struct dentry *node_debug_root;
+
+ snprintf(buf, sizeof(buf), "node%d", nid);
+ node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
+
+ /* the nid information was represented by the offset of pointer(NULL+nid) */
+ if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
+ NULL + nid, &add_memory_file_ops))
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int __init memory_debug_init(void)
+{
+ int nid;
+
+ if (!memhp_debug_root)
+ memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+ if (!memhp_debug_root)
+ return -ENOMEM;
+
+ for_each_online_node(nid)
+ debugfs_create_add_memory_entry(nid);
+
+ return 0;
+}
+
+module_init(memory_debug_init);
+#else
+static int debugfs_create_add_memory_entry(int nid)
+{
+ return 0;
+}
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
+
static ssize_t add_node_store(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -960,6 +1034,8 @@
return -ENOMEM;
ret = add_memory(nid, start, size);
+
+ debugfs_create_add_memory_entry(nid);
return ret ? ret : count;
}
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-12-02 12:35:31.557622002 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt 2010-12-06 07:39:36.007622000 +0800
@@ -19,6 +19,7 @@
4.1 Hardware(Firmware) Support
4.2 Notify memory hot-add event by hand
4.3 Node hotplug emulation
+ 4.4 Memory hotplug emulation
5. Logical Memory hot-add phase
5.1. State of memory
5.2. How to online memory
@@ -239,6 +240,29 @@
Once the new node has been added, it is possible to online the memory by
toggling the "state" of its memory section(s) as described in section 5.1.
+4.4 Memory hotplug emulation
+------------
+With debugfs, it is possible to test memory hotplug with software method, we
+can add memory section to desired node with add_memory interface. It is a much
+more powerful interface than "probe" described in section 4.2.
+
+There is an add_memory interface for each online node at the debugfs mount
+point.
+ mem_hotplug/node0/add_memory
+ mem_hotplug/node1/add_memory
+ mem_hotplug/node2/add_memory
+ ...
+
+Add a memory section(128M) to node 3(boots with mem=1024m)
+
+ echo 0x40000000 > mem_hotplug/node3/add_memory
+
+And more we make it friendly, it is possible to add memory to do
+
+ echo 1024m > mem_hotplug/node3/add_memory
+
+Once the new memory section has been added, it is possible to online the memory
+by toggling the "state" described in section 5.1.
------------------------------
5. Logical Memory hot-add phase
--
Thanks & Regards,
Shaohui
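This is a minimal userspace sketch for driving the interface from a program rather than with echo. It makes two assumptions. First, debugfs is mounted at /sys/kernel/debug, which is the usual convention; the documentation above only gives paths relative to the debugfs mount point. Second, the section size is 128M, matching the example above. build_add_memory_request() and request_add_memory() are hypothetical helpers and are not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumptions, not taken from the patch: a 128M section (as in the
 * documentation example) and the conventional debugfs mount point. */
#define ASSUMED_SECTION_SIZE (128ULL << 20)
#define ASSUMED_DEBUGFS_ROOT "/sys/kernel/debug"

/* Hypothetical helper: build the add_memory path for node `nid` and the
 * hex string to write into it. Returns 0 on success, -1 if the address
 * is not section aligned or a buffer is too small. */
static int build_add_memory_request(int nid, uint64_t phys_addr,
				    char *path, size_t path_len,
				    char *val, size_t val_len)
{
	int n;

	assert(path != NULL && val != NULL);
	if (nid < 0 || phys_addr % ASSUMED_SECTION_SIZE != 0)
		return -1;

	n = snprintf(path, path_len, "%s/mem_hotplug/node%d/add_memory",
		     ASSUMED_DEBUGFS_ROOT, nid);
	if (n < 0 || (size_t)n >= path_len)
		return -1;

	n = snprintf(val, val_len, "0x%" PRIx64 "\n", phys_addr);
	if (n < 0 || (size_t)n >= val_len)
		return -1;
	return 0;
}

/* Hypothetical helper: the equivalent of
 * "echo 0x40000000 > mem_hotplug/node3/add_memory". Needs root and a
 * kernel carrying this patch. Returns 0 on success, -1 on failure. */
static int request_add_memory(int nid, uint64_t phys_addr)
{
	char path[128], val[32];
	FILE *f;
	int ok;

	if (build_add_memory_request(nid, phys_addr, path, sizeof(path),
				     val, sizeof(val)) != 0)
		return -1;

	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(val, f) >= 0;
	/* The short string stays in the stdio buffer until fclose()
	 * flushes it, so a failing add_memory() is reported there. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

The alignment check mirrors the handler, which always adds one section (PAGES_PER_SECTION << PAGE_SHIFT bytes) starting at the written address. The kernel side still decides what it actually accepts.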
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface
2010-12-07 1:00 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface shaohui.zheng
@ 2010-12-08 21:31 ` David Rientjes
0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2010-12-08 21:31 UTC
To: Shaohui Zheng
Cc: Andrew Morton, linux-mm, linux-kernel, haicheng.li, lethal,
Andi Kleen, dave, Greg Kroah-Hartman, Haicheng Li
On Tue, 7 Dec 2010, shaohui.zheng@intel.com wrote:
> From: Shaohui Zheng <shaohui.zheng@intel.com>
>
> Add add_memory interface to support to memory hotplug emulation for each online
> node under debugfs. The reserved memory can be added into desired node with
> this interface.
>
> The layout on debugfs:
> mem_hotplug/node0/add_memory
> mem_hotplug/node1/add_memory
> mem_hotplug/node2/add_memory
> ...
>
> Add a memory section(128M) to node 3(boots with mem=1024m)
>
> echo 0x40000000 > mem_hotplug/node3/add_memory
>
> And more we make it friendly, it is possible to add memory to do
>
> echo 1024m > mem_hotplug/node3/add_memory
>
I don't think you should be using memparse() to support this type of
interface; the standard way of writing memory locations is by writing the
address in hex, as the first example does. The idea is not to try to make
things simpler by introducing multiple ways of doing the same thing but
rather to standardize on a single interface.
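The two quoted examples name the same address. A K/M/G suffix parser such as memparse() reads "1024m" as 1024 << 20 = 0x40000000. The sketch below makes that concrete and contrasts it with a strict hex-only parser of the kind the review asks for. Both memparse_like() and parse_hex_addr() are hypothetical userspace helpers written for illustration. memparse_like() only approximates the kernel's memparse().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical approximation of memparse(): a number in any base that
 * strtoull() accepts, optionally followed by K, M or G (any case). */
static uint64_t memparse_like(const char *s)
{
	char *end;
	uint64_t v;

	assert(s != NULL);
	v = strtoull(s, &end, 0);
	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Hypothetical strict parser: hex only (an optional 0x prefix and the
 * trailing newline added by echo are tolerated); size suffixes, signs
 * and empty input are rejected. Returns 0 on success. */
static int parse_hex_addr(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && out != NULL);
	if (!isxdigit((unsigned char)s[0]))
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 16);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}
```

With a single accepted format, "1024m" becomes an error instead of a second spelling of 0x40000000. In-kernel code would normally use an existing helper, such as kstrtoull() in newer kernels, rather than open-coding this.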
> CC: David Rientjes <rientjes@google.com>
> CC: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: Haicheng Li <haicheng.li@intel.com>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
> ---
> Index: linux-hpe4/mm/memory_hotplug.c
> ===================================================================
> --- linux-hpe4.orig/mm/memory_hotplug.c 2010-12-02 12:35:31.557622002 +0800
> +++ linux-hpe4/mm/memory_hotplug.c 2010-12-06 07:30:36.067622001 +0800
> @@ -930,6 +930,80 @@
>
> static struct dentry *memhp_debug_root;
>
> +#ifdef CONFIG_ARCH_MEMORY_PROBE
> +
> +static ssize_t add_memory_store(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + u64 phys_addr = 0;
> + int nid = file->private_data - NULL;
> + int ret;
> +
> + phys_addr = simple_strtoull(buf, NULL, 0);
This isn't doing anything: phys_addr is overwritten by the memparse() call
two lines below before it is ever used.
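The simple_strtoull() result above is discarded. The sketch below models, in userspace, a parse path that reads the address only once. It also assumes, as for any debugfs write handler, that buf is a __user pointer whose bytes are not NUL-terminated. The data is therefore copied into a bounded local buffer before parsing. The kernel would use copy_from_user() for this; memcpy() stands in for it here. parse_add_memory_write() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace model of the parsing half of add_memory_store().
 * `ubuf`/`count` mimic a write handler's arguments: the bytes are not a
 * C string, so they are copied and terminated before being parsed once,
 * as hex. Returns count on success or a negative errno value. */
static ssize_t parse_add_memory_write(const char *ubuf, size_t count,
				      uint64_t *phys_addr)
{
	char kbuf[32];
	char *end;
	unsigned long long v;

	assert(ubuf != NULL && phys_addr != NULL);
	if (count == 0 || count >= sizeof(kbuf))
		return -EINVAL;
	memcpy(kbuf, ubuf, count);   /* copy_from_user() in the kernel */
	kbuf[count] = '\0';          /* terminate before parsing */

	if (!isxdigit((unsigned char)kbuf[0]))
		return -EINVAL;      /* rejects signs, spaces, empty input */
	errno = 0;
	v = strtoull(kbuf, &end, 16);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n')            /* tolerate the newline echo appends */
		end++;
	if (*end != '\0')
		return -EINVAL;      /* e.g. a "1024m"-style suffix */

	*phys_addr = v;
	/* A real handler would now call add_memory() for this address and
	 * return count on success or the negative error code on failure. */
	return (ssize_t)count;
}
```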
> + printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
> + phys_addr = memparse(buf, NULL);
> + ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
Does the add_memory() call handle memoryless nodes such that they
appropriately transition to N_HIGH_MEMORY when memory is added?
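To make the question concrete, here is a toy model of the invariant being asked about. A node that had no memory must be marked as having memory once pages are added to it; otherwise, code that iterates only over nodes with memory would overlook it. Everything in the model is hypothetical: struct toy_node, toy_add_pages(), and the has_memory flag that stands in for N_HIGH_MEMORY. Whether the real add_memory() path maintains this invariant is exactly what the review asks, and the sketch does not settle it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model for illustration only; none of these names are kernel APIs. */
#define TOY_MAX_NODES 8

struct toy_node {
	bool online;
	bool has_memory;              /* stands in for N_HIGH_MEMORY */
	unsigned long present_pages;
};

static struct toy_node toy_nodes[TOY_MAX_NODES];

/* Add pages to an online node; a previously memoryless node must flip
 * to "has memory" as soon as it gains any pages. */
static int toy_add_pages(int nid, unsigned long nr_pages)
{
	if (nid < 0 || nid >= TOY_MAX_NODES || !toy_nodes[nid].online)
		return -1;
	toy_nodes[nid].present_pages += nr_pages;
	if (nr_pages > 0)
		toy_nodes[nid].has_memory = true;
	return 0;
}
```

With 4K pages, adding one 128M section corresponds to 32768 pages.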
> +
> + if (ret)
> + count = ret;
> +
> + return count;
> +}
> +
> +static int add_memory_open(struct inode *inode, struct file *file)
> +{
> + file->private_data = inode->i_private;
> + return 0;
> +}
> +
> +static const struct file_operations add_memory_file_ops = {
> + .open = add_memory_open,
> + .write = add_memory_store,
> + .llseek = generic_file_llseek,
> +};
> +
> +/*
> + * Create add_memory debugfs entry under specified node
> + */
> +static int debugfs_create_add_memory_entry(int nid)
> +{
> + char buf[32];
> + static struct dentry *node_debug_root;
> +
> + snprintf(buf, sizeof(buf), "node%d", nid);
> + node_debug_root = debugfs_create_dir(buf, memhp_debug_root);
This can fail, and if it does, the subsequent debugfs_create_file() will
create add_memory in the debugfs root, which we don't want, so this needs
error handling.
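One way to add that error handling is sketched below against hypothetical mocks of debugfs_create_dir() and debugfs_create_file(), so that it runs in userspace. The mocks report failure as NULL, matching the NULL check the patch already applies to debugfs_create_file(). A NULL parent means the debugfs root. create_add_memory_entry() is an illustrative name; the -ENOMEM return follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mocks, not the debugfs API: fixed-size pool of entries,
 * NULL on failure, NULL parent == debugfs root. */
#define MOCK_POOL_SIZE 16

struct mock_dentry {
	char name[32];
	struct mock_dentry *parent;
};

static struct mock_dentry mock_pool[MOCK_POOL_SIZE];
static int mock_used;
static int mock_fail_dir;         /* set to 1 to simulate dir failure */
static int mock_files_in_root;    /* files that landed in the root */

static struct mock_dentry *mock_alloc(const char *name,
				      struct mock_dentry *parent)
{
	struct mock_dentry *d;

	if (mock_used == MOCK_POOL_SIZE)
		return NULL;
	d = &mock_pool[mock_used++];
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->parent = parent;
	return d;
}

static struct mock_dentry *mock_create_dir(const char *name,
					   struct mock_dentry *parent)
{
	return mock_fail_dir ? NULL : mock_alloc(name, parent);
}

static struct mock_dentry *mock_create_file(const char *name,
					    struct mock_dentry *parent)
{
	if (parent == NULL)
		mock_files_in_root++;
	return mock_alloc(name, parent);
}

/* Bail out when the per-node directory cannot be created, so that
 * add_memory never ends up in the debugfs root. */
static int create_add_memory_entry(struct mock_dentry *memhp_root, int nid)
{
	char buf[32];
	struct mock_dentry *node_dir;   /* plain local, no static needed */

	snprintf(buf, sizeof(buf), "node%d", nid);
	node_dir = mock_create_dir(buf, memhp_root);
	if (node_dir == NULL)
		return -ENOMEM;

	if (mock_create_file("add_memory", node_dir) == NULL)
		return -ENOMEM;
	return 0;
}
```

The per-node directory pointer is only used inside the function, so the static qualifier in the patch is not needed.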
> +
> + /* the nid information was represented by the offset of pointer(NULL+nid) */
> + if (!debugfs_create_file("add_memory", S_IWUSR, node_debug_root,
> + NULL + nid, &add_memory_file_ops))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static int __init memory_debug_init(void)
> +{
> + int nid;
> +
> + if (!memhp_debug_root)
> + memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> + if (!memhp_debug_root)
> + return -ENOMEM;
> +
> + for_each_online_node(nid)
> + debugfs_create_add_memory_entry(nid);
> +
> + return 0;
> +}
> +
> +module_init(memory_debug_init);
> +#else
> +static debugfs_create_add_memory_entry(int nid)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_ARCH_MEMORY_PROBE */
> +
> static ssize_t add_node_store(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos)
> {
> @@ -960,6 +1034,8 @@
> return -ENOMEM;
>
> ret = add_memory(nid, start, size);
> +
> + debugfs_create_add_memory_entry(nid);
> return ret ? ret : count;
> }
>
> Index: linux-hpe4/Documentation/memory-hotplug.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-12-02 12:35:31.557622002 +0800
> +++ linux-hpe4/Documentation/memory-hotplug.txt 2010-12-06 07:39:36.007622000 +0800
> @@ -19,6 +19,7 @@
> 4.1 Hardware(Firmware) Support
> 4.2 Notify memory hot-add event by hand
> 4.3 Node hotplug emulation
> + 4.4 Memory hotplug emulation
> 5. Logical Memory hot-add phase
> 5.1. State of memory
> 5.2. How to online memory
> @@ -239,6 +240,29 @@
> Once the new node has been added, it is possible to online the memory by
> toggling the "state" of its memory section(s) as described in section 5.1.
>
> +4.4 Memory hotplug emulation
> +------------
> +With debugfs, it is possible to test memory hotplug with software method, we
> +can add memory section to desired node with add_memory interface. It is a much
> +more powerful interface than "probe" described in section 4.2.
> +
> +There is an add_memory interface for each online node at the debugfs mount
> +point.
> + mem_hotplug/node0/add_memory
> + mem_hotplug/node1/add_memory
> + mem_hotplug/node2/add_memory
> + ...
> +
> +Add a memory section(128M) to node 3(boots with mem=1024m)
> +
> + echo 0x40000000 > mem_hotplug/node3/add_memory
> +
> +And more we make it friendly, it is possible to add memory to do
> +
> + echo 1024m > mem_hotplug/node3/add_memory
> +
> +Once the new memory section has been added, it is possible to online the memory
> +by toggling the "state" described in section 5.1.
>
> ------------------------------
> 5. Logical Memory hot-add phase
>
> --
> Thanks & Regards,
> Shaohui
>
>
>
end of thread
Thread overview: 8+ messages
[not found] <A24AE1FFE7AEC5489F83450EE98351BF2A40FED20A@shsmsx502.ccr.corp.intel.com>
2010-12-09 1:21 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface Shaohui Zheng
2010-12-09 21:29 ` David Rientjes
2010-12-09 23:57 ` Shaohui Zheng
2010-12-10 23:30 ` David Rientjes
2010-12-13 2:09 ` Shaohui Zheng
2010-12-13 20:56 ` David Rientjes
2010-12-07 1:00 [0/7,v8] NUMA Hotplug Emulator (v8) shaohui.zheng
2010-12-07 1:00 ` [7/7,v8] NUMA Hotplug Emulator: Implement per-node add_memory debugfs interface shaohui.zheng
2010-12-08 21:31 ` David Rientjes