linux-mm.kvack.org archive mirror
* [patch 2/2] mm: add node hotplug emulation
  2010-11-21  2:28                     ` [patch 1/2] x86: add numa=possible command line option David Rientjes
@ 2010-11-21  2:28                       ` David Rientjes
  2010-11-21 17:34                         ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2010-11-21  2:28 UTC (permalink / raw)
  To: Andrew Morton, Greg Kroah-Hartman
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Shaohui Zheng,
	Paul Mundt, Andi Kleen, Yinghai Lu, Haicheng Li, Randy Dunlap,
	linux-kernel, linux-mm, x86


Add an interface to allow new nodes to be added when performing memory
hot-add.  This provides a convenient interface to test memory hotplug
notifier callbacks and surrounding hotplug code when new nodes are
onlined without actually having a machine with such hotpluggable SRAT
entries.

This adds a new interface at /sys/devices/system/memory/add_node that
behaves in a similar way to the memory hot-add "probe" interface.  Its
format is size@start, where "size" is the size of the new node to be
added and "start" is the physical address of the new memory.
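
For illustration only, the size@start format can be split apart in userspace
before it is written to the file.  The sketch below is plain shell, not the
kernel's parser: size_to_bytes and parse_add_node are hypothetical helper
names, and only the K, M and G suffixes are handled (on the kernel side the
patch uses memparse() for the size and simple_strtoull() for the start
address).

```shell
# Hypothetical helper: convert a size such as "128M" or "1G" to bytes.
# Only K, M and G suffixes are handled in this sketch.
size_to_bytes() {
	num=${1%[KkMmGg]}
	case $1 in
	*[Kk]) echo $(( $num * 1024 )) ;;
	*[Mm]) echo $(( $num * 1024 * 1024 )) ;;
	*[Gg]) echo $(( $num * 1024 * 1024 * 1024 )) ;;
	*)     echo $(( $num )) ;;
	esac
}

# Hypothetical helper: split "size@start" and print
# "<size in bytes> <start address in decimal>".
parse_add_node() {
	case $1 in
	*@*) ;;
	*) echo "expected size@start" >&2; return 1 ;;
	esac
	size=${1%%@*}
	start=${1#*@}
	echo "$(size_to_bytes "$size") $(( $start ))"
}

parse_add_node 128M@0x80000000
```

The first field printed is the node size in bytes and the second is the
physical start address in decimal.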

The new node id is a currently offline, but possible, node.  The bit must
be set in node_possible_map so that nr_node_ids is sized appropriately.

For emulation on x86, for example, it would be possible to set aside
memory for hotplugged nodes (say, anything above 2G) and to add an
additional three nodes as being possible on boot with

	mem=2G numa=possible=3

and then creating a new 128M node at runtime:

	# echo 128M@0x80000000 > /sys/devices/system/memory/add_node
	On node 1 totalpages: 0
	init_memory_mapping: 0000000080000000-0000000088000000
	 0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined.  If this
memory represents memory section 16, for example:

	# echo online > /sys/devices/system/memory/memory16/state
	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
	Policy zone: Normal

 [ The memory section(s) mapped to a particular node are visible via
   /sys/devices/system/node/node1, in this example. ]
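
The section number used above follows from address arithmetic.  As a sketch,
assuming 128M memory sections (the size of the range mapped in the
init_memory_mapping output above), the section index is the physical address
divided by the section size.  section_of is a hypothetical helper name; the
section size on a given machine may differ and should be checked rather than
assumed.

```shell
# Hypothetical helper: print the index of the memory section that contains
# a physical address.
# Usage: section_of <phys_addr> <section_size_in_bytes>
section_of() {
	echo $(( $1 / $2 ))
}

# Assumption: 128M sections, matching the example in this message.
section_of 0x80000000 $((128 * 1024 * 1024))
```

With these inputs the result is 16, which is why the example onlines
memory16.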

The new node is now hotplugged and ready for testing.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/memory-hotplug.txt |   24 ++++++++++++++++++++++++
 drivers/base/memory.c            |   36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -18,6 +18,7 @@ be changed often.
 4. Physical memory hot-add phase
   4.1 Hardware(Firmware) Support
   4.2 Notify memory hot-add event by hand
+  4.3 Node hotplug emulation
 5. Logical Memory hot-add phase
   5.1. State of memory
   5.2. How to online memory
@@ -215,6 +216,29 @@ current implementation). You'll have to online memory by yourself.
 Please see "How to online memory" in this text.
 
 
+4.3 Node hotplug emulation
+------------
+It is possible to test node hotplug by assigning the newly added memory to a
+new node id when using a different interface with a similar behavior to
+"probe" described in section 4.2.  If a node id is possible (there are bits
+in /sys/devices/system/node/possible that are not online), then it may be
+used to emulate a newly added node as the result of memory hotplug by using
+the "add_node" interface.
+
+The add_node interface is located at
+/sys/devices/system/memory/add_node
+
+You can create a new node of a specified size starting at the physical
+address of new memory by
+
+% echo size@start_address_of_new_memory > /sys/devices/system/memory/add_node
+
+Where "size" can be represented in megabytes or gigabytes (for example,
+"128M" or "1G").  The minimum size is that of a memory section.
+
+Once the new node has been added, it is possible to online the memory by
+toggling the "state" of its memory section(s) as described in section 5.1.
+
 
 ------------------------------
 5. Logical Memory hot-add phase
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -353,10 +353,44 @@ memory_probe_store(struct class *class, struct class_attribute *attr,
 }
 static CLASS_ATTR(probe, S_IWUSR, NULL, memory_probe_store);
 
+static ssize_t
+memory_add_node_store(struct class *class, struct class_attribute *attr,
+		      const char *buf, size_t count)
+{
+	nodemask_t mask;
+	u64 start, size;
+	char *p;
+	int nid;
+	int ret;
+
+	size = memparse(buf, &p);
+	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))
+		return -EINVAL;
+	if (*p != '@')
+		return -EINVAL;
+
+	start = simple_strtoull(p + 1, NULL, 0);
+
+	nodes_andnot(mask, node_possible_map, node_online_map);
+	nid = first_node(mask);
+	if (nid == MAX_NUMNODES)
+		return -EINVAL;
+
+	ret = add_memory(nid, start, size);
+	return ret ? ret : count;
+}
+static CLASS_ATTR(add_node, S_IWUSR, NULL, memory_add_node_store);
+
 static int memory_probe_init(void)
 {
-	return sysfs_create_file(&memory_sysdev_class.kset.kobj,
+	int err;
+
+	err = sysfs_create_file(&memory_sysdev_class.kset.kobj,
 				&class_attr_probe.attr);
+	if (err)
+		return err;
+	return sysfs_create_file(&memory_sysdev_class.kset.kobj,
+				&class_attr_add_node.attr);
 }
 #else
 static inline int memory_probe_init(void)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch 2/2] mm: add node hotplug emulation
  2010-11-21  2:28                       ` [patch 2/2] mm: add node hotplug emulation David Rientjes
@ 2010-11-21 17:34                         ` Greg KH
  2010-11-21 21:48                           ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2010-11-21 17:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Shaohui Zheng, Paul Mundt, Andi Kleen, Yinghai Lu, Haicheng Li,
	Randy Dunlap, linux-kernel, linux-mm, x86

On Sat, Nov 20, 2010 at 06:28:38PM -0800, David Rientjes wrote:
> 
> Add an interface to allow new nodes to be added when performing memory
> hot-add.  This provides a convenient interface to test memory hotplug
> notifier callbacks and surrounding hotplug code when new nodes are
> onlined without actually having a machine with such hotpluggable SRAT
> entries.
> 
> This adds a new interface at /sys/devices/system/memory/add_node that
> behaves in a similar way to the memory hot-add "probe" interface.  Its
> format is size@start, where "size" is the size of the new node to be
> added and "start" is the physical address of the new memory.

Ick, we are trying to clean up the system devices right now which would
prevent this type of tree being added.

> The new node id is a currently offline, but possible, node.  The bit must
> be set in node_possible_map so that nr_node_ids is sized appropriately.
> 
> For emulation on x86, for example, it would be possible to set aside
> memory for hotplugged nodes (say, anything above 2G) and to add an
> additional three nodes as being possible on boot with
> 
> 	mem=2G numa=possible=3
> 
> and then creating a new 128M node at runtime:
> 
> 	# echo 128M@0x80000000 > /sys/devices/system/memory/add_node
> 	On node 1 totalpages: 0
> 	init_memory_mapping: 0000000080000000-0000000088000000
> 	 0080000000 - 0088000000 page 2M
> 
> Once the new node has been added, its memory can be onlined.  If this
> memory represents memory section 16, for example:
> 
> 	# echo online > /sys/devices/system/memory/memory16/state
> 	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
> 	Policy zone: Normal
> 
>  [ The memory section(s) mapped to a particular node are visible via
>    /sys/devices/system/node/node1, in this example. ]
> 
> The new node is now hotplugged and ready for testing.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  Documentation/memory-hotplug.txt |   24 ++++++++++++++++++++++++
>  drivers/base/memory.c            |   36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 59 insertions(+), 1 deletions(-)

When adding sysfs files you need to document it in Documentation/ABI
instead.

But as this is a debugging thing, why not just put it in debugfs
instead?

thanks,

greg k-h


* Re: [patch 2/2] mm: add node hotplug emulation
  2010-11-21 17:34                         ` Greg KH
@ 2010-11-21 21:48                           ` David Rientjes
  0 siblings, 0 replies; 7+ messages in thread
From: David Rientjes @ 2010-11-21 21:48 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Shaohui Zheng, Paul Mundt, Andi Kleen, Yinghai Lu, Haicheng Li,
	Randy Dunlap, linux-kernel, linux-mm, x86

On Sun, 21 Nov 2010, Greg KH wrote:

> But as this is a debugging thing, why not just put it in debugfs
> instead?
> 

Ok, I think Paul had a similar suggestion during the discussion of 
Shaohui's patchset.  I'll move it, thanks.


* Re: [patch 2/2] mm: add node hotplug emulation
       [not found] <A24AE1FFE7AEC5489F83450EE98351BF28723FC4A7@shsmsx502.ccr.corp.intel.com>
@ 2010-11-22  1:47 ` Shaohui Zheng
  2010-11-24  6:45   ` Shaohui Zheng
  2010-11-28  2:00   ` David Rientjes
  0 siblings, 2 replies; 7+ messages in thread
From: Shaohui Zheng @ 2010-11-22  1:47 UTC (permalink / raw)
  To: akpm, gregkh, rientjes
  Cc: mingo, hpa, tglx, lethal, ak, yinghai, randy.dunlap, linux-kernel,
	linux-mm, x86, haicheng.li, haicheng.li, shaohui.zheng,
	shaohui.zheng

On Mon, Nov 22, 2010 at 09:47:02AM +0800, Zheng, Shaohui wrote:
> Add an interface to allow new nodes to be added when performing memory
> hot-add.  This provides a convenient interface to test memory hotplug
> notifier callbacks and surrounding hotplug code when new nodes are
> onlined without actually having a machine with such hotpluggable SRAT
> entries.
> 
> This adds a new interface at /sys/devices/system/memory/add_node that
> behaves in a similar way to the memory hot-add "probe" interface.  Its
> format is size@start, where "size" is the size of the new node to be
> added and "start" is the physical address of the new memory.
> 
> The new node id is a currently offline, but possible, node.  The bit must
> be set in node_possible_map so that nr_node_ids is sized appropriately.
> 
> For emulation on x86, for example, it would be possible to set aside
> memory for hotplugged nodes (say, anything above 2G) and to add an
> additional three nodes as being possible on boot with
> 
> 	mem=2G numa=possible=3
> 
> and then creating a new 128M node at runtime:
> 
> 	# echo 128M@0x80000000 > /sys/devices/system/memory/add_node
> 	On node 1 totalpages: 0
> 	init_memory_mapping: 0000000080000000-0000000088000000
> 	 0080000000 - 0088000000 page 2M

For cpu/memory physical hotplug, we have the unique interface probe/release,
it is the _standard_ interface, it is not only for x86, ppc uses the interface
as well. For node hotplug, it should follow the rule.

You are creating a new interface /sys/devices/system/memory/add_node to add both
memory and node, you are just trying to create DUPLICATED feature with the
memory probe interface, it breaks the rule. 

I did NOT see the feature difference with our emulator patch http://lkml.org/lkml/2010/11/16/740,
you pick up a piece of feature from emulator, and create an other thread. You
are trying to replace the interface with a new one, which is not recommended.
the memory probe interface is already powerful and flexible enough after apply
our patch. What's more important, it keeps the old directives, and it maintains
backwards compatibility.

Add a memory section(128M) to node 3(boots with mem=1024m)

	echo 0x40000000,3 > memory/probe

And more we make it friendly, it is possible to add memory to do

	echo 3g > memory/probe
	echo 1024m,3 > memory/probe

It maintains backwards compatibility.

Another format suggested by Dave Hansen:

	echo physical_address=0x40000000 numa_node=3 > memory/probe

we should not need duplicated interface /sys/devices/system/memory/add_node here.


* Re: [patch 2/2] mm: add node hotplug emulation
  2010-11-22  1:47 ` [patch 2/2] mm: add node hotplug emulation Shaohui Zheng
@ 2010-11-24  6:45   ` Shaohui Zheng
  2010-11-28  2:01     ` David Rientjes
  2010-11-28  2:00   ` David Rientjes
  1 sibling, 1 reply; 7+ messages in thread
From: Shaohui Zheng @ 2010-11-24  6:45 UTC (permalink / raw)
  To: akpm, gregkh, rientjes
  Cc: mingo, hpa, tglx, lethal, ak, yinghai, randy.dunlap, linux-kernel,
	linux-mm, x86, haicheng.li, haicheng.li, shaohui.zheng

On Mon, Nov 22, 2010 at 09:47:06AM +0800, Shaohui Zheng wrote:
> On Mon, Nov 22, 2010 at 09:47:02AM +0800, Zheng, Shaohui wrote:
> 
> For cpu/memory physical hotplug, we have the unique interface probe/release,
> it is the _standard_ interface, it is not only for x86, ppc uses the interface
> as well. For node hotplug, it should follow the rule.
> 
> You are creating a new interface /sys/devices/system/memory/add_node to add both
> memory and node, you are just trying to create DUPLICATED feature with the
> memory probe interface, it breaks the rule. 
> 
> I did NOT see the feature difference with our emulator patch http://lkml.org/lkml/2010/11/16/740,
> you pick up a piece of feature from emulator, and create an other thread. You
> are trying to replace the interface with a new one, which is not recommended.
> the memory probe interface is already powerful and flexible enough after apply
> our patch. What's more important, it keeps the old directives, and it maintains
> backwards compatibility.
> 
> Add a memory section(128M) to node 3(boots with mem=1024m)
> 
> 	echo 0x40000000,3 > memory/probe
> 
> And more we make it friendly, it is possible to add memory to do
> 
> 	echo 3g > memory/probe
> 	echo 1024m,3 > memory/probe
> 
> It maintains backwards compatibility.
> 
> Another format suggested by Dave Hansen:
> 
> 	echo physical_address=0x40000000 numa_node=3 > memory/probe
> 
> we should not need duplicated interface /sys/devices/system/memory/add_node here.

ah, a long time silence.

Does somebody know the status of this patch, is it accepted by the maintainer?
I am not in patch's CC list, so I will not get mail notice when the patch was
accepted by the maintainer.

The other hotplug emulator patches depend on this patch, so I cannot
re-make my patchset while this patch is still pending. Thanks.

-- 
Thanks & Regards,
Shaohui


* Re: [patch 2/2] mm: add node hotplug emulation
  2010-11-22  1:47 ` [patch 2/2] mm: add node hotplug emulation Shaohui Zheng
  2010-11-24  6:45   ` Shaohui Zheng
@ 2010-11-28  2:00   ` David Rientjes
  1 sibling, 0 replies; 7+ messages in thread
From: David Rientjes @ 2010-11-28  2:00 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: akpm, gregkh, mingo, hpa, tglx, lethal, ak, yinghai, randy.dunlap,
	linux-kernel, linux-mm, x86, haicheng.li, haicheng.li,
	shaohui.zheng

On Mon, 22 Nov 2010, Shaohui Zheng wrote:

> > and then creating a new 128M node at runtime:
> > 
> > 	# echo 128M@0x80000000 > /sys/devices/system/memory/add_node
> > 	On node 1 totalpages: 0
> > 	init_memory_mapping: 0000000080000000-0000000088000000
> > 	 0080000000 - 0088000000 page 2M
> 
> For cpu/memory physical hotplug, we have the unique interface probe/release,
> it is the _standard_ interface, it is not only for x86, ppc uses the interface
> as well. For node hotplug, it should follow the rule.
> 
> You are creating a new interface /sys/devices/system/memory/add_node to add both
> memory and node, you are just trying to create DUPLICATED feature with the
> memory probe interface, it breaks the rule. 
> 

It's not duplicated, the function of add_node is distinct since it maps 
the added memory to a node that wasn't previously defined (for the x86 
case, defined by the SRAT).  I think this is better than an additional 
abstraction layer that remaps memory to nodes above what the BIOS has 
defined, and there's nothing architecture specific about add_node; if an 
arch can do probe then it can use this new interface.

> I did NOT see the feature difference with our emulator patch http://lkml.org/lkml/2010/11/16/740,
> you pick up a piece of feature from emulator, and create an other thread. You
> are trying to replace the interface with a new one, which is not recommended.
> the memory probe interface is already powerful and flexible enough after apply
> our patch. What's more important, it keeps the old directives, and it maintains
> backwards compatibility.
> 

This achieves the same goal in a much cleaner and generic way.  It doesn't 
replace anything that currently sits in the kernel, instead it competes 
directly with your model for node hotplug emulation.

> Add a memory section(128M) to node 3(boots with mem=1024m)
> 
> 	echo 0x40000000,3 > memory/probe
> 
> And more we make it friendly, it is possible to add memory to do
> 
> 	echo 3g > memory/probe
> 	echo 1024m,3 > memory/probe
> 
> It maintains backwards compatibility.
> 

My patch doesn't break backwards compatibility, it adds a new debugfs file 
that allows you to test node hotplug.

> Another format suggested by Dave Hansen:
> 
> 	echo physical_address=0x40000000 numa_node=3 > memory/probe
> 
> we should not need duplicated interface /sys/devices/system/memory/add_node here.
> 

We don't need to define a node id, we only need to ensure that a possible 
node is not yet online and use it; we don't gain anything by trying to 
hotplug node ids in a sparse or interleaved way (although it is certainly 
possible with a combination of my patch and CONFIG_MEMORY_HOTREMOVE).
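
As a userspace illustration of that selection rule (a sketch, not the patch's
code): the patch computes "possible and not online" with nodes_andnot() and
takes first_node() of the result.  The same choice can be previewed from node
lists written in range notation.  expand_nodes and first_offline_node are
hypothetical helper names, and the "0-3" / "0,2-3" list format is an
assumption about how the possible and online masks are printed.

```shell
# Hypothetical helper: expand a node list such as "0-3" or "0,2-3" into one
# node id per line.
expand_nodes() {
	echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
		[ -n "$lo" ] || continue
		seq "$lo" "${hi:-$lo}"
	done
}

# Hypothetical helper: print the lowest node id that is possible but not
# online; return non-zero if every possible node is already online.
# Usage: first_offline_node <possible_list> <online_list>
first_offline_node() {
	online_ids=$(expand_nodes "$2")
	for nid in $(expand_nodes "$1"); do
		is_online=0
		for on in $online_ids; do
			if [ "$nid" -eq "$on" ]; then
				is_online=1
			fi
		done
		if [ "$is_online" -eq 0 ]; then
			echo "$nid"
			return 0
		fi
	done
	return 1
}

# Four possible nodes, only node 0 online.
first_offline_node "0-3" "0"
```

On a running system the two arguments would presumably come from
/sys/devices/system/node/possible and /sys/devices/system/node/online.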


* Re: [patch 2/2] mm: add node hotplug emulation
  2010-11-24  6:45   ` Shaohui Zheng
@ 2010-11-28  2:01     ` David Rientjes
  0 siblings, 0 replies; 7+ messages in thread
From: David Rientjes @ 2010-11-28  2:01 UTC (permalink / raw)
  To: Shaohui Zheng
  Cc: akpm, gregkh, mingo, hpa, tglx, lethal, ak, yinghai, randy.dunlap,
	linux-kernel, linux-mm, x86, haicheng.li, haicheng.li,
	shaohui.zheng

On Wed, 24 Nov 2010, Shaohui Zheng wrote:

> ah, a long time silence.
> 

Sorry, last week included a holiday in the USA.

> Does somebody know the status of this patch, is it accepted by the maintainer?
> I am not in patch's CC list, so I will not get mail notice when the patch was
> accepted by the maintainer.
> 

Neither of these patches have been merged anywhere yet, you're not missing 
anything :)  If/when Andrew picks it up, I'm quite certain he'll cc you on 
it.

