From: Shaohui Zheng <shaohui.zheng@intel.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
haicheng.li@linux.intel.com, lethal@linux-sh.org,
ak@linux.intel.com, shaohui.zheng@linux.intel.com,
Yinghai Lu <yinghai@kernel.org>,
Haicheng Li <haicheng.li@intel.com>
Subject: Re: [2/8,v3] NUMA Hotplug Emulator: infrastructure of NUMA hotplug emulation
Date: Thu, 18 Nov 2010 12:14:07 +0800
Message-ID: <20101118041407.GA2408@shaohui>
In-Reply-To: <alpine.DEB.2.00.1011171304060.10254@chino.kir.corp.google.com>
On Wed, Nov 17, 2010 at 01:10:50PM -0800, David Rientjes wrote:
> On Wed, 17 Nov 2010, Shaohui Zheng wrote:
>
> > > Hmm, why can't you use numa=hide to hide a specified quantity of memory
> > > from the kernel and then use the add_memory() interface to hot-add the
> > > offlined memory in the desired quantity? In other words, why do you need
> > > to track the offlined nodes with a state?
> > >
> > > The userspace interface would take a desired size of hidden memory to
> > > hot-add and the node id would be the first_unset_node(node_online_map).
> > Yes, it is a good idea; your solution is indeed what we did in our first 2
> > versions. We used mem=memsize to hide memory, and we called the add_memory
> > interface to hot-add offlined memory in the desired quantity. We could also
> > add it to the desired nodes (even though the nodes do not exist yet). It is
> > a very flexible solution.
> >
> > However, this solution was dropped because, once we noticed NUMA emulation,
> > we felt we should reuse it.
> >
>
> I don't understand why that's a requirement, NUMA emulation is a separate
> feature. Although both are primarily used to test and instrument other VM
> and kernel code, NUMA emulation is restricted to only being used at boot
> to fake nodes on smaller machines and can be used to test things like the
> slab allocator. The NUMA hotplug emulator that you're developing here is
> primarily used to test the hotplug callbacks; for that use-case, it seems
> particularly helpful if nodes can be hotplugged of various sizes and node
> ids rather than having static characteristics that cannot be changed with
> a reboot.
>
I agree with you. The early emulator did the same thing you describe, but
NUMA emulation already exists to create fake nodes, and our emulator also
creates fake nodes. We were worried about criticism from the community, so we
dropped the original design.

I do not know whether other engineers share your view. I think I can publish
both versions of the code and let the community decide which one is preferred.
Personally, I find both methods acceptable.
> > Currently, our solution creates static nodes when OS boots, only the node with
> > state N_HIDDEN can be hot-added with node/probe interface, and we can query
> >
>
> The idea that I've proposed (and you've apparently thought about and even
> implemented at one point) is much more powerful than that. We need not
> query the state of hidden nodes that we've setup at boot but can rather
> use the amount of hidden memory to setup the nodes in any way that we want
> at runtime (various sizes, interleaved node ids, etc).
Yes, if we adopt your proposal, we just mark all the nodes as possible
(N_POSSIBLE). There are no hidden nodes any more; a node is created the first
time memory is added to it.
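
To make the idea concrete, here is a minimal user-space sketch (not kernel
code) of that approach: every node id is marked possible up front, and a node
only comes online the first time memory is added to it. MAX_NUMNODES is set to
8 purely for the demo, and mark_all_nodes_possible() and emu_add_memory() are
hypothetical stand-ins that I made up for illustration; they are not the
kernel's interfaces.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 8	/* assumption: small value chosen for the demo */

static bool node_possible[MAX_NUMNODES];	/* models node_possible_map */
static bool node_online[MAX_NUMNODES];		/* models node_online_map */

/* Same idea as node_set_possible() in the patch, on a plain array. */
static void node_set_possible(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	node_possible[nid] = true;
}

/* Hypothetical helper: mark every node id as possible at "boot". */
static void mark_all_nodes_possible(void)
{
	int nid;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		node_set_possible(nid);
}

/* Highest possible node id plus one, as setup_nr_node_ids() computes it. */
static int compute_nr_node_ids(void)
{
	int node, highest = 0;

	for (node = 0; node < MAX_NUMNODES; node++)
		if (node_possible[node])
			highest = node;
	return highest + 1;
}

/*
 * Hypothetical stand-in for hot-adding memory to a node: the first
 * successful add brings the node online. Returns 0 on success and
 * -1 when the node id is out of range.
 */
static int emu_add_memory(int nid)
{
	if (nid < 0 || nid >= compute_nr_node_ids())
		return -1;
	node_online[nid] = true;
	return 0;
}
```

With this model, calling mark_all_nodes_possible() and then emu_add_memory(3)
brings node 3 online without any per-node hidden state having been set up
beforehand.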
This is the early patch (not very formal; it is just an internal version):
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 454997c..9dc6a02 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -73,6 +73,7 @@
*
* node_set_online(node) set bit 'node' in node_online_map
* node_set_offline(node) clear bit 'node' in node_online_map
+ * node_set_possible(node) set bit 'node' in node_possible_map
*
* for_each_node(node) for-loop node over node_possible_map
* for_each_online_node(node) for-loop node over node_online_map
@@ -432,6 +433,11 @@ static inline void node_set_offline(int nid)
node_clear_state(nid, N_ONLINE);
nr_online_nodes = num_node_state(N_ONLINE);
}
+
+static inline void node_set_possible(int nid)
+{
+ node_set_state(nid, N_POSSIBLE);
+}
#else
static inline int node_state(int node, enum node_states state)
@@ -462,6 +468,7 @@ static inline int num_node_state(enum node_states state)
#define node_set_online(node) node_set_state((node), N_ONLINE)
#define node_set_offline(node) node_clear_state((node), N_ONLINE)
+#define node_set_possible(node) node_set_state((node), N_POSSIBLE)
#endif
#define node_online_map node_states[N_ONLINE]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb40925..059ebf0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1602,6 +1602,9 @@ config HOTPLUG_CPU
( Note: power management support will enable this option
automatically on SMP systems. )
Say N if you want to disable CPU hotplug.
+config ARCH_CPU_PROBE_RELEASE
+ def_bool y
+ depends on HOTPLUG_CPU
config COMPAT_VDSO
def_bool y
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 550df48..52094bc 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -26,12 +26,11 @@ void __init setup_node_to_cpumask_map(void)
{
unsigned int node, num = 0;
- /* setup nr_node_ids if not done yet */
- if (nr_node_ids == MAX_NUMNODES) {
- for_each_node_mask(node, node_possible_map)
- num = node;
- nr_node_ids = num + 1;
- }
+ /* re-setup nr_node_ids, when CONFIG_ARCH_MEMORY_PROBE enabled and mem=XXX
+ specified, nr_node_ids will be set as the maximum value */
+ for_each_node_mask(node, node_possible_map)
+ num = node;
+ nr_node_ids = num + 1;
/* allocate the map */
for (node = 0; node < nr_node_ids; node++)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bd02505..3d0e37c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -327,6 +327,8 @@ static int block_size_init(void)
* will not need to do it from userspace. The fake hot-add code
* as well as ppc64 will do all of their discovery in userspace
* and will require this interface.
+ *
+ * Parameter format: start_addr, nid
*/
#ifdef CONFIG_ARCH_MEMORY_PROBE
static ssize_t
@@ -336,10 +338,26 @@ memory_probe_store(struct class *class, const char *buf, size_t count)
int nid;
int ret;
- phys_addr = simple_strtoull(buf, NULL, 0);
+ char *p = strchr(buf, ',');
+
+ if (p != NULL && strlen(p+1) > 0) {
+ /* nid specified */
+ *p++ = '\0';
+ nid = simple_strtoul(p, NULL, 0);
+ phys_addr = simple_strtoull(buf, NULL, 0);
+ } else {
+ phys_addr = simple_strtoull(buf, NULL, 0);
+ nid = memory_add_physaddr_to_nid(phys_addr);
+ }
- nid = memory_add_physaddr_to_nid(phys_addr);
- ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+ if (nid < 0 || nid > nr_node_ids - 1) {
+ printk(KERN_ERR "Invalid node id %d(0<=nid<%d).\n", nid, nr_node_ids);
+ } else {
+ printk(KERN_INFO "Add a memory section to node: %d.\n", nid);
+ ret = add_memory(nid, phys_addr, PAGES_PER_SECTION << PAGE_SHIFT);
+ if (ret)
+ count = ret;
+ }
if (ret)
count = ret;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8deb9d0..0d7eeea 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3946,9 +3946,19 @@ static void __init setup_nr_node_ids(void)
unsigned int node;
unsigned int highest = 0;
+ #ifdef CONFIG_ARCH_MEMORY_PROBE
+ /* grub parameter mem=XXX specified */
+ if (1){
+ int cnt;
+ for (cnt = 0; cnt < MAX_NUMNODES; cnt++)
+ node_set_possible(cnt);
+ }
+ #endif
+
for_each_node_mask(node, node_possible_map)
highest = node;
nr_node_ids = highest + 1;
+ printk(KERN_INFO "setup_nr_node_ids: nr_node_ids : %d.\n", nr_node_ids);
}
#else
static inline void setup_nr_node_ids(void)
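
The "start_addr, nid" parameter format from the drivers/base/memory.c hunk
above can also be sketched in user space. This is only an illustration under
my own assumptions: parse_probe() is a hypothetical name, it uses the C
library's strtoull()/strtol() instead of the kernel helpers, and it is
stricter than the internal patch (it does not write into the input buffer, it
rejects a trailing comma, and it returns -EINVAL for an out-of-range node id).
When no node id is given it reports -1, so the caller can fall back to an
address-based lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space helper: parse "start_addr[,nid]".
 * On success returns 0 and fills *phys_addr; *nid is the parsed node id,
 * or -1 when the string carries only an address.
 * Returns -EINVAL on malformed input or when nid is outside
 * [0, nr_node_ids).
 */
static int parse_probe(const char *buf, int nr_node_ids,
		       unsigned long long *phys_addr, int *nid)
{
	char *end;
	long n;

	assert(buf && phys_addr && nid);

	errno = 0;
	*phys_addr = strtoull(buf, &end, 0);
	if (end == buf || errno)
		return -EINVAL;

	*nid = -1;
	if (*end == '\0' || *end == '\n')
		return 0;		/* address only, no node id */
	if (*end != ',')
		return -EINVAL;

	buf = end + 1;			/* text after the comma */
	errno = 0;
	n = strtol(buf, &end, 0);
	if (end == buf || errno || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (n < 0 || n >= nr_node_ids)
		return -EINVAL;

	*nid = (int)n;
	return 0;
}
```

For example, parse_probe("0x40000000,2", 4, &addr, &nid) yields the address
0x40000000 and node id 2, while the same call with ",4" is rejected because
only node ids 0 to 3 are valid when nr_node_ids is 4.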
--
Thanks & Regards,
Shaohui