linux-mm.kvack.org archive mirror
* [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
@ 2013-04-19  5:23 Yasuaki Ishimatsu
  2013-04-22 22:35 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-19  5:23 UTC
  To: kosaki.motohiro, mingo, hpa, srivatsa.bhat, akpm
  Cc: linux-kernel, x86, linux-mm

When an x86 system containing a memoryless node boots,
init_cpu_to_node() changes the node numbers of the CPUs on that
node to the nearest online node number, because the memoryless
node is not online.

On my system, the node numbers of cpu#30-44 and cpu#75-89 were changed
from 2 to 0, as follows:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82
83 84 85 86 87 88 89
node 0 size: 32394 MB
node 0 free: 27898 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30335 MB

If we hot-add memory to the memoryless node and offline/online
all CPUs on the node, srat_detect_node() changes the node numbers
of these CPUs to the correct node number, because the node has
become online.

In this case, the node numbers of cpu#30-44 and cpu#75-89 were changed
from 0 to 2 on my system, as follows:

$ numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55
56 57 58 59
node 0 size: 32394 MB
node 0 free: 27218 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30014 MB
node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81
82 83 84 85 86 87 88 89
node 2 size: 16384 MB
node 2 free: 16384 MB

But the "cpu to node" and "node to cpu" links were not changed, as follows:

$ ls /sys/devices/system/cpu/cpu30/|grep node
node0
$ ls /sys/devices/system/node/node0/|grep cpu30
cpu30

"numactl --hardware" shows that cpu30 belongs to node 2, but the sysfs
links have not changed.

This patch updates the "cpu to node" and "node to cpu" links when the
node number of a CPU changes as a result of onlining it.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
v2:
Change argument's name from num to cpuid in store_online()
Add a comment explaining why the node number changes
---
 drivers/base/cpu.c |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index fb10728..229d6e7 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -25,6 +25,15 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
 static DEFINE_PER_CPU(struct device *, cpu_sys_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+static void change_cpu_under_node(struct cpu *cpu,
+			unsigned int from_nid, unsigned int to_nid)
+{
+	int cpuid = cpu->dev.id;
+	unregister_cpu_under_node(cpuid, from_nid);
+	register_cpu_under_node(cpuid, to_nid);
+	cpu->node_id = to_nid;
+}
+
 static ssize_t show_online(struct device *dev,
 			   struct device_attribute *attr,
 			   char *buf)
@@ -39,17 +48,29 @@ static ssize_t __ref store_online(struct device *dev,
 				  const char *buf, size_t count)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	int cpuid = cpu->dev.id;
+	int from_nid, to_nid;
 	ssize_t ret;
 
 	cpu_hotplug_driver_lock();
 	switch (buf[0]) {
 	case '0':
-		ret = cpu_down(cpu->dev.id);
+		ret = cpu_down(cpuid);
 		if (!ret)
 			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
 		break;
 	case '1':
-		ret = cpu_up(cpu->dev.id);
+		from_nid = cpu_to_node(cpuid);
+		ret = cpu_up(cpuid);
+
+		/*
+		 * When hot adding memory to memoryless node and enabling a cpu
+		 * on the node, node number of the cpu may internally change.
+		 */
+		to_nid = cpu_to_node(cpuid);
+		if (from_nid != to_nid)
+			change_cpu_under_node(cpu, from_nid, to_nid);
+
 		if (!ret)
 			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
 		break;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
  2013-04-19  5:23 [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU Yasuaki Ishimatsu
@ 2013-04-22 22:35 ` Andrew Morton
  2013-04-23  0:04   ` Yasuaki Ishimatsu
  2013-04-23 16:06   ` Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2013-04-22 22:35 UTC
  To: Yasuaki Ishimatsu
  Cc: kosaki.motohiro, mingo, hpa, srivatsa.bhat, linux-kernel, x86,
	linux-mm

On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

> When an x86 system containing a memoryless node boots,
> init_cpu_to_node() changes the node numbers of the CPUs on that
> node to the nearest online node number, because the memoryless
> node is not online.
>
> ...
>
> If we hot-add memory to the memoryless node and offline/online
> all CPUs on the node, srat_detect_node() changes the node numbers
> of these CPUs to the correct node number, because the node has
> become online.

OK, here's a dumb question.

At boot time the CPUs are assigned to the "nearest online node" rather
than to their real memoryless node.  The patch arranges for those CPUs
to still be assigned to the "nearest online node" _after_ some memory
is hot-added to their real node.  Correct?

Would it not be better to fix this by assigning those CPUs to their real,
memoryless node right at the initial boot?  Or is there something in
the kernel which makes cpus-on-a-memoryless-node not work correctly?


* Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
  2013-04-22 22:35 ` Andrew Morton
@ 2013-04-23  0:04   ` Yasuaki Ishimatsu
  2013-04-23  0:34     ` Andrew Morton
  2013-04-23 16:06   ` Andi Kleen
  1 sibling, 1 reply; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-23  0:04 UTC
  To: Andrew Morton
  Cc: kosaki.motohiro, mingo, hpa, srivatsa.bhat, linux-kernel, x86,
	linux-mm

2013/04/23 7:35, Andrew Morton wrote:
> On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> When an x86 system containing a memoryless node boots,
>> init_cpu_to_node() changes the node numbers of the CPUs on that
>> node to the nearest online node number, because the memoryless
>> node is not online.
>>
>> ...
>>
>> If we hot-add memory to the memoryless node and offline/online
>> all CPUs on the node, srat_detect_node() changes the node numbers
>> of these CPUs to the correct node number, because the node has
>> become online.
>
> OK, here's a dumb question.
>
> At boot time the CPUs are assigned to the "nearest online node" rather
> than to their real memoryless node.  The patch arranges for those CPUs
> to still be assigned to the "nearest online node" _after_ some memory
> is hot-added to their real node.  Correct?

Yes. To change the node numbers of CPUs safely, we should offline
the CPUs.

>
> Would it not be better to fix this by assigning those CPUs to their real,
> memoryless node right at the initial boot?  Or is there something in
> the kernel which makes cpus-on-a-memoryless-node not work correctly?
>

I think assigning the CPUs to their real node is better. But the
current Linux node handling strongly depends on memory. Thus, if we
just create cpus-on-a-memoryless-node, the kernel cannot work
correctly.

Thanks,
Yasuaki Ishimatsu


* Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
  2013-04-23  0:04   ` Yasuaki Ishimatsu
@ 2013-04-23  0:34     ` Andrew Morton
  2013-04-23  1:24       ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2013-04-23  0:34 UTC
  To: Yasuaki Ishimatsu
  Cc: kosaki.motohiro, mingo, hpa, srivatsa.bhat, linux-kernel, x86,
	linux-mm

On Tue, 23 Apr 2013 09:04:46 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:

> 2013/04/23 7:35, Andrew Morton wrote:
> > On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
> >
> >> When an x86 system containing a memoryless node boots,
> >> init_cpu_to_node() changes the node numbers of the CPUs on that
> >> node to the nearest online node number, because the memoryless
> >> node is not online.
> >>
> >> ...
> >>
> >> If we hot-add memory to the memoryless node and offline/online
> >> all CPUs on the node, srat_detect_node() changes the node numbers
> >> of these CPUs to the correct node number, because the node has
> >> become online.
> >
> > OK, here's a dumb question.
> >
> > At boot time the CPUs are assigned to the "nearest online node" rather
> > than to their real memoryless node.  The patch arranges for those CPUs
> > to still be assigned to the "nearest online node" _after_ some memory
> > is hot-added to their real node.  Correct?
>
> Yes. To change the node numbers of CPUs safely, we should offline
> the CPUs.
>
> >
> > Would it not be better to fix this by assigning those CPUs to their real,
> > memoryless node right at the initial boot?  Or is there something in
> > the kernel which makes cpus-on-a-memoryless-node not work correctly?
> >
>
> I think assigning the CPUs to their real node is better. But the
> current Linux node handling strongly depends on memory. Thus, if we
> just create cpus-on-a-memoryless-node, the kernel cannot work
> correctly.

hm, why.  I'd have thought that if we tell the kernel something like
"this node has one zone, the size of which is zero bytes" then a
surprising amount of the existing code will Just Work.

What goes wrong?


* Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
  2013-04-23  0:34     ` Andrew Morton
@ 2013-04-23  1:24       ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-23  1:24 UTC
  To: Andrew Morton
  Cc: kosaki.motohiro, mingo, hpa, srivatsa.bhat, linux-kernel, x86,
	linux-mm

2013/04/23 9:34, Andrew Morton wrote:
> On Tue, 23 Apr 2013 09:04:46 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>
>> 2013/04/23 7:35, Andrew Morton wrote:
>>> On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> wrote:
>>>
>>>> When an x86 system containing a memoryless node boots,
>>>> init_cpu_to_node() changes the node numbers of the CPUs on that
>>>> node to the nearest online node number, because the memoryless
>>>> node is not online.
>>>>
>>>> ...
>>>>
>>>> If we hot-add memory to the memoryless node and offline/online
>>>> all CPUs on the node, srat_detect_node() changes the node numbers
>>>> of these CPUs to the correct node number, because the node has
>>>> become online.
>>>
>>> OK, here's a dumb question.
>>>
>>> At boot time the CPUs are assigned to the "nearest online node" rather
>>> than to their real memoryless node.  The patch arranges for those CPUs
>>> to still be assigned to the "nearest online node" _after_ some memory
>>> is hot-added to their real node.  Correct?
>>
>> Yes. To change the node numbers of CPUs safely, we should offline
>> the CPUs.
>>
>>>
>>> Would it not be better to fix this by assigning those CPUs to their real,
>>> memoryless node right at the initial boot?  Or is there something in
>>> the kernel which makes cpus-on-a-memoryless-node not work correctly?
>>>
>>
>> I think assigning the CPUs to their real node is better. But the
>> current Linux node handling strongly depends on memory. Thus, if we
>> just create cpus-on-a-memoryless-node, the kernel cannot work
>> correctly.
>
> hm, why.  I'd have thought that if we tell the kernel something like
> "this node has one zone, the size of which is zero bytes" then a
> surprising amount of the existing code will Just Work.
>
> What goes wrong?

Sorry, I have forgotten the details of the issue.
When I saw the following issue, I tried to fix it and found that the
current Linux node handling strongly depends on memory.
https://lkml.org/lkml/2012/9/12/20

I'll try to fix it again.

Thanks,
Yasuaki Ishimatsu

* Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU
  2013-04-22 22:35 ` Andrew Morton
  2013-04-23  0:04   ` Yasuaki Ishimatsu
@ 2013-04-23 16:06   ` Andi Kleen
  1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2013-04-23 16:06 UTC
  To: Andrew Morton
  Cc: Yasuaki Ishimatsu, kosaki.motohiro, mingo, hpa, srivatsa.bhat,
	linux-kernel, x86, linux-mm

Andrew Morton <akpm@linux-foundation.org> writes:
>
> Would it not be better to fix this by assigning those CPUs to their real,
> memoryless node right at the initial boot?  Or is there something in
> the kernel which makes cpus-on-a-memoryless-node not work correctly?

I probably added this originally. The original reason was that long
ago the VM was broken with memoryless nodes. These days it is likely
obsolete.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

