* [ANNOUNCE] [PATCH] Node Hotplug Support
@ 2004-05-07 15:39 Keiichiro Tokunaga
2004-05-07 15:49 ` Dave Hansen
2004-05-07 16:16 ` [ANNOUNCE] [PATCH] " Martin J. Bligh
0 siblings, 2 replies; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-07 15:39 UTC (permalink / raw)
To: linux-kernel, linux-hotplug-devel, lhns-devel
Hi,
I'm announcing a new project "LHNS (Linux Hotplug Node Support)"
and patches for 2.6.5:
http://sourceforge.net/projects/lhns/
Goal
====
The main goal of LHNS is to support "node hotplug" in Linux.
Here, a "node" is a container device that contains CPUs, memory,
and/or I/O devices. This definition of "node" differs from that
of a "NUMA node".
- "NUMA node" is: a block of memory and the CPUs, I/O, etc.
physically on the same bus as the memory (according to the NUMA
project website, this is a common definition).
- "node" as used here is: a hotpluggable piece of hardware that
contains CPUs, memory, and/or I/O.
A "node" and a "NUMA node" are not always the same. (Sorry
for the inconvenience :( ) For instance, a "node" could be a
device that contains:
- processors, memory, and I/O devices
- processors and memory
- processors only
- memory only
- I/O devices only
- etc
"node hotplug" allows you to hot-add and hot-remove "node"
without stopping the system.
Why?
====
Someone might ask, "Why don't we invoke CPU, memory, and I/O
hotplug individually, without node hotplug?" However, CPU,
memory, and I/O hotplug cannot physically remove a node (container
device) from the system. That is node hotplug's job. Also,
when a hotplug request occurs on a node, node hotplug searches
for resources on the node and invokes CPU, memory, and I/O hotplug
in the proper order. The order is very important. For instance,
when hot-adding a NUMA node, memory should be added first
so that the CPUs can allocate data in that memory while they are
being added. Otherwise, the CPUs need to allocate it in memory
on another node. This might cause a performance issue.
Design
======
ACPI is used for some hardware manipulation.
There is no general-purpose interface for getting hardware information
and manipulating hardware today, only hardware-proprietary interfaces.
ACPI is one of them, and I decided to use it because:
- Its spec is open.
- I can use it without any special hardware knowledge :)
The following assumptions are necessary for node hotplug:
- The system has hotpluggable "node" (hardware).
- Each "node" is defined as a container device (HID=PNP0A05)
in the ACPI namespace.
- Existing CPU, memory, and I/O hotplug (LHCS, LHMS, PCI Hotplug
for Linux, etc.) provide a hook (function) for node hotplug.
"node hotplug" consists of one main part and three sub parts so far:
- ACPI container device hotplug (main)
- ACPI based CPU hotplug (sub)
- ACPI based memory hotplug (sub)
- ACPI based I/O hotplug (sub)
"ACPI container device hotplug" is the main part of node hotplug,
which is supposed to do the following:
[hot-addition]
1. Its handler is invoked when an ACPI notify occurs because a node
has been attached to the system.
2. It creates data structures for the node.
If CONFIG_ACPI_NUMA_CONTAINER=y, it also handles NUMA
related data (not implemented yet).
3. It invokes ACPI based CPU/memory/IO hotplug in the proper
order with the proper arguments (e.g. the acpi_handle of the added node).
4. It notifies userland that the node has been added.
5. It writes the results of the hotplug processing to a log file.
[hot-removal]
1. Its handler is invoked when an ACPI notify occurs or the user issues
a hot-removal request via sysfs.
2. It deletes the data structures of the node.
If CONFIG_ACPI_NUMA_CONTAINER=y, it also handles NUMA
related data (not implemented yet).
3. It invokes ACPI based CPU/memory/IO hotplug in the proper
order with the proper arguments (e.g. the acpi_handle of the node).
4. It evaluates the _EJ0 method of the node.
5. It notifies userland that the node has been removed.
6. It writes the results of the hotplug processing to a log file.
"ACPI based CPU/memory/IO hotplug" are the sub parts, which are
supposed to work as follows:
1. They are invoked by the main part with the argument.
2. They search the ACPI namespace to check whether there are any
devices on the node that they should handle. For instance, ACPI
based CPU hotplug searches the ACPI namespace to see if there
are any CPUs on an added node.
3. If they find devices, they do some preparation and call the
existing hotplug features:
o LHCS (Linux Hotplug CPU Support)
o LHMS (Linux Hotplug Memory Support)
o PCI Hotplug for Linux
o etc.
with some appropriate arguments (TBD).
Then each hotplug feature handles its own device hotplug.
See http://lhns.sourceforge.net/ for more information and figures
(module and component diagram).
Patches
=======
The following nine patches are available at:
http://sourceforge.net/projects/lhns/
(They are packed into a single .bz2 file.)
- p00001_sci_emu.patch:
SCI interrupt emulation.
- p00002_over-20030109.patch:
Override the original BIOS DSDT with a DSDT that you make up.
- p00003_procfs_util.patch
Extension for procfs.
A new function remove_recursive_proc_entry().
- p00004_acpi_processor_bug.patch
Replace remove_proc_entry() with remove_recursive_proc_entry()
in drivers/acpi/processor.c to avoid a call trace.
- p00005_acpi_hp_config.patch
Kconfigs and Makefiles for ACPI hotplug.
- p00006_acpi_core.patch
Changes to ACPI core part for hotplug.
A new function acpi_bus_scan_free(), etc...
- p00007_acpi_hp_util.patch
Utility functions for ACPI hotplug.
- p00008_acpi_container_hp.patch
The main part of node hotplug.
- p00009_acpi_container_hp_cpu.patch
The sub part (CPU) of node hotplug.
Current Status
==============
The first patches have been tested in an emulation and a virtual
environment without the LHCS, LHMS, etc. code. However, they are
very early versions, so there is still a lot of rough code,
non-Linux-style code, and TBDs. The first patches are just enough
to show my idea and how it works. Some sub parts and NUMA
node support have not been implemented yet. (Please don't
enable "Hotplug memory" or "Hotplug IO" when you're doing
"make config" :)
Usage
=====
See "Documents" section at http://lhns.sourceforge.net/. There
are some documents and instructions:
- How to build the kernel with LHNS patches
- How to make a fake DSDT
- How to generate SCI interrupt (emulation)
- How to invoke hot-addition and hot-removal for container hotplug
Please feel free to take it for a spin!
Plan
====
First of all, I'd like to discuss the design and interface of node
hotplug with the hotplug folks. After I get feedback, I'll
update my patches. I'll also release the following features in
the near future. Then I'll make them all work together and test
them.
- NUMA node support
- ACPI based memory hotplug
- ACPI based IO hotplug
- H2P bridge hotplug
- P2P bridge hotplug
- IOSAPIC hotplug
Any comments, bug reports, or suggestions are welcome.
Thanks,
Keiichiro Tokunaga
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 15:39 [ANNOUNCE] [PATCH] Node Hotplug Support Keiichiro Tokunaga
@ 2004-05-07 15:49 ` Dave Hansen
2004-05-10 1:47 ` Keiichiro Tokunaga
2004-05-07 16:16 ` [ANNOUNCE] [PATCH] " Martin J. Bligh
1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2004-05-07 15:49 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Fri, 2004-05-07 at 08:39, Keiichiro Tokunaga wrote:
> First of all, I'd like to discuss the design and interface of node
> hotplug with the hotplug folks. After I get feedback, I'll
> update my patches. I'll also release the following features in
> the near future. Then I'll make them all work together and test
> them.
>
> - NUMA node support
> - ACPI based memory hotplug
> - ACPI based IO hotplug
> - H2P bridge hotplug
> - P2P bridge hotplug
> - IOSAPIC hotplug
How does this interoperate with the current NUMA topology already in
sysfs today? I don't see any references at all to the current code.
-- Dave
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 15:39 [ANNOUNCE] [PATCH] Node Hotplug Support Keiichiro Tokunaga
2004-05-07 15:49 ` Dave Hansen
@ 2004-05-07 16:16 ` Martin J. Bligh
2004-05-10 2:03 ` Keiichiro Tokunaga
1 sibling, 1 reply; 19+ messages in thread
From: Martin J. Bligh @ 2004-05-07 16:16 UTC (permalink / raw)
To: Keiichiro Tokunaga, linux-kernel, linux-hotplug-devel, lhns-devel
> ACPI is used for some hardware manipulation.
> There is no general-purpose interface for getting hardware information
> and manipulating hardware today, only hardware-proprietary interfaces.
> ACPI is one of them, and I decided to use it because:
>
> - Its spec is open.
> - I can use it without any special hardware knowledge :)
You can't base platform-independent Linux code on ACPI when not all
NUMA boxes will support it. The fact that your particular box may
support it doesn't make it a generally applicable idea ;-)
You need a better abstraction (and preferably one without the massive
complexity whilst you're at it).
M.
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
[not found] <1TfLX-4M4-9@gated-at.bofh.it>
@ 2004-05-07 19:10 ` Andi Kleen
2004-05-08 18:00 ` Ingo Oeser
2004-05-10 2:17 ` Keiichiro Tokunaga
0 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2004-05-07 19:10 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: linux-kernel
Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com> writes:
Hello,
>
> Here, a "node" is a container device that contains CPUs, memory,
> and/or I/O devices. This definition of "node" differs from that
> of a "NUMA node".
This does not really fit well with the Linux/Unix philosophy of doing
one thing with one mechanism, doing it well, and keeping policy in
user space. It would be better to have individual mechanisms to
hot-plug these various things; if you feel the need to combine them,
e.g. for administrative purposes, do that in user space.
> Why?
> ====
> Someone might ask, "Why don't we invoke CPU, memory, and I/O
> hotplug individually, without node hotplug?" However, CPU,
> memory, and I/O hotplug cannot physically remove a node (container
> device) from the system. That is node hotplug's job. Also,
You can remove them physically as soon as all the hardware
in it is removed. There can be many more things on such a "node"
than you listed anyway: processes which are bound to the CPUs
of the node (which need to be killed) or device drivers bound
to slots on the node (which need to be shut down). I do not think it
makes much sense to attempt to combine all these at the kernel level;
that is more a job for a shell script. The bad experience with
the current sysfs power management hierarchy shows that the kernel
is a poor place to attempt this.
> when a hotplug request occurs on a node, node hotplug searches
> for resources on the node and invokes CPU, memory, and I/O hotplug
> in the proper order. The order is very important. For instance,
> when hot-adding a NUMA node, memory should be added first
> so that the CPUs can allocate data in that memory while they are
> being added. Otherwise, the CPUs need to allocate it in memory
> on another node. This might cause a performance issue.
This ordering can also easily be done in user space.
This has the advantage that if there is some reason to
add CPUs without memory (or memory without CPUs, or PCI slots
without anything), it will just work too. It is not clear
that your "lump every mechanism into one" approach can handle all
that; most likely you would need to add lots of special
cases. Separate mechanisms can do this more cleanly.
Admittedly there would still need to be some coordination in case you
want to remove a whole building block of your machine, like
you said. A nice way to do this would be to add an atomic "to be
removed" state to the individual unregister mechanisms that prevents
the device from being re-registered until it is removed.
> Design
> ======
> ACPI is used for some hardware manipulation.
That will not work as a portable mechanism. Most Linux ports,
including some that support NUMA and hotplug, do not have ACPI.
Rather, you could add a layer for this which is implemented
by the architecture. Then all ACPI-supporting architectures could
share a common implementation and the other architectures could
have their own. The design of this layer would need some discussion
first to make sure it does not carry too many assumptions from your
system that would not hold on others.
-Andi
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 19:10 ` Andi Kleen
@ 2004-05-08 18:00 ` Ingo Oeser
2004-05-10 2:17 ` Keiichiro Tokunaga
1 sibling, 0 replies; 19+ messages in thread
From: Ingo Oeser @ 2004-05-08 18:00 UTC (permalink / raw)
To: linux-kernel; +Cc: Andi Kleen, Keiichiro Tokunaga
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Friday 07 May 2004 21:10, Andi Kleen wrote:
> Admittedly there would still need to be some coordination in case you
> want to remove a whole building block of your machine, like
> you said. A nice way to do this would be to add an atomic "to be
> removed" state to the individual unregister mechanisms that prevents
> the device from being re-registered until it is removed.
This is also needed for a "to be replaced" state.
Imagine a node with a b0rken CPU where you'd still like to use its memory.
Both states can actually be the same ;-)
Regards
Ingo Oeser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iD8DBQFAnSA1U56oYWuOrkARAnjYAJ9CfcVGC9GnqlmvSpwRzI10jj7WGwCguKYq
qtawhKJiy/j8r0d3qfqBrSw=
=Z+oX
-----END PGP SIGNATURE-----
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 15:49 ` Dave Hansen
@ 2004-05-10 1:47 ` Keiichiro Tokunaga
2004-05-10 2:12 ` [Lhns-devel] " Takayoshi Kochi
2004-05-10 5:45 ` Dave Hansen
0 siblings, 2 replies; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-10 1:47 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-kernel, linux-hotplug-devel, lhns-devel
On Fri, 07 May 2004 08:49:05 -0700
Dave Hansen <haveblue@us.ibm.com> wrote:
> On Fri, 2004-05-07 at 08:39, Keiichiro Tokunaga wrote:
> > First of all, I'd like to discuss the design and interface of node
> > hotplug with the hotplug folks. After I get feedback, I'll
> > update my patches. I'll also release the following features in
> > the near future. Then I'll make them all work together and test
> > them.
> >
> > - NUMA node support
> > - ACPI based memory hotplug
> > - ACPI based IO hotplug
> > - H2P bridge hotplug
> > - P2P bridge hotplug
> > - IOSAPIC hotplug
>
> How does this interoperate with the current NUMA topology already in
> sysfs today? I don't see any references at all to the current code.
There is no NUMA support in the current code yet. I'll post a
rough patch to show my idea soon. So far, I'm thinking of regarding
a container device that has a _PXM as a NUMA node.
Thanks,
Kei
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 16:16 ` [ANNOUNCE] [PATCH] " Martin J. Bligh
@ 2004-05-10 2:03 ` Keiichiro Tokunaga
0 siblings, 0 replies; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-10 2:03 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, linux-hotplug-devel, lhns-devel
On Fri, 07 May 2004 09:16:51 -0700
"Martin J. Bligh" <mbligh@aracnet.com> wrote:
> > ACPI is used for some hardware manipulation.
> > There is no general-purpose interface for getting hardware information
> > and manipulating hardware today, only hardware-proprietary interfaces.
> > ACPI is one of them, and I decided to use it because:
> >
> > - Its spec is open.
> > - I can use it without any special hardware knowledge :)
>
> You can't base platform-independent Linux code on ACPI when not all
> NUMA boxes will support it. The fact that your particular box may
> support it doesn't make it a generally applicable idea ;-)
I'm not trying to base everything on ACPI, though that happens in the
first patches. I will separate them in a future release.
(Actually, I put comments about it in the first patch :)
There will be platform-dependent and independent code. Any platform
can share the independent code, and the dependent code needs to
be provided by each platform. I think PCI hotplug works the same
way.
Thanks,
Kei
* Re: [Lhns-devel] Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-10 1:47 ` Keiichiro Tokunaga
@ 2004-05-10 2:12 ` Takayoshi Kochi
2004-05-10 11:20 ` Keiichiro Tokunaga
2004-05-10 5:45 ` Dave Hansen
1 sibling, 1 reply; 19+ messages in thread
From: Takayoshi Kochi @ 2004-05-10 2:12 UTC (permalink / raw)
To: tokunaga.keiich; +Cc: haveblue, linux-kernel, linux-hotplug-devel, lhns-devel
From: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Subject: [Lhns-devel] Re: [ANNOUNCE] [PATCH] Node Hotplug Support
Date: Mon, 10 May 2004 10:47:25 +0900
> > How does this interoperate with the current NUMA topology already in
> > sysfs today? I don't see any references at all to the current code.
>
> There is no NUMA support in the current code yet. I'll post a
> rough patch to show my idea soon. So far, I'm thinking of regarding
> a container device that has a _PXM as a NUMA node.
I've not looked closely at the code, but why do you use "PNP0A05"
for the container device?
"PNP0A05" is defined as "Generic ISA device" in the ACPI spec.
I think the "module device" (ACPI0004) sounds more suitable for the
purpose, though I don't know whether your hardware will support it
or not.
Also, assuming that devices that have _PXM are nodes sounds a bit too
aggressive to me. For example, something like the following is possible.
Device(\_SB) {
    Processor(CPU0...) {
        Name(_PXM, 0)
    }
    Processor(CPU1...) {
        Name(_PXM, 1)
    }

    Device(PCI0) {
        Name(_PXM, 0)
    }
    Device(PCI1) {
        Name(_PXM, 1)
    }
}
(I don't know if such an implementation exists, but from the spec,
it is possible.)
In this case, the OS has to group devices by the same number.
Please don't assume that a specific ACPI AML implementation is a
generic rule.
---
Takayoshi Kochi
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-07 19:10 ` Andi Kleen
2004-05-08 18:00 ` Ingo Oeser
@ 2004-05-10 2:17 ` Keiichiro Tokunaga
2004-05-10 19:54 ` Andi Kleen
1 sibling, 1 reply; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-10 2:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, lhns-devel
On Fri, 07 May 2004 21:10:58 +0200
Andi Kleen <ak@muc.de> wrote:
> Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com> writes:
>
> Hallo,
> >
> > Here, a "node" is a container device that contains CPUs, memory,
> > and/or I/O devices. This definition of "node" differs from that
> > of a "NUMA node".
>
> This does not really fit well with the Linux/Unix philosophy of doing
> one thing with one mechanism, doing it well, and keeping policy in
> user space.
Policy could be moved into user space, as you mentioned.
> It would be better to have individual mechanisms to hot-plug these
> various things; if you feel the need to combine them, e.g. for
> administrative purposes, do that in user space.
The "node hotplug" of LHNS is not just a controller that invokes CPU,
memory, and I/O hotplug with a policy. It also hotplugs the container
device itself. When a container device is attached to the system,
it needs to be detected by someone; node hotplug (in the kernel) can
do that.
> > Why?
> > ====
> > Someone might ask, "Why don't we invoke CPU, memory, and I/O
> > hotplug individually, without node hotplug?" However, CPU,
> > memory, and I/O hotplug cannot physically remove a node (container
> > device) from the system. That is node hotplug's job. Also,
>
> You can remove them physically as soon as all the hardware
> in it is removed. There can be many more things on such a "node"
> than you listed anyway: processes which are bound to the CPUs
> of the node (which need to be killed) or device drivers bound
> to slots on the node (which need to be shut down). I do not think it
> makes much sense to attempt to combine all these at the kernel level;
> that is more a job for a shell script. The bad experience with
> the current sysfs power management hierarchy shows that the kernel
> is a poor place to attempt this.
As you mentioned, there can be many more things. However,
the examples you showed should be handled by the individual hotplugs.
For instance, the processes bound to a certain CPU should be
handled by CPU hotplug when node hotplug invokes it.
Also, the device drivers bound to slots should be detached by
I/O hotplug. Node hotplug is supposed to take care of the container
device and just invoke the individual hotplugs for the other devices.
Node hotplug doesn't break any policy of the individual hotplugs
(CPU, memory, I/O, etc.).
A shell script could handle the hot-removal side of container hotplug,
but detecting an added container device needs to be done at the
kernel level.
> > when a hotplug request occurs on a node, node hotplug searches
> > for resources on the node and invokes CPU, memory, and I/O hotplug
> > in the proper order. The order is very important. For instance,
> > when hot-adding a NUMA node, memory should be added first
> > so that the CPUs can allocate data in that memory while they are
> > being added. Otherwise, the CPUs need to allocate it in memory
> > on another node. This might cause a performance issue.
>
> This ordering can also easily be done in user space.
Yes.
> This has the advantage that if there is some reason to
> add CPUs without memory (or memory without CPUs, or PCI slots
> without anything), it will just work too. It is not clear
> that your "lump every mechanism into one" approach can handle all
> that; most likely you would need to add lots of special
> cases. Separate mechanisms can do this more cleanly.
Yes. The cases (combinations of hardware) would increase.
However, I don't think there would be that many today.
> Admittedly there would still need to be some coordination in case you
> want to remove a whole building block of your machine, like
> you said. A nice way to do this would be to add an atomic "to be
> removed" state to the individual unregister mechanisms that prevents
> the device from being re-registered until it is removed.
I agree. Actually, I already provide the "to be removed" state in
node hotplug, though it's not used yet :) It's also necessary for
the individual hotplugs, as you mentioned.
> > Design
> > ======
> > ACPI is used for some hardware manipulation.
>
> That will not work as a portable mechanism. Most Linux ports,
> including some that support NUMA and hotplug, do not have ACPI.
>
> Rather, you could add a layer for this which is implemented
> by the architecture. Then all ACPI-supporting architectures could
> share a common implementation and the other architectures could
> have their own. The design of this layer would need some discussion
> first to make sure it does not carry too many assumptions from your
> system that would not hold on others.
Are you saying here that the platform-independent and dependent
code should be separated, so that the dependent part could
be implemented by each platform (firmware) in its own
style? If so, I agree, and some discussion is necessary :)
Thanks,
Kei
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-10 1:47 ` Keiichiro Tokunaga
2004-05-10 2:12 ` [Lhns-devel] " Takayoshi Kochi
@ 2004-05-10 5:45 ` Dave Hansen
2004-05-13 1:27 ` Keiichiro Tokunaga
1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2004-05-10 5:45 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Sun, 2004-05-09 at 18:47, Keiichiro Tokunaga wrote:
> There is no NUMA support in the current code yet. I'll post a
> rough patch to show my idea soon. So far, I'm thinking of regarding
> a container device that has a _PXM as a NUMA node.
Don't you think it would be a good idea to work with some of the current
code, instead of trying to wrap around it?
I'm sure Matt Dobson can give you some great ideas about things in the
current NUMA code that aren't hotplug safe. That really needs to be
done before any other work, anyway.
-- Dave
* Re: [Lhns-devel] Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-10 2:12 ` [Lhns-devel] " Takayoshi Kochi
@ 2004-05-10 11:20 ` Keiichiro Tokunaga
0 siblings, 0 replies; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-10 11:20 UTC (permalink / raw)
To: Takayoshi Kochi; +Cc: haveblue, linux-kernel, linux-hotplug-devel, lhns-devel
On Mon, 10 May 2004 11:12:40 +0900 (JST)
Takayoshi Kochi <t-kochi@bq.jp.nec.com> wrote:
> From: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
> Subject: [Lhns-devel] Re: [ANNOUNCE] [PATCH] Node Hotplug Support
> Date: Mon, 10 May 2004 10:47:25 +0900
>
> > > How does this interoperate with the current NUMA topology already in
> > > sysfs today? I don't see any references at all to the current code.
> >
> > There is no NUMA support in the current code yet. I'll post a
> > rough patch to show my idea soon. So far, I'm thinking of regarding
> > a container device that has a _PXM as a NUMA node.
>
> I've not looked closely at the code, but why do you use "PNP0A05"
> for the container device?
> "PNP0A05" is defined as "Generic ISA device" in the ACPI spec.
>
> I think the "module device" (ACPI0004) sounds more suitable for the
> purpose, though I don't know whether your hardware will support it
> or not.
Yes, ACPI0004 could be used as well. Actually, PNP0A05 was
redefined as "Generic Container Device" in ACPI spec 2.0c.
According to that, PNP0A05 and ACPI0004 behave
almost the same way. PNP0A05 also sounds fine to me :)
> Also, assuming that devices that have _PXM are nodes sounds a bit too
> aggressive to me. For example, something like the following is possible.
>
> Device(\_SB) {
>     Processor(CPU0...) {
>         Name(_PXM, 0)
>     }
>     Processor(CPU1...) {
>         Name(_PXM, 1)
>     }
>
>     Device(PCI0) {
>         Name(_PXM, 0)
>     }
>     Device(PCI1) {
>         Name(_PXM, 1)
>     }
> }
>
> (I don't know if such an implementation exists, but from the spec,
> it is possible.)
> In this case, the OS has to group devices by the same number.
> Please don't assume that a specific ACPI AML implementation is a
> generic rule.
If the ACPI ASL is written that way, LHNS cannot do anything, since
there is no container device (not in the ACPI sense here) in the
system. LHNS's target is a container *device* (again, not the ACPI
term :), which it hotplugs physically (of course, the device needs to
be hotpluggable). So LHNS can handle a NUMA node only if all the
CPUs and memory of the node are on a container device. I know
that it's a big restriction :(
Thanks,
kei
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-10 2:17 ` Keiichiro Tokunaga
@ 2004-05-10 19:54 ` Andi Kleen
0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2004-05-10 19:54 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Andi Kleen, linux-kernel, lhns-devel
On Mon, May 10, 2004 at 11:17:22AM +0900, Keiichiro Tokunaga wrote:
> As you mentioned, there can be many more things. However,
> the examples you showed should be handled by the individual hotplugs.
> For instance, the processes bound to a certain CPU should be
> handled by CPU hotplug when node hotplug invokes it.
> Also, the device drivers bound to slots should be detached by
> I/O hotplug. Node hotplug is supposed to take care of the container
> device and just invoke the individual hotplugs for the other devices.
> Node hotplug doesn't break any policy of the individual hotplugs
> (CPU, memory, I/O, etc.).
To coordinate all of them you need user-space infrastructure anyway
that knows about all the elements of a node. I don't see why you want
a kernel-level container too; user space can already do this job
and would be much more flexible.
> > This has the advantage that if there is some reason to
> > add CPUs without memory (or memory without CPUs, or PCI slots
> > without anything), it will just work too. It is not clear
> > that your "lump every mechanism into one" approach can handle all
> > that; most likely you would need to add lots of special
> > cases. Separate mechanisms can do this more cleanly.
>
> Yes. The cases (combinations of hardware) would increase.
> However, I don't think there would be that many today.
I am mainly thinking about virtualization here, for which hotplug
of everything definitely makes sense. But the requirements could
be very different from what a big-iron machine wants.
> Are you saying here that the platform-independent and dependent
> code should be separated, so that the dependent part could
> be implemented by each platform (firmware) in its own
> style? If so, I agree, and some discussion is necessary :)
Yep.
-Andi
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-10 5:45 ` Dave Hansen
@ 2004-05-13 1:27 ` Keiichiro Tokunaga
2004-05-13 2:04 ` Dave Hansen
0 siblings, 1 reply; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-13 1:27 UTC (permalink / raw)
To: Dave Hansen
Cc: tokunaga.keiich, linux-kernel, linux-hotplug-devel, lhns-devel
On Sun, 09 May 2004 22:45:42 -0700
Dave Hansen <haveblue@us.ibm.com> wrote:
> On Sun, 2004-05-09 at 18:47, Keiichiro Tokunaga wrote:
> > There is no NUMA support in the current code yet. I'll post a
> > rough patch to show my idea soon. So far, I'm thinking of regarding
> > a container device that has a _PXM as a NUMA node.
>
> Don't you think it would be a good idea to work with some of the current
> code, instead of trying to wrap around it?
Are you saying that LHNS should use the current NUMA code
(or coming code in the future) to support NUMA node hotplug?
> I'm sure Matt Dobson can give you some great ideas about things in the
> current NUMA code that aren't hotplug safe. That really needs to be
> done before any other work, anyway.
Thanks,
Kei
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-13 1:27 ` Keiichiro Tokunaga
@ 2004-05-13 2:04 ` Dave Hansen
2004-05-13 6:35 ` Keiichiro Tokunaga
0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2004-05-13 2:04 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Wed, 2004-05-12 at 18:27, Keiichiro Tokunaga wrote:
> On Sun, 09 May 2004 22:45:42 -0700
> Dave Hansen <haveblue@us.ibm.com> wrote:
>
> > On Sun, 2004-05-09 at 18:47, Keiichiro Tokunaga wrote:
> > > There is no NUMA support in the current code yet. I'll post a
> > > rough patch to show my idea soon. So far, I'm thinking of regarding
> > > a container device that has a _PXM as a NUMA node.
> >
> > Don't you think it would be a good idea to work with some of the current
> > code, instead of trying to wrap around it?
>
> Are you saying that LHNS should use the current NUMA code
> (or coming code in the future) to support NUMA node hotplug?
Absolutely. Why do we need wrappers when we can offline entire nodes
with a 6-line shell script? The CPU hotplug interfaces are here today
and the memory stuff will be here soon. Perhaps you could help with the
NUMA part.
#!/bin/sh
NODENUM=$1
NODEDIR=/sys/devices/system/node/node${NODENUM}
for i in $NODEDIR/cpu* $NODEDIR/memory*; do
    echo 0 > $i/control/online
done
echo 0 > $NODEDIR/control/online
We don't currently export bus to node mappings in sysfs, but we have
them in the kernel, so that won't be too hard to export as well.
-- Dave
* Re: [ANNOUNCE] [PATCH] Node Hotplug Support
2004-05-13 2:04 ` Dave Hansen
@ 2004-05-13 6:35 ` Keiichiro Tokunaga
2004-05-13 6:45 ` Dave Hansen
0 siblings, 1 reply; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-13 6:35 UTC (permalink / raw)
To: Dave Hansen
Cc: tokunaga.keiich, linux-kernel, linux-hotplug-devel, lhns-devel
On Wed, 12 May 2004 19:04:47 -0700
Dave Hansen <haveblue@us.ibm.com> wrote:
> On Wed, 2004-05-12 at 18:27, Keiichiro Tokunaga wrote:
> > On Sun, 09 May 2004 22:45:42 -0700
> > Dave Hansen <haveblue@us.ibm.com> wrote:
> >
> > > On Sun, 2004-05-09 at 18:47, Keiichiro Tokunaga wrote:
> > > > There is no NUMA support in the current code yet. I'll post a
> > > > rough patch to show my idea soon. I'm thinking to regard a
> > > > container device that has PXM as a NUMA node so far.
> > >
> > > Don't you think it would be a good idea to work with some of the current
> > > code, instead of trying to wrap around it?
> >
> > Are you saying that LHNS should use the current NUMA code
> > (or coming code in the future) to support NUMA node hotplug?
>
> Absolutely. Why do we need wrappers when we can offline entire nodes
> with 6-line shell scripts? The CPU hotplug interfaces are here today
> and the memory stuff will be here soon. Perhaps you could help with the
> NUMA part.
>
> #!/bin/sh
> NODENUM=$1
> NODEDIR=/sys/devices/system/node/node${NODENUM}
> for i in $NODEDIR/cpu* $NODEDIR/memory*; do
> 	echo 0 > $i/control/online
> done
> echo 0 > $NODEDIR/control/online
>
> We don't currently export bus to node mappings in sysfs, but we have
> them in the kernel, so that won't be too hard to export as well.
LHNS focuses on "container device hotplug". A container device
may contain CPUs, memory, and/or IO devices; it could also contain
only IO devices. In that case, LHNS cannot use
$NODEDIR/control/online (the NUMA interface) for the container device.
By the way, what happens when you issue
"echo 0 > $NODEDIR/control/online"? Can you detach the node
from the system after echo-ing?
Thanks,
Kei
* Re: Node Hotplug Support
2004-05-13 6:35 ` Keiichiro Tokunaga
@ 2004-05-13 6:45 ` Dave Hansen
2004-05-13 6:54 ` [Lhns-devel] " Dave Hansen
2004-05-14 1:13 ` Keiichiro Tokunaga
0 siblings, 2 replies; 19+ messages in thread
From: Dave Hansen @ 2004-05-13 6:45 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Wed, 2004-05-12 at 23:35, Keiichiro Tokunaga wrote:
> LHNS focuses on "container device hotplug". A container device
> may contain CPUs, memory, and/or IO devices; it could also contain
> only IO devices. In that case, LHNS cannot use
> $NODEDIR/control/online (the NUMA interface) for the container device.
So, why not expose your containers in the same way that all of the other
NUMA node information is exported? What makes your NUMA containers
different from all of the other flavors of NUMA implementations in
Linux?
> By the way, what happens when you issue
> "echo 0 > $NODEDIR/control/online"? Can you detach the node
> from the system after echo-ing?
Well, since it doesn't exist yet... Sure :)
-- Dave
* Re: [Lhns-devel] Re: Node Hotplug Support
2004-05-13 6:45 ` Dave Hansen
@ 2004-05-13 6:54 ` Dave Hansen
2004-05-14 1:13 ` Keiichiro Tokunaga
1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2004-05-13 6:54 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Wed, 2004-05-12 at 23:45, Dave Hansen wrote:
> On Wed, 2004-05-12 at 23:35, Keiichiro Tokunaga wrote:
> > By the way, what happens when you issue
> > "echo 0 > $NODEDIR/control/online"? Can you detach the node
> > from the system after echo-ing?
>
> Well, since it doesn't exist yet... Sure :)
What we really need is something like eject(1) for system devices.
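As a purely hypothetical sketch of what such an eject(1)-like interface for system devices might look like (neither the `eject` attribute nor this layout exists; a mock directory stands in for /sys):

```shell
#!/bin/sh
# Hypothetical two-step removal: offline the node, then ask the
# platform/firmware to eject it.  All attribute names here are
# invented for illustration; none exist in the kernel.
NODEDIR=$(mktemp -d)/node0
mkdir -p $NODEDIR/control
echo 0 > $NODEDIR/control/online   # step 1: take the node offline
echo 1 > $NODEDIR/eject            # step 2: request physical ejection
cat $NODEDIR/eject                 # prints 1
```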
-- Dave
* Re: Node Hotplug Support
2004-05-13 6:45 ` Dave Hansen
2004-05-13 6:54 ` [Lhns-devel] " Dave Hansen
@ 2004-05-14 1:13 ` Keiichiro Tokunaga
2004-05-14 1:21 ` Dave Hansen
1 sibling, 1 reply; 19+ messages in thread
From: Keiichiro Tokunaga @ 2004-05-14 1:13 UTC (permalink / raw)
To: Dave Hansen
Cc: tokunaga.keiich, linux-kernel, linux-hotplug-devel, lhns-devel
On Wed, 12 May 2004 23:45:38 -0700
Dave Hansen <haveblue@us.ibm.com> wrote:
> On Wed, 2004-05-12 at 23:35, Keiichiro Tokunaga wrote:
> > LHNS focuses on "container device hotplug". A container device
> > may contain CPUs, memory, and/or IO devices; it could also contain
> > only IO devices. In that case, LHNS cannot use
> > $NODEDIR/control/online (the NUMA interface) for the container device.
>
> So, why not expose your containers in the same way that all of the other
> NUMA node information is exported? What makes your NUMA containers
> different from all of the other flavors of NUMA implementations in
> Linux?
>
> > By the way, what happens when you issue
> > "echo 0 > $NODEDIR/control/online"? Can you detach the node
> > from the system after echo-ing?
>
> Well, since it doesn't exist yet... Sure :)
The approach you described in your previous email seems to
work for a container device if we have $CONTAINERD/online.
Anyway, I feel we need to be more specific about the implementation.
Is there any existing information that I can access?
Thanks,
Kei
* Re: Node Hotplug Support
2004-05-14 1:13 ` Keiichiro Tokunaga
@ 2004-05-14 1:21 ` Dave Hansen
0 siblings, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2004-05-14 1:21 UTC (permalink / raw)
To: Keiichiro Tokunaga; +Cc: Linux Kernel Mailing List, hotplug devel, lhns-devel
On Thu, 2004-05-13 at 18:13, Keiichiro Tokunaga wrote:
> The approach you described in your previous email seems to
> work for a container device if we have $CONTAINERD/online.
> Anyway, I feel we need to be more specific about the implementation.
> Is there any existing information that I can access?
I don't think I understand. We already have the concept of a node. Why
do we also need a container?
-- Dave
end of thread, other threads:[~2004-05-14 1:21 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-07 15:39 [ANNOUNCE] [PATCH] Node Hotplug Support Keiichiro Tokunaga
2004-05-07 15:49 ` Dave Hansen
2004-05-10 1:47 ` Keiichiro Tokunaga
2004-05-10 2:12 ` [Lhns-devel] " Takayoshi Kochi
2004-05-10 11:20 ` Keiichiro Tokunaga
2004-05-10 5:45 ` Dave Hansen
2004-05-13 1:27 ` Keiichiro Tokunaga
2004-05-13 2:04 ` Dave Hansen
2004-05-13 6:35 ` Keiichiro Tokunaga
2004-05-13 6:45 ` Dave Hansen
2004-05-13 6:54 ` [Lhns-devel] " Dave Hansen
2004-05-14 1:13 ` Keiichiro Tokunaga
2004-05-14 1:21 ` Dave Hansen
2004-05-07 16:16 ` [ANNOUNCE] [PATCH] " Martin J. Bligh
2004-05-10 2:03 ` Keiichiro Tokunaga
[not found] <1TfLX-4M4-9@gated-at.bofh.it>
2004-05-07 19:10 ` Andi Kleen
2004-05-08 18:00 ` Ingo Oeser
2004-05-10 2:17 ` Keiichiro Tokunaga
2004-05-10 19:54 ` Andi Kleen