* [PATCH] remove empty node at boot time
@ 2006-06-01 11:04 KAMEZAWA Hiroyuki
2006-07-07 23:26 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-06-01 11:04 UTC (permalink / raw)
To: linux-ia64; +Cc: tony.luck, linux-kernel, Andrew Morton
Remove empty nodes, i.e. nodes that contain no CPU and no memory (and no I/O),
on ia64.
This patch onlines nodes that have available resources and avoids onlining
nodes that have only possible resources.
SRAT describes possible resources, both CPU and memory. It also gives the
proximity domain (pxm) of each resource. Each NUMA node is created according to its pxm.
The current ia64 SRAT parser onlines a node whenever a new pxm is found. But
sometimes a pxm includes only 'possible' resources and no available resources.
Such a pxm creates an empty node.
When an empty node is onlined, a pgdat is allocated for it.
Now the fundamental code for node hotplug is ready in -mm. We can add
CPUs and memory dynamically to a newly created node. (Memory-less node hotplug
is not ready, but I don't know whether there is demand for it now.)
So we can remove empty nodes, which include only possible resources.
I'm also considering allocating new pgdats on-node. Empty nodes are
an obstacle to doing that.
TBD: the detection scheme for I/O-only nodes should be fixed (if necessary).
Does anyone have a suggestion?
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.17-rc5-mm2/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.17-rc5-mm2.orig/arch/ia64/kernel/setup.c 2006-06-01 18:34:08.000000000 +0900
+++ linux-2.6.17-rc5-mm2/arch/ia64/kernel/setup.c 2006-06-01 19:09:19.000000000 +0900
@@ -418,7 +418,7 @@
if (early_console_setup(*cmdline_p) == 0)
mark_bsp_online();
-
+ reserve_memory();
#ifdef CONFIG_ACPI
/* Initialize the ACPI boot-time table parser */
acpi_table_init();
Index: linux-2.6.17-rc5-mm2/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.17-rc5-mm2.orig/arch/ia64/mm/contig.c 2006-06-01 18:32:18.000000000 +0900
+++ linux-2.6.17-rc5-mm2/arch/ia64/mm/contig.c 2006-06-01 19:09:19.000000000 +0900
@@ -146,8 +146,6 @@
{
unsigned long bootmap_size;
- reserve_memory();
-
/* first find highest page frame number */
max_pfn = 0;
efi_memmap_walk(find_max_pfn, &max_pfn);
Index: linux-2.6.17-rc5-mm2/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.17-rc5-mm2.orig/arch/ia64/mm/discontig.c 2006-06-01 18:34:08.000000000 +0900
+++ linux-2.6.17-rc5-mm2/arch/ia64/mm/discontig.c 2006-06-01 19:09:19.000000000 +0900
@@ -443,8 +443,6 @@
{
int node;
- reserve_memory();
-
if (num_online_nodes() == 0) {
printk(KERN_ERR "node info missing!\n");
node_set_online(0);
Index: linux-2.6.17-rc5-mm2/arch/ia64/kernel/acpi.c
===================================================================
--- linux-2.6.17-rc5-mm2.orig/arch/ia64/kernel/acpi.c 2006-06-01 18:34:08.000000000 +0900
+++ linux-2.6.17-rc5-mm2/arch/ia64/kernel/acpi.c 2006-06-01 19:09:19.000000000 +0900
@@ -515,6 +515,43 @@
num_node_memblks++;
}
+/* online node if node has valid memory */
+static
+int find_valid_memory_range(unsigned long start, unsigned long end, void *arg)
+{
+ int i;
+ struct node_memblk_s *p;
+ start = __pa(start);
+ end = __pa(end);
+ for (i = 0; i < num_node_memblks; ++i) {
+ p = &node_memblk[i];
+ if (end < p->start_paddr)
+ continue;
+ if (p->start_paddr + p->size <= start)
+ continue;
+ node_set_online(p->nid);
+ }
+ return 0;
+}
+
+static void
+acpi_online_node_fixup(void)
+{
+ int i, cpu;
+ /* online node if a node has available cpus */
+ for (i = 0; i < srat_num_cpus; ++i)
+ for (cpu = 0; cpu < available_cpus; ++cpu)
+ if (smp_boot_data.cpu_phys_id[cpu] ==
+ node_cpuid[i].phys_id) {
+ node_set_online(node_cpuid[i].nid);
+ break;
+ }
+ /* memory */
+ efi_memmap_walk(find_valid_memory_range, NULL);
+
+ /* TBD: check I/O devices which have valid nid. and online it*/
+}
+
void __init acpi_numa_arch_fixup(void)
{
int i, j, node_from, node_to;
@@ -526,22 +563,28 @@
return;
}
- /*
- * MCD - This can probably be dropped now. No need for pxm ID to node ID
- * mapping with sparse node numbering iff MAX_PXM_DOMAINS <= MAX_NUMNODES.
- */
nodes_clear(node_online_map);
+ /* MAP pxm to nid */
for (i = 0; i < MAX_PXM_DOMAINS; i++) {
if (pxm_bit_test(i)) {
- int nid = acpi_map_pxm_to_node(i);
- node_set_online(nid);
+ /* this makes pxm <-> nid mapping */
+ acpi_map_pxm_to_node(i);
}
}
+ /* convert pxm information to nid information */
- /* set logical node id in memory chunk structure */
for (i = 0; i < num_node_memblks; i++)
node_memblk[i].nid = pxm_to_node(node_memblk[i].nid);
+ for (i = 0; i < srat_num_cpus; i++)
+ node_cpuid[i].nid = pxm_to_node(node_cpuid[i].nid);
+
+ /*
+ * confirm node is online or not.
+ * onlined node will have their own NODE_DATA
+ */
+ acpi_online_node_fixup();
+
/* assign memory bank numbers for each chunk on each node */
for_each_online_node(i) {
int bank;
@@ -552,9 +595,6 @@
node_memblk[j].bank = bank++;
}
- /* set logical node id in cpu structure */
- for (i = 0; i < srat_num_cpus; i++)
- node_cpuid[i].nid = pxm_to_node(node_cpuid[i].nid);
printk(KERN_INFO "Number of logical nodes in system = %d\n",
num_online_nodes());
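The memory half of the patch onlines a node when an EFI memory range overlaps one of its SRAT memory blocks. The overlap test can be sketched in isolation as below. This is a minimal user-space sketch, not the kernel code: `struct memblk`, `range_overlaps_block()` and `online_nodes_for_range()` are hypothetical names, the online map is reduced to a single `unsigned long`, and the range is assumed to be half-open, [start, end). Under that assumption a range that merely touches a block does not count as overlapping, which is slightly stricter than the `end < p->start_paddr` test in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct node_memblk_s. */
struct memblk {
	uint64_t start_paddr;	/* first physical address of the block */
	uint64_t size;		/* length in bytes */
	int nid;		/* node id the block belongs to */
};

/* True when [start, end) and the block share at least one byte. */
static bool range_overlaps_block(uint64_t start, uint64_t end,
				 const struct memblk *blk)
{
	uint64_t blk_end = blk->start_paddr + blk->size;

	return start < blk_end && blk->start_paddr < end;
}

/*
 * Set bit nid in *online_mask for every block that overlaps [start, end).
 * Assumes every nid is smaller than the bit width of unsigned long.
 * Returns the number of overlapping blocks.
 */
static int online_nodes_for_range(uint64_t start, uint64_t end,
				  const struct memblk *blks, int nr,
				  unsigned long *online_mask)
{
	int i, hits = 0;

	for (i = 0; i < nr; i++) {
		if (!range_overlaps_block(start, end, &blks[i]))
			continue;
		*online_mask |= 1UL << blks[i].nid;
		hits++;
	}
	return hits;
}
```

Calling `online_nodes_for_range()` once per usable memory range mirrors what the callback passed to `efi_memmap_walk()` does in the patch.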
* Re: [PATCH] remove empty node at boot time
2006-06-01 11:04 [PATCH] remove empty node at boot time KAMEZAWA Hiroyuki
@ 2006-07-07 23:26 ` Bjorn Helgaas
2006-07-10 0:34 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2006-07-07 23:26 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-ia64, tony.luck, linux-kernel, Andrew Morton
On Thursday 01 June 2006 05:04, KAMEZAWA Hiroyuki wrote:
> Remove empty nodes, i.e. nodes that contain no CPU and no memory (and no I/O),
> on ia64.
>
> This patch onlines nodes that have available resources and avoids onlining
> nodes that have only possible resources.
This patch breaks my HP rx8640 box. I suppose we have some unusual
SRAT configuration. I'll debug it more next week. If there's something
in particular I should look for, let me know.
Comparing old (working) with new (broken), I see:
- Number of logical nodes in system = 3
+ Number of logical nodes in system = 1
This box has two cells. Each cell has four CPUs and some local memory.
There is also an interleaved region that uses memory from both cells.
I think firmware presents this as a logical node for each cell, plus
one for the interleaved region.
This box is configured with minimal local memory on each cell (8MB).
That's less than a granule, so we should discard it, leaving two nodes
with CPUs but no memory, and a third node with all the interleaved
memory but no CPUs.
It looks like this patch throws away two of the nodes, so I'm guessing
we discarded the nodes with CPUs and no memory.
> SRAT describes possible resources, both CPU and memory. It also gives the
> proximity domain (pxm) of each resource. Each NUMA node is created according to its pxm.
>
> The current ia64 SRAT parser onlines a node whenever a new pxm is found. But
> sometimes a pxm includes only 'possible' resources and no available resources.
> Such a pxm creates an empty node.
>
> When an empty node is onlined, a pgdat is allocated for it.
>
> Now the fundamental code for node hotplug is ready in -mm. We can add
> CPUs and memory dynamically to a newly created node. (Memory-less node hotplug
> is not ready, but I don't know whether there is demand for it now.)
> So we can remove empty nodes, which include only possible resources.
>
> I'm also considering allocating new pgdats on-node. Empty nodes are
> an obstacle to doing that.
>
> TBD: the detection scheme for I/O-only nodes should be fixed (if necessary).
> Does anyone have a suggestion?
>
> Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
>
> Index: linux-2.6.17-rc5-mm2/arch/ia64/kernel/setup.c
> ===================================================================
> --- linux-2.6.17-rc5-mm2.orig/arch/ia64/kernel/setup.c 2006-06-01 18:34:08.000000000 +0900
> +++ linux-2.6.17-rc5-mm2/arch/ia64/kernel/setup.c 2006-06-01 19:09:19.000000000 +0900
> @@ -418,7 +418,7 @@
>
> if (early_console_setup(*cmdline_p) == 0)
> mark_bsp_online();
> -
> + reserve_memory();
> #ifdef CONFIG_ACPI
> /* Initialize the ACPI boot-time table parser */
> acpi_table_init();
> Index: linux-2.6.17-rc5-mm2/arch/ia64/mm/contig.c
> ===================================================================
> --- linux-2.6.17-rc5-mm2.orig/arch/ia64/mm/contig.c 2006-06-01 18:32:18.000000000 +0900
> +++ linux-2.6.17-rc5-mm2/arch/ia64/mm/contig.c 2006-06-01 19:09:19.000000000 +0900
> @@ -146,8 +146,6 @@
> {
> unsigned long bootmap_size;
>
> - reserve_memory();
> -
> /* first find highest page frame number */
> max_pfn = 0;
> efi_memmap_walk(find_max_pfn, &max_pfn);
> Index: linux-2.6.17-rc5-mm2/arch/ia64/mm/discontig.c
> ===================================================================
> --- linux-2.6.17-rc5-mm2.orig/arch/ia64/mm/discontig.c 2006-06-01 18:34:08.000000000 +0900
> +++ linux-2.6.17-rc5-mm2/arch/ia64/mm/discontig.c 2006-06-01 19:09:19.000000000 +0900
> @@ -443,8 +443,6 @@
> {
> int node;
>
> - reserve_memory();
> -
> if (num_online_nodes() == 0) {
> printk(KERN_ERR "node info missing!\n");
> node_set_online(0);
> Index: linux-2.6.17-rc5-mm2/arch/ia64/kernel/acpi.c
> ===================================================================
> --- linux-2.6.17-rc5-mm2.orig/arch/ia64/kernel/acpi.c 2006-06-01 18:34:08.000000000 +0900
> +++ linux-2.6.17-rc5-mm2/arch/ia64/kernel/acpi.c 2006-06-01 19:09:19.000000000 +0900
> @@ -515,6 +515,43 @@
> num_node_memblks++;
> }
>
> +/* online node if node has valid memory */
> +static
> +int find_valid_memory_range(unsigned long start, unsigned long end, void *arg)
> +{
> + int i;
> + struct node_memblk_s *p;
> + start = __pa(start);
> + end = __pa(end);
> + for (i = 0; i < num_node_memblks; ++i) {
> + p = &node_memblk[i];
> + if (end < p->start_paddr)
> + continue;
> + if (p->start_paddr + p->size <= start)
> + continue;
> + node_set_online(p->nid);
> + }
> + return 0;
> +}
> +
> +static void
> +acpi_online_node_fixup(void)
> +{
> + int i, cpu;
> + /* online node if a node has available cpus */
> + for (i = 0; i < srat_num_cpus; ++i)
> + for (cpu = 0; cpu < available_cpus; ++cpu)
> + if (smp_boot_data.cpu_phys_id[cpu] ==
> + node_cpuid[i].phys_id) {
> + node_set_online(node_cpuid[i].nid);
> + break;
> + }
> + /* memory */
> + efi_memmap_walk(find_valid_memory_range, NULL);
> +
> + /* TBD: check I/O devices which have valid nid. and online it*/
> +}
> +
> void __init acpi_numa_arch_fixup(void)
> {
> int i, j, node_from, node_to;
> @@ -526,22 +563,28 @@
> return;
> }
>
> - /*
> - * MCD - This can probably be dropped now. No need for pxm ID to node ID
> - * mapping with sparse node numbering iff MAX_PXM_DOMAINS <= MAX_NUMNODES.
> - */
> nodes_clear(node_online_map);
> + /* MAP pxm to nid */
> for (i = 0; i < MAX_PXM_DOMAINS; i++) {
> if (pxm_bit_test(i)) {
> - int nid = acpi_map_pxm_to_node(i);
> - node_set_online(nid);
> + /* this makes pxm <-> nid mapping */
> + acpi_map_pxm_to_node(i);
> }
> }
> + /* convert pxm information to nid information */
>
> - /* set logical node id in memory chunk structure */
> for (i = 0; i < num_node_memblks; i++)
> node_memblk[i].nid = pxm_to_node(node_memblk[i].nid);
>
> + for (i = 0; i < srat_num_cpus; i++)
> + node_cpuid[i].nid = pxm_to_node(node_cpuid[i].nid);
> +
> + /*
> + * confirm node is online or not.
> + * onlined node will have their own NODE_DATA
> + */
> + acpi_online_node_fixup();
> +
> /* assign memory bank numbers for each chunk on each node */
> for_each_online_node(i) {
> int bank;
> @@ -552,9 +595,6 @@
> node_memblk[j].bank = bank++;
> }
>
> - /* set logical node id in cpu structure */
> - for (i = 0; i < srat_num_cpus; i++)
> - node_cpuid[i].nid = pxm_to_node(node_cpuid[i].nid);
>
> printk(KERN_INFO "Number of logical nodes in system = %d\n",
> num_online_nodes());
>
* Re: [PATCH] remove empty node at boot time
2006-07-07 23:26 ` Bjorn Helgaas
@ 2006-07-10 0:34 ` KAMEZAWA Hiroyuki
2006-07-10 2:38 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-07-10 0:34 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Fri, 7 Jul 2006 17:26:31 -0600
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> On Thursday 01 June 2006 05:04, KAMEZAWA Hiroyuki wrote:
> > Remove empty nodes, i.e. nodes that contain no CPU and no memory (and no I/O),
> > on ia64.
> >
> > This patch onlines nodes that have available resources and avoids onlining
> > nodes that have only possible resources.
>
> This patch breaks my HP rx8640 box. I suppose we have some unusual
> SRAT configuration. I'll debug it more next week. If there's something
> in particular I should look for, let me know.
>
> Comparing old (working) with new (broken), I see:
>
> - Number of logical nodes in system = 3
> + Number of logical nodes in system = 1
>
> This box has two cells. Each cell has four CPUs and some local memory.
> There is also an interleaved region that uses memory from both cells.
> I think firmware presents this as a logical node for each cell, plus
> one for the interleaved region.
>
> This box is configured with minimal local memory on each cell (8MB).
> That's less than a granule, so we should discard it, leaving two nodes
> with CPUs but no memory, and a third node with all the interleaved
> memory but no CPUs.
>
Then, your box has
node 0 : cpu x 4, small memory
node 1 : cpu x 4, small memory
node 2 : big memory.
If nodes 0 and 1 above disappear, it looks like there is a bug in
CPU detection.
> + int i, cpu;
> + /* online node if a node has available cpus */
> + for (i = 0; i < srat_num_cpus; ++i)
> + for (cpu = 0; cpu < available_cpus; ++cpu)
> + if (smp_boot_data.cpu_phys_id[cpu] ==
> + node_cpuid[i].phys_id) {
> + node_set_online(node_cpuid[i].nid);
> + break;
> + }
Note:
smp_boot_data.cpu_phys_id[i] is set by acpi_parse_lsapic().
node_cpuid[j].phys_id is set by acpi_numa_processor_affinity_init().
-Kame
* Re: [PATCH] remove empty node at boot time
2006-07-10 0:34 ` KAMEZAWA Hiroyuki
@ 2006-07-10 2:38 ` Bjorn Helgaas
2006-07-10 3:29 ` KAMEZAWA Hiroyuki
2006-07-10 5:19 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2006-07-10 2:38 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Sunday 09 July 2006 18:34, KAMEZAWA Hiroyuki wrote:
> Then, your box has
> node 0 : cpu x 4, small memory
> node 1 : cpu x 4, small memory
> node 2 : big memory.
Yes.
> If nodes 0 and 1 above disappear, it looks like there is a bug in
> CPU detection.
Yes. Here's the relevant part of the call tree:
setup_arch
acpi_numa_init
acpi_numa_arch_fixup
acpi_online_node_fixup (test available_cpus)
...
acpi_boot_init
acpi_table_parse_madt(..., acpi_parse_lsapic, ...)
acpi_parse_lsapic (increment available_cpus)
Note that we test available_cpus in acpi_online_node_fixup()
before we increment it in acpi_parse_lsapic(), so the inner
loop is never executed.
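The ordering problem described above can be reproduced in a few lines of user-space C. This is only a sketch under stated assumptions: `struct boot_state`, `struct srat_cpu`, `parse_lsapic()` and `online_node_fixup()` are hypothetical miniatures of the kernel state and functions named in the call tree, and the node online map is reduced to one `unsigned long`.

```c
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical miniature of the boot-time state discussed above. */
struct boot_state {
	int available_cpus;		/* filled in by the MADT/LSAPIC pass */
	int cpu_phys_id[MAX_CPUS];	/* physical ids seen in the MADT */
	unsigned long online_nodes;	/* bit n set: node n is online */
};

/* One SRAT processor affinity entry: physical id and node id. */
struct srat_cpu {
	int phys_id;
	int nid;
};

/* Stand-in for acpi_parse_lsapic(): record one enabled CPU. */
static void parse_lsapic(struct boot_state *s, int phys_id)
{
	if (s->available_cpus < MAX_CPUS)
		s->cpu_phys_id[s->available_cpus++] = phys_id;
}

/* Stand-in for the CPU half of acpi_online_node_fixup(). */
static void online_node_fixup(struct boot_state *s,
			      const struct srat_cpu *srat, size_t nr)
{
	size_t i;
	int cpu;

	for (i = 0; i < nr; i++)
		/* With available_cpus == 0 this inner loop never runs. */
		for (cpu = 0; cpu < s->available_cpus; cpu++)
			if (s->cpu_phys_id[cpu] == srat[i].phys_id) {
				s->online_nodes |= 1UL << srat[i].nid;
				break;
			}
}
```

Calling `online_node_fixup()` before any `parse_lsapic()` call leaves `online_nodes` at zero; calling it afterwards onlines every node that owns a matching CPU.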
* Re: [PATCH] remove empty node at boot time
2006-07-10 2:38 ` Bjorn Helgaas
@ 2006-07-10 3:29 ` KAMEZAWA Hiroyuki
2006-07-10 5:19 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-07-10 3:29 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Sun, 9 Jul 2006 20:38:40 -0600
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> Yes. Here's the relevant part of the call tree:
>
> setup_arch
> acpi_numa_init
> acpi_numa_arch_fixup
> acpi_online_node_fixup (test available_cpus)
> ...
> acpi_boot_init
> acpi_table_parse_madt(..., acpi_parse_lsapic, ...)
> acpi_parse_lsapic (increment available_cpus)
>
> Note that we test available_cpus in acpi_online_node_fixup()
> before we increment it in acpi_parse_lsapic(), so the inner
> loop is never executed.
Hmm... okay, I misunderstood the boot path.
For my remove-empty-node patch to work, the lsapic entries must be parsed
before SRAT.
I'd like to fix this. BTW, can we move MADT parsing before SRAT?
-Kame
* Re: [PATCH] remove empty node at boot time
2006-07-10 2:38 ` Bjorn Helgaas
2006-07-10 3:29 ` KAMEZAWA Hiroyuki
@ 2006-07-10 5:19 ` KAMEZAWA Hiroyuki
2006-07-10 17:03 ` Bjorn Helgaas
1 sibling, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-07-10 5:19 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Sun, 9 Jul 2006 20:38:40 -0600
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> setup_arch
> acpi_numa_init
> acpi_numa_arch_fixup
> acpi_online_node_fixup (test available_cpus)
> ...
> acpi_boot_init
> acpi_table_parse_madt(..., acpi_parse_lsapic, ...)
> acpi_parse_lsapic (increment available_cpus)
>
> Note that we test available_cpus in acpi_online_node_fixup()
> before we increment it in acpi_parse_lsapic(), so the inner
> loop is never executed.
>
Could you try this patch? (against 2.6.18-rc1)
I think this is a very straightforward fix. It booted fine on my box and in a NUMA
emulation environment.
If SRAT had a "present" bit, I would be happy ;(
Thanks,
-Kame
empty-node-fix-fix.patch
empty-node-fix.patch uses lsapic information to detect CPUs. But it has a
problem with CPU-only nodes because the lsapic information is not parsed before SRAT.
This patch moves lsapic parsing before SRAT, so that information about
available CPUs is known early.
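The shape of this fix, splitting one init routine into an early pass and a late pass that share a flag, can be sketched on its own. The following is a simplified user-space illustration, not the kernel code: `struct fw_tables`, `struct acpi_state`, `madt_early_init()` and `boot_init()` are hypothetical names, and table parsing is reduced to copying counters.

```c
#include <stdbool.h>

/* Hypothetical firmware description used only for this sketch. */
struct fw_tables {
	bool has_madt;
	int nr_lsapic;	/* enabled local SAPIC entries */
	int nr_iosapic;	/* I/O SAPIC entries */
};

struct acpi_state {
	bool madt_is_available;	/* set by the early pass */
	int available_cpus;
	int iosapics;
};

/* Early pass: runs before SRAT handling and only counts CPUs. */
static void madt_early_init(struct acpi_state *st, const struct fw_tables *fw)
{
	if (!fw->has_madt)
		return;		/* madt_is_available stays false */
	st->madt_is_available = true;
	st->available_cpus = fw->nr_lsapic;
}

/* Late pass: skips MADT-dependent work when the early pass found no MADT. */
static int boot_init(struct acpi_state *st, const struct fw_tables *fw)
{
	if (st->madt_is_available)
		st->iosapics = fw->nr_iosapic;
	/* Work that does not depend on the MADT would follow here. */
	return 0;
}
```

The flag plays the role that `goto skip_madt` played when everything lived in one function: the late pass needs to know whether the early pass found the table.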
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
arch/ia64/kernel/acpi.c | 21 +++++++++++++++------
arch/ia64/kernel/setup.c | 3 +++
2 files changed, 18 insertions(+), 6 deletions(-)
Index: linux-2.6.18-rc1/arch/ia64/kernel/acpi.c
===================================================================
--- linux-2.6.18-rc1.orig/arch/ia64/kernel/acpi.c 2006-07-10 14:00:16.000000000 +0900
+++ linux-2.6.18-rc1/arch/ia64/kernel/acpi.c 2006-07-10 14:00:38.000000000 +0900
@@ -650,9 +650,12 @@
return rsdp_phys;
}
-int __init acpi_boot_init(void)
+static int madt_is_available __initdata;
+/*
+ * check LSAPIC in early phase, to detect available cpus.
+ */
+void __init ia64_acpi_madt_early_init(void)
{
-
/*
* MADT
* ----
@@ -663,9 +666,9 @@
if (acpi_table_parse(ACPI_APIC, acpi_parse_madt) < 1) {
printk(KERN_ERR PREFIX "Can't find MADT\n");
- goto skip_madt;
+ return;
}
-
+ madt_is_available = 1;
/* Local APIC */
if (acpi_table_parse_madt
@@ -682,8 +685,14 @@
< 0)
printk(KERN_ERR PREFIX "Error parsing LAPIC NMI entry\n");
- /* I/O APIC */
+ return;
+}
+int __init acpi_boot_init(void)
+{
+ if (!madt_is_available)
+ goto skip_madt;
+ /* IO-APIC */
if (acpi_table_parse_madt
(ACPI_MADT_IOSAPIC, acpi_parse_iosapic, NR_IOSAPICS) < 1)
printk(KERN_ERR PREFIX
@@ -704,8 +713,8 @@
if (acpi_table_parse_madt(ACPI_MADT_NMI_SRC, acpi_parse_nmi_src, 0) < 0)
printk(KERN_ERR PREFIX "Error parsing NMI SRC entry\n");
- skip_madt:
+skip_madt:
/*
* FADT says whether a legacy keyboard controller is present.
* The FADT also contains an SCI_INT line, by which the system
Index: linux-2.6.18-rc1/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.18-rc1.orig/arch/ia64/kernel/setup.c 2006-07-10 14:00:16.000000000 +0900
+++ linux-2.6.18-rc1/arch/ia64/kernel/setup.c 2006-07-10 14:00:38.000000000 +0900
@@ -71,6 +71,7 @@
#endif
extern void ia64_setup_printk_clock(void);
+extern void ia64_acpi_madt_early_init(void);
DEFINE_PER_CPU(struct cpuinfo_ia64, cpu_info);
DEFINE_PER_CPU(unsigned long, local_per_cpu_offset);
@@ -422,6 +423,8 @@
#ifdef CONFIG_ACPI
/* Initialize the ACPI boot-time table parser */
acpi_table_init();
+ /* read ACPI table */
+ ia64_acpi_madt_early_init();
# ifdef CONFIG_ACPI_NUMA
acpi_numa_init();
# endif
* Re: [PATCH] remove empty node at boot time
2006-07-10 5:19 ` KAMEZAWA Hiroyuki
@ 2006-07-10 17:03 ` Bjorn Helgaas
2006-07-11 6:55 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2006-07-10 17:03 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Sunday 09 July 2006 23:19, KAMEZAWA Hiroyuki wrote:
> Could you try this patch ? (against 2.6.18-rc1)
Your patch does fix it. But I'm worried about removing
empty nodes at boot-time. I want to support the following
scenario:
node 0: 1 enabled CPU, 3 disabled CPUs, no local memory
node 1: 4 disabled CPUs, no local memory
node 2: no CPUs, big interleaved memory across nodes 0 & 1
At run-time, I'd like to be able to enable any or all of the
7 disabled CPUs. If you remove the "empty" node 1 at boot-time,
it sounds like I won't be able to enable its CPUs later.
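The concern can be made concrete with a small sketch of the boot-time rule applied to the three-node scenario above. This is an illustration only: `struct node_desc`, `node_onlined_at_boot()` and `stranded_cpus()` are hypothetical names, and the rule is simplified to "online a node if it has an enabled CPU or memory".

```c
#include <stdbool.h>

struct node_desc {
	int enabled_cpus;
	int disabled_cpus;	/* possible, but not usable at boot */
	bool has_memory;
};

/* Simplified boot-time rule: online only nodes with available resources. */
static bool node_onlined_at_boot(const struct node_desc *n)
{
	return n->enabled_cpus > 0 || n->has_memory;
}

/* Count disabled CPUs that sit on nodes the rule leaves offline. */
static int stranded_cpus(const struct node_desc *nodes, int nr)
{
	int i, stranded = 0;

	for (i = 0; i < nr; i++)
		if (!node_onlined_at_boot(&nodes[i]))
			stranded += nodes[i].disabled_cpus;
	return stranded;
}
```

With nodes described as {1, 3, false}, {0, 4, false} and {0, 0, true}, only the middle node stays offline, and its four disabled CPUs are the ones that would need a node to attach to when they are enabled later.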
Bjorn
* Re: [PATCH] remove empty node at boot time
2006-07-10 17:03 ` Bjorn Helgaas
@ 2006-07-11 6:55 ` KAMEZAWA Hiroyuki
2006-07-11 15:37 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-07-11 6:55 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Mon, 10 Jul 2006 11:03:03 -0600
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> On Sunday 09 July 2006 23:19, KAMEZAWA Hiroyuki wrote:
> > Could you try this patch ? (against 2.6.18-rc1)
>
> Your patch does fix it. But I'm worried about removing
> empty nodes at boot-time. I want to support the following
> scenario:
>
> node 0: 1 enabled CPU, 3 disabled CPUs, no local memory
> node 1: 4 disabled CPUs, no local memory
> node 2: no CPUs, big interleaved memory across nodes 0 & 1
>
> At run-time, I'd like to be able to enable any or all of the
> 7 disabled CPUs. If you remove the "empty" node 1 at boot-time,
> it sounds like I won't be able to enable its CPUs later.
>
Hmm.. in my understanding, all structures for *possible* CPUs are allocated
at boot time. Then the only problem seems to be that a CPU is tied to a
non-existent node at boot time.
(see arch/ia64/kernel/numa.c, build_cpu_to_node_map())
==
void __init build_cpu_to_node_map(void)
{
int cpu, i, node;
for(node=0; node < MAX_NUMNODES; node++)
cpus_clear(node_to_cpu_mask[node]);
for(cpu = 0; cpu < NR_CPUS; ++cpu) {
node = -1;
for (i = 0; i < NR_CPUS; ++i)
if (cpu_physical_id(cpu) == node_cpuid[i].phys_id) {
node = node_cpuid[i].nid;
break;
}
cpu_to_node_map[cpu] = (node >= 0) ? node : 0;
if (node >= 0)
cpu_set(cpu, node_to_cpu_mask[node]);
}
}
==
Then what we have to do here is either
1. remap the CPU to the first existing node at the hot-add event
or
2. implement node hot-add triggered by CPU hot-add.
Because we have already implemented node hot-add triggered by memory hot-add,
we can do it with little effort.
I think the above will work for your environment.
Do you have any idea other than "don't remove empty nodes at boot time"?
Or is reserving empty nodes the best way?
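Option 1 can be sketched as a small lookup done at CPU hot-add time. This is a hypothetical user-space illustration, not an existing kernel interface: `node_for_hot_added_cpu()` and `MAX_NODES` are made-up names, and the online map is a plain bitmask.

```c
#define MAX_NODES 8

/*
 * Pick the node for a hot-added CPU: keep nid when that node is online,
 * otherwise fall back to the lowest-numbered online node.
 * Returns -1 when no node is online at all.
 */
static int node_for_hot_added_cpu(int nid, unsigned long online_mask)
{
	int n;

	if (nid >= 0 && nid < MAX_NODES && (online_mask & (1UL << nid)))
		return nid;

	for (n = 0; n < MAX_NODES; n++)
		if (online_mask & (1UL << n))
			return n;

	return -1;
}
```

The fallback keeps the CPU usable but loses its locality information, which is the trade-off against option 2, where the missing node is created on demand.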
- Kame
* Re: [PATCH] remove empty node at boot time
2006-07-11 6:55 ` KAMEZAWA Hiroyuki
@ 2006-07-11 15:37 ` Bjorn Helgaas
0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2006-07-11 15:37 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-ia64, tony.luck, linux-kernel, akpm
On Tuesday 11 July 2006 00:55, KAMEZAWA Hiroyuki wrote:
> On Mon, 10 Jul 2006 11:03:03 -0600
> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>
> > On Sunday 09 July 2006 23:19, KAMEZAWA Hiroyuki wrote:
> > > Could you try this patch ? (against 2.6.18-rc1)
> >
> > Your patch does fix it. But I'm worried about removing
> > empty nodes at boot-time. I want to support the following
> > scenario:
> >
> > node 0: 1 enabled CPU, 3 disabled CPUs, no local memory
> > node 1: 4 disabled CPUs, no local memory
> > node 2: no CPUs, big interleaved memory across nodes 0 & 1
> >
> > At run-time, I'd like to be able to enable any or all of the
> > 7 disabled CPUs. If you remove the "empty" node 1 at boot-time,
> > it sounds like I won't be able to enable its CPUs later.
> >
>
> Hmm.. in my understanding, all structures for *possible* CPUs are allocated
> at boot time. Then the only problem seems to be that a CPU is tied to a
> non-existent node at boot time.
> (see arch/ia64/kernel/numa.c, build_cpu_to_node_map())
>
> ==
> void __init build_cpu_to_node_map(void)
> {
> int cpu, i, node;
>
> for(node=0; node < MAX_NUMNODES; node++)
> cpus_clear(node_to_cpu_mask[node]);
>
> for(cpu = 0; cpu < NR_CPUS; ++cpu) {
> node = -1;
> for (i = 0; i < NR_CPUS; ++i)
> if (cpu_physical_id(cpu) == node_cpuid[i].phys_id) {
> node = node_cpuid[i].nid;
> break;
> }
> cpu_to_node_map[cpu] = (node >= 0) ? node : 0;
> if (node >= 0)
> cpu_set(cpu, node_to_cpu_mask[node]);
> }
> }
> ==
>
> Then what we have to do here is either
> 1. remap the CPU to the first existing node at the hot-add event
> or
> 2. implement node hot-add triggered by CPU hot-add.
> Because we have already implemented node hot-add triggered by memory hot-add,
> we can do it with little effort.
I haven't paid much attention to the memory/cpu/node hotplug stuff,
but (2) sounds reasonable.
> I think the above will work for your environment.
> Do you have any idea other than "don't remove empty nodes at boot time"?
> Or is reserving empty nodes the best way?