* [PATCH 1/7] powerpc numa: fix boot_cpuid always assigned to node 0
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
@ 2006-03-21 0:34 ` Nathan Lynch
2006-03-21 0:34 ` [PATCH 2/7] powerpc numa: Minor debugging code changes Nathan Lynch
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:34 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
At boot, the numa code is assigning boot_cpuid to node 0
unconditionally. Basically, numa_setup_cpu is being stupid about it,
but this is the minimal fix -- just call numa_setup_cpu(boot_cpuid)
later, after all nodes have been set online.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
d5ecb195c3b93cb954264e075c7fe29a0bdc6db7
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 2863a91..b813bad 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -375,7 +375,7 @@ static int __init parse_numa_properties(
{
struct device_node *cpu = NULL;
struct device_node *memory = NULL;
- int max_domain;
+ int max_domain = 0;
unsigned long i;
if (numa_enabled == 0) {
@@ -389,8 +389,6 @@ static int __init parse_numa_properties(
if (min_common_depth < 0)
return min_common_depth;
- max_domain = numa_setup_cpu(boot_cpuid);
-
/*
* Even though we connect cpus to numa domains later in SMP init,
* we need to know the maximum node id now. This is because each
@@ -469,6 +467,8 @@ new_range:
for (i = 0; i <= max_domain; i++)
node_set_online(i);
+ max_domain = numa_setup_cpu(boot_cpuid);
+
return 0;
}
--
1.2.4
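
The ordering problem can be modelled in userspace. This is only a sketch, not kernel code: it assumes the fallback visible later in this series, where numa_setup_cpu() replaces any node id at or beyond num_online_nodes() with 0. The names online, num_online, setup_cpu, set_online_upto and reset_nodes are hypothetical stand-ins.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical stand-in for the kernel's online node map. */
static bool online[MAX_NODES];

static int num_online(void)
{
	int n = 0;

	for (int i = 0; i < MAX_NODES; i++)
		n += online[i];
	return n;
}

/*
 * Mimics the fallback in numa_setup_cpu(): a node id at or beyond the
 * number of online nodes is treated as invalid and replaced with 0.
 */
static int setup_cpu(int firmware_nid)
{
	int nid = firmware_nid;

	if (nid >= num_online())
		nid = 0;
	online[nid] = true;
	return nid;
}

/* Mimics the loop at the end of parse_numa_properties(). */
static void set_online_upto(int max_domain)
{
	for (int i = 0; i <= max_domain; i++)
		online[i] = true;
}

/* Test helper: take every node offline again. */
static void reset_nodes(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		online[i] = false;
}
```

Calling setup_cpu(2) before set_online_upto() lands the cpu in node 0; calling it afterwards keeps node 2, which is the effect of moving the call in this patch.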
* [PATCH 2/7] powerpc numa: Minor debugging code changes
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
2006-03-21 0:34 ` [PATCH 1/7] powerpc numa: fix boot_cpuid always assigned to node 0 Nathan Lynch
@ 2006-03-21 0:34 ` Nathan Lynch
2006-03-21 18:27 ` Dave Hansen
2006-03-21 0:35 ` [PATCH 3/7] powerpc numa: Minor cpu hotplug-related cleanups Nathan Lynch
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:34 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
Add debug statement for map_cpu_to_node; it's useful for cpu hotplug.
Clarify debug statement about not finding the numa reference points
property.
Don't print a meaningless associativity depth (-1) on non-numa systems.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
3b4f550f0a92badbec6e5784eee4da7524a75938
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b813bad..de99e47 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -133,6 +133,8 @@ static inline void map_cpu_to_node(int c
{
numa_cpu_lookup_table[cpu] = node;
+ dbg("adding cpu %d to node %d\n", cpu, node);
+
if (!(cpu_isset(cpu, numa_cpumask_lookup_table[node])))
cpu_set(cpu, numa_cpumask_lookup_table[node]);
}
@@ -246,8 +248,7 @@ static int __init find_min_common_depth(
if ((len >= 1) && ref_points) {
depth = ref_points[1];
} else {
- dbg("WARNING: could not find NUMA "
- "associativity reference point\n");
+ dbg("NUMA: ibm,associativity-reference-points not found.\n");
depth = -1;
}
of_node_put(rtas_root);
@@ -385,10 +386,11 @@ static int __init parse_numa_properties(
min_common_depth = find_min_common_depth();
- dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
if (min_common_depth < 0)
return min_common_depth;
+ dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
+
/*
* Even though we connect cpus to numa domains later in SMP init,
* we need to know the maximum node id now. This is because each
--
1.2.4
* Re: [PATCH 2/7] powerpc numa: Minor debugging code changes
2006-03-21 0:34 ` [PATCH 2/7] powerpc numa: Minor debugging code changes Nathan Lynch
@ 2006-03-21 18:27 ` Dave Hansen
2006-03-21 18:54 ` Nathan Lynch
0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2006-03-21 18:27 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev
On Mon, 2006-03-20 at 18:34 -0600, Nathan Lynch wrote:
> Don't print a meaningless associativity depth (-1) on non-numa systems.
...
> - dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
> if (min_common_depth < 0)
> return min_common_depth;
>
> + dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
This is debugging code anyway, right?
I thought this might be useful when you're booting on a machine which
you _think_ should be NUMA, but doesn't come up that way. Did you boot
a non-NUMA kernel, or is something in the reporting wrong? It makes it
pretty obvious when you see this printout.
-- Dave
* Re: [PATCH 2/7] powerpc numa: Minor debugging code changes
2006-03-21 18:27 ` Dave Hansen
@ 2006-03-21 18:54 ` Nathan Lynch
0 siblings, 0 replies; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 18:54 UTC (permalink / raw)
To: Dave Hansen; +Cc: linuxppc-dev
On Tue, 2006-03-21 at 10:27 -0800, Dave Hansen wrote:
> On Mon, 2006-03-20 at 18:34 -0600, Nathan Lynch wrote:
> > Don't print a meaningless associativity depth (-1) on non-numa systems.
> ...
> > - dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
> > if (min_common_depth < 0)
> > return min_common_depth;
> >
> > + dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
>
> This is debugging code anyway, right?
>
> I thought this might be useful when you're booting on a machine which
> you _think_ should be NUMA, but doesn't come up that way. Did you boot
> a non-NUMA kernel, or is something in the reporting wrong? It makes it
> pretty obvious when you see this printout.
I think it's pretty obvious anyway -- we still print a message about not
finding the ibm,associativity-reference-points property, which is the
only reason min_common_depth would be -1.
This file isn't built when CONFIG_NUMA=n, so the placement of the dbg()
isn't going to shed any light on that particular operator error.
* [PATCH 3/7] powerpc numa: Minor cpu hotplug-related cleanups
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
2006-03-21 0:34 ` [PATCH 1/7] powerpc numa: fix boot_cpuid always assigned to node 0 Nathan Lynch
2006-03-21 0:34 ` [PATCH 2/7] powerpc numa: Minor debugging code changes Nathan Lynch
@ 2006-03-21 0:35 ` Nathan Lynch
2006-03-21 0:35 ` [PATCH 4/7] powerpc numa: Get rid of "numa domain" terminology Nathan Lynch
` (3 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:35 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
map_cpu_to_node does not need to be inline; it is never called in a
hot path.
map_cpu_to_node, numa_setup_cpu, and find_cpu_node can be marked
__cpuinit, as they are never used after boot if CONFIG_HOTPLUG_CPU=n.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
5685935151d9ed413571e03b8e7c9b4673bd5e88
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index de99e47..1c3df1d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -129,7 +129,7 @@ void __init get_region(unsigned int nid,
*start_pfn = 0;
}
-static inline void map_cpu_to_node(int cpu, int node)
+static void __cpuinit map_cpu_to_node(int cpu, int node)
{
numa_cpu_lookup_table[cpu] = node;
@@ -155,7 +155,7 @@ static void unmap_cpu_from_node(unsigned
}
#endif /* CONFIG_HOTPLUG_CPU */
-static struct device_node *find_cpu_node(unsigned int cpu)
+static struct device_node * __cpuinit find_cpu_node(unsigned int cpu)
{
unsigned int hw_cpuid = get_hard_smp_processor_id(cpu);
struct device_node *cpu_node = NULL;
@@ -284,7 +284,7 @@ static unsigned long __devinit read_n_ce
* Figure out to which domain a cpu belongs and stick it there.
* Return the id of the domain used.
*/
-static int numa_setup_cpu(unsigned long lcpu)
+static int __cpuinit numa_setup_cpu(unsigned long lcpu)
{
int numa_domain = 0;
struct device_node *cpu = find_cpu_node(lcpu);
--
1.2.4
* [PATCH 4/7] powerpc numa: Get rid of "numa domain" terminology
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
` (2 preceding siblings ...)
2006-03-21 0:35 ` [PATCH 3/7] powerpc numa: Minor cpu hotplug-related cleanups Nathan Lynch
@ 2006-03-21 0:35 ` Nathan Lynch
2006-03-21 0:36 ` [PATCH 5/7] powerpc numa: Consolidate handling of Power4 special case Nathan Lynch
` (2 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:35 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
Since we effectively treat the domain ids given to us by firmware as
logical node ids, make this explicit (basically s/numa_domain/nid/).
No functional changes, only variable and function names are modified.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 78 ++++++++++++++++++++++++------------------------
1 files changed, 39 insertions(+), 39 deletions(-)
9bdff379544896ca96bad4881fa614a19ddf9d21
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1c3df1d..e511ca1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -191,9 +191,9 @@ static int *of_get_associativity(struct
return (unsigned int *)get_property(dev, "ibm,associativity", NULL);
}
-static int of_node_numa_domain(struct device_node *device)
+static int of_node_to_nid(struct device_node *device)
{
- int numa_domain;
+ int nid;
unsigned int *tmp;
if (min_common_depth == -1)
@@ -201,13 +201,13 @@ static int of_node_numa_domain(struct de
tmp = of_get_associativity(device);
if (tmp && (tmp[0] >= min_common_depth)) {
- numa_domain = tmp[min_common_depth];
+ nid = tmp[min_common_depth];
} else {
dbg("WARNING: no NUMA information for %s\n",
device->full_name);
- numa_domain = 0;
+ nid = 0;
}
- return numa_domain;
+ return nid;
}
/*
@@ -286,7 +286,7 @@ static unsigned long __devinit read_n_ce
*/
static int __cpuinit numa_setup_cpu(unsigned long lcpu)
{
- int numa_domain = 0;
+ int nid = 0;
struct device_node *cpu = find_cpu_node(lcpu);
if (!cpu) {
@@ -294,27 +294,27 @@ static int __cpuinit numa_setup_cpu(unsi
goto out;
}
- numa_domain = of_node_numa_domain(cpu);
+ nid = of_node_to_nid(cpu);
- if (numa_domain >= num_online_nodes()) {
+ if (nid >= num_online_nodes()) {
/*
* POWER4 LPAR uses 0xffff as invalid node,
* dont warn in this case.
*/
- if (numa_domain != 0xffff)
+ if (nid != 0xffff)
printk(KERN_ERR "WARNING: cpu %ld "
"maps to invalid NUMA node %d\n",
- lcpu, numa_domain);
- numa_domain = 0;
+ lcpu, nid);
+ nid = 0;
}
out:
- node_set_online(numa_domain);
+ node_set_online(nid);
- map_cpu_to_node(lcpu, numa_domain);
+ map_cpu_to_node(lcpu, nid);
of_node_put(cpu);
- return numa_domain;
+ return nid;
}
static int cpu_numa_callback(struct notifier_block *nfb,
@@ -399,17 +399,17 @@ static int __init parse_numa_properties(
* with larger node ids. In that case we force the cpu into node 0.
*/
for_each_cpu(i) {
- int numa_domain;
+ int nid;
cpu = find_cpu_node(i);
if (cpu) {
- numa_domain = of_node_numa_domain(cpu);
+ nid = of_node_to_nid(cpu);
of_node_put(cpu);
- if (numa_domain < MAX_NUMNODES &&
- max_domain < numa_domain)
- max_domain = numa_domain;
+ if (nid < MAX_NUMNODES &&
+ max_domain < nid)
+ max_domain = nid;
}
}
@@ -418,7 +418,7 @@ static int __init parse_numa_properties(
while ((memory = of_find_node_by_type(memory, "memory")) != NULL) {
unsigned long start;
unsigned long size;
- int numa_domain;
+ int nid;
int ranges;
unsigned int *memcell_buf;
unsigned int len;
@@ -439,18 +439,18 @@ new_range:
start = read_n_cells(n_mem_addr_cells, &memcell_buf);
size = read_n_cells(n_mem_size_cells, &memcell_buf);
- numa_domain = of_node_numa_domain(memory);
+ nid = of_node_to_nid(memory);
- if (numa_domain >= MAX_NUMNODES) {
- if (numa_domain != 0xffff)
+ if (nid >= MAX_NUMNODES) {
+ if (nid != 0xffff)
printk(KERN_ERR "WARNING: memory at %lx maps "
"to invalid NUMA node %d\n", start,
- numa_domain);
- numa_domain = 0;
+ nid);
+ nid = 0;
}
- if (max_domain < numa_domain)
- max_domain = numa_domain;
+ if (max_domain < nid)
+ max_domain = nid;
if (!(size = numa_enforce_memory_limit(start, size))) {
if (--ranges)
@@ -459,7 +459,7 @@ new_range:
continue;
}
- add_region(numa_domain, start >> PAGE_SHIFT,
+ add_region(nid, start >> PAGE_SHIFT,
size >> PAGE_SHIFT);
if (--ranges)
@@ -769,10 +769,10 @@ int hot_add_scn_to_nid(unsigned long scn
{
struct device_node *memory = NULL;
nodemask_t nodes;
- int numa_domain = 0;
+ int nid = 0;
if (!numa_enabled || (min_common_depth < 0))
- return numa_domain;
+ return nid;
while ((memory = of_find_node_by_type(memory, "memory")) != NULL) {
unsigned long start, size;
@@ -789,15 +789,15 @@ int hot_add_scn_to_nid(unsigned long scn
ha_new_range:
start = read_n_cells(n_mem_addr_cells, &memcell_buf);
size = read_n_cells(n_mem_size_cells, &memcell_buf);
- numa_domain = of_node_numa_domain(memory);
+ nid = of_node_to_nid(memory);
/* Domains not present at boot default to 0 */
- if (!node_online(numa_domain))
- numa_domain = any_online_node(NODE_MASK_ALL);
+ if (!node_online(nid))
+ nid = any_online_node(NODE_MASK_ALL);
if ((scn_addr >= start) && (scn_addr < (start + size))) {
of_node_put(memory);
- goto got_numa_domain;
+ goto got_nid;
}
if (--ranges) /* process all ranges in cell */
@@ -806,12 +806,12 @@ ha_new_range:
BUG(); /* section address should be found above */
/* Temporary code to ensure that returned node is not empty */
-got_numa_domain:
+got_nid:
nodes_setall(nodes);
- while (NODE_DATA(numa_domain)->node_spanned_pages == 0) {
- node_clear(numa_domain, nodes);
- numa_domain = any_online_node(nodes);
+ while (NODE_DATA(nid)->node_spanned_pages == 0) {
+ node_clear(nid, nodes);
+ nid = any_online_node(nodes);
}
- return numa_domain;
+ return nid;
}
#endif /* CONFIG_MEMORY_HOTPLUG */
--
1.2.4
* [PATCH 5/7] powerpc numa: Consolidate handling of Power4 special case
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
` (3 preceding siblings ...)
2006-03-21 0:35 ` [PATCH 4/7] powerpc numa: Get rid of "numa domain" terminology Nathan Lynch
@ 2006-03-21 0:36 ` Nathan Lynch
2006-03-21 3:54 ` Jon Mason
2006-03-21 0:36 ` [PATCH 6/7] powerpc numa: Support sparse online node map Nathan Lynch
2006-03-21 0:37 ` [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes Nathan Lynch
6 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
Code to handle Power4's invalid node id (0xffff) is duplicated for cpu
and memory. Better to handle this case in one place --
of_node_to_nid. Overall behavior should be unchanged.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 23 +++++++++++------------
1 files changed, 11 insertions(+), 12 deletions(-)
d9dd3889e58eeb34d1130d2514fea905ca2cab6a
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index e511ca1..4a6cbb0 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -207,6 +207,11 @@ static int of_node_to_nid(struct device_
device->full_name);
nid = 0;
}
+
+ /* POWER4 LPAR uses 0xffff as invalid node */
+ if (nid == 0xffff)
+ nid = 0;
+
return nid;
}
@@ -297,14 +302,9 @@ static int __cpuinit numa_setup_cpu(unsi
nid = of_node_to_nid(cpu);
if (nid >= num_online_nodes()) {
- /*
- * POWER4 LPAR uses 0xffff as invalid node,
- * dont warn in this case.
- */
- if (nid != 0xffff)
- printk(KERN_ERR "WARNING: cpu %ld "
- "maps to invalid NUMA node %d\n",
- lcpu, nid);
+ printk(KERN_ERR "WARNING: cpu %ld "
+ "maps to invalid NUMA node %d\n",
+ lcpu, nid);
nid = 0;
}
out:
@@ -442,10 +442,9 @@ new_range:
nid = of_node_to_nid(memory);
if (nid >= MAX_NUMNODES) {
- if (nid != 0xffff)
- printk(KERN_ERR "WARNING: memory at %lx maps "
- "to invalid NUMA node %d\n", start,
- nid);
+ printk(KERN_ERR "WARNING: memory at %lx maps "
+ "to invalid NUMA node %d\n", start,
+ nid);
nid = 0;
}
--
1.2.4
* Re: [PATCH 5/7] powerpc numa: Consolidate handling of Power4 special case
2006-03-21 0:36 ` [PATCH 5/7] powerpc numa: Consolidate handling of Power4 special case Nathan Lynch
@ 2006-03-21 3:54 ` Jon Mason
0 siblings, 0 replies; 16+ messages in thread
From: Jon Mason @ 2006-03-21 3:54 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev
On Mon, Mar 20, 2006 at 06:36:15PM -0600, Nathan Lynch wrote:
> Code to handle Power4's invalid node id (0xffff) is duplicated for cpu
> and memory. Better to handle this case in one place --
> of_node_to_nid. Overall behavior should be unchanged.
>
> Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
>
> ---
>
> arch/powerpc/mm/numa.c | 23 +++++++++++------------
> 1 files changed, 11 insertions(+), 12 deletions(-)
>
> d9dd3889e58eeb34d1130d2514fea905ca2cab6a
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index e511ca1..4a6cbb0 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -207,6 +207,11 @@ static int of_node_to_nid(struct device_
> device->full_name);
> nid = 0;
> }
> +
> + /* POWER4 LPAR uses 0xffff as invalid node */
> + if (nid == 0xffff)
A #define for 0xffff would make the code much nicer. Something like
POWER4_LPAR_INVALID_NODEID could also enable you to remove the above
comment.
Thanks,
Jon
> + nid = 0;
> +
> return nid;
> }
>
> @@ -297,14 +302,9 @@ static int __cpuinit numa_setup_cpu(unsi
> nid = of_node_to_nid(cpu);
>
> if (nid >= num_online_nodes()) {
> - /*
> - * POWER4 LPAR uses 0xffff as invalid node,
> - * dont warn in this case.
> - */
> - if (nid != 0xffff)
> - printk(KERN_ERR "WARNING: cpu %ld "
> - "maps to invalid NUMA node %d\n",
> - lcpu, nid);
> + printk(KERN_ERR "WARNING: cpu %ld "
> + "maps to invalid NUMA node %d\n",
> + lcpu, nid);
> nid = 0;
> }
> out:
> @@ -442,10 +442,9 @@ new_range:
> nid = of_node_to_nid(memory);
>
> if (nid >= MAX_NUMNODES) {
> - if (nid != 0xffff)
> - printk(KERN_ERR "WARNING: memory at %lx maps "
> - "to invalid NUMA node %d\n", start,
> - nid);
> + printk(KERN_ERR "WARNING: memory at %lx maps "
> + "to invalid NUMA node %d\n", start,
> + nid);
> nid = 0;
> }
>
> --
> 1.2.4
>
>
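
The named-constant suggestion made inline above might look roughly like the following. This is a sketch only: POWER4_LPAR_INVALID_NODEID is the name proposed in the review, the value 0xffff comes from the patch, and normalize_nid is a hypothetical helper standing in for the tail of of_node_to_nid().

```c
/* Name proposed in review; the value is the one used in the patch. */
#define POWER4_LPAR_INVALID_NODEID 0xffff

/*
 * Hypothetical helper standing in for the end of of_node_to_nid():
 * map the POWER4 LPAR "invalid node" value to node 0 and pass every
 * other id through unchanged.
 */
static int normalize_nid(int nid)
{
	if (nid == POWER4_LPAR_INVALID_NODEID)
		nid = 0;
	return nid;
}
```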
* [PATCH 6/7] powerpc numa: Support sparse online node map
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
` (4 preceding siblings ...)
2006-03-21 0:36 ` [PATCH 5/7] powerpc numa: Consolidate handling of Power4 special case Nathan Lynch
@ 2006-03-21 0:36 ` Nathan Lynch
2006-03-21 0:37 ` [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes Nathan Lynch
6 siblings, 0 replies; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
The powerpc numa code unconditionally onlines all nodes from 0 to the
highest node id found, regardless of whether cpus or memory are
present in the nodes. This wastes 8K per node and complicates some
cpu and memory hotplug situations, such as adding a resource that
doesn't map to one of the nodes discovered at boot.
Set nodes online as resources are scanned. Fall back to node 0 only
when we're sure this isn't a NUMA machine.
Instead of defaulting to node 0 for cases of hot-adding a resource
which doesn't belong to any initialized node, assign it to the first
online node.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 95 ++++++++++++++++++++++--------------------------
1 files changed, 43 insertions(+), 52 deletions(-)
d49f5d199a1939ebf55c9554caf78fbe7e32c598
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 4a6cbb0..fe0ee6d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -191,27 +191,28 @@ static int *of_get_associativity(struct
return (unsigned int *)get_property(dev, "ibm,associativity", NULL);
}
+/* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
+ * info is found.
+ */
static int of_node_to_nid(struct device_node *device)
{
- int nid;
+ int nid = -1;
unsigned int *tmp;
if (min_common_depth == -1)
- return 0;
+ goto out;
tmp = of_get_associativity(device);
- if (tmp && (tmp[0] >= min_common_depth)) {
+ if (!tmp)
+ goto out;
+
+ if (tmp[0] >= min_common_depth)
nid = tmp[min_common_depth];
- } else {
- dbg("WARNING: no NUMA information for %s\n",
- device->full_name);
- nid = 0;
- }
/* POWER4 LPAR uses 0xffff as invalid node */
- if (nid == 0xffff)
- nid = 0;
-
+ if (nid == 0xffff || nid >= MAX_NUMNODES)
+ nid = -1;
+out:
return nid;
}
@@ -301,15 +302,9 @@ static int __cpuinit numa_setup_cpu(unsi
nid = of_node_to_nid(cpu);
- if (nid >= num_online_nodes()) {
- printk(KERN_ERR "WARNING: cpu %ld "
- "maps to invalid NUMA node %d\n",
- lcpu, nid);
- nid = 0;
- }
+ if (nid < 0 || !node_online(nid))
+ nid = any_online_node(NODE_MASK_ALL);
out:
- node_set_online(nid);
-
map_cpu_to_node(lcpu, nid);
of_node_put(cpu);
@@ -376,7 +371,7 @@ static int __init parse_numa_properties(
{
struct device_node *cpu = NULL;
struct device_node *memory = NULL;
- int max_domain = 0;
+ int default_nid = 0;
unsigned long i;
if (numa_enabled == 0) {
@@ -392,25 +387,26 @@ static int __init parse_numa_properties(
dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
/*
- * Even though we connect cpus to numa domains later in SMP init,
- * we need to know the maximum node id now. This is because each
- * node id must have NODE_DATA etc backing it.
- * As a result of hotplug we could still have cpus appear later on
- * with larger node ids. In that case we force the cpu into node 0.
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now. This is because
+ * each node to be onlined must have NODE_DATA etc backing it.
*/
- for_each_cpu(i) {
+ for_each_present_cpu(i) {
int nid;
cpu = find_cpu_node(i);
+ BUG_ON(!cpu);
+ nid = of_node_to_nid(cpu);
+ of_node_put(cpu);
- if (cpu) {
- nid = of_node_to_nid(cpu);
- of_node_put(cpu);
-
- if (nid < MAX_NUMNODES &&
- max_domain < nid)
- max_domain = nid;
- }
+ /*
+ * Don't fall back to default_nid yet -- we will plug
+ * cpus into nodes once the memory scan has discovered
+ * the topology.
+ */
+ if (nid < 0)
+ continue;
+ node_set_online(nid);
}
get_n_mem_cells(&n_mem_addr_cells, &n_mem_size_cells);
@@ -439,17 +435,15 @@ new_range:
start = read_n_cells(n_mem_addr_cells, &memcell_buf);
size = read_n_cells(n_mem_size_cells, &memcell_buf);
+ /*
+ * Assumption: either all memory nodes or none will
+ * have associativity properties. If none, then
+ * everything goes to default_nid.
+ */
nid = of_node_to_nid(memory);
-
- if (nid >= MAX_NUMNODES) {
- printk(KERN_ERR "WARNING: memory at %lx maps "
- "to invalid NUMA node %d\n", start,
- nid);
- nid = 0;
- }
-
- if (max_domain < nid)
- max_domain = nid;
+ if (nid < 0)
+ nid = default_nid;
+ node_set_online(nid);
if (!(size = numa_enforce_memory_limit(start, size))) {
if (--ranges)
@@ -465,10 +459,7 @@ new_range:
goto new_range;
}
- for (i = 0; i <= max_domain; i++)
- node_set_online(i);
-
- max_domain = numa_setup_cpu(boot_cpuid);
+ numa_setup_cpu(boot_cpuid);
return 0;
}
@@ -768,10 +759,10 @@ int hot_add_scn_to_nid(unsigned long scn
{
struct device_node *memory = NULL;
nodemask_t nodes;
- int nid = 0;
+ int default_nid = any_online_node(NODE_MASK_ALL);
if (!numa_enabled || (min_common_depth < 0))
- return nid;
+ return default_nid;
while ((memory = of_find_node_by_type(memory, "memory")) != NULL) {
unsigned long start, size;
@@ -791,8 +782,8 @@ ha_new_range:
nid = of_node_to_nid(memory);
/* Domains not present at boot default to 0 */
- if (!node_online(nid))
- nid = any_online_node(NODE_MASK_ALL);
+ if (nid < 0 || !node_online(nid))
+ nid = default_nid;
if ((scn_addr >= start) && (scn_addr < (start + size))) {
of_node_put(memory);
--
1.2.4
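
The sparse-map behaviour described in this patch can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel implementation: fw_to_nid, first_online_node, scan_resource and assign_node are hypothetical names, scan_resource follows the cpu scan loop only, and first_online_node assumes any_online_node(NODE_MASK_ALL) yields the lowest online node id.

```c
#include <stdbool.h>

#define MAX_NODES 16
#define INVALID_FW_NID 0xffff	/* POWER4 LPAR "invalid node" value from the patch */

static bool node_is_online[MAX_NODES];

/* Stand-in for of_node_to_nid(): a nid in [0, MAX_NODES-1], or -1. */
static int fw_to_nid(int fw_nid)
{
	if (fw_nid < 0 || fw_nid == INVALID_FW_NID || fw_nid >= MAX_NODES)
		return -1;
	return fw_nid;
}

/* Assumed behaviour of any_online_node(NODE_MASK_ALL): lowest online id. */
static int first_online_node(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		if (node_is_online[i])
			return i;
	return MAX_NODES;	/* no node online */
}

/* Boot-time cpu scan: online only nodes that actually own a resource. */
static void scan_resource(int fw_nid)
{
	int nid = fw_to_nid(fw_nid);

	if (nid >= 0)
		node_is_online[nid] = true;
}

/* Later assignment: unknown or offline nodes fall back to the first online node. */
static int assign_node(int fw_nid)
{
	int nid = fw_to_nid(fw_nid);

	if (nid < 0 || !node_is_online[nid])
		nid = first_online_node();
	return nid;
}
```

After scanning resources on nodes 2 and 5, node 0 stays offline, and a resource reporting an unknown node is placed on node 2 rather than on node 0.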
* [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 0:33 [PATCH 0/7] powerpc numa updates and fixes Nathan Lynch
` (5 preceding siblings ...)
2006-03-21 0:36 ` [PATCH 6/7] powerpc numa: Support sparse online node map Nathan Lynch
@ 2006-03-21 0:37 ` Nathan Lynch
2006-03-21 18:38 ` Dave Hansen
6 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 0:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
We can plug the boot cpu into its node independently of whether numa
topology is detected. And numa_setup_cpu does the right thing for all
cases now, so remove special-casing for non-numa from the cpu hotplug
callback.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
arch/powerpc/mm/numa.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)
69d1ca13915d4ba423d43177e491cd176b92e94c
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index fe0ee6d..e9f340d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -321,10 +321,7 @@ static int cpu_numa_callback(struct noti
switch (action) {
case CPU_UP_PREPARE:
- if (min_common_depth == -1 || !numa_enabled)
- map_cpu_to_node(lcpu, 0);
- else
- numa_setup_cpu(lcpu);
+ numa_setup_cpu(lcpu);
ret = NOTIFY_OK;
break;
#ifdef CONFIG_HOTPLUG_CPU
@@ -459,8 +456,6 @@ new_range:
goto new_range;
}
- numa_setup_cpu(boot_cpuid);
-
return 0;
}
@@ -475,7 +470,6 @@ static void __init setup_nonnuma(void)
printk(KERN_INFO "Memory hole size: %ldMB\n",
(top_of_ram - total_ram) >> 20);
- map_cpu_to_node(boot_cpuid, 0);
for (i = 0; i < lmb.memory.cnt; ++i)
add_region(0, lmb.memory.region[i].base >> PAGE_SHIFT,
lmb_size_pages(&lmb.memory, i));
@@ -612,6 +606,8 @@ void __init do_init_bootmem(void)
dump_numa_memory_topology();
register_cpu_notifier(&ppc64_numa_nb);
+ cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
+ (void *)(unsigned long)boot_cpuid);
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn, pages_present;
--
1.2.4
* Re: [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 0:37 ` [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes Nathan Lynch
@ 2006-03-21 18:38 ` Dave Hansen
2006-03-21 19:16 ` Nathan Lynch
0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2006-03-21 18:38 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev
On Mon, 2006-03-20 at 18:37 -0600, Nathan Lynch wrote:
> + cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
> + (void *)(unsigned long)boot_cpuid);
That double-cast really caught my eye. cpu_numa_callback() looks a
little bit confused about what type cpuids should be.
Its lcpu is an "unsigned long", but it has integers passed into it
(boot_cpuid), and calls map_cpu_to_node(lcpu, 0), where the first
argument is an integer, but an "unsigned long" is passed in. This may
be harmless, but I still have to think about it, which is bad.
Seems like just making cpu_numa_callback()'s lcpu an int would get rid
of at least one net cast. Why not just pass &boot_cpuid in there, and
do this:
int lcpu = *(int *)hcpu;
That makes it _really_ obvious what is going on. While it isn't
horribly uncommon to pass integers around inside of void*s, it can be a
bit confusing. You also get readability issues with long<->int
conversions as you saw.
By the way, what do the "l" and "h" in front of "cpu" mean anyway?
-- Dave
* Re: [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 18:38 ` Dave Hansen
@ 2006-03-21 19:16 ` Nathan Lynch
2006-03-21 23:09 ` Michael Ellerman
0 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 19:16 UTC (permalink / raw)
To: Dave Hansen; +Cc: linuxppc-dev
On Tue, 2006-03-21 at 10:38 -0800, Dave Hansen wrote:
> On Mon, 2006-03-20 at 18:37 -0600, Nathan Lynch wrote:
> > + cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
> > + (void *)(unsigned long)boot_cpuid);
>
> That double-cast really caught my eye. cpu_numa_callback() looks a
> little bit confused about what type cpuids should be.
>
> Its lcpu is an "unsigned long", but it has integers passed into it
> (boot_cpuid), and calls map_cpu_to_node(lcpu, 0), where the first
> argument is an integer, but an "unsigned long" is passed in. This may
> be harmless, but I still have to think about it, which is bad.
>
> Seems like just making cpu_numa_callback()'s lcpu an int would get rid
> of at least one net cast. Why not just pass &boot_cpuid in there, and
> do this:
>
> int lcpu = *(int *)hcpu;
That's not the convention for cpu hotplug notifiers. The id of the cpu
subject to online/offline is passed in the void * argument. I'd have to
change the cpu hotplug core and every notifier in the kernel to
implement your suggestion.
>
> That makes it _really_ obvious what is going on. While it isn't
> horribly uncommon to pass integers around inside of void*s, it can be a
> bit confusing. You also get readability issues with long<->int
> conversions as you saw.
>
> By the way, what do the "l" and "h" in front of "cpu" mean anyway?
"logical" and "hot"? I dunno, just seemed to be the convention in other
cpu notifiers at the time the code was written.
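
For comparison, the two ways of passing a cpu id discussed above can be sketched in plain C. This is an illustration only, with hypothetical helper names (cookie_from_cpu, cpu_from_cookie, cpu_from_pointer). It uses uintptr_t so that it is portable outside the kernel, where the same conversion is written with unsigned long.

```c
#include <stdint.h>

/*
 * Notifier-style convention: the cpu id travels in the pointer value
 * itself, not in memory that the pointer refers to.
 */
static void *cookie_from_cpu(int cpu)
{
	/* Widen to a pointer-sized integer first, then convert. */
	return (void *)(uintptr_t)cpu;
}

static unsigned long cpu_from_cookie(void *hcpu)
{
	return (unsigned long)(uintptr_t)hcpu;
}

/* The alternative raised in review: pass a pointer to the id instead. */
static int cpu_from_pointer(void *hcpu)
{
	return *(int *)hcpu;
}
```

The first form needs no storage that outlives the call, which is one reason a callback interface might prefer it. The second form is easier to read but requires every caller to keep the pointed-to id valid.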
* Re: [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 19:16 ` Nathan Lynch
@ 2006-03-21 23:09 ` Michael Ellerman
2006-03-21 23:22 ` Nathan Lynch
0 siblings, 1 reply; 16+ messages in thread
From: Michael Ellerman @ 2006-03-21 23:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
On Wed, 22 Mar 2006 06:16, Nathan Lynch wrote:
> On Tue, 2006-03-21 at 10:38 -0800, Dave Hansen wrote:
> > By the way, what do the "l" and "h" in front of "cpu" mean anyway?
>
> "logical" and "hot"? I dunno, just seemed to be the convention in other
> cpu notifiers at the time the code was written.
Ouch, that's unfortunate. In the powerpc code hcpu _usually_ means hard cpu
number, as opposed to logical (lcpu). In this case though it looks like hcpu
holds the logical cpu number, which is a bit icky. That might be worth
fixing.
cheers
--
Michael Ellerman
IBM OzLabs
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
* Re: [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 23:09 ` Michael Ellerman
@ 2006-03-21 23:22 ` Nathan Lynch
2006-03-21 23:34 ` Michael Ellerman
0 siblings, 1 reply; 16+ messages in thread
From: Nathan Lynch @ 2006-03-21 23:22 UTC (permalink / raw)
To: michael; +Cc: linuxppc-dev
On Wed, 2006-03-22 at 10:09 +1100, Michael Ellerman wrote:
> On Wed, 22 Mar 2006 06:16, Nathan Lynch wrote:
> > On Tue, 2006-03-21 at 10:38 -0800, Dave Hansen wrote:
> > > By the way, what do the "l" and "h" in front of "cpu" mean anyway?
> >
> > "logical" and "hot"? I dunno, just seemed to be the convention in other
> > cpu notifiers at the time the code was written.
>
> Ouch, that's unfortunate. In the powerpc code hcpu _usually_ means hard cpu
> number, as opposed to logical (lcpu).
Grep begs to differ:
$ grep -rw hcpu arch/powerpc include/asm-powerpc
arch/powerpc/kernel/sysfs.c: unsigned long action, void *hcpu)
arch/powerpc/kernel/sysfs.c: unsigned int cpu = (unsigned int)(long)hcpu;
arch/powerpc/mm/numa.c: void *hcpu)
arch/powerpc/mm/numa.c: unsigned long lcpu = (unsigned long)hcpu;
* Re: [PATCH 7/7] powerpc numa: Consolidate assignment of cpus to nodes
2006-03-21 23:22 ` Nathan Lynch
@ 2006-03-21 23:34 ` Michael Ellerman
0 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2006-03-21 23:34 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch
On Wed, 22 Mar 2006 10:22, Nathan Lynch wrote:
> On Wed, 2006-03-22 at 10:09 +1100, Michael Ellerman wrote:
> > On Wed, 22 Mar 2006 06:16, Nathan Lynch wrote:
> > > On Tue, 2006-03-21 at 10:38 -0800, Dave Hansen wrote:
> > > > By the way, what do the "l" and "h" in front of "cpu" mean anyway?
> > >
> > > "logical" and "hot"? I dunno, just seemed to be the convention in
> > > other cpu notifiers at the time the code was written.
> >
> > Ouch, that's unfortunate. In the powerpc code hcpu _usually_ means hard
> > cpu number, as opposed to logical (lcpu).
>
> Grep begs to differ:
>
> $ grep -rw hcpu arch/powerpc include/asm-powerpc
> arch/powerpc/kernel/sysfs.c: unsigned long action, void *hcpu)
> arch/powerpc/kernel/sysfs.c: unsigned int cpu = (unsigned int)(long)hcpu;
> arch/powerpc/mm/numa.c: void *hcpu)
> arch/powerpc/mm/numa.c: unsigned long lcpu = (unsigned long)hcpu;
You're right, it's actually a mixture of pcpu, hw_cpuid, hardid etc. So there
should be no confusion by using hcpu for "hot" cpu.
cheers
--
Michael Ellerman
IBM OzLabs
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person