* [PATCH] x86: use near online node instead of round bin for numa
@ 2009-10-03 18:26 Yinghai Lu
2009-10-05 14:44 ` Andi Kleen
From: Yinghai Lu @ 2009-10-03 18:26 UTC
To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Suresh Siddha,
Tejun Heo
Cc: linux-kernel@vger.kernel.org
The cpu to node mapping is set up in the following sequence:
1. numa_init_array(): assigns cpus to online nodes round-robin.
2. init_cpu_to_node(): sets the mapping from apicid_to_node[], which is filled
   in from the SRAT. It only handles cpus whose node is online, and leaves cpus
   on a node without ram (i.e. not online) on their round-robin node.
3. Later, srat_detect_node() for intel/amd uses the first online node or a
   nearby node.

The problem is that setup_per_cpu_areas() is called between 2 and 3, so the
per_cpu area for a cpu on a node without ram is allocated on a different node,
possibly one that is two hops away.

So add find_near_online_node() and call it in init_cpu_to_node().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/kernel/cpu/intel.c | 6 +++++-
arch/x86/mm/numa_64.c | 21 ++++++++++++++++++++-
2 files changed, 25 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -601,6 +601,25 @@ static __init int numa_setup(char *opt)
early_param("numa", numa_setup);
#ifdef CONFIG_NUMA
+
+static __init int find_near_online_node(int node)
+{
+ int n, val;
+ int min_val = INT_MAX;
+ int best_node = -1;
+
+ for_each_online_node(n) {
+ val = node_distance(node, n);
+
+ if (val < min_val) {
+ min_val = val;
+ best_node = n;
+ }
+ }
+
+ return best_node;
+}
+
/*
* Setup early cpu_to_node.
*
@@ -632,7 +651,7 @@ void __init init_cpu_to_node(void)
if (node == NUMA_NO_NODE)
continue;
if (!node_online(node))
- continue;
+ node = find_near_online_node(node);
numa_set_node(cpu, node);
}
}
Index: linux-2.6/arch/x86/kernel/cpu/intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/intel.c
+++ linux-2.6/arch/x86/kernel/cpu/intel.c
@@ -263,8 +263,12 @@ static void __cpuinit srat_detect_node(s
/* Don't do the funky fallback heuristics the AMD version employs
for now. */
node = apicid_to_node[apicid];
- if (node == NUMA_NO_NODE || !node_online(node))
+ if (node == NUMA_NO_NODE)
node = first_node(node_online_map);
+ else if (!node_online(node)) {
+ /* reuse the value from init_cpu_to_node() */
+ node = cpu_to_node(cpu);
+ }
numa_set_node(cpu, node);
printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
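
The selection rule in find_near_online_node() can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: the node count, the distance table, the online map and round_robin_node() are all hypothetical stand-ins chosen for illustration (the kernel uses node_distance(), for_each_online_node() and its own round-robin logic in numa_init_array()). It only shows why picking by distance can differ from a round-robin pick.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical SLIT-style table: 10 = local, 20 = one hop, 30 = two hops. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 20, 30 },
	{ 20, 10, 30, 20 },
	{ 20, 30, 10, 20 },
	{ 30, 20, 20, 10 },
};

/* Hypothetical online map: node 3 has cpus but no ram, so it is not online. */
static const bool online[NR_NODES] = { true, true, true, false };

/* Same rule as the patch: the online node with the smallest distance wins. */
static int find_near_online_node(int node)
{
	int n, best_node = -1, min_val = INT_MAX;

	assert(node >= 0 && node < NR_NODES);

	for (n = 0; n < NR_NODES; n++) {
		if (!online[n])
			continue;
		if (distance[node][n] < min_val) {
			min_val = distance[node][n];
			best_node = n;
		}
	}
	return best_node;	/* -1 if no node is online */
}

/* Simplified stand-in for a round-robin assignment; it ignores distance. */
static int round_robin_node(int cpu)
{
	int n, k, nr_online = 0;

	for (n = 0; n < NR_NODES; n++)
		nr_online += online[n];
	if (nr_online == 0)
		return -1;

	k = cpu % nr_online;	/* index into the list of online nodes */
	for (n = 0; n < NR_NODES; n++) {
		if (!online[n])
			continue;
		if (k-- == 0)
			return n;
	}
	return -1;
}
```

With this made-up table, a cpu that sits on node 3 and happens to get round-robin slot 0 lands on node 0, which is two hops away, while the distance-based pick returns node 1, which is one hop away. Ties go to the lowest-numbered online node because the comparison is a strict less-than, as in the patch.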
* Re: [PATCH] x86: use near online node instead of round bin for numa
2009-10-03 18:26 [PATCH] x86: use near online node instead of round bin for numa Yinghai Lu
@ 2009-10-05 14:44 ` Andi Kleen
2009-10-05 18:09 ` Yinghai Lu
From: Andi Kleen @ 2009-10-05 14:44 UTC
To: Yinghai Lu
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Suresh Siddha,
Tejun Heo, linux-kernel@vger.kernel.org
Yinghai Lu <yinghai@kernel.org> writes:
> cpu to node mapping is set in following sequence:
> 1. numa_init_array: set up roundbin from cpu to online node
> 2. init_cpu_to_node: set that according to apicid_to_node[] according to srat
> only handle that node is online, and leave other cpu on node
> without ram (aka not online) to still round-bin
> 3. later srat_detect_node for intel/amd, will use first_online node or near by
> node.
>
> problem is that setup_per_cpu_areas() is called between 2 and 3. the per_cpu
> for cpu on node with ram is on different node. and could put that on node with
> two hops away.
>
> so try add find_near_online_node() and call int init_cpu_to_node()
This fallback case should not really happen anyway, unless the BIOS is buggy
(in which case it might be better to completely reject the SRAT, because
more might be wrong).
Do you have a system where this is needed?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: [PATCH] x86: use near online node instead of round bin for numa
2009-10-05 14:44 ` Andi Kleen
@ 2009-10-05 18:09 ` Yinghai Lu
2009-10-05 18:35 ` Andi Kleen
From: Yinghai Lu @ 2009-10-05 18:09 UTC
To: Andi Kleen
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Suresh Siddha,
Tejun Heo, linux-kernel@vger.kernel.org
Andi Kleen wrote:
> Yinghai Lu <yinghai@kernel.org> writes:
>
>> cpu to node mapping is set in following sequence:
>> 1. numa_init_array: set up roundbin from cpu to online node
>> 2. init_cpu_to_node: set that according to apicid_to_node[] according to srat
>> only handle that node is online, and leave other cpu on node
>> without ram (aka not online) to still round-bin
>> 3. later srat_detect_node for intel/amd, will use first_online node or near by
>> node.
>>
>> problem is that setup_per_cpu_areas() is called between 2 and 3. the per_cpu
>> for cpu on node with ram is on different node. and could put that on node with
>> two hops away.
>>
>> so try add find_near_online_node() and call int init_cpu_to_node()
>
> This fallback case should not really happen anyways, unless the BIOS is buggy
> (in this case it might better to completely reject the SRAT because
> more might be wrong).
The SRAT is right; some nodes simply have no ram installed.
>
> Do you have a system where this is needed?
sure.
YH
* Re: [PATCH] x86: use near online node instead of round bin for numa
2009-10-05 18:09 ` Yinghai Lu
@ 2009-10-05 18:35 ` Andi Kleen
2009-10-05 18:40 ` Yinghai Lu
From: Andi Kleen @ 2009-10-05 18:35 UTC
To: Yinghai Lu
Cc: Andi Kleen, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Suresh Siddha, Tejun Heo, linux-kernel@vger.kernel.org
On Mon, Oct 05, 2009 at 11:09:59AM -0700, Yinghai Lu wrote:
> Andi Kleen wrote:
> > Yinghai Lu <yinghai@kernel.org> writes:
> >
> >> cpu to node mapping is set in following sequence:
> >> 1. numa_init_array: set up roundbin from cpu to online node
> >> 2. init_cpu_to_node: set that according to apicid_to_node[] according to srat
> >> only handle that node is online, and leave other cpu on node
> >> without ram (aka not online) to still round-bin
> >> 3. later srat_detect_node for intel/amd, will use first_online node or near by
> >> node.
> >>
> >> problem is that setup_per_cpu_areas() is called between 2 and 3. the per_cpu
> >> for cpu on node with ram is on different node. and could put that on node with
> >> two hops away.
> >>
> >> so try add find_near_online_node() and call int init_cpu_to_node()
> >
> > This fallback case should not really happen anyways, unless the BIOS is buggy
> > (in this case it might better to completely reject the SRAT because
> > more might be wrong).
> SRAT is right, and some node has no ram installed.
In this case there should still be a PXM to define the CPU locality, so your
BIOS is broken. Please fix it there.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: [PATCH] x86: use near online node instead of round bin for numa
2009-10-05 18:35 ` Andi Kleen
@ 2009-10-05 18:40 ` Yinghai Lu
2009-10-05 18:50 ` Andi Kleen
From: Yinghai Lu @ 2009-10-05 18:40 UTC
To: Andi Kleen
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Suresh Siddha,
Tejun Heo, linux-kernel@vger.kernel.org
Andi Kleen wrote:
> On Mon, Oct 05, 2009 at 11:09:59AM -0700, Yinghai Lu wrote:
>> Andi Kleen wrote:
>>> Yinghai Lu <yinghai@kernel.org> writes:
>>>
>>>> cpu to node mapping is set in following sequence:
>>>> 1. numa_init_array: set up roundbin from cpu to online node
>>>> 2. init_cpu_to_node: set that according to apicid_to_node[] according to srat
>>>> only handle that node is online, and leave other cpu on node
>>>> without ram (aka not online) to still round-bin
>>>> 3. later srat_detect_node for intel/amd, will use first_online node or near by
>>>> node.
>>>>
>>>> problem is that setup_per_cpu_areas() is called between 2 and 3. the per_cpu
>>>> for cpu on node with ram is on different node. and could put that on node with
>>>> two hops away.
>>>>
>>>> so try add find_near_online_node() and call int init_cpu_to_node()
>>> This fallback case should not really happen anyways, unless the BIOS is buggy
>>> (in this case it might better to completely reject the SRAT because
>>> more might be wrong).
>> SRAT is right, and some node has no ram installed.
>
> In this case there should be still a PXM to define the CPU locality -- your BIOS is broken.
> Please fix it there.
I don't think so.
YH
* Re: [PATCH] x86: use near online node instead of round bin for numa
2009-10-05 18:40 ` Yinghai Lu
@ 2009-10-05 18:50 ` Andi Kleen
From: Andi Kleen @ 2009-10-05 18:50 UTC
To: Yinghai Lu
Cc: Andi Kleen, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Suresh Siddha, Tejun Heo, linux-kernel@vger.kernel.org
On Mon, Oct 05, 2009 at 11:40:46AM -0700, Yinghai Lu wrote:
> Andi Kleen wrote:
> > On Mon, Oct 05, 2009 at 11:09:59AM -0700, Yinghai Lu wrote:
> >> Andi Kleen wrote:
> >>> Yinghai Lu <yinghai@kernel.org> writes:
> >>>
> >>>> cpu to node mapping is set in following sequence:
> >>>> 1. numa_init_array: set up roundbin from cpu to online node
> >>>> 2. init_cpu_to_node: set that according to apicid_to_node[] according to srat
> >>>> only handle that node is online, and leave other cpu on node
> >>>> without ram (aka not online) to still round-bin
> >>>> 3. later srat_detect_node for intel/amd, will use first_online node or near by
> >>>> node.
> >>>>
> >>>> problem is that setup_per_cpu_areas() is called between 2 and 3. the per_cpu
> >>>> for cpu on node with ram is on different node. and could put that on node with
> >>>> two hops away.
> >>>>
> >>>> so try add find_near_online_node() and call int init_cpu_to_node()
> >>> This fallback case should not really happen anyways, unless the BIOS is buggy
> >>> (in this case it might better to completely reject the SRAT because
> >>> more might be wrong).
> >> SRAT is right, and some node has no ram installed.
> >
> > In this case there should be still a PXM to define the CPU locality -- your BIOS is broken.
> > Please fix it there.
>
> I don't think so.
Let's put it like this: your BIOS does not describe the full system
topology, which is a severe BIOS bug. Putting hacks into Linux
to work around that is not the right solution.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.