* [RFC PATCH] sched/numa: do load balance between remote nodes
@ 2012-06-06 6:52 ` Alex Shi
0 siblings, 0 replies; 13+ messages in thread
From: Alex Shi @ 2012-06-06 6:52 UTC (permalink / raw)
To: a.p.zijlstra
Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
seto.hidetoshi, ak, arjan.van.de.ven
Commit cb83b629b removed the NODE sched domain and checks whether the
node distance in the SLIT table is greater than REMOTE_DISTANCE; if so,
the domain loses its load balance chance at the exec/fork/wake_affine points.
But even when the node distance is greater than REMOTE_DISTANCE,
modern CPUs have QPI-like connections that keep memory access between
nodes from being too slow. So losing that balancing on NUMA machines
causes a huge performance regression on benchmarks: hackbench, tbench,
netperf, oltp, etc.
This patch restores the old scheduler behavior on all my
Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so removes the
performance regressions. (All of them have just 2 kinds of distance: 10 and 21.)
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
kernel/sched/core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 39eb601..b2ee41a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6286,7 +6286,7 @@ static int sched_domains_curr_level;
 static inline int sd_local_flags(int level)
 {
-	if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
 		return 0;
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
--
1.7.5.4
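To make the effect of the one-line change concrete, here is a minimal standalone sketch of the threshold check, not the kernel code itself. The EX_* names and example_local_flags() are hypothetical; the values 10/20/30 are assumed to match the generic defaults for LOCAL_DISTANCE, REMOTE_DISTANCE and RECLAIM_DISTANCE (architectures may override them), and the flag bits are arbitrary stand-ins for the real SD_* values.

```c
/* Assumed generic defaults; architectures may define different values. */
#define EX_LOCAL_DISTANCE   10
#define EX_REMOTE_DISTANCE  20
#define EX_RECLAIM_DISTANCE 30

/* Hypothetical stand-ins for the scheduler's SD_* flag bits. */
#define EX_SD_BALANCE_EXEC 0x1
#define EX_SD_BALANCE_FORK 0x2
#define EX_SD_WAKE_AFFINE  0x4

/*
 * Same shape as sd_local_flags(), but the node distance and the
 * threshold are passed in directly instead of being looked up by
 * sched domain level.
 */
static int example_local_flags(int distance, int threshold)
{
	if (distance > threshold)
		return 0;	/* too far: no exec/fork balancing, no wake affine */
	return EX_SD_BALANCE_EXEC | EX_SD_BALANCE_FORK | EX_SD_WAKE_AFFINE;
}
```

Under these assumptions, a remote node at SLIT distance 21 gets no flags when compared against EX_REMOTE_DISTANCE (21 > 20), but keeps all three flags when compared against EX_RECLAIM_DISTANCE (21 <= 30), which is the behavior the patch is after on machines that only report distances 10 and 21.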
* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
2012-06-06 6:52 ` Alex Shi
(?)
@ 2012-06-06 9:01 ` Peter Zijlstra
-1 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2012-06-06 9:01 UTC (permalink / raw)
To: Alex Shi
Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
seto.hidetoshi, ak, arjan.van.de.ven
On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
> - if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
> + if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
I actually considered this. I just felt a little uneasy re-purposing
RECLAIM_DISTANCE for this, but I guess it's all the same anyway. Both
mean expensive-away-distance.
So I've taken this.
thanks!
* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
2012-06-06 6:52 ` Alex Shi
(?)
@ 2012-06-06 10:53 ` Sergei Shtylyov
-1 siblings, 0 replies; 13+ messages in thread
From: Sergei Shtylyov @ 2012-06-06 10:53 UTC (permalink / raw)
To: Alex Shi
Cc: a.p.zijlstra, anton, benh, cmetcalf, dhowells, davem, fenghua.yu,
hpa, ink, linux-alpha, linux-ia64, linux-kernel, linux-mips,
linuxppc-dev, linux-sh, mattst88, paulus, lethal, ralf, rth,
sparclinux, tony.luck, x86, sivanich, greg.pearson,
kamezawa.hiroyu, bob.picco, chris.mason, torvalds, akpm, mingo,
pjt, tglx, seto.hidetoshi, ak, arjan.van.de.ven
Hello.
On 06-06-2012 10:52, Alex Shi wrote:
> commit cb83b629b
Please also specify that commit's summary in parens.
> remove the NODE sched domain and check if the node
> distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will
> lose the load balance chance at exec/fork/wake_affine points.
> But actually, even the node distance is farther than REMOTE_DISTANCE,
> Modern CPUs also has QPI like connections, that make memory access is
"Is" not needed here.
> not too slow between nodes. So above losing on NUMA machine make a
> huge performance regression on benchmark: hackbench, tbench, netperf
> and oltp etc.
> This patch will recover the scheduler behavior to old mode on all my
> Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so remove the
> perfromance regressions. (all of them just has 2 kinds distance, 10 21)
> Signed-off-by: Alex Shi<alex.shi@intel.com>
WBR, Sergei
* [tip:sched/urgent] sched/numa: Load balance between remote nodes
2012-06-06 6:52 ` Alex Shi
` (3 preceding siblings ...)
(?)
@ 2012-06-06 15:53 ` tip-bot for Alex Shi
-1 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Alex Shi @ 2012-06-06 15:53 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, a.p.zijlstra, alex.shi, tglx
Commit-ID: 10717dcde10d09f9fcee53a12a4236af1a82b484
Gitweb: http://git.kernel.org/tip/10717dcde10d09f9fcee53a12a4236af1a82b484
Author: Alex Shi <alex.shi@intel.com>
AuthorDate: Wed, 6 Jun 2012 14:52:51 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Jun 2012 16:52:25 +0200
sched/numa: Load balance between remote nodes
Commit cb83b629b ("sched/numa: Rewrite the CONFIG_NUMA sched
domain support") removed the NODE sched domain and started checking
whether the node distance in the SLIT table is greater than
REMOTE_DISTANCE. If it is, the domain loses its chance to load
balance at the exec/fork/wake_affine points.
But even when the node distance is greater than REMOTE_DISTANCE,
modern CPUs have QPI-like connections, which ensure that memory
access is not too slow between nodes. So the above change in behavior
on NUMA machines causes a performance regression on various benchmarks:
hackbench, tbench, netperf, oltp, etc.
This patch restores the old scheduler behavior on all my
Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and thus fixes the
performance regressions. (All of them have just 2 kinds of distance: 10 and 21.)
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338965571-9812-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c46958e..6546083 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6321,7 +6321,7 @@ static int sched_domains_curr_level;
 static inline int sd_local_flags(int level)
 {
-	if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
 		return 0;
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
2012-06-06 9:01 ` Peter Zijlstra
(?)
@ 2012-06-07 0:33 ` Alex Shi
-1 siblings, 0 replies; 13+ messages in thread
From: Alex Shi @ 2012-06-07 0:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
seto.hidetoshi, ak, arjan.van.de.ven
On 06/06/2012 05:01 PM, Peter Zijlstra wrote:
> On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
>> - if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
>> + if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
>
> I actually considered this.. I just felt a little uneasy re-purposing
> the RECLAIM_DISTANCE for this, but I guess its all the same anyway. Both
> mean expensive-away-distance.
>
I understand you, the BIOS guys don't have a good alignment with us on
this.
> So I've taken this.
>
> thanks!