From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>,
Mike Galbraith <efault@gmx.de>,
Suresh Siddha <suresh.b.siddha@intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Paul Turner <pjt@google.com>
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible
Date: Thu, 22 Mar 2012 21:02:05 +0530 [thread overview]
Message-ID: <20120322153205.GA28570@linux.vnet.ibm.com> (raw)
In-Reply-To: <20120306091410.GD27238@elte.hu>
* Ingo Molnar <mingo@elte.hu> [2012-03-06 10:14:11]:
> > I did some experiments with volanomark and it does turn out to
> > be sensitive to SD_BALANCE_WAKE, while the other wake-heavy
> > benchmark that I am dealing with (Trade) benefits from it.
>
> Does volanomark still do yield(), thereby invoking a random
> shuffle of thread scheduling and pretty much voluntarily
> ejecting itself from most scheduler performance considerations?
>
> If it uses a real locking primitive such as futexes then its
> performance matters more.
Some more interesting results on a more recent tip kernel.

Machine  : 2 quad-core Intel X5570 CPUs w/ H/T enabled (16 logical CPUs)
Kernel   : tip (HEAD at ee415e2)
Guest VM : 2.6.18-kernel-based enterprise Linux guest
Benchmarks are run in two scenarios:
1. BM -> Bare Metal. Benchmark is run on bare metal in root cgroup
2. VM -> Benchmark is run inside a guest VM. Several cpu hogs (in
various cgroups) are run on host. Cgroup setup is as below:
/sys (cpu.shares = 1024, hosts all system tasks)
/libvirt (cpu.shares = 20000)
/libvirt/qemu/VM (cpu.shares = 8192. guest VM w/ 8 vcpus)
/libvirt/qemu/hoga (cpu.shares = 1024. hosts 4 cpu hogs)
/libvirt/qemu/hogb (cpu.shares = 1024. hosts 4 cpu hogs)
/libvirt/qemu/hogc (cpu.shares = 1024. hosts 4 cpu hogs)
/libvirt/qemu/hogd (cpu.shares = 1024. hosts 4 cpu hogs)
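FWIW, the share setup above can be scripted roughly as below. This is a
dry-runnable sketch, not the exact commands used: it assumes the cgroup v1
"cpu" controller, and CG defaults to a scratch directory so it can be run
anywhere (on a real host, point CG at the cpu controller mount, commonly
/sys/fs/cgroup/cpu, where libvirt normally creates the qemu groups itself).

```shell
# Sketch of the cgroup share setup used in the VM scenario.
# CG defaults to a scratch directory for a dry run; point it at the real
# cpu-controller mount (an assumption -- often /sys/fs/cgroup/cpu) to
# apply the shares for real.
CG=${CG:-$(mktemp -d)}

mkdir -p "$CG/libvirt/qemu/VM"
echo 20000 > "$CG/libvirt/cpu.shares"          # /libvirt
echo 8192  > "$CG/libvirt/qemu/VM/cpu.shares"  # guest VM w/ 8 vcpus

# four sibling groups, each hosting 4 cpu hogs
for g in hoga hogb hogc hogd; do
	mkdir -p "$CG/libvirt/qemu/$g"
	echo 1024 > "$CG/libvirt/qemu/$g/cpu.shares"
done
```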
First, the BM (bare metal) scenario:

                  tip     tip + patch
  volano          1       0.955   (4.5% degradation)
  sysbench [n1]   1       0.9984  (0.16% degradation)
  tbench 1 [n2]   1       0.9096  (9% degradation)
Now the more interesting VM scenario:

                  tip     tip + patch
  volano          1       1.29    (29% improvement)
  sysbench [n3]   1       2       (100% improvement)
  tbench 1 [n4]   1       1.07    (7% improvement)
  tbench 8 [n5]   1       1.26    (26% improvement)
  httperf [n6]    1       1.05    (5% improvement)
  Trade           1       1.31    (31% improvement)
Notes:
n1. sysbench was run with 16 threads.
n2. tbench was run on localhost with 1 client
n3. sysbench was run with 8 threads
n4. tbench was run on localhost with 1 client
n5. tbench was run over network with 8 clients
n6. httperf was run with a burst-length of 100 and wsess of 100,500,0
So the patch seems to be a wholesale win when VCPU threads are waking
up (in a highly contended environment). One reason could be that the
assumption of better cache hits from running (vcpu) threads on their
prev_cpu may not fully hold, as a vcpu thread can represent many
different guest threads internally.

Anyway, there are degradations as well, considering which I see several
possibilities:
1. Do balance-on-wake for vcpu threads only.
2. Document tuning possibility to improve performance in virtualized
environment:
- Either via sched_domain flags (disable SD_WAKE_AFFINE
at all levels and enable SD_BALANCE_WAKE at SMT/MC levels)
- Or via a new sched_feat(BALANCE_WAKE) tunable
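The sched_domain route in option 2 can be exercised at runtime via the
CONFIG_SCHED_DEBUG knobs under /proc/sys/kernel/sched_domain/. A hedged
sketch follows: the flag values are taken from the 3.x-era
include/linux/sched.h (verify against the running kernel's headers), and a
scratch file stands in for the real per-domain flags file so the bit
manipulation can be dry-run.

```shell
# Flag values as in 3.x-era include/linux/sched.h -- an assumption here;
# verify against the running kernel before poking real domains.
SD_BALANCE_WAKE=0x10
SD_WAKE_AFFINE=0x20

# On a real host this would loop over
# /proc/sys/kernel/sched_domain/cpu*/domain*/flags (needs
# CONFIG_SCHED_DEBUG); a scratch file stands in for a dry run.
f=${FLAGS_FILE:-$(mktemp)}
echo $(( 0x2f )) > "$f"   # example: LB|NEWIDLE|EXEC|FORK|WAKE_AFFINE

cur=$(cat "$f")
# clear SD_WAKE_AFFINE, set SD_BALANCE_WAKE
echo $(( (cur & ~SD_WAKE_AFFINE) | SD_BALANCE_WAKE )) > "$f"
```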
Any other thoughts or suggestions for more experiments?
--
Balance threads on wakeup, letting them run on the least-loaded CPU in the
same cache domain as their prev_cpu (or cur_cpu if the wake_affine() test
obliges).
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
---
include/linux/topology.h | 4 ++--
kernel/sched/fair.c | 5 ++++-
2 files changed, 6 insertions(+), 3 deletions(-)
Index: current/include/linux/topology.h
===================================================================
--- current.orig/include/linux/topology.h
+++ current/include/linux/topology.h
@@ -96,7 +96,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE		\
 				| 1*SD_BALANCE_EXEC		\
 				| 1*SD_BALANCE_FORK		\
-				| 0*SD_BALANCE_WAKE		\
+				| 1*SD_BALANCE_WAKE		\
 				| 1*SD_WAKE_AFFINE		\
 				| 1*SD_SHARE_CPUPOWER		\
 				| 0*SD_POWERSAVINGS_BALANCE	\
@@ -129,7 +129,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE		\
 				| 1*SD_BALANCE_EXEC		\
 				| 1*SD_BALANCE_FORK		\
-				| 0*SD_BALANCE_WAKE		\
+				| 1*SD_BALANCE_WAKE		\
 				| 1*SD_WAKE_AFFINE		\
 				| 0*SD_PREFER_LOCAL		\
 				| 0*SD_SHARE_CPUPOWER		\
Index: current/kernel/sched/fair.c
===================================================================
--- current.orig/kernel/sched/fair.c
+++ current/kernel/sched/fair.c
@@ -2766,7 +2766,10 @@ select_task_rq_fair(struct task_struct *
 		prev_cpu = cpu;
 
 		new_cpu = select_idle_sibling(p, prev_cpu);
-		goto unlock;
+		if (idle_cpu(new_cpu))
+			goto unlock;
+		sd = rcu_dereference(per_cpu(sd_llc, prev_cpu));
+		cpu = prev_cpu;
 	}
 
 	while (sd) {