From: tip-bot for Mike Galbraith <efault@gmx.de>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
torvalds@linux-foundation.org, a.p.zijlstra@chello.nl,
efault@gmx.de, akpm@linux-foundation.org, tglx@linutronix.de
Subject: [tip:sched/core] sched: Improve scalability via 'CPU buddies', which withstand random perturbations
Date: Tue, 24 Jul 2012 07:18:20 -0700
Message-ID: <tip-970e178985cadbca660feb02f4d2ee3a09f7fdda@git.kernel.org>
In-Reply-To: <1339471112.7352.32.camel@marge.simpson.net>
Commit-ID: 970e178985cadbca660feb02f4d2ee3a09f7fdda
Gitweb: http://git.kernel.org/tip/970e178985cadbca660feb02f4d2ee3a09f7fdda
Author: Mike Galbraith <efault@gmx.de>
AuthorDate: Tue, 12 Jun 2012 05:18:32 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 24 Jul 2012 13:53:34 +0200
sched: Improve scalability via 'CPU buddies', which withstand random perturbations
Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possibly quite large package. Fix
that up by assigning each CPU a 'buddy' CPU to try to motivate. Each CPU may
try to motivate that one other CPU; if it's busy, tough. It may then try its
SMT sibling, but that's all this optimization is allowed to cost.
Sibling cache buddies are cross-wired to prevent bouncing.
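A minimal user-space sketch of the cross-wiring, not the kernel code. It
assumes one CPU per group and numbers the groups 0..ngroups-1 around the
ring, with group 0 standing in for the domain's first CPU. The name
buddy_group() and the index arithmetic are illustrative assumptions; the
direction flipping and the "never point the last CPU rightward" rule follow
the update_top_cache_domain() hunk in this patch.

```c
/*
 * Hypothetical model of one domain level: groups form a ring of
 * 'ngroups' entries and 'g' is the group that holds our CPU.
 * Returns the index of the group whose first CPU becomes the buddy.
 */
int buddy_group(int g, int ngroups)
{
	int right = 1;	/* the first group points rightward */
	int prev = g;	/* mirrors prev = sg = tmp->groups (our own group) */
	int sg = 0;	/* start at the domain's first group */

	/* Count hops from the first group, switching direction on each hop. */
	while (sg != g) {
		prev = sg;
		sg = (sg + 1) % ngroups;
		right = !right;
	}

	/* Never point the last group back to the start of the ring. */
	if (right && (sg + 1) % ngroups == 0)
		right = 0;

	return right ? (sg + 1) % ngroups : prev;
}
```

Under these assumptions four groups pair up as 0<->1 and 2<->3, and with
three groups the odd one out (group 2) points back at group 1 instead of
wrapping around to group 0.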
4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:
clients      1      2      4      8     16     32     64    128
...............................................................
pre         30     41    118    645   3769   6214  12233  14312
post       299    603   1211   2418   4697   6847  11606  14557
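A nice increase in performance at low client counts: roughly 10x to 15x at
1 to 4 clients, tapering to about even from 32 clients up (64 clients is
about 5% lower).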
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 39 ++++++++++++++++++++++++++++++++++++++-
kernel/sched/fair.c | 28 +++++++---------------------
3 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4a1f493..bc99529 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -949,6 +949,7 @@ struct sched_domain {
unsigned int smt_gain;
int flags; /* See SD_* */
int level;
+ int idle_buddy; /* cpu assigned to select_idle_sibling() */
/* Runtime fields. */
unsigned long last_balance; /* init to jiffies. units in jiffies */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b4a63d..536b213 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6024,6 +6024,11 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu)
* SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this
* allows us to avoid some pointer chasing select_idle_sibling().
*
+ * Iterate domains and sched_groups downward, assigning CPUs to be
+ * select_idle_sibling() hw buddy. Cross-wiring hw makes bouncing
+ * due to random perturbation self canceling, ie sw buddies pull
+ * their counterpart to their CPU's hw counterpart.
+ *
* Also keep a unique ID per domain (we use the first cpu number in
* the cpumask of the domain), this allows us to quickly tell if
* two cpus are in the same cache domain, see cpus_share_cache().
@@ -6037,8 +6042,40 @@ static void update_top_cache_domain(int cpu)
int id = cpu;
sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
- if (sd)
+ if (sd) {
+ struct sched_domain *tmp = sd;
+ struct sched_group *sg, *prev;
+ bool right;
+
+ /*
+ * Traverse to first CPU in group, and count hops
+ * to cpu from there, switching direction on each
+ * hop, never ever pointing the last CPU rightward.
+ */
+ do {
+ id = cpumask_first(sched_domain_span(tmp));
+ prev = sg = tmp->groups;
+ right = 1;
+
+ while (cpumask_first(sched_group_cpus(sg)) != id)
+ sg = sg->next;
+
+ while (!cpumask_test_cpu(cpu, sched_group_cpus(sg))) {
+ prev = sg;
+ sg = sg->next;
+ right = !right;
+ }
+
+ /* A CPU went down, never point back to domain start. */
+ if (right && cpumask_first(sched_group_cpus(sg->next)) == id)
+ right = false;
+
+ sg = right ? sg->next : prev;
+ tmp->idle_buddy = cpumask_first(sched_group_cpus(sg));
+ } while ((tmp = tmp->child));
+
id = cpumask_first(sched_domain_span(sd));
+ }
rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
per_cpu(sd_llc_id, cpu) = id;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c099cc6..dd00aaf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2637,8 +2637,6 @@ static int select_idle_sibling(struct task_struct *p, int target)
int cpu = smp_processor_id();
int prev_cpu = task_cpu(p);
struct sched_domain *sd;
- struct sched_group *sg;
- int i;
/*
* If the task is going to be woken-up on this cpu and if it is
@@ -2655,29 +2653,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
return prev_cpu;
/*
- * Otherwise, iterate the domains and find an elegible idle cpu.
+ * Otherwise, check assigned siblings to find an elegible idle cpu.
*/
sd = rcu_dereference(per_cpu(sd_llc, target));
- for_each_lower_domain(sd) {
- sg = sd->groups;
- do {
- if (!cpumask_intersects(sched_group_cpus(sg),
- tsk_cpus_allowed(p)))
- goto next;
-
- for_each_cpu(i, sched_group_cpus(sg)) {
- if (!idle_cpu(i))
- goto next;
- }
- target = cpumask_first_and(sched_group_cpus(sg),
- tsk_cpus_allowed(p));
- goto done;
-next:
- sg = sg->next;
- } while (sg != sd->groups);
+ for_each_lower_domain(sd) {
+ if (!cpumask_test_cpu(sd->idle_buddy, tsk_cpus_allowed(p)))
+ continue;
+ if (idle_cpu(sd->idle_buddy))
+ return sd->idle_buddy;
}
-done:
+
return target;
}
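A rough user-space sketch of the lookup that the fair.c hunk above leaves
in select_idle_sibling(), offered as an illustration rather than as the
kernel implementation. struct level, pick_cpu() and the 64-bit masks are
hypothetical stand-ins for struct sched_domain, the for_each_lower_domain()
walk, tsk_cpus_allowed() and idle_cpu(); CPU numbers are assumed to be
below 64.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one sched_domain field used here. */
struct level {
	int idle_buddy;
};

/*
 * Walk the levels from the cache domain downward and return the first
 * buddy that the task may run on and that is idle; otherwise fall back
 * to 'target'. Bit n of 'allowed' and 'idle' describes CPU n, so every
 * idle_buddy must be in the range 0..63.
 */
int pick_cpu(const struct level *levels, int nlevels,
	     uint64_t allowed, uint64_t idle, int target)
{
	for (int i = 0; i < nlevels; i++) {
		uint64_t bit = UINT64_C(1) << levels[i].idle_buddy;

		if (!(allowed & bit))
			continue;	/* task is not allowed on this buddy */
		if (idle & bit)
			return levels[i].idle_buddy;
	}

	return target;
}
```

The cost is one test per domain level, which is the point of the change:
the removed loop scanned every CPU of every group at each level.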
Thread overview: 27+ messages
2012-05-24 11:04 [rfc][patch] select_idle_sibling() inducing bouncing on westmere Mike Galbraith
2012-05-24 13:17 ` Peter Zijlstra
2012-05-24 13:20 ` Peter Zijlstra
2012-05-25 6:14 ` Mike Galbraith
2012-05-26 6:37 ` Mike Galbraith
2012-05-26 7:29 ` Peter Zijlstra
2012-05-26 8:27 ` Mike Galbraith
2012-05-27 9:17 ` Mike Galbraith
2012-05-27 11:02 ` Mike Galbraith
2012-05-27 11:12 ` Mike Galbraith
2012-05-27 14:11 ` Arjan van de Ven
2012-05-27 14:29 ` Mike Galbraith
2012-05-27 14:32 ` Mike Galbraith
2012-05-29 18:58 ` Andreas Herrmann
2012-05-25 6:08 ` Mike Galbraith
2012-05-25 8:06 ` Mike Galbraith
2012-06-05 14:30 ` Mike Galbraith
2012-06-11 16:57 ` [patch v3] sched: fix select_idle_sibling() induced bouncing Mike Galbraith
2012-06-11 17:22 ` Peter Zijlstra
2012-06-11 17:55 ` Mike Galbraith
2012-06-11 18:53 ` Suresh Siddha
2012-06-12 3:18 ` Mike Galbraith
2012-06-20 10:48 ` [tip:sched/core] sched: Improve scalability via 'CPU buddies', which withstand random perturbations tip-bot for Mike Galbraith
2012-07-24 14:18 ` tip-bot for Mike Galbraith [this message]
2012-06-19 8:47 ` [patch v3] sched: fix select_idle_sibling() induced bouncing Paul Turner
2012-06-06 10:17 ` [rfc][patch] select_idle_sibling() inducing bouncing on westmere Mike Galbraith
2012-06-06 10:38 ` Mike Galbraith