From: Gautham R Shenoy <ego@in.ibm.com>
To: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>,
"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
"Ingo Molnar" <mingo@elte.hu>
Cc: linux-kernel@vger.kernel.org,
"Suresh Siddha" <suresh.b.siddha@intel.com>,
"Balbir Singh" <balbir@in.ibm.com>,
Gautham R Shenoy <ego@in.ibm.com>
Subject: [PATCH 3 5/6] sched: Arbitrate the nomination of preferred_wakeup_cpu
Date: Wed, 18 Mar 2009 14:52:43 +0530
Message-ID: <20090318092243.24787.92087.stgit@sofia.in.ibm.com>
In-Reply-To: <20090318092054.24787.18730.stgit@sofia.in.ibm.com>
Currently, when sched_mc_power_savings or sched_smt_power_savings is set
to 2, we consolidate tasks by nominating a preferred_wakeup_cpu, which is
then used for all further wakeups.

This preferred_wakeup_cpu is currently nominated by find_busiest_group()
while load balancing for sched_domains that have the
SD_POWERSAVINGS_BALANCE flag set.

However, on systems that are both multi-threaded and multi-core, we can
have multiple sched_domains in the same hierarchy with the
SD_POWERSAVINGS_BALANCE flag set.

Currently there is no arbitration mechanism to decide at which
sched_domain level in the hierarchy find_busiest_group(sd) should
nominate the preferred_wakeup_cpu. As a result, load balancing at one
level can overwrite a valid nomination made earlier at another level.
The preferred_wakeup_cpu then ping-pongs, which prevents us from
consolidating tasks effectively.
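
As an illustration only, the overwrite pattern can be modelled in a few
lines of userspace C. This is a sketch, not kernel code: the names
model_domain, nominate_unconditionally() and count_flips(), and the idea
of representing a span as a 32-bit mask, are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a power-aware sched_domain: the CPUs it
 * spans and the CPU that balancing at this domain would nominate.
 */
struct model_domain {
	unsigned int span;      /* bit n set => CPU n is in the span */
	unsigned int candidate; /* first CPU of the group leader found here */
};

/* Behaviour without arbitration: every balance pass overwrites. */
static unsigned int nominate_unconditionally(const struct model_domain *sd)
{
	return sd->candidate;
}

/*
 * Replay a sequence of balance passes and count how often the
 * preferred wakeup CPU changes.
 */
static unsigned int count_flips(const struct model_domain *passes, size_t n,
				unsigned int initial)
{
	unsigned int preferred = initial;
	unsigned int flips = 0;
	size_t i;

	assert(passes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int next = nominate_unconditionally(&passes[i]);

		if (next != preferred)
			flips++;
		preferred = next;
	}
	return flips;
}
```

If the passes alternate between a package-wide domain (span 0xF,
candidate CPU 0) and a sibling-level domain (span 0xC, candidate CPU 2),
the nomination changes on almost every pass in this model, which is the
ping-pong described above.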

Fix this with an arbitration algorithm: find_busiest_group() nominates
the preferred_wakeup_cpu while balancing a particular sched_domain only
if that sched_domain:

- is the topmost power-aware sched_domain,
OR
- contains the previously nominated preferred_wakeup_cpu in its span.

This will help to further fine-tune the wakeup biasing logic by
identifying a partially busy core within a CPU package instead of
potentially waking up a completely idle core.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
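
Note for reviewers (not part of the changelog): the bookkeeping behind
top_powersavings_sd_lvl amounts to a bottom-up walk of the domain
hierarchy that remembers the last level carrying the power-savings flag.
A minimal userspace sketch of that walk follows. MODEL_LV_NONE,
MODEL_POWERSAVINGS, model_domain and top_powersavings_level() are
hypothetical names chosen for this note, and the numeric values are
assumptions rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LV_NONE      (-1)  /* hypothetical "no power-aware level" */
#define MODEL_POWERSAVINGS 0x1u  /* hypothetical flag bit */

struct model_domain {
	int level;                         /* increases towards the top */
	unsigned int flags;
	const struct model_domain *parent; /* NULL at the top of the hierarchy */
};

/*
 * Walk from the lowest domain to the top and return the level of the
 * highest domain that has the power-savings flag set, or MODEL_LV_NONE
 * when no domain in the chain has it.
 */
static int top_powersavings_level(const struct model_domain *lowest)
{
	const struct model_domain *sd;
	int top = MODEL_LV_NONE;

	for (sd = lowest; sd != NULL; sd = sd->parent) {
		/* The model assumes levels strictly increase upwards. */
		assert(sd->parent == NULL || sd->parent->level > sd->level);

		if (!(sd->flags & MODEL_POWERSAVINGS))
			continue;
		top = sd->level;
	}
	return top;
}
```

Because the walk goes upwards and each flagged domain overwrites the
result, the value left at the end belongs to the highest flagged level.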
kernel/sched.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 16d7655..651550c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -522,6 +522,14 @@ struct root_domain {
* This is triggered at POWERSAVINGS_BALANCE_WAKEUP(2).
*/
unsigned int preferred_wakeup_cpu;
+
+ /*
+ * top_powersavings_sd_lvl records the level of the highest
+ * sched_domain that has the SD_POWERSAVINGS_BALANCE flag set.
+ *
+ * Used to arbitrate nomination of the preferred_wakeup_cpu.
+ */
+ enum sched_domain_level top_powersavings_sd_lvl;
#endif
};
@@ -3416,9 +3424,27 @@ out_balanced:
goto ret;
if (this == group_leader && group_leader != group_min) {
+ struct root_domain *my_rd = cpu_rq(this_cpu)->rd;
*imbalance = min_load_per_task;
- if (active_power_savings_level >= POWERSAVINGS_BALANCE_WAKEUP) {
- cpu_rq(this_cpu)->rd->preferred_wakeup_cpu =
+ /*
+ * To avoid overwriting of preferred_wakeup_cpu nominations
+ * while calling find_busiest_group() at various sched_domain
+ * levels, we define an arbitration mechanism wherein
+ * find_busiest_group() nominates a preferred_wakeup_cpu at
+ * the sched_domain sd if:
+ *
+ * - sd is the highest sched_domain in the hierarchy having the
+ * SD_POWERSAVINGS_BALANCE flag set.
+ *
+ * OR
+ *
+ * - sd contains the previously nominated preferred_wakeup_cpu
+ * in its span.
+ */
+ if (sd->level == my_rd->top_powersavings_sd_lvl ||
+ cpu_isset(my_rd->preferred_wakeup_cpu,
+ *sched_domain_span(sd))) {
+ my_rd->preferred_wakeup_cpu =
cpumask_first(sched_group_cpus(group_leader));
}
return group_min;
@@ -7541,6 +7567,8 @@ static int __build_sched_domains(const struct cpumask *cpu_map,
struct root_domain *rd;
cpumask_var_t nodemask, this_sibling_map, this_core_map, send_covered,
tmpmask;
+ struct sched_domain *sd;
+
#ifdef CONFIG_NUMA
cpumask_var_t domainspan, covered, notcovered;
struct sched_group **sched_group_nodes = NULL;
@@ -7816,6 +7844,19 @@ static int __build_sched_domains(const struct cpumask *cpu_map,
err = 0;
+ rd->preferred_wakeup_cpu = UINT_MAX;
+ rd->top_powersavings_sd_lvl = SD_LV_NONE;
+
+ if (active_power_savings_level < POWERSAVINGS_BALANCE_WAKEUP)
+ goto free_tmpmask;
+
+ /* Record the level of the highest power-aware sched_domain */
+ for_each_domain(first_cpu(*cpu_map), sd) {
+ if (!(sd->flags & SD_POWERSAVINGS_BALANCE))
+ continue;
+ rd->top_powersavings_sd_lvl = sd->level;
+ }
+
free_tmpmask:
free_cpumask_var(tmpmask);
free_send_covered: