* [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power()
2009-02-16 16:51 [PATCH 0/3] sched: Extend sched_mc/smt_power_savings framework Gautham R Shenoy
@ 2009-02-16 16:51 ` Gautham R Shenoy
2009-02-16 17:43 ` Peter Zijlstra
2009-02-16 16:51 ` [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings Gautham R Shenoy
2009-02-16 16:51 ` [PATCH 3/3] sched: Fix sd_parent_degenerate for SD_POWERSAVINGS_BALANCE Gautham R Shenoy
2 siblings, 1 reply; 9+ messages in thread
From: Gautham R Shenoy @ 2009-02-16 16:51 UTC (permalink / raw)
To: linux-kernel, svaidy, mingo, a.p.zijlstra, suresh.b.siddha, ego
Cc: balbir, dipankar, efault, andi, Gautham R Shenoy
This is a code cleanup patch which combines the functions of
sd_balance_for_mc/package_power() and sd_power_saving_flags() into a single
function.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
include/linux/sched.h | 46 ++++++++++++++++------------------------------
include/linux/topology.h | 6 ++----
2 files changed, 18 insertions(+), 34 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 02e16d2..06c5c6c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -777,34 +777,30 @@ enum powersavings_balance_level {
extern int sched_mc_power_savings, sched_smt_power_savings;
-static inline int sd_balance_for_mc_power(void)
-{
- if (sched_smt_power_savings)
- return SD_POWERSAVINGS_BALANCE;
-
- return 0;
-}
-
-static inline int sd_balance_for_package_power(void)
-{
- if (sched_mc_power_savings | sched_smt_power_savings)
- return SD_POWERSAVINGS_BALANCE;
-
- return 0;
-}
+enum sched_domain_level {
+ SD_LV_NONE = 0,
+ SD_LV_SIBLING,
+ SD_LV_MC,
+ SD_LV_CPU,
+ SD_LV_NODE,
+ SD_LV_ALLNODES,
+ SD_LV_MAX
+};
/*
* Optimise SD flags for power savings:
* SD_BALANCE_NEWIDLE helps agressive task consolidation and power savings.
* Keep default SD flags if sched_{smt,mc}_power_saving=0
*/
-
-static inline int sd_power_saving_flags(void)
+static inline int sd_power_saving_flags(enum sched_domain_level level)
{
- if (sched_mc_power_savings | sched_smt_power_savings)
- return SD_BALANCE_NEWIDLE;
+ if (level == SD_LV_MC && !sched_smt_power_savings)
+ return 0;
+ if (level == SD_LV_CPU &&
+ !(sched_mc_power_savings || sched_smt_power_savings))
+ return 0;
- return 0;
+ return SD_POWERSAVINGS_BALANCE | SD_BALANCE_NEWIDLE;
}
struct sched_group {
@@ -830,16 +826,6 @@ static inline struct cpumask *sched_group_cpus(struct sched_group *sg)
return to_cpumask(sg->cpumask);
}
-enum sched_domain_level {
- SD_LV_NONE = 0,
- SD_LV_SIBLING,
- SD_LV_MC,
- SD_LV_CPU,
- SD_LV_NODE,
- SD_LV_ALLNODES,
- SD_LV_MAX
-};
-
struct sched_domain_attr {
int relax_domain_level;
};
diff --git a/include/linux/topology.h b/include/linux/topology.h
index e632d29..8097dce 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -125,8 +125,7 @@ int arch_update_cpu_topology(void);
| SD_WAKE_AFFINE \
| SD_WAKE_BALANCE \
| SD_SHARE_PKG_RESOURCES\
- | sd_balance_for_mc_power()\
- | sd_power_saving_flags(),\
+ | sd_power_saving_flags(SD_LV_MC),\
.last_balance = jiffies, \
.balance_interval = 1, \
}
@@ -151,8 +150,7 @@ int arch_update_cpu_topology(void);
| SD_BALANCE_FORK \
| SD_WAKE_AFFINE \
| SD_WAKE_BALANCE \
- | sd_balance_for_package_power()\
- | sd_power_saving_flags(),\
+ | sd_power_saving_flags(SD_LV_CPU),\
.last_balance = jiffies, \
.balance_interval = 1, \
}
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power()
2009-02-16 16:51 ` [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power() Gautham R Shenoy
@ 2009-02-16 17:43 ` Peter Zijlstra
2009-02-17 6:55 ` Gautham R Shenoy
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-02-16 17:43 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: linux-kernel, svaidy, mingo, suresh.b.siddha, balbir, dipankar,
efault, andi
On Mon, 2009-02-16 at 22:21 +0530, Gautham R Shenoy wrote:
> +enum sched_domain_level {
> + SD_LV_NONE = 0,
> + SD_LV_SIBLING,
> + SD_LV_MC,
> + SD_LV_CPU,
> + SD_LV_NODE,
> + SD_LV_ALLNODES,
> + SD_LV_MAX
> +};
Better names would be:
LV_THREAD
LV_CORE
LV_PACKAGE
LV_NODE
But I realize that renaming these is going to be painful, however could
we at least provide proper comments with these to describe them. I
always get confused.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power()
2009-02-16 17:43 ` Peter Zijlstra
@ 2009-02-17 6:55 ` Gautham R Shenoy
0 siblings, 0 replies; 9+ messages in thread
From: Gautham R Shenoy @ 2009-02-17 6:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, svaidy, mingo, suresh.b.siddha, balbir, dipankar,
efault, andi
On Mon, Feb 16, 2009 at 06:43:01PM +0100, Peter Zijlstra wrote:
> On Mon, 2009-02-16 at 22:21 +0530, Gautham R Shenoy wrote:
>
> > +enum sched_domain_level {
> > + SD_LV_NONE = 0,
> > + SD_LV_SIBLING,
> > + SD_LV_MC,
> > + SD_LV_CPU,
> > + SD_LV_NODE,
> > + SD_LV_ALLNODES,
> > + SD_LV_MAX
> > +};
>
>
> Better names would be:
>
> LV_THREAD
> LV_CORE
> LV_PACKAGE
> LV_NODE
>
> But I realize that renaming these is going to be painful, however could
> we at least provide proper comments with these to describe them. I
> always get confused.
Will do the comments for now.
While on the topic, it might also be a good thing to discuss/consider renaming.
I am for it, since these newer names are much more intuitive when compared to
the existing ones.
>
--
Thanks and Regards
gautham
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings.
2009-02-16 16:51 [PATCH 0/3] sched: Extend sched_mc/smt_power_savings framework Gautham R Shenoy
2009-02-16 16:51 ` [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power() Gautham R Shenoy
@ 2009-02-16 16:51 ` Gautham R Shenoy
2009-02-16 17:44 ` Peter Zijlstra
2009-02-16 17:45 ` Peter Zijlstra
2009-02-16 16:51 ` [PATCH 3/3] sched: Fix sd_parent_degenerate for SD_POWERSAVINGS_BALANCE Gautham R Shenoy
2 siblings, 2 replies; 9+ messages in thread
From: Gautham R Shenoy @ 2009-02-16 16:51 UTC (permalink / raw)
To: linux-kernel, svaidy, mingo, a.p.zijlstra, suresh.b.siddha, ego
Cc: balbir, dipankar, efault, andi, Gautham R Shenoy
The existing algorithm to nominate a preferred wake up cpu would not
work on a machine which has both sched_mc_power_savings and
sched_smt_power_savings enabled. On such machines, the nomination at a lower
level would keep overwriting the nominations by it's peer-level as well as
higher level sched_domains. This would lead to the ping-ponging of the
nominated wake-up cpu, thereby preventing us from effectively consolidating
tasks.
Correct this by defining the authorized nomination sched_domain level, which
is either the highest sched_domain level containing the
SD_POWERSAVINGS_BALANCE flag or a lower level which contains the previously
nominated wake-up cpu in it's span.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
include/linux/sched.h | 1 +
kernel/sched.c | 43 ++++++++++++++++++++++++++++++++++++++++---
kernel/sched_fair.c | 2 +-
3 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 06c5c6c..9827297 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -776,6 +776,7 @@ enum powersavings_balance_level {
};
extern int sched_mc_power_savings, sched_smt_power_savings;
+extern enum powersavings_balance_level active_power_savings_level;
enum sched_domain_level {
SD_LV_NONE = 0,
diff --git a/kernel/sched.c b/kernel/sched.c
index 52bbf1c..af88f5a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -520,6 +520,11 @@ struct root_domain {
* low system utilisation. Triggered at POWERSAVINGS_BALANCE_WAKEUP(2)
*/
unsigned int sched_mc_preferred_wakeup_cpu;
+ /*
+ * The sched-domain level which is authorized to nominate the preferred
+ * wake up cpu.
+ */
+ enum sched_domain_level authorized_nomination_level;
#endif
};
@@ -3397,9 +3402,17 @@ out_balanced:
goto ret;
if (this == group_leader && group_leader != group_min) {
+ struct root_domain *my_rd = cpu_rq(this_cpu)->rd;
*imbalance = min_load_per_task;
- if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP) {
- cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu =
+ /*
+ * The preferred wakeup cpu should be nominated by power-aware
+ * sched-domains which contain the currently nominated cpu.
+ */
+ if (sd->level == my_rd->authorized_nomination_level ||
+ (sd->level < my_rd->authorized_nomination_level &&
+ cpu_isset(my_rd->sched_mc_preferred_wakeup_cpu,
+ *sched_domain_span(sd)))) {
+ my_rd->sched_mc_preferred_wakeup_cpu =
cpumask_first(sched_group_cpus(group_leader));
}
return group_min;
@@ -3683,7 +3696,8 @@ redo:
!test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
return -1;
- if (sched_mc_power_savings < POWERSAVINGS_BALANCE_WAKEUP)
+ if (active_power_savings_level <
+ POWERSAVINGS_BALANCE_WAKEUP)
return -1;
if (sd->nr_balance_failed++ < 2)
@@ -7192,6 +7206,7 @@ static void sched_domain_node_span(int node, struct cpumask *span)
#endif /* CONFIG_NUMA */
int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
+enum powersavings_balance_level active_power_savings_level;
/*
* The cpus mask in sched_group and sched_domain hangs off the end.
@@ -7781,6 +7796,25 @@ static int __build_sched_domains(const struct cpumask *cpu_map,
err = 0;
+/* Assign the sched-domain level which can nominate preferred wake-up cpu */
+ rd->sched_mc_preferred_wakeup_cpu = UINT_MAX;
+ rd->authorized_nomination_level = SD_LV_NONE;
+
+ if (active_power_savings_level >= POWERSAVINGS_BALANCE_WAKEUP) {
+ struct sched_domain *sd;
+ enum sched_domain_level authorized_nomination_level =
+ SD_LV_NONE;
+
+ for_each_domain(first_cpu(*cpu_map), sd) {
+ if (!(sd->flags & SD_POWERSAVINGS_BALANCE))
+ continue;
+ authorized_nomination_level = sd->level;
+ }
+
+ rd->authorized_nomination_level = authorized_nomination_level;
+ }
+
+
free_tmpmask:
free_cpumask_var(tmpmask);
free_send_covered:
@@ -8027,6 +8061,9 @@ static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
else
sched_mc_power_savings = level;
+ active_power_savings_level = max(sched_smt_power_savings,
+ sched_mc_power_savings);
+
arch_reinit_sched_domains();
return count;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5cc1c16..bddee3e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1042,7 +1042,7 @@ static int wake_idle(int cpu, struct task_struct *p)
chosen_wakeup_cpu =
cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu;
- if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP &&
+ if (active_power_savings_level >= POWERSAVINGS_BALANCE_WAKEUP &&
idle_cpu(cpu) && idle_cpu(this_cpu) &&
p->mm && !(p->flags & PF_KTHREAD) &&
cpu_isset(chosen_wakeup_cpu, p->cpus_allowed))
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings.
2009-02-16 16:51 ` [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings Gautham R Shenoy
@ 2009-02-16 17:44 ` Peter Zijlstra
2009-02-16 17:45 ` Peter Zijlstra
1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-02-16 17:44 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: linux-kernel, svaidy, mingo, suresh.b.siddha, balbir, dipankar,
efault, andi
On Mon, 2009-02-16 at 22:21 +0530, Gautham R Shenoy wrote:
> +/* Assign the sched-domain level which can nominate preferred wake-up cpu */
> + rd->sched_mc_preferred_wakeup_cpu = UINT_MAX;
> + rd->authorized_nomination_level = SD_LV_NONE;
/me wonders what's wrong with:
/*
* Assing ....
* ... cpu
*/
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings.
2009-02-16 16:51 ` [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings Gautham R Shenoy
2009-02-16 17:44 ` Peter Zijlstra
@ 2009-02-16 17:45 ` Peter Zijlstra
2009-02-17 6:59 ` Gautham R Shenoy
1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-02-16 17:45 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: linux-kernel, svaidy, mingo, suresh.b.siddha, balbir, dipankar,
efault, andi
On Mon, 2009-02-16 at 22:21 +0530, Gautham R Shenoy wrote:
> @@ -520,6 +520,11 @@ struct root_domain {
> * low system utilisation. Triggered at POWERSAVINGS_BALANCE_WAKEUP(2)
> */
> unsigned int sched_mc_preferred_wakeup_cpu;
> + /*
> + * The sched-domain level which is authorized to nominate the preferred
> + * wake up cpu.
> + */
> + enum sched_domain_level authorized_nomination_level
Some day my brain is just going to explode.. ;-)
Can we do a nice concise writeup of the power-save dynamics somewhere?
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings.
2009-02-16 17:45 ` Peter Zijlstra
@ 2009-02-17 6:59 ` Gautham R Shenoy
0 siblings, 0 replies; 9+ messages in thread
From: Gautham R Shenoy @ 2009-02-17 6:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, svaidy, mingo, suresh.b.siddha, balbir, dipankar,
efault, andi
On Mon, Feb 16, 2009 at 06:45:23PM +0100, Peter Zijlstra wrote:
> On Mon, 2009-02-16 at 22:21 +0530, Gautham R Shenoy wrote:
> > @@ -520,6 +520,11 @@ struct root_domain {
> > * low system utilisation. Triggered at POWERSAVINGS_BALANCE_WAKEUP(2)
> > */
> > unsigned int sched_mc_preferred_wakeup_cpu;
> > + /*
> > + * The sched-domain level which is authorized to nominate the preferred
> > + * wake up cpu.
> > + */
> > + enum sched_domain_level authorized_nomination_level
>
> Some day my brain is just going to explode.. ;-)
>
> Can we do a nice concise writeup of the power-save dynamics somewhere?
Sure, will work on that one.
--
Thanks and Regards
gautham
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] sched: Fix sd_parent_degenerate for SD_POWERSAVINGS_BALANCE.
2009-02-16 16:51 [PATCH 0/3] sched: Extend sched_mc/smt_power_savings framework Gautham R Shenoy
2009-02-16 16:51 ` [PATCH 1/3] sched: code cleanup - sd_power_saving_flags(), sd_balance_for_mc/package_power() Gautham R Shenoy
2009-02-16 16:51 ` [PATCH 2/3] sched: Fix the wakeup nomination for sched_mc/smt_power_savings Gautham R Shenoy
@ 2009-02-16 16:51 ` Gautham R Shenoy
2 siblings, 0 replies; 9+ messages in thread
From: Gautham R Shenoy @ 2009-02-16 16:51 UTC (permalink / raw)
To: linux-kernel, svaidy, mingo, a.p.zijlstra, suresh.b.siddha, ego
Cc: balbir, dipankar, efault, andi, Gautham R Shenoy
Currently a sched_domain having a single group can be prevented from getting
degenerated if it contains a SD_POWERSAVINGS_BALANCE flag. But since it has
only one group, it won't have any scope for performing powersavings balance as
it does not have a sibling group to pull from.
Apart from not provide any powersavings, it also fails to participate
in normal load-balancing.
Fix this by allowing such a sched_domain to degenerate and pass on the
responsibility of performing the POWERSAVINGS_BALANCE to it's parent domain.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
kernel/sched.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index af88f5a..32858fd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6934,6 +6934,17 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
SD_SHARE_PKG_RESOURCES);
if (nr_node_ids == 1)
pflags &= ~SD_SERIALIZE;
+
+ /*
+ * If the only flag that is preventing us from degenerating
+ * a domain with a single group is power_savings balance,
+ * see if we can transfer that responsibility to the new parent
+ */
+ if (pflags & SD_POWERSAVINGS_BALANCE && parent->parent) {
+ pflags &= ~SD_POWERSAVINGS_BALANCE;
+ parent->parent->flags |=
+ sd_power_saving_flags(parent->level);
+ }
}
if (~cflags & pflags)
return 0;
^ permalink raw reply related [flat|nested] 9+ messages in thread