* [PATCH v2 1/6] sched/fair: Generalize the load/util averages resolution definition
2015-10-21 23:24 [PATCH v2 0/6] sched/fair: Clean up sched metric definitions Yuyang Du
@ 2015-10-21 23:24 ` Yuyang Du
2015-10-21 23:24 ` [PATCH v2 2/6] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE Yuyang Du
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
Integer metrics need fixed point arithmetic. In sched/fair, a few
metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity,
may have different fixed point ranges, which makes their update and
usage error-prone.
In order to avoid errors relating to the fixed point range, we
define a basic fixed point range, and then formalize all metrics
based on that basic range.
The basic range is 1024 or (1 << 10). Further, one can recursively
apply the basic range to obtain a larger range.
As pointed out by Ben Segall, weight (visible to the user, e.g., NICE-0
has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but
they must be well calibrated.
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
include/linux/sched.h | 16 +++++++++++++---
kernel/sched/fair.c | 4 ----
kernel/sched/sched.h | 15 ++++++++++-----
3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index bd38b3e..ab59390 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -910,9 +910,19 @@ enum cpu_idle_type {
};
/*
+ * Integer metrics need fixed point arithmetic, e.g., sched/fair
+ * has a few: load, load_avg, util_avg, freq, and capacity.
+ *
+ * We define a basic fixed point arithmetic range, and then formalize
+ * all these metrics based on that basic range.
+ */
+# define SCHED_FIXEDPOINT_SHIFT 10
+# define SCHED_FIXEDPOINT_SCALE (1L << SCHED_FIXEDPOINT_SHIFT)
+
+/*
* Increase resolution of cpu_capacity calculations
*/
-#define SCHED_CAPACITY_SHIFT 10
+#define SCHED_CAPACITY_SHIFT SCHED_FIXEDPOINT_SHIFT
#define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
/*
@@ -1180,8 +1190,8 @@ struct load_weight {
* 1) load_avg factors frequency scaling into the amount of time that a
* sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
* aggregated such weights of all runnable and blocked sched_entities.
- * 2) util_avg factors frequency and cpu scaling into the amount of time
- * that a sched_entity is running on a CPU, in the range [0..SCHED_LOAD_SCALE].
+ * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
+ * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
* For cfs_rq, it is the aggregated such times of all runnable and
* blocked sched_entities.
* The 64 bit load_sum can:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..c61fd8e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2522,10 +2522,6 @@ static u32 __compute_runnable_contrib(u64 n)
return contrib + runnable_avg_yN_sum[n];
}
-#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
-#error "load tracking assumes 2^10 as unit"
-#endif
-
#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3845a71..61a5858 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -53,18 +53,23 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
* increased costs.
*/
#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
-# define SCHED_LOAD_RESOLUTION 10
-# define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
-# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
+# define SCHED_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+# define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
+# define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
#else
-# define SCHED_LOAD_RESOLUTION 0
+# define SCHED_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w) (w)
# define scale_load_down(w) (w)
#endif
-#define SCHED_LOAD_SHIFT (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
+/*
+ * NICE_0's weight (visible to user) and its load (invisible to user) have
+ * independent ranges, but they should be well calibrated. We use scale_load()
+ * and scale_load_down() to convert between them; the following must be true:
+ * scale_load(prio_to_weight[20]) == NICE_0_LOAD
+ */
#define NICE_0_LOAD SCHED_LOAD_SCALE
#define NICE_0_SHIFT SCHED_LOAD_SHIFT
--
2.1.4
* [PATCH v2 2/6] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
After cleaning up the sched metrics, these two definitions, which cause
ambiguity, are no longer needed. Use NICE_0_LOAD_SHIFT and NICE_0_LOAD
instead (their names clearly state what they are).
Suggested-by: Ben Segall <bsegall@google.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
kernel/sched/fair.c | 4 ++--
kernel/sched/sched.h | 22 +++++++++++-----------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61fd8e..19d34a5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,7 +682,7 @@ void init_entity_runnable_average(struct sched_entity *se)
sa->period_contrib = 1023;
sa->load_avg = scale_load_down(se->load.weight);
sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
- sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
+ sa->util_avg = SCHED_CAPACITY_SCALE;
sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
}
@@ -6651,7 +6651,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
if (busiest->group_type == group_overloaded &&
local->group_type == group_overloaded) {
load_above_capacity = busiest->sum_nr_running *
- SCHED_LOAD_SCALE;
+ scale_load_down(NICE_0_LOAD);
if (load_above_capacity > busiest->group_capacity)
load_above_capacity -= busiest->group_capacity;
else
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 61a5858..ded3a6d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -53,25 +53,25 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
* increased costs.
*/
#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
-# define SCHED_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
# define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
#else
-# define SCHED_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
+# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w) (w)
# define scale_load_down(w) (w)
#endif
-#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
-
/*
- * NICE_0's weight (visible to user) and its load (invisible to user) have
- * independent ranges, but they should be well calibrated. We use scale_load()
- * and scale_load_down(w) to convert between them, the following must be true:
- * scale_load(prio_to_weight[20]) == NICE_0_LOAD
+ * Task weight (visible to user) and its load (invisible to user) have
+ * independent resolution, but they should be well calibrated. We use
+ * scale_load() and scale_load_down(w) to convert between them. The
+ * following must be true:
+ *
+ * scale_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ *
*/
-#define NICE_0_LOAD SCHED_LOAD_SCALE
-#define NICE_0_SHIFT SCHED_LOAD_SHIFT
+#define NICE_0_LOAD (1L << NICE_0_LOAD_SHIFT)
/*
* Single value that decides SCHED_DEADLINE internal math precision.
@@ -850,7 +850,7 @@ DECLARE_PER_CPU(struct sched_domain *, sd_asym);
struct sched_group_capacity {
atomic_t ref;
/*
- * CPU capacity of this group, SCHED_LOAD_SCALE being max capacity
+ * CPU capacity of this group, SCHED_CAPACITY_SCALE being max capacity
* for a single CPU.
*/
unsigned int capacity;
--
2.1.4
* [PATCH v2 3/6] sched/fair: Add introduction to the sched load avg metrics
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
These sched metrics have become complex enough that they deserve an
introduction, so we document them at their definition.
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
include/linux/sched.h | 60 +++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 49 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ab59390..ab34792 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1186,18 +1186,56 @@ struct load_weight {
};
/*
- * The load_avg/util_avg accumulates an infinite geometric series.
- * 1) load_avg factors frequency scaling into the amount of time that a
- * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
- * aggregated such weights of all runnable and blocked sched_entities.
- * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
- * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
- * For cfs_rq, it is the aggregated such times of all runnable and
+ * The load_avg/util_avg accumulates an infinite geometric series
+ * (see __update_load_avg() in kernel/sched/fair.c).
+ *
+ * [load_avg definition]
+ *
+ * load_avg = runnable% * scale_load_down(load)
+ *
+ * where runnable% is the time ratio that a sched_entity is runnable.
+ * For cfs_rq, it is the aggregated such load_avg of all runnable and
* blocked sched_entities.
- * The 64 bit load_sum can:
- * 1) for cfs_rq, afford 4353082796 (=2^64/47742/88761) entities with
- * the highest weight (=88761) always runnable, we should not overflow
- * 2) for entity, support any load.weight always runnable
+ *
+ * load_avg may also take frequency scaling into account:
+ *
+ * load_avg = runnable% * scale_load_down(load) * freq%
+ *
+ * where freq% is the CPU frequency normalized to the highest frequency
+ *
+ * [util_avg definition]
+ *
+ * util_avg = running% * SCHED_CAPACITY_SCALE
+ *
+ * where running% is the time ratio that a sched_entity is running on
+ * a CPU. For cfs_rq, it is the aggregated such util_avg of all runnable
+ * and blocked sched_entities.
+ *
+ * util_avg may also factor frequency scaling and CPU capacity scaling:
+ *
+ * util_avg = running% * SCHED_CAPACITY_SCALE * freq% * capacity%
+ *
+ * where freq% is the same as above, and capacity% is the CPU capacity
+ * normalized to the greatest capacity (due to uarch differences, etc).
+ *
+ * N.B., the above ratios (runnable%, running%, freq%, and capacity%)
+ * themselves are in the range of [0, 1]. To do fixed point arithmetic,
+ * we therefore scale them to as large a range as necessary. This is for
+ * example reflected by util_avg's SCHED_CAPACITY_SCALE.
+ *
+ * [Overflow issue]
+ *
+ * The 64bit load_sum can have 4353082796 (=2^64/47742/88761) entities
+ * with the highest load (=88761) always runnable on a single cfs_rq, we
+ * should not overflow as the number already hits PID_MAX_LIMIT.
+ *
+ * For all other cases (including 32bit kernel), struct load_weight's
+ * weight will overflow first before we do, because:
+ *
+ * Max(load_avg) <= Max(load.weight)
+ *
+ * Then, it is the load_weight's responsibility to consider overflow
+ * issues.
*/
struct sched_avg {
u64 last_update_time, load_sum;
--
2.1.4
* [PATCH v2 4/6] sched/fair: Remove scale_load_down() for load_avg
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
Currently, load_avg = scale_load_down(load) * runnable%. This does
not make much sense, because load_avg is primarily the load that
takes the runnable time ratio into account.
We therefore remove scale_load_down() for load_avg. But we need to
carefully consider the overflow risk if load has the higher range
(2*SCHED_FIXEDPOINT_SHIFT). The only case in which an overflow may
occur is on a 64bit kernel with the increased load range. In that
case, the 64bit load_sum can accommodate 4251057 (=2^64/47742/88761/1024)
entities with the highest load (=88761*1024) always runnable on one
single cfs_rq, which should be fine. Even if this overflow happens
at the end of the day, the load average will not be useful anyway
under the conditions in which it occurs.
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
[update calculate_imbalance]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
include/linux/sched.h | 19 ++++++++++++++-----
kernel/sched/fair.c | 19 +++++++++----------
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ab34792..aa432e8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1191,7 +1191,7 @@ struct load_weight {
*
* [load_avg definition]
*
- * load_avg = runnable% * scale_load_down(load)
+ * load_avg = runnable% * load
*
* where runnable% is the time ratio that a sched_entity is runnable.
* For cfs_rq, it is the aggregated such load_avg of all runnable and
@@ -1199,7 +1199,7 @@ struct load_weight {
*
* load_avg may also take frequency scaling into account:
*
- * load_avg = runnable% * scale_load_down(load) * freq%
+ * load_avg = runnable% * load * freq%
*
 * where freq% is the CPU frequency normalized to the highest frequency
*
@@ -1225,9 +1225,18 @@ struct load_weight {
*
* [Overflow issue]
*
- * The 64bit load_sum can have 4353082796 (=2^64/47742/88761) entities
- * with the highest load (=88761) always runnable on a single cfs_rq, we
- * should not overflow as the number already hits PID_MAX_LIMIT.
+ * On 64bit kernel:
+ *
+ * When load has small fixed point range (SCHED_FIXEDPOINT_SHIFT), the
+ * 64bit load_sum can have 4353082796 (=2^64/47742/88761) tasks with
+ * the highest load (=88761) always runnable on a cfs_rq, we should
+ * not overflow as the number already hits PID_MAX_LIMIT.
+ *
+ * When load has large fixed point range (2*SCHED_FIXEDPOINT_SHIFT),
+ * the 64bit load_sum can have 4251057 (=2^64/47742/88761/1024) tasks
+ * with the highest load (=88761*1024) always runnable on ONE cfs_rq,
+ * we should be fine. Even if the overflow occurs at the end of the day,
+ * the load_avg will not be useful in that situation anyway.
*
* For all other cases (including 32bit kernel), struct load_weight's
* weight will overflow first before we do, because:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 19d34a5..76b9ee9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,7 @@ void init_entity_runnable_average(struct sched_entity *se)
* will definitely be update (after enqueue).
*/
sa->period_contrib = 1023;
- sa->load_avg = scale_load_down(se->load.weight);
+ sa->load_avg = se->load.weight;
sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
sa->util_avg = SCHED_CAPACITY_SCALE;
sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
@@ -2697,7 +2697,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
}
decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
- scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
+ cfs_rq->load.weight, cfs_rq->curr != NULL, cfs_rq);
#ifndef CONFIG_64BIT
smp_wmb();
@@ -2718,8 +2718,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
* Track task load average for carrying it to new CPU after migrated, and
* track group sched_entity load average for task_h_load calc in migration
*/
- __update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight),
+ __update_load_avg(now, cpu, &se->avg, se->on_rq * se->load.weight,
cfs_rq->curr == se, NULL);
if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
@@ -2756,7 +2755,7 @@ skip_aging:
static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
- &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ &se->avg, se->on_rq * se->load.weight,
cfs_rq->curr == se, NULL);
cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
@@ -2776,7 +2775,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
migrated = !sa->last_update_time;
if (!migrated) {
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
- se->on_rq * scale_load_down(se->load.weight),
+ se->on_rq * se->load.weight,
cfs_rq->curr == se, NULL);
}
@@ -6650,10 +6649,10 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
*/
if (busiest->group_type == group_overloaded &&
local->group_type == group_overloaded) {
- load_above_capacity = busiest->sum_nr_running *
- scale_load_down(NICE_0_LOAD);
- if (load_above_capacity > busiest->group_capacity)
- load_above_capacity -= busiest->group_capacity;
+ load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD;
+ if (load_above_capacity > scale_load(busiest->group_capacity))
+ load_above_capacity -=
+ scale_load(busiest->group_capacity);
else
load_above_capacity = ~0UL;
}
--
2.1.4
* [PATCH v2 5/6] sched/fair: Rename scale_load() and scale_load_down()
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
Rename scale_load() and scale_load_down() to user_to_kernel_load()
and kernel_to_user_load() respectively, so that the names convey
what they are really about.
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
[update calculate_imbalance]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/core.c | 8 ++++----
kernel/sched/fair.c | 14 ++++++++------
kernel/sched/sched.h | 16 ++++++++--------
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffe7b7e..1359871 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -818,12 +818,12 @@ static void set_load_weight(struct task_struct *p)
* SCHED_IDLE tasks get minimal weight:
*/
if (idle_policy(p->policy)) {
- load->weight = scale_load(WEIGHT_IDLEPRIO);
+ load->weight = user_to_kernel_load(WEIGHT_IDLEPRIO);
load->inv_weight = WMULT_IDLEPRIO;
return;
}
- load->weight = scale_load(prio_to_weight[prio]);
+ load->weight = user_to_kernel_load(prio_to_weight[prio]);
load->inv_weight = prio_to_wmult[prio];
}
@@ -8199,7 +8199,7 @@ static void cpu_cgroup_exit(struct cgroup_subsys_state *css,
static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
struct cftype *cftype, u64 shareval)
{
- return sched_group_set_shares(css_tg(css), scale_load(shareval));
+ return sched_group_set_shares(css_tg(css), user_to_kernel_load(shareval));
}
static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
@@ -8207,7 +8207,7 @@ static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
{
struct task_group *tg = css_tg(css);
- return (u64) scale_load_down(tg->shares);
+ return (u64) kernel_to_user_load(tg->shares);
}
#ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 76b9ee9..f29cc9c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -189,7 +189,7 @@ static void __update_inv_weight(struct load_weight *lw)
if (likely(lw->inv_weight))
return;
- w = scale_load_down(lw->weight);
+ w = kernel_to_user_load(lw->weight);
if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
lw->inv_weight = 1;
@@ -213,7 +213,7 @@ static void __update_inv_weight(struct load_weight *lw)
*/
static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
{
- u64 fact = scale_load_down(weight);
+ u64 fact = kernel_to_user_load(weight);
int shift = WMULT_SHIFT;
__update_inv_weight(lw);
@@ -6649,10 +6649,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
*/
if (busiest->group_type == group_overloaded &&
local->group_type == group_overloaded) {
+ unsigned long min_cpu_load =
+ kernel_to_user_load(NICE_0_LOAD) * busiest->group_capacity;
load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD;
- if (load_above_capacity > scale_load(busiest->group_capacity))
- load_above_capacity -=
- scale_load(busiest->group_capacity);
+ if (load_above_capacity > min_cpu_load)
+ load_above_capacity -= min_cpu_load;
else
load_above_capacity = ~0UL;
}
@@ -8205,7 +8206,8 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
if (!tg->se[0])
return -EINVAL;
- shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
+ shares = clamp(shares, user_to_kernel_load(MIN_SHARES),
+ user_to_kernel_load(MAX_SHARES));
mutex_lock(&shares_mutex);
if (tg->shares == shares)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ded3a6d..cd8ee78 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -54,22 +54,22 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
*/
#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
-# define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
-# define scale_load_down(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
+# define user_to_kernel_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
+# define kernel_to_user_load(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
#else
# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
-# define scale_load(w) (w)
-# define scale_load_down(w) (w)
+# define user_to_kernel_load(w) (w)
+# define kernel_to_user_load(w) (w)
#endif
/*
* Task weight (visible to user) and its load (invisible to user) have
* independent resolution, but they should be well calibrated. We use
- * scale_load() and scale_load_down(w) to convert between them. The
- * following must be true:
- *
- * scale_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * user_to_kernel_load() and kernel_to_user_load(w) to convert between
+ * them. The following must be true:
*
+ * user_to_kernel_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * kernel_to_user_load(NICE_0_LOAD) == prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
*/
#define NICE_0_LOAD (1L << NICE_0_LOAD_SHIFT)
--
2.1.4
* [PATCH v2 6/6] sched/fair: Remove unconditionally inactive code
From: Yuyang Du @ 2015-10-21 23:24 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti, Yuyang Du
The increased load resolution (fixed point arithmetic range) is
unconditionally deactivated with #if 0.
As the increased load range is still used in some places (e.g., at
Google), we want to keep this feature. We therefore define
CONFIG_CFS_INCREASE_LOAD_RANGE, which depends on 64BIT and BROKEN.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
init/Kconfig | 16 +++++++++++++++
kernel/sched/sched.h | 55 +++++++++++++++++++++-------------------------------
2 files changed, 38 insertions(+), 33 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index c24b6f7..4e1d075 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1103,6 +1103,22 @@ config CFS_BANDWIDTH
restriction.
See tip/Documentation/scheduler/sched-bwc.txt for more information.
+config CFS_INCREASE_LOAD_RANGE
+ bool "Increase kernel load range"
+ depends on 64BIT && BROKEN
+ default n
+ help
+ Increase resolution of nice-level calculations for 64-bit architectures.
+ The extra resolution improves shares distribution and load balancing of
+ low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
+ hierarchies, especially on larger systems. This is not a user-visible change
+ and does not change the user-interface for setting shares/weights.
+ We increase resolution only if we have enough bits to allow this increased
+ resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
+ when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
+ increased costs.
+ Currently broken: it increases power usage under light load.
+
config RT_GROUP_SCHED
bool "Group scheduling for SCHED_RR/FIFO"
depends on CGROUP_SCHED
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cd8ee78..7351eb9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -41,39 +41,6 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
#define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
/*
- * Increase resolution of nice-level calculations for 64-bit architectures.
- * The extra resolution improves shares distribution and load balancing of
- * low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
- * hierarchies, especially on larger systems. This is not a user-visible change
- * and does not change the user-interface for setting shares/weights.
- *
- * We increase resolution only if we have enough bits to allow this increased
- * resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
- * when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
- * increased costs.
- */
-#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
-# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
-# define user_to_kernel_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
-# define kernel_to_user_load(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)
-#else
-# define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
-# define user_to_kernel_load(w) (w)
-# define kernel_to_user_load(w) (w)
-#endif
-
-/*
- * Task weight (visible to user) and its load (invisible to user) have
- * independent resolution, but they should be well calibrated. We use
- * user_to_kernel_load() and kernel_to_user_load(w) to convert between
- * them. The following must be true:
- *
- * user_to_kernel_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
- * kernel_to_user_load(NICE_0_LOAD) == prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
- */
-#define NICE_0_LOAD (1L << NICE_0_LOAD_SHIFT)
-
-/*
* Single value that decides SCHED_DEADLINE internal math precision.
* 10 -> just above 1us
* 9 -> just above 0.5us
@@ -1160,6 +1127,28 @@ static const u32 prio_to_wmult[40] = {
/* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
};
+/*
+ * Task weight (visible to user) and its load (invisible to user) have
+ * independent ranges, but they should be well calibrated. We use
+ * user_to_kernel_load() and kernel_to_user_load(w) to convert between
+ * them.
+ *
+ * The following must also be true:
+ * user_to_kernel_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * kernel_to_user_load(NICE_0_LOAD) == prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
+ */
+#ifdef CONFIG_CFS_INCREASE_LOAD_RANGE
+#define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+#define user_to_kernel_load(w) (w << SCHED_FIXEDPOINT_SHIFT)
+#define kernel_to_user_load(w) (w >> SCHED_FIXEDPOINT_SHIFT)
+#else
+#define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
+#define user_to_kernel_load(w) (w)
+#define kernel_to_user_load(w) (w)
+#endif
+
+#define NICE_0_LOAD (1UL << NICE_0_LOAD_SHIFT)
+
#define ENQUEUE_WAKEUP 1
#define ENQUEUE_HEAD 2
#ifdef CONFIG_SMP
--
2.1.4
* Re: [PATCH v2 0/6] sched/fair: Clean up sched metric definitions
From: Yuyang Du @ 2015-11-23 0:25 UTC (permalink / raw)
To: mingo, peterz, linux-kernel
Cc: bsegall, pjt, morten.rasmussen, vincent.guittot, dietmar.eggemann,
lizefan, umgwanakikbuti
Hi Peter and Ingo,
A reminder of this patch series, in case you have forgotten.
Thanks,
Yuyang
On Thu, Oct 22, 2015 at 07:24:42AM +0800, Yuyang Du wrote:
> Hi Peter and Ingo,
>
> As discussed recently, the sched metrics need a little bit cleanup. This
> series of patches attempt to do that: refactor, rename, remove...
>
> Thanks a lot to Ben, Morten, Dietmar, Vincent, and others who provided
> valuable comments.
>
> v2 changes:
> - Rename SCHED_RESOLUTION_SHIFT to SCHED_FIXEDPOINT_SHIFT, thanks to Peter
> - Fix bugs in calculate_imbalance(), thanks to Vincent
> - Fix "#if 0" for increased kernel load
>
> Thanks,
> Yuyang
>
> Yuyang Du (6):
> sched/fair: Generalize the load/util averages resolution definition
> sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE
> sched/fair: Add introduction to the sched load avg metrics
> sched/fair: Remove scale_load_down() for load_avg
> sched/fair: Rename scale_load() and scale_load_down()
> sched/fair: Remove unconditionally inactive code
>
> include/linux/sched.h | 81 +++++++++++++++++++++++++++++++++++++++++++--------
> init/Kconfig | 16 ++++++++++
> kernel/sched/core.c | 8 ++---
> kernel/sched/fair.c | 33 ++++++++++-----------
> kernel/sched/sched.h | 52 +++++++++++++++------------------
> 5 files changed, 127 insertions(+), 63 deletions(-)
>
> --
> 2.1.4