From: pjt@google.com
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@elte.hu>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@in.ibm.com>,
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
Venki Pallipadi <venki@google.com>,
Ben Segall <bsegall@google.com>, Mike Galbraith <efault@gmx.de>,
Vincent Guittot <vincent.guittot@linaro.org>,
Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
Morten Rasmussen <Morten.Rasmussen@arm.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Namhyung Kim <namhyung@kernel.org>
Subject: [patch 09/16] sched: normalize tg load contributions against runnable time
Date: Thu, 23 Aug 2012 07:14:31 -0700
Message-ID: <20120823141506.930124292@google.com>
In-Reply-To: 20120823141422.444396696@google.com
From: Paul Turner <pjt@google.com>
Entities of equal weight should receive equitable distribution of cpu time.
This is challenging in the case of a task_group's shares as execution may be
occurring on multiple cpus simultaneously.
To handle this we divide up the shares into weights proportionate with the load
on each cfs_rq. This does not, however, account for the fact that the sum of
the parts may be less than one cpu and so we need to normalize:
load(tg) = min(runnable_avg(tg), 1) * tg->shares
Where runnable_avg is the aggregate time in which the task_group had runnable
children.
Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
---
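Note (not part of the changelog): the normalization above,
load(tg) = min(runnable_avg(tg), 1) * tg->shares, can be sketched in
userspace as follows. This is only an illustration under assumptions: the
SKETCH_* names and both functions are hypothetical, and the fixed-point
scale of 1 << 10 for "one cpu" is assumed to mirror NICE_0_SHIFT. The
share-weighted contribution is taken as an input rather than recomputed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one full cpu is represented as 1 << 10 in fixed point. */
#define SKETCH_SHIFT   10
#define SKETCH_ONE_CPU (1u << SKETCH_SHIFT)

/*
 * Hypothetical helper: scale a group entity's load contribution by the
 * group's aggregate runnable fraction, but only while the group as a
 * whole is using less than one cpu. At or above one cpu the contribution
 * is left untouched, which is the min(runnable_avg, 1) clamp.
 */
uint64_t sketch_normalize_contrib(uint64_t load_contrib, uint32_t runnable_avg)
{
	if (runnable_avg < SKETCH_ONE_CPU)
		load_contrib = (load_contrib * runnable_avg) >> SKETCH_SHIFT;
	return load_contrib;
}

/* Usage example: a group with a contribution of 1024. */
void sketch_normalize_example(void)
{
	/* Runnable half of the time: contributes half of its load. */
	assert(sketch_normalize_contrib(1024, 512) == 512);
	/* Runnable on the equivalent of 4 cpus: clamped, no scaling. */
	assert(sketch_normalize_contrib(1024, 4096) == 1024);
}
```

The point of the clamp is that a group which is runnable for only half of
one cpu should weigh the same as a single task of equal weight that is
runnable half of the time.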
kernel/sched/debug.c | 4 ++++
kernel/sched/fair.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 2 ++
3 files changed, 62 insertions(+), 0 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2908923..71b0ea3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -234,6 +234,10 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
atomic64_read(&cfs_rq->tg->load_avg));
SEQ_printf(m, " .%-30s: %lld\n", "tg_load_contrib",
cfs_rq->tg_load_contrib);
+ SEQ_printf(m, " .%-30s: %d\n", "tg_runnable_contrib",
+ cfs_rq->tg_runnable_contrib);
+ SEQ_printf(m, " .%-30s: %d\n", "tg->runnable_avg",
+ atomic_read(&cfs_rq->tg->runnable_avg));
#endif
print_cfs_group_stats(m, cpu, cfs_rq->tg);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 92ef5f1..47a7998 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1112,19 +1112,73 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
}
}
+/*
+ * Aggregate cfs_rq runnable averages into an equivalent task_group
+ * representation for computing load contributions.
+ */
+static inline void __update_tg_runnable_avg(struct sched_avg *sa,
+ struct cfs_rq *cfs_rq)
+{
+ struct task_group *tg = cfs_rq->tg;
+ long contrib;
+
+ /* The fraction of a cpu used by this cfs_rq */
+ contrib = div_u64(sa->runnable_avg_sum << NICE_0_SHIFT,
+ sa->runnable_avg_period + 1);
+ contrib -= cfs_rq->tg_runnable_contrib;
+
+ if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) {
+ atomic_add(contrib, &tg->runnable_avg);
+ cfs_rq->tg_runnable_contrib += contrib;
+ }
+}
+
static inline void __update_group_entity_contrib(struct sched_entity *se)
{
struct cfs_rq *cfs_rq = group_cfs_rq(se);
struct task_group *tg = cfs_rq->tg;
+ int runnable_avg;
+
u64 contrib;
contrib = cfs_rq->tg_load_contrib * tg->shares;
se->avg.load_avg_contrib = div64_u64(contrib,
atomic64_read(&tg->load_avg) + 1);
+
+ /*
+ * For group entities we need to compute a correction term in the case
+ * that they are consuming <1 cpu so that we would contribute the same
+ * load as a task of equal weight.
+ *
+ * Explicitly co-ordinating this measurement would be expensive, but
+ * fortunately the sum of each cpu's contribution forms a usable
+ * lower-bound on the true value.
+ *
+ * Consider the aggregate of 2 contributions. Either they are disjoint
+ * (and the sum represents true value) or they overlap and we are
+ * understating by the aggregate of their overlap.
+ *
+ * Extending this to N cpus, for a given overlap, the maximum amount we
+ * understate is then n_i(n_i+1)/2 * w_i where n_i is the number of
+ * cpus that overlap for this interval and w_i is the interval width.
+ *
+ * On a small machine, the first term is well-bounded which bounds the
+ * total error since w_i is a subset of the period. Whereas on a
+ * larger machine, while this first term can be larger, if w_i is of
+ * consequential size we are guaranteed to see n_i*w_i quickly converge
+ * to our upper bound of 1-cpu.
+ */
+ runnable_avg = atomic_read(&tg->runnable_avg);
+ if (runnable_avg < NICE_0_LOAD) {
+ se->avg.load_avg_contrib *= runnable_avg;
+ se->avg.load_avg_contrib >>= NICE_0_SHIFT;
+ }
}
#else
static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
int force_update) {}
+static inline void __update_tg_runnable_avg(struct sched_avg *sa,
+ struct cfs_rq *cfs_rq) {}
static inline void __update_group_entity_contrib(struct sched_entity *se) {}
#endif
@@ -1146,6 +1200,7 @@ static long __update_entity_load_avg_contrib(struct sched_entity *se)
if (entity_is_task(se)) {
__update_task_entity_contrib(se);
} else {
+ __update_tg_runnable_avg(&se->avg, group_cfs_rq(se));
__update_group_entity_contrib(se);
}
@@ -1214,6 +1269,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
{
__update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
+ __update_tg_runnable_avg(&rq->avg, &rq->cfs);
}
/* Add the load generated by se into cfs_rq's child load-average */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0c453e7..1474bf2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -113,6 +113,7 @@ struct task_group {
atomic_t load_weight;
atomic64_t load_avg;
+ atomic_t runnable_avg;
#endif
#ifdef CONFIG_RT_GROUP_SCHED
@@ -234,6 +235,7 @@ struct cfs_rq {
atomic64_t decay_counter, removed_load;
u64 last_decay;
#ifdef CONFIG_FAIR_GROUP_SCHED
+ u32 tg_runnable_contrib;
u64 tg_load_contrib;
#endif
#endif
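
Note: the aggregation done by __update_tg_runnable_avg() in the fair.c hunk
can be sketched outside the kernel as below. This is a minimal illustration,
not the kernel code: the sketch_* types and function are hypothetical
stand-ins, a plain long replaces the atomic_t, and the scale of 1 << 10 is
assumed to mirror NICE_0_SHIFT. It shows the per-runqueue fraction
sum / (period + 1) and the rule that a change is only published to the
group once it exceeds 1/64 of the last published contribution.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: one full cpu is represented as 1 << 10 in fixed point. */
#define SKETCH_SHIFT 10

struct sketch_group { long runnable_avg; };       /* stands in for tg->runnable_avg */
struct sketch_rq { uint32_t runnable_contrib; };  /* stands in for cfs_rq->tg_runnable_contrib */

/*
 * Fold one runqueue's runnable fraction into the group-wide total.
 * Returns 1 if the delta was published to the group, 0 if it was below
 * the 1/64 threshold and therefore suppressed.
 */
int sketch_update_runnable(struct sketch_group *tg, struct sketch_rq *rq,
			   uint32_t runnable_sum, uint32_t runnable_period)
{
	long contrib, delta;

	assert(tg != NULL && rq != NULL);

	/* Fraction of a cpu used by this runqueue, in fixed point. */
	contrib = (long)(((uint64_t)runnable_sum << SKETCH_SHIFT) /
			 ((uint64_t)runnable_period + 1));
	delta = contrib - (long)rq->runnable_contrib;

	/* Only touch the shared total when the change is significant. */
	if (labs(delta) > (long)(rq->runnable_contrib / 64)) {
		tg->runnable_avg += delta;
		rq->runnable_contrib = (uint32_t)((long)rq->runnable_contrib + delta);
		return 1;
	}
	return 0;
}
```

In the patch the shared total is an atomic_t updated with atomic_add(); the
threshold presumably exists to limit how often that shared counter is
written from each cpu.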