[patch 09/16] sched: normalize tg load contributions against runnable time

All of lore.kernel.org
 help / color / mirror / Atom feed

From: pjt@google.com
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
	Venki Pallipadi <venki@google.com>,
	Ben Segall <bsegall@google.com>, Mike Galbraith <efault@gmx.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Namhyung Kim <namhyung@kernel.org>
Subject: [patch 09/16] sched: normalize tg load contributions against runnable time
Date: Thu, 23 Aug 2012 07:14:31 -0700	[thread overview]
Message-ID: <20120823141506.930124292@google.com> (raw)
In-Reply-To: 20120823141422.444396696@google.com

[-- Attachment #1: sched-normalize_runnable_shares.patch --]
[-- Type: text/plain, Size: 5440 bytes --]

From: Paul Turner <pjt@google.com>

Entities of equal weight should receive equitable distribution of cpu time.
This is challenging in the case of a task_group's shares as execution may be
occurring on multiple cpus simultaneously.

To handle this we divide up the shares into weights proportionate with the load
on each cfs_rq.  This does not however, account for the fact that the sum of
the parts may be less than one cpu and so we need to normalize:
  load(tg) = min(runnable_avg(tg), 1) * tg->shares
Where runnable_avg is the aggregate time in which the task_group had runnable
children.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>.
---
 kernel/sched/debug.c |    4 ++++
 kernel/sched/fair.c  |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    2 ++
 3 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2908923..71b0ea3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -234,6 +234,10 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			atomic64_read(&cfs_rq->tg->load_avg));
 	SEQ_printf(m, "  .%-30s: %lld\n", "tg_load_contrib",
 			cfs_rq->tg_load_contrib);
+	SEQ_printf(m, "  .%-30s: %d\n", "tg_runnable_contrib",
+			cfs_rq->tg_runnable_contrib);
+	SEQ_printf(m, "  .%-30s: %d\n", "tg->runnable_avg",
+			atomic_read(&cfs_rq->tg->runnable_avg));
 #endif
 
 	print_cfs_group_stats(m, cpu, cfs_rq->tg);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 92ef5f1..47a7998 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1112,19 +1112,73 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	}
 }
 
+/*
+ * Aggregate cfs_rq runnable averages into an equivalent task_group
+ * representation for computing load contributions.
+ */
+static inline void __update_tg_runnable_avg(struct sched_avg *sa,
+						  struct cfs_rq *cfs_rq)
+{
+	struct task_group *tg = cfs_rq->tg;
+	long contrib;
+
+	/* The fraction of a cpu used by this cfs_rq */
+	contrib = div_u64(sa->runnable_avg_sum << NICE_0_SHIFT,
+			  sa->runnable_avg_period + 1);
+	contrib -= cfs_rq->tg_runnable_contrib;
+
+	if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) {
+		atomic_add(contrib, &tg->runnable_avg);
+		cfs_rq->tg_runnable_contrib += contrib;
+	}
+}
+
 static inline void __update_group_entity_contrib(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = group_cfs_rq(se);
 	struct task_group *tg = cfs_rq->tg;
+	int runnable_avg;
+
 	u64 contrib;
 
 	contrib = cfs_rq->tg_load_contrib * tg->shares;
 	se->avg.load_avg_contrib = div64_u64(contrib,
 					     atomic64_read(&tg->load_avg) + 1);
+
+	/*
+	 * For group entities we need to compute a correction term in the case
+	 * that they are consuming <1 cpu so that we would contribute the same
+	 * load as a task of equal weight.
+	 *
+	 * Explicitly co-ordinating this measurement would be expensive, but
+	 * fortunately the sum of each cpus contribution forms a usable
+	 * lower-bound on the true value.
+	 *
+	 * Consider the aggregate of 2 contributions.  Either they are disjoint
+	 * (and the sum represents true value) or they are disjoint and we are
+	 * understating by the aggregate of their overlap.
+	 *
+	 * Extending this to N cpus, for a given overlap, the maximum amount we
+	 * understand is then n_i(n_i+1)/2 * w_i where n_i is the number of
+	 * cpus that overlap for this interval and w_i is the interval width.
+	 *
+	 * On a small machine; the first term is well-bounded which bounds the
+	 * total error since w_i is a subset of the period.  Whereas on a
+	 * larger machine, while this first term can be larger, if w_i is the
+	 * of consequential size guaranteed to see n_i*w_i quickly converge to
+	 * our upper bound of 1-cpu.
+	 */
+	runnable_avg = atomic_read(&tg->runnable_avg);
+	if (runnable_avg < NICE_0_LOAD) {
+		se->avg.load_avg_contrib *= runnable_avg;
+		se->avg.load_avg_contrib >>= NICE_0_SHIFT;
+	}
 }
 #else
 static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update) {}
+static inline void __update_tg_runnable_avg(struct sched_avg *sa,
+						  struct cfs_rq *cfs_rq) {}
 static inline void __update_group_entity_contrib(struct sched_entity *se) {}
 #endif
 
@@ -1146,6 +1200,7 @@ static long __update_entity_load_avg_contrib(struct sched_entity *se)
 	if (entity_is_task(se)) {
 		__update_task_entity_contrib(se);
 	} else {
+		__update_tg_runnable_avg(&se->avg, group_cfs_rq(se));
 		__update_group_entity_contrib(se);
 	}
 
@@ -1214,6 +1269,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 {
 	__update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
+	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
 }
 
 /* Add the load generated by se into cfs_rq's child load-average */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0c453e7..1474bf2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -113,6 +113,7 @@ struct task_group {
 
 	atomic_t load_weight;
 	atomic64_t load_avg;
+	atomic_t runnable_avg;
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -234,6 +235,7 @@ struct cfs_rq {
 	atomic64_t decay_counter, removed_load;
 	u64 last_decay;
 #ifdef CONFIG_FAIR_GROUP_SCHED
+	u32 tg_runnable_contrib;
 	u64 tg_load_contrib;
 #endif
 #endif

next prev parent reply	other threads:[~2012-08-23 14:15 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-23 14:14 [patch 00/16] sched: per-entity load-tracking pjt
2012-08-23 14:14 ` [patch 01/16] sched: track the runnable average on a per-task entitiy basis pjt
2012-08-24  8:20   ` Namhyung Kim
2012-08-28 22:12     ` Paul Turner
2012-10-24  9:43   ` [tip:sched/core] sched: Track the runnable average on a per-task entity basis tip-bot for Paul Turner
2012-10-25  3:28     ` li guang
2012-10-25 16:58       ` Benjamin Segall
2012-08-23 14:14 ` [patch 02/16] sched: maintain per-rq runnable averages pjt
2012-10-24  9:44   ` [tip:sched/core] sched: Maintain " tip-bot for Ben Segall
2012-10-28 10:12   ` [patch 02/16] sched: maintain " Preeti Murthy
2012-10-29 17:38     ` Benjamin Segall
2012-11-07  8:28       ` Preeti U Murthy
2012-08-23 14:14 ` [patch 03/16] sched: aggregate load contributed by task entities on parenting cfs_rq pjt
2012-10-24  9:45   ` [tip:sched/core] sched: Aggregate " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 04/16] sched: maintain the load contribution of blocked entities pjt
2012-10-24  9:46   ` [tip:sched/core] sched: Maintain " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 05/16] sched: add an rq migration call-back to sched_class pjt
2012-10-24  9:47   ` [tip:sched/core] sched: Add " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 06/16] sched: account for blocked load waking back up pjt
     [not found]   ` <CAM4v1pO8SPCmqJTTBHpqwrwuO7noPdskg0RSooxyPsWoE395_A@mail.gmail.com>
2012-09-04 17:29     ` Benjamin Segall
2012-10-24  9:48   ` [tip:sched/core] sched: Account " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 07/16] sched: aggregate total task_group load pjt
2012-10-24  9:49   ` [tip:sched/core] sched: Aggregate " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 08/16] sched: compute load contribution by a group entity pjt
2012-10-24  9:50   ` [tip:sched/core] sched: Compute " tip-bot for Paul Turner
2012-08-23 14:14 ` pjt [this message]
2012-10-24  9:51   ` [tip:sched/core] sched: Normalize tg load contributions against runnable time tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 10/16] sched: maintain runnable averages across throttled periods pjt
2012-10-24  9:52   ` [tip:sched/core] sched: Maintain " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 11/16] sched: replace update_shares weight distribution with per-entity computation pjt
2012-09-24 19:44   ` "Jan H. Schönherr"
2012-09-24 20:39     ` Benjamin Segall
2012-10-02 21:14       ` Paul Turner
2012-10-24  9:53   ` [tip:sched/core] sched: Replace " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 12/16] sched: refactor update_shares_cpu() -> update_blocked_avgs() pjt
2012-10-24  9:54   ` [tip:sched/core] sched: Refactor " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 13/16] sched: update_cfs_shares at period edge pjt
2012-09-24 19:51   ` "Jan H. Schönherr"
2012-10-02 21:09     ` Paul Turner
2012-10-24  9:55   ` [tip:sched/core] sched: Update_cfs_shares " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 14/16] sched: make __update_entity_runnable_avg() fast pjt
2012-08-24  8:28   ` Namhyung Kim
2012-08-28 22:18     ` Paul Turner
2012-10-24  9:56   ` [tip:sched/core] sched: Make " tip-bot for Paul Turner
2012-08-23 14:14 ` [patch 15/16] sched: implement usage tracking pjt
2012-10-19 12:18   ` Vincent Guittot
2012-08-23 14:14 ` [patch 16/16] sched: introduce temporary FAIR_GROUP_SCHED dependency for load-tracking pjt
2012-10-24  9:57   ` [tip:sched/core] sched: Introduce " tip-bot for Paul Turner
2012-09-24  9:30 ` [patch 00/16] sched: per-entity load-tracking "Jan H. Schönherr"
2012-09-24 17:16   ` Benjamin Segall
2012-10-05  9:07     ` Paul Turner
2012-11-26 13:08 ` Jassi Brar
2012-12-20  7:39   ` Stephen Boyd
2012-12-20  8:08     ` Jassi Brar
  -- strict thread matches above, loose matches on Subject: below --
2012-06-28  2:24 [PATCH 00/16] Series short description Paul Turner
2012-06-28  2:24 ` [PATCH 09/16] sched: normalize tg load contributions against runnable time Paul Turner
2012-06-29  7:26   ` Namhyung Kim
2012-07-04 19:48   ` Peter Zijlstra
2012-07-06 11:52     ` Peter Zijlstra
2012-07-12  1:08       ` Andre Noll
2012-07-12  0:02     ` Paul Turner
2012-07-06 12:23   ` Peter Zijlstra

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:2908923 dfblob:71b0ea3 dfblob:92ef5f1 dfblob:47a7998
dfblob:0c453e7 dfblob:1474bf2 )
 OR (
bs:"[patch 09/16] sched: normalize tg load contributions against runnable time" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120823141506.930124292@google.com \
    --to=pjt@google.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=bsegall@google.com \
    --cc=efault@gmx.de \
    --cc=kamalesh@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=namhyung@kernel.org \
    --cc=nikunj@linux.vnet.ibm.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=vatsa@in.ibm.com \
    --cc=venki@google.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.