linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Ben Segall <bsegall@google.com>, Yuyang Du <yuyang.du@intel.com>
Subject: Re: [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation
Date: Fri, 13 Oct 2017 17:22:54 +0200	[thread overview]
Message-ID: <20171013152254.GA7393@linaro.org> (raw)
In-Reply-To: <CAKfTPtBh7rLNvLLkhaa1qc6UFomUOgKVY6Rv-ZLVm7i4CENy7A@mail.gmail.com>

Hi Peter,

Le Tuesday 10 Oct 2017 à 09:44:53 (+0200), Vincent Guittot a écrit :
> On 10 October 2017 at 09:29, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Oct 09, 2017 at 05:29:04PM +0200, Vincent Guittot wrote:
> >> On 9 October 2017 at 17:03, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >> > On 1 September 2017 at 15:21, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> >> +/*
> >> >> + * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> >> >> + * propagate its contribution. The key to this propagation is the invariant
> >> >> + * that for each group:
> >> >> + *
> >> >> + *   ge->avg == grq->avg                                               (1)
> >> >> + *
> >> >> + * _IFF_ we look at the pure running and runnable sums. Because they
> >> >> + * represent the very same entity, just at different points in the hierarchy.
> >> >
> >> > I agree for the running part because only one entity can be running
> >> > but i'm not sure for the pure runnable sum because we can have
> >> > several runnable task in a cfs_rq but only one runnable group entity
> >> > to reflect them or I misunderstand (1)
> >
> > The idea is that they (ge and grq) are the _same_ entity, just at
> > different levels in the hierarchy. If the grq is runnable, it is through
> > the ge.
> >
> > As a whole, they don't care how many runnable tasks there are.
> 
> Ok so I agree with this point that both grp and ge runnable follow the
> same trend
> 
> my point is that even if same changes apply on both grp and ge, we
> can't directly apply changes of grp's runnable on ge's runnable
> 
> >
> >> > As an example, we have 2 always running task TA and TB so their
> >> > load_sum is LOAD_AVG_MAX for each task The grq->avg.load_sum = \Sum
> >> > se->avg.load_sum = 2*LOAD_AVG_MAX But the ge->avg.load_sum will be
> >> > only LOAD_AVG_MAX
> >> >
> >> > So If we apply directly the d(TB->avg.load_sum) on the group hierachy
> >> > and on ge->avg.load_sum in particular, the latter decreases to 0
> >> > whereas it should decrease only by half
> >> >
> >> > I have been able to see this wrong behavior with a rt-app json file
> >> >
> >> > so I think that we should instead remove only
> >> >
> >> > delta = se->avg.load_sum / grq->avg.load_sum * ge->avg.load_sum
> >>
> >> delta = se->avg.load_sum / (grq->avg.load_sum+se->avg.load_sum) *
> >> ge->avg.load_sum
> >>
> >> as the se has already been detached
> >>
> >> > We don't have grq->avg.load_sum but we can have a rough estimate with
> >> > grq->avg.load_avg/grq->weight
> >
> > Hurm, I think I see what you're saying, let me ponder this more.
> 
> The formula above works was an example for detaching but it doesn't
> apply for all use case. we need a more generic way to propagate
> runnable changes

I have studied a bit more how to improve the propagation formula and the
changes below is doing the job for the UCs that I have tested.

Unlike running, we can't directly propagate the runnable through hierarchy
when we migrate a task. Instead we must ensure that we will not
over/underestimate the impact of the migration thanks to several rules:
-ge->avg.runnable_sum can't be higher than LOAD_AVG_MAX
-ge->avg.runnable_sum can't be lower than ge->avg.running_sum (once scaled to
the same range)
-we can't directly propagate a negative delta of runnable_sum because part of
this runnable time can be "shared" with others sched_entities and stays on the
gcfs_rq. Instead, we can't estimate the new runnable_sum of the gcfs_rq with
the formula: gcfs_rq's runnable sum = gcfs_rq's load_sum / gcfs_rq's weight.
-ge->avg.runnable_sum can't increase when we detach a task.

---
 kernel/sched/fair.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 350dbec0..a063b048 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3489,33 +3489,67 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 static inline void
 update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long runnable_sum = gcfs_rq->prop_runnable_sum;
-	long runnable_load_avg, load_avg;
-	s64 runnable_load_sum, load_sum;
+	long running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+	long runnable_load_avg, delta_avg, load_avg;
+	s64 runnable_load_sum, delta_sum, load_sum = 0;
 
 	if (!runnable_sum)
 		return;
 
 	gcfs_rq->prop_runnable_sum = 0;
 
+	/*
+	 * Get a rough estimate of gcfs_rq's runnable
+	 * This is a low guess as it assumes that tasks are equally
+	 * runnable which is not true but we can't really do better
+	 */
+	if (scale_load_down(gcfs_rq->load.weight))
+		load_sum = div_s64(gcfs_rq->avg.load_sum,
+				scale_load_down(gcfs_rq->load.weight));
+
+	/*
+	 * Propating a delta of runnable is not just adding it to ge's
+	 * runnable_sum:
+	 * - Adding a delta runnable can't make ge's runnable_sum higher than
+	 *   LOAD_AVG_MAX
+	 * - We can't directly remove a delta of runnable from
+	 *   ge's runnable_sum but we can only guest estimate what runnable
+	 *   will become thanks to few simple rules:
+	 *   - gcfs_rq's runnable is a good estimate
+	 *   - ge's runnable_sum can't increase when we remove runnable
+	 *   - runnable_sum can't be lower than running_sum
+	 */
+	if (runnable_sum >= 0) {
+		runnable_sum += se->avg.load_sum;
+		runnable_sum = min(runnable_sum, LOAD_AVG_MAX);
+	} else
+		runnable_sum = min(se->avg.load_sum, load_sum);
+
+	running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
+	runnable_sum = max(runnable_sum, running_sum);
+
 	load_sum = (s64)se_weight(se) * runnable_sum;
 	load_avg = div_s64(load_sum, LOAD_AVG_MAX);
 
-	add_positive(&se->avg.load_sum, runnable_sum);
-	add_positive(&se->avg.load_avg, load_avg);
+	delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
+	delta_avg = load_avg - se->avg.load_avg;
 
-	add_positive(&cfs_rq->avg.load_avg, load_avg);
-	add_positive(&cfs_rq->avg.load_sum, load_sum);
+	se->avg.load_sum = runnable_sum;
+	se->avg.load_avg = load_avg;
+	add_positive(&cfs_rq->avg.load_avg, delta_avg);
+	add_positive(&cfs_rq->avg.load_sum, delta_sum);
 
 	runnable_load_sum = (s64)se_runnable(se) * runnable_sum;
 	runnable_load_avg = div_s64(runnable_load_sum, LOAD_AVG_MAX);
+	delta_sum = runnable_load_sum - se_weight(se) * se->avg.runnable_load_sum;
+	delta_avg = runnable_load_avg - se->avg.runnable_load_avg;
 
-	add_positive(&se->avg.runnable_load_sum, runnable_sum);
-	add_positive(&se->avg.runnable_load_avg, runnable_load_avg);
+	se->avg.runnable_load_sum = runnable_sum;
+	se->avg.runnable_load_avg = runnable_load_avg;
 
 	if (se->on_rq) {
-		add_positive(&cfs_rq->avg.runnable_load_avg, runnable_load_avg);
-		add_positive(&cfs_rq->avg.runnable_load_sum, runnable_load_sum);
+		add_positive(&cfs_rq->avg.runnable_load_avg, delta_avg);
+		add_positive(&cfs_rq->avg.runnable_load_sum, delta_sum);
 	}
 }
 
-- 
2.7.4

  reply	other threads:[~2017-10-13 15:23 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-01 13:20 [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 01/18] sched/fair: Clean up calc_cfs_shares() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 02/18] sched/fair: Add comment to calc_cfs_shares() Peter Zijlstra
2017-09-28 10:03   ` Morten Rasmussen
2017-09-29 11:35     ` Peter Zijlstra
2017-09-29 13:03       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity() Peter Zijlstra
2017-09-29  9:04   ` Morten Rasmussen
2017-09-29 11:38     ` Peter Zijlstra
2017-09-29 13:00       ` Morten Rasmussen
2017-09-01 13:21 ` [PATCH -v2 04/18] sched/fair: Remove se->load.weight from se->avg.load_sum Peter Zijlstra
2017-09-29 15:26   ` Morten Rasmussen
2017-09-29 16:39     ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 05/18] sched/fair: Change update_load_avg() arguments Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 06/18] sched/fair: Move enqueue migrate handling Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 07/18] sched/fair: Rename {en,de}queue_entity_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 08/18] sched/fair: Introduce {en,de}queue_load_avg() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 09/18] sched/fair: More accurate reweight_entity() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 10/18] sched/fair: Use reweight_entity() for set_user_nice() Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 11/18] sched/fair: Rewrite cfs_rq->removed_*avg Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 12/18] sched/fair: Rewrite PELT migration propagation Peter Zijlstra
2017-10-09  8:08   ` Morten Rasmussen
2017-10-09  9:45     ` Peter Zijlstra
2017-10-18 12:45       ` Morten Rasmussen
2017-10-30 13:35         ` Peter Zijlstra
2017-10-09 15:03   ` Vincent Guittot
2017-10-09 15:29     ` Vincent Guittot
2017-10-10  7:29       ` Peter Zijlstra
2017-10-10  7:44         ` Vincent Guittot
2017-10-13 15:22           ` Vincent Guittot [this message]
2017-10-13 20:41             ` Peter Zijlstra
2017-10-15 12:01               ` Vincent Guittot
2017-10-16 13:55               ` Vincent Guittot
2017-10-19 15:04                 ` Vincent Guittot
2017-10-30 17:20                   ` Peter Zijlstra
2017-10-31 11:14                     ` Vincent Guittot
2017-10-31 15:01                       ` Peter Zijlstra
2017-10-31 16:38                         ` Vincent Guittot
2017-11-16 14:09                         ` [PATCH v3] sched: Update runnable propagation rule Vincent Guittot
2017-11-16 14:21                           ` [PATCH v4] " Vincent Guittot
2017-12-06 11:40                             ` Peter Zijlstra
2017-12-06 17:10                               ` Ingo Molnar
2017-12-06 20:29                             ` [tip:sched/core] sched/fair: Update and fix the " tip-bot for Vincent Guittot
2017-09-01 13:21 ` [PATCH -v2 13/18] sched/fair: Propagate an effective runnable_load_avg Peter Zijlstra
2017-10-02 17:46   ` Dietmar Eggemann
2017-10-03  8:50     ` Peter Zijlstra
2017-10-03  9:29       ` Dietmar Eggemann
2017-10-03 12:26   ` Dietmar Eggemann
2017-09-01 13:21 ` [PATCH -v2 14/18] sched/fair: Synchonous PELT detach on load-balance migrate Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq and its se Peter Zijlstra
2017-10-04 19:27   ` Dietmar Eggemann
2017-10-06 13:02     ` Peter Zijlstra
2017-10-09 12:15       ` Dietmar Eggemann
2017-10-09 12:19         ` Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 16/18] sched/fair: More accurate async detach Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 17/18] sched/fair: Calculate runnable_weight slightly differently Peter Zijlstra
2017-09-01 13:21 ` [PATCH -v2 18/18] sched/fair: Update calc_group_*() comments Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171013152254.GA7393@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=bsegall@google.com \
    --cc=clm@fb.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).