From: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Ben Segall <bsegall@google.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Yuyang Du <yuyang.du@intel.com>
Subject: Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
Date: Mon, 6 Jun 2016 14:11:03 +0200
Message-ID: <CAKfTPtC_wdkem0YR0f_p__feG6Onu97=Li8cq1kzh9DH0cvBrw@mail.gmail.com>
In-Reply-To: <1464809962-25814-3-git-send-email-dietmar.eggemann@arm.com>

Hi Dietmar,

On 1 June 2016 at 21:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> Since task utilization is accrued only on the root cfs_rq, there are a
> couple of places where the se has to be synced with the root cfs_rq:
>
> (1) The root cfs_rq has to be updated in attach_entity_load_avg() for
> an se representing a task in a tg other than the root tg before
> the se utilization can be added to it.
>
> (2) The last_update_time value of the root cfs_rq can be higher
> than the one of the cfs_rq the se is enqueued in. Call
> __update_load_avg() on the se with the last_update_time value of
> the root cfs_rq before removing se's utilization from the root
> cfs_rq in [remove|detach]_entity_load_avg().
>
> In case the difference between the last_update_time value of the cfs_rq
> and the root cfs_rq is smaller than 1024ns, the additional calls to
> __update_load_avg() will bail early.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 212becd3708f..3ae8e79fb687 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>
> static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + struct cfs_rq* root_cfs_rq;
> +
> if (!sched_feat(ATTACH_AGE_LOAD))
> goto skip_aging;
>
> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> - rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> - rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> + root_cfs_rq = &rq_of(cfs_rq)->cfs;
> +
> + if (parent_entity(se))
> + __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
> + cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
> + scale_load_down(root_cfs_rq->load.weight),
> + upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
> +
> + root_cfs_rq->avg.util_avg += se->avg.util_avg;
> + root_cfs_rq->avg.util_sum += se->avg.util_sum;

The main issue with flat utilization is that we can't keep the
sched_avg of a sched_entity synced (from a last_update_time point of
view) with both the cfs_rq to which its load is attached and the root
cfs_rq to which its utilization is attached.

With this additional sync to the root cfs_rq in
attach/detach_entity_load_avg and in remove_entity_load_avg, the load
of a sched_entity is no longer synced to the time stamp of the cfs_rq
to which it is attached. This can generate several wrong updates of
the load of the latter.

As an example, let's take a task TA that sleeps and move it to TGB,
which has not run recently, so TGB.avg.last_update_time << root
cfs_rq.avg.last_update_time (a decay of 20ms removes 35% of the load).
When we attach TA to TGB, TA is synced with TGB for attaching it and
then decayed to be synced with the root cfs_rq.
If TA is then moved to another task group, we try to sync TA to TGB,
but TA is in the future, so TA.avg.last_update_time is set to TGB's.
Then TA's load is removed from TGB, but it has already been decayed,
so only a part of it is effectively subtracted. Then TA's load is
synced with the root cfs_rq, which means it is decayed one more time
for the same time slot, because TA.avg.last_update_time has been reset
to TGB.avg.last_update_time. As a result, we subtract less utilization
from the root cfs_rq than we should.
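
To put a number on the 20ms figure, here is a rough userspace sketch.
It assumes the usual PELT geometric series, where a contribution is
halved after 32 periods of about 1ms, and it treats 20ms as 20 whole
periods. PELT_Y, pelt_remaining() and pelt_removed() are names made up
for this illustration, not kernel symbols.

```c
#include <assert.h>

/* Assumed per-period decay factor y, chosen so that y^32 ~= 0.5. */
#define PELT_Y 0.97857206

/* Fraction of a contribution that survives `periods` idle periods. */
static double pelt_remaining(int periods)
{
	double f = 1.0;

	assert(periods >= 0);
	while (periods--)
		f *= PELT_Y;
	return f;
}

/* Fraction of a contribution removed by `periods` idle periods. */
static double pelt_removed(int periods)
{
	return 1.0 - pelt_remaining(periods);
}
```

With these assumptions pelt_removed(20) comes out at roughly 0.35,
which matches the 35% mentioned above.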

I think that similar behavior can apply to the removed load.
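
The TA/TGB sequence can be replayed with a toy userspace model. This is
only a sketch of the ordering described in this mail, not the kernel
code: time is counted in whole ~1ms periods, load and utilization use
the same assumed decay factor, and struct toy_avg, sync_to() and
simulate_move() are hypothetical names. sync_to() mimics the one
property that matters here: if the entity's stamp is ahead of "now",
the stamp is reset and nothing is decayed.

```c
#include <assert.h>

/* Assumed per-period decay factor y, chosen so that y^32 ~= 0.5. */
#define PELT_Y 0.97857206

struct toy_avg {
	double load;
	double util;
	long last_update;	/* in periods */
};

struct move_result {
	double tgb_load_left;	/* load left on TGB after TA has gone */
	double root_util_left;	/* util left on the root after TA has gone */
};

static double decay(double val, long periods)
{
	assert(periods >= 0);
	while (periods--)
		val *= PELT_Y;
	return val;
}

/* Age @sa up to @now; an entity "in the future" only gets its stamp reset. */
static void sync_to(struct toy_avg *sa, long now)
{
	long delta = now - sa->last_update;

	if (delta < 0) {
		sa->last_update = now;
		return;
	}
	sa->load = decay(sa->load, delta);
	sa->util = decay(sa->util, delta);
	sa->last_update = now;
}

/* Attach TA to TGB, then detach it again without any time passing. */
static struct move_result simulate_move(double load, double util,
					long tgb_time, long root_time)
{
	struct toy_avg ta = { load, util, tgb_time };
	struct move_result r = { 0.0, 0.0 };

	/* attach: TA is stamped with TGB's time, then aged to the root's */
	r.tgb_load_left += ta.load;
	sync_to(&ta, root_time);
	r.root_util_left += ta.util;

	/* detach: stamp reset to TGB's time, decayed load subtracted */
	sync_to(&ta, tgb_time);
	r.tgb_load_left -= ta.load;

	/* sync to the root again: the same interval is decayed twice */
	sync_to(&ta, root_time);
	r.root_util_left -= ta.util;

	return r;
}
```

Under these assumptions, simulate_move(100, 100, 0, 20) leaves about
35 of load on TGB and about 23 of utilization on the root cfs_rq,
although TA is gone from both. When both time stamps are equal,
nothing is left behind.
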
>
> cfs_rq_util_change(cfs_rq);
> }
> @@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> + __update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + &se->avg, se->on_rq * scale_load_down(se->load.weight),
> + cfs_rq->curr == se, NULL);
> +
> rq_of(cfs_rq)->cfs.avg.util_avg =
> max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
> rq_of(cfs_rq)->cfs.avg.util_sum =
> @@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
> if (!entity_is_task(se))
> return;
>
> + last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
> +
> + __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
> atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
> }
>
> --
> 1.9.1
>