From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>,
peterz@infradead.org, mingo@kernel.org,
linux-kernel@vger.kernel.org
Cc: yuyang.du@intel.com, Morten.Rasmussen@arm.com, pjt@google.com,
bsegall@google.com, juri.lelli@arm.com
Subject: Re: [RFC PATCH] sched: reflect sched_entity movement into task_group's utilization
Date: Wed, 11 May 2016 15:45:36 +0100 [thread overview]
Message-ID: <57334590.8040405@arm.com> (raw)
In-Reply-To: <1462346220-18823-1-git-send-email-vincent.guittot@linaro.org>
Hi Vincent,
On 04/05/16 08:17, Vincent Guittot wrote:
> Ensure that changes of the utilization of a sched_entity will be
> reflected in the task_group hierarchy down to the root cfs.
>
> This patch tries another way than the flat utilization hierarchy proposal to
> ensure that the changes will be propagated down to the root cfs.
IMHO, the biggest advantage over the flat utilization hierarchy approach
is that you don't have to sync the sched_avg::last_update_time values
between se's and cfs_rq's.
> The way to compute the sched average metrics stays the same so the utilization
> only need to be synced with the local cfs rq timestamp.
>
> We keep an estimate of the utilization of each task group which is not used
> for now but might be usefull in the future even if i don't have idea so far.
A simple test case (a 50% task (run/period 8/16ms) switching between 2
cpus every 160ms (due to restricted cpu affinity to one of these cpus)
and running in tg_root/tg_1) shows the aimed behaviour at the root
cfs_rq (immediate utilization drop from ~550 to 0 on the src cpu and
immediate utilization rise from 0 to ~550 on the dst cpu in case of a
migration from src to dst cpu.
But in case I run the same task in tg_root/tg_1/tg_11 , then the
utilization of the root cfs_rq looks like the one of a system w/o the
patch. The problem seems to be that cfs_rq->diff_util_avg (owned by
tg_root/tg_1 on cpuX) is not 0 (instead it has ~ -470) in case the task
gets enqueued on cpuX. So the utilization signal of root cfs_rq ramps up
slowly and doesn't drop in case the task migrates to the other cpu.
[...]
> +/*
> + * Save how much utilization has just been added/removed on cfs rq so we can
> + * propagate it across the whole tg tree
> + */
> +static void set_tg_cfs_util(struct cfs_rq *cfs_rq, int delta)
Maybe s/cfs_cfs_rq ?
> +{
> + if (!cfs_rq->tg)
I guess here you want to bail if cfs_rq is the root cfs_rq. The current
condition will always be true if CONFIG_FAIR_GROUP_SCHED is set. For the
root cfs_rq cfs->tg is equal to &root_task_group.
Since you don't need diff_util_avg on the root cfs_rq, you could use
cfs_rq->tg == &root_task_group or &rq->cfs == cfs_rq or
!cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))]
It doesn't harm functionality though, it's just the fact that you update
cfs_rq->diff_util_avg for the root cfs_rq needlessly.
> + return;
> +
> + cfs_rq->diff_util_avg += delta;
> +}
> +
> +/*
> + * Look at pending utilization change in the cfs rq and reflect it in the
> + * sched_entity that represents the cfs rq at parent level
Not sure if I understand the 'parent level' here correctly. For me, this
se, the cfs_rq it 'owns' (se->my_q or group_cfs_rq(se)) and the tg
(cfs_rq->tg) are at the same level.
se->parent, se->cfs_rq (or cfs_rq_of(se)), cfs_rq->tg->parent would be
the parent level.
So for me update_tg_se_util() operates in one level and the update of
the parent level would happen in the caller functon, e.g.
attach_entity_load_avg().
> + */
> +static int update_tg_se_util(struct sched_entity *se)
> +{
> + struct cfs_rq *cfs_rq;
> + long update_util_avg;
> + long old_util_avg;
> +
> + if (entity_is_task(se))
> + return 0;
> +
> + cfs_rq = se->my_q;
Minor thing, Couldn't you use group_cfs_rq(se) here?
> +
> + update_util_avg = cfs_rq->diff_util_avg;
> +
> + if (!update_util_avg)
> + return 0;
> +
> + cfs_rq->diff_util_avg = 0;
> +
> + old_util_avg = se->avg.util_avg;
> + se->avg.util_avg = max_t(long, se->avg.util_avg + update_util_avg, 0);
> + se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> +
> + return se->avg.util_avg - old_util_avg;
> +}
> +
> +
> +/* Take into account the change of the utilization of a child task group */
> +static void update_tg_cfs_util(struct sched_entity *se)
> +{
> + int delta;
> + struct cfs_rq *cfs_rq;
> +
> + if (!se)
> + return;
This condition is only necessary for the call from
update_blocked_averages() for a root cfs_rq, I guess?
[...]
> @@ -6260,6 +6348,8 @@ static void update_blocked_averages(int cpu)
>
> if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
> update_tg_load_avg(cfs_rq, 0);
> + /* Propagate pending util changes to the parent */
> + update_tg_cfs_util(cfs_rq->tg->se[cpu_of(rq)]);
Couldn't cpu_of(rq)] not just be cpu?
[...]
next prev parent reply other threads:[~2016-05-11 14:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-04 7:17 [RFC PATCH] sched: reflect sched_entity movement into task_group's utilization Vincent Guittot
2016-05-11 14:45 ` Dietmar Eggemann [this message]
2016-05-11 16:52 ` Vincent Guittot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57334590.8040405@arm.com \
--to=dietmar.eggemann@arm.com \
--cc=Morten.Rasmussen@arm.com \
--cc=bsegall@google.com \
--cc=juri.lelli@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=vincent.guittot@linaro.org \
--cc=yuyang.du@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).