From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Paul Turner <pjt@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@android.com>,
Joel Fernandes <joelaf@google.com>,
Steve Muckle <smuckle@google.com>
Subject: Re: [PATCH v3 1/3] sched/fair: add util_est on top of PELT
Date: Tue, 30 Jan 2018 12:46:33 +0000 [thread overview]
Message-ID: <20180130124632.GC5739@e110439-lin> (raw)
In-Reply-To: <20180129163642.GF2228@hirez.programming.kicks-ass.net>
On 29-Jan 17:36, Peter Zijlstra wrote:
> On Tue, Jan 23, 2018 at 06:08:45PM +0000, Patrick Bellasi wrote:
> > +static inline void util_est_dequeue(struct task_struct *p, int flags)
> > +{
> > + struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
> > + unsigned long util_last = task_util(p);
> > + bool sleep = flags & DEQUEUE_SLEEP;
> > + unsigned long ewma;
> > + long util_est = 0;
> > +
> > + if (!sched_feat(UTIL_EST))
> > + return;
> > +
> > + /*
> > + * Update root cfs_rq's estimated utilization
> > + *
> > + * If *p is the last task then the root cfs_rq's estimated utilization
> > + * of a CPU is 0 by definition.
> > + */
> > + if (cfs_rq->nr_running) {
> > + util_est = READ_ONCE(cfs_rq->util_est_runnable);
> > + util_est -= min_t(long, util_est, task_util_est(p));
> > + }
> > + WRITE_ONCE(cfs_rq->util_est_runnable, util_est);
> > +
> > + /*
> > + * Skip update of task's estimated utilization when the task has not
> > + * yet completed an activation, e.g. being migrated.
> > + */
> > + if (!sleep)
> > + return;
> > +
>
> Since you only use sleep once, you might as well write it out there.
Right, will move the flag check right here.
> Also, does GCC lower the task_util() eval to here?
Good point, kind-of... on ARM64 it generates a register load just before
the above if condition. I guess it does that to speculatively trigger a
load from memory in case the above check should pass?
Anyway, looks more it can be also a micro-arch optimization.
Thus, even just for better readability of the following chunk, it's
better to move the util_last definition here.
> > + /*
> > + * Skip update of task's estimated utilization when its EWMA is already
> > + * ~1% close to its last activation value.
> > + */
> > + util_est = p->util_est.ewma;
> > + if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> > + return;
>
> Aside from that being whitespace challenged, did you also try:
>
> if ((unsigned)((util_est - util_last) + LIM - 1) < (2 * LIM - 1))
No, since the above code IMO is so much "easy to parse for humans" :)
But, mainly because since the cache alignment update, also while testing on a
"big" Intel machine I cannot see regressions on hackbench.
This is the code I get on my Xeon E5-2690 v2:
if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
6ba0: 8b 86 7c 02 00 00 mov 0x27c(%rsi),%eax
6ba6: 48 29 c8 sub %rcx,%rax
6ba9: 48 99 cqto
6bab: 48 31 d0 xor %rdx,%rax
6bae: 48 29 d0 sub %rdx,%rax
6bb1: 48 83 f8 0a cmp $0xa,%rax
6bb5: 7e 1d jle 6bd4 <dequeue_task_fair+0x7e4>
Does it look so bad?
> Also, since we only care about the absolute value; we could use:
>
> util_last - ewma
>
> here (note the above also forgets to use READ_ONCE), and reuse the result:
>
> > +
> > + /*
> > + * Update Task's estimated utilization
> > + *
> > + * When *p completes an activation we can consolidate another sample
> > + * about the task size. This is done by storing the last PELT value
> > + * for this task and using this value to load another sample in the
> > + * exponential weighted moving average:
> > + *
> > + * ewma(t) = w * task_util(p) + (1 - w) ewma(t-1)
> > + * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
> > + * = w * (task_util(p) + ewma(t-1) / w - ewma(t-1))
> > + *
> > + * Where 'w' is the weight of new samples, which is configured to be
> > + * 0.25, thus making w=1/4
> > + */
> > + p->se.avg.util_est.last = util_last;
> > + ewma = READ_ONCE(p->se.avg.util_est.ewma);
> > + ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
>
> here.
Right! +1
>
> > + ewma >>= UTIL_EST_WEIGHT_SHIFT;
> > + WRITE_ONCE(p->se.avg.util_est.ewma, ewma);
> > +}
>
> So something along these lines:
>
> ewma = READ_ONCE(p->se.avg.util_est.ewma);
> diff = util_last - ewma;
> if ((unsigned)(diff + LIM - 1) < (2 * LIM - 1))
> return;
>
> p->se.avg.util_est.last = util_last;
> ewma = (diff + (ewma << EWMA_SHIFT)) >> EWMA_SHIFT;
> WRITE_ONCE(p->se.avg.util_est.ewma, ewma);
>
> Make sense?
Looks ok to me, I will for sure update to reuse the difference.
Regarding the comparison, I'll try your formula to check again if there is any
noticeable difference on hackbench.
Thanks Patrick
--
#include <best/regards.h>
Patrick Bellasi
next prev parent reply other threads:[~2018-01-30 12:46 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-23 18:08 [PATCH v3 0/3] Utilization estimation (util_est) for FAIR tasks Patrick Bellasi
2018-01-23 18:08 ` [PATCH v3 1/3] sched/fair: add util_est on top of PELT Patrick Bellasi
2018-01-24 16:40 ` Joel Fernandes
2018-01-24 19:16 ` Patrick Bellasi
2018-01-24 22:06 ` Joel Fernandes
2018-01-29 16:36 ` Peter Zijlstra
2018-01-30 12:46 ` Patrick Bellasi [this message]
2018-01-30 13:04 ` Peter Zijlstra
2018-01-30 14:01 ` Peter Zijlstra
2018-02-05 17:49 ` Patrick Bellasi
2018-01-23 18:08 ` [PATCH v3 2/3] sched/fair: use util_est in LB and WU paths Patrick Bellasi
2018-01-24 11:33 ` Pavan Kondeti
2018-01-24 19:31 ` Patrick Bellasi
2018-01-25 14:33 ` Pavan Kondeti
2018-01-31 15:32 ` Patrick Bellasi
2018-01-23 18:08 ` [PATCH v3 3/3] sched/cpufreq_schedutil: use util_est for OPP selection Patrick Bellasi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180130124632.GC5739@e110439-lin \
--to=patrick.bellasi@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=joelaf@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rafael.j.wysocki@intel.com \
--cc=smuckle@google.com \
--cc=tkjos@android.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).