* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
[not found] ` <20180222170153.673-2-patrick.bellasi@arm.com>
@ 2018-03-01 17:42 ` Patrick Bellasi
2018-03-06 18:56 ` Peter Zijlstra
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-01 17:42 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
Vincent Guittot, Paul Turner, Dietmar Eggemann, Morten Rasmussen,
Juri Lelli, Todd Kjos, Joel Fernandes, Steve Muckle
This is missing the below #ifdef guards; adding them here as a note for
the next respin on list.
On 22-Feb 17:01, Patrick Bellasi wrote:
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e1febd252a84..c8526687f107 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5205,6 +5205,23 @@ static inline void hrtick_update(struct rq *rq)
> }
> #endif
>
#ifdef CONFIG_SMP
> +static inline unsigned long task_util(struct task_struct *p);
> +static inline unsigned long _task_util_est(struct task_struct *p);
> +
> +static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
> + struct task_struct *p)
> +{
> + unsigned int enqueued;
> +
> + if (!sched_feat(UTIL_EST))
> + return;
> +
> + /* Update root cfs_rq's estimated utilization */
> + enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> + enqueued += _task_util_est(p);
> + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
> +}
> +
#else
static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
				    struct task_struct *p)
{
}
#endif /* CONFIG_SMP */
> /*
> * The enqueue_task method is called before nr_running is
> * increased. Here we update the fair scheduling stats and
> @@ -5257,9 +5274,86 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> if (!se)
> add_nr_running(rq, 1);
>
> + util_est_enqueue(&rq->cfs, p);
> hrtick_update(rq);
> }
>
> +/*
> + * Check if a (signed) value is within a specified (unsigned) margin,
> + * based on the observation that:
> + * abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
> + *
> + * NOTE: this only works when value + margin < INT_MAX.
> + */
> +static inline bool within_margin(int value, int margin)
> +{
> + return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
> +}
> +
> +static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> + struct task_struct *p,
> + bool task_sleep)
> +{
#ifdef CONFIG_SMP
> + long last_ewma_diff;
> + struct util_est ue;
> +
> + if (!sched_feat(UTIL_EST))
> + return;
> +
> + /*
> + * Update root cfs_rq's estimated utilization
> + *
> + * If *p is the last task then the root cfs_rq's estimated utilization
> + * of a CPU is 0 by definition.
> + */
> + ue.enqueued = 0;
> + if (cfs_rq->nr_running) {
> + ue.enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> + ue.enqueued -= min_t(unsigned int, ue.enqueued,
> + _task_util_est(p));
> + }
> + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
> +
> + /*
> + * Skip update of task's estimated utilization when the task has not
> + * yet completed an activation, e.g. being migrated.
> + */
> + if (!task_sleep)
> + return;
> +
> + /*
> + * Skip update of task's estimated utilization when its EWMA is
> + * already ~1% close to its last activation value.
> + */
> + ue = READ_ONCE(p->se.avg.util_est);
> + ue.enqueued = task_util(p);
> + last_ewma_diff = ue.enqueued - ue.ewma;
> + if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> + return;
> +
> + /*
> + * Update Task's estimated utilization
> + *
> + * When *p completes an activation we can consolidate another sample
> + * of the task size. This is done by storing the current PELT value
> + * as ue.enqueued and by using this value to update the Exponential
> + * Weighted Moving Average (EWMA):
> + *
> + * ewma(t) = w * task_util(p) + (1-w) * ewma(t-1)
> + * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
> + * = w * (task_util(p) - ewma(t-1)) + ewma(t-1)
> + * = w * ( last_ewma_diff ) + ewma(t-1)
> + * = w * (last_ewma_diff + ewma(t-1) / w)
> + *
> + * Where 'w' is the weight of new samples, which is configured to be
> + * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT)
> + */
> + ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
> + ue.ewma += last_ewma_diff;
> + ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
> + WRITE_ONCE(p->se.avg.util_est, ue);
#endif /* CONFIG_SMP */
> +}
> +
> static void set_next_buddy(struct sched_entity *se);
>
> /*
> @@ -5316,6 +5410,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> if (!se)
> sub_nr_running(rq, 1);
>
> + util_est_dequeue(&rq->cfs, p, task_sleep);
> hrtick_update(rq);
> }
>
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
[not found] ` <20180222170153.673-2-patrick.bellasi@arm.com>
2018-03-01 17:42 ` [PATCH v5 1/4] sched/fair: add util_est on top of PELT Patrick Bellasi
@ 2018-03-06 18:56 ` Peter Zijlstra
2018-03-07 12:32 ` Patrick Bellasi
2018-03-06 18:58 ` Peter Zijlstra
2018-03-06 19:02 ` Peter Zijlstra
3 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-06 18:56 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> +/**
> + * Estimation Utilization for FAIR tasks.
> + *
> + * Support data structure to track an Exponential Weighted Moving Average
> + * (EWMA) of a FAIR task's utilization. New samples are added to the moving
> + * average each time a task completes an activation. Sample's weight is
> + * chosen so that the EWMA will be relatively insensitive to transient changes
> + * to the task's workload.
> + *
> + * @enqueued: instantaneous estimated utilization of a task/cpu
> + * task: the task's util_avg at last task dequeue time
> + * cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU
> + *
> + * Thus, the util_est.enqueued of a task represents the contribution on the
> + * estimated utilization of the CPU where that task is currently enqueued.
> + *
> + * @ewma: the Exponential Weighted Moving Average (EWMA) utilization of a task
> + * Only for tasks we track a moving average of the past instantaneous
> + * estimated utilization. This allows to absorb sporadic drops in
> + * utilization of an otherwise almost periodic task.
> + *
> + */
The above comment appears to have whitespace issues; the paragraph
starting with "Thus" looks indented by one character, for example.
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-06 18:56 ` Peter Zijlstra
@ 2018-03-07 12:32 ` Patrick Bellasi
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 12:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 06-Mar 19:56, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > +/**
> > + * Estimation Utilization for FAIR tasks.
> > + *
> > + * Support data structure to track an Exponential Weighted Moving Average
> > + * (EWMA) of a FAIR task's utilization. New samples are added to the moving
> > + * average each time a task completes an activation. Sample's weight is
> > + * chosen so that the EWMA will be relatively insensitive to transient changes
> > + * to the task's workload.
> > + *
> > + * @enqueued: instantaneous estimated utilization of a task/cpu
> > + * task: the task's util_avg at last task dequeue time
> > + * cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU
> > + *
> > + * Thus, the util_est.enqueued of a task represents the contribution on the
> > + * estimated utilization of the CPU where that task is currently enqueued.
> > + *
> > + * @ewma: the Exponential Weighted Moving Average (EWMA) utilization of a task
> > + * Only for tasks we track a moving average of the past instantaneous
> > + * estimated utilization. This allows to absorb sporadic drops in
> > + * utilization of an otherwise almost periodic task.
> > + *
> > + */
>
> The above comment appears to have whitespace issues; the paragraph
> starting with "Thus" looks indented by one character, for example.
That was actually intentional... I wanted to keep it aligned after the
"@" to better mark paragraphs describing the struct members.
However, I've just noticed the overall format is not Sphinx-valid.
Thus, I'll update it, also to ensure that the documentation is properly
generated.
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
[not found] ` <20180222170153.673-2-patrick.bellasi@arm.com>
2018-03-01 17:42 ` [PATCH v5 1/4] sched/fair: add util_est on top of PELT Patrick Bellasi
2018-03-06 18:56 ` Peter Zijlstra
@ 2018-03-06 18:58 ` Peter Zijlstra
2018-03-07 9:39 ` Peter Zijlstra
2018-03-07 11:31 ` Patrick Bellasi
2018-03-06 19:02 ` Peter Zijlstra
3 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-06 18:58 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> +static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
> + struct task_struct *p)
> +{
> + unsigned int enqueued;
> +
> + if (!sched_feat(UTIL_EST))
> + return;
> +
> + /* Update root cfs_rq's estimated utilization */
> + enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> + enqueued += _task_util_est(p);
> + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
> +}
> +static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> + struct task_struct *p,
> + bool task_sleep)
> +{
> + long last_ewma_diff;
> + struct util_est ue;
> +
> + if (!sched_feat(UTIL_EST))
> + return;
> +
> + /*
> + * Update root cfs_rq's estimated utilization
> + *
> + * If *p is the last task then the root cfs_rq's estimated utilization
> + * of a CPU is 0 by definition.
> + */
> + ue.enqueued = 0;
> + if (cfs_rq->nr_running) {
> + ue.enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> + ue.enqueued -= min_t(unsigned int, ue.enqueued,
> + _task_util_est(p));
> + }
> + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
It appears to me this isn't a stable situation and completely relies on
the !nr_running case to recalibrate. If we ensure that doesn't happen
for a significant while the sum can run-away, right?
Should we put a max in enqueue to avoid this?
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-06 18:58 ` Peter Zijlstra
@ 2018-03-07 9:39 ` Peter Zijlstra
2018-03-07 15:37 ` Patrick Bellasi
2018-03-07 11:31 ` Patrick Bellasi
1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-07 9:39 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Tue, Mar 06, 2018 at 07:58:51PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > +static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
> > + struct task_struct *p)
> > +{
> > + unsigned int enqueued;
> > +
> > + if (!sched_feat(UTIL_EST))
> > + return;
> > +
> > + /* Update root cfs_rq's estimated utilization */
> > + enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > + enqueued += _task_util_est(p);
> > + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
> > +}
> It appears to me this isn't a stable situation and completely relies on
> the !nr_running case to recalibrate. If we ensure that doesn't happen
> for a significant while the sum can run-away, right?
>
> Should we put a max in enqueue to avoid this?
Thinking about this a bit more; would it make sense to adjust the
running sum/avg on migration? Something along the lines of:
util_avg = se->load_avg / (cfs_rq->load_avg + se->load_avg);
(which disregards cgroups), because that should more or less be the time
it ends up running, given the WFQ rule.
That way the disparity between tasks migrating into the CPU at u=1 and
them going to sleep at u<1 is much smaller and the above sum doesn't run
away nearly as wild (it still needs some upper bound though).
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 9:39 ` Peter Zijlstra
@ 2018-03-07 15:37 ` Patrick Bellasi
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 15:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 07-Mar 10:39, Peter Zijlstra wrote:
> On Tue, Mar 06, 2018 at 07:58:51PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > > +static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
> > > + struct task_struct *p)
> > > +{
> > > + unsigned int enqueued;
> > > +
> > > + if (!sched_feat(UTIL_EST))
> > > + return;
> > > +
> > > + /* Update root cfs_rq's estimated utilization */
> > > + enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > > + enqueued += _task_util_est(p);
> > > + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
> > > +}
>
> > It appears to me this isn't a stable situation and completely relies on
> > the !nr_running case to recalibrate. If we ensure that doesn't happen
> > for a significant while the sum can run-away, right?
> >
> > Should we put a max in enqueue to avoid this?
>
> Thinking about this a bit more; would it make sense to adjust the
> running sum/avg on migration? Something along the lines of:
>
> util_avg = se->load_avg / (cfs_rq->load_avg + se->load_avg);
>
> (which disregards cgroups), because that should more or less be the time
> it ends up running, given the WFQ rule.
I would say it makes sense from a purely mechanical standpoint, but I'm
not entirely convinced it can be useful from a practical standpoint.
First of all, that should be applied only when we migrate to a more
saturated CPU. Otherwise, when migrating to an empty CPU we would set
util_avg = 100%.
Secondly, when we migrate to a saturated CPU, this adjustment will
contribute to under-estimate the task utilization.
Let's say the task was running on a completely empty CPU, and thus it
was able to ramp up without being preempted. That value represents a
good estimate of the task's (most recent) CPU demand.
Now, if on a following activation we wake up the task on an IDLE CPU
with a lot of blocked load, then we will scale down its util_avg
and assume the task will be smaller.
But:
a) if the blocked load does not turn into some task waking up again,
underestimating the task only introduces further ramp-up latencies
b) if the load is due to really active tasks, the task will be
preempted and its utilization will be smaller... but we are already in
a domain where utilization does not tell us anything useful for a
task... and thus, why bother making it converge sooner?
> That way the disparity between tasks migrating into the CPU at u=1 and
> them going to sleep at u<1 is much smaller and the above sum doesn't run
> away nearly as wild (it still needs some upper bound though).
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-06 18:58 ` Peter Zijlstra
2018-03-07 9:39 ` Peter Zijlstra
@ 2018-03-07 11:31 ` Patrick Bellasi
2018-03-07 12:24 ` Peter Zijlstra
1 sibling, 1 reply; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 11:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 06-Mar 19:58, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > +static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
> > + struct task_struct *p)
> > +{
> > + unsigned int enqueued;
> > +
> > + if (!sched_feat(UTIL_EST))
> > + return;
> > +
> > + /* Update root cfs_rq's estimated utilization */
> > + enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > + enqueued += _task_util_est(p);
> > + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
> > +}
>
> > +static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> > + struct task_struct *p,
> > + bool task_sleep)
> > +{
> > + long last_ewma_diff;
> > + struct util_est ue;
> > +
> > + if (!sched_feat(UTIL_EST))
> > + return;
> > +
> > + /*
> > + * Update root cfs_rq's estimated utilization
> > + *
> > + * If *p is the last task then the root cfs_rq's estimated utilization
> > + * of a CPU is 0 by definition.
> > + */
> > + ue.enqueued = 0;
> > + if (cfs_rq->nr_running) {
> > + ue.enqueued = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> > + ue.enqueued -= min_t(unsigned int, ue.enqueued,
> > + _task_util_est(p));
> > + }
> > + WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
>
> It appears to me this isn't a stable situation and completely relies on
> the !nr_running case to recalibrate. If we ensure that doesn't happen
> for a significant while the sum can run-away, right?
By run-away, do you mean going over 1024 or overflowing the unsigned
int storage?
In the first case, I think we don't care about exceeding 1024 since:
- we cap to capacity_orig_of in cpu_util_est
- by directly reading the cfs_rq->avg.util_est.enqueued we can
actually detect conditions in which a CPU is over-saturated.
In the second case, with an unsigned int we can enqueue up to a few
million 100% tasks on a single CPU without overflowing.
> Should we put a max in enqueue to avoid this?
IMO the capping from the cpu_util_est getter should be enough...
Maybe I'm missing your point here?
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 11:31 ` Patrick Bellasi
@ 2018-03-07 12:24 ` Peter Zijlstra
2018-03-07 15:24 ` Patrick Bellasi
0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-07 12:24 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Wed, Mar 07, 2018 at 11:31:49AM +0000, Patrick Bellasi wrote:
> > It appears to me this isn't a stable situation and completely relies on
> > the !nr_running case to recalibrate. If we ensure that doesn't happen
> > for a significant while the sum can run-away, right?
>
> By away you mean go over 1024 or overflow the unsigned int storage?
The latter; I think you can make it arbitrarily large. Have a busy task
on CPU0; this ensures !nr_running never happens.
Start a busy task on CPU1, wait for it to hit u=1, then migrate it to
CPU0, then wait for it to hit u=.5 then kill it, this effectively adds
.5 to the enqueued value, repeat indefinitely.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 12:24 ` Peter Zijlstra
@ 2018-03-07 15:24 ` Patrick Bellasi
2018-03-07 17:35 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 15:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 07-Mar 13:24, Peter Zijlstra wrote:
> On Wed, Mar 07, 2018 at 11:31:49AM +0000, Patrick Bellasi wrote:
> > > It appears to me this isn't a stable situation and completely relies on
> > > the !nr_running case to recalibrate. If we ensure that doesn't happen
> > > for a significant while the sum can run-away, right?
> >
> > By away you mean go over 1024 or overflow the unsigned int storage?
>
> The latter; I think you can make it arbitrarily large. Have a busy task
> on CPU0; this ensures !nr_running never happens.
>
> Start a busy task on CPU1, wait for it to hit u=1, then migrate it to
> CPU0,
At this point util_est(CPU0) = 2048, which is:
+1024 for the busy running task, assuming it has been enqueued with
that utilization since the beginning
+1024 for the task newly migrated from CPU1, which is enqueued with
the value it reached at dequeue time on CPU1
> then wait for it to hit u=.5 then kill it,
... but when we kill it, the task is dequeued, and thus we remove
1024.
Maybe that's the tricky bit: we remove the value we enqueued, _not_
the current util_avg. Notice we use _task_util_est(p)... with the
leading "_".
> this effectively adds
> .5 to the enqueued value, repeat indefinitely.
Thus this should not happen.
Basically, the rq's util_est is the sum of the RUNNABLE tasks'
util_est at their enqueue time... which was updated at their last
dequeue time, hence the use of the name "dequeued" for both tasks and
rqs.
Does it make sense now?
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 15:24 ` Patrick Bellasi
@ 2018-03-07 17:35 ` Peter Zijlstra
0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-07 17:35 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Wed, Mar 07, 2018 at 03:24:58PM +0000, Patrick Bellasi wrote:
> Maybe that's the tricky bit: we remove the value we enqueued, _not_
> the current util_avg. Notice we use _task_util_est(p)... with the
> leading "_".
ARGH, ok let me try that again ;-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
[not found] ` <20180222170153.673-2-patrick.bellasi@arm.com>
` (2 preceding siblings ...)
2018-03-06 18:58 ` Peter Zijlstra
@ 2018-03-06 19:02 ` Peter Zijlstra
2018-03-07 11:47 ` Patrick Bellasi
3 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-06 19:02 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> +struct util_est {
> + unsigned int enqueued;
> + unsigned int ewma;
> +#define UTIL_EST_WEIGHT_SHIFT 2
> +};
> + ue = READ_ONCE(p->se.avg.util_est);
> + WRITE_ONCE(p->se.avg.util_est, ue);
That is actually quite dodgy... and relies on the fact that we have the
8 byte case in __write_once_size() and __read_once_size()
unconditionally. It then further relies on the compiler DTRT for 32bit
platforms, which is generating 2 32bit loads/stores.
The advantage is of course that it will use single u64 loads/stores
where available.
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-06 19:02 ` Peter Zijlstra
@ 2018-03-07 11:47 ` Patrick Bellasi
2018-03-07 12:26 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 11:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 06-Mar 20:02, Peter Zijlstra wrote:
> On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > +struct util_est {
> > + unsigned int enqueued;
> > + unsigned int ewma;
> > +#define UTIL_EST_WEIGHT_SHIFT 2
> > +};
>
> > + ue = READ_ONCE(p->se.avg.util_est);
>
> > + WRITE_ONCE(p->se.avg.util_est, ue);
>
> That is actually quite dodgy... and relies on the fact that we have the
> 8 byte case in __write_once_size() and __read_once_size()
> unconditionally. It then further relies on the compiler DTRT for 32bit
> platforms, which is generating 2 32bit loads/stores.
>
> The advantage is of course that it will use single u64 loads/stores
> where available.
Yes, that's mainly an "optimization" for 64bit targets... but perhaps
the benefits are negligible.
Would you prefer to keep the generated code more "under control" by
using two {READ,WRITE}_ONCEs?
IMO here we can also go with just the WRITE_ONCEs. I don't see a case
for the compiler to mangle the loads/stores, while the WRITE_ONCEs are
still required to sync with non rq-lock serialized code.
But... maybe I'm missing something... ?
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 11:47 ` Patrick Bellasi
@ 2018-03-07 12:26 ` Peter Zijlstra
2018-03-07 15:16 ` Patrick Bellasi
0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2018-03-07 12:26 UTC (permalink / raw)
To: Patrick Bellasi
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On Wed, Mar 07, 2018 at 11:47:11AM +0000, Patrick Bellasi wrote:
> On 06-Mar 20:02, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > > +struct util_est {
> > > + unsigned int enqueued;
> > > + unsigned int ewma;
> > > +#define UTIL_EST_WEIGHT_SHIFT 2
> > > +};
> >
> > > + ue = READ_ONCE(p->se.avg.util_est);
> >
> > > + WRITE_ONCE(p->se.avg.util_est, ue);
> >
> > That is actually quite dodgy... and relies on the fact that we have the
> > 8 byte case in __write_once_size() and __read_once_size()
> > unconditionally. It then further relies on the compiler DTRT for 32bit
> > platforms, which is generating 2 32bit loads/stores.
> >
> > The advantage is of course that it will use single u64 loads/stores
> > where available.
>
> Yes, that's mainly an "optimization" for 64bit targets... but perhaps
> the benefits are negligible.
>
> > Would you prefer to keep the generated code more "under control" by
> > using two {READ,WRITE}_ONCEs?
>
> > IMO here we can also go with just the WRITE_ONCEs. I don't see a case
> > for the compiler to mangle the loads/stores, while the WRITE_ONCEs are
> > still required to sync with non rq-lock serialized code.
> But... maybe I'm missing something... ?
I'm not sure we rely on READ/WRITE_ONCE() of 64bit variables on 32bit
targets to be sane anywhere else (we could be, I just dont know).
I suspect it all works as expected... but its a tad tricky.
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/4] sched/fair: add util_est on top of PELT
2018-03-07 12:26 ` Peter Zijlstra
@ 2018-03-07 15:16 ` Patrick Bellasi
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Bellasi @ 2018-03-07 15:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann,
Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
Steve Muckle
On 07-Mar 13:26, Peter Zijlstra wrote:
> On Wed, Mar 07, 2018 at 11:47:11AM +0000, Patrick Bellasi wrote:
> > On 06-Mar 20:02, Peter Zijlstra wrote:
> > > On Thu, Feb 22, 2018 at 05:01:50PM +0000, Patrick Bellasi wrote:
> > > > +struct util_est {
> > > > + unsigned int enqueued;
> > > > + unsigned int ewma;
> > > > +#define UTIL_EST_WEIGHT_SHIFT 2
> > > > +};
> > >
> > > > + ue = READ_ONCE(p->se.avg.util_est);
> > >
> > > > + WRITE_ONCE(p->se.avg.util_est, ue);
> > >
> > > That is actually quite dodgy... and relies on the fact that we have the
> > > 8 byte case in __write_once_size() and __read_once_size()
> > > unconditionally. It then further relies on the compiler DTRT for 32bit
> > > platforms, which is generating 2 32bit loads/stores.
> > >
> > > The advantage is of course that it will use single u64 loads/stores
> > > where available.
> >
> > Yes, that's mainly an "optimization" for 64bit targets... but perhaps
> > the benefits are negligible.
> >
> > Would you prefer to keep the generated code more "under control" by
> > using two {READ,WRITE}_ONCEs?
Any specific preference on this previous point?
> > IMO here we can also go with just the WRITE_ONCEs. I don't see a case
> > for the compiler to mangle the loads/stores, while the WRITE_ONCEs are
> > still required to sync with non rq-lock serialized code.
> > But... maybe I'm missing something... ?
>
> I'm not sure we rely on READ/WRITE_ONCE() of 64bit variables on 32bit
> targets to be sane anywhere else (we could be, I just dont know).
My understanding is that, since here we are in an rq-lock protected
section, and these vars can be written only in this section, the load
is a dependency for the store and the compiler cannot screw up...
> I suspect it all works as expected... but its a tad tricky.
Then let's keep them for the time being... meanwhile I'll try to get
some more "internal" feedback before the next posting.
--
#include <best/regards.h>
Patrick Bellasi
^ permalink raw reply [flat|nested] 21+ messages in thread