From: Hongyan Xia <hongyan.xia2@arm.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>
Cc: Qais Yousef <qyousef@layalina.io>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Lukasz Luba <lukasz.luba@arm.com>,
Christian Loehle <christian.loehle@arm.com>,
pierre.gondois@arm.com, linux-kernel@vger.kernel.org,
linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH v3 3/6] sched/fair: Use util biases for utilization and frequency
Date: Tue, 28 May 2024 12:38:11 +0100 [thread overview]
Message-ID: <ede83265-4107-4c21-bc33-f733326daafe@arm.com> (raw)
In-Reply-To: <a1adc985-056f-4fac-9b2d-d140c3ee9e9b@arm.com>
On 26/05/2024 23:52, Dietmar Eggemann wrote:
> On 07/05/2024 14:50, Hongyan Xia wrote:
>> Use the new util_avg_bias for task and runqueue utilization. We also
>> maintain separate util_est and util_est_uclamp signals.
>>
>> Now that we have the sum aggregated CFS util value, we do not need to
>
> There is the word uclamp missing somehow here.
Ack.
>> consult uclamp buckets to know how the frequency should be clamped. We
>> simply look at the aggregated top level rq->cfs.avg.util_avg +
>> rq->cfs.avg.util_avg_bias and rq->cfs.avg.util_est_uclamp to know what
>> frequency to choose and how to place tasks.
>
> [...]
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 571c8de59508..14376f23a8b7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4819,14 +4819,14 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
>>
>> static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf);
>>
>> -static inline unsigned long task_util(struct task_struct *p)
>> +static inline unsigned long task_runnable(struct task_struct *p)
>> {
>> - return READ_ONCE(p->se.avg.util_avg);
>> + return READ_ONCE(p->se.avg.runnable_avg);
>> }
>>
>> -static inline unsigned long task_runnable(struct task_struct *p)
>> +static inline unsigned long task_util(struct task_struct *p)
>> {
>> - return READ_ONCE(p->se.avg.runnable_avg);
>> + return READ_ONCE(p->se.avg.util_avg);
>> }
>>
>> static inline unsigned long _task_util_est(struct task_struct *p)
>> @@ -4839,6 +4839,52 @@ static inline unsigned long task_util_est(struct task_struct *p)
>> return max(task_util(p), _task_util_est(p));
>> }
>
> IMHO, we now have two complete hierarchies of util signals:
>
> (a) (b) (uclamp version of (a))
>
> (1) task_util() task_util_uclamp() (+ task_util_bias())
>
> (2) _task_util_est() _task_util_est_uclamp()
>
> (3) task_util_est() task_util_est_uclamp()
>
> (4) cpu_util() cpu_util_uclamp()
>
> (5) cpu_util_cfs() cpu_util_cfs_uclamp()
>
> (6) cpu_util_cfs_boost() cpu_util_cfs_boost_uclamp()
>
>
> For (4), (5), (6) we now have:
>
> --- cpu_util() cpu_util_uclamp()
>
> eenv_pd_busy_time() x
>
> eenv_pd_max_util() x
>
> find_energy_efficient_cpu() x x <-- (A)
>
> cpu_util_without() x
>
> --- cpu_util_cfs() cpu_util_cfs_uclamp()
>
> cpu_overutilized() x x <-- (B)
>
> update_sg_lb_stats() x
>
> update_numa_stats() x
>
> sched_cpu_util() x
>
> --- cpu_util_cfs_boost() cpu_util_cfs_boost_uclamp()
>
> sched_balance_find_src_rq() x
>
> sugov_get_util() x
>
> The uclamp version falls back to the non-uclamp version for
> !CONFIG_UCLAMP_TASK. So it looks like in this case we calculate the
> same cpu utilization value twice in (A) and (B).
That is indeed the case. I'll see how I can avoid this duplication,
hopefully without too much #ifdef mess.
> [...]
>
>> @@ -4970,134 +5026,29 @@ static inline unsigned long get_actual_cpu_capacity(int cpu)
>> }
>>
>> static inline int util_fits_cpu(unsigned long util,
>> - unsigned long uclamp_min,
>> - unsigned long uclamp_max,
>> + unsigned long util_uclamp,
>> int cpu)
>> {
>> unsigned long capacity = capacity_of(cpu);
>> - unsigned long capacity_orig;
>> - bool fits, uclamp_max_fits;
>> -
>> - /*
>> - * Check if the real util fits without any uclamp boost/cap applied.
>> - */
>> - fits = fits_capacity(util, capacity);
>> -
>> - if (!uclamp_is_used())
>> - return fits;
>> -
>> - /*
>> - * We must use arch_scale_cpu_capacity() for comparing against uclamp_min and
>> - * uclamp_max. We only care about capacity pressure (by using
>> - * capacity_of()) for comparing against the real util.
>> - *
>> - * If a task is boosted to 1024 for example, we don't want a tiny
>> - * pressure to skew the check whether it fits a CPU or not.
>> - *
>> - * Similarly if a task is capped to arch_scale_cpu_capacity(little_cpu), it
>> - * should fit a little cpu even if there's some pressure.
>> - *
>> - * Only exception is for HW or cpufreq pressure since it has a direct impact
>> - * on available OPP of the system.
>> - *
>> - * We honour it for uclamp_min only as a drop in performance level
>> - * could result in not getting the requested minimum performance level.
>> - *
>> - * For uclamp_max, we can tolerate a drop in performance level as the
>> - * goal is to cap the task. So it's okay if it's getting less.
>> - */
>> - capacity_orig = arch_scale_cpu_capacity(cpu);
>>
>> - /*
>> - * We want to force a task to fit a cpu as implied by uclamp_max.
>> - * But we do have some corner cases to cater for..
>> - *
>> - *
>> - * C=z
>> - * | ___
>> - * | C=y | |
>> - * |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max
>> - * | C=x | | | |
>> - * | ___ | | | |
>> - * | | | | | | | (util somewhere in this region)
>> - * | | | | | | |
>> - * | | | | | | |
>> - * +----------------------------------------
>> - * CPU0 CPU1 CPU2
>> - *
>> - * In the above example if a task is capped to a specific performance
>> - * point, y, then when:
>> - *
>> - * * util = 80% of x then it does not fit on CPU0 and should migrate
>> - * to CPU1
>> - * * util = 80% of y then it is forced to fit on CPU1 to honour
>> - * uclamp_max request.
>> - *
>> - * which is what we're enforcing here. A task always fits if
>> - * uclamp_max <= capacity_orig. But when uclamp_max > capacity_orig,
>> - * the normal upmigration rules should withhold still.
>> - *
>> - * Only exception is when we are on max capacity, then we need to be
>> - * careful not to block overutilized state. This is so because:
>> - *
>> - * 1. There's no concept of capping at max_capacity! We can't go
>> - * beyond this performance level anyway.
>> - * 2. The system is being saturated when we're operating near
>> - * max capacity, it doesn't make sense to block overutilized.
>> - */
>> - uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
>> - uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
>> - fits = fits || uclamp_max_fits;
>> + if (fits_capacity(util_uclamp, capacity))
>> + return 1;
>>
>> - /*
>> - *
>> - * C=z
>> - * | ___ (region a, capped, util >= uclamp_max)
>> - * | C=y | |
>> - * |_ _ _ _ _ _ _ _ _ ___ _ _ _ | _ | _ _ _ _ _ uclamp_max
>> - * | C=x | | | |
>> - * | ___ | | | | (region b, uclamp_min <= util <= uclamp_max)
>> - * |_ _ _|_ _|_ _ _ _| _ | _ _ _| _ | _ _ _ _ _ uclamp_min
>> - * | | | | | | |
>> - * | | | | | | | (region c, boosted, util < uclamp_min)
>> - * +----------------------------------------
>> - * CPU0 CPU1 CPU2
>> - *
>> - * a) If util > uclamp_max, then we're capped, we don't care about
>> - * actual fitness value here. We only care if uclamp_max fits
>> - * capacity without taking margin/pressure into account.
>> - * See comment above.
>> - *
>> - * b) If uclamp_min <= util <= uclamp_max, then the normal
>> - * fits_capacity() rules apply. Except we need to ensure that we
>> - * enforce we remain within uclamp_max, see comment above.
>> - *
>> - * c) If util < uclamp_min, then we are boosted. Same as (b) but we
>> - * need to take into account the boosted value fits the CPU without
>> - * taking margin/pressure into account.
>> - *
>> - * Cases (a) and (b) are handled in the 'fits' variable already. We
>> - * just need to consider an extra check for case (c) after ensuring we
>> - * handle the case uclamp_min > uclamp_max.
>> - */
>> - uclamp_min = min(uclamp_min, uclamp_max);
>> - if (fits && (util < uclamp_min) &&
>> - (uclamp_min > get_actual_cpu_capacity(cpu)))
>> + if (fits_capacity(util, capacity))
>> return -1;
>>
>> - return fits;
>> + return 0;
>
> The possible return values stay the same it seems: 1 if uclamp (and so
> util_avg) fits, -1 if only util_avg fits, and 0 if neither uclamp nor
> util_avg fits.
Yes, this is the intended change to address some comments elsewhere and
in previous revisions.
> [...]
>
>> @@ -6914,6 +6861,10 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>> "0 tasks on CFS of CPU %d, but util_avg_bias is %d\n",
>> rq->cpu, rq->cfs.avg.util_avg_bias);
>> WRITE_ONCE(rq->cfs.avg.util_avg_bias, 0);
>> + WARN_ONCE(rq->cfs.avg.util_est_uclamp,
>> + "0 tasks on CFS of CPU %d, but util_est_uclamp is %u\n",
>> + rq->cpu, rq->cfs.avg.util_est_uclamp);
>> + WRITE_ONCE(rq->cfs.avg.util_est_uclamp, 0);
>
> Maybe better:
>
> if (WARN_ONCE(...
> WRITE_ONCE(rq->cfs.avg.util_est_uclamp, 0);
>
> How can this happen, that there are 0 tasks on the rq->cfs hierarchy
> and rq->cfs.avg.util_est_uclamp is !0 ? Is there a specific code path
> where this happens?
Ack.
This should never happen. Triggering the warning here means something
has gone very wrong in the maths and should be reported.
>> }
>> #endif
>> }
>> @@ -7485,7 +7436,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>> static int
>> select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>> {
>> - unsigned long task_util, util_min, util_max, best_cap = 0;
>> + unsigned long task_util, task_util_uclamp, best_cap = 0;
>> int fits, best_fits = 0;
>> int cpu, best_cpu = -1;
>> struct cpumask *cpus;
>> @@ -7494,8 +7445,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>> cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>
>> task_util = task_util_est(p);
>> - util_min = uclamp_eff_value(p, UCLAMP_MIN);
>> - util_max = uclamp_eff_value(p, UCLAMP_MAX);
>> + task_util_uclamp = task_util_est_uclamp(p);
>
> select_idle_sibling() could pass task_util and task_util_uclamp into
> select_idle_capacity(). This way we would save these calls to
> task_util_est() and task_util_est_uclamp() here.
Good idea.
>> for_each_cpu_wrap(cpu, cpus, target) {
>> unsigned long cpu_cap = capacity_of(cpu);
>> @@ -7503,7 +7453,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>> if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
>> continue;
>>
>> - fits = util_fits_cpu(task_util, util_min, util_max, cpu);
>> + fits = util_fits_cpu(task_util, task_util_uclamp, cpu);
>>
>> /* This CPU fits with all requirements */
>> if (fits > 0)
>
> [...]
>