Re: [PATCH 1/7] sched/uclamp: Fix relationship between uclamp and migration margin

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Vincent Guittot <vincent.guittot@linaro.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	linux-kernel@vger.kernel.org, Xuewen Yan <xuewen.yan94@gmail.com>,
	Wei Wang <wvw@google.com>,
	Jonathan JMChen <Jonathan.JMChen@mediatek.com>,
	Hank <han.lin@mediatek.com>, Lukasz Luba <lukasz.luba@arm.com>
Subject: Re: [PATCH 1/7] sched/uclamp: Fix relationship between uclamp and migration margin
Date: Fri, 22 Jul 2022 17:13:00 +0200	[thread overview]
Message-ID: <20220722151300.GA30193@vingu-book> (raw)
In-Reply-To: <20220721140431.cl73bhn6ydzdhdkm@wubuntu>

Le jeudi 21 juil. 2022 à 15:04:31 (+0100), Qais Yousef a écrit :
> On 07/20/22 09:29, Vincent Guittot wrote:
> > On Fri, 15 Jul 2022 at 12:37, Qais Yousef <qais.yousef@arm.com> wrote:
> > >
> > > On 07/13/22 14:39, Vincent Guittot wrote:
> > >
> > > [...]
> > >
> > > > > > That's why I have mentioned that I have thermal pressure and irq in
> > > > > > mind. I'm speaking about performance level but not about bandwidth and
> > > > > > time sharing.
> > > > >
> > > > > irq pressure has no impact on the cpu's ability to get any OPP, no? It purely
> > > > > reduces the bandwidth availability for CFS tasks AFAIU. So the task's ability
> > > > > to achieve a performance level has no correlation with irq pressure IMO. Unless
> > > > > I missed something.
> > > >
> > > > The way irq is accounted in pelt might impact the result. TBH, i
> > > > haven't looked in details what would be the impact
> > >
> > > I can't see how irq can impact what performance level we can achieve on any
> > > CPU. It should just impact bandwidth?
> > 
> > It impacts the cpu and task utilization as your task utilization is
> > expressed in the range of the time not used by IRQ so could be lower
> > than what you think when you compare with uclamp and decide what to do
> 
> I need more helping hand to understand please.
> 
> So for the case of uclamp_min = 1024, this request means:
> 
> 	When I run, I want to run at max performance point of the system.
> 
> Which translates into running at highest frequency on SMP, and highest
> frequency + biggest CPU on HMP.
> 
> If a CPU has irq pressure, how this will prevent the task from running at
> highest frequency? What am I missing?

I was thinking of the case of uclamp_min not being 1024. But the real
task util_avg (ie including the impact of irq pressure) will be always
lower than the task clock version so the comparison with uclamp_min will
always be satisfied.

> 
> I am assuming that the task is actually small so it will never be able to run
> at max frequency without this hint, ie: util_avg = 300.
> 
> Keep in mind that util_fits_cpu() still verifies that util_avg is within the
> 80% range of capacity_of() which takes into account all types of pressures.
> 
> > 
> > >
> > > [...]
> > >
> > > > > > more concerned by the thermal pressure as I mentioned previously. As
> > > > > > an example the thermal pressure reflects the impact on the performance
> > > > > > while task is running.
> > > > >
> > > > > Like we discussed on that RT email thread. If you have a 1024 task, tiny
> > > > > thermal pressure will make it look like it won't fit anywhere.
> > > >
> > > > maybe another big core without pressure. Otherwise if the task can
> > >
> > > Isn't thermal pressure per perf domain?
> > 
> > From a scheduler PoV, we don't have any rule on this
> > 
> > >
> > > > accept a lower compute capacity why not setting uclamp_min to a lower
> > > > value like 900
> > >
> > > Well if the system has lost its top 10% and you're still running as fast as
> > > the system can possibly do, what better can you do?
> > >
> > > I can't see how comparing uclamp with thermal pressure will help.
> > >
> > > In feec() we pick the highest spare capacity CPU. So if the bigs were split
> > > into 1 per perf domain and truly one of them can become severely throttled
> > > while the other isn't as you're trying to say, then this distribution will pick
> > > the highest spare capacity one.
> > 
> > The cpu with highest spare capacity might not be the one with highest
> > performance
> 
> True. But all of this is best effort. And I think this is good enough for the
> common case. I don't mind addressing the thermal problem, but it's not a simple
> one. And there's a complexity cost that is AFAICS is high.
> 

Using capacity_orig_of(cpu) - thermal_load_avg(rq_of(cpu)) seems like
a simple solution to cover thermal mitigation

Also I was looking more deeply at your condition and get hard time to
understand why uclamp_max_fits needs to be false when both
(capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE) ?

+       max_capacity = (capacity_orig == SCHED_CAPACITY_SCALE) &&
(uclamp_max == SCHED_CAPACITY_SCALE);
+       uclamp_max_fits = !max_capacity && (uclamp_max <= capacity_orig);
+       fits = fits || uclamp_max_fits;

For task I would have done only :

+       capacity_orig = capacity_orig_of(cpu) - thermal_load_avg(rq_of(cpu));
+       uclamp_max_fits = (uclamp_max <= capacity_orig);
       fits = fits || uclamp_max_fits;

and I would use a different one for cpu_overutlized in orde to discard the test
with uclamp_max if uclamp_max one equals SCHED_CAPACITY_SCALE

+       uclamp_max_fits = (uclamp_max <= capacity_orig) && (uclamp_max != SCHED_CAPACITY_SCALE);

and I don't think that we should compare uclamp_min <= capacity_orig for
cpu_overutlized() but only for task to detect misfit one because uclamp_min is
a performance hint not a bandwidth as you said previously.

> > 
> > >
> > > fits_capacity() just says this CPU is a candidate that we can consider.
> > >
> > > [...]
> > >
> > > > > > TaskA usually runs 4 ms every 8ms but wants to ensure a running time
> > > > > > around 5ms. Task A asks for a uclamp_min of 768.
> > > > > > medium cpu capacity_orig is 800 but runs at half its max freq because
> > > > > > of thermal mitigation then your task will runs more than 8ms
> > > > >
> > > > > If thermal pressure is 50%, then capacity_of() is 400. A 50% task will have
> > > > > util_avg of 512, which is much larger than 0.8 * 400. So this is dealt with
> > > > > already in this code, no?
> > > >
> > > > May be my example is not perfect but apply a mitigation of 20% and you
> > > > fall in the case
> > >
> > >         capacity_orig_of(medium) = 800
> > >         capacity_of(medium) = 800 * 0.8 - sum_of_(irq, rt) pressure :: <= 640
> > >
> > >         migration_margin * capacity_of(medium) = 0.8 * 640 = 512 === p->util_avg
> > >
> > > So this task will struggle still to run on the medium even under 20% pressure.
> > 
> > you are nitpicking. 19.75% should be ok
> 
> I was just trying to highlight it took a bit of effort to reach to the corner
> case. Which indicates the corner case is specific.

hmmm 19%.75% is not a corner case, i was just lazy to compute the exact number

> 
> > 
> > >
> > > I can see your point for sure that we could have scenarios where we should pick
> > > a bigger CPU. But my counter point is that if there's a meaningful thermal
> > > pressure we are screwed already and uclamp can't save the day.
> > 
> > uclamp can save it by triggering the search of another cpu with lower pressure
> 
> How would you do that?
> 
> If a task hints towards uclamp_min = 1024. If there's 1% pressure on all cpus,
> is triggering overutilized right? What's tripping me off is how would you do
> that fallback gracefully?
> 

As proposed above, you should use different rules for cpu_overutilized
and task fits cpus to make a difference beteen overutlized cpu and misfit task

> > 
> > >
> > > I'll repeat my question, how would you encode the relationship?
> > >
> > > Consider these scenarios:
> > >
> > >
> > >         capaity_orig_of(little) = 400
> > >         capaity_orig_of(medium) = 800
> > >         capaity_orig_of(big) = 1024
> > >
> > >         p0->util_avg = 300
> > >         p0->uclamp_min = 800
> > >
> > >         p1->util_avg = 300
> > >         p1->uclamp_min = 1024
> > >
> > >
> > > When there's 10% thermal pressure on all CPUs.
> > >
> > > Does p1 fit on big still? Fit here means the  big is a viable candidate from
> > > uclamp point of view.
> > 
> > I agree that this one is tricky because if all cpus are throttled,
> > there is no cpu but it's worth looking for the big cpu with lowest
> > throttling otherwise
> 
> If there's an easy path to achieving this, I'm happy to try it.
> 
> > 
> > >
> > > How would you define the relationship so that p0 will not fit the medium, but
> > > p1 still fits the big.
> > 
> > I would compare uclamp_min with capacity_orig() - thermal pressure to
> > decide if we should look for another cpu
> 
> Are you referring to instantaneous pressure here? Because with the average
> signal we would take a long time to decay, and lose a lot of opportunities to
> do better. And this is really the crust of the problem.
> 
> My understanding has been is that this signal can easily be non-zero. But maybe
> I need to re-evaluate that if you don't see this as a problem.
> 
> Maybe with Lukasz patch to speed up the decaying we can do better?
> 
> 	https://lore.kernel.org/lkml/20220429091245.12423-1-lukasz.luba@arm.com/
> 
> 
> But even then, the case of
> 
> 	capaity_orig_of(little) = 400
> 	capaity_orig_of(medium) = 800
> 	capaity_orig_of(big) = 1024
> 
> 	p0->util_avg = 300
> 	p0->uclamp_min = 1024
> 
> would unnecessarily trigger overutilized for all values of thermal pressure up
> to ~20% on the big cores. Which I see is wrong.
> 
> IMO better here means keeping the task on the big core is this honours the best
> available performance hint. Only exception is if we go into capacity inversion,
> which I think we can handle.
> 
> 
> Thanks
> 
> --
> Qais Yousef
> 
> > 
> > >
> > > What happens when thermal pressure is 1%? Should p0 still fit on the medium
> > > then? As Lukasz highlighted in other email threads, the decay of thermal
> > > pressure signal has a very long tail.
> > >
> > >
> > > Thanks!
> > >
> > > --
> > > Qais Yousef

next prev parent reply	other threads:[~2022-07-22 15:13 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 19:46 [PATCH 0/7] Fix relationship between uclamp and fits_capacity() Qais Yousef
2022-06-29 19:46 ` [PATCH 1/7] sched/uclamp: Fix relationship between uclamp and migration margin Qais Yousef
2022-07-11 12:36   ` Vincent Guittot
2022-07-12 10:23     ` Qais Yousef
2022-07-12 13:21       ` Vincent Guittot
2022-07-12 14:20         ` Qais Yousef
2022-07-13 12:39           ` Vincent Guittot
2022-07-15 10:37             ` Qais Yousef
2022-07-20  7:29               ` Vincent Guittot
2022-07-21 14:04                 ` Qais Yousef
2022-07-22 15:13                   ` Vincent Guittot [this message]
2022-07-27 16:08                     ` Qais Yousef
2022-08-04 14:59                     ` Qais Yousef
2022-07-20  7:17   ` Xuewen Yan
2022-07-21 10:24     ` Qais Yousef
2022-07-25 11:59       ` Xuewen Yan
2022-07-27 16:25         ` Qais Yousef
2022-08-01  2:46           ` Xuewen Yan
2022-08-02 16:22             ` Qais Yousef
2022-06-29 19:46 ` [PATCH 2/7] sched/uclamp: Make task_fits_capacity() use util_fits_cpu() Qais Yousef
2022-07-11 13:09   ` Vincent Guittot
2022-07-12 10:48     ` Qais Yousef
2022-07-21 14:29       ` Qais Yousef
2022-07-22  8:19         ` Vincent Guittot
2022-07-27 16:05           ` Qais Yousef
2022-08-17  9:48             ` Vincent Guittot
2022-07-20  7:23   ` Xuewen Yan
2022-07-21 14:11     ` Qais Yousef
2022-06-29 19:46 ` [PATCH 3/7] sched/uclamp: Fix fits_capacity() check in feec() Qais Yousef
2022-07-20  7:30   ` Xuewen Yan
2022-07-21 14:19     ` Qais Yousef
2022-06-29 19:46 ` [PATCH 4/7] sched/uclamp: Make select_idle_capacity() use util_fits_cpu() Qais Yousef
2022-06-29 19:46 ` [PATCH 5/7] sched/uclamp: Make asym_fits_capacity() " Qais Yousef
2022-06-29 19:46 ` [PATCH 6/7] sched/uclamp: Make cpu_overutilized() " Qais Yousef
2022-06-29 19:46 ` [PATCH 7/7] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition Qais Yousef
2022-07-20  7:39   ` Xuewen Yan
2022-07-21 14:24     ` Qais Yousef
2022-07-22  1:09       ` Xuewen Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220722151300.GA30193@vingu-book \
    --to=vincent.guittot@linaro.org \
    --cc=Jonathan.JMChen@mediatek.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=han.lin@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=wvw@google.com \
    --cc=xuewen.yan94@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox