From: Lukasz Luba <lukasz.luba@arm.com>
To: Quentin Perret <qperret@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
rafael.j.wysocki@intel.com, linux-kernel@vger.kernel.org,
qyousef@layalina.io, hongyan.xia2@arm.com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized
Date: Wed, 2 Oct 2024 10:54:21 +0100
Message-ID: <7e42e080-30d1-4175-b0c1-3999e34502ae@arm.com>
In-Reply-To: <Zvz8Z_EDI93l2GBt@google.com>
On 10/2/24 08:55, Quentin Perret wrote:
> Hey Lukasz,
>
> On Wednesday 02 Oct 2024 at 08:11:06 (+0100), Lukasz Luba wrote:
>> Hi Quentin and Vincent,
>>
>> On 10/1/24 18:50, Quentin Perret wrote:
>>> On Tuesday 01 Oct 2024 at 18:20:03 (+0200), Vincent Guittot wrote:
>>>> With commit 50181c0cff31 ("sched/pelt: Avoid underestimation of task
>>>> utilization"), util_est remains set to the value it had before the
>>>> task started sharing the CPU with other tasks, which means util_est
>>>> remains correct even if its util_avg decreases because of sharing the
>>>> CPU with other tasks. This was done to cover the cases that you
>>>> mention above where both util_avg and util_est were decreasing when
>>>> tasks start to share the CPU bandwidth with others
>>>
>>> I don't think I agree about the correctness of that util_est value at
>>> all. The above patch only makes it arbitrarily out of date in the truly
>>> overcommitted case. All the util-based heuristics we have in the
>>> scheduler are based around the assumption that the close future will
>>> look like the recent past, so using an arbitrarily old util-est is still
>>> incorrect. I can understand how this may work OK in RT-app or other
>>> use-cases with perfectly periodic tasks for their entire lifetime and
>>> such, but this doesn't work at all in the general case.
>>
>> I remember that commit Vincent mentioned above. That was from a web
>> browser test 'Speedometer', not rt-app. The browser has to run the
>> same 'computation problem' but with quite a lot of JavaScript
>> frameworks. Those frameworks mainly run in the browser main thread,
>> with some helper threads in background.
>>
>> So it was not purely rt-app or some other perfectly periodic task.
>> Although, IIRC, Vincent was able to build a model based on rt-app
>> to tackle that issue.
>>
>> That patch helped to better reflect the situation in the OS.
>
> Sure thing, I'm absolutely ready to believe that an old util-est value
> will be better in certain use-cases, but again I don't think we should
> conflate this for the general case. In particular a util-est that was
> measured when the system was lightly loaded is absolutely not guaranteed
> to be valid while it is overcommitted. Freshness matters in many cases.
I think I got your point, fair enough.
>
>> For this particular _subject_ I don't think it's relevant, though.
>> It was actually helping to show that the situation is worse, i.e.
>> closer to OU, because the task was bigger (and we avoid EAS).
>>
>>>
>>>> And feec() will return -1 for that case because util_est remains high
>>>
>>> And again, checking that a task fits is broken to start with if we don't
>>> know how big the task is. When we have reasons to believe that the util
>>> values are no longer correct (and the absence of idle time is a very
>>> good reason for that) we just need to give up on them. The fact that we
>>> have to resort to using out-of-date data to sort of make that work is
>>> just another proof that this is not a good idea in the general case.
>>>
>>>> the commit that I mentioned above covers those cases and the task will
>>>> not incorrectly fit on another smaller CPU because its util_est is
>>>> preserved during the overutilized phase
>>>
>>> There are other reasons why a task may look like it fits, e.g. two tasks
>>> coscheduled on a big CPU get 50% util each, then we migrate one away, the
>>> CPU looks half empty. Is it half empty? We've got no way to tell until
>>> we see idle time. The current util_avg and old util_est value are just
>>> not helpful, they're both bad signals and we should just discard them.
>>
>> So would you then reset them to 0? Or leave them as they are?
>> What about the other signals (cpu runqueue) which are derived from them?
>> That sounds like a really heavy change, or an inconsistency in many places.
>
> I would just leave them as they are, but not look at them, pretty much
> like we do today. In the overcommitted case, load is a superior signal
> because it accounts for runnable time and the task weights, so we really
> ought to use that instead of util.
OK, makes sense, thanks. Sounds like a valid plan to try then.
>
>>>
>>> So again I do feel like the best way forward would be to change the
>>> nature of the OU threshold to actually ask cpuidle 'when was the last
>>> time there was idle time?' (or possibly cache that in the idle task
>>> directly). And then based on that we can decide whether we want to enter
>>> feec() and do util-based decision, or to kick the push-pull mechanism in
>>> your other patches, things like that. That would solve/avoid the problem
>>> I mentioned in the previous paragraph and make the OU detection more
>>> robust. We could also consider using different thresholds in different
>>> places to re-enable load-balancing earlier, and give up on feec() a bit
>>> later to avoid messing the entire task placement when we're only
>>> transiently OU because of misfit. But eventually, we really need to just
>>> give up on util values altogether when we're really overcommitted, it's
>>> really an invariant we need to keep.
>>
>> IMHO the problem here with OU was amplified recently due to the
>> Uclamp_max setting
>
> Ack.
>
>> 'Max aggregation policy'
>
> Ack.
>
>> aggressive frequency capping
>
> What do you mean by that?
>
>> fast freq switching.
>
> And not sure what fast switching has to do with the issue here?
I mean, with some recent changes flying around on LKML we are heading
toward a kind of 'per-task DVFS', i.e. switching the frequency 'just for
that task' when it is scheduled. This has been concerning me. I think we
used to try to give the scheduler a 'planning' view of the performance
the CPUs will request in the future. The future is hard to predict;
sometimes even the +20% CPU freq margin was helping us (when we run a
bit longer than our prediction).
With this approach of tackling all of the 'safety margins' to save
more power, I'm worried about harming normal general scheduling
and performance.
I'm a big fan of saving energy, but not of pushing it so hard that
general scheduling concepts might suffer.
E.g. this _subject_, EAS when OU, is where I'm careful.
>
>> Now we are in the situation where we complain about util metrics...
>>
>> I've been warning Qais and Vincent that this usage of Uclamp_max
>> in such an environment is dangerous and might explode.
>
> I absolutely agree that uclamp max makes a huge mess of things, and util
> in particular :-(
>
>> If one background task is capped hard in CPU freq, but does computation
>> 'all the time', leaving that CPU no idle time - then IMO
>> this is not good scheduling. This is a recipe for starvation.
>> You probably won't find any better metric.
>>
>> I would suggest we stop making the OU situation worse and more
>> frequent with this 'artificial starvation via uclamp_max'.
>>
>> I understand we want to save energy, but uclamp_max in its current
>> shape has too many side effects IMO.
>>
>> Why haven't we invested in the 'Bandwidth controller', e.g. to make
>> it big.LITTLE aware (if that could be a problem)? It has been there
>> for many years.
>
> Bandwidth control is a different thing really, not sure it can be used
> interchangeably with uclamp_max in general. Running all the time at low
> frequency is often going to be better from a power perspective than
> running uncapped for a fixed period of time.
>
> I think the intention of uclamp max is really to say 'these tasks have
> low QoS, use spare cycles at low-ish frequency to run them'. What we
> found was that it was best to use cpu.shares in conjunction with
> uclamp.max to implement the 'use spare cycles' part of the previous
> statement, but that was its own can of worms and caused a lot of
> priority inversion problems. Hopefully the proxy exec stuff will solve
> that...
>
Yes, I see your point. It looks like some new ideas would be very welcome.