From: Qais Yousef <qyousef@layalina.io>
To: Christian Loehle <christian.loehle@arm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Hongyan Xia <hongyan.xia2@arm.com>,
John Stultz <jstultz@google.com>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5] sched: Consolidate cpufreq updates
Date: Sun, 9 Jun 2024 23:33:46 +0100
Message-ID: <20240609223346.4xlkcze3fg2bhhcn@airbuntu>
In-Reply-To: <1b44938c-9535-47e7-8cbc-2b844e5dfdff@arm.com>
On 06/05/24 13:24, Christian Loehle wrote:
> On 5/30/24 11:46, Qais Yousef wrote:
> > Improve the interaction with cpufreq governors by making the
> > cpufreq_update_util() calls more intentional.
> >
> > At the moment we send them when load is updated for CFS, bandwidth for
> > DL and at enqueue/dequeue for RT. But this can lead to too many updates
> > sent in a short period of time and potentially be ignored at a critical
> > moment due to the rate_limit_us in schedutil.
> >
> > For example, simultaneous task enqueue on the CPU where 2nd task is
> > bigger and requires higher freq. The trigger to cpufreq_update_util() by
> > the first task will lead to dropping the 2nd request until tick. Or
> > another CPU in the same policy triggers a freq update shortly after.
> >
> > Updates at enqueue for RT are not strictly required. Though they do help
> > to reduce the delay for switching the frequency and the potential
> > observation of lower frequency during this delay. But current logic
> > doesn't intentionally (at least to my understanding) try to speed up the
> > request.
> >
> > To help reduce the amount of cpufreq updates and make them more
> > purposeful, consolidate them into these locations:
> >
> > 1. context_switch()
> > 2. task_tick_fair()
> > 3. update_blocked_averages()
> > 4. on syscall that changes policy or uclamp values
> >
> > The update at context switch should help guarantee that DL and RT get
> > the right frequency straightaway when they're RUNNING. As mentioned
> > though the update will happen slightly after enqueue_task(); though in
> > an ideal world these tasks should be RUNNING ASAP and this additional
> > delay should be negligible.
>
> Do we care at all about PREEMPT_NONE (and voluntary) here? I assume no.
> Anyway one scenario that should regress when we don't update at RT enqueue:
> (Essentially means that util of higher prio dominates over lower, if
> higher is enqueued first.)
> System:
> OPP 0, cap: 102, 100MHz; OPP 1, cap: 1024, 1000MHz
> RT task A prio=0 runtime@OPP1=1ms, uclamp_min=0; RT task B prio=1 runtime@OPP1=1ms, uclamp_min=1024
> rate_limit_us = freq transition delay = 1 (assume basically instant switch)
> Let's say CONFIG_HZ=100 for the tick to not get in the way, doesn't really matter.
>
> Before:
> t+0: Enqueue task A switch to OPP0
> Running A at OPP 0
> t+2us: Enqueue task B switch to OPP1
> t+1000us: Task A done, switch to task B.
> t+2000us: Task B done
>
> Now:
> t+0: Enqueue task A switch to OPP0
> Running A at OPP 0
> t+2us: Enqueue task B
> t+10000us: Task A done, switch to task B and OPP1
> t+11000us: Task B done
>
> Or am I missing something?
I think this is the correct behavior where each task gets to run at the correct
frequency, no?
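
For reference, the arithmetic behind the two quoted timelines can be sketched
roughly as below. This is only a back-of-the-envelope model: it assumes runtime
scales inversely with frequency, it ignores the 2us task A spends at OPP0 in
the "Before" case, and the helper names are made up for illustration.

```python
# OPP frequencies and runtimes are taken from the quoted example.
OPP_MHZ = {0: 100, 1: 1000}
RUNTIME_AT_OPP1_US = 1000  # each task needs 1ms when running at OPP1


def runtime_us(opp):
    """Runtime at a given OPP, assuming it scales inversely with frequency."""
    return RUNTIME_AT_OPP1_US * OPP_MHZ[1] // OPP_MHZ[opp]


def timeline(opp_a, opp_b):
    """Completion times (us) when A runs first at opp_a, then B at opp_b."""
    a_done = runtime_us(opp_a)
    b_done = a_done + runtime_us(opp_b)
    return a_done, b_done


before = timeline(1, 1)  # B's enqueue pulls the CPU to OPP1 while A runs
now = timeline(0, 1)     # frequency follows the task that is actually running
```

In this model `before` matches the quoted 1000us/2000us figures and `now`
matches the quoted 10000us/11000us figures.
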
Generally, if the system is so overloaded that RT tasks with the same priority
are likely to end up stuck on the same CPU for that long (i.e. no other CPU in
the system is able to pull one of the tasks), relying on frequency to save the
day is wrong IMO. Userspace must make sure not to starve such busy tasks with
a uclamp_min of 0 if an overloaded system is a likely scenario. And it needs to
manage priorities correctly so that these busy RT tasks do not hog the CPU when
something else finds this latency unacceptable.

Proper hard RT systems generally disable DVFS, as it introduces unacceptable
delays.

Note that with today's code Task B's request is most likely dropped, and both
tasks will end up running at OPP0.
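
To illustrate what I mean by dropped, here is a very simplified model of
a rate limit. It is not the actual schedutil code; the function name and the
2000us value are assumptions picked for illustration (the quoted example uses
rate_limit_us = 1, under which nothing would be dropped).

```python
def accepted_requests(requests, rate_limit_us):
    """Return the (time_us, opp) requests that pass a simple rate limit.

    A request is dropped if it arrives less than rate_limit_us after the
    last accepted one. Toy model only, not schedutil's real logic.
    """
    accepted = []
    last_time = None
    for time_us, opp in requests:
        if last_time is not None and time_us - last_time < rate_limit_us:
            continue  # too soon after the previous update: dropped
        accepted.append((time_us, opp))
        last_time = time_us
    return accepted


# Task A is enqueued at t+0 (asks for OPP0), task B at t+2us (asks for OPP1).
requests = [(0, 0), (2, 1)]

quoted_case = accepted_requests(requests, rate_limit_us=1)
assumed_case = accepted_requests(requests, rate_limit_us=2000)
```

With the assumed 2000us limit only task A's request survives, so the CPU stays
at OPP0 until something else triggers an update.
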
Cheers
--
Qais Yousef