From: Stephan Gerhold <stephan.gerhold@linaro.org>
To: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
regressions@lists.linux.dev, Johan Hovold <johan@kernel.org>
Subject: Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
Date: Tue, 8 Apr 2025 18:48:06 +0200 [thread overview]
Message-ID: <Z_VTRspvmOUfrawh@linaro.org> (raw)
In-Reply-To: <Z_U_LN0AtH_n4YtE@sultan-box.localdomain>
Hi Sultan,
On Tue, Apr 08, 2025 at 08:22:20AM -0700, Sultan Alsawaf wrote:
> On Tue, Apr 08, 2025 at 10:59:31AM +0200, Stephan Gerhold wrote:
> > On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > >
> > > A redundant frequency update is only truly needed when there is a policy
> > > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > >
> > > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > > frequency update _all the time_, not just for a policy limits change,
> > > because need_freq_update is never cleared.
> > >
> > > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > > to a redundant frequency update, regardless of whether or not the driver
> > > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > > same as the current one.
> > >
> > > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > > when there's a policy limits change, and clearing need_freq_update when a
> > > requisite redundant update occurs.
> > >
> > > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > > and instead setting need_freq_update to false in sugov_update_next_freq().
> > >
> > > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > > ---
> > > kernel/sched/cpufreq_schedutil.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 28c77904ea74..e51d5ce730be 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > >
> > > if (unlikely(sg_policy->limits_changed)) {
> > > sg_policy->limits_changed = false;
> > > - sg_policy->need_freq_update = true;
> > > + sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > return true;
> > > }
> > >
> > > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > if (sg_policy->need_freq_update)
> > > - sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > + sg_policy->need_freq_update = false;
> > > else if (sg_policy->next_freq == next_freq)
> > > return false;
> > >
> >
> > This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> > cpufreq drivers that:
> >
> > - Have policy->fast_switch_enabled/fast_switch_possible set, but
> > - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
> >
> > There are several examples for this in the tree (search for
> > "fast_switch_possible"). Of all those drivers, only intel-pstate and
> > amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
> >
> > I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> > laptop:
> >
> > 1. I added some low temperature trip points in the device tree,
> > together with passive cpufreq cooling.
> > 2. I run a CPU stress test on all CPUs and monitor the temperatures
> > and CPU frequencies.
> >
> > When using "performance" governor instead of "schedutil", the CPU
> > frequencies are being throttled as expected, as soon as the temperature
> > trip points are reached.
> >
> > When using "schedutil", the CPU frequencies stay at maximum as long as
> > the stress test is running. No throttling happens, so the device heats
> > up far beyond the defined temperature trip points. Throttling is applied
> > only after stopping the stress test, since this forces schedutil to
> > re-evaluate the CPU frequency.
> >
> > Reverting this commit fixes the problem.
> >
> > Looking at the code, I think the problem is that:
> > - sg_policy->limits_changed does not result in
> > sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
> > anymore, and
> > - Without sg->policy->need_freq_update, get_next_freq() skips calling
> > cpufreq_driver_resolve_freq(), which would normally apply the policy
> > min/max constraints.
> >
> > Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> > that set policy->fast_switch_possible? If I'm reading the documentation
> > comment correctly, that flag is just supposed to enable notifications if
> > the policy min/max changes, but the resolved target frequency is still
> > the same. This is not the case here, the target frequency needs to be
> > throttled, but schedutil isn't applying the new limits.
> >
> > Any suggestions how to fix this? I'm happy to test patches with my
> > setup.
>
> Thank you for reporting this. As I see it, sg_policy->need_freq_update is
> working correctly now; however, sg_policy->limits_changed relied on the broken
> behavior of sg_policy->need_freq_update and therefore sg_policy->limits_changed
> needs to be fixed.
Thanks for the quick reply and the patch!
>
> Can you try this patch:
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 1a19d69b91ed3..f37b999854d52 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> return false;
>
> if (unlikely(sg_policy->limits_changed)) {
> - sg_policy->limits_changed = false;
> sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> return true;
> }
> @@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> freq = get_capacity_ref_freq(policy);
> freq = map_util_freq(util, freq, max);
>
> - if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> + if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
> + !sg_policy->need_freq_update)
> return sg_policy->next_freq;
>
> + sg_policy->limits_changed = false;
> sg_policy->cached_raw_freq = freq;
> return cpufreq_driver_resolve_freq(policy, freq);
> }
>
This is working correctly for me, CPU frequency is being throttled again
when the temperature trip points are reached. If you send this, feel
free to add:
Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Thanks!
Stephan
next prev parent reply other threads:[~2025-04-08 16:48 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-12 1:57 [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Sultan Alsawaf
2024-12-12 1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
2024-12-12 13:24 ` Christian Loehle
2024-12-14 2:35 ` Sultan Alsawaf (unemployed)
2024-12-18 15:10 ` Rafael J. Wysocki
2025-04-08 8:59 ` Stephan Gerhold
2025-04-08 15:22 ` Sultan Alsawaf
2025-04-08 16:48 ` Stephan Gerhold [this message]
2025-04-09 11:25 ` Xuewen Yan
2025-04-09 11:48 ` Xuewen Yan
2025-04-10 1:49 ` Sultan Alsawaf
2025-04-10 2:06 ` Xuewen Yan
2025-04-10 2:08 ` Sultan Alsawaf
2025-04-10 2:13 ` Xuewen Yan
2025-04-10 2:22 ` Sultan Alsawaf
2025-04-10 2:30 ` Xuewen Yan
2025-04-10 2:33 ` Sultan Alsawaf
2025-04-10 2:42 ` Xuewen Yan
2025-04-10 1:52 ` Sultan Alsawaf
2025-04-10 19:16 ` Rafael J. Wysocki
2024-12-12 12:06 ` [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Christian Loehle
2024-12-14 2:15 ` Sultan Alsawaf (unemployed)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z_VTRspvmOUfrawh@linaro.org \
--to=stephan.gerhold@linaro.org \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=johan@kernel.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=regressions@lists.linux.dev \
--cc=rostedt@goodmis.org \
--cc=sultan@kerneltoast.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.