[PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present
@ 2024-12-12  1:57 Sultan Alsawaf
  2024-12-12  1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
  2024-12-12 12:06 ` [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Christian Loehle
  0 siblings, 2 replies; 22+ messages in thread
From: Sultan Alsawaf @ 2024-12-12  1:57 UTC (permalink / raw)
  Cc: Sultan Alsawaf (unemployed), Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-pm, linux-kernel

From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>

When schedutil disregards a frequency transition due to the transition rate
limit, there is no guaranteed deadline as to when the frequency transition
will actually occur after the rate limit expires. For instance, depending
on how long a CPU spends in a preempt/IRQs disabled context, a rate-limited
frequency transition may be delayed indefinitely, until said CPU reaches
the scheduler again. This also hurts tasks boosted via UCLAMP_MIN.

For frequency transitions _down_, this only poses a theoretical loss of
energy savings since a CPU may remain at a higher frequency than necessary
for an indefinite period beyond the rate limit expiry.

For frequency transitions _up_, however, this poses a significant hit to
performance when a CPU is stuck at an insufficient frequency for an
indefinitely long time. In latency-sensitive and bursty workloads
especially, a missed frequency transition up can result in a significant
performance loss due to a CPU operating at an insufficient frequency for
too long.

When support for the Frequency Invariant Engine (FIE) _isn't_ present, a
rate limit is always required for the scheduler to compute CPU utilization
with some semblance of accuracy: any frequency transition that occurs
before the previous transition latches would result in the scheduler not
knowing the frequency a CPU is actually operating at, thereby trashing the
computed CPU utilization.

However, when FIE support _is_ present, there's no technical requirement to
rate limit all frequency transitions to a cpufreq driver's reported
transition latency. With FIE, the scheduler's CPU utilization tracking is
unaffected by any frequency transitions that occur before the previous
frequency is latched.

Therefore, ignore the frequency transition rate limit when scaling up on
systems where FIE is present. This guarantees that transitions to a higher
frequency cannot be indefinitely delayed, since they simply cannot be
delayed at all.

Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
---
 kernel/sched/cpufreq_schedutil.c | 35 +++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e51d5ce730be..563baab89a24 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -59,10 +59,15 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

 /************************ Governor internals ***********************/

-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
+static bool sugov_should_rate_limit(struct sugov_policy *sg_policy, u64 time)
 {
-	s64 delta_ns;
+	s64 delta_ns = time - sg_policy->last_freq_update_time;
+
+	return delta_ns < sg_policy->freq_update_delay_ns;
+}

+static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
+{
 	/*
 	 * Since cpufreq_update_util() is called with rq->lock held for
 	 * the @target_cpu, our per-CPU data is fully serialized.
@@ -87,17 +92,37 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 		return true;
 	}

-	delta_ns = time - sg_policy->last_freq_update_time;
+	/*
+	 * When frequency-invariant utilization tracking is present, there's no
+	 * rate limit when increasing frequency. Therefore, the next frequency
+	 * must be determined before a decision can be made to rate limit the
+	 * frequency change, hence the rate limit check is bypassed here.
+	 */
+	if (arch_scale_freq_invariant())
+		return true;

-	return delta_ns >= sg_policy->freq_update_delay_ns;
+	return !sugov_should_rate_limit(sg_policy, time);
 }

 static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
 				   unsigned int next_freq)
 {
+	/*
+	 * When a frequency update isn't mandatory (!need_freq_update), the rate
+	 * limit is checked again upon frequency reduction because systems with
+	 * frequency-invariant utilization bypass the rate limit check entirely
+	 * in sugov_should_update_freq(). This is done so that the rate limit
+	 * can be applied only for frequency reduction when frequency-invariant
+	 * utilization is present. Now that the next frequency is known, the
+	 * rate limit can be selectively applied to frequency reduction on such
+	 * systems. A check for arch_scale_freq_invariant() is omitted here
+	 * because unconditionally rechecking the rate limit is cheaper.
+	 */
 	if (sg_policy->need_freq_update)
 		sg_policy->need_freq_update = false;
-	else if (sg_policy->next_freq == next_freq)
+	else if (next_freq == sg_policy->next_freq ||
+		 (next_freq < sg_policy->next_freq &&
+		  sugov_should_rate_limit(sg_policy, time)))
 		return false;

 	sg_policy->next_freq = next_freq;
-- 
2.47.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2024-12-12  1:57 [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Sultan Alsawaf
@ 2024-12-12  1:57 ` Sultan Alsawaf
  2024-12-12 13:24   ` Christian Loehle
  2025-04-08  8:59   ` Stephan Gerhold
  2024-12-12 12:06 ` [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Christian Loehle
  1 sibling, 2 replies; 22+ messages in thread
From: Sultan Alsawaf @ 2024-12-12  1:57 UTC (permalink / raw)
  Cc: Sultan Alsawaf (unemployed), Rafael J. Wysocki, Viresh Kumar,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-pm, linux-kernel

From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>

A redundant frequency update is only truly needed when there is a policy
limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.

In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
frequency update _all the time_, not just for a policy limits change,
because need_freq_update is never cleared.

Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
to a redundant frequency update, regardless of whether or not the driver
specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
same as the current one.

Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
when there's a policy limits change, and clearing need_freq_update when a
requisite redundant update occurs.

This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
and instead setting need_freq_update to false in sugov_update_next_freq().

Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
---
 kernel/sched/cpufreq_schedutil.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 28c77904ea74..e51d5ce730be 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

 	if (unlikely(sg_policy->limits_changed)) {
 		sg_policy->limits_changed = false;
-		sg_policy->need_freq_update = true;
+		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
 		return true;
 	}

@@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
 				   unsigned int next_freq)
 {
 	if (sg_policy->need_freq_update)
-		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+		sg_policy->need_freq_update = false;
 	else if (sg_policy->next_freq == next_freq)
 		return false;

-- 
2.47.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present
  2024-12-12  1:57 [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Sultan Alsawaf
  2024-12-12  1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
@ 2024-12-12 12:06 ` Christian Loehle
  2024-12-14  2:15   ` Sultan Alsawaf (unemployed)
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Loehle @ 2024-12-12 12:06 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel, Beata Michalska, Qais Yousef, Pierre Gondois,
	Lukasz Luba

On 12/12/24 01:57, Sultan Alsawaf wrote:
> From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>

Hi Sultan,
(Adding further CCs that might be interested in this)
Good to see some input here again, if it becomes a regular thing,
which I would welcome, you might want to look into git send-email
or b4 relay, at least in my inbox this series looks strange.
Also no cover letter.

> 
> When schedutil disregards a frequency transition due to the transition rate
> limit, there is no guaranteed deadline as to when the frequency transition
> will actually occur after the rate limit expires. For instance, depending
> on how long a CPU spends in a preempt/IRQs disabled context, a rate-limited
> frequency transition may be delayed indefinitely, until said CPU reaches
> the scheduler again. This also hurts tasks boosted via UCLAMP_MIN.

Realistically this will be the tick at worst for the systems you care about,
I assume.

> 
> For frequency transitions _down_, this only poses a theoretical loss of
> energy savings since a CPU may remain at a higher frequency than necessary
> for an indefinite period beyond the rate limit expiry.

theoretical?

> 
> For frequency transitions _up_, however, this poses a significant hit to
> performance when a CPU is stuck at an insufficient frequency for an
> indefinitely long time. In latency-sensitive and bursty workloads
> especially, a missed frequency transition up can result in a significant
> performance loss due to a CPU operating at an insufficient frequency for
> too long.

The term significant implies you have some numbers, please go ahead and
share them then.

I'm not sure if you're aware of Qais' series that also intends to ignore the
rate-limit (in certain cases).
https://lore.kernel.org/lkml/20240728184551.42133-1-qyousef@layalina.io/
I would mostly agree with your below argument regarding FIE and did lean
towards dropping it altogether. The main issue I had with Qais' series
was !fast_switch platforms and the resulting preemption by the DL task
(Often on a remote, possibly idle CPU of the same perf-domain).
Unlimited frequency updates can really hurt both power and throughput there.

Fortunately given the low-pass filter nature of PELT, without some special
workloads, most calls to cpufreq_update_util() are dropped because there
is no resulting frequency change anyway.

You keeping the rate-limit when reducing the frequency might be enough to
work around the issue though. I will run some experiments like I did for
Qais' series.

I'll also go and check for unintended changes in iowait boost (that
interacts with the rate-limit too).

> 
> When support for the Frequency Invariant Engine (FIE) _isn't_ present, a
> rate limit is always required for the scheduler to compute CPU utilization
> with some semblance of accuracy: any frequency transition that occurs
> before the previous transition latches would result in the scheduler not
> knowing the frequency a CPU is actually operating at, thereby trashing the
> computed CPU utilization.
> 
> However, when FIE support _is_ present, there's no technical requirement to
> rate limit all frequency transitions to a cpufreq driver's reported
> transition latency. With FIE, the scheduler's CPU utilization tracking is
> unaffected by any frequency transitions that occur before the previous
> frequency is latched.
> 
> Therefore, ignore the frequency transition rate limit when scaling up on
> systems where FIE is present. This guarantees that transitions to a higher
> frequency cannot be indefinitely delayed, since they simply cannot be
> delayed at all.
> 
> Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 35 +++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e51d5ce730be..563baab89a24 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -59,10 +59,15 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
>  
>  /************************ Governor internals ***********************/
>  
> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> +static bool sugov_should_rate_limit(struct sugov_policy *sg_policy, u64 time)
>  {
> -	s64 delta_ns;
> +	s64 delta_ns = time - sg_policy->last_freq_update_time;
> +
> +	return delta_ns < sg_policy->freq_update_delay_ns;
> +}
>  
> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> +{
>  	/*
>  	 * Since cpufreq_update_util() is called with rq->lock held for
>  	 * the @target_cpu, our per-CPU data is fully serialized.
> @@ -87,17 +92,37 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  		return true;
>  	}
>  
> -	delta_ns = time - sg_policy->last_freq_update_time;
> +	/*
> +	 * When frequency-invariant utilization tracking is present, there's no
> +	 * rate limit when increasing frequency. Therefore, the next frequency
> +	 * must be determined before a decision can be made to rate limit the
> +	 * frequency change, hence the rate limit check is bypassed here.
> +	 */
> +	if (arch_scale_freq_invariant())
> +		return true;
>  
> -	return delta_ns >= sg_policy->freq_update_delay_ns;
> +	return !sugov_should_rate_limit(sg_policy, time);
>  }
>  
>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>  				   unsigned int next_freq)
>  {
> +	/*
> +	 * When a frequency update isn't mandatory (!need_freq_update), the rate
> +	 * limit is checked again upon frequency reduction because systems with
> +	 * frequency-invariant utilization bypass the rate limit check entirely
> +	 * in sugov_should_update_freq(). This is done so that the rate limit
> +	 * can be applied only for frequency reduction when frequency-invariant
> +	 * utilization is present. Now that the next frequency is known, the
> +	 * rate limit can be selectively applied to frequency reduction on such
> +	 * systems. A check for arch_scale_freq_invariant() is omitted here
> +	 * because unconditionally rechecking the rate limit is cheaper.
> +	 */
>  	if (sg_policy->need_freq_update)
>  		sg_policy->need_freq_update = false;
> -	else if (sg_policy->next_freq == next_freq)
> +	else if (next_freq == sg_policy->next_freq ||
> +		 (next_freq < sg_policy->next_freq &&
> +		  sugov_should_rate_limit(sg_policy, time)))
>  		return false;
>  
>  	sg_policy->next_freq = next_freq;

Code seems to match your description FWIW.
Maybe the comments could be trimmed somewhat.

Regards,
Christian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2024-12-12  1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
@ 2024-12-12 13:24   ` Christian Loehle
  2024-12-14  2:35     ` Sultan Alsawaf (unemployed)
  2024-12-18 15:10     ` Rafael J. Wysocki
  2025-04-08  8:59   ` Stephan Gerhold
  1 sibling, 2 replies; 22+ messages in thread
From: Christian Loehle @ 2024-12-12 13:24 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel

On 12/12/24 01:57, Sultan Alsawaf wrote:
> From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> 
> A redundant frequency update is only truly needed when there is a policy
> limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> 
> In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> frequency update _all the time_, not just for a policy limits change,
> because need_freq_update is never cleared.
> 
> Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> to a redundant frequency update, regardless of whether or not the driver
> specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> same as the current one.
> 
> Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> when there's a policy limits change, and clearing need_freq_update when a
> requisite redundant update occurs.
> 
> This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> and instead setting need_freq_update to false in sugov_update_next_freq().
>

Good catch!
Fixes:
600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")


> Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>

Reviewed-by: Christian Loehle <christian.loehle@arm.com>

> ---
>  kernel/sched/cpufreq_schedutil.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 28c77904ea74..e51d5ce730be 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  
>  	if (unlikely(sg_policy->limits_changed)) {
>  		sg_policy->limits_changed = false;
> -		sg_policy->need_freq_update = true;
> +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);>  		return true;
>  	}
>  
> @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>  				   unsigned int next_freq)
>  {
>  	if (sg_policy->need_freq_update)
> -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> +		sg_policy->need_freq_update = false;
>  	else if (sg_policy->next_freq == next_freq)
>  		return false;

I guess you could rewrite this into just one if like

---

	if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update))
		return false;

	sg_policy->need_freq_update = false
	sg_policy->next_freq = next_freq;
	sg_policy->last_freq_update_time = time;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present
  2024-12-12 12:06 ` [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Christian Loehle
@ 2024-12-14  2:15   ` Sultan Alsawaf (unemployed)
  0 siblings, 0 replies; 22+ messages in thread
From: Sultan Alsawaf (unemployed) @ 2024-12-14  2:15 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel, Beata Michalska, Qais Yousef, Pierre Gondois,
	Lukasz Luba

On Thu, Dec 12, 2024 at 12:06:20PM +0000, Christian Loehle wrote:
> On 12/12/24 01:57, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> 
> Hi Sultan,
> (Adding further CCs that might be interested in this)
> Good to see some input here again, if it becomes a regular thing,
> which I would welcome, you might want to look into git send-email
> or b4 relay, at least in my inbox this series looks strange.
> Also no cover letter.

Hi Christian,

Thank you for the encouraging words! :-)

Sorry about the strangeness. I knew better and should've sent a cover letter;
when I saw how the series looked on the list a few minutes after sending it, I
grimaced and realized my mistake. FWIW, I did use git send-email, but I hadn't
heard about b4; thanks for the tip!

> > 
> > When schedutil disregards a frequency transition due to the transition rate
> > limit, there is no guaranteed deadline as to when the frequency transition
> > will actually occur after the rate limit expires. For instance, depending
> > on how long a CPU spends in a preempt/IRQs disabled context, a rate-limited
> > frequency transition may be delayed indefinitely, until said CPU reaches
> > the scheduler again. This also hurts tasks boosted via UCLAMP_MIN.
> 
> Realistically this will be the tick at worst for the systems you care about,
> I assume.

Typically, yes, especially so with PREEMPT_RT. It can get quite bad otherwise if
preempt/IRQs are disabled for a while, e.g. a common offender is zone->lock in
the page allocator.

> > 
> > For frequency transitions _down_, this only poses a theoretical loss of
> > energy savings since a CPU may remain at a higher frequency than necessary
> > for an indefinite period beyond the rate limit expiry.
> 
> theoretical?

I can't think of a realistic way to measure these lost energy savings, much less
reclaim them.

> > 
> > For frequency transitions _up_, however, this poses a significant hit to
> > performance when a CPU is stuck at an insufficient frequency for an
> > indefinitely long time. In latency-sensitive and bursty workloads
> > especially, a missed frequency transition up can result in a significant
> > performance loss due to a CPU operating at an insufficient frequency for
> > too long.
> 
> The term significant implies you have some numbers, please go ahead and
> share them then.

On a Pixel 9 with fast_switch (that I implemented myself), and AMU-driven FIE:

Test command:

	taskset 40 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched messaging -g 50

The last Cortex-A720 core in the PD of three A720 cores is tested, with
rate_limit_us set to 2000.

Before this patch, schedutil:
----------------------------
 Performance counter stats for 'perf bench sched messaging -g 50' (10 runs):

        9121062386      cycles                           #    2.541 GHz                      ( +-  1.17% )
       10863940366      instructions                     #    1.22  insn per cycle           ( +-  0.20% )
           3577.62 msec task-clock                       #    0.991 CPUs utilized            ( +-  0.82% )

            3.6108 +- 0.0294 seconds time elapsed  ( +-  0.81% )

After this patch, schedutil:
----------------------------
 Performance counter stats for 'perf bench sched messaging -g 50' (10 runs):

        8577233522      cycles                           #    2.538 GHz                      ( +-  0.73% )
       10514835388      instructions                     #    1.26  insn per cycle           ( +-  0.08% )
           3533.64 msec task-clock                       #    1.039 CPUs utilized            ( +-  0.67% )

            3.4005 +- 0.0238 seconds time elapsed  ( +-  0.70% )

With performance governor:
--------------------------
 Performance counter stats for 'perf bench sched messaging -g 50' (10 runs):

        8712516777      cycles                           #    2.653 GHz                      ( +-  0.84% )
       10543289262      instructions                     #    1.24  insn per cycle           ( +-  0.10% )
           3358.74 msec task-clock                       #    1.017 CPUs utilized            ( +-  0.84% )

            3.3028 +- 0.0283 seconds time elapsed  ( +-  0.86% )

There is an improvement of about 6% with schedutil after this patch. These
results are very consistent across several runs.

> I'm not sure if you're aware of Qais' series that also intends to ignore the
> rate-limit (in certain cases).
> https://lore.kernel.org/lkml/20240728184551.42133-1-qyousef@layalina.io/

I was unaware, thanks for the link.

I read through the cover letter and code, and my initial thought is that
task_tick_fair() is too slow to cover fair tasks' needs. Given that a PELT
period is ~1 ms, there can be several load updates in between ticks.

I also think that Qais' series permits too many frequency updates too quickly,
for example when switching from RT/DL to a fair task. On systems with a lot of
threaded IRQs (or PREEMPT_RT), I can imagine this results in the frequency
bouncing around a lot.

> I would mostly agree with your below argument regarding FIE and did lean
> towards dropping it altogether. The main issue I had with Qais' series
> was !fast_switch platforms and the resulting preemption by the DL task
> (Often on a remote, possibly idle CPU of the same perf-domain).
> Unlimited frequency updates can really hurt both power and throughput there.
> 
> Fortunately given the low-pass filter nature of PELT, without some special
> workloads, most calls to cpufreq_update_util() are dropped because there
> is no resulting frequency change anyway.
> 
> You keeping the rate-limit when reducing the frequency might be enough to
> work around the issue though. I will run some experiments like I did for
> Qais' series.

Yeah, my thinking is that always allowing updates _only_ when increasing the
frequency naturally bounds the number of possible updates in a given period. You
can't keep going up forever, unless you have a CPU with a ridiculously huge
number of discrete frequency steps. :-)

> I'll also go and check for unintended changes in iowait boost (that
> interacts with the rate-limit too).

Thanks!

> > 
> > When support for the Frequency Invariant Engine (FIE) _isn't_ present, a
> > rate limit is always required for the scheduler to compute CPU utilization
> > with some semblance of accuracy: any frequency transition that occurs
> > before the previous transition latches would result in the scheduler not
> > knowing the frequency a CPU is actually operating at, thereby trashing the
> > computed CPU utilization.
> > 
> > However, when FIE support _is_ present, there's no technical requirement to
> > rate limit all frequency transitions to a cpufreq driver's reported
> > transition latency. With FIE, the scheduler's CPU utilization tracking is
> > unaffected by any frequency transitions that occur before the previous
> > frequency is latched.
> > 
> > Therefore, ignore the frequency transition rate limit when scaling up on
> > systems where FIE is present. This guarantees that transitions to a higher
> > frequency cannot be indefinitely delayed, since they simply cannot be
> > delayed at all.
> > 
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 35 +++++++++++++++++++++++++++-----
> >  1 file changed, 30 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e51d5ce730be..563baab89a24 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -59,10 +59,15 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> >  
> >  /************************ Governor internals ***********************/
> >  
> > -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > +static bool sugov_should_rate_limit(struct sugov_policy *sg_policy, u64 time)
> >  {
> > -	s64 delta_ns;
> > +	s64 delta_ns = time - sg_policy->last_freq_update_time;
> > +
> > +	return delta_ns < sg_policy->freq_update_delay_ns;
> > +}
> >  
> > +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > +{
> >  	/*
> >  	 * Since cpufreq_update_util() is called with rq->lock held for
> >  	 * the @target_cpu, our per-CPU data is fully serialized.
> > @@ -87,17 +92,37 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  		return true;
> >  	}
> >  
> > -	delta_ns = time - sg_policy->last_freq_update_time;
> > +	/*
> > +	 * When frequency-invariant utilization tracking is present, there's no
> > +	 * rate limit when increasing frequency. Therefore, the next frequency
> > +	 * must be determined before a decision can be made to rate limit the
> > +	 * frequency change, hence the rate limit check is bypassed here.
> > +	 */
> > +	if (arch_scale_freq_invariant())
> > +		return true;
> >  
> > -	return delta_ns >= sg_policy->freq_update_delay_ns;
> > +	return !sugov_should_rate_limit(sg_policy, time);
> >  }
> >  
> >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >  				   unsigned int next_freq)
> >  {
> > +	/*
> > +	 * When a frequency update isn't mandatory (!need_freq_update), the rate
> > +	 * limit is checked again upon frequency reduction because systems with
> > +	 * frequency-invariant utilization bypass the rate limit check entirely
> > +	 * in sugov_should_update_freq(). This is done so that the rate limit
> > +	 * can be applied only for frequency reduction when frequency-invariant
> > +	 * utilization is present. Now that the next frequency is known, the
> > +	 * rate limit can be selectively applied to frequency reduction on such
> > +	 * systems. A check for arch_scale_freq_invariant() is omitted here
> > +	 * because unconditionally rechecking the rate limit is cheaper.
> > +	 */
> >  	if (sg_policy->need_freq_update)
> >  		sg_policy->need_freq_update = false;
> > -	else if (sg_policy->next_freq == next_freq)
> > +	else if (next_freq == sg_policy->next_freq ||
> > +		 (next_freq < sg_policy->next_freq &&
> > +		  sugov_should_rate_limit(sg_policy, time)))
> >  		return false;
> >  
> >  	sg_policy->next_freq = next_freq;
> 
> Code seems to match your description FWIW.
> Maybe the comments could be trimmed somewhat.

How's this for the sugov_update_next_freq() comment?
	/*
	 * When a frequency update isn't mandatory, the rate limit is checked
	 * again upon frequency reduction because systems with frequency
	 * invariance bypass the rate limit check in sugov_should_update_freq().
	 * This is done so that the rate limit can be applied only for frequency
	 * reduction. A check for arch_scale_freq_invariant() is omitted here
	 * because unconditionally rechecking the rate limit is cheaper.
	 */

> Regards,
> Christian

Thanks,
Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2024-12-12 13:24   ` Christian Loehle
@ 2024-12-14  2:35     ` Sultan Alsawaf (unemployed)
  2024-12-18 15:10     ` Rafael J. Wysocki
  1 sibling, 0 replies; 22+ messages in thread
From: Sultan Alsawaf (unemployed) @ 2024-12-14  2:35 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel

On Thu, Dec 12, 2024 at 01:24:41PM +0000, Christian Loehle wrote:
> On 12/12/24 01:57, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > 
> > A redundant frequency update is only truly needed when there is a policy
> > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > 
> > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > frequency update _all the time_, not just for a policy limits change,
> > because need_freq_update is never cleared.
> > 
> > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > to a redundant frequency update, regardless of whether or not the driver
> > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > same as the current one.
> > 
> > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > when there's a policy limits change, and clearing need_freq_update when a
> > requisite redundant update occurs.
> > 
> > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > and instead setting need_freq_update to false in sugov_update_next_freq().
> >
> 
> Good catch!
> Fixes:
> 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
> 
> 
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> 
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thanks for the review and digging up the bug provenance!

> > ---
> >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 28c77904ea74..e51d5ce730be 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  
> >  	if (unlikely(sg_policy->limits_changed)) {
> >  		sg_policy->limits_changed = false;
> > -		sg_policy->need_freq_update = true;
> > +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);>  		return true;
> >  	}
> >  
> > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >  				   unsigned int next_freq)
> >  {
> >  	if (sg_policy->need_freq_update)
> > -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +		sg_policy->need_freq_update = false;
> >  	else if (sg_policy->next_freq == next_freq)
> >  		return false;
> 
> I guess you could rewrite this into just one if like
> 
> ---
> 
> 	if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update))
> 		return false;
> 
> 	sg_policy->need_freq_update = false
> 	sg_policy->next_freq = next_freq;
> 	sg_policy->last_freq_update_time = time;

Hmm, asm seems worse since it adds an extra store to one of the branch targets:

Before:
-------
        00100020 e3 03 00 aa     mov        x3,x0
        00100024 00 a8 43 39     ldrb       w0,[x0, #0xea]
        00100028 3f 23 03 d5     paciasp
        0010002c c0 00 00 36     tbz        w0,#0x0,LAB_00100044
        00100030 7f a8 03 39     strb       wzr,[x3, #0xea]
                             LAB_00100034
        00100034 20 00 80 52     mov        w0,#0x1
        00100038 61 14 00 f9     str        x1,[x3, #0x28]
        0010003c 62 38 00 b9     str        w2,[x3, #0x38]
        00100040 ff 0b 5f d6     retaa
                             LAB_00100044
        00100044 64 38 40 b9     ldr        w4,[x3, #0x38]
        00100048 9f 00 02 6b     cmp        w4,w2
        0010004c 41 ff ff 54     b.ne       LAB_00100034
        00100050 ff 0b 5f d6     retaa

After:
------
        00100020 e3 03 00 aa     mov        x3,x0
        00100024 00 38 40 b9     ldr        w0,[x0, #0x38]
        00100028 3f 23 03 d5     paciasp
        0010002c 1f 00 02 6b     cmp        w0,w2
        00100030 c0 00 00 54     b.eq       LAB_00100048
                             LAB_00100034
        00100034 20 00 80 52     mov        w0,#0x1
        00100038 61 14 00 f9     str        x1,[x3, #0x28]
        0010003c 62 38 00 b9     str        w2,[x3, #0x38]
        00100040 7f a8 03 39     strb       wzr,[x3, #0xea]
        00100044 ff 0b 5f d6     retaa
                             LAB_00100048
        00100048 60 a8 43 39     ldrb       w0,[x3, #0xea]
        0010004c 40 ff 07 37     tbnz       w0,#0x0,LAB_00100034
        00100050 ff 0b 5f d6     retaa

Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2024-12-12 13:24   ` Christian Loehle
  2024-12-14  2:35     ` Sultan Alsawaf (unemployed)
@ 2024-12-18 15:10     ` Rafael J. Wysocki
  1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2024-12-18 15:10 UTC (permalink / raw)
  To: Christian Loehle, Sultan Alsawaf
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel

On Thu, Dec 12, 2024 at 2:24 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 12/12/24 01:57, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> >
> > A redundant frequency update is only truly needed when there is a policy
> > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> >
> > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > frequency update _all the time_, not just for a policy limits change,
> > because need_freq_update is never cleared.
> >
> > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > to a redundant frequency update, regardless of whether or not the driver
> > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > same as the current one.
> >
> > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > when there's a policy limits change, and clearing need_freq_update when a
> > requisite redundant update occurs.
> >
> > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > and instead setting need_freq_update to false in sugov_update_next_freq().
> >
>
> Good catch!
> Fixes:
> 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
>
>
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
>
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Applied with the above Fixes tag added as 6.14 material, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2024-12-12  1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
  2024-12-12 13:24   ` Christian Loehle
@ 2025-04-08  8:59   ` Stephan Gerhold
  2025-04-08 15:22     ` Sultan Alsawaf
  2025-04-10 19:16     ` Rafael J. Wysocki
  1 sibling, 2 replies; 22+ messages in thread
From: Stephan Gerhold @ 2025-04-08  8:59 UTC (permalink / raw)
  To: Sultan Alsawaf, Rafael J. Wysocki
  Cc: Viresh Kumar, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-pm, linux-kernel,
	regressions, Johan Hovold

Hi,

On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> 
> A redundant frequency update is only truly needed when there is a policy
> limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> 
> In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> frequency update _all the time_, not just for a policy limits change,
> because need_freq_update is never cleared.
> 
> Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> to a redundant frequency update, regardless of whether or not the driver
> specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> same as the current one.
> 
> Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> when there's a policy limits change, and clearing need_freq_update when a
> requisite redundant update occurs.
> 
> This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> and instead setting need_freq_update to false in sugov_update_next_freq().
> 
> Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 28c77904ea74..e51d5ce730be 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  
>  	if (unlikely(sg_policy->limits_changed)) {
>  		sg_policy->limits_changed = false;
> -		sg_policy->need_freq_update = true;
> +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
>  		return true;
>  	}
>  
> @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>  				   unsigned int next_freq)
>  {
>  	if (sg_policy->need_freq_update)
> -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> +		sg_policy->need_freq_update = false;
>  	else if (sg_policy->next_freq == next_freq)
>  		return false;
>  

This patch breaks cpufreq throttling (e.g. for thermal cooling) for
cpufreq drivers that:

 - Have policy->fast_switch_enabled/fast_switch_possible set, but
 - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set

There are several examples for this in the tree (search for
"fast_switch_possible"). Of all those drivers, only intel-pstate and
amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.

I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
laptop:

 1. I added some low temperature trip points in the device tree,
    together with passive cpufreq cooling.
 2. I run a CPU stress test on all CPUs and monitor the temperatures
    and CPU frequencies.

When using "performance" governor instead of "schedutil", the CPU
frequencies are being throttled as expected, as soon as the temperature
trip points are reached.

When using "schedutil", the CPU frequencies stay at maximum as long as
the stress test is running. No throttling happens, so the device heats
up far beyond the defined temperature trip points. Throttling is applied
only after stopping the stress test, since this forces schedutil to
re-evaluate the CPU frequency.

Reverting this commit fixes the problem.

Looking at the code, I think the problem is that:
 - sg_policy->limits_changed does not result in
   sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
   anymore, and
 - Without sg->policy->need_freq_update, get_next_freq() skips calling
   cpufreq_driver_resolve_freq(), which would normally apply the policy
   min/max constraints.

Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
that set policy->fast_switch_possible? If I'm reading the documentation
comment correctly, that flag is just supposed to enable notifications if
the policy min/max changes, but the resolved target frequency is still
the same. This is not the case here, the target frequency needs to be
throttled, but schedutil isn't applying the new limits.

Any suggestions how to fix this? I'm happy to test patches with my
setup.

Thanks,
Stephan

#regzbot introduced: 8e461a1cb43d69d2fc8a97e61916dce571e6bb31

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-08  8:59   ` Stephan Gerhold
@ 2025-04-08 15:22     ` Sultan Alsawaf
  2025-04-08 16:48       ` Stephan Gerhold
  2025-04-10 19:16     ` Rafael J. Wysocki
  1 sibling, 1 reply; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-08 15:22 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel, regressions, Johan Hovold

Hi Stephan,

On Tue, Apr 08, 2025 at 10:59:31AM +0200, Stephan Gerhold wrote:
> Hi,
> 
> On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > 
> > A redundant frequency update is only truly needed when there is a policy
> > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > 
> > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > frequency update _all the time_, not just for a policy limits change,
> > because need_freq_update is never cleared.
> > 
> > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > to a redundant frequency update, regardless of whether or not the driver
> > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > same as the current one.
> > 
> > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > when there's a policy limits change, and clearing need_freq_update when a
> > requisite redundant update occurs.
> > 
> > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > and instead setting need_freq_update to false in sugov_update_next_freq().
> > 
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 28c77904ea74..e51d5ce730be 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  
> >  	if (unlikely(sg_policy->limits_changed)) {
> >  		sg_policy->limits_changed = false;
> > -		sg_policy->need_freq_update = true;
> > +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> >  		return true;
> >  	}
> >  
> > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >  				   unsigned int next_freq)
> >  {
> >  	if (sg_policy->need_freq_update)
> > -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +		sg_policy->need_freq_update = false;
> >  	else if (sg_policy->next_freq == next_freq)
> >  		return false;
> >  
> 
> This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> cpufreq drivers that:
> 
>  - Have policy->fast_switch_enabled/fast_switch_possible set, but
>  - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
> 
> There are several examples for this in the tree (search for
> "fast_switch_possible"). Of all those drivers, only intel-pstate and
> amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
> 
> I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> laptop:
> 
>  1. I added some low temperature trip points in the device tree,
>     together with passive cpufreq cooling.
>  2. I run a CPU stress test on all CPUs and monitor the temperatures
>     and CPU frequencies.
> 
> When using "performance" governor instead of "schedutil", the CPU
> frequencies are being throttled as expected, as soon as the temperature
> trip points are reached.
> 
> When using "schedutil", the CPU frequencies stay at maximum as long as
> the stress test is running. No throttling happens, so the device heats
> up far beyond the defined temperature trip points. Throttling is applied
> only after stopping the stress test, since this forces schedutil to
> re-evaluate the CPU frequency.
> 
> Reverting this commit fixes the problem.
> 
> Looking at the code, I think the problem is that:
>  - sg_policy->limits_changed does not result in
>    sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
>    anymore, and
>  - Without sg->policy->need_freq_update, get_next_freq() skips calling
>    cpufreq_driver_resolve_freq(), which would normally apply the policy
>    min/max constraints.
> 
> Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> that set policy->fast_switch_possible? If I'm reading the documentation
> comment correctly, that flag is just supposed to enable notifications if
> the policy min/max changes, but the resolved target frequency is still
> the same. This is not the case here, the target frequency needs to be
> throttled, but schedutil isn't applying the new limits.
> 
> Any suggestions how to fix this? I'm happy to test patches with my
> setup.

Thank you for reporting this. As I see it, sg_policy->need_freq_update is
working correctly now; however, sg_policy->limits_changed relied on the broken
behavior of sg_policy->need_freq_update and therefore sg_policy->limits_changed
needs to be fixed.

Can you try this patch:

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1a19d69b91ed3..f37b999854d52 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 		return false;
 
 	if (unlikely(sg_policy->limits_changed)) {
-		sg_policy->limits_changed = false;
 		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
 		return true;
 	}
@@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	freq = get_capacity_ref_freq(policy);
 	freq = map_util_freq(util, freq, max);
 
-	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
+	if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
+	    !sg_policy->need_freq_update)
 		return sg_policy->next_freq;
 
+	sg_policy->limits_changed = false;
 	sg_policy->cached_raw_freq = freq;
 	return cpufreq_driver_resolve_freq(policy, freq);
 }


> 
> Thanks,
> Stephan
> 
> #regzbot introduced: 8e461a1cb43d69d2fc8a97e61916dce571e6bb31

Thanks,
Sultan

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-08 15:22     ` Sultan Alsawaf
@ 2025-04-08 16:48       ` Stephan Gerhold
  2025-04-09 11:25         ` Xuewen Yan
  2025-04-10  1:52         ` Sultan Alsawaf
  0 siblings, 2 replies; 22+ messages in thread
From: Stephan Gerhold @ 2025-04-08 16:48 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel, regressions, Johan Hovold

Hi Sultan,

On Tue, Apr 08, 2025 at 08:22:20AM -0700, Sultan Alsawaf wrote:
> On Tue, Apr 08, 2025 at 10:59:31AM +0200, Stephan Gerhold wrote:
> > On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > > 
> > > A redundant frequency update is only truly needed when there is a policy
> > > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > > 
> > > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > > frequency update _all the time_, not just for a policy limits change,
> > > because need_freq_update is never cleared.
> > > 
> > > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > > to a redundant frequency update, regardless of whether or not the driver
> > > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > > same as the current one.
> > > 
> > > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > > when there's a policy limits change, and clearing need_freq_update when a
> > > requisite redundant update occurs.
> > > 
> > > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > > and instead setting need_freq_update to false in sugov_update_next_freq().
> > > 
> > > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > > ---
> > >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 28c77904ea74..e51d5ce730be 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > >  
> > >  	if (unlikely(sg_policy->limits_changed)) {
> > >  		sg_policy->limits_changed = false;
> > > -		sg_policy->need_freq_update = true;
> > > +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > >  		return true;
> > >  	}
> > >  
> > > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >  				   unsigned int next_freq)
> > >  {
> > >  	if (sg_policy->need_freq_update)
> > > -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > +		sg_policy->need_freq_update = false;
> > >  	else if (sg_policy->next_freq == next_freq)
> > >  		return false;
> > >  
> > 
> > This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> > cpufreq drivers that:
> > 
> >  - Have policy->fast_switch_enabled/fast_switch_possible set, but
> >  - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
> > 
> > There are several examples for this in the tree (search for
> > "fast_switch_possible"). Of all those drivers, only intel-pstate and
> > amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
> > 
> > I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> > laptop:
> > 
> >  1. I added some low temperature trip points in the device tree,
> >     together with passive cpufreq cooling.
> >  2. I run a CPU stress test on all CPUs and monitor the temperatures
> >     and CPU frequencies.
> > 
> > When using "performance" governor instead of "schedutil", the CPU
> > frequencies are being throttled as expected, as soon as the temperature
> > trip points are reached.
> > 
> > When using "schedutil", the CPU frequencies stay at maximum as long as
> > the stress test is running. No throttling happens, so the device heats
> > up far beyond the defined temperature trip points. Throttling is applied
> > only after stopping the stress test, since this forces schedutil to
> > re-evaluate the CPU frequency.
> > 
> > Reverting this commit fixes the problem.
> > 
> > Looking at the code, I think the problem is that:
> >  - sg_policy->limits_changed does not result in
> >    sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
> >    anymore, and
> >  - Without sg->policy->need_freq_update, get_next_freq() skips calling
> >    cpufreq_driver_resolve_freq(), which would normally apply the policy
> >    min/max constraints.
> > 
> > Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> > that set policy->fast_switch_possible? If I'm reading the documentation
> > comment correctly, that flag is just supposed to enable notifications if
> > the policy min/max changes, but the resolved target frequency is still
> > the same. This is not the case here, the target frequency needs to be
> > throttled, but schedutil isn't applying the new limits.
> > 
> > Any suggestions how to fix this? I'm happy to test patches with my
> > setup.
> 
> Thank you for reporting this. As I see it, sg_policy->need_freq_update is
> working correctly now; however, sg_policy->limits_changed relied on the broken
> behavior of sg_policy->need_freq_update and therefore sg_policy->limits_changed
> needs to be fixed.

Thanks for the quick reply and the patch!

> 
> Can you try this patch:
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 1a19d69b91ed3..f37b999854d52 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  		return false;
>  
>  	if (unlikely(sg_policy->limits_changed)) {
> -		sg_policy->limits_changed = false;
>  		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
>  		return true;
>  	}
> @@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>  	freq = get_capacity_ref_freq(policy);
>  	freq = map_util_freq(util, freq, max);
>  
> -	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> +	if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
> +	    !sg_policy->need_freq_update)
>  		return sg_policy->next_freq;
>  
> +	sg_policy->limits_changed = false;
>  	sg_policy->cached_raw_freq = freq;
>  	return cpufreq_driver_resolve_freq(policy, freq);
>  }
> 

This is working correctly for me, CPU frequency is being throttled again
when the temperature trip points are reached. If you send this, feel
free to add:

Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org>

Thanks!
Stephan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-08 16:48       ` Stephan Gerhold
@ 2025-04-09 11:25         ` Xuewen Yan
  2025-04-09 11:48           ` Xuewen Yan
  2025-04-10  1:52         ` Sultan Alsawaf
  1 sibling, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2025-04-09 11:25 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Sultan Alsawaf, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

Hi Sultan,

On Wed, Apr 9, 2025 at 12:50 AM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> Hi Sultan,
>
> On Tue, Apr 08, 2025 at 08:22:20AM -0700, Sultan Alsawaf wrote:
> > On Tue, Apr 08, 2025 at 10:59:31AM +0200, Stephan Gerhold wrote:
> > > On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > > > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > > >
> > > > A redundant frequency update is only truly needed when there is a policy
> > > > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > > >
> > > > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > > > frequency update _all the time_, not just for a policy limits change,
> > > > because need_freq_update is never cleared.
> > > >
> > > > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > > > to a redundant frequency update, regardless of whether or not the driver
> > > > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > > > same as the current one.
> > > >
> > > > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > > > when there's a policy limits change, and clearing need_freq_update when a
> > > > requisite redundant update occurs.
> > > >
> > > > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > > > and instead setting need_freq_update to false in sugov_update_next_freq().
> > > >
> > > > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > > > ---
> > > >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > index 28c77904ea74..e51d5ce730be 100644
> > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > > >
> > > >   if (unlikely(sg_policy->limits_changed)) {
> > > >           sg_policy->limits_changed = false;
> > > > -         sg_policy->need_freq_update = true;
> > > > +         sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > >           return true;
> > > >   }
> > > >
> > > > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > >                              unsigned int next_freq)
> > > >  {
> > > >   if (sg_policy->need_freq_update)
> > > > -         sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > +         sg_policy->need_freq_update = false;
> > > >   else if (sg_policy->next_freq == next_freq)
> > > >           return false;
> > > >
> > >
> > > This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> > > cpufreq drivers that:
> > >
> > >  - Have policy->fast_switch_enabled/fast_switch_possible set, but
> > >  - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
> > >
> > > There are several examples for this in the tree (search for
> > > "fast_switch_possible"). Of all those drivers, only intel-pstate and
> > > amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
> > >
> > > I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> > > laptop:
> > >
> > >  1. I added some low temperature trip points in the device tree,
> > >     together with passive cpufreq cooling.
> > >  2. I run a CPU stress test on all CPUs and monitor the temperatures
> > >     and CPU frequencies.
> > >
> > > When using "performance" governor instead of "schedutil", the CPU
> > > frequencies are being throttled as expected, as soon as the temperature
> > > trip points are reached.
> > >
> > > When using "schedutil", the CPU frequencies stay at maximum as long as
> > > the stress test is running. No throttling happens, so the device heats
> > > up far beyond the defined temperature trip points. Throttling is applied
> > > only after stopping the stress test, since this forces schedutil to
> > > re-evaluate the CPU frequency.
> > >
> > > Reverting this commit fixes the problem.
> > >
> > > Looking at the code, I think the problem is that:
> > >  - sg_policy->limits_changed does not result in
> > >    sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
> > >    anymore, and
> > >  - Without sg->policy->need_freq_update, get_next_freq() skips calling
> > >    cpufreq_driver_resolve_freq(), which would normally apply the policy
> > >    min/max constraints.
> > >
> > > Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> > > that set policy->fast_switch_possible? If I'm reading the documentation
> > > comment correctly, that flag is just supposed to enable notifications if
> > > the policy min/max changes, but the resolved target frequency is still
> > > the same. This is not the case here, the target frequency needs to be
> > > throttled, but schedutil isn't applying the new limits.
> > >
> > > Any suggestions how to fix this? I'm happy to test patches with my
> > > setup.
> >
> > Thank you for reporting this. As I see it, sg_policy->need_freq_update is
> > working correctly now; however, sg_policy->limits_changed relied on the broken
> > behavior of sg_policy->need_freq_update and therefore sg_policy->limits_changed
> > needs to be fixed.
>
> Thanks for the quick reply and the patch!
>
> >
> > Can you try this patch:
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 1a19d69b91ed3..f37b999854d52 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >               return false;
> >
> >       if (unlikely(sg_policy->limits_changed)) {
> > -             sg_policy->limits_changed = false;
> >               sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> >               return true;
> >       }
> > @@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >       freq = get_capacity_ref_freq(policy);
> >       freq = map_util_freq(util, freq, max);
> >
> > -     if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > +     if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
> > +         !sg_policy->need_freq_update)

We also should add the limits_changed in the sugov_update_single_freq().

> >               return sg_policy->next_freq;
> >
> > +     sg_policy->limits_changed = false;

Maybe this should be add in the sugov_update_next_freq()，because, both
sugov_update_single_freq() and sugov_update_shared(),
the sugov_update_next_freq() is after the freq check.

> >       sg_policy->cached_raw_freq = freq;
> >       return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> >
>
> This is working correctly for me, CPU frequency is being throttled again
> when the temperature trip points are reached. If you send this, feel
> free to add:
>
> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org>
>
> Thanks!
> Stephan
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-09 11:25         ` Xuewen Yan
@ 2025-04-09 11:48           ` Xuewen Yan
  2025-04-10  1:49             ` Sultan Alsawaf
  0 siblings, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2025-04-09 11:48 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Sultan Alsawaf, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

Or can we modify it as follows?

-->8--

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1a19d69b91ed..0e8d3b92ffe7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
sugov_policy *sg_policy, u64 time)

        if (unlikely(sg_policy->limits_changed)) {
                sg_policy->limits_changed = false;
-               sg_policy->need_freq_update =
cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+               sg_policy->need_freq_update = true;
                return true;
        }

@@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
sugov_policy *sg_policy, u64 time)
 static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
                                   unsigned int next_freq)
 {
-       if (sg_policy->need_freq_update)
+       if (sg_policy->need_freq_update) {
                sg_policy->need_freq_update = false;
-       else if (sg_policy->next_freq == next_freq)
-               return false;
+               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+                       goto change;
+       }

+       if (sg_policy->next_freq == next_freq)
+               return false;
+change:
        sg_policy->next_freq = next_freq;
        sg_policy->last_freq_update_time = time;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-09 11:48           ` Xuewen Yan
@ 2025-04-10  1:49             ` Sultan Alsawaf
  2025-04-10  2:06               ` Xuewen Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-10  1:49 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> Or can we modify it as follows?
> 
> -->8--
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 1a19d69b91ed..0e8d3b92ffe7 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> sugov_policy *sg_policy, u64 time)
> 
>         if (unlikely(sg_policy->limits_changed)) {
>                 sg_policy->limits_changed = false;
> -               sg_policy->need_freq_update =
> cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> +               sg_policy->need_freq_update = true;
>                 return true;
>         }
> 
> @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> sugov_policy *sg_policy, u64 time)
>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>                                    unsigned int next_freq)
>  {
> -       if (sg_policy->need_freq_update)
> +       if (sg_policy->need_freq_update) {
>                 sg_policy->need_freq_update = false;
> -       else if (sg_policy->next_freq == next_freq)
> -               return false;
> +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> +                       goto change;
> +       }
> 
> +       if (sg_policy->next_freq == next_freq)
> +               return false;
> +change:
>         sg_policy->next_freq = next_freq;
>         sg_policy->last_freq_update_time = time;

If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
frequency switch from the driver when the current frequency is exactly the same
as the next frequency.

Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-08 16:48       ` Stephan Gerhold
  2025-04-09 11:25         ` Xuewen Yan
@ 2025-04-10  1:52         ` Sultan Alsawaf
  1 sibling, 0 replies; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-10  1:52 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Rafael J. Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-pm,
	linux-kernel, regressions, Johan Hovold

On Tue, Apr 08, 2025 at 06:48:06PM +0200, Stephan Gerhold wrote:
> Hi Sultan,
> 
> On Tue, Apr 08, 2025 at 08:22:20AM -0700, Sultan Alsawaf wrote:
> > On Tue, Apr 08, 2025 at 10:59:31AM +0200, Stephan Gerhold wrote:
> > > On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > > > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> > > > 
> > > > A redundant frequency update is only truly needed when there is a policy
> > > > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> > > > 
> > > > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > > > frequency update _all the time_, not just for a policy limits change,
> > > > because need_freq_update is never cleared.
> > > > 
> > > > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > > > to a redundant frequency update, regardless of whether or not the driver
> > > > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > > > same as the current one.
> > > > 
> > > > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > > > when there's a policy limits change, and clearing need_freq_update when a
> > > > requisite redundant update occurs.
> > > > 
> > > > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > > > and instead setting need_freq_update to false in sugov_update_next_freq().
> > > > 
> > > > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > > > ---
> > > >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > index 28c77904ea74..e51d5ce730be 100644
> > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > > >  
> > > >  	if (unlikely(sg_policy->limits_changed)) {
> > > >  		sg_policy->limits_changed = false;
> > > > -		sg_policy->need_freq_update = true;
> > > > +		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > >  		return true;
> > > >  	}
> > > >  
> > > > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > >  				   unsigned int next_freq)
> > > >  {
> > > >  	if (sg_policy->need_freq_update)
> > > > -		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > +		sg_policy->need_freq_update = false;
> > > >  	else if (sg_policy->next_freq == next_freq)
> > > >  		return false;
> > > >  
> > > 
> > > This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> > > cpufreq drivers that:
> > > 
> > >  - Have policy->fast_switch_enabled/fast_switch_possible set, but
> > >  - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
> > > 
> > > There are several examples for this in the tree (search for
> > > "fast_switch_possible"). Of all those drivers, only intel-pstate and
> > > amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
> > > 
> > > I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> > > laptop:
> > > 
> > >  1. I added some low temperature trip points in the device tree,
> > >     together with passive cpufreq cooling.
> > >  2. I run a CPU stress test on all CPUs and monitor the temperatures
> > >     and CPU frequencies.
> > > 
> > > When using "performance" governor instead of "schedutil", the CPU
> > > frequencies are being throttled as expected, as soon as the temperature
> > > trip points are reached.
> > > 
> > > When using "schedutil", the CPU frequencies stay at maximum as long as
> > > the stress test is running. No throttling happens, so the device heats
> > > up far beyond the defined temperature trip points. Throttling is applied
> > > only after stopping the stress test, since this forces schedutil to
> > > re-evaluate the CPU frequency.
> > > 
> > > Reverting this commit fixes the problem.
> > > 
> > > Looking at the code, I think the problem is that:
> > >  - sg_policy->limits_changed does not result in
> > >    sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
> > >    anymore, and
> > >  - Without sg->policy->need_freq_update, get_next_freq() skips calling
> > >    cpufreq_driver_resolve_freq(), which would normally apply the policy
> > >    min/max constraints.
> > > 
> > > Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> > > that set policy->fast_switch_possible? If I'm reading the documentation
> > > comment correctly, that flag is just supposed to enable notifications if
> > > the policy min/max changes, but the resolved target frequency is still
> > > the same. This is not the case here, the target frequency needs to be
> > > throttled, but schedutil isn't applying the new limits.
> > > 
> > > Any suggestions how to fix this? I'm happy to test patches with my
> > > setup.
> > 
> > Thank you for reporting this. As I see it, sg_policy->need_freq_update is
> > working correctly now; however, sg_policy->limits_changed relied on the broken
> > behavior of sg_policy->need_freq_update and therefore sg_policy->limits_changed
> > needs to be fixed.
> 
> Thanks for the quick reply and the patch!
> 
> > 
> > Can you try this patch:
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 1a19d69b91ed3..f37b999854d52 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  		return false;
> >  
> >  	if (unlikely(sg_policy->limits_changed)) {
> > -		sg_policy->limits_changed = false;
> >  		sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> >  		return true;
> >  	}
> > @@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >  	freq = get_capacity_ref_freq(policy);
> >  	freq = map_util_freq(util, freq, max);
> >  
> > -	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > +	if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
> > +	    !sg_policy->need_freq_update)
> >  		return sg_policy->next_freq;
> >  
> > +	sg_policy->limits_changed = false;
> >  	sg_policy->cached_raw_freq = freq;
> >  	return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> > 
> 
> This is working correctly for me, CPU frequency is being throttled again
> when the temperature trip points are reached.

Great to hear, thanks for testing that so quickly!

> If you send this, feel free to add:
> 
> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org>

I'll do Reported-and-tested-by if you don't mind. :-)

> 
> Thanks!
> Stephan

Thanks,
Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  1:49             ` Sultan Alsawaf
@ 2025-04-10  2:06               ` Xuewen Yan
  2025-04-10  2:08                 ` Sultan Alsawaf
  0 siblings, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2025-04-10  2:06 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > Or can we modify it as follows?
> >
> > -->8--
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > sugov_policy *sg_policy, u64 time)
> >
> >         if (unlikely(sg_policy->limits_changed)) {
> >                 sg_policy->limits_changed = false;
> > -               sg_policy->need_freq_update =
> > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +               sg_policy->need_freq_update = true;
> >                 return true;
> >         }
> >
> > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > sugov_policy *sg_policy, u64 time)
> >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >                                    unsigned int next_freq)
> >  {
> > -       if (sg_policy->need_freq_update)
> > +       if (sg_policy->need_freq_update) {
> >                 sg_policy->need_freq_update = false;
> > -       else if (sg_policy->next_freq == next_freq)
> > -               return false;
> > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > +                       goto change;
> > +       }
> >
> > +       if (sg_policy->next_freq == next_freq)
> > +               return false;
> > +change:
> >         sg_policy->next_freq = next_freq;
> >         sg_policy->last_freq_update_time = time;
>
> If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
> frequency switch from the driver when the current frequency is exactly the same
> as the next frequency.

Yes, the following check would return false:

 +       if (sg_policy->next_freq == next_freq)
 +               return false;


>
> Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:06               ` Xuewen Yan
@ 2025-04-10  2:08                 ` Sultan Alsawaf
  2025-04-10  2:13                   ` Xuewen Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-10  2:08 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> >
> > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > Or can we modify it as follows?
> > >
> > > -->8--
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > sugov_policy *sg_policy, u64 time)
> > >
> > >         if (unlikely(sg_policy->limits_changed)) {
> > >                 sg_policy->limits_changed = false;
> > > -               sg_policy->need_freq_update =
> > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > +               sg_policy->need_freq_update = true;
> > >                 return true;
> > >         }
> > >
> > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > sugov_policy *sg_policy, u64 time)
> > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >                                    unsigned int next_freq)
> > >  {
> > > -       if (sg_policy->need_freq_update)
> > > +       if (sg_policy->need_freq_update) {
> > >                 sg_policy->need_freq_update = false;
> > > -       else if (sg_policy->next_freq == next_freq)
> > > -               return false;
> > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > +                       goto change;
> > > +       }
> > >
> > > +       if (sg_policy->next_freq == next_freq)
> > > +               return false;
> > > +change:
> > >         sg_policy->next_freq = next_freq;
> > >         sg_policy->last_freq_update_time = time;
> >
> > If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
> > frequency switch from the driver when the current frequency is exactly the same
> > as the next frequency.
> 
> Yes, the following check would return false:
> 
>  +       if (sg_policy->next_freq == next_freq)
>  +               return false;

But what does that change fix? In fact, that change causes a limits update to
trigger a frequency switch request to the driver even when the new frequency is
the same as the current one.

Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:08                 ` Sultan Alsawaf
@ 2025-04-10  2:13                   ` Xuewen Yan
  2025-04-10  2:22                     ` Sultan Alsawaf
  0 siblings, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2025-04-10  2:13 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:09 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> > On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > >
> > > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > > Or can we modify it as follows?
> > > >
> > > > -->8--
> > > >
> > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > > sugov_policy *sg_policy, u64 time)
> > > >
> > > >         if (unlikely(sg_policy->limits_changed)) {
> > > >                 sg_policy->limits_changed = false;
> > > > -               sg_policy->need_freq_update =
> > > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > +               sg_policy->need_freq_update = true;
> > > >                 return true;
> > > >         }
> > > >
> > > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > > sugov_policy *sg_policy, u64 time)
> > > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > >                                    unsigned int next_freq)
> > > >  {
> > > > -       if (sg_policy->need_freq_update)
> > > > +       if (sg_policy->need_freq_update) {
> > > >                 sg_policy->need_freq_update = false;
> > > > -       else if (sg_policy->next_freq == next_freq)
> > > > -               return false;
> > > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > > +                       goto change;
> > > > +       }
> > > >
> > > > +       if (sg_policy->next_freq == next_freq)
> > > > +               return false;
> > > > +change:
> > > >         sg_policy->next_freq = next_freq;
> > > >         sg_policy->last_freq_update_time = time;
> > >
> > > If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
> > > frequency switch from the driver when the current frequency is exactly the same
> > > as the next frequency.
> >
> > Yes, the following check would return false:
> >
> >  +       if (sg_policy->next_freq == next_freq)
> >  +               return false;
>
> But what does that change fix? In fact, that change causes a limits update to
> trigger a frequency switch request to the driver even when the new frequency is
> the same as the current one.

We set the sg_policy->need_freq_update = false instead of
sg_policy->need_freq_update =
cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS),
to fix the original issue, and then add the
+               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+                       goto change;

to allow cpufreq to update when CPUFREQ_NEED_UPDATE_LIMITS is set.

>
> Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:13                   ` Xuewen Yan
@ 2025-04-10  2:22                     ` Sultan Alsawaf
  2025-04-10  2:30                       ` Xuewen Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-10  2:22 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:13:04AM +0800, Xuewen Yan wrote:
> On Thu, Apr 10, 2025 at 10:09 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> >
> > On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> > > On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > >
> > > > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > > > Or can we modify it as follows?
> > > > >
> > > > > -->8--
> > > > >
> > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > > > sugov_policy *sg_policy, u64 time)
> > > > >
> > > > >         if (unlikely(sg_policy->limits_changed)) {
> > > > >                 sg_policy->limits_changed = false;
> > > > > -               sg_policy->need_freq_update =
> > > > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > > +               sg_policy->need_freq_update = true;
> > > > >                 return true;
> > > > >         }
> > > > >
> > > > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > > > sugov_policy *sg_policy, u64 time)
> > > > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > > >                                    unsigned int next_freq)
> > > > >  {
> > > > > -       if (sg_policy->need_freq_update)
> > > > > +       if (sg_policy->need_freq_update) {
> > > > >                 sg_policy->need_freq_update = false;
> > > > > -       else if (sg_policy->next_freq == next_freq)
> > > > > -               return false;
> > > > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > > > +                       goto change;
> > > > > +       }
> > > > >
> > > > > +       if (sg_policy->next_freq == next_freq)
> > > > > +               return false;
> > > > > +change:
> > > > >         sg_policy->next_freq = next_freq;
> > > > >         sg_policy->last_freq_update_time = time;
> > > >
> > > > If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
> > > > frequency switch from the driver when the current frequency is exactly the same
> > > > as the next frequency.
> > >
> > > Yes, the following check would return false:
> > >
> > >  +       if (sg_policy->next_freq == next_freq)
> > >  +               return false;
> >
> > But what does that change fix? In fact, that change causes a limits update to
> > trigger a frequency switch request to the driver even when the new frequency is
> > the same as the current one.
> 
> We set the sg_policy->need_freq_update = false instead of
> sg_policy->need_freq_update =
> cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS),
> to fix the original issue, and then add the
> +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> +                       goto change;
> 
> to allow cpufreq to update when CPUFREQ_NEED_UPDATE_LIMITS is set.

Please take a closer look at this snippet in sugov_update_next_freq():

	if (sg_policy->need_freq_update)
		sg_policy->need_freq_update = false;
	else if (sg_policy->next_freq == next_freq)
		return false;

The 2nd if-statement is an else-if. Therefore, when need_freq_update is true, it
is set to false *and* skips the (sg_policy->next_freq == next_freq) check.

Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:22                     ` Sultan Alsawaf
@ 2025-04-10  2:30                       ` Xuewen Yan
  2025-04-10  2:33                         ` Sultan Alsawaf
  0 siblings, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2025-04-10  2:30 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:22 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Thu, Apr 10, 2025 at 10:13:04AM +0800, Xuewen Yan wrote:
> > On Thu, Apr 10, 2025 at 10:09 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > >
> > > On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> > > > On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > > >
> > > > > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > > > > Or can we modify it as follows?
> > > > > >
> > > > > > -->8--
> > > > > >
> > > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > > > > sugov_policy *sg_policy, u64 time)
> > > > > >
> > > > > >         if (unlikely(sg_policy->limits_changed)) {
> > > > > >                 sg_policy->limits_changed = false;
> > > > > > -               sg_policy->need_freq_update =
> > > > > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > > > +               sg_policy->need_freq_update = true;
> > > > > >                 return true;
> > > > > >         }
> > > > > >
> > > > > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > > > > sugov_policy *sg_policy, u64 time)
> > > > > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > > > >                                    unsigned int next_freq)
> > > > > >  {
> > > > > > -       if (sg_policy->need_freq_update)
> > > > > > +       if (sg_policy->need_freq_update) {
> > > > > >                 sg_policy->need_freq_update = false;
> > > > > > -       else if (sg_policy->next_freq == next_freq)
> > > > > > -               return false;
> > > > > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > > > > +                       goto change;
> > > > > > +       }
> > > > > >
> > > > > > +       if (sg_policy->next_freq == next_freq)
> > > > > > +               return false;

I have deleted the else.

> > > > > > +change:
> > > > > >         sg_policy->next_freq = next_freq;
> > > > > >         sg_policy->last_freq_update_time = time;
> > > > >
> > > > > If CPUFREQ_NEED_UPDATE_LIMITS isn't specified, then there's no need to request a
> > > > > frequency switch from the driver when the current frequency is exactly the same
> > > > > as the next frequency.
> > > >
> > > > Yes, the following check would return false:
> > > >
> > > >  +       if (sg_policy->next_freq == next_freq)
> > > >  +               return false;
> > >
> > > But what does that change fix? In fact, that change causes a limits update to
> > > trigger a frequency switch request to the driver even when the new frequency is
> > > the same as the current one.
> >
> > We set the sg_policy->need_freq_update = false instead of
> > sg_policy->need_freq_update =
> > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS),
> > to fix the original issue, and then add the
> > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > +                       goto change;
> >
> > to allow cpufreq to update when CPUFREQ_NEED_UPDATE_LIMITS is set.
>
> Please take a closer look at this snippet in sugov_update_next_freq():
>
>         if (sg_policy->need_freq_update)
>                 sg_policy->need_freq_update = false;
>         else if (sg_policy->next_freq == next_freq)
>                 return false;
>
> The 2nd if-statement is an else-if. Therefore, when need_freq_update is true, it
> is set to false *and* skips the (sg_policy->next_freq == next_freq) check.
>
> Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:30                       ` Xuewen Yan
@ 2025-04-10  2:33                         ` Sultan Alsawaf
  2025-04-10  2:42                           ` Xuewen Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Sultan Alsawaf @ 2025-04-10  2:33 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:30:45AM +0800, Xuewen Yan wrote:
> On Thu, Apr 10, 2025 at 10:22 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> >
> > On Thu, Apr 10, 2025 at 10:13:04AM +0800, Xuewen Yan wrote:
> > > On Thu, Apr 10, 2025 at 10:09 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > >
> > > > On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> > > > > On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > > > >
> > > > > > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > > > > > Or can we modify it as follows?
> > > > > > >
> > > > > > > -->8--
> > > > > > >
> > > > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > > > > > sugov_policy *sg_policy, u64 time)
> > > > > > >
> > > > > > >         if (unlikely(sg_policy->limits_changed)) {
> > > > > > >                 sg_policy->limits_changed = false;
> > > > > > > -               sg_policy->need_freq_update =
> > > > > > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > > > > +               sg_policy->need_freq_update = true;
> > > > > > >                 return true;
> > > > > > >         }
> > > > > > >
> > > > > > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > > > > > sugov_policy *sg_policy, u64 time)
> > > > > > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > > > > >                                    unsigned int next_freq)
> > > > > > >  {
> > > > > > > -       if (sg_policy->need_freq_update)
> > > > > > > +       if (sg_policy->need_freq_update) {
> > > > > > >                 sg_policy->need_freq_update = false;
> > > > > > > -       else if (sg_policy->next_freq == next_freq)
> > > > > > > -               return false;
> > > > > > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > > > > > +                       goto change;
> > > > > > > +       }
> > > > > > >
> > > > > > > +       if (sg_policy->next_freq == next_freq)
> > > > > > > +               return false;
> 
> I have deleted the else.

Yes, but your change causes a regression by recreating part of the problem
solved in 8e461a1cb43d6 ("cpufreq: schedutil: Fix superfluous updates caused by
need_freq_update").

Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-10  2:33                         ` Sultan Alsawaf
@ 2025-04-10  2:42                           ` Xuewen Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Xuewen Yan @ 2025-04-10  2:42 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: Stephan Gerhold, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

On Thu, Apr 10, 2025 at 10:33 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Thu, Apr 10, 2025 at 10:30:45AM +0800, Xuewen Yan wrote:
> > On Thu, Apr 10, 2025 at 10:22 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > >
> > > On Thu, Apr 10, 2025 at 10:13:04AM +0800, Xuewen Yan wrote:
> > > > On Thu, Apr 10, 2025 at 10:09 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > > >
> > > > > On Thu, Apr 10, 2025 at 10:06:41AM +0800, Xuewen Yan wrote:
> > > > > > On Thu, Apr 10, 2025 at 9:49 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > > > > > >
> > > > > > > On Wed, Apr 09, 2025 at 07:48:05PM +0800, Xuewen Yan wrote:
> > > > > > > > Or can we modify it as follows?
> > > > > > > >
> > > > > > > > -->8--
> > > > > > > >
> > > > > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > > > > index 1a19d69b91ed..0e8d3b92ffe7 100644
> > > > > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > > > > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct
> > > > > > > > sugov_policy *sg_policy, u64 time)
> > > > > > > >
> > > > > > > >         if (unlikely(sg_policy->limits_changed)) {
> > > > > > > >                 sg_policy->limits_changed = false;
> > > > > > > > -               sg_policy->need_freq_update =
> > > > > > > > cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > > > > > > > +               sg_policy->need_freq_update = true;
> > > > > > > >                 return true;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
> > > > > > > > sugov_policy *sg_policy, u64 time)
> > > > > > > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > > > > > >                                    unsigned int next_freq)
> > > > > > > >  {
> > > > > > > > -       if (sg_policy->need_freq_update)
> > > > > > > > +       if (sg_policy->need_freq_update) {
> > > > > > > >                 sg_policy->need_freq_update = false;
> > > > > > > > -       else if (sg_policy->next_freq == next_freq)
> > > > > > > > -               return false;
> > > > > > > > +               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > > > > > > +                       goto change;
> > > > > > > > +       }
> > > > > > > >
> > > > > > > > +       if (sg_policy->next_freq == next_freq)
> > > > > > > > +               return false;
> >
> > I have deleted the else.
>
> Yes, but your change causes a regression by recreating part of the problem
> solved in 8e461a1cb43d6 ("cpufreq: schedutil: Fix superfluous updates caused by
> need_freq_update").

Hi, Sultan,

Did I miss anything?

@@ -95,11 +95,15 @@ static bool sugov_should_update_freq(struct
sugov_policy *sg_policy, u64 time)
 static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
                                   unsigned int next_freq)
 {
-       if (sg_policy->need_freq_update)
+       if (sg_policy->need_freq_update) {
                sg_policy->need_freq_update = false;
-       else if (sg_policy->next_freq == next_freq)
-               return false;
+               if (cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+                       goto change;
+       }

+       if (sg_policy->next_freq == next_freq)
+               return false;
+change:
        sg_policy->next_freq = next_freq;
        sg_policy->last_freq_update_time = time;

---
The patch set the need_freq_update to be false Unconditionally.
Is there another issue ?

Thanks！

>
> Sultan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  2025-04-08  8:59   ` Stephan Gerhold
  2025-04-08 15:22     ` Sultan Alsawaf
@ 2025-04-10 19:16     ` Rafael J. Wysocki
  1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2025-04-10 19:16 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Sultan Alsawaf, Rafael J. Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-pm, linux-kernel, regressions, Johan Hovold

Hi,

Sorry for kind of a retroactive response.

On Tue, Apr 8, 2025 at 10:59 AM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> Hi,
>
> On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@kerneltoast.com>
> >
> > A redundant frequency update is only truly needed when there is a policy
> > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> >
> > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > frequency update _all the time_, not just for a policy limits change,
> > because need_freq_update is never cleared.
> >
> > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > to a redundant frequency update, regardless of whether or not the driver
> > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > same as the current one.
> >
> > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > when there's a policy limits change, and clearing need_freq_update when a
> > requisite redundant update occurs.
> >
> > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > and instead setting need_freq_update to false in sugov_update_next_freq().
> >
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@kerneltoast.com>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 28c77904ea74..e51d5ce730be 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >
> >       if (unlikely(sg_policy->limits_changed)) {
> >               sg_policy->limits_changed = false;
> > -             sg_policy->need_freq_update = true;
> > +             sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> >               return true;
> >       }
> >
> > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >                                  unsigned int next_freq)
> >  {
> >       if (sg_policy->need_freq_update)
> > -             sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +             sg_policy->need_freq_update = false;
> >       else if (sg_policy->next_freq == next_freq)
> >               return false;
> >
>
> This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> cpufreq drivers that:
>
>  - Have policy->fast_switch_enabled/fast_switch_possible set, but
>  - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
>
> There are several examples for this in the tree (search for
> "fast_switch_possible"). Of all those drivers, only intel-pstate and
> amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
>
> I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> laptop:
>
>  1. I added some low temperature trip points in the device tree,
>     together with passive cpufreq cooling.
>  2. I run a CPU stress test on all CPUs and monitor the temperatures
>     and CPU frequencies.
>
> When using "performance" governor instead of "schedutil", the CPU
> frequencies are being throttled as expected, as soon as the temperature
> trip points are reached.
>
> When using "schedutil", the CPU frequencies stay at maximum as long as
> the stress test is running. No throttling happens, so the device heats
> up far beyond the defined temperature trip points. Throttling is applied
> only after stopping the stress test, since this forces schedutil to
> re-evaluate the CPU frequency.
>
> Reverting this commit fixes the problem.
>
> Looking at the code, I think the problem is that:
>  - sg_policy->limits_changed does not result in
>    sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
>    anymore, and
>  - Without sg->policy->need_freq_update, get_next_freq() skips calling
>    cpufreq_driver_resolve_freq(), which would normally apply the policy
>    min/max constraints.
>
> Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> that set policy->fast_switch_possible?

I think that it is generally needed for ->fast_switch() because the
new limits don't take effect without running ->fast_switch().

But if that is the case, the behavior can just be made conditional on
policy->fast_switch_enabled.

> If I'm reading the documentation
> comment correctly, that flag is just supposed to enable notifications if
> the policy min/max changes, but the resolved target frequency is still
> the same.

If the policy min/max change and the resolved target frequency is
beyond the new limits. it needs to be changed.

The flag effectively says "Call my ->fast_switch() if the policy
limits have changed regardless of whether or not there is another
reason to call it."

> This is not the case here, the target frequency needs to be
> throttled, but schedutil isn't applying the new limits.
>
> Any suggestions how to fix this? I'm happy to test patches with my setup.

Replace cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS) in
sugov_should_update_freq() with sg_policy->policy->fast_switch_enabled
and (if this works), CPUFREQ_NEED_UPDATE_LIMITS can be deleted.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-04-10 19:16 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-12  1:57 [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Sultan Alsawaf
2024-12-12  1:57 ` [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update Sultan Alsawaf
2024-12-12 13:24   ` Christian Loehle
2024-12-14  2:35     ` Sultan Alsawaf (unemployed)
2024-12-18 15:10     ` Rafael J. Wysocki
2025-04-08  8:59   ` Stephan Gerhold
2025-04-08 15:22     ` Sultan Alsawaf
2025-04-08 16:48       ` Stephan Gerhold
2025-04-09 11:25         ` Xuewen Yan
2025-04-09 11:48           ` Xuewen Yan
2025-04-10  1:49             ` Sultan Alsawaf
2025-04-10  2:06               ` Xuewen Yan
2025-04-10  2:08                 ` Sultan Alsawaf
2025-04-10  2:13                   ` Xuewen Yan
2025-04-10  2:22                     ` Sultan Alsawaf
2025-04-10  2:30                       ` Xuewen Yan
2025-04-10  2:33                         ` Sultan Alsawaf
2025-04-10  2:42                           ` Xuewen Yan
2025-04-10  1:52         ` Sultan Alsawaf
2025-04-10 19:16     ` Rafael J. Wysocki
2024-12-12 12:06 ` [PATCH 2/2] cpufreq: schedutil: Ignore rate limit when scaling up with FIE present Christian Loehle
2024-12-14  2:15   ` Sultan Alsawaf (unemployed)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).