* [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil
@ 2025-04-15 9:52 Rafael J. Wysocki
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 9:52 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
Hi Everyone,
This is an update of
https://lore.kernel.org/linux-pm/3364921.aeNJFYEL58@rjwysocki.net/
that replaces the first patch with a better fix and adds one more patch
after the second one.
The original cover letter is still generally applicable:
"This series of patches has been inspired by the discussion following a bug
report regarding the patch at
https://lore.kernel.org/lkml/20241212015734.41241-2-sultan@kerneltoast.com/
and its attempted unsuccessful resolution:
https://lore.kernel.org/linux-pm/20250410024439.20859-1-sultan@kerneltoast.com/
which basically leads to the conclusion that cpufreq policy limits updates are
not sufficiently synchronized with the scheditil governor, especially in the
fast switching case in which running the driver callback is the only way to
make the new policy limits take effect.
The purpose of this series is to address this concern."
Patch [1/6] is a fix for the issue introduced by the patch linked above (please
see the patch changelog for details), for 6.15-rc. The remaining patches are
for 6.16.
Patch [2/6] adds memory barriers in two places in schedutil along with some
WRITE_ONCE()/READ_ONCE() annotations to ensure that policy limits updates will
not be missed due to reordering of instructions.
Patch [3/6] prevents limits_changed from being used for purposes unrelated to
changing the policy limits.
Patch [4/6] is a preparatory function rename with no functional impact.
Patch [5/6] updates the cpufreq core to avoid situations in which
cpufreq_driver_resolve_freq(), called by schedutil, may see intermediate
values of policy->min and policy->max and makes that function address the
unlikely case in which it may see policy->min > policy->max.
Patch [6/6] cleans up the code after the previous changes.
Please see individual patch changelogs for details.
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
@ 2025-04-15 9:58 ` Rafael J. Wysocki
2025-04-16 11:35 ` Christian Loehle
2025-04-20 1:10 ` Sultan Alsawaf
2025-04-15 9:59 ` [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling Rafael J. Wysocki
` (4 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 9:58 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.
However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).
Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.
Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2:
* Always set need_freq_update when limits_changed is set.
* Take CPUFREQ_NEED_UPDATE_LIMITS into account in sugov_update_next_freq().
---
kernel/sched/cpufreq_schedutil.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -83,7 +83,7 @@
if (unlikely(sg_policy->limits_changed)) {
sg_policy->limits_changed = false;
- sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+ sg_policy->need_freq_update = true;
return true;
}
@@ -95,10 +95,22 @@
static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
unsigned int next_freq)
{
- if (sg_policy->need_freq_update)
+ if (sg_policy->need_freq_update) {
sg_policy->need_freq_update = false;
- else if (sg_policy->next_freq == next_freq)
+ /*
+ * The policy limits have changed, but if the return value of
+ * cpufreq_driver_resolve_freq() after applying the new limits
+ * is still equal to the previously selected frequency, the
+ * driver callback need not be invoked unless the driver
+ * specifically wants that to happen on every update of the
+ * policy limits.
+ */
+ if (sg_policy->next_freq == next_freq &&
+ !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+ return false;
+ } else if (sg_policy->next_freq == next_freq) {
return false;
+ }
sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
@ 2025-04-15 9:59 ` Rafael J. Wysocki
2025-04-16 12:01 ` Christian Loehle
2025-04-15 10:00 ` [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() Rafael J. Wysocki
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 9:59 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.
Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.
Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.
To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing up
with that code.
Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.nernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2:
* Rebase on top of the new [1/6].
---
kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -81,9 +81,20 @@
if (!cpufreq_this_cpu_can_update(sg_policy->policy))
return false;
- if (unlikely(sg_policy->limits_changed)) {
- sg_policy->limits_changed = false;
+ if (unlikely(READ_ONCE(sg_policy->limits_changed))) {
+ WRITE_ONCE(sg_policy->limits_changed, false);
sg_policy->need_freq_update = true;
+
+ /*
+ * The above limits_changed update must occur before the reads
+ * of policy limits in cpufreq_driver_resolve_freq() or a policy
+ * limits update might be missed, so use a memory barrier to
+ * ensure it.
+ *
+ * This pairs with the write memory barrier in sugov_limits().
+ */
+ smp_mb();
+
return true;
}
@@ -377,7 +388,7 @@
static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
{
if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
- sg_cpu->sg_policy->limits_changed = true;
+ WRITE_ONCE(sg_cpu->sg_policy->limits_changed, true);
}
static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
@@ -883,7 +894,16 @@
mutex_unlock(&sg_policy->work_lock);
}
- sg_policy->limits_changed = true;
+ /*
+ * The limits_changed update below must take place before the updates
+ * of policy limits in cpufreq_set_policy() or a policy limits update
+ * might be missed, so use a memory barrier to ensure it.
+ *
+ * This pairs with the memory barrier in sugov_should_update_freq().
+ */
+ smp_wmb();
+
+ WRITE_ONCE(sg_policy->limits_changed, true);
}
struct cpufreq_governor schedutil_gov = {
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
2025-04-15 9:59 ` [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling Rafael J. Wysocki
@ 2025-04-15 10:00 ` Rafael J. Wysocki
2025-04-16 12:26 ` Christian Loehle
2025-04-15 10:02 ` [PATCH v2 4/6] cpufreq: Rename __resolve_freq() to clamp_and_resolve_freq() Rafael J. Wysocki
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 10:00 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that ignore_dl_rate_limit() need not piggy back on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).
Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().
Update the code accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
kernel/sched/cpufreq_schedutil.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -96,6 +96,9 @@
smp_mb();
return true;
+ } else if (sg_policy->need_freq_update) {
+ /* ignore_dl_rate_limit() wants a new frequency to be found. */
+ return true;
}
delta_ns = time - sg_policy->last_freq_update_time;
@@ -388,7 +391,7 @@
static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
{
if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
- WRITE_ONCE(sg_cpu->sg_policy->limits_changed, true);
+ sg_cpu->sg_policy->need_freq_update = true;
}
static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 4/6] cpufreq: Rename __resolve_freq() to clamp_and_resolve_freq()
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
` (2 preceding siblings ...)
2025-04-15 10:00 ` [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() Rafael J. Wysocki
@ 2025-04-15 10:02 ` Rafael J. Wysocki
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
2025-04-15 10:05 ` [PATCH v2 6/6] cpufreq: Eliminate clamp_and_resolve_freq() Rafael J. Wysocki
5 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 10:02 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation for subsequent changes rename a function in the cpufreq
core as per the subject and while at it, clean up some white space
around the declaration for that function.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2: No changes
---
drivers/cpufreq/cpufreq.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -490,8 +490,9 @@
}
EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
-static unsigned int __resolve_freq(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
+static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
{
unsigned int idx;
@@ -520,7 +521,7 @@
unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
unsigned int target_freq)
{
- return __resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
+ return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
}
EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
@@ -2338,7 +2339,7 @@
if (cpufreq_disabled())
return -ENODEV;
- target_freq = __resolve_freq(policy, target_freq, relation);
+ target_freq = clamp_and_resolve_freq(policy, target_freq, relation);
pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
policy->cpu, target_freq, relation, old_target_freq);
@@ -2634,8 +2635,8 @@
*/
policy->min = new_data.min;
policy->max = new_data.max;
- policy->min = __resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
- policy->max = __resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
+ policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
+ policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
trace_cpu_frequency_limits(policy);
cpufreq_update_pressure(policy);
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
` (3 preceding siblings ...)
2025-04-15 10:02 ` [PATCH v2 4/6] cpufreq: Rename __resolve_freq() to clamp_and_resolve_freq() Rafael J. Wysocki
@ 2025-04-15 10:04 ` Rafael J. Wysocki
2025-04-16 12:39 ` Christian Loehle
2025-04-18 10:18 ` Sultan Alsawaf
2025-04-15 10:05 ` [PATCH v2 6/6] cpufreq: Eliminate clamp_and_resolve_freq() Rafael J. Wysocki
5 siblings, 2 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 10:04 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since cpufreq_driver_resolve_freq() can run in parallel with
cpufreq_set_policy() and there is no synchronization between them,
the former may access policy->min and policy->max while the latter
is updating them and it may see intermediate values of them due
to the way the update is carried out. Also the compiler is free
to apply any optimizations it wants both to the stores in
cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
which may result in additional inconsistencies.
To address this, use WRITE_ONCE() when updating policy->min and
policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
in cpufreq_set_policy() to avoid storing intermediate values in
policy->min and policy->max with the help of the observation that
their new values are expected to be properly ordered upfront.
Also modify cpufreq_driver_resolve_freq() to take the possible reverse
ordering of policy->min and policy->max, which may happen depending on
the ordering of operations when this function and cpufreq_set_policy()
run concurrently, into account by always honoring the max when it
turns out to be less than the min (in case it comes from thermal
throttling or similar).
Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2: Minor edit in the subject
---
drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 36 insertions(+), 10 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -490,14 +490,12 @@
}
EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
-static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
+static unsigned int __resolve_freq(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
{
unsigned int idx;
- target_freq = clamp_val(target_freq, policy->min, policy->max);
-
if (!policy->freq_table)
return target_freq;
@@ -507,6 +505,15 @@
return policy->freq_table[idx].frequency;
}
+static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ target_freq = clamp_val(target_freq, policy->min, policy->max);
+
+ return __resolve_freq(policy, target_freq, relation);
+}
+
/**
* cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
* one.
@@ -521,7 +528,22 @@
unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
unsigned int target_freq)
{
- return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
+ unsigned int min = READ_ONCE(policy->min);
+ unsigned int max = READ_ONCE(policy->max);
+
+ /*
+ * If this function runs in parallel with cpufreq_set_policy(), it may
+ * read policy->min before the update and policy->max after the update
+ * or the other way around, so there is no ordering guarantee.
+ *
+ * Resolve this by always honoring the max (in case it comes from
+ * thermal throttling or similar).
+ */
+ if (unlikely(min > max))
+ min = max;
+
+ return __resolve_freq(policy, clamp_val(target_freq, min, max),
+ CPUFREQ_RELATION_LE);
}
EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
@@ -2632,11 +2654,15 @@
* Resolve policy min/max to available frequencies. It ensures
* no frequency resolution will neither overshoot the requested maximum
* nor undershoot the requested minimum.
+ *
+ * Avoid storing intermediate values in policy->max or policy->min and
+ * compiler optimizations around them because them may be accessed
+ * concurrently by cpufreq_driver_resolve_freq() during the update.
*/
- policy->min = new_data.min;
- policy->max = new_data.max;
- policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
- policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
+ WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
+ new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
+ WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
+
trace_cpu_frequency_limits(policy);
cpufreq_update_pressure(policy);
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 6/6] cpufreq: Eliminate clamp_and_resolve_freq()
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
` (4 preceding siblings ...)
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
@ 2025-04-15 10:05 ` Rafael J. Wysocki
5 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-15 10:05 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fold clamp_and_resolve_freq() into __cpufreq_driver_target() which is
its only remaining caller.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v1 -> v2: No changes
---
drivers/cpufreq/cpufreq.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -505,15 +505,6 @@
return policy->freq_table[idx].frequency;
}
-static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- target_freq = clamp_val(target_freq, policy->min, policy->max);
-
- return __resolve_freq(policy, target_freq, relation);
-}
-
/**
* cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
* one.
@@ -2361,7 +2352,8 @@
if (cpufreq_disabled())
return -ENODEV;
- target_freq = clamp_and_resolve_freq(policy, target_freq, relation);
+ target_freq = clamp_val(target_freq, policy->min, policy->max);
+ target_freq = __resolve_freq(policy, target_freq, relation);
pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
policy->cpu, target_freq, relation, old_target_freq);
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
@ 2025-04-16 11:35 ` Christian Loehle
2025-04-20 1:10 ` Sultan Alsawaf
1 sibling, 0 replies; 18+ messages in thread
From: Christian Loehle @ 2025-04-16 11:35 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
On 4/15/25 10:58, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
> by need_freq_update") modified sugov_should_update_freq() to set the
> need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
> set, but that flag generally needs to be set when the policy limits
> change because the driver callback may need to be invoked for the new
> limits to take effect.
>
> However, if the return value of cpufreq_driver_resolve_freq() after
> applying the new limits is still equal to the previously selected
> frequency, the driver callback needs to be invoked only in the case
> when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
> specifically wants its callback to be invoked every time the policy
> limits change).
>
> Update the code accordingly to avoid missing policy limits changes for
> drivers without CPUFREQ_NEED_UPDATE_LIMITS.
>
> Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
> Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
> Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
>
> v1 -> v2:
> * Always set need_freq_update when limits_changed is set.
> * Take CPUFREQ_NEED_UPDATE_LIMITS into account in sugov_update_next_freq().
>
> ---
> kernel/sched/cpufreq_schedutil.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -83,7 +83,7 @@
>
> if (unlikely(sg_policy->limits_changed)) {
> sg_policy->limits_changed = false;
> - sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> + sg_policy->need_freq_update = true;
> return true;
> }
>
> @@ -95,10 +95,22 @@
> static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> - if (sg_policy->need_freq_update)
> + if (sg_policy->need_freq_update) {
> sg_policy->need_freq_update = false;
> - else if (sg_policy->next_freq == next_freq)
> + /*
> + * The policy limits have changed, but if the return value of
> + * cpufreq_driver_resolve_freq() after applying the new limits
> + * is still equal to the previously selected frequency, the
> + * driver callback need not be invoked unless the driver
> + * specifically wants that to happen on every update of the
> + * policy limits.
> + */
> + if (sg_policy->next_freq == next_freq &&
> + !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> + return false;
> + } else if (sg_policy->next_freq == next_freq) {
> return false;
> + }
>
> sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling
2025-04-15 9:59 ` [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling Rafael J. Wysocki
@ 2025-04-16 12:01 ` Christian Loehle
2025-04-16 12:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 18+ messages in thread
From: Christian Loehle @ 2025-04-16 12:01 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
On 4/15/25 10:59, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The handling of the limits_changed flag in struct sugov_policy needs to
> be explicitly synchronized to ensure that cpufreq policy limits updates
> will not be missed in some cases.
>
> Without that synchronization it is theoretically possible that
> the limits_changed update in sugov_should_update_freq() will be
> reordered with respect to the reads of the policy limits in
> cpufreq_driver_resolve_freq() and in that case, if the limits_changed
> update in sugov_limits() clobbers the one in sugov_should_update_freq(),
> the new policy limits may not take effect for a long time.
>
> Likewise, the limits_changed update in sugov_limits() may theoretically
> get reordered with respect to the updates of the policy limits in
> cpufreq_set_policy() and if sugov_should_update_freq() runs between
> them, the policy limits change may be missed.
>
> To ensure that the above situations will not take place, add memory
> barriers preventing the reordering in question from taking place and
> add READ_ONCE() and WRITE_ONCE() annotations around all of the
> limits_changed flag updates to prevent the compiler from messing up
> with that code.
>
> Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
> Cc: 5.3+ <stable@vger.nernel.org> # 5.3+
typo in the address here.
I don't fully understand why we wouldn't want this in 6.15-rc already,
even if the actual impact may be limited?
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
>[snip]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
2025-04-15 10:00 ` [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() Rafael J. Wysocki
@ 2025-04-16 12:26 ` Christian Loehle
0 siblings, 0 replies; 18+ messages in thread
From: Christian Loehle @ 2025-04-16 12:26 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
On 4/15/25 11:00, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Notice that ignore_dl_rate_limit() need not piggy back on the
> limits_changed handling to achieve its goal (which is to enforce a
> frequency update before its due time).
>
> Namely, if sugov_should_update_freq() is updated to check
> sg_policy->need_freq_update and return 'true' if it is set when
> sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
> set the former directly instead of setting the latter, so it can
> avoid hitting the memory barrier in sugov_should_update_freq().
>
> Update the code accordingly.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Much cleaner now, thanks!
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
> ---
> kernel/sched/cpufreq_schedutil.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -96,6 +96,9 @@
> smp_mb();
>
> return true;
> + } else if (sg_policy->need_freq_update) {
> + /* ignore_dl_rate_limit() wants a new frequency to be found. */
> + return true;
> }
>
> delta_ns = time - sg_policy->last_freq_update_time;
> @@ -388,7 +391,7 @@
> static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
> {
> if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
> - WRITE_ONCE(sg_cpu->sg_policy->limits_changed, true);
> + sg_cpu->sg_policy->need_freq_update = true;
> }
>
> static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu,
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling
2025-04-16 12:01 ` Christian Loehle
@ 2025-04-16 12:28 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-16 12:28 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Viresh Kumar,
Srinivas Pandruvada, Mario Limonciello, Vincent Guittot,
Sultan Alsawaf, Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Wed, Apr 16, 2025 at 2:01 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 4/15/25 10:59, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The handling of the limits_changed flag in struct sugov_policy needs to
> > be explicitly synchronized to ensure that cpufreq policy limits updates
> > will not be missed in some cases.
> >
> > Without that synchronization it is theoretically possible that
> > the limits_changed update in sugov_should_update_freq() will be
> > reordered with respect to the reads of the policy limits in
> > cpufreq_driver_resolve_freq() and in that case, if the limits_changed
> > update in sugov_limits() clobbers the one in sugov_should_update_freq(),
> > the new policy limits may not take effect for a long time.
> >
> > Likewise, the limits_changed update in sugov_limits() may theoretically
> > get reordered with respect to the updates of the policy limits in
> > cpufreq_set_policy() and if sugov_should_update_freq() runs between
> > them, the policy limits change may be missed.
> >
> > To ensure that the above situations will not take place, add memory
> > barriers preventing the reordering in question from taking place and
> > add READ_ONCE() and WRITE_ONCE() annotations around all of the
> > limits_changed flag updates to prevent the compiler from messing up
> > with that code.
> >
> > Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
> > Cc: 5.3+ <stable@vger.nernel.org> # 5.3+
>
> typo in the address here.
Yup, thanks!
> I don't fully understand why we wouldn't want this in 6.15-rc already,
> even if the actual impact may be limited?
I wanted it to get some coverage in linux-next before it hits
mainline, but then it's -rc3 time frame, so there will still be 4
weeks of testing before the final release.
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
@ 2025-04-16 12:39 ` Christian Loehle
2025-04-16 12:50 ` Rafael J. Wysocki
2025-04-18 10:18 ` Sultan Alsawaf
1 sibling, 1 reply; 18+ messages in thread
From: Christian Loehle @ 2025-04-16 12:39 UTC (permalink / raw)
To: Rafael J. Wysocki, Linux PM
Cc: LKML, Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Sultan Alsawaf, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
On 4/15/25 11:04, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since cpufreq_driver_resolve_freq() can run in parallel with
> cpufreq_set_policy() and there is no synchronization between them,
> the former may access policy->min and policy->max while the latter
> is updating them and it may see intermediate values of them due
> to the way the update is carried out. Also the compiler is free
> to apply any optimizations it wants both to the stores in
> cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> which may result in additional inconsistencies.
>
> To address this, use WRITE_ONCE() when updating policy->min and
> policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> in cpufreq_set_policy() to avoid storing intermediate values in
> policy->min and policy->max with the help of the observation that
> their new values are expected to be properly ordered upfront.
>
> Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> ordering of policy->min and policy->max, which may happen depending on
> the ordering of operations when this function and cpufreq_set_policy()
> run concurrently, into account by always honoring the max when it
> turns out to be less than the min (in case it comes from thermal
> throttling or similar).
>
> Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Just so I understand, the reason you don't squish 4-6 into one is
because this is the only fix? I do get that, but doesn't the fact
that it could easily be picked for backports make up for the additional
refactor?
Actual changes from patches 4-6 look good to me.
> ---
>
> v1 -> v2: Minor edit in the subject
>
> ---
> drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 36 insertions(+), 10 deletions(-)
>
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -490,14 +490,12 @@
> }
> EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
>
> -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> - unsigned int target_freq,
> - unsigned int relation)
> +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq,
> + unsigned int relation)
> {
> unsigned int idx;
>
> - target_freq = clamp_val(target_freq, policy->min, policy->max);
> -
> if (!policy->freq_table)
> return target_freq;
>
> @@ -507,6 +505,15 @@
> return policy->freq_table[idx].frequency;
> }
>
> +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq,
> + unsigned int relation)
> +{
> + target_freq = clamp_val(target_freq, policy->min, policy->max);
> +
> + return __resolve_freq(policy, target_freq, relation);
> +}
> +
> /**
> * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> * one.
> @@ -521,7 +528,22 @@
> unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> unsigned int target_freq)
> {
> - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> + unsigned int min = READ_ONCE(policy->min);
> + unsigned int max = READ_ONCE(policy->max);
> +
> + /*
> + * If this function runs in parallel with cpufreq_set_policy(), it may
> + * read policy->min before the update and policy->max after the update
> + * or the other way around, so there is no ordering guarantee.
> + *
> + * Resolve this by always honoring the max (in case it comes from
> + * thermal throttling or similar).
> + */
> + if (unlikely(min > max))
> + min = max;
> +
> + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> + CPUFREQ_RELATION_LE);
> }
> EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
>
> @@ -2632,11 +2654,15 @@
> * Resolve policy min/max to available frequencies. It ensures
> * no frequency resolution will neither overshoot the requested maximum
> * nor undershoot the requested minimum.
> + *
> + * Avoid storing intermediate values in policy->max or policy->min and
> + * compiler optimizations around them because them may be accessed
s/them/they/
> + * concurrently by cpufreq_driver_resolve_freq() during the update.
> */
> - policy->min = new_data.min;
> - policy->max = new_data.max;
> - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> +
> trace_cpu_frequency_limits(policy);
>
> cpufreq_update_pressure(policy);
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-16 12:39 ` Christian Loehle
@ 2025-04-16 12:50 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-16 12:50 UTC (permalink / raw)
To: Christian Loehle
Cc: Rafael J. Wysocki, Linux PM, LKML, Viresh Kumar,
Srinivas Pandruvada, Mario Limonciello, Vincent Guittot,
Sultan Alsawaf, Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Wed, Apr 16, 2025 at 2:39 PM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 4/15/25 11:04, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Since cpufreq_driver_resolve_freq() can run in parallel with
> > cpufreq_set_policy() and there is no synchronization between them,
> > the former may access policy->min and policy->max while the latter
> > is updating them and it may see intermediate values of them due
> > to the way the update is carried out. Also the compiler is free
> > to apply any optimizations it wants both to the stores in
> > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > which may result in additional inconsistencies.
> >
> > To address this, use WRITE_ONCE() when updating policy->min and
> > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> > in cpufreq_set_policy() to avoid storing intermediate values in
> > policy->min and policy->max with the help of the observation that
> > their new values are expected to be properly ordered upfront.
> >
> > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > ordering of policy->min and policy->max, which may happen depending on
> > the ordering of operations when this function and cpufreq_set_policy()
> > run concurrently, into account by always honoring the max when it
> > turns out to be less than the min (in case it comes from thermal
> > throttling or similar).
> >
> > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Just so I understand, the reason you don't squish 4-6 into one is
> because this is the only fix? I do get that, but doesn't the fact
> that it could easily be picked for backports make up for the additional
> refactor?
Yeah, I think I'll just merge them together and resend.
> Actual changes from patches 4-6 look good to me.
OK, thanks!
> > ---
> >
> > v1 -> v2: Minor edit in the subject
> >
> > ---
> > drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 36 insertions(+), 10 deletions(-)
> >
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -490,14 +490,12 @@
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> >
> > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > - unsigned int target_freq,
> > - unsigned int relation)
> > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > + unsigned int target_freq,
> > + unsigned int relation)
> > {
> > unsigned int idx;
> >
> > - target_freq = clamp_val(target_freq, policy->min, policy->max);
> > -
> > if (!policy->freq_table)
> > return target_freq;
> >
> > @@ -507,6 +505,15 @@
> > return policy->freq_table[idx].frequency;
> > }
> >
> > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > + unsigned int target_freq,
> > + unsigned int relation)
> > +{
> > + target_freq = clamp_val(target_freq, policy->min, policy->max);
> > +
> > + return __resolve_freq(policy, target_freq, relation);
> > +}
> > +
> > /**
> > * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > * one.
> > @@ -521,7 +528,22 @@
> > unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > unsigned int target_freq)
> > {
> > - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > + unsigned int min = READ_ONCE(policy->min);
> > + unsigned int max = READ_ONCE(policy->max);
> > +
> > + /*
> > + * If this function runs in parallel with cpufreq_set_policy(), it may
> > + * read policy->min before the update and policy->max after the update
> > + * or the other way around, so there is no ordering guarantee.
> > + *
> > + * Resolve this by always honoring the max (in case it comes from
> > + * thermal throttling or similar).
> > + */
> > + if (unlikely(min > max))
> > + min = max;
> > +
> > + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > + CPUFREQ_RELATION_LE);
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> >
> > @@ -2632,11 +2654,15 @@
> > * Resolve policy min/max to available frequencies. It ensures
> > * no frequency resolution will neither overshoot the requested maximum
> > * nor undershoot the requested minimum.
> > + *
> > + * Avoid storing intermediate values in policy->max or policy->min and
> > + * compiler optimizations around them because them may be accessed
>
> s/them/they/
Yup, thanks!
> > + * concurrently by cpufreq_driver_resolve_freq() during the update.
> > */
> > - policy->min = new_data.min;
> > - policy->max = new_data.max;
> > - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> > +
> > trace_cpu_frequency_limits(policy);
> >
> > cpufreq_update_pressure(policy);
Thanks for all the reviews!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
2025-04-16 12:39 ` Christian Loehle
@ 2025-04-18 10:18 ` Sultan Alsawaf
2025-04-18 19:42 ` Rafael J. Wysocki
1 sibling, 1 reply; 18+ messages in thread
From: Sultan Alsawaf @ 2025-04-18 10:18 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Viresh Kumar, Srinivas Pandruvada,
Mario Limonciello, Vincent Guittot, Christian Loehle,
Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since cpufreq_driver_resolve_freq() can run in parallel with
> cpufreq_set_policy() and there is no synchronization between them,
> the former may access policy->min and policy->max while the latter
> is updating them and it may see intermediate values of them due
> to the way the update is carried out. Also the compiler is free
> to apply any optimizations it wants both to the stores in
> cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> which may result in additional inconsistencies.
>
> To address this, use WRITE_ONCE() when updating policy->min and
> policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> in cpufreq_set_policy() to avoid storing intermediate values in
> policy->min and policy->max with the help of the observation that
> their new values are expected to be properly ordered upfront.
>
> Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> ordering of policy->min and policy->max, which may happen depending on
> the ordering of operations when this function and cpufreq_set_policy()
> run concurrently, into account by always honoring the max when it
> turns out to be less than the min (in case it comes from thermal
> throttling or similar).
>
> Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v1 -> v2: Minor edit in the subject
>
> ---
> drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 36 insertions(+), 10 deletions(-)
>
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -490,14 +490,12 @@
> }
> EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
>
> -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> - unsigned int target_freq,
> - unsigned int relation)
> +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq,
> + unsigned int relation)
> {
> unsigned int idx;
>
> - target_freq = clamp_val(target_freq, policy->min, policy->max);
> -
> if (!policy->freq_table)
> return target_freq;
>
> @@ -507,6 +505,15 @@
> return policy->freq_table[idx].frequency;
> }
>
> +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq,
> + unsigned int relation)
> +{
> + target_freq = clamp_val(target_freq, policy->min, policy->max);
> +
> + return __resolve_freq(policy, target_freq, relation);
> +}
> +
> /**
> * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> * one.
> @@ -521,7 +528,22 @@
> unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> unsigned int target_freq)
> {
> - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> + unsigned int min = READ_ONCE(policy->min);
> + unsigned int max = READ_ONCE(policy->max);
> +
> + /*
> + * If this function runs in parallel with cpufreq_set_policy(), it may
> + * read policy->min before the update and policy->max after the update
> + * or the other way around, so there is no ordering guarantee.
> + *
> + * Resolve this by always honoring the max (in case it comes from
> + * thermal throttling or similar).
> + */
> + if (unlikely(min > max))
> + min = max;
> +
> + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> + CPUFREQ_RELATION_LE);
> }
> EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
>
> @@ -2632,11 +2654,15 @@
> * Resolve policy min/max to available frequencies. It ensures
> * no frequency resolution will neither overshoot the requested maximum
> * nor undershoot the requested minimum.
> + *
> + * Avoid storing intermediate values in policy->max or policy->min and
> + * compiler optimizations around them because them may be accessed
> + * concurrently by cpufreq_driver_resolve_freq() during the update.
> */
> - policy->min = new_data.min;
> - policy->max = new_data.max;
> - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
I don't think this is sufficient, because this still permits an incoherent
policy->min and policy->max combination, which makes it possible for schedutil
to honor the incoherent limits; i.e., schedutil may observe old policy->min and
new policy->max or vice-versa.
We also can't permit a wrong freq to be propagated to the driver and then send
the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
just correct it later.
How about using a seqlock?
Sultan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-18 10:18 ` Sultan Alsawaf
@ 2025-04-18 19:42 ` Rafael J. Wysocki
2025-04-18 22:21 ` Sultan Alsawaf
0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-18 19:42 UTC (permalink / raw)
To: Sultan Alsawaf
Cc: Rafael J. Wysocki, Linux PM, LKML, Viresh Kumar,
Srinivas Pandruvada, Mario Limonciello, Vincent Guittot,
Christian Loehle, Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Fri, Apr 18, 2025 at 12:18 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Since cpufreq_driver_resolve_freq() can run in parallel with
> > cpufreq_set_policy() and there is no synchronization between them,
> > the former may access policy->min and policy->max while the latter
> > is updating them and it may see intermediate values of them due
> > to the way the update is carried out. Also the compiler is free
> > to apply any optimizations it wants both to the stores in
> > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > which may result in additional inconsistencies.
> >
> > To address this, use WRITE_ONCE() when updating policy->min and
> > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> > in cpufreq_set_policy() to avoid storing intermediate values in
> > policy->min and policy->max with the help of the observation that
> > their new values are expected to be properly ordered upfront.
> >
> > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > ordering of policy->min and policy->max, which may happen depending on
> > the ordering of operations when this function and cpufreq_set_policy()
> > run concurrently, into account by always honoring the max when it
> > turns out to be less than the min (in case it comes from thermal
> > throttling or similar).
> >
> > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1 -> v2: Minor edit in the subject
> >
> > ---
> > drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 36 insertions(+), 10 deletions(-)
> >
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -490,14 +490,12 @@
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> >
> > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > - unsigned int target_freq,
> > - unsigned int relation)
> > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > + unsigned int target_freq,
> > + unsigned int relation)
> > {
> > unsigned int idx;
> >
> > - target_freq = clamp_val(target_freq, policy->min, policy->max);
> > -
> > if (!policy->freq_table)
> > return target_freq;
> >
> > @@ -507,6 +505,15 @@
> > return policy->freq_table[idx].frequency;
> > }
> >
> > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > + unsigned int target_freq,
> > + unsigned int relation)
> > +{
> > + target_freq = clamp_val(target_freq, policy->min, policy->max);
> > +
> > + return __resolve_freq(policy, target_freq, relation);
> > +}
> > +
> > /**
> > * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > * one.
> > @@ -521,7 +528,22 @@
> > unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > unsigned int target_freq)
> > {
> > - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > + unsigned int min = READ_ONCE(policy->min);
> > + unsigned int max = READ_ONCE(policy->max);
> > +
> > + /*
> > + * If this function runs in parallel with cpufreq_set_policy(), it may
> > + * read policy->min before the update and policy->max after the update
> > + * or the other way around, so there is no ordering guarantee.
> > + *
> > + * Resolve this by always honoring the max (in case it comes from
> > + * thermal throttling or similar).
> > + */
> > + if (unlikely(min > max))
> > + min = max;
> > +
> > + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > + CPUFREQ_RELATION_LE);
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> >
> > @@ -2632,11 +2654,15 @@
> > * Resolve policy min/max to available frequencies. It ensures
> > * no frequency resolution will neither overshoot the requested maximum
> > * nor undershoot the requested minimum.
> > + *
> > + * Avoid storing intermediate values in policy->max or policy->min and
> > + * compiler optimizations around them because them may be accessed
> > + * concurrently by cpufreq_driver_resolve_freq() during the update.
> > */
> > - policy->min = new_data.min;
> > - policy->max = new_data.max;
> > - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
>
> I don't think this is sufficient, because this still permits an incoherent
> policy->min and policy->max combination, which makes it possible for schedutil
> to honor the incoherent limits; i.e., schedutil may observe old policy->min and
> new policy->max or vice-versa.
Yes, it may, as stated in the new comment in cpufreq_driver_resolve_freq().
> We also can't permit a wrong freq to be propagated to the driver and then send
> the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
> just correct it later.
The frequency is neither wrong nor bogus, it is only affected by one
of the limits that were in effect previously or will be in effect
going forward. They are valid limits in either case.
> How about using a seqlock?
This would mean extra overhead in the scheduler path pretty much for no gain.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-18 19:42 ` Rafael J. Wysocki
@ 2025-04-18 22:21 ` Sultan Alsawaf
2025-04-19 10:39 ` Rafael J. Wysocki
0 siblings, 1 reply; 18+ messages in thread
From: Sultan Alsawaf @ 2025-04-18 22:21 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Linux PM, LKML, Viresh Kumar,
Srinivas Pandruvada, Mario Limonciello, Vincent Guittot,
Christian Loehle, Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Fri, Apr 18, 2025 at 09:42:15PM +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 18, 2025 at 12:18 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> >
> > On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Since cpufreq_driver_resolve_freq() can run in parallel with
> > > cpufreq_set_policy() and there is no synchronization between them,
> > > the former may access policy->min and policy->max while the latter
> > > is updating them and it may see intermediate values of them due
> > > to the way the update is carried out. Also the compiler is free
> > > to apply any optimizations it wants both to the stores in
> > > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > > which may result in additional inconsistencies.
> > >
> > > To address this, use WRITE_ONCE() when updating policy->min and
> > > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > > them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> > > in cpufreq_set_policy() to avoid storing intermediate values in
> > > policy->min and policy->max with the help of the observation that
> > > their new values are expected to be properly ordered upfront.
> > >
> > > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > > ordering of policy->min and policy->max, which may happen depending on
> > > the ordering of operations when this function and cpufreq_set_policy()
> > > run concurrently, into account by always honoring the max when it
> > > turns out to be less than the min (in case it comes from thermal
> > > throttling or similar).
> > >
> > > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > > Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > v1 -> v2: Minor edit in the subject
> > >
> > > ---
> > > drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > > 1 file changed, 36 insertions(+), 10 deletions(-)
> > >
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -490,14 +490,12 @@
> > > }
> > > EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> > >
> > > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > - unsigned int target_freq,
> > > - unsigned int relation)
> > > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > > + unsigned int target_freq,
> > > + unsigned int relation)
> > > {
> > > unsigned int idx;
> > >
> > > - target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > -
> > > if (!policy->freq_table)
> > > return target_freq;
> > >
> > > @@ -507,6 +505,15 @@
> > > return policy->freq_table[idx].frequency;
> > > }
> > >
> > > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > + unsigned int target_freq,
> > > + unsigned int relation)
> > > +{
> > > + target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > +
> > > + return __resolve_freq(policy, target_freq, relation);
> > > +}
> > > +
> > > /**
> > > * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > > * one.
> > > @@ -521,7 +528,22 @@
> > > unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > > unsigned int target_freq)
> > > {
> > > - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > > + unsigned int min = READ_ONCE(policy->min);
> > > + unsigned int max = READ_ONCE(policy->max);
> > > +
> > > + /*
> > > + * If this function runs in parallel with cpufreq_set_policy(), it may
> > > + * read policy->min before the update and policy->max after the update
> > > + * or the other way around, so there is no ordering guarantee.
> > > + *
> > > + * Resolve this by always honoring the max (in case it comes from
> > > + * thermal throttling or similar).
> > > + */
> > > + if (unlikely(min > max))
> > > + min = max;
> > > +
> > > + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > > + CPUFREQ_RELATION_LE);
> > > }
> > > EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> > >
> > > @@ -2632,11 +2654,15 @@
> > > * Resolve policy min/max to available frequencies. It ensures
> > > * no frequency resolution will neither overshoot the requested maximum
> > > * nor undershoot the requested minimum.
> > > + *
> > > + * Avoid storing intermediate values in policy->max or policy->min and
> > > + * compiler optimizations around them because them may be accessed
> > > + * concurrently by cpufreq_driver_resolve_freq() during the update.
> > > */
> > > - policy->min = new_data.min;
> > > - policy->max = new_data.max;
> > > - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > > - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > > + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > > + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > > + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> >
> > I don't think this is sufficient, because this still permits an incoherent
> > policy->min and policy->max combination, which makes it possible for schedutil
> > to honor the incoherent limits; i.e., schedutil may observe old policy->min and
> > new policy->max or vice-versa.
>
> Yes, it may, as stated in the new comment in cpufreq_driver_resolve_freq().
Thanks for pointing that out; I had ignored that hunk while reviewing.
But I ignored it because schedutil still accesses policy->min/max unprotected
via cpufreq_policy_apply_limits() and __cpufreq_driver_target(). The race still
affects those calls.
> > We also can't permit a wrong freq to be propagated to the driver and then send
> > the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
> > just correct it later.
>
> The frequency is neither wrong nor bogus, it is only affected by one
> of the limits that were in effect previously or will be in effect
> going forward. They are valid limits in either case.
I would argue that limits only make sense as a pair, not on their own. Checking
for min > max only covers the case where the new min exceeds the old max; this
means that, when min is raised without exceeding the old max, a thermal throttle
attempt could instead result in a raised frequency floor:
1. policy->min == 100000, policy->max == 2500000
2. Policy limit update request: new min of 400000, new max of 500000
3. schedutil observes policy->min == 400000, policy->max == 2500000
Raising the min freq while lowering the max freq can be a valid thermal throttle
scheme. But it only makes sense if both limits are applied simultaneously.
> > How about using a seqlock?
>
> This would mean extra overhead in the scheduler path pretty much for no gain.
Or there's the slightly cursed approach of using a union to facilitate an atomic
64-bit store of policy->min and max at the same time, since min/max are 32 bits.
Sultan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max
2025-04-18 22:21 ` Sultan Alsawaf
@ 2025-04-19 10:39 ` Rafael J. Wysocki
0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2025-04-19 10:39 UTC (permalink / raw)
To: Sultan Alsawaf
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML,
Viresh Kumar, Srinivas Pandruvada, Mario Limonciello,
Vincent Guittot, Christian Loehle, Peter Zijlstra,
Valentin Schneider, Ingo Molnar
On Sat, Apr 19, 2025 at 12:22 AM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
>
> On Fri, Apr 18, 2025 at 09:42:15PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Apr 18, 2025 at 12:18 PM Sultan Alsawaf <sultan@kerneltoast.com> wrote:
> > >
> > > On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > Since cpufreq_driver_resolve_freq() can run in parallel with
> > > > cpufreq_set_policy() and there is no synchronization between them,
> > > > the former may access policy->min and policy->max while the latter
> > > > is updating them and it may see intermediate values of them due
> > > > to the way the update is carried out. Also the compiler is free
> > > > to apply any optimizations it wants both to the stores in
> > > > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > > > which may result in additional inconsistencies.
> > > >
> > > > To address this, use WRITE_ONCE() when updating policy->min and
> > > > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > > > them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> > > > in cpufreq_set_policy() to avoid storing intermediate values in
> > > > policy->min and policy->max with the help of the observation that
> > > > their new values are expected to be properly ordered upfront.
> > > >
> > > > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > > > ordering of policy->min and policy->max, which may happen depending on
> > > > the ordering of operations when this function and cpufreq_set_policy()
> > > > run concurrently, into account by always honoring the max when it
> > > > turns out to be less than the min (in case it comes from thermal
> > > > throttling or similar).
> > > >
> > > > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > > > Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > v1 -> v2: Minor edit in the subject
> > > >
> > > > ---
> > > > drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > > > 1 file changed, 36 insertions(+), 10 deletions(-)
> > > >
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -490,14 +490,12 @@
> > > > }
> > > > EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> > > >
> > > > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > > - unsigned int target_freq,
> > > > - unsigned int relation)
> > > > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > > > + unsigned int target_freq,
> > > > + unsigned int relation)
> > > > {
> > > > unsigned int idx;
> > > >
> > > > - target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > > -
> > > > if (!policy->freq_table)
> > > > return target_freq;
> > > >
> > > > @@ -507,6 +505,15 @@
> > > > return policy->freq_table[idx].frequency;
> > > > }
> > > >
> > > > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > > + unsigned int target_freq,
> > > > + unsigned int relation)
> > > > +{
> > > > + target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > > +
> > > > + return __resolve_freq(policy, target_freq, relation);
> > > > +}
> > > > +
> > > > /**
> > > > * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > > > * one.
> > > > @@ -521,7 +528,22 @@
> > > > unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > > > unsigned int target_freq)
> > > > {
> > > > - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > > > + unsigned int min = READ_ONCE(policy->min);
> > > > + unsigned int max = READ_ONCE(policy->max);
> > > > +
> > > > + /*
> > > > + * If this function runs in parallel with cpufreq_set_policy(), it may
> > > > + * read policy->min before the update and policy->max after the update
> > > > + * or the other way around, so there is no ordering guarantee.
> > > > + *
> > > > + * Resolve this by always honoring the max (in case it comes from
> > > > + * thermal throttling or similar).
> > > > + */
> > > > + if (unlikely(min > max))
> > > > + min = max;
> > > > +
> > > > + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > > > + CPUFREQ_RELATION_LE);
> > > > }
> > > > EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> > > >
> > > > @@ -2632,11 +2654,15 @@
> > > > * Resolve policy min/max to available frequencies. It ensures
> > > > * no frequency resolution will neither overshoot the requested maximum
> > > > * nor undershoot the requested minimum.
> > > > + *
> > > > + * Avoid storing intermediate values in policy->max or policy->min and
> > > > + * compiler optimizations around them because them may be accessed
> > > > + * concurrently by cpufreq_driver_resolve_freq() during the update.
> > > > */
> > > > - policy->min = new_data.min;
> > > > - policy->max = new_data.max;
> > > > - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > > > - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > > > + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > > > + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > > > + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> > >
> > > I don't think this is sufficient, because this still permits an incoherent
> > > policy->min and policy->max combination, which makes it possible for schedutil
> > > to honor the incoherent limits; i.e., schedutil may observe old policy->min and
> > > new policy->max or vice-versa.
> >
> > Yes, it may, as stated in the new comment in cpufreq_driver_resolve_freq().
>
> Thanks for pointing that out; I had ignored that hunk while reviewing.
>
> But I ignored it because schedutil still accesses policy->min/max unprotected
> via cpufreq_policy_apply_limits() and __cpufreq_driver_target(). The race still
> affects those calls.
>
> > > We also can't permit a wrong freq to be propagated to the driver and then send
> > > the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
> > > just correct it later.
> >
> > The frequency is neither wrong nor bogus, it is only affected by one
> > of the limits that were in effect previously or will be in effect
> > going forward. They are valid limits in either case.
>
> I would argue that limits only make sense as a pair, not on their own. Checking
> for min > max only covers the case where the new min exceeds the old max; this
> means that, when min is raised without exceeding the old max, a thermal throttle
> attempt could instead result in a raised frequency floor:
>
> 1. policy->min == 100000, policy->max == 2500000
> 2. Policy limit update request: new min of 400000, new max of 500000
> 3. schedutil observes policy->min == 400000, policy->max == 2500000
Until it runs next time that is.
> Raising the min freq while lowering the max freq can be a valid thermal throttle
> scheme.
Maybe theoretically.
All of the places in the kernel that use cpufreq for cooling I'm aware
of only set the max limit.
> But it only makes sense if both limits are applied simultaneously.
And they will, but only a bit later.
As a general rule, limits don't take effect immediately. There's no
such promise, it has never been there and never will be.
The promise is that both limits will be taken into account when the
governor runs for the first time after a limits update, not that it
will pick up those limits immediately when it is running in parallel
with the update.
Whoever updates policy limits, they need to take that into account.
> > > How about using a seqlock?
> >
> > This would mean extra overhead in the scheduler path pretty much for no gain.
>
> Or there's the slightly cursed approach of using a union to facilitate an atomic
> 64-bit store of policy->min and max at the same time, since min/max are 32 bits.
That's a possibility, but at this point I don't see why it would be necessary.
Point me to at least one remotely realistic use case that is broken as
a result of the race in question and I will reconsider it.
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
2025-04-16 11:35 ` Christian Loehle
@ 2025-04-20 1:10 ` Sultan Alsawaf
1 sibling, 0 replies; 18+ messages in thread
From: Sultan Alsawaf @ 2025-04-20 1:10 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux PM, LKML, Viresh Kumar, Srinivas Pandruvada,
Mario Limonciello, Vincent Guittot, Christian Loehle,
Peter Zijlstra, Valentin Schneider, Ingo Molnar
On Tue, Apr 15, 2025 at 11:58:08AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
> by need_freq_update") modified sugov_should_update_freq() to set the
> need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
> set, but that flag generally needs to be set when the policy limits
> change because the driver callback may need to be invoked for the new
> limits to take effect.
>
> However, if the return value of cpufreq_driver_resolve_freq() after
> applying the new limits is still equal to the previously selected
> frequency, the driver callback needs to be invoked only in the case
> when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
> specifically wants its callback to be invoked every time the policy
> limits change).
>
> Update the code accordingly to avoid missing policy limits changes for
> drivers without CPUFREQ_NEED_UPDATE_LIMITS.
>
> Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
> Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
> Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thanks for fixing this.
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
> ---
>
> v1 -> v2:
> * Always set need_freq_update when limits_changed is set.
> * Take CPUFREQ_NEED_UPDATE_LIMITS into account in sugov_update_next_freq().
>
> ---
> kernel/sched/cpufreq_schedutil.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -83,7 +83,7 @@
>
> if (unlikely(sg_policy->limits_changed)) {
> sg_policy->limits_changed = false;
> - sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> + sg_policy->need_freq_update = true;
> return true;
> }
>
> @@ -95,10 +95,22 @@
> static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> - if (sg_policy->need_freq_update)
> + if (sg_policy->need_freq_update) {
> sg_policy->need_freq_update = false;
> - else if (sg_policy->next_freq == next_freq)
> + /*
> + * The policy limits have changed, but if the return value of
> + * cpufreq_driver_resolve_freq() after applying the new limits
> + * is still equal to the previously selected frequency, the
> + * driver callback need not be invoked unless the driver
> + * specifically wants that to happen on every update of the
> + * policy limits.
> + */
> + if (sg_policy->next_freq == next_freq &&
> + !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> + return false;
> + } else if (sg_policy->next_freq == next_freq) {
> return false;
> + }
>
> sg_policy->next_freq = next_freq;
> sg_policy->last_freq_update_time = time;
>
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-04-20 1:10 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-15 9:52 [PATCH v2 0/6] cpufreq/sched: Improve synchronization of policy limits updates with schedutil Rafael J. Wysocki
2025-04-15 9:58 ` [PATCH v2 1/6] cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS Rafael J. Wysocki
2025-04-16 11:35 ` Christian Loehle
2025-04-20 1:10 ` Sultan Alsawaf
2025-04-15 9:59 ` [PATCH v2 2/6] cpufreq/sched: Explicitly synchronize limits_changed flag handling Rafael J. Wysocki
2025-04-16 12:01 ` Christian Loehle
2025-04-16 12:28 ` Rafael J. Wysocki
2025-04-15 10:00 ` [PATCH v2 3/6] cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit() Rafael J. Wysocki
2025-04-16 12:26 ` Christian Loehle
2025-04-15 10:02 ` [PATCH v2 4/6] cpufreq: Rename __resolve_freq() to clamp_and_resolve_freq() Rafael J. Wysocki
2025-04-15 10:04 ` [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and policy->max Rafael J. Wysocki
2025-04-16 12:39 ` Christian Loehle
2025-04-16 12:50 ` Rafael J. Wysocki
2025-04-18 10:18 ` Sultan Alsawaf
2025-04-18 19:42 ` Rafael J. Wysocki
2025-04-18 22:21 ` Sultan Alsawaf
2025-04-19 10:39 ` Rafael J. Wysocki
2025-04-15 10:05 ` [PATCH v2 6/6] cpufreq: Eliminate clamp_and_resolve_freq() Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox