public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/2] cpufreq/amd-pstate: Prevent scheduling when atomic on PREEMPT_RT
@ 2026-03-16  8:18 K Prateek Nayak
  2026-03-16  8:18 ` [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update() K Prateek Nayak
  2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
  0 siblings, 2 replies; 7+ messages in thread
From: K Prateek Nayak @ 2026-03-16  8:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Huang Rui, Gautham R. Shenoy,
	Mario Limonciello, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Srinivas Pandruvada, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Miguel Ojeda
  Cc: Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, K Prateek Nayak, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Valentin Schneider, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich

Bert reported hitting "BUG: scheduling while atomic" when running
amd-pstate-ut on a PREEMPT_RT kernel [1].

Since reader-writer locks turn sleepable on PREEMPT_RT, they are not
suitable to be used in the scheduler hot-path under rq_lock to grab the
cpufreq policy object.

Unfortunately, the amd-pstate driver has a tight coupling between the
cpufreq_policy object and the cpudata stored in it as the driver_data.

Trying to grab a read reference on PREEMPT_RT can cause "scheduling
while atomic" if a concurrent writer is active, and trying to grab a
nested reference in presence of a writer can cause a deadlock (manifests
as lockup) since the reader fast-path is disabled on PREEMPT_RT to
prevent write-side starvation.

The two patches included removes cases of grabbing a nested read
reference to the cpufreq policy in amd-pstate, and modifies the
cpufreq_driver->adjust_perf() callback to take the raw policy reference
cached by the schedutil governor respectively.

The policy object outlives the governor and the driver making it safe to
use this cached reference from the sugov data. Any changes to the policy
will end up calling cpufreq_driver->set_policy() or
governor->set_limits() once the policy is modified which should ensure
eventual consistency despite not holding the read-side.

Series has been tested with amd-pstate-ut on PREEMPT_RT kernel which
successfully passes without any splats on LOCKDEP + DEBUG_ATOMIC_SLEEP
config. Additionally, the driver switch test from Gautham [2] was run
for 10min on the same config without observing any splats.

[1] https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/
[2] https://lore.kernel.org/all/aJRN2wMLAnhDFykv@BLRRASHENOY1.amd.com/

Since the generic cpufreq changes also affect the rust bindings, Patch2
includes the necessary rust/ binding fixes to preserve bisection.

Patches are based on:

  git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge

at commit 18ca40891b83 ("Merge branch 'pm-cpuidle-fixes' into
bleeding-edge") (14-03-2026).

Everyone has been Cc'd on the cover letter. Respective Maintainers,
Reviewers, and the folks from get_maintainer.pl have been Cc'd on the
respective patches. The amd-pstate maintainers, PREEMPT_RT maintainers,
and the lists get the entire series.

Rafael, since this touches the generic cpufreq layer, it would be good
to route this through linux-pm. Upcoming amd-pstate changes from Gautham
[3] apply cleanly on top of this and doesn't have any dependencies.

[3] https://lore.kernel.org/lkml/20260311140116.19604-1-gautham.shenoy@amd.com/

If you would like to see anything changed, please let me know and I'll
address it in the next version.
---
Chnagelog v3..v4:

o Rebased and tested on the latest linux-pm:bleeding-edge.

o Added a "Fixes:" tag to Patch 2. (Gautham)

v3: https://lore.kernel.org/lkml/20260116063234.24084-1-kprateek.nayak@amd.com/

Changelog v2..v3:

o Fixed the rust bindings for adjust_perf_callaback in Patch 2 (kernel
  test robot).

o Tested PREEMPT_RT + CONFIG_RUST builds using amd-pstate-ut.

o Viresh's ack on Patch 2 was retained since the incremental diff for
  Rust bindings was discussed on v2. (Viresh let me know in case you
  have additional comments and if I should drop the tag.)

o Adding the Rust and sched folks to Cc for awareness on rust bindings
  and the schedutil bits respectively.

v2: https://lore.kernel.org/lkml/20260114085113.21378-1-kprateek.nayak@amd.com/

Changelog rfc v1..v2:

o Updated the kdoc comment above cpufreq_driver_adjust_perf() in Patch 2
  to reflect that cpufreq_policy is passed as an argument now instead of
  the target CPU. (kernel test robot)

o Added "Reported-by:" and "Closes:" tags to Patch 2. (Mario)

o Collected tags from v1. (Thank you Mario, Viresh)

o Dropped the RFC tag.

v1: https://lore.kernel.org/lkml/20260106073608.278644-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (2):
  cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
  cpufreq: Pass the policy to cpufreq_driver->adjust_perf()

 drivers/cpufreq/amd-pstate.c     | 14 +++++---------
 drivers/cpufreq/cpufreq.c        |  6 +++---
 drivers/cpufreq/intel_pstate.c   |  4 ++--
 include/linux/cpufreq.h          |  4 ++--
 kernel/sched/cpufreq_schedutil.c |  5 +++--
 rust/kernel/cpufreq.rs           | 13 ++++++-------
 6 files changed, 21 insertions(+), 25 deletions(-)


base-commit: 18ca40891b83f01b35537f4cc93c9d95528eaf42
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
  2026-03-16  8:18 [PATCH v4 0/2] cpufreq/amd-pstate: Prevent scheduling when atomic on PREEMPT_RT K Prateek Nayak
@ 2026-03-16  8:18 ` K Prateek Nayak
  2026-03-26 12:03   ` Gautham R. Shenoy
  2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
  1 sibling, 1 reply; 7+ messages in thread
From: K Prateek Nayak @ 2026-03-16  8:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Huang Rui, Gautham R. Shenoy,
	Mario Limonciello, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt
  Cc: Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, K Prateek Nayak, Mario Limonciello

All callers of amd_pstate_update() already have a reference to the
cpufreq_policy object.

Pass the entire policy object and grab the cpudata using
"policy->driver_data" instead of passing the cpudata and unnecessarily
grabbing another read-side reference to the cpufreq policy object when
it is already available in the caller.

No functional changes intended.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v3..v4:

o No changes.
---
 drivers/cpufreq/amd-pstate.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 5aa9fcd80cf5..5faccb3d6b14 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -565,15 +565,12 @@ static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
 	return true;
 }
 
-static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
+static void amd_pstate_update(struct cpufreq_policy *policy, u8 min_perf,
 			      u8 des_perf, u8 max_perf, bool fast_switch, int gov_flags)
 {
-	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
+	struct amd_cpudata *cpudata = policy->driver_data;
 	union perf_cached perf = READ_ONCE(cpudata->perf);
 
-	if (!policy)
-		return;
-
 	/* limit the max perf when core performance boost feature is disabled */
 	if (!cpudata->boost_supported)
 		max_perf = min_t(u8, perf.nominal_perf, max_perf);
@@ -688,7 +685,7 @@ static int amd_pstate_update_freq(struct cpufreq_policy *policy,
 	if (!fast_switch)
 		cpufreq_freq_transition_begin(policy, &freqs);
 
-	amd_pstate_update(cpudata, perf.min_limit_perf, des_perf,
+	amd_pstate_update(policy, perf.min_limit_perf, des_perf,
 			  perf.max_limit_perf, fast_switch,
 			  policy->governor->flags);
 
@@ -750,7 +747,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
 	if (max_perf < min_perf)
 		max_perf = min_perf;
 
-	amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
+	amd_pstate_update(policy, min_perf, des_perf, max_perf, true,
 			policy->governor->flags);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
  2026-03-16  8:18 [PATCH v4 0/2] cpufreq/amd-pstate: Prevent scheduling when atomic on PREEMPT_RT K Prateek Nayak
  2026-03-16  8:18 ` [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update() K Prateek Nayak
@ 2026-03-16  8:18 ` K Prateek Nayak
  2026-03-16 10:59   ` Gary Guo
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: K Prateek Nayak @ 2026-03-16  8:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Huang Rui, Gautham R. Shenoy,
	Mario Limonciello, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Srinivas Pandruvada, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Miguel Ojeda
  Cc: Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, K Prateek Nayak, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Valentin Schneider, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Bert Karwatzki

cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
writer(s), however amd-pstate depends on fetching the cpudata via the
policy's driver data which necessitates grabbing the reference.

Since schedutil governor can call "cpufreq_driver->update_perf()"
during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
fetching the policy object using the cpufreq_cpu_get() helper in the
scheduler fast-path leads to "BUG: scheduling while atomic" on
PREEMPT_RT [1].

Pass the cached cpufreq policy object in sg_policy to the update_perf()
instead of just the CPU. The CPU can be inferred using "policy->cpu".

The lifetime of cpufreq_policy object outlasts that of the governor and
the cpufreq driver (allocated when the CPU is onlined and only reclaimed
when the CPU is offlined / the CPU device is removed) which makes it
safe to be referenced throughout the governor's lifetime.

Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes:https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v3..v4:

o Added the Fixes tag. (Gautham, Chris Mason's review-prompts)
---
 drivers/cpufreq/amd-pstate.c     |  3 +--
 drivers/cpufreq/cpufreq.c        |  6 +++---
 drivers/cpufreq/intel_pstate.c   |  4 ++--
 include/linux/cpufreq.h          |  4 ++--
 kernel/sched/cpufreq_schedutil.c |  5 +++--
 rust/kernel/cpufreq.rs           | 13 ++++++-------
 6 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 5faccb3d6b14..ad4b5f84773a 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -710,13 +710,12 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
 	return policy->cur;
 }
 
-static void amd_pstate_adjust_perf(unsigned int cpu,
+static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
 				   unsigned long _min_perf,
 				   unsigned long target_perf,
 				   unsigned long capacity)
 {
 	u8 max_perf, min_perf, des_perf, cap_perf;
-	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
 	struct amd_cpudata *cpudata;
 	union perf_cached perf;
 
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2082a9e4384f..17a5b8e0ea1e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2231,7 +2231,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
 
 /**
  * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
- * @cpu: Target CPU.
+ * @policy: cpufreq policy object of the target CPU.
  * @min_perf: Minimum (required) performance level (units of @capacity).
  * @target_perf: Target (desired) performance level (units of @capacity).
  * @capacity: Capacity of the target CPU.
@@ -2250,12 +2250,12 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
  * parallel with either ->target() or ->target_index() or ->fast_switch() for
  * the same CPU.
  */
-void cpufreq_driver_adjust_perf(unsigned int cpu,
+void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
 				 unsigned long min_perf,
 				 unsigned long target_perf,
 				 unsigned long capacity)
 {
-	cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
+	cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
 }
 
 /**
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 51938c5a47ca..1552b2d32a34 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -3239,12 +3239,12 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 	return target_pstate * cpu->pstate.scaling;
 }
 
-static void intel_cpufreq_adjust_perf(unsigned int cpunum,
+static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
 				      unsigned long min_perf,
 				      unsigned long target_perf,
 				      unsigned long capacity)
 {
-	struct cpudata *cpu = all_cpu_data[cpunum];
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
 	u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
 	int old_pstate = cpu->pstate.current_pstate;
 	int cap_pstate, min_pstate, max_pstate, target_pstate;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index cc894fc38971..4317c5a312bd 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -372,7 +372,7 @@ struct cpufreq_driver {
 	 * conditions) scale invariance can be disabled, which causes the
 	 * schedutil governor to fall back to the latter.
 	 */
-	void		(*adjust_perf)(unsigned int cpu,
+	void		(*adjust_perf)(struct cpufreq_policy *policy,
 				       unsigned long min_perf,
 				       unsigned long target_perf,
 				       unsigned long capacity);
@@ -617,7 +617,7 @@ struct cpufreq_governor {
 /* Pass a target to the cpufreq driver */
 unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 					unsigned int target_freq);
-void cpufreq_driver_adjust_perf(unsigned int cpu,
+void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
 				unsigned long min_perf,
 				unsigned long target_perf,
 				unsigned long capacity);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 153232dd8276..ae9fd211cec1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -461,6 +461,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 				     unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long prev_util = sg_cpu->util;
 	unsigned long max_cap;
 
@@ -482,10 +483,10 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
 	if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
 		sg_cpu->util = prev_util;
 
-	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
+	cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
 				   sg_cpu->util, max_cap);
 
-	sg_cpu->sg_policy->last_freq_update_time = time;
+	sg_policy->last_freq_update_time = time;
 }
 
 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs
index 76faa1ac8501..a83aec198336 100644
--- a/rust/kernel/cpufreq.rs
+++ b/rust/kernel/cpufreq.rs
@@ -1256,18 +1256,17 @@ impl<T: Driver> Registration<T> {
     /// # Safety
     ///
     /// - This function may only be called from the cpufreq C infrastructure.
+    /// - The pointer arguments must be valid pointers.
     unsafe extern "C" fn adjust_perf_callback(
-        cpu: c_uint,
+        ptr: *mut bindings::cpufreq_policy,
         min_perf: c_ulong,
         target_perf: c_ulong,
         capacity: c_ulong,
     ) {
-        // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
-        let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
-
-        if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
-            T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
-        }
+        // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
+        // lifetime of `policy`.
+        let policy = unsafe { Policy::from_raw_mut(ptr) };
+        T::adjust_perf(policy, min_perf, target_perf, capacity);
     }
 
     /// Driver's `get_intermediate` callback.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
  2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
@ 2026-03-16 10:59   ` Gary Guo
  2026-03-26 12:04   ` Gautham R. Shenoy
  2026-03-26 13:16   ` Zhongqiu Han
  2 siblings, 0 replies; 7+ messages in thread
From: Gary Guo @ 2026-03-16 10:59 UTC (permalink / raw)
  To: K Prateek Nayak, Rafael J. Wysocki, Viresh Kumar, Huang Rui,
	Gautham R. Shenoy, Mario Limonciello, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Srinivas Pandruvada, Len Brown,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Miguel Ojeda
  Cc: Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Bert Karwatzki

On Mon Mar 16, 2026 at 8:18 AM GMT, K Prateek Nayak wrote:
> cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
> writer(s), however amd-pstate depends on fetching the cpudata via the
> policy's driver data which necessitates grabbing the reference.
>
> Since schedutil governor can call "cpufreq_driver->update_perf()"
> during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
> fetching the policy object using the cpufreq_cpu_get() helper in the
> scheduler fast-path leads to "BUG: scheduling while atomic" on
> PREEMPT_RT [1].
>
> Pass the cached cpufreq policy object in sg_policy to the update_perf()
> instead of just the CPU. The CPU can be inferred using "policy->cpu".
>
> The lifetime of cpufreq_policy object outlasts that of the governor and
> the cpufreq driver (allocated when the CPU is onlined and only reclaimed
> when the CPU is offlined / the CPU device is removed) which makes it
> safe to be referenced throughout the governor's lifetime.
>
> Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
> Reported-by: Bert Karwatzki <spasswolf@web.de>
> Closes:https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>

Acked-by: Gary Guo <gary@garyguo.net> # Rust

Looks like a sensible thing to do, interestingly I noted that this (as an
effect) aligns the C API to what the Rust-side cpufreq API already has.

Best,
Gary

> ---
> changelog v3..v4:
>
> o Added the Fixes tag. (Gautham, Chris Mason's review-prompts)
> ---
>  drivers/cpufreq/amd-pstate.c     |  3 +--
>  drivers/cpufreq/cpufreq.c        |  6 +++---
>  drivers/cpufreq/intel_pstate.c   |  4 ++--
>  include/linux/cpufreq.h          |  4 ++--
>  kernel/sched/cpufreq_schedutil.c |  5 +++--
>  rust/kernel/cpufreq.rs           | 13 ++++++-------
>  6 files changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 5faccb3d6b14..ad4b5f84773a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -710,13 +710,12 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
>  	return policy->cur;
>  }
>  
> -static void amd_pstate_adjust_perf(unsigned int cpu,
> +static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
>  				   unsigned long _min_perf,
>  				   unsigned long target_perf,
>  				   unsigned long capacity)
>  {
>  	u8 max_perf, min_perf, des_perf, cap_perf;
> -	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
>  	struct amd_cpudata *cpudata;
>  	union perf_cached perf;
>  
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2082a9e4384f..17a5b8e0ea1e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2231,7 +2231,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>  
>  /**
>   * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
> - * @cpu: Target CPU.
> + * @policy: cpufreq policy object of the target CPU.
>   * @min_perf: Minimum (required) performance level (units of @capacity).
>   * @target_perf: Target (desired) performance level (units of @capacity).
>   * @capacity: Capacity of the target CPU.
> @@ -2250,12 +2250,12 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>   * parallel with either ->target() or ->target_index() or ->fast_switch() for
>   * the same CPU.
>   */
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>  				 unsigned long min_perf,
>  				 unsigned long target_perf,
>  				 unsigned long capacity)
>  {
> -	cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
> +	cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
>  }
>  
>  /**
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 51938c5a47ca..1552b2d32a34 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -3239,12 +3239,12 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>  	return target_pstate * cpu->pstate.scaling;
>  }
>  
> -static void intel_cpufreq_adjust_perf(unsigned int cpunum,
> +static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
>  				      unsigned long min_perf,
>  				      unsigned long target_perf,
>  				      unsigned long capacity)
>  {
> -	struct cpudata *cpu = all_cpu_data[cpunum];
> +	struct cpudata *cpu = all_cpu_data[policy->cpu];
>  	u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
>  	int old_pstate = cpu->pstate.current_pstate;
>  	int cap_pstate, min_pstate, max_pstate, target_pstate;
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index cc894fc38971..4317c5a312bd 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -372,7 +372,7 @@ struct cpufreq_driver {
>  	 * conditions) scale invariance can be disabled, which causes the
>  	 * schedutil governor to fall back to the latter.
>  	 */
> -	void		(*adjust_perf)(unsigned int cpu,
> +	void		(*adjust_perf)(struct cpufreq_policy *policy,
>  				       unsigned long min_perf,
>  				       unsigned long target_perf,
>  				       unsigned long capacity);
> @@ -617,7 +617,7 @@ struct cpufreq_governor {
>  /* Pass a target to the cpufreq driver */
>  unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
>  					unsigned int target_freq);
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>  				unsigned long min_perf,
>  				unsigned long target_perf,
>  				unsigned long capacity);
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 153232dd8276..ae9fd211cec1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -461,6 +461,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>  				     unsigned int flags)
>  {
>  	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> +	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>  	unsigned long prev_util = sg_cpu->util;
>  	unsigned long max_cap;
>  
> @@ -482,10 +483,10 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>  	if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
>  		sg_cpu->util = prev_util;
>  
> -	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
> +	cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
>  				   sg_cpu->util, max_cap);
>  
> -	sg_cpu->sg_policy->last_freq_update_time = time;
> +	sg_policy->last_freq_update_time = time;
>  }
>  
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs
> index 76faa1ac8501..a83aec198336 100644
> --- a/rust/kernel/cpufreq.rs
> +++ b/rust/kernel/cpufreq.rs
> @@ -1256,18 +1256,17 @@ impl<T: Driver> Registration<T> {
>      /// # Safety
>      ///
>      /// - This function may only be called from the cpufreq C infrastructure.
> +    /// - The pointer arguments must be valid pointers.
>      unsafe extern "C" fn adjust_perf_callback(
> -        cpu: c_uint,
> +        ptr: *mut bindings::cpufreq_policy,
>          min_perf: c_ulong,
>          target_perf: c_ulong,
>          capacity: c_ulong,
>      ) {
> -        // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
> -        let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
> -
> -        if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
> -            T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
> -        }
> +        // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
> +        // lifetime of `policy`.
> +        let policy = unsafe { Policy::from_raw_mut(ptr) };
> +        T::adjust_perf(policy, min_perf, target_perf, capacity);
>      }
>  
>      /// Driver's `get_intermediate` callback.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
  2026-03-16  8:18 ` [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update() K Prateek Nayak
@ 2026-03-26 12:03   ` Gautham R. Shenoy
  0 siblings, 0 replies; 7+ messages in thread
From: Gautham R. Shenoy @ 2026-03-26 12:03 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Rafael J. Wysocki, Viresh Kumar, Huang Rui, Mario Limonciello,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, Mario Limonciello

On Mon, Mar 16, 2026 at 08:18:48AM +0000, K Prateek Nayak wrote:
> All callers of amd_pstate_update() already have a reference to the
> cpufreq_policy object.
> 
> Pass the entire policy object and grab the cpudata using
> "policy->driver_data" instead of passing the cpudata and unnecessarily
> grabbing another read-side reference to the cpufreq policy object when
> it is already available in the caller.
> 
> No functional changes intended.
> 
> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

> ---
> changelog v3..v4:
> 
> o No changes.
> ---
>  drivers/cpufreq/amd-pstate.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 5aa9fcd80cf5..5faccb3d6b14 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -565,15 +565,12 @@ static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
>  	return true;
>  }
>  
> -static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
> +static void amd_pstate_update(struct cpufreq_policy *policy, u8 min_perf,
>  			      u8 des_perf, u8 max_perf, bool fast_switch, int gov_flags)
>  {
> -	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
> +	struct amd_cpudata *cpudata = policy->driver_data;
>  	union perf_cached perf = READ_ONCE(cpudata->perf);
>  
> -	if (!policy)
> -		return;
> -
>  	/* limit the max perf when core performance boost feature is disabled */
>  	if (!cpudata->boost_supported)
>  		max_perf = min_t(u8, perf.nominal_perf, max_perf);
> @@ -688,7 +685,7 @@ static int amd_pstate_update_freq(struct cpufreq_policy *policy,
>  	if (!fast_switch)
>  		cpufreq_freq_transition_begin(policy, &freqs);
>  
> -	amd_pstate_update(cpudata, perf.min_limit_perf, des_perf,
> +	amd_pstate_update(policy, perf.min_limit_perf, des_perf,
>  			  perf.max_limit_perf, fast_switch,
>  			  policy->governor->flags);
>  
> @@ -750,7 +747,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
>  	if (max_perf < min_perf)
>  		max_perf = min_perf;
>  
> -	amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
> +	amd_pstate_update(policy, min_perf, des_perf, max_perf, true,
>  			policy->governor->flags);
>  }
>  
> -- 
> 2.34.1
> 

-- 
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
  2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
  2026-03-16 10:59   ` Gary Guo
@ 2026-03-26 12:04   ` Gautham R. Shenoy
  2026-03-26 13:16   ` Zhongqiu Han
  2 siblings, 0 replies; 7+ messages in thread
From: Gautham R. Shenoy @ 2026-03-26 12:04 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Rafael J. Wysocki, Viresh Kumar, Huang Rui, Mario Limonciello,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Srinivas Pandruvada, Len Brown, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Miguel Ojeda, Perry Yuan, linux-pm,
	linux-kernel, rust-for-linux, linux-rt-devel, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Valentin Schneider, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Bert Karwatzki

On Mon, Mar 16, 2026 at 08:18:49AM +0000, K Prateek Nayak wrote:
> cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
> writer(s), however amd-pstate depends on fetching the cpudata via the
> policy's driver data which necessitates grabbing the reference.
> 
> Since schedutil governor can call "cpufreq_driver->update_perf()"
> during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
> fetching the policy object using the cpufreq_cpu_get() helper in the
> scheduler fast-path leads to "BUG: scheduling while atomic" on
> PREEMPT_RT [1].
> 
> Pass the cached cpufreq policy object in sg_policy to the update_perf()
> instead of just the CPU. The CPU can be inferred using "policy->cpu".
> 
> The lifetime of cpufreq_policy object outlasts that of the governor and
> the cpufreq driver (allocated when the CPU is onlined and only reclaimed
> when the CPU is offlined / the CPU device is removed) which makes it
> safe to be referenced throughout the governor's lifetime.
> 
> Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
> Reported-by: Bert Karwatzki <spasswolf@web.de>
> Closes:https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>


-- 
Thanks and Regards
gautham.
> ---
> changelog v3..v4:
> 
> o Added the Fixes tag. (Gautham, Chris Mason's review-prompts)
> ---
>  drivers/cpufreq/amd-pstate.c     |  3 +--
>  drivers/cpufreq/cpufreq.c        |  6 +++---
>  drivers/cpufreq/intel_pstate.c   |  4 ++--
>  include/linux/cpufreq.h          |  4 ++--
>  kernel/sched/cpufreq_schedutil.c |  5 +++--
>  rust/kernel/cpufreq.rs           | 13 ++++++-------
>  6 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 5faccb3d6b14..ad4b5f84773a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -710,13 +710,12 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
>  	return policy->cur;
>  }
>  
> -static void amd_pstate_adjust_perf(unsigned int cpu,
> +static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
>  				   unsigned long _min_perf,
>  				   unsigned long target_perf,
>  				   unsigned long capacity)
>  {
>  	u8 max_perf, min_perf, des_perf, cap_perf;
> -	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
>  	struct amd_cpudata *cpudata;
>  	union perf_cached perf;
>  
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2082a9e4384f..17a5b8e0ea1e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2231,7 +2231,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>  
>  /**
>   * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
> - * @cpu: Target CPU.
> + * @policy: cpufreq policy object of the target CPU.
>   * @min_perf: Minimum (required) performance level (units of @capacity).
>   * @target_perf: Target (desired) performance level (units of @capacity).
>   * @capacity: Capacity of the target CPU.
> @@ -2250,12 +2250,12 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>   * parallel with either ->target() or ->target_index() or ->fast_switch() for
>   * the same CPU.
>   */
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>  				 unsigned long min_perf,
>  				 unsigned long target_perf,
>  				 unsigned long capacity)
>  {
> -	cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
> +	cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
>  }
>  
>  /**
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 51938c5a47ca..1552b2d32a34 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -3239,12 +3239,12 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>  	return target_pstate * cpu->pstate.scaling;
>  }
>  
> -static void intel_cpufreq_adjust_perf(unsigned int cpunum,
> +static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
>  				      unsigned long min_perf,
>  				      unsigned long target_perf,
>  				      unsigned long capacity)
>  {
> -	struct cpudata *cpu = all_cpu_data[cpunum];
> +	struct cpudata *cpu = all_cpu_data[policy->cpu];
>  	u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
>  	int old_pstate = cpu->pstate.current_pstate;
>  	int cap_pstate, min_pstate, max_pstate, target_pstate;
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index cc894fc38971..4317c5a312bd 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -372,7 +372,7 @@ struct cpufreq_driver {
>  	 * conditions) scale invariance can be disabled, which causes the
>  	 * schedutil governor to fall back to the latter.
>  	 */
> -	void		(*adjust_perf)(unsigned int cpu,
> +	void		(*adjust_perf)(struct cpufreq_policy *policy,
>  				       unsigned long min_perf,
>  				       unsigned long target_perf,
>  				       unsigned long capacity);
> @@ -617,7 +617,7 @@ struct cpufreq_governor {
>  /* Pass a target to the cpufreq driver */
>  unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
>  					unsigned int target_freq);
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>  				unsigned long min_perf,
>  				unsigned long target_perf,
>  				unsigned long capacity);
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 153232dd8276..ae9fd211cec1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -461,6 +461,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>  				     unsigned int flags)
>  {
>  	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> +	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>  	unsigned long prev_util = sg_cpu->util;
>  	unsigned long max_cap;
>  
> @@ -482,10 +483,10 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>  	if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
>  		sg_cpu->util = prev_util;
>  
> -	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
> +	cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
>  				   sg_cpu->util, max_cap);
>  
> -	sg_cpu->sg_policy->last_freq_update_time = time;
> +	sg_policy->last_freq_update_time = time;
>  }
>  
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs
> index 76faa1ac8501..a83aec198336 100644
> --- a/rust/kernel/cpufreq.rs
> +++ b/rust/kernel/cpufreq.rs
> @@ -1256,18 +1256,17 @@ impl<T: Driver> Registration<T> {
>      /// # Safety
>      ///
>      /// - This function may only be called from the cpufreq C infrastructure.
> +    /// - The pointer arguments must be valid pointers.
>      unsafe extern "C" fn adjust_perf_callback(
> -        cpu: c_uint,
> +        ptr: *mut bindings::cpufreq_policy,
>          min_perf: c_ulong,
>          target_perf: c_ulong,
>          capacity: c_ulong,
>      ) {
> -        // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
> -        let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
> -
> -        if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
> -            T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
> -        }
> +        // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
> +        // lifetime of `policy`.
> +        let policy = unsafe { Policy::from_raw_mut(ptr) };
> +        T::adjust_perf(policy, min_perf, target_perf, capacity);
>      }
>  
>      /// Driver's `get_intermediate` callback.
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
  2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
  2026-03-16 10:59   ` Gary Guo
  2026-03-26 12:04   ` Gautham R. Shenoy
@ 2026-03-26 13:16   ` Zhongqiu Han
  2 siblings, 0 replies; 7+ messages in thread
From: Zhongqiu Han @ 2026-03-26 13:16 UTC (permalink / raw)
  To: K Prateek Nayak, Rafael J. Wysocki, Viresh Kumar, Huang Rui,
	Gautham R. Shenoy, Mario Limonciello, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Srinivas Pandruvada, Len Brown,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Miguel Ojeda
  Cc: Perry Yuan, linux-pm, linux-kernel, rust-for-linux,
	linux-rt-devel, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Bert Karwatzki, zhongqiu.han

On 3/16/2026 4:18 PM, K Prateek Nayak wrote:
> cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
> writer(s), however amd-pstate depends on fetching the cpudata via the
> policy's driver data which necessitates grabbing the reference.
> 
> Since schedutil governor can call "cpufreq_driver->update_perf()"
> during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
> fetching the policy object using the cpufreq_cpu_get() helper in the
> scheduler fast-path leads to "BUG: scheduling while atomic" on
> PREEMPT_RT [1].
> 
> Pass the cached cpufreq policy object in sg_policy to the update_perf()
> instead of just the CPU. The CPU can be inferred using "policy->cpu".

Agreed, sugov_update_single_perf() is only invoked for CPUs belonging to
a non-shared policy; in this path, policy->cpu is guaranteed to match
sg_cpu->cpu. With that invariant, this patch looks correct to me.

> 
> The lifetime of cpufreq_policy object outlasts that of the governor and
> the cpufreq driver (allocated when the CPU is onlined and only reclaimed
> when the CPU is offlined / the CPU device is removed) which makes it
> safe to be referenced throughout the governor's lifetime.
> 
> Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
> Reported-by: Bert Karwatzki <spasswolf@web.de>
> Closes:https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> changelog v3..v4:
> 
> o Added the Fixes tag. (Gautham, Chris Mason's review-prompts)
> ---
>   drivers/cpufreq/amd-pstate.c     |  3 +--
>   drivers/cpufreq/cpufreq.c        |  6 +++---
>   drivers/cpufreq/intel_pstate.c   |  4 ++--
>   include/linux/cpufreq.h          |  4 ++--
>   kernel/sched/cpufreq_schedutil.c |  5 +++--
>   rust/kernel/cpufreq.rs           | 13 ++++++-------
>   6 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 5faccb3d6b14..ad4b5f84773a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -710,13 +710,12 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
>   	return policy->cur;
>   }
>   
> -static void amd_pstate_adjust_perf(unsigned int cpu,
> +static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
>   				   unsigned long _min_perf,
>   				   unsigned long target_perf,
>   				   unsigned long capacity)
>   {
>   	u8 max_perf, min_perf, des_perf, cap_perf;
> -	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
>   	struct amd_cpudata *cpudata;
>   	union perf_cached perf;
>   
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2082a9e4384f..17a5b8e0ea1e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2231,7 +2231,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>   
>   /**
>    * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
> - * @cpu: Target CPU.
> + * @policy: cpufreq policy object of the target CPU.
>    * @min_perf: Minimum (required) performance level (units of @capacity).
>    * @target_perf: Target (desired) performance level (units of @capacity).
>    * @capacity: Capacity of the target CPU.
> @@ -2250,12 +2250,12 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>    * parallel with either ->target() or ->target_index() or ->fast_switch() for
>    * the same CPU.
>    */
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>   				 unsigned long min_perf,
>   				 unsigned long target_perf,
>   				 unsigned long capacity)
>   {
> -	cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
> +	cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
>   }
>   
>   /**
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 51938c5a47ca..1552b2d32a34 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -3239,12 +3239,12 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>   	return target_pstate * cpu->pstate.scaling;
>   }
>   
> -static void intel_cpufreq_adjust_perf(unsigned int cpunum,
> +static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
>   				      unsigned long min_perf,
>   				      unsigned long target_perf,
>   				      unsigned long capacity)
>   {
> -	struct cpudata *cpu = all_cpu_data[cpunum];
> +	struct cpudata *cpu = all_cpu_data[policy->cpu];
>   	u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
>   	int old_pstate = cpu->pstate.current_pstate;
>   	int cap_pstate, min_pstate, max_pstate, target_pstate;
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index cc894fc38971..4317c5a312bd 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -372,7 +372,7 @@ struct cpufreq_driver {
>   	 * conditions) scale invariance can be disabled, which causes the
>   	 * schedutil governor to fall back to the latter.
>   	 */
> -	void		(*adjust_perf)(unsigned int cpu,
> +	void		(*adjust_perf)(struct cpufreq_policy *policy,
>   				       unsigned long min_perf,
>   				       unsigned long target_perf,
>   				       unsigned long capacity);
> @@ -617,7 +617,7 @@ struct cpufreq_governor {
>   /* Pass a target to the cpufreq driver */
>   unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
>   					unsigned int target_freq);
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
>   				unsigned long min_perf,
>   				unsigned long target_perf,
>   				unsigned long capacity);
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 153232dd8276..ae9fd211cec1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -461,6 +461,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>   				     unsigned int flags)
>   {
>   	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> +	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>   	unsigned long prev_util = sg_cpu->util;
>   	unsigned long max_cap;
>   
> @@ -482,10 +483,10 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>   	if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
>   		sg_cpu->util = prev_util;
>   
> -	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
> +	cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
>   				   sg_cpu->util, max_cap);
>   
> -	sg_cpu->sg_policy->last_freq_update_time = time;
> +	sg_policy->last_freq_update_time = time;
>   }
>   
>   static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs
> index 76faa1ac8501..a83aec198336 100644
> --- a/rust/kernel/cpufreq.rs
> +++ b/rust/kernel/cpufreq.rs
> @@ -1256,18 +1256,17 @@ impl<T: Driver> Registration<T> {
>       /// # Safety
>       ///
>       /// - This function may only be called from the cpufreq C infrastructure.
> +    /// - The pointer arguments must be valid pointers.
>       unsafe extern "C" fn adjust_perf_callback(
> -        cpu: c_uint,
> +        ptr: *mut bindings::cpufreq_policy,
>           min_perf: c_ulong,
>           target_perf: c_ulong,
>           capacity: c_ulong,
>       ) {
> -        // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
> -        let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
> -
> -        if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
> -            T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
> -        }
> +        // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
> +        // lifetime of `policy`.
> +        let policy = unsafe { Policy::from_raw_mut(ptr) };
> +        T::adjust_perf(policy, min_perf, target_perf, capacity);
>       }
>   
>       /// Driver's `get_intermediate` callback.


-- 
Thx and BRs,
Zhongqiu Han

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-03-26 13:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-16  8:18 [PATCH v4 0/2] cpufreq/amd-pstate: Prevent scheduling when atomic on PREEMPT_RT K Prateek Nayak
2026-03-16  8:18 ` [PATCH v4 1/2] cpufreq/amd-pstate: Pass the policy to amd_pstate_update() K Prateek Nayak
2026-03-26 12:03   ` Gautham R. Shenoy
2026-03-16  8:18 ` [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf() K Prateek Nayak
2026-03-16 10:59   ` Gary Guo
2026-03-26 12:04   ` Gautham R. Shenoy
2026-03-26 13:16   ` Zhongqiu Han

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox