From: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
"Viresh Kumar" <viresh.kumar@linaro.org>,
"Huang Rui" <ray.huang@amd.com>,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Clark Williams" <clrkwllms@kernel.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Srinivas Pandruvada" <srinivas.pandruvada@linux.intel.com>,
"Len Brown" <lenb@kernel.org>, "Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Perry Yuan" <perry.yuan@amd.com>,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
rust-for-linux@vger.kernel.org, linux-rt-devel@lists.linux.dev,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"Boqun Feng" <boqun@kernel.org>, "Gary Guo" <gary@garyguo.net>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Benno Lossin" <lossin@kernel.org>,
"Andreas Hindborg" <a.hindborg@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>,
"Trevor Gross" <tmgross@umich.edu>,
"Danilo Krummrich" <dakr@kernel.org>,
"Bert Karwatzki" <spasswolf@web.de>
Subject: Re: [PATCH v4 2/2] cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
Date: Thu, 26 Mar 2026 17:34:31 +0530
Message-ID: <acUgzwk+ULSjW8sW@BLRRASHENOY1.amd.com>
In-Reply-To: <20260316081849.19368-3-kprateek.nayak@amd.com>
On Mon, Mar 16, 2026 at 08:18:49AM +0000, K Prateek Nayak wrote:
> cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent
> writer(s), however amd-pstate depends on fetching the cpudata via the
> policy's driver data which necessitates grabbing the reference.
>
> Since the schedutil governor can call "cpufreq_driver->adjust_perf()"
> during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
> fetching the policy object using the cpufreq_cpu_get() helper in the
> scheduler fast-path leads to "BUG: scheduling while atomic" on
> PREEMPT_RT [1].
>
> Pass the cached cpufreq policy object in sg_policy to adjust_perf()
> instead of just the CPU. The CPU can be inferred using "policy->cpu".
>
> The lifetime of cpufreq_policy object outlasts that of the governor and
> the cpufreq driver (allocated when the CPU is onlined and only reclaimed
> when the CPU is offlined / the CPU device is removed) which makes it
> safe to be referenced throughout the governor's lifetime.
>
> Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
> Reported-by: Bert Karwatzki <spasswolf@web.de>
> Closes: https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
--
Thanks and Regards
gautham.
> ---
> changelog v3..v4:
>
> o Added the Fixes tag. (Gautham, Chris Mason's review-prompts)
> ---
> drivers/cpufreq/amd-pstate.c | 3 +--
> drivers/cpufreq/cpufreq.c | 6 +++---
> drivers/cpufreq/intel_pstate.c | 4 ++--
> include/linux/cpufreq.h | 4 ++--
> kernel/sched/cpufreq_schedutil.c | 5 +++--
> rust/kernel/cpufreq.rs | 13 ++++++-------
> 6 files changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 5faccb3d6b14..ad4b5f84773a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -710,13 +710,12 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
> return policy->cur;
> }
>
> -static void amd_pstate_adjust_perf(unsigned int cpu,
> +static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
> unsigned long _min_perf,
> unsigned long target_perf,
> unsigned long capacity)
> {
> u8 max_perf, min_perf, des_perf, cap_perf;
> - struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
> struct amd_cpudata *cpudata;
> union perf_cached perf;
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2082a9e4384f..17a5b8e0ea1e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2231,7 +2231,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
>
> /**
> * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
> - * @cpu: Target CPU.
> + * @policy: cpufreq policy object of the target CPU.
> * @min_perf: Minimum (required) performance level (units of @capacity).
> * @target_perf: Target (desired) performance level (units of @capacity).
> * @capacity: Capacity of the target CPU.
> @@ -2250,12 +2250,12 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> * parallel with either ->target() or ->target_index() or ->fast_switch() for
> * the same CPU.
> */
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
> unsigned long min_perf,
> unsigned long target_perf,
> unsigned long capacity)
> {
> - cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
> + cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
> }
>
> /**
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 51938c5a47ca..1552b2d32a34 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -3239,12 +3239,12 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
> return target_pstate * cpu->pstate.scaling;
> }
>
> -static void intel_cpufreq_adjust_perf(unsigned int cpunum,
> +static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
> unsigned long min_perf,
> unsigned long target_perf,
> unsigned long capacity)
> {
> - struct cpudata *cpu = all_cpu_data[cpunum];
> + struct cpudata *cpu = all_cpu_data[policy->cpu];
> u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
> int old_pstate = cpu->pstate.current_pstate;
> int cap_pstate, min_pstate, max_pstate, target_pstate;
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index cc894fc38971..4317c5a312bd 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -372,7 +372,7 @@ struct cpufreq_driver {
> * conditions) scale invariance can be disabled, which causes the
> * schedutil governor to fall back to the latter.
> */
> - void (*adjust_perf)(unsigned int cpu,
> + void (*adjust_perf)(struct cpufreq_policy *policy,
> unsigned long min_perf,
> unsigned long target_perf,
> unsigned long capacity);
> @@ -617,7 +617,7 @@ struct cpufreq_governor {
> /* Pass a target to the cpufreq driver */
> unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> unsigned int target_freq);
> -void cpufreq_driver_adjust_perf(unsigned int cpu,
> +void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
> unsigned long min_perf,
> unsigned long target_perf,
> unsigned long capacity);
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 153232dd8276..ae9fd211cec1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -461,6 +461,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
> unsigned int flags)
> {
> struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> + struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> unsigned long prev_util = sg_cpu->util;
> unsigned long max_cap;
>
> @@ -482,10 +483,10 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
> if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
> sg_cpu->util = prev_util;
>
> - cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
> + cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
> sg_cpu->util, max_cap);
>
> - sg_cpu->sg_policy->last_freq_update_time = time;
> + sg_policy->last_freq_update_time = time;
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> diff --git a/rust/kernel/cpufreq.rs b/rust/kernel/cpufreq.rs
> index 76faa1ac8501..a83aec198336 100644
> --- a/rust/kernel/cpufreq.rs
> +++ b/rust/kernel/cpufreq.rs
> @@ -1256,18 +1256,17 @@ impl<T: Driver> Registration<T> {
> /// # Safety
> ///
> /// - This function may only be called from the cpufreq C infrastructure.
> + /// - The pointer arguments must be valid pointers.
> unsafe extern "C" fn adjust_perf_callback(
> - cpu: c_uint,
> + ptr: *mut bindings::cpufreq_policy,
> min_perf: c_ulong,
> target_perf: c_ulong,
> capacity: c_ulong,
> ) {
> - // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
> - let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
> -
> - if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
> - T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
> - }
> + // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
> + // lifetime of `policy`.
> + let policy = unsafe { Policy::from_raw_mut(ptr) };
> + T::adjust_perf(policy, min_perf, target_perf, capacity);
> }
>
> /// Driver's `get_intermediate` callback.
> --
> 2.34.1
>
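For readers following the thread, here is a minimal userspace sketch of the
calling convention the quoted patch moves to: the governor keeps a pointer to
the policy and hands it straight to the driver callback, so the callback never
has to look the policy up by CPU number. The struct layouts are simplified
stand-ins rather than the kernel definitions, and mock_cpudata,
mock_adjust_perf(), mock_driver and demo_update() are hypothetical names
introduced only for illustration. The clamping inside the mock callback is
also an assumption, not what amd-pstate or intel_pstate actually do.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: the real struct cpufreq_policy has many more fields. */
struct cpufreq_policy {
	unsigned int cpu;	/* CPU that manages this policy */
	void *driver_data;	/* driver-private, per-policy data */
};

/* Hypothetical driver-private data, standing in for a driver's cpudata. */
struct mock_cpudata {
	unsigned long last_target_perf;
};

/* Callback shape after the patch: a policy pointer instead of a CPU number. */
struct cpufreq_driver {
	void (*adjust_perf)(struct cpufreq_policy *policy,
			    unsigned long min_perf,
			    unsigned long target_perf,
			    unsigned long capacity);
};

/* Governor-side cache, mirroring the sg_policy->policy pointer in the diff. */
struct sugov_policy {
	struct cpufreq_policy *policy;
};

/* Hypothetical driver callback: no lookup and no reference counting needed. */
static void mock_adjust_perf(struct cpufreq_policy *policy,
			     unsigned long min_perf,
			     unsigned long target_perf,
			     unsigned long capacity)
{
	struct mock_cpudata *cpudata = policy->driver_data;

	/* Illustrative clamp to [min_perf, capacity]; an assumption. */
	if (target_perf < min_perf)
		target_perf = min_perf;
	if (target_perf > capacity)
		target_perf = capacity;

	cpudata->last_target_perf = target_perf;
}

static struct cpufreq_driver mock_driver = {
	.adjust_perf = mock_adjust_perf,
};

/* Wrapper with the post-patch signature; it only forwards the arguments. */
static void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
				       unsigned long min_perf,
				       unsigned long target_perf,
				       unsigned long capacity)
{
	assert(policy != NULL);
	mock_driver.adjust_perf(policy, min_perf, target_perf, capacity);
}

/* Hypothetical helper: one governor update using the cached policy. */
static unsigned long demo_update(unsigned long min_perf, unsigned long util,
				 unsigned long max_cap)
{
	struct mock_cpudata cpudata = { 0 };
	struct cpufreq_policy policy = { .cpu = 3, .driver_data = &cpudata };
	struct sugov_policy sg_policy = { .policy = &policy };

	cpufreq_driver_adjust_perf(sg_policy.policy, min_perf, util, max_cap);
	return cpudata.last_target_perf;
}
```

Calling demo_update(100, 512, 1024) goes governor -> wrapper -> driver with
only pointer dereferences on the way. That is the property the commit message
relies on for the rq_lock-held, IRQs-disabled path on PREEMPT_RT.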