From: Sumit Gupta <sumitg@nvidia.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: viresh.kumar@linaro.org, pierre.gondois@arm.com,
ionela.voinescu@arm.com, zhenglifeng1@huawei.com,
zhanjie9@hisilicon.com, linux-kernel@vger.kernel.org,
linux-pm@vger.kernel.org, linux-tegra@vger.kernel.org,
treding@nvidia.com, jonathanh@nvidia.com, vsethi@nvidia.com,
ksitaraman@nvidia.com, sanjayc@nvidia.com, mochs@nvidia.com,
bbasu@nvidia.com, sumitg@nvidia.com
Subject: Re: [PATCH] cpufreq: CPPC: Preserve OSPM-set registers across hotplug and unload
Date: Wed, 24 Jun 2026 18:26:15 +0530 [thread overview]
Message-ID: <de03b23f-18a3-4e99-a15a-5044463ac484@nvidia.com> (raw)
In-Reply-To: <CAJZ5v0jvkNfx-1XChRCsGHJAcY7W4j41P+Xug6+TpPge7tHF7Q@mail.gmail.com>
Hi Rafael,
On 23/06/26 16:30, Rafael J. Wysocki wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Jun 23, 2026 at 11:54 AM Sumit Gupta <sumitg@nvidia.com> wrote:
>> Values written to OSPM-set CPPC registers (via sysfs or the autonomous
>> boot parameter) can be lost in two ways:
>>
>> - Across CPU hotplug: the platform may reset a CPU's registers when it
>> is offlined.
>> - On driver unload: the value the driver wrote is left in the register
>> instead of returning to its pre-driver state.
>>
>> Add a small table-driven mechanism that handles both:
>>
>> - Capture each register's firmware value when a CPU is first seen and
>> restore it on driver unload.
>> - Record the last value the driver set and reapply it from ->init()
>> when the policy is reactivated after CPU hotplug.
> I'm not sure if this is a good idea TBH.
>
> The overall system state when the CPU goes online may be completely
> different from the system state when the CPU was online last time, so
> there is no reason to restore its settings from before offline, at
> least in principle.
These are values userspace deliberately set, more like a frequency QoS
request that persists until userspace changes it than transient state.
It's platform dependent though, as the platform I test on preserves them
across hotplug, but where the platform resets the register on offline
the value is silently lost. Should the kernel re-apply it on online,
or is that better left to userspace? If so, I will drop the hotplug reapply
and keep only the restore on driver unload.
intel_pstate and amd-pstate already re-sync these on CPU online, as they
keep the policy across hotplug via ->online()/->offline(). So for them
it's re-syncing the hardware, not restoring state from before offline.
cppc_cpufreq has no ->online()/->offline() today, so it fully tears the
policy down and rebuilds it, which is why this reads as "restoring
settings from before offline".
Would adding ->online()/->offline() be acceptable, with policy preserved
and ->online() just re-syncing the registers the platform reset?
Thanks,
Sumit
>> The firmware value is captured on a CPU's first activation rather than
>> once at module load, so CPUs offline at boot or hot-added later are
>> covered.
>>
>> Reapply is only needed on a full policy teardown and bring-up, which goes
>> through ->init(). In a SHARED_TYPE_ANY policy, offlining a single CPU
>> leaves the shared register untouched, so nothing is lost there.
>>
>> Cover the OSPM Nominal Performance, Autonomous Selection
>> (auto_sel) and Energy Performance Preference (EPP) registers. For
>> auto_sel it replaces the previous unconditional
>> cppc_set_auto_sel(cpu, false) on unload with a restore of the firmware
>> value captured at the CPU's first init.
>>
>> Suggested-by: Pierre Gondois <pierre.gondois@arm.com>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>>
>> This applies on top of (not yet merged):
>> [1] ACPI: CPPC: Add ospm_nominal_perf support
>> https://lore.kernel.org/lkml/20260615185934.2383514-1-sumitg@nvidia.com/
>> [2] cpufreq: CPPC: add autonomous mode boot parameter support
>> https://lore.kernel.org/lkml/20260623080652.3353386-1-sumitg@nvidia.com/
>>
>> drivers/cpufreq/cppc_cpufreq.c | 194 +++++++++++++++++++++++++++++++--
>> 1 file changed, 186 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index a3fabfb07fbe..d6ea2cbde187 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -28,6 +28,174 @@
>>
>> static struct cpufreq_driver cppc_cpufreq_driver;
>>
>> +/*
>> + * OSPM-set CPPC registers tracked for save/restore. A value set via sysfs or
>> + * the autonomous boot parameter is reapplied across CPU hotplug, and the
>> + * firmware value is restored on driver unload.
>> + */
>> +enum cppc_saved_reg_id {
>> + CPPC_SAVED_OSPM_NOMINAL_PERF,
>> + CPPC_SAVED_AUTO_SEL,
>> + CPPC_SAVED_EPP,
>> + CPPC_NR_SAVED_REGS,
>> +};
>> +
>> +struct cppc_saved_reg {
>> + int (*get)(int cpu, u64 *val);
>> + int (*set)(int cpu, u64 val);
>> +};
>> +
>> +/* u64 wrappers so the bool auto_sel register fits the table signatures. */
>> +static int cppc_get_auto_sel_u64(int cpu, u64 *val)
>> +{
>> + bool enable;
>> + int ret;
>> +
>> + ret = cppc_get_auto_sel(cpu, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + *val = enable;
>> + return 0;
>> +}
>> +
>> +static int cppc_set_auto_sel_u64(int cpu, u64 val)
>> +{
>> + return cppc_set_auto_sel(cpu, !!val);
>> +}
>> +
>> +static const struct cppc_saved_reg cppc_saved_regs[CPPC_NR_SAVED_REGS] = {
>> + [CPPC_SAVED_OSPM_NOMINAL_PERF] = {
>> + cppc_get_ospm_nominal_perf, cppc_set_ospm_nominal_perf,
>> + },
>> + [CPPC_SAVED_AUTO_SEL] = {
>> + cppc_get_auto_sel_u64, cppc_set_auto_sel_u64,
>> + },
>> + [CPPC_SAVED_EPP] = {
>> + cppc_get_epp_perf, cppc_set_epp,
>> + },
>> +};
>> +
>> +/*
>> + * Per-CPU saved state for each register in cppc_saved_regs[]:
>> + * firmware_val - register value before the driver touched it, restored
>> + * on unload
>> + * requested_val - last value the driver set (sysfs or boot parameter),
>> + * reapplied on policy reactivation
>> + * firmware_captured - whether firmware_val has been read, so a not-yet-seen
>> + * CPU isn't mistaken for one whose firmware value is 0
>> + */
>> +struct cppc_saved_state {
>> + u64 firmware_val;
>> + u64 requested_val;
>> + bool firmware_captured;
>> +};
>> +
>> +/*
>> + * Per-CPU and not tied to a policy, so the saved values survive policy
>> + * teardown/bring-up across CPU hotplug. cpu_data->perf_ctrls is per-policy
>> + * and freed on policy ->exit.
>> + */
>> +static DEFINE_PER_CPU(struct cppc_saved_state[CPPC_NR_SAVED_REGS], cppc_saved_state);
>> +
>> +static void cppc_cache_perf_ctrls(struct cppc_cpudata *cpu_data,
>> + enum cppc_saved_reg_id reg, u64 val)
>> +{
>> + switch (reg) {
>> + case CPPC_SAVED_AUTO_SEL:
>> + cpu_data->perf_ctrls.auto_sel = val;
>> + break;
>> + case CPPC_SAVED_EPP:
>> + cpu_data->perf_ctrls.energy_perf = val;
>> + break;
>> + default:
>> + break;
>> + }
>> +}
>> +
>> +/*
>> + * Save the requested value for the given register and CPU, to be reapplied when
>> + * the policy is reactivated after CPU hotplug. Also update the per-policy
>> + * perf_ctrls copy so the saved and current values stay in sync.
>> + */
>> +static void cppc_save_requested(struct cppc_cpudata *cpu_data, unsigned int cpu,
>> + enum cppc_saved_reg_id reg, u64 val)
>> +{
>> + per_cpu(cppc_saved_state, cpu)[reg].requested_val = val;
>> + cppc_cache_perf_ctrls(cpu_data, reg, val);
>> +}
>> +
>> +/*
>> + * Reapply each register's last requested value from ->init(), so a value set
>> + * via sysfs or the boot parameter survives a policy teardown and bring-up
>> + * across CPU hotplug. Also keep perf_ctrls in sync with it.
>> + */
>> +static void cppc_cpufreq_reapply_requested_regs(struct cpufreq_policy *policy)
>> +{
>> + struct cppc_cpudata *cpu_data = policy->driver_data;
>> + unsigned int cpu, i;
>> + u64 val;
>> +
>> + for_each_cpu(cpu, policy->cpus) {
>> + for (i = 0; i < CPPC_NR_SAVED_REGS; i++) {
>> + val = per_cpu(cppc_saved_state, cpu)[i].requested_val;
>> + if (val == U64_MAX)
>> + continue;
>> +
>> + cppc_saved_regs[i].set(cpu, val);
>> +
>> + /* Keep perf_ctrls in sync via the policy's CPU. */
>> + if (cpu == policy->cpu)
>> + cppc_cache_perf_ctrls(cpu_data, i, val);
>> + }
>> + }
>> +}
>> +
>> +/*
>> + * On a CPU's first ->init(), capture each register's firmware value to be
>> + * restored on driver unload. Later calls for the same CPU are a no-op. Capturing
>> + * from ->init() rather than module load covers CPUs that appear later. Also seed
>> + * requested_val to U64_MAX so its zeroed default is not taken as a request for 0.
>> + */
>> +static void cppc_cpufreq_save_firmware_regs(struct cpufreq_policy *policy)
>> +{
>> + unsigned int cpu, i;
>> + u64 val;
>> +
>> + for_each_cpu(cpu, policy->cpus) {
>> + for (i = 0; i < CPPC_NR_SAVED_REGS; i++) {
>> + struct cppc_saved_state *s =
>> + &per_cpu(cppc_saved_state, cpu)[i];
>> +
>> + /* Capture once per CPU; skip if already recorded. */
>> + if (s->firmware_captured)
>> + continue;
>> +
>> + if (cppc_saved_regs[i].get(cpu, &val))
>> + val = U64_MAX;
>> + s->firmware_val = val;
>> + s->requested_val = U64_MAX;
>> + s->firmware_captured = true;
>> + }
>> + }
>> +}
>> +
>> +/* On driver unload, restore each captured CPU's firmware value. */
>> +static void cppc_cpufreq_restore_firmware_regs(void)
>> +{
>> + unsigned int cpu, i;
>> +
>> + for_each_present_cpu(cpu) {
>> + for (i = 0; i < CPPC_NR_SAVED_REGS; i++) {
>> + struct cppc_saved_state *s =
>> + &per_cpu(cppc_saved_state, cpu)[i];
>> +
>> + if (s->firmware_captured && s->firmware_val != U64_MAX)
>> + cppc_saved_regs[i].set(cpu, s->firmware_val);
>> + }
>> + }
>> +}
>> +
>> /* Autonomous Selection boot parameter modes */
>> enum {
>> AUTO_SEL_DISABLED = 0,
>> @@ -766,6 +934,9 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>
>> + /* Capture a CPU's firmware values on its first init, before any driver write. */
>> + cppc_cpufreq_save_firmware_regs(policy);
>> +
>> /*
>> * Enable autonomous mode on first init if boot param is set.
>> * Check last_governor to detect first init and skip if auto_sel
>> @@ -812,7 +983,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>> if (ret && ret != -EOPNOTSUPP)
>> pr_warn("Failed to set EPP for CPU%d (%d)\n", cpu, ret);
>> else if (!ret)
>> - cpu_data->perf_ctrls.energy_perf = epp;
>> + cppc_save_requested(cpu_data, cpu, CPPC_SAVED_EPP, epp);
>> }
>>
>> /* Program min/max/desired into CPPC regs (non-fatal on failure). */
>> @@ -826,7 +997,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>> pr_warn("auto_sel CPU%d failed (%d); using OS mode\n",
>> cpu, ret);
>> else if (!ret)
>> - cpu_data->perf_ctrls.auto_sel = true;
>> + cppc_save_requested(cpu_data, cpu, CPPC_SAVED_AUTO_SEL, true);
>> }
>>
>> if (cpu_data->perf_ctrls.auto_sel) {
>> @@ -850,6 +1021,10 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>> }
>>
>> cppc_cpufreq_cpu_fie_init(policy);
>> +
>> + /* Reapply any saved values lost across a full policy teardown. */
>> + cppc_cpufreq_reapply_requested_regs(policy);
>> +
>> return 0;
>>
>> out:
>> @@ -1039,6 +1214,8 @@ static ssize_t store_auto_select(struct cpufreq_policy *policy,
>> }
>> }
>>
>> + cppc_save_requested(cpu_data, policy->cpu, CPPC_SAVED_AUTO_SEL, val);
>> +
>> return count;
>> }
>>
>> @@ -1111,7 +1288,7 @@ store_energy_performance_preference_val(struct cpufreq_policy *policy,
>> if (ret)
>> return ret;
>>
>> - cpu_data->perf_ctrls.energy_perf = val;
>> + cppc_save_requested(cpu_data, policy->cpu, CPPC_SAVED_EPP, val);
>>
>> return count;
>> }
>> @@ -1193,6 +1370,9 @@ static ssize_t store_ospm_nominal_freq(struct cpufreq_policy *policy,
>> }
>> }
>>
>> + for_each_cpu(sib, policy->cpus)
>> + cppc_save_requested(cpu_data, sib, CPPC_SAVED_OSPM_NOMINAL_PERF, perf);
>> +
>> return count;
>>
>> rollback:
>> @@ -1258,13 +1438,11 @@ static int __init cppc_cpufreq_init(void)
>>
>> static void __exit cppc_cpufreq_exit(void)
>> {
>> - unsigned int cpu;
>> -
>> - for_each_present_cpu(cpu)
>> - cppc_set_auto_sel(cpu, false);
>> -
>> cpufreq_unregister_driver(&cppc_cpufreq_driver);
>> cppc_freq_invariance_exit();
>> +
>> + /* Restore auto_sel and the other saved registers to their firmware value. */
>> + cppc_cpufreq_restore_firmware_regs();
>> }
>>
>> module_param_cb(auto_sel_mode, &auto_sel_mode_ops, &auto_sel_mode, 0444);
>> --
>> 2.34.1
>>
>>
next prev parent reply other threads:[~2026-06-24 12:56 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 9:54 [PATCH] cpufreq: CPPC: Preserve OSPM-set registers across hotplug and unload Sumit Gupta
2026-06-23 11:00 ` Rafael J. Wysocki
2026-06-24 12:56 ` Sumit Gupta [this message]
2026-06-24 13:20 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de03b23f-18a3-4e99-a15a-5044463ac484@nvidia.com \
--to=sumitg@nvidia.com \
--cc=bbasu@nvidia.com \
--cc=ionela.voinescu@arm.com \
--cc=jonathanh@nvidia.com \
--cc=ksitaraman@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=mochs@nvidia.com \
--cc=pierre.gondois@arm.com \
--cc=rafael@kernel.org \
--cc=sanjayc@nvidia.com \
--cc=treding@nvidia.com \
--cc=viresh.kumar@linaro.org \
--cc=vsethi@nvidia.com \
--cc=zhanjie9@hisilicon.com \
--cc=zhenglifeng1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox