From: Pierre Gondois <pierre.gondois@arm.com>
To: Sumit Gupta <sumitg@nvidia.com>
Cc: linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, zhenglifeng1@huawei.com,
treding@nvidia.com, viresh.kumar@linaro.org,
jonathanh@nvidia.com, vsethi@nvidia.com, ionela.voinescu@arm.com,
ksitaraman@nvidia.com, sanjayc@nvidia.com,
zhanjie9@hisilicon.com, corbet@lwn.net, mochs@nvidia.com,
skhan@linuxfoundation.org, bbasu@nvidia.com,
rdunlap@infradead.org, linux-pm@vger.kernel.org,
mario.limonciello@amd.com, rafael@kernel.org
Subject: Re: [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support
Date: Fri, 10 Apr 2026 15:47:08 +0200 [thread overview]
Message-ID: <208360b1-36a5-419d-80f4-431914407f61@arm.com> (raw)
In-Reply-To: <b8debb30-67a5-4d2b-8c08-8fd287f7258e@nvidia.com>
Hello Sumit,
On 4/6/26 20:08, Sumit Gupta wrote:
> Hi Pierre,
>
> Thank you for the comments.
> Sorry for late reply as I was on vacation.
>
No worries
>
> On 24/03/26 23:48, Pierre Gondois wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Sumit,
>>
>> On 3/17/26 16:10, Sumit Gupta wrote:
>>> Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPC
>>> autonomous performance selection on all CPUs at system startup without
>>> requiring runtime sysfs manipulation. When autonomous mode is enabled,
>>> the hardware automatically adjusts CPU performance based on workload
>>> demands using Energy Performance Preference (EPP) hints.
>>>
>>> When auto_sel_mode=1:
>>> - Configure all CPUs for autonomous operation on first init
>>> - Set EPP to performance preference (0x0)
>>> - Use HW min/max when set; otherwise program from policy limits (caps)
>>> - Clamp desired_perf to bounds before enabling autonomous mode
>>> - Hardware controls frequency instead of the OS governor
>>>
>>> The boot parameter is applied only during first policy initialization.
>>> On hotplug, skip applying it so that the user's runtime sysfs
>>> configuration is preserved.
>>>
>>> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> (Documentation)
>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>> ---
>>> Part 1 [1] of this series was applied for 7.1 and present in next.
>>> Sending this patch as reworked version of 'patch 11' from [2] based
>>> on next.
>>>
>>> [1]
>>> https://lore.kernel.org/lkml/20260206142658.72583-1-sumitg@nvidia.com/
>>> [2]
>>> https://lore.kernel.org/lkml/20251223121307.711773-1-sumitg@nvidia.com/
>>> ---
>>> .../admin-guide/kernel-parameters.txt | 13 +++
>>> drivers/cpufreq/cppc_cpufreq.c | 84
>>> +++++++++++++++++--
>>> 2 files changed, 92 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index fa6171b5fdd5..de4b4c89edfe 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1060,6 +1060,19 @@ Kernel parameters
>>> policy to use. This governor must be
>>> registered in the
>>> kernel before the cpufreq driver probes.
>>>
>>> + cppc_cpufreq.auto_sel_mode=
>>> + [CPU_FREQ] Enable ACPI CPPC autonomous
>>> performance
>>> + selection. When enabled, hardware
>>> automatically adjusts
>>> + CPU frequency on all CPUs based on workload
>>> demands.
>>> + In Autonomous mode, Energy Performance
>>> Preference (EPP)
>>> + hints guide hardware toward performance (0x0)
>>> or energy
>>> + efficiency (0xff).
>>> + Requires ACPI CPPC autonomous selection
>>> register support.
>>> + Format: <bool>
>>> + Default: 0 (disabled)
>>> + 0: use cpufreq governors
>>> + 1: enable if supported by hardware
>>> +
>>> cpu_init_udelay=N
>>> [X86,EARLY] Delay for N microsec between
>>> assert and de-assert
>>> of APIC INIT to start processors. This delay
>>> occurs
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>> b/drivers/cpufreq/cppc_cpufreq.c
>>> index 5dfb109cf1f4..49c148b2a0a4 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -28,6 +28,9 @@
>>>
>>> static struct cpufreq_driver cppc_cpufreq_driver;
>>>
>>> +/* Autonomous Selection boot parameter */
>>> +static bool auto_sel_mode;
>>> +
>>> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>>> static enum {
>>> FIE_UNSET = -1,
>>> @@ -708,11 +711,74 @@ static int cppc_cpufreq_cpu_init(struct
>>> cpufreq_policy *policy)
>>> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>>> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>>
>>> - ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> - if (ret) {
>>> - pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>>> - caps->highest_perf, cpu, ret);
>>> - goto out;
>>> + /*
>>> + * Enable autonomous mode on first init if boot param is set.
>>> + * Check last_governor to detect first init and skip if auto_sel
>>> + * is already enabled.
>>> + */
>> If the goal is to set autosel only once at the driver init,
>> shouldn't this be done in cppc_cpufreq_init() ?
>> I understand that cpu_data doesn't exist yet in
>> cppc_cpufreq_init(), but this seems more appropriate to do
>> it there IMO.
>>
>> This means the cpudata should be updated accordingly
>> in this cppc_cpufreq_cpu_init() function.
>
> In an earlier version [1], the setup was in cppc_cpufreq_init() but
> was moved to cppc_cpufreq_cpu_init() to improve per-CPU error handling.
> Keeping the setup in cppc_cpufreq_init() helps to avoid the last_governor
> check. We can warn for a CPU failing to enable and continue so other
> CPUs keep autonomous mode.
> cppc_cpufreq_cpu_init() would then just check the auto_sel state
> from register and sync policy limits from min/max_perf registers when
> autonomous mode is active.
> Please let me know your thoughts.
FWIU the auto_sel_mode module parameter allows to
configure the default auto_sel_mode when the driver is
first loaded, so there should not need to check that again
whenever cppc_cpufreq_cpu_init() is called.
Maybe Ionela saw something we didn't see ?
Also just to be sure, should it still be possible to change
the auto_sel_mode through the sysfs if the driver was
loaded with auto_sel_mode=1 ?
>
> [1]
> https://lore.kernel.org/lkml/5593d364-ca37-41c5-b33f-f7e245d6d626@nvidia.com/
>
>
>>
>>> + if (auto_sel_mode && policy->last_governor[0] == '\0' &&
>>> + !cpu_data->perf_ctrls.auto_sel) {
>>> + /* Enable CPPC - optional register, some platforms
>>> need it */
>> The documentation of the CPPC Enable Register is subject to
>> interpretation, but IIUC the field should be set to use the CPPC
>> controls, so I assume this should be set in cppc_cpufreq_init()
>> instead ?
>
> Agree that the CPPC Enable is about using the CPPC control path
> in general and not only for autonomous selection.
> Will move cppc_set_enable() into cppc_cpufreq_init() or outside the
> autonomous mode block in cppc_cpufreq_cpu_init() as per conclusion
> of previous comment.
>
>>> + ret = cppc_set_enable(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to enable CPPC for CPU%d
>>> (%d)\n", cpu, ret);
>>> +
>>> + /*
>>> + * Prefer HW min/max_perf when set; otherwise program
>>> from
>>> + * policy limits derived earlier from caps.
>>> + * Clamp desired_perf to bounds and sync policy->cur.
>>> + */
>>> + if (!cpu_data->perf_ctrls.min_perf ||
>>> !cpu_data->perf_ctrls.max_perf)
>>
>> The function doesn't seem to exist.
>
> It is newly added in [2].
> Don't need to call it if we move the setup to cppc_cpufreq_init().
Ah ok right thanks.
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea3db45ae476889a1ba0ab3617e6afdeeefbda3d
>
>
>
>>
>>> + cppc_cpufreq_update_perf_limits(cpu_data, policy);
>>> +
>>> + cpu_data->perf_ctrls.desired_perf =
>>> + clamp_t(u32, cpu_data->perf_ctrls.desired_perf,
>>> + cpu_data->perf_ctrls.min_perf,
>>> + cpu_data->perf_ctrls.max_perf);
>>> +
>>> + policy->cur = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.desired_perf);
>>> +
>>
>> Maybe this should also be done in cppc_cpufreq_init()
>> if the auto_sel_mode parameter is set ?
>
> Yes.
>
>>
>>> + /* EPP is optional - some platforms may not support it */
>>> + ret = cppc_set_epp(cpu, CPPC_EPP_PERFORMANCE_PREF);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to set EPP for CPU%d (%d)\n",
>>> cpu, ret);
>>> + else if (!ret)
>>> + cpu_data->perf_ctrls.energy_perf =
>>> CPPC_EPP_PERFORMANCE_PREF;
>>> +
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf for autonomous mode
>>> CPU:%d ret:%d\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> +
>>> + ret = cppc_set_auto_sel(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP) {
>>> + pr_warn("Failed autonomous config for CPU%d
>>> (%d)\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> + if (!ret)
>>> + cpu_data->perf_ctrls.auto_sel = true;
>>> + }
>>> +
>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>
>> There is a patchset ongoing which tries to remove
>> setting policy->min/max from driver initialization.
>> Indeed, these values are only temporarily valid,
>> until the governor override them.
>> It is not sure yet the patch will be accepted though.
>>
>> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>>
>
>
> You are right that policy->min/max from .init() are temporary today
> as cpufreq_set_policy() overwrites them before the governor starts.
>
> On my test platform (highest == nominal, lowest_nonlinear == lowest),
> this had no visible effect because the BIOS bounds and cpuinfo range
> end up identical. But on platforms where they differ, the governor
> would widen the range to full cpuinfo limits.
>
> I think your patch [3] fixes this by giving these the right semantic as
> initial QoS requests. With it, cpufreq_set_policy() preserves the policy
> limits set from min/max_perf registers in .init(), which can either be
> BIOS values on first boot or last user configured values before hotplug.
>
> I will update the comment in v2 to reflect QoS seeding intent.
>
> I see that the first two patches of your series [3] is applied for 7.1.
> Do you plan to send the pending patch (3/4) from [3]?
>
I need to ping Viresh to check if this is still relevant.
> [3]
> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>
>
>>
>>
>>> + /* Sync policy limits from HW when autonomous mode is
>>> active */
>>> + policy->min = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.min_perf ?:
>>> + caps->lowest_nonlinear_perf);
>>> + policy->max = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.max_perf ?:
>>> + caps->nominal_perf);
>>> + } else {
>>> + /* Normal mode: governors control frequency */
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf value:%d on CPU:%d.
>>> ret:%d\n",
>>> + caps->highest_perf, cpu, ret);
>>> + goto out;
>>> + }
>>> }
>>>
>>> cppc_cpufreq_cpu_fie_init(policy);
>>> @@ -1038,10 +1104,18 @@ static int __init cppc_cpufreq_init(void)
>>>
>>> static void __exit cppc_cpufreq_exit(void)
>>> {
>>> + unsigned int cpu;
>>> +
>>> + for_each_present_cpu(cpu)
>>> + cppc_set_auto_sel(cpu, false);
>>
>> If the firmware has a default EPP value, it means that loading
>> and the unloading the driver will reset this default EPP value.
>> Maybe the initial EPP value and/or the auto_sel value should be
>> cached somewhere and restored on exit ?
>> I don't know if this is actually an issue, this is just to signal it.
>
> The auto_sel_mode boot path programs EPP to performance preference(0),
> not the firmware’s previous value. On unload we only call
> cppc_set_auto_sel(false); we do not restore EPP, min/max perf,
> or other CPPC fields to firmware defaults.
Yes right, so loading/unloading the driver might change the
default EPP value.
>
> Thank you,
> Sumit Gupta
>
> ....
>
>
next prev parent reply other threads:[~2026-04-10 13:48 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 15:10 [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support Sumit Gupta
2026-03-24 18:18 ` Pierre Gondois
2026-04-06 18:08 ` Sumit Gupta
2026-04-10 13:47 ` Pierre Gondois [this message]
2026-04-13 5:51 ` Viresh Kumar
2026-04-20 13:13 ` Sumit Gupta
2026-04-20 13:07 ` Sumit Gupta
2026-04-24 12:10 ` Sumit Gupta
2026-04-24 12:55 ` Pierre Gondois
2026-04-24 13:52 ` Sumit Gupta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=208360b1-36a5-419d-80f4-431914407f61@arm.com \
--to=pierre.gondois@arm.com \
--cc=bbasu@nvidia.com \
--cc=corbet@lwn.net \
--cc=ionela.voinescu@arm.com \
--cc=jonathanh@nvidia.com \
--cc=ksitaraman@nvidia.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=mario.limonciello@amd.com \
--cc=mochs@nvidia.com \
--cc=rafael@kernel.org \
--cc=rdunlap@infradead.org \
--cc=sanjayc@nvidia.com \
--cc=skhan@linuxfoundation.org \
--cc=sumitg@nvidia.com \
--cc=treding@nvidia.com \
--cc=viresh.kumar@linaro.org \
--cc=vsethi@nvidia.com \
--cc=zhanjie9@hisilicon.com \
--cc=zhenglifeng1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox