public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sumit Gupta <sumitg@nvidia.com>
To: Pierre Gondois <pierre.gondois@arm.com>
Cc: "linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"zhenglifeng1@huawei.com" <zhenglifeng1@huawei.com>,
	Thierry Reding <treding@nvidia.com>,
	"viresh.kumar@linaro.org" <viresh.kumar@linaro.org>,
	Jon Hunter <jonathanh@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	"ionela.voinescu@arm.com" <ionela.voinescu@arm.com>,
	Krishna Sitaraman <ksitaraman@nvidia.com>,
	Sanjay Chandrashekara <sanjayc@nvidia.com>,
	"zhanjie9@hisilicon.com" <zhanjie9@hisilicon.com>,
	"corbet@lwn.net" <corbet@lwn.net>, Matt Ochs <mochs@nvidia.com>,
	"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>,
	Bibek Basu <bbasu@nvidia.com>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"mario.limonciello@amd.com" <mario.limonciello@amd.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	sumitg@nvidia.com
Subject: Re: [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support
Date: Mon, 20 Apr 2026 18:37:14 +0530	[thread overview]
Message-ID: <0fdc7e64-31aa-4bfd-ab27-dea2f349693b@nvidia.com> (raw)
In-Reply-To: <208360b1-36a5-419d-80f4-431914407f61@arm.com>


>>> On 3/17/26 16:10, Sumit Gupta wrote:
>>>> Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPC
>>>> autonomous performance selection on all CPUs at system startup without
>>>> requiring runtime sysfs manipulation. When autonomous mode is enabled,
>>>> the hardware automatically adjusts CPU performance based on workload
>>>> demands using Energy Performance Preference (EPP) hints.
>>>>
>>>> When auto_sel_mode=1:
>>>> - Configure all CPUs for autonomous operation on first init
>>>> - Set EPP to performance preference (0x0)
>>>> - Use HW min/max when set; otherwise program from policy limits (caps)
>>>> - Clamp desired_perf to bounds before enabling autonomous mode
>>>> - Hardware controls frequency instead of the OS governor
>>>>
>>>> The boot parameter is applied only during first policy initialization.
>>>> On hotplug, skip applying it so that the user's runtime sysfs
>>>> configuration is preserved.
>>>>
>>>> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> (Documentation)
>>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>>> ---
>>>> Part 1 [1] of this series was applied for 7.1 and present in next.
>>>> Sending this patch as reworked version of 'patch 11' from [2] based
>>>> on next.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/lkml/20260206142658.72583-1-sumitg@nvidia.com/
>>>> [2]
>>>> https://lore.kernel.org/lkml/20251223121307.711773-1-sumitg@nvidia.com/
>>>> ---
>>>>    .../admin-guide/kernel-parameters.txt         | 13 +++
>>>>    drivers/cpufreq/cppc_cpufreq.c                | 84
>>>> +++++++++++++++++--
>>>>    2 files changed, 92 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>>> b/Documentation/admin-guide/kernel-parameters.txt
>>>> index fa6171b5fdd5..de4b4c89edfe 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -1060,6 +1060,19 @@ Kernel parameters
>>>>                        policy to use. This governor must be
>>>> registered in the
>>>>                        kernel before the cpufreq driver probes.
>>>>
>>>> +     cppc_cpufreq.auto_sel_mode=
>>>> +                     [CPU_FREQ] Enable ACPI CPPC autonomous
>>>> performance
>>>> +                     selection. When enabled, hardware
>>>> automatically adjusts
>>>> +                     CPU frequency on all CPUs based on workload
>>>> demands.
>>>> +                     In Autonomous mode, Energy Performance
>>>> Preference (EPP)
>>>> +                     hints guide hardware toward performance (0x0)
>>>> or energy
>>>> +                     efficiency (0xff).
>>>> +                     Requires ACPI CPPC autonomous selection
>>>> register support.
>>>> +                     Format: <bool>
>>>> +                     Default: 0 (disabled)
>>>> +                     0: use cpufreq governors
>>>> +                     1: enable if supported by hardware
>>>> +
>>>>        cpu_init_udelay=N
>>>>                        [X86,EARLY] Delay for N microsec between
>>>> assert and de-assert
>>>>                        of APIC INIT to start processors.  This delay
>>>> occurs
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>>> b/drivers/cpufreq/cppc_cpufreq.c
>>>> index 5dfb109cf1f4..49c148b2a0a4 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -28,6 +28,9 @@
>>>>
>>>>    static struct cpufreq_driver cppc_cpufreq_driver;
>>>>
>>>> +/* Autonomous Selection boot parameter */
>>>> +static bool auto_sel_mode;
>>>> +
>>>>    #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>>>>    static enum {
>>>>        FIE_UNSET = -1,
>>>> @@ -708,11 +711,74 @@ static int cppc_cpufreq_cpu_init(struct
>>>> cpufreq_policy *policy)
>>>>        policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>>>>        cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>>>
>>>> -     ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>>> -     if (ret) {
>>>> -             pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>>>> -                      caps->highest_perf, cpu, ret);
>>>> -             goto out;
>>>> +     /*
>>>> +      * Enable autonomous mode on first init if boot param is set.
>>>> +      * Check last_governor to detect first init and skip if auto_sel
>>>> +      * is already enabled.
>>>> +      */
>>> If the goal is to set autosel only once at the driver init,
>>> shouldn't this be done in cppc_cpufreq_init() ?
>>> I understand that cpu_data doesn't exist yet in
>>> cppc_cpufreq_init(), but this seems more appropriate to do
>>> it there IMO.
>>>
>>> This means the cpudata should be updated accordingly
>>> in this cppc_cpufreq_cpu_init() function.
>> In an earlier version [1], the setup was in cppc_cpufreq_init() but
>> was moved to cppc_cpufreq_cpu_init() to improve per-CPU error handling.
>> Keeping the setup in cppc_cpufreq_init() helps to avoid the last_governor
>> check. We can warn for a CPU failing to enable and continue so other
>> CPUs keep autonomous mode.
>> cppc_cpufreq_cpu_init() would then just check the auto_sel state
>> from register and sync policy limits from min/max_perf registers when
>> autonomous mode is active.
>> Please let me know your thoughts.
> FWIU the auto_sel_mode module parameter allows to
> configure the default auto_sel_mode when the driver is
> first loaded, so there should not need to check that again
> whenever cppc_cpufreq_cpu_init() is called.
> Maybe Ionela saw something we didn't see ?

AFAIU, the concern in that review [1] was about error handling as the
earlier version disabled auto_sel on all CPUs if any single CPU failed.
Per-CPU error handling in cppc_cpufreq_init() (warn and continue)
addresses that. Can't think of more reason.
Do you have anything in mind?


>
> Also just to be sure, should it still be possible to change
> the auto_sel_mode through the sysfs if the driver was
> loaded with auto_sel_mode=1 ?
>


Yes, the per-CPU auto_select sysfs attribute works independently of the
boot param. Users can enable or disable auto_sel on any CPU at runtime
via sysfs, regardless of how the driver was loaded. The boot param only
sets the initial state.


>> [1]
>> https://lore.kernel.org/lkml/5593d364-ca37-41c5-b33f-f7e245d6d626@nvidia.com/
>>
>>
>>>> +     if (auto_sel_mode && policy->last_governor[0] == '\0' &&
>>>> +         !cpu_data->perf_ctrls.auto_sel) {
>>>> +             /* Enable CPPC - optional register, some platforms
>>>> need it */
>>> The documentation of the CPPC Enable Register is subject to
>>> interpretation, but IIUC the field should be set to use the CPPC
>>> controls, so I assume this should be set in cppc_cpufreq_init()
>>> instead ?
>> Agree that the CPPC Enable is about using the CPPC control path
>> in general and not only for autonomous selection.
>> Will move cppc_set_enable() into cppc_cpufreq_init() or outside the
>> autonomous mode block in cppc_cpufreq_cpu_init() as per conclusion
>> of previous comment.
>>
>>>> +             ret = cppc_set_enable(cpu, true);
>>>> +             if (ret && ret != -EOPNOTSUPP)
>>>> +                     pr_warn("Failed to enable CPPC for CPU%d
>>>> (%d)\n", cpu, ret);
>>>> +
>>>> +             /*
>>>> +              * Prefer HW min/max_perf when set; otherwise program
>>>> from
>>>> +              * policy limits derived earlier from caps.
>>>> +              * Clamp desired_perf to bounds and sync policy->cur.
>>>> +              */
>>>> +             if (!cpu_data->perf_ctrls.min_perf ||
>>>> !cpu_data->perf_ctrls.max_perf)
>>> The function doesn't seem to exist.
>> It is newly added in [2].
>> Don't need to call it if we move the setup to cppc_cpufreq_init().
> Ah ok right thanks.
>
>
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea3db45ae476889a1ba0ab3617e6afdeeefbda3d
>>
>>
>>
>>>> + cppc_cpufreq_update_perf_limits(cpu_data, policy);
>>>> +
>>>> +             cpu_data->perf_ctrls.desired_perf =
>>>> +                     clamp_t(u32, cpu_data->perf_ctrls.desired_perf,
>>>> + cpu_data->perf_ctrls.min_perf,
>>>> + cpu_data->perf_ctrls.max_perf);
>>>> +
>>>> +             policy->cur = cppc_perf_to_khz(caps,
>>>> + cpu_data->perf_ctrls.desired_perf);
>>>> +
>>> Maybe this should also be done in cppc_cpufreq_init()
>>> if the auto_sel_mode parameter is set ?
>> Yes.
>>
>>>> +             /* EPP is optional - some platforms may not support it */
>>>> +             ret = cppc_set_epp(cpu, CPPC_EPP_PERFORMANCE_PREF);
>>>> +             if (ret && ret != -EOPNOTSUPP)
>>>> +                     pr_warn("Failed to set EPP for CPU%d (%d)\n",
>>>> cpu, ret);
>>>> +             else if (!ret)
>>>> +                     cpu_data->perf_ctrls.energy_perf =
>>>> CPPC_EPP_PERFORMANCE_PREF;
>>>> +
>>>> +             ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>>> +             if (ret) {
>>>> +                     pr_debug("Err setting perf for autonomous mode
>>>> CPU:%d ret:%d\n",
>>>> +                              cpu, ret);
>>>> +                     goto out;
>>>> +             }
>>>> +
>>>> +             ret = cppc_set_auto_sel(cpu, true);
>>>> +             if (ret && ret != -EOPNOTSUPP) {
>>>> +                     pr_warn("Failed autonomous config for CPU%d
>>>> (%d)\n",
>>>> +                             cpu, ret);
>>>> +                     goto out;
>>>> +             }
>>>> +             if (!ret)
>>>> +                     cpu_data->perf_ctrls.auto_sel = true;
>>>> +     }
>>>> +
>>>> +     if (cpu_data->perf_ctrls.auto_sel) {
>>> There is a patchset ongoing which tries to remove
>>> setting policy->min/max from driver initialization.
>>> Indeed, these values are only temporarily valid,
>>> until the governor override them.
>>> It is not sure yet the patch will be accepted though.
>>>
>>> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>>>
>>
>> You are right that policy->min/max from .init() are temporary today
>> as cpufreq_set_policy() overwrites them before the governor starts.
>>
>> On my test platform (highest == nominal, lowest_nonlinear == lowest),
>> this had no visible effect because the BIOS bounds and cpuinfo range
>> end up identical. But on platforms where they differ, the governor
>> would widen the range to full cpuinfo limits.
>>
>> I think your patch [3] fixes this by giving these the right semantic as
>> initial QoS requests. With it, cpufreq_set_policy() preserves the policy
>> limits set from min/max_perf registers in .init(), which can either be
>> BIOS values on first boot or last user configured values before hotplug.
>>
>> I will update the comment in v2 to reflect QoS seeding intent.
>>
>> I see that the first two patches of your series [3] is applied for 7.1.
>> Do you plan to send the pending patch (3/4) from [3]?
>>
> I need to ping Viresh to check if this is still relevant.
>
>
>> [3]
>> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>>
>>
>>>
>>>> +             /* Sync policy limits from HW when autonomous mode is
>>>> active */
>>>> +             policy->min = cppc_perf_to_khz(caps,
>>>> + cpu_data->perf_ctrls.min_perf ?:
>>>> + caps->lowest_nonlinear_perf);
>>>> +             policy->max = cppc_perf_to_khz(caps,
>>>> + cpu_data->perf_ctrls.max_perf ?:
>>>> + caps->nominal_perf);
>>>> +     } else {
>>>> +             /* Normal mode: governors control frequency */
>>>> +             ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>>> +             if (ret) {
>>>> +                     pr_debug("Err setting perf value:%d on CPU:%d.
>>>> ret:%d\n",
>>>> +                              caps->highest_perf, cpu, ret);
>>>> +                     goto out;
>>>> +             }
>>>>        }
>>>>
>>>>        cppc_cpufreq_cpu_fie_init(policy);
>>>> @@ -1038,10 +1104,18 @@ static int __init cppc_cpufreq_init(void)
>>>>
>>>>    static void __exit cppc_cpufreq_exit(void)
>>>>    {
>>>> +     unsigned int cpu;
>>>> +
>>>> +     for_each_present_cpu(cpu)
>>>> +             cppc_set_auto_sel(cpu, false);
>>> If the firmware has a default EPP value, it means that loading
>>> and the unloading the driver will reset this default EPP value.
>>> Maybe the initial EPP value and/or the auto_sel value should be
>>> cached somewhere and restored on exit ?
>>> I don't know if this is actually an issue, this is just to signal it.
>> The auto_sel_mode boot path programs EPP to performance preference(0),
>> not the firmware’s previous value. On unload we only call
>> cppc_set_auto_sel(false); we do not restore EPP, min/max perf,
>> or other CPPC fields to firmware defaults.
> Yes right, so loading/unloading the driver might change the
> default EPP value.

Acknowledged.
With auto_sel_mode, load/unload can leave EPP and other CPPC fields
different from the firmware defaults at boot.
We can add explicit cache-and-restore on exit in a follow-up
if that is desired.

Thank you,
Sumit Gupta



  parent reply	other threads:[~2026-04-20 13:07 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 15:10 [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support Sumit Gupta
2026-03-24 18:18 ` Pierre Gondois
2026-04-06 18:08   ` Sumit Gupta
2026-04-10 13:47     ` Pierre Gondois
2026-04-13  5:51       ` Viresh Kumar
2026-04-20 13:13         ` Sumit Gupta
2026-04-20 13:07       ` Sumit Gupta [this message]
2026-04-24 12:10         ` Sumit Gupta
2026-04-24 12:55           ` Pierre Gondois
2026-04-24 13:52             ` Sumit Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0fdc7e64-31aa-4bfd-ab27-dea2f349693b@nvidia.com \
    --to=sumitg@nvidia.com \
    --cc=bbasu@nvidia.com \
    --cc=corbet@lwn.net \
    --cc=ionela.voinescu@arm.com \
    --cc=jonathanh@nvidia.com \
    --cc=ksitaraman@nvidia.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=mario.limonciello@amd.com \
    --cc=mochs@nvidia.com \
    --cc=pierre.gondois@arm.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=sanjayc@nvidia.com \
    --cc=skhan@linuxfoundation.org \
    --cc=treding@nvidia.com \
    --cc=viresh.kumar@linaro.org \
    --cc=vsethi@nvidia.com \
    --cc=zhanjie9@hisilicon.com \
    --cc=zhenglifeng1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox