From: Pierre Gondois <pierre.gondois@arm.com>
To: Sumit Gupta <sumitg@nvidia.com>,
rafael@kernel.org, viresh.kumar@linaro.org,
ionela.voinescu@arm.com, zhenglifeng1@huawei.com,
zhanjie9@hisilicon.com, corbet@lwn.net,
skhan@linuxfoundation.org, rdunlap@infradead.org,
mario.limonciello@amd.com, linux-kernel@vger.kernel.org,
linux-pm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-tegra@vger.kernel.org
Cc: treding@nvidia.com, jonathanh@nvidia.com, vsethi@nvidia.com,
ksitaraman@nvidia.com, sanjayc@nvidia.com, mochs@nvidia.com,
bbasu@nvidia.com
Subject: Re: [PATCH v5 2/2] cpufreq: CPPC: add autonomous mode boot parameter support
Date: Wed, 1 Jul 2026 18:24:05 +0200
Message-ID: <07721a34-dae0-4575-897b-e4cb7754cf4d@arm.com>
In-Reply-To: <20260623080652.3353386-3-sumitg@nvidia.com>
On 6/23/26 10:06, Sumit Gupta wrote:
> Add a kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable
> CPPC autonomous performance selection on all CPUs at system startup.
> When autonomous mode is enabled, the hardware automatically adjusts
> CPU performance based on workload demands using Energy Performance
> Preference (EPP) hints.
>
> When the parameter is set:
> - Configure all CPUs for autonomous operation on first init
> - Use HW min/max_perf when available; otherwise initialize from caps
> - Initialize desired_perf to max_perf as a starting hint
> - Hardware controls frequency instead of the OS governor
> - EPP behavior depends on parameter value:
>   - performance (or 1): override EPP to performance (0x0)
>   - balance_performance (or 2): override EPP to balance_performance
>     (0x80)
>   - default_epp (or 3): preserve EPP value programmed by
>     BIOS/firmware
>
> Unset, "0"/"disabled", or an unrecognized value leaves autonomous
> selection disabled.
>
> The boot parameter is applied only during first policy initialization.
> Skip applying it on CPU hotplug to preserve runtime sysfs configuration.
>
> This relies on commit 8c83947c5dbb ("cpufreq: Use policy->min/max init as
> QoS request") so that the policy->min/max set in cppc_cpufreq_cpu_init()
> are used as the policy's QoS requests and not overridden by
> cpufreq_set_policy() during init.
>
> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> .../admin-guide/kernel-parameters.txt | 22 +++
> drivers/cpufreq/cppc_cpufreq.c | 151 +++++++++++++++++-
> include/acpi/cppc_acpi.h | 1 +
> 3 files changed, 169 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index b5493a7f8f22..88820d34d516 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1019,6 +1019,28 @@ Kernel parameters
> policy to use. This governor must be registered in the
> kernel before the cpufreq driver probes.
>
> + cppc_cpufreq.auto_sel_mode=
> + [CPU_FREQ] Enable ACPI CPPC autonomous performance
> + selection. When enabled, hardware automatically adjusts
> + CPU frequency on all CPUs based on workload demands.
> + In Autonomous mode, Energy Performance Preference (EPP)
> + hints guide hardware toward performance (0x0) or energy
> + efficiency (0xff).
> + Requires ACPI CPPC autonomous selection register
> + support.
> + Accepts:
> + disabled, 0:
Just a question: would it be worth accepting only strings?
If we want finer granularity later, it will be harder to
introduce new modes if integer values are already in use.
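To illustrate the string-only idea, here is a minimal userspace sketch, not the driver code. The function name auto_sel_mode_parse() and the table auto_sel_names[] are hypothetical, and plain strcmp() stands in for the sysfs_streq() used in the patch, so a trailing newline is not tolerated here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode IDs mirroring the enum in the quoted patch. */
enum auto_sel_mode {
	AUTO_SEL_DISABLED,
	AUTO_SEL_PERFORMANCE,
	AUTO_SEL_BALANCE_PERFORMANCE,
	AUTO_SEL_DEFAULT_EPP,
};

/* Hypothetical lookup table: one row per accepted string. */
static const struct {
	const char *name;
	enum auto_sel_mode mode;
} auto_sel_names[] = {
	{ "disabled",            AUTO_SEL_DISABLED },
	{ "performance",         AUTO_SEL_PERFORMANCE },
	{ "balance_performance", AUTO_SEL_BALANCE_PERFORMANCE },
	{ "default_epp",         AUTO_SEL_DEFAULT_EPP },
};

/*
 * Hypothetical string-only parser (userspace stand-in for a .set handler).
 * Returns 0 and stores the matching mode; returns -1 and stores
 * AUTO_SEL_DISABLED for anything else, including numeric aliases like "1".
 */
static int auto_sel_mode_parse(const char *val, enum auto_sel_mode *mode)
{
	size_t i;

	assert(val != NULL && mode != NULL);

	*mode = AUTO_SEL_DISABLED;
	for (i = 0; i < sizeof(auto_sel_names) / sizeof(auto_sel_names[0]); i++) {
		if (strcmp(val, auto_sel_names[i].name) == 0) {
			*mode = auto_sel_names[i].mode;
			return 0;
		}
	}
	return -1;
}
```

With a table like this, adding a finer-grained mode later would only mean adding a row, with no integer numbering to keep stable.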
> + cpufreq governors are used (auto_sel disabled)
> + performance, 1:
> + enable auto_sel + set EPP to performance (0x0)
> + balance_performance, 2:
> + enable auto_sel + set EPP to
> + balance_performance (0x80)
> + default_epp, 3:
> + enable auto_sel, preserve EPP value programmed
> + by BIOS/firmware
> + Unset or an unrecognized value is treated as disabled.
> +
> cpu_init_udelay=N
> [X86,EARLY] Delay for N microsec between assert and de-assert
> of APIC INIT to start processors. This delay occurs
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index f7a47576717a..efa673e3830c 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -28,6 +28,55 @@
>
> static struct cpufreq_driver cppc_cpufreq_driver;
>
> +/* Autonomous Selection boot parameter modes */
> +enum {
> + AUTO_SEL_DISABLED = 0,
> + AUTO_SEL_PERFORMANCE = 1,
> + AUTO_SEL_BALANCE_PERFORMANCE = 2,
> + AUTO_SEL_DEFAULT_EPP = 3,
> +};
> +
> +static int auto_sel_mode;
> +
> +static int auto_sel_mode_set(const char *val, const struct kernel_param *kp)
> +{
> + int *mode = kp->arg;
> +
> + *mode = AUTO_SEL_DISABLED;
> +
> + if (sysfs_streq(val, "performance") || sysfs_streq(val, "1"))
> + *mode = AUTO_SEL_PERFORMANCE;
> + else if (sysfs_streq(val, "balance_performance") || sysfs_streq(val, "2"))
> + *mode = AUTO_SEL_BALANCE_PERFORMANCE;
> + else if (sysfs_streq(val, "default_epp") || sysfs_streq(val, "3"))
> + *mode = AUTO_SEL_DEFAULT_EPP;
> + else if (!sysfs_streq(val, "disabled") && !sysfs_streq(val, "0"))
> + pr_warn("Invalid auto_sel_mode \"%s\", disable auto select\n", val);
> +
> + return 0;
> +}
> +
> +static int auto_sel_mode_get(char *buffer, const struct kernel_param *kp)
> +{
> + int *mode = kp->arg;
> +
> + switch (*mode) {
> + case AUTO_SEL_PERFORMANCE:
> + return sysfs_emit(buffer, "performance\n");
> + case AUTO_SEL_BALANCE_PERFORMANCE:
> + return sysfs_emit(buffer, "balance_performance\n");
> + case AUTO_SEL_DEFAULT_EPP:
> + return sysfs_emit(buffer, "default_epp\n");
> + default:
> + return sysfs_emit(buffer, "disabled\n");
> + }
> +}
> +
> +static const struct kernel_param_ops auto_sel_mode_ops = {
> + .set = auto_sel_mode_set,
> + .get = auto_sel_mode_get,
> +};
> +
> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
> static enum {
> FIE_UNSET = -1,
> @@ -645,7 +694,9 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> unsigned int cpu = policy->cpu;
> struct cppc_cpudata *cpu_data;
> struct cppc_perf_caps *caps;
> + bool set_epp = true;
> int ret;
> + u32 epp;
>
> cpu_data = cppc_cpufreq_get_cpu_data(cpu);
> if (!cpu_data) {
> @@ -715,11 +766,87 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>
> - ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> - if (ret) {
> - pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> - caps->highest_perf, cpu, ret);
> - goto out;
> + /*
> + * Enable autonomous mode on first init if boot param is set.
> + * Check last_governor to detect first init and skip if auto_sel
> + * is already enabled.
> + */
> + if (auto_sel_mode && policy->last_governor[0] == '\0' &&
> + !cpu_data->perf_ctrls.auto_sel) {
If an .online() callback is introduced, does that mean we can
remove the "policy->last_governor[0] == '\0'" check?
Also, it may be worth moving this "if" block into its own
function.
Also (bis), maybe we should check that auto_sel is supported before
doing anything else.
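To make the last two points concrete, here is a rough userspace model, not kernel code. struct model_cpu, its auto_sel_supported flag and model_enable_auto_sel() are all hypothetical names; how the driver would actually detect auto_sel support is left open. The counter only stands in for the register writes done by cppc_set_perf() and cppc_set_auto_sel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU state in the patch. */
struct model_cpu {
	bool auto_sel_supported;   /* assumed capability flag, not a real field */
	bool auto_sel;             /* mirrors perf_ctrls.auto_sel */
	unsigned int max_perf;     /* mirrors perf_ctrls.max_perf */
	unsigned int desired_perf; /* mirrors perf_ctrls.desired_perf */
	unsigned int regs_written; /* counts modelled register writes */
};

/*
 * Hypothetical helper holding the body of the "if" block.
 * Returns true only if this call enabled autonomous selection.
 */
static bool model_enable_auto_sel(struct model_cpu *cpu, int boot_mode)
{
	assert(cpu != NULL);

	/* All checks come first, so nothing is written on an early exit. */
	if (!boot_mode || cpu->auto_sel || !cpu->auto_sel_supported)
		return false;

	cpu->desired_perf = cpu->max_perf; /* starting hint, as in the patch */
	cpu->regs_written++;               /* stands in for cppc_set_perf() */

	cpu->auto_sel = true;
	cpu->regs_written++;               /* stands in for cppc_set_auto_sel() */
	return true;
}
```

The point of the model is only the ordering: with the support check up front, an unsupported CPU leaves desired_perf and the registers untouched.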
> + /* Init min/max_perf from caps if not already set by HW. */
> + if (!cpu_data->perf_ctrls.min_perf)
> + cpu_data->perf_ctrls.min_perf = caps->lowest_nonlinear_perf;
> + if (!cpu_data->perf_ctrls.max_perf)
> + cpu_data->perf_ctrls.max_perf = policy->boost_enabled ?
> + caps->highest_perf : caps->nominal_perf;
Is it necessary to do that?
- for min_perf, we are setting it to the lowest possible value
- for max_perf, we are setting it to the highest available value.
If boost is disabled here and enabled later, I don't think max_perf is
updated accordingly, so we would limit the frequency to caps->nominal_perf
(if I'm not missing something).
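The scenario I have in mind, as a small userspace model. struct model_policy and both functions are hypothetical, and the model assumes that a later boost toggle does not rewrite max_perf, which is exactly the part I am unsure about in the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the fields involved. */
struct model_policy {
	bool boost_enabled;        /* mirrors policy->boost_enabled */
	unsigned int nominal_perf; /* mirrors caps->nominal_perf */
	unsigned int highest_perf; /* mirrors caps->highest_perf */
	unsigned int max_perf;     /* mirrors perf_ctrls.max_perf, 0 = unset */
};

/* Mirrors the quoted init: fill max_perf only while it is still unset. */
static void model_init_max_perf(struct model_policy *p)
{
	assert(p != NULL);

	if (!p->max_perf)
		p->max_perf = p->boost_enabled ? p->highest_perf
					       : p->nominal_perf;
}

/* Assumption: toggling boost later changes the flag but not max_perf. */
static void model_set_boost(struct model_policy *p, bool enable)
{
	assert(p != NULL);

	p->boost_enabled = enable;
}
```

Starting with boost disabled, nominal_perf = 80 and highest_perf = 100, the init stores max_perf = 80. Enabling boost afterwards and running the init again leaves max_perf at 80, because the value is no longer zero.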
> +
> + /*
> + * In autonomous mode desired_perf is only a hint; EPP and
> + * the platform drive actual selection within [min, max].
> + * Initialize it to max_perf so HW starts at the upper bound.
> + */
> + cpu_data->perf_ctrls.desired_perf = cpu_data->perf_ctrls.max_perf;
> +
> + policy->cur = cppc_perf_to_khz(caps,
> + cpu_data->perf_ctrls.desired_perf);
> +
> + /*
> + * Set EPP per mode. 'default_epp' preserves the BIOS/firmware
> + * programmed EPP value. EPP is optional - some platforms may
> + * not support it.
> + */
> + switch (auto_sel_mode) {
> + case AUTO_SEL_PERFORMANCE:
> + epp = CPPC_EPP_PERFORMANCE_PREF;
> + break;
> + case AUTO_SEL_BALANCE_PERFORMANCE:
> + epp = CPPC_EPP_BALANCE_PERFORMANCE_PREF;
> + break;
> + default:
> + set_epp = false;
> + break;
> + }
> +
> + if (set_epp) {
> + ret = cppc_set_epp(cpu, epp);
> + if (ret && ret != -EOPNOTSUPP)
> + pr_warn("Failed to set EPP for CPU%d (%d)\n", cpu, ret);
> + else if (!ret)
> + cpu_data->perf_ctrls.energy_perf = epp;
> + }
> +
> + /* Program min/max/desired into CPPC regs (non-fatal on failure). */
> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> + if (ret)
> + pr_warn("set_perf failed CPU%d (%d); using HW values\n",
> + cpu, ret);
> +
> + ret = cppc_set_auto_sel(cpu, true);
> + if (ret && ret != -EOPNOTSUPP)
> + pr_warn("auto_sel CPU%d failed (%d); using OS mode\n",
> + cpu, ret);
> + else if (!ret)
> + cpu_data->perf_ctrls.auto_sel = true;
> + }
(the "if" block mentioned above ends here)
> +
> + if (cpu_data->perf_ctrls.auto_sel) {
> + /* Sync policy limits from HW when autonomous mode is active */
Similar comment as above: doesn't this fall into the same kind of case as
521223d8b3ec ("cpufreq: Fix initialization of min and max frequency QoS
request")?
> + policy->min = cppc_perf_to_khz(caps,
> + cpu_data->perf_ctrls.min_perf ?:
> + caps->lowest_nonlinear_perf);
> + policy->max = cppc_perf_to_khz(caps,
> + cpu_data->perf_ctrls.max_perf ?:
> + (policy->boost_enabled ?
> + caps->highest_perf :
> + caps->nominal_perf));
> + } else {
> + /* Normal mode: governors control frequency */
> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
> + if (ret) {
> + pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
> + caps->highest_perf, cpu, ret);
> + goto out;
> + }
> }
>
> cppc_cpufreq_cpu_fie_init(policy);
> @@ -1066,10 +1193,24 @@ static int __init cppc_cpufreq_init(void)
>
> static void __exit cppc_cpufreq_exit(void)
> {
> + unsigned int cpu;
> +
> + for_each_present_cpu(cpu)
> + cppc_set_auto_sel(cpu, false);
I saw that this is being changed in:
[PATCH] cpufreq: CPPC: Preserve OSPM-set registers across hotplug and unload
But I think it would make more sense to have that patch come before these
patches:
[1] ACPI: CPPC: Add ospm_nominal_perf support
https://lore.kernel.org/lkml/20260615185934.2383514-1-sumitg@nvidia.com/
[2] cpufreq: CPPC: add autonomous mode boot parameter support
https://lore.kernel.org/lkml/20260623080652.3353386-1-sumitg@nvidia.com/
to avoid making changes that are then corrected in that last patch of the
series.
> +
> cpufreq_unregister_driver(&cppc_cpufreq_driver);
> cppc_freq_invariance_exit();
> }
>
> +module_param_cb(auto_sel_mode, &auto_sel_mode_ops, &auto_sel_mode, 0444);
> +MODULE_PARM_DESC(auto_sel_mode,
> + "Enable CPPC autonomous performance selection at boot: "
> + "disabled or 0 (use cpufreq governors), "
> + "performance or 1 (EPP=performance), "
> + "balance_performance or 2 (EPP=balance_performance), "
> + "default_epp or 3 (preserve BIOS/firmware EPP); "
> + "an unrecognized value is treated as disabled");
> +
> module_exit(cppc_cpufreq_exit);
> MODULE_AUTHOR("Ashwin Chaugule");
> MODULE_DESCRIPTION("CPUFreq driver based on the ACPI CPPC v5.0+ spec");
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 8693890a7275..9b18fb9aab7c 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -42,6 +42,7 @@
> #define CPPC_AUTO_ACT_WINDOW_SIG_CARRY_THRESH 129
>
> #define CPPC_EPP_PERFORMANCE_PREF 0x00
> +#define CPPC_EPP_BALANCE_PERFORMANCE_PREF 0x80
> #define CPPC_EPP_ENERGY_EFFICIENCY_PREF 0xFF
>
> #define CPPC_PERF_LIMITED_DESIRED_EXCURSION BIT(0)