* [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
@ 2026-01-28 3:31 Aaron Tomlin
2026-01-28 3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
2026-02-08 21:46 ` [v2 PATCH 0/1] " Aaron Tomlin
0 siblings, 2 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-28 3:31 UTC (permalink / raw)
To: rafael, dakr, pavel, lenb
Cc: akpm, bp, pmladek, rdunlap, feng.tang, pawan.kumar.gupta, kees,
elver, arnd, fvdl, lirongqing, bhelgaas, neelx, sean, mproche,
chjohnst, nick.lange, linux-kernel, linux-pm, linux-doc
Hi Rafael, Danilo, Pavel, Len,
Users currently lack a mechanism to define granular, per-CPU PM QoS
resume latency constraints during the early boot phase.
While the "idle=poll" boot parameter exists, it enforces a global
override, forcing all CPUs in the system to "poll". This global approach
is not suitable for asymmetric workloads where strict latency guarantees
are required only on specific critical CPUs, while housekeeping or
non-critical CPUs should be allowed to enter deeper power states to save
energy.
Additionally, the existing sysfs interface
(/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us) becomes
available only after userspace initialisation. This is too late to
prevent deep C-state entry during the early kernel boot phase, which may
be required for debugging early boot hangs related to C-state
transitions or for workloads requiring strict latency guarantees
immediately upon system start.
This patch introduces the pm_qos_resume_latency_us kernel boot
parameter, which allows users to specify distinct resume latency
constraints for specific CPU ranges.
Syntax: pm_qos_resume_latency_us=range:value,range:value...
Unlike the sysfs interface which accepts the special string "n/a" to
remove a constraint, this boot parameter strictly requires integer
values. The special value "n/a" is not supported; the integer 0 must be
used to represent a 0 us latency constraint (polling).
For example:
"pm_qos_resume_latency_us=0:0,1-15:20"
Forces CPU 0 to poll on idle; constrains CPUs 1-15 to not enter a sleep
state that takes longer than 20 us to wake up. All other CPUs will have
the default (no resume latency) applied.
Implementation Details:
- The parameter string is captured via __setup() and parsed in
an early_initcall() to ensure suitable memory allocators are
available.
- Constraints are stored in a read-only linked list.
- The constraints are queried and applied in register_cpu().
This ensures the latency requirement is active immediately
upon CPU registration, effectively acting as a "birth"
constraint before the cpuidle governor takes over.
- The parsing logic enforces a "First Match Wins" policy: if a
CPU falls into multiple specified ranges, the latency value
from the first matching entry is used.
- The constraints persist across CPU hotplug events.
Please let me know your thoughts.
Changes since v1 [1]:
- Removed boot_option_idle_override == IDLE_POLL check
- Decoupled implementation from CONFIG_CPU_IDLE
- Added kernel-parameters.txt documentation
- Renamed internal setup functions for consistency
[1]: https://lore.kernel.org/lkml/20260123010024.3301276-1-atomlin@atomlin.com/
Aaron Tomlin (1):
PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
.../admin-guide/kernel-parameters.txt | 23 +++
drivers/base/cpu.c | 5 +-
include/linux/pm_qos.h | 5 +
kernel/power/qos.c | 141 ++++++++++++++++++
4 files changed, 172 insertions(+), 2 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
2026-01-28 3:31 [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us Aaron Tomlin
@ 2026-01-28 3:31 ` Aaron Tomlin
2026-03-04 2:02 ` Aaron Tomlin
2026-03-04 9:05 ` Zhongqiu Han
2026-02-08 21:46 ` [v2 PATCH 0/1] " Aaron Tomlin
1 sibling, 2 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-28 3:31 UTC (permalink / raw)
To: rafael, dakr, pavel, lenb
Cc: akpm, bp, pmladek, rdunlap, feng.tang, pawan.kumar.gupta, kees,
elver, arnd, fvdl, lirongqing, bhelgaas, neelx, sean, mproche,
chjohnst, nick.lange, linux-kernel, linux-pm, linux-doc
Users currently lack a mechanism to define granular, per-CPU PM QoS
resume latency constraints during the early boot phase.
While the idle=poll boot parameter exists, it enforces a global
override, forcing all CPUs in the system to "poll". This global approach
is not suitable for asymmetric workloads where strict latency guarantees
are required only on specific critical CPUs, while housekeeping or
non-critical CPUs should be allowed to enter deeper idle states to save
energy.
Additionally, the existing sysfs interface
(/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us) becomes
available only after userspace initialisation. This is too late to
prevent deep C-state entry during the early kernel boot phase, which may
be required for debugging early boot hangs related to C-state
transitions or for workloads requiring strict latency guarantees
immediately upon system start.
This patch introduces the pm_qos_resume_latency_us kernel boot
parameter, which allows users to specify distinct resume latency
constraints for specific CPU ranges.
Syntax: pm_qos_resume_latency_us=range:value,range:value...
Unlike the sysfs interface which accepts the special string "n/a" to
remove a constraint, this boot parameter strictly requires integer
values. The special value "n/a" is not supported; the integer 0 must be
used to represent a 0 us latency constraint (polling).
For example:
"pm_qos_resume_latency_us=0:0,1-15:20"
Forces CPU 0 to poll on idle; constrains CPUs 1-15 to not enter a sleep
state that takes longer than 20 us to wake up. All other CPUs will have
the default (no resume latency) applied.
Implementation Details:
- The parameter string is captured via __setup() and parsed in
an early_initcall() to ensure suitable memory allocators are
available.
- Constraints are stored in a read-only linked list.
- The constraints are queried and applied in register_cpu().
This ensures the latency requirement is active immediately
upon CPU registration, effectively acting as a "birth"
constraint before the cpuidle governor takes over.
- The parsing logic enforces a "First Match Wins" policy: if a
CPU falls into multiple specified ranges, the latency value
from the first matching entry is used.
- The constraints persist across CPU hotplug events.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
.../admin-guide/kernel-parameters.txt | 23 +++
drivers/base/cpu.c | 5 +-
include/linux/pm_qos.h | 5 +
kernel/power/qos.c | 141 ++++++++++++++++++
4 files changed, 172 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6a3d6bd0746c..afba39ecfdee 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2238,6 +2238,29 @@ Kernel parameters
icn= [HW,ISDN]
Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
+ pm_qos_resume_latency_us= [KNL,EARLY]
+ Format: <cpu-list>:<value>[,<cpu-list>:<value>...]
+
+ Establish per-CPU resume latency constraints. These constraints
+ are applied immediately upon CPU registration and persist
+ across CPU hotplug events.
+
+ For example:
+ "pm_qos_resume_latency_us=0:0,1-15:20"
+
+ This restricts CPU 0 to a 0us resume latency (effectively
+ forcing polling) and limits CPUs 1-15 to C-states with a
+ maximum exit latency of 20us. All other CPUs remain
+ unconstrained by this parameter.
+
+ Unlike the sysfs interface, which accepts the string "n/a" to
+ remove a constraint, this boot parameter strictly requires
+ integer values. To specify a 0us latency constraint (polling),
+ the integer 0 must be used.
+
+ NOTE: The parsing logic enforces a "First Match Wins" policy.
+ If a CPU is included in multiple specified ranges, the latency
+ value from the first matching entry takes precedence.
idle= [X86,EARLY]
Format: idle=poll, idle=halt, idle=nomwait
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c6c57b6f61c6..1dea5bcd76a0 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -416,6 +416,7 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
int register_cpu(struct cpu *cpu, int num)
{
int error;
+ s32 resume_latency;
cpu->node_id = cpu_to_node(num);
memset(&cpu->dev, 0x00, sizeof(struct device));
@@ -436,8 +437,8 @@ int register_cpu(struct cpu *cpu, int num)
per_cpu(cpu_sys_devices, num) = &cpu->dev;
register_cpu_under_node(num, cpu_to_node(num));
- dev_pm_qos_expose_latency_limit(&cpu->dev,
- PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+ resume_latency = pm_qos_get_boot_cpu_latency_limit(num);
+ dev_pm_qos_expose_latency_limit(&cpu->dev, resume_latency);
set_cpu_enabled(num, true);
return 0;
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 6cea4455f867..556a7dff1419 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -174,6 +174,7 @@ static inline s32 cpu_wakeup_latency_qos_limit(void)
#ifdef CONFIG_PM
enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
+s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu);
s32 __dev_pm_qos_resume_latency(struct device *dev);
s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type);
int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
@@ -218,6 +219,10 @@ static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
pm_qos_read_value(&dev->power.qos->resume_latency);
}
#else
+static inline s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
+{
+ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+}
static inline enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev,
s32 mask)
{ return PM_QOS_FLAGS_UNDEFINED; }
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index f7d8064e9adc..e23223e3c7e8 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -34,6 +34,11 @@
#include <linux/kernel.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
+#include <linux/cpumask.h>
+#include <linux/cpu.h>
+#include <linux/list.h>
+
+#include <asm/setup.h>
#include <linux/uaccess.h>
#include <linux/export.h>
@@ -46,6 +51,10 @@
*/
static DEFINE_SPINLOCK(pm_qos_lock);
+static LIST_HEAD(pm_qos_boot_list);
+
+static char pm_qos_resume_latency_cmdline[COMMAND_LINE_SIZE] __initdata;
+
/**
* pm_qos_read_value - Return the current effective constraint value.
* @c: List of PM QoS constraint requests.
@@ -209,6 +218,138 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
return prev_value != curr_value;
}
+struct pm_qos_boot_entry {
+ struct list_head node;
+ struct cpumask mask;
+ s32 latency;
+};
+
+static int __init pm_qos_resume_latency_us_setup(char *str)
+{
+ strscpy(pm_qos_resume_latency_cmdline, str,
+ sizeof(pm_qos_resume_latency_cmdline));
+ return 1;
+}
+__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
+
+/* init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.
+ *
+ * Parses the kernel command line option "pm_qos_resume_resume_latency_us=" to establish
+ * per-CPU resume latency constraints. These constraints are applied
+ * immediately when a CPU is registered.
+ *
+ * Syntax: pm_qos_resume_latency_us=<cpu-list>:<value>[,<cpu-list>:<value>...]
+ * Example: pm_qos_resume_latency_us=0-3:0,4-7:20
+ *
+ * The parsing logic enforces a "First Match Wins" policy. If a CPU is
+ * covered by multiple entries in the list, only the first valid entry
+ * applies. Any subsequent overlapping ranges for that CPU are ignored.
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int __init init_pm_qos_resume_latency_us_setup(void)
+{
+ char *token, *cmd = pm_qos_resume_latency_cmdline;
+ struct pm_qos_boot_entry *entry, *tentry;
+ cpumask_var_t covered;
+
+ if (!zalloc_cpumask_var(&covered, GFP_KERNEL)) {
+ pr_warn("pm_qos: Failed to allocate memory for parsing boot parameter\n");
+ return -ENOMEM;
+ }
+
+ while ((token = strsep(&cmd, ",")) != NULL) {
+ char *str_range, *str_val;
+
+ str_range = strsep(&token, ":");
+ str_val = token;
+
+ if (!str_val) {
+ pr_warn("pm_qos: Missing value range %s\n",
+ str_range);
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ pr_warn("pm_qos: Failed to allocate memory for boot entry\n");
+ goto cleanup;
+ }
+
+ if (cpulist_parse(str_range, &entry->mask)) {
+ pr_warn("pm_qos: Failed to parse cpulist range %s\n",
+ str_range);
+ kfree(entry);
+ continue;
+ }
+
+ cpumask_andnot(&entry->mask, &entry->mask, covered);
+ if (cpumask_empty(&entry->mask)) {
+ pr_warn("pm_qos: Entry %s already covered, ignoring\n",
+ str_range);
+ kfree(entry);
+ continue;
+ }
+ cpumask_or(covered, covered, &entry->mask);
+
+ if (kstrtos32(str_val, 0, &entry->latency)) {
+ pr_warn("pm_qos: Invalid latency requirement value %s\n",
+ str_val);
+ kfree(entry);
+ continue;
+ }
+
+ if (entry->latency < 0) {
+ pr_warn("pm_qos: Latency requirement cannot be negative: %d\n",
+ entry->latency);
+ kfree(entry);
+ continue;
+ }
+
+ list_add_tail(&entry->node, &pm_qos_boot_list);
+ }
+
+ free_cpumask_var(covered);
+ return 0;
+
+cleanup:
+ list_for_each_entry_safe(entry, tentry, &pm_qos_boot_list, node) {
+ list_del(&entry->node);
+ kfree(entry);
+ }
+
+ free_cpumask_var(covered);
+ return 0;
+}
+early_initcall(init_pm_qos_resume_latency_us_setup);
+
+/**
+ * pm_qos_get_boot_cpu_latency_limit - Get boot-time latency limit for a CPU.
+ * @cpu: Logical CPU number to check.
+ *
+ * Checks the read-only boot-time constraints list to see if a specific
+ * PM QoS latency override was requested for this CPU via the kernel
+ * command line.
+ *
+ * Return: The latency limit in microseconds if a constraint exists,
+ * or PM_QOS_RESUME_LATENCY_NO_CONSTRAINT if no boot override applies.
+ */
+s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
+{
+ struct pm_qos_boot_entry *entry;
+
+ if (list_empty(&pm_qos_boot_list))
+ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+
+ list_for_each_entry(entry, &pm_qos_boot_list, node) {
+ if (cpumask_test_cpu(cpu, &entry->mask))
+ return entry->latency;
+ }
+
+ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+}
+EXPORT_SYMBOL_GPL(pm_qos_get_boot_cpu_latency_limit);
+
#ifdef CONFIG_CPU_IDLE
/* Definitions related to the CPU latency QoS. */
--
2.51.0
* Re: [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
2026-01-28 3:31 [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us Aaron Tomlin
2026-01-28 3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
@ 2026-02-08 21:46 ` Aaron Tomlin
1 sibling, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-02-08 21:46 UTC (permalink / raw)
To: rafael, dakr, pavel, lenb
Cc: akpm, bp, pmladek, rdunlap, feng.tang, pawan.kumar.gupta, kees,
elver, arnd, fvdl, lirongqing, bhelgaas, neelx, sean, mproche,
chjohnst, nick.lange, linux-kernel, linux-pm, linux-doc
On Tue, Jan 27, 2026 at 10:31:42PM -0500, Aaron Tomlin wrote:
> Hi Rafael, Danilo, Pavel, Len,
>
> Users currently lack a mechanism to define granular, per-CPU PM QoS
> resume latency constraints during the early boot phase.
Hi Rafael, Danilo, Pavel, Len,
I am writing to respectfully enquire about the status of this v2 patch
series, initially submitted on the 27th of January; it has now been
approximately two weeks since the submission. Please let me know if there
are any outstanding concerns, if any further modifications to the series
are required, or if you would like me to rebase.
Kind regards,
--
Aaron Tomlin
* Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
2026-01-28 3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
@ 2026-03-04 2:02 ` Aaron Tomlin
2026-03-04 9:05 ` Zhongqiu Han
1 sibling, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-03-04 2:02 UTC (permalink / raw)
To: rafael, rafael.j.wysocki, dakr, pavel, lenb
Cc: akpm, bp, pmladek, rdunlap, feng.tang, pawan.kumar.gupta, kees,
elver, arnd, fvdl, lirongqing, bhelgaas, neelx, sean, mproche,
chjohnst, nick.lange, linux-kernel, linux-pm, linux-doc
On Tue, Jan 27, 2026 at 10:31:43PM -0500, Aaron Tomlin wrote:
> This patch introduces the pm_qos_resume_latency_us kernel boot
> parameter, which allows users to specify distinct resume latency
> constraints for specific CPU ranges.
>
> Syntax: pm_qos_resume_latency_us=range:value,range:value...
Hi Rafael, Danilo, Pavel, Len,
A gentle ping on this v2 patch. Now that the v7.0 merge window has closed,
I was hoping to see if there is any further feedback on this approach to
introducing the pm_qos_resume_latency_us= boot parameter.
Please let me know if any further adjustments are required, or if you need
me to rebase this against the latest power management tree.
Kind regards,
--
Aaron Tomlin
* Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
2026-01-28 3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
2026-03-04 2:02 ` Aaron Tomlin
@ 2026-03-04 9:05 ` Zhongqiu Han
2026-03-06 2:35 ` Aaron Tomlin
1 sibling, 1 reply; 6+ messages in thread
From: Zhongqiu Han @ 2026-03-04 9:05 UTC (permalink / raw)
To: Aaron Tomlin, rafael, dakr, pavel, lenb
Cc: akpm, bp, pmladek, rdunlap, feng.tang, pawan.kumar.gupta, kees,
elver, arnd, fvdl, lirongqing, bhelgaas, neelx, sean, mproche,
chjohnst, nick.lange, linux-kernel, linux-pm, linux-doc,
zhongqiu.han
On 1/28/2026 11:31 AM, Aaron Tomlin wrote:
> Users currently lack a mechanism to define granular, per-CPU PM QoS
> resume latency constraints during the early boot phase.
>
> While the idle=poll boot parameter exists, it enforces a global
> override, forcing all CPUs in the system to "poll". This global approach
> is not suitable for asymmetric workloads where strict latency guarantees
> are required only on specific critical CPUs, while housekeeping or
> non-critical CPUs should be allowed to enter deeper idle states to save
> energy.
>
> Additionally, the existing sysfs interface
> (/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us) becomes
> available only after userspace initialisation. This is too late to
> prevent deep C-state entry during the early kernel boot phase, which may
> be required for debugging early boot hangs related to C-state
> transitions or for workloads requiring strict latency guarantees
> immediately upon system start.
>
> This patch introduces the pm_qos_resume_latency_us kernel boot
> parameter, which allows users to specify distinct resume latency
> constraints for specific CPU ranges.
Hello Aaron,
Therefore, once a PM QoS constraint is set via boot parameters, it
cannot be relaxed afterward by any means, except by applying a stricter
constraint, right?
>
> Syntax: pm_qos_resume_latency_us=range:value,range:value...
>
> Unlike the sysfs interface which accepts the special string "n/a" to
> remove a constraint, this boot parameter strictly requires integer
> values. The special value "n/a" is not supported; the integer 0 must be
> used to represent a 0 us latency constraint (polling).
>
> For example:
>
> "pm_qos_resume_latency_us=0:0,1-15:20"
>
> Forces CPU 0 to poll on idle; constrains CPUs 1-15 to not enter a sleep
> state that takes longer than 20 us to wake up. All other CPUs will have
> the default (no resume latency) applied.
>
> Implementation Details:
>
> - The parameter string is captured via __setup() and parsed in
> an early_initcall() to ensure suitable memory allocators are
> available.
>
> - Constraints are stored in a read-only linked list.
>
> - The constraints are queried and applied in register_cpu().
> This ensures the latency requirement is active immediately
> upon CPU registration, effectively acting as a "birth"
> constraint before the cpuidle governor takes over.
>
> - The parsing logic enforces a "First Match Wins" policy: if a
> CPU falls into multiple specified ranges, the latency value
> from the first matching entry is used.
May I ask whether it would be more reasonable to apply the minimum constraint?
>
> - The constraints persist across CPU hotplug events.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> .../admin-guide/kernel-parameters.txt | 23 +++
> drivers/base/cpu.c | 5 +-
> include/linux/pm_qos.h | 5 +
> kernel/power/qos.c | 141 ++++++++++++++++++
> 4 files changed, 172 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6a3d6bd0746c..afba39ecfdee 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2238,6 +2238,29 @@ Kernel parameters
> icn= [HW,ISDN]
> Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
>
> + pm_qos_resume_latency_us= [KNL,EARLY]
> + Format: <cpu-list>:<value>[,<cpu-list>:<value>...]
> +
> + Establish per-CPU resume latency constraints. These constraints
> + are applied immediately upon CPU registration and persist
> + across CPU hotplug events.
> +
> + For example:
> + "pm_qos_resume_latency_us=0:0,1-15:20"
> +
> + This restricts CPU 0 to a 0us resume latency (effectively
> + forcing polling) and limits CPUs 1-15 to C-states with a
> + maximum exit latency of 20us. All other CPUs remain
> + unconstrained by this parameter.
> +
> + Unlike the sysfs interface, which accepts the string "n/a" to
> + remove a constraint, this boot parameter strictly requires
> + integer values. To specify a 0us latency constraint (polling),
> + the integer 0 must be used.
> +
> + NOTE: The parsing logic enforces a "First Match Wins" policy.
> + If a CPU is included in multiple specified ranges, the latency
> + value from the first matching entry takes precedence.
>
> idle= [X86,EARLY]
> Format: idle=poll, idle=halt, idle=nomwait
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index c6c57b6f61c6..1dea5bcd76a0 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -416,6 +416,7 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
> int register_cpu(struct cpu *cpu, int num)
> {
> int error;
> + s32 resume_latency;
>
> cpu->node_id = cpu_to_node(num);
> memset(&cpu->dev, 0x00, sizeof(struct device));
> @@ -436,8 +437,8 @@ int register_cpu(struct cpu *cpu, int num)
>
> per_cpu(cpu_sys_devices, num) = &cpu->dev;
> register_cpu_under_node(num, cpu_to_node(num));
> - dev_pm_qos_expose_latency_limit(&cpu->dev,
> - PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> + resume_latency = pm_qos_get_boot_cpu_latency_limit(num);
> + dev_pm_qos_expose_latency_limit(&cpu->dev, resume_latency);
> set_cpu_enabled(num, true);
>
> return 0;
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index 6cea4455f867..556a7dff1419 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -174,6 +174,7 @@ static inline s32 cpu_wakeup_latency_qos_limit(void)
> #ifdef CONFIG_PM
> enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
> enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
> +s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu);
> s32 __dev_pm_qos_resume_latency(struct device *dev);
> s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type);
> int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> @@ -218,6 +219,10 @@ static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
> pm_qos_read_value(&dev->power.qos->resume_latency);
> }
> #else
> +static inline s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
> +{
> + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +}
> static inline enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev,
> s32 mask)
> { return PM_QOS_FLAGS_UNDEFINED; }
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index f7d8064e9adc..e23223e3c7e8 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -34,6 +34,11 @@
> #include <linux/kernel.h>
> #include <linux/debugfs.h>
> #include <linux/seq_file.h>
> +#include <linux/cpumask.h>
> +#include <linux/cpu.h>
> +#include <linux/list.h>
> +
> +#include <asm/setup.h>
Including <asm/setup.h> in generic PM/QoS code does not seem appropriate;
pulling in an arch-specific header creates unnecessary coupling and
potential portability issues.
>
> #include <linux/uaccess.h>
> #include <linux/export.h>
> @@ -46,6 +51,10 @@
> */
> static DEFINE_SPINLOCK(pm_qos_lock);
>
> +static LIST_HEAD(pm_qos_boot_list);
> +
> +static char pm_qos_resume_latency_cmdline[COMMAND_LINE_SIZE] __initdata;
> +
> /**
> * pm_qos_read_value - Return the current effective constraint value.
> * @c: List of PM QoS constraint requests.
> @@ -209,6 +218,138 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
> return prev_value != curr_value;
> }
>
> +struct pm_qos_boot_entry {
> + struct list_head node;
How about using an array instead of a list to optimise lookup and cache
hit rates? An array may use more memory, though.
> + struct cpumask mask;
> + s32 latency;
> +};
> +
> +static int __init pm_qos_resume_latency_us_setup(char *str)
> +{
> + strscpy(pm_qos_resume_latency_cmdline, str,
> + sizeof(pm_qos_resume_latency_cmdline));
> + return 1;
> +}
> +__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
> +
> +/* init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.
Nit: style issue, /**
> + *
> + * Parses the kernel command line option "pm_qos_resume_resume_latency_us=" to establish
pm_qos_resume_resume_latency_us typo?
> + * per-CPU resume latency constraints. These constraints are applied
> + * immediately when a CPU is registered.
> + *
> + * Syntax: pm_qos_resume_latency_us=<cpu-list>:<value>[,<cpu-list>:<value>...]
> + * Example: pm_qos_resume_latency_us=0-3:0,4-7:20
> + *
> + * The parsing logic enforces a "First Match Wins" policy. If a CPU is
> + * covered by multiple entries in the list, only the first valid entry
> + * applies. Any subsequent overlapping ranges for that CPU are ignored.
> + *
> + * Return: 0 on success, or a negative error code on failure.
> + */
> +static int __init init_pm_qos_resume_latency_us_setup(void)
> +{
> + char *token, *cmd = pm_qos_resume_latency_cmdline;
> + struct pm_qos_boot_entry *entry, *tentry;
> + cpumask_var_t covered;
> +
> + if (!zalloc_cpumask_var(&covered, GFP_KERNEL)) {
> + pr_warn("pm_qos: Failed to allocate memory for parsing boot parameter\n");
> + return -ENOMEM;
> + }
> +
> + while ((token = strsep(&cmd, ",")) != NULL) {
> + char *str_range, *str_val;
> +
> + str_range = strsep(&token, ":");
> + str_val = token;
> +
> + if (!str_val) {
> + pr_warn("pm_qos: Missing value range %s\n",
> + str_range);
> + continue;
> + }
> +
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry) {
> + pr_warn("pm_qos: Failed to allocate memory for boot entry\n");
> + goto cleanup;
> + }
> +
> + if (cpulist_parse(str_range, &entry->mask)) {
> + pr_warn("pm_qos: Failed to parse cpulist range %s\n",
> + str_range);
> + kfree(entry);
> + continue;
> + }
> +
> + cpumask_andnot(&entry->mask, &entry->mask, covered);
> + if (cpumask_empty(&entry->mask)) {
> + pr_warn("pm_qos: Entry %s already covered, ignoring\n",
> + str_range);
> + kfree(entry);
> + continue;
> + }
> + cpumask_or(covered, covered, &entry->mask);
> +
> + if (kstrtos32(str_val, 0, &entry->latency)) {
> + pr_warn("pm_qos: Invalid latency requirement value %s\n",
> + str_val);
> + kfree(entry);
> + continue;
> + }
> +
> + if (entry->latency < 0) {
> + pr_warn("pm_qos: Latency requirement cannot be negative: %d\n",
> + entry->latency);
Nit: It would be cleaner to use pr_fmt() for the log prefix rather than
embedding "pm_qos:" in every message such as:
#define pr_fmt(fmt) "pm_qos_boot: " fmt
> + kfree(entry);
> + continue;
> + }
> +
> + list_add_tail(&entry->node, &pm_qos_boot_list);
I see there is no protection for pm_qos_boot_list. My understanding is that
during early boot, pm_qos_boot_list is fully initialized before any
register_cpu() calls. For CPU hotplug, the list is only read and never
modified afterward, so there shouldn't be a race.
> + }
> +
> + free_cpumask_var(covered);
> + return 0;
> +
> +cleanup:
> + list_for_each_entry_safe(entry, tentry, &pm_qos_boot_list, node) {
> + list_del(&entry->node);
> + kfree(entry);
> + }
> +
> + free_cpumask_var(covered);
> + return 0;
The function comment says it returns a negative error code on failure,
but in the cleanup path the function unconditionally returns 0. Should it
set "ret = -ENOMEM;" before the goto cleanup and then return ret?
> +}
> +early_initcall(init_pm_qos_resume_latency_us_setup);
> +
> +/**
> + * pm_qos_get_boot_cpu_latency_limit - Get boot-time latency limit for a CPU.
> + * @cpu: Logical CPU number to check.
> + *
> + * Checks the read-only boot-time constraints list to see if a specific
> + * PM QoS latency override was requested for this CPU via the kernel
> + * command line.
> + *
> + * Return: The latency limit in microseconds if a constraint exists,
> + * or PM_QOS_RESUME_LATENCY_NO_CONSTRAINT if no boot override applies.
> + */
> +s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
> +{
> + struct pm_qos_boot_entry *entry;
> +
> + if (list_empty(&pm_qos_boot_list))
> + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +
> + list_for_each_entry(entry, &pm_qos_boot_list, node) {
> + if (cpumask_test_cpu(cpu, &entry->mask))
> + return entry->latency;
> + }
> +
> + return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +}
> +EXPORT_SYMBOL_GPL(pm_qos_get_boot_cpu_latency_limit);
> +
> #ifdef CONFIG_CPU_IDLE
> /* Definitions related to the CPU latency QoS. */
>
--
Thx and BRs,
Zhongqiu Han
* Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
2026-03-04 9:05 ` Zhongqiu Han
@ 2026-03-06 2:35 ` Aaron Tomlin
0 siblings, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-03-06 2:35 UTC (permalink / raw)
To: Zhongqiu Han
Cc: rafael, dakr, pavel, lenb, akpm, bp, pmladek, rdunlap, feng.tang,
pawan.kumar.gupta, kees, elver, arnd, fvdl, lirongqing, bhelgaas,
neelx, sean, mproche, chjohnst, nick.lange, linux-kernel,
linux-pm, linux-doc
On Wed, Mar 04, 2026 at 05:05:13PM +0800, Zhongqiu Han wrote:
> Hello Aaron,
>
> Therefore, once a PM QoS constraint is set via boot parameters, it
> cannot be relaxed afterward by any means, except by applying a stricter
> constraint, right?
Hi Zhongqiu,
Thank you for your feedback.
This is a perfectly logical assumption, but it is actually incorrect.
The boot parameter serves only to establish the initial latency limit when
dev_pm_qos_expose_latency_limit() is invoked during CPU registration.
Once the system has booted and userspace is fully initialised, an
administrator retains full control via the standard sysfs interface
(i.e., /sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us).
Writing a numerical value, or the special string "n/a", to this sysfs node
will successfully relax or completely remove the constraint. The boot
parameter acts as a crucial "birth constraint" to protect early boot
phases, rather than a permanent, immutable ceiling.
> > - The parsing logic enforces a "First Match Wins" policy: if a
> > CPU falls into multiple specified ranges, the latency value
> > from the first matching entry is used.
>
> May I ask whether it would be more reasonable to apply the minimum constraint?
Enforcing a "First Match Wins" parsing policy keeps the early-boot logic
deliberately simple, predictable, and deterministic. It avoids the
computational overhead of iteratively resolving and comparing overlapping
cpumask conflicts during a sensitive phase of the kernel boot process.
Given that this parameter is intended for deliberate, static configuration,
I favoured parsing simplicity. That being said, if there is a strong
consensus that evaluating for the minimum constraint provides a far more
intuitive user experience, I am certainly open to revising this logic in
v3.
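To make the "First Match Wins" semantics concrete, here is a small userspace sketch of the policy (not code from the patch; `struct latency_range` and `first_match_latency` are illustrative names). Entries are kept in parse order and the first range containing the CPU supplies the latency value, so with "0:0,0-15:20" CPU 0 resolves to 0, not 20:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the "First Match Wins" policy:
 * ranges are scanned in the order they were parsed, and the first
 * one containing the CPU supplies the latency value.
 */
struct latency_range {
	unsigned int first_cpu;
	unsigned int last_cpu;
	int latency_us;
};

/* Return the latency for @cpu from the first matching range, or -1. */
static int first_match_latency(const struct latency_range *ranges,
			       size_t nr, unsigned int cpu)
{
	for (size_t i = 0; i < nr; i++) {
		if (cpu >= ranges[i].first_cpu && cpu <= ranges[i].last_cpu)
			return ranges[i].latency_us;
	}
	return -1; /* no constraint configured for this CPU */
}
```

A minimum-constraint policy, by contrast, would have to scan all entries for every CPU rather than stopping at the first hit.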
> > +#include <linux/list.h>
> > +
> > +#include <asm/setup.h>
>
> Including <asm/setup.h> in generic PM/QoS code seems not appropriate,
> and pulling in an arch specific header creates unnecessary coupling and
> potential portability issues.
I completely agree with your assessment. Introducing an
architecture-specific header into generic power management code is poor
practice. I will refactor the parameter handling in the next iteration to
eliminate the reliance on COMMAND_LINE_SIZE entirely, most likely by
retaining a pointer to the original command-line string or by dynamically
duplicating it, thereby cleanly removing the need for <asm/setup.h>.
> > +struct pm_qos_boot_entry {
> > + struct list_head node;
>
> How about to use array instead of list to optimize lookup and cache
> hitting? but array may use more memory.
While an array would theoretically improve spatial locality and cache hit
rates, we anticipate the number of boot-time entries provided by users to
be exceptionally small (typically just one or two discrete ranges).
For such a diminutive dataset, the overhead of traversing a linked list
during CPU registration is virtually immeasurable. Utilising a list allows
the code to gracefully accommodate any number of user-defined ranges
without imposing arbitrary hard limits or engineering complex dynamic array
reallocations. Consequently, I believe the linked list remains the most
pragmatic and robust choice for this specific scenario.
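The append-and-traverse usage pattern can be sketched in userspace as follows (a singly linked list stands in for the kernel's struct list_head, and `boot_entry`/`add_entry` are illustrative names): entries are appended during parsing with no arbitrary bound, then only read back later:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Minimal userspace analog of the boot-entry list: appended at
 * parse time, traversed read-only afterwards, with no hard limit
 * on the number of user-defined ranges.
 */
struct boot_entry {
	struct boot_entry *next;
	int latency_us;
};

static struct boot_entry *boot_list;

/* Append a new entry; mirrors list_add_tail() on the boot list. */
static int add_entry(int latency_us)
{
	struct boot_entry *e = malloc(sizeof(*e));
	struct boot_entry **tail = &boot_list;

	if (!e)
		return -1;
	e->latency_us = latency_us;
	e->next = NULL;
	while (*tail)
		tail = &(*tail)->next;
	*tail = e;
	return 0;
}

/* Read-only traversal, as done during CPU registration. */
static int count_entries(void)
{
	int n = 0;

	for (struct boot_entry *e = boot_list; e; e = e->next)
		n++;
	return n;
}
```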
> > +__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
> > +
> > +/* init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.
>
> Nit: style issue, /**
Acknowledged.
> > + if (entry->latency < 0) {
> > + pr_warn("pm_qos: Latency requirement cannot be negative: %d\n",
> > + entry->latency);
>
> Nit: It would be cleaner to use pr_fmt() for the log prefix rather than
> embedding "pm_qos:" in every message such as:
>
> #define pr_fmt(fmt) "pm_qos_boot: " fmt
Thank you for pointing this out. That is an elegant and standard approach.
I shall incorporate the pr_fmt() macro definition to ensure the logging
remains tidy and consistent.
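For illustration, here is a userspace approximation of the pr_fmt() convention (the snprintf-based pr_warn and `last_msg` buffer are test scaffolding, not kernel code): defining pr_fmt once prefixes every message automatically, so "pm_qos_boot: " need not be repeated at each call site:

```c
#include <stdio.h>

/* Buffer capturing the last message, in lieu of the kernel log. */
static char last_msg[128];

/*
 * Userspace sketch of the pr_fmt() pattern: the prefix is defined
 * once and applied by the printing macro at every call site.
 */
#define pr_fmt(fmt) "pm_qos_boot: " fmt
#define pr_warn(fmt, ...) \
	snprintf(last_msg, sizeof(last_msg), pr_fmt(fmt), ##__VA_ARGS__)
```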
> > + list_add_tail(&entry->node, &pm_qos_boot_list);
>
> I saw there is no protect for pm_qos_boot_list, my understanding is that
> during early boot, pm_qos_boot_list is fully initialized before any
> other register_cpu() calls. For CPU hotplug, there only reads the list
> and it’s never modified afterward, so there shouldn’t be a race.
Your understanding is perfectly correct. The pm_qos_boot_list is populated
entirely within an early_initcall(). At this specific stage of the boot
sequence, execution is strictly single-threaded on the boot CPU, occurring
well before SMP initialisation or any secondary CPU bring-up (and
consequently, before register_cpu() is invoked for non-boot CPUs).
Once this early initialisation phase concludes, the list becomes strictly
read-only for the remainder of the system's uptime. Therefore, lockless
traversal during CPU registration and subsequent CPU hotplug events is
entirely safe. Introducing a lock or RCU protection here would merely add
unnecessary overhead to a static data structure.
> > + free_cpumask_var(covered);
> > + return 0;
>
> The function comment says "return a negative error code on failure",
> but in the cleanup context the function unconditionally returns 0.
> should it be:
>
> set "ret = -ENOMEM;" before goto cleanup and then just return ret?
Thank you for catching this oversight; you are absolutely right. The
cleanup path erroneously returns 0 despite the function's documentation
explicitly stating that a negative error code should be returned upon
failure.
Your suggested approach is exactly how this should be handled. This logic
will be rectified in the v3 patch.
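The agreed fix can be sketched as follows (`parse_ranges` and the `alloc_ok` flag are stand-ins for the real function and its alloc_cpumask_var() call): set ret to -ENOMEM before jumping to the cleanup label, and return ret rather than an unconditional 0:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch of the corrected error path: the cleanup label frees
 * resources and returns ret, which carries -ENOMEM when the
 * allocation (modelled by @alloc_ok) fails.
 */
static int parse_ranges(bool alloc_ok)
{
	int ret = 0;

	if (!alloc_ok) {
		ret = -ENOMEM;
		goto cleanup;
	}
	/* ... range parsing would happen here ... */
cleanup:
	/* free_cpumask_var(covered) equivalent would go here */
	return ret;
}
```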
Kind regards,
--
Aaron Tomlin
Thread overview: 6+ messages
2026-01-28 3:31 [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us Aaron Tomlin
2026-01-28 3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
2026-03-04 2:02 ` Aaron Tomlin
2026-03-04 9:05 ` Zhongqiu Han
2026-03-06 2:35 ` Aaron Tomlin
2026-02-08 21:46 ` [v2 PATCH 0/1] " Aaron Tomlin