* [PATCH v5] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
@ 2026-04-26 16:01 Aaron Tomlin
From: Aaron Tomlin @ 2026-04-26 16:01 UTC
To: rafael, dakr, pavel, lenb
Cc: zhongqiu.han, akpm, bp, pmladek, rdunlap, feng.tang,
pawan.kumar.gupta, kees, elver, arnd, fvdl, lirongqing, bhelgaas,
neelx, sean, mproche, chjohnst, nick.lange, linux-kernel,
linux-pm, linux-doc
Hi Rafael, Danilo, Pavel, Len,
Users currently lack a mechanism to define granular, per-CPU PM QoS
resume latency constraints during the early boot phase.
While the idle=poll boot parameter exists, it enforces a global
override, forcing all CPUs in the system to "poll". This global approach
is not suitable for asymmetric workloads where strict latency guarantees
are required only on specific critical CPUs, while housekeeping or
non-critical CPUs should be allowed to enter deeper idle states to save
energy.
Additionally, the existing sysfs interface
(/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us) becomes
available only after userspace initialisation. This is too late to
prevent deep C-state entry during the early kernel boot phase, which may
be required for debugging early boot hangs related to C-state
transitions or for workloads requiring strict latency guarantees
immediately upon system start.
This patch introduces the pm_qos_resume_latency_us kernel boot
parameter, which allows users to specify distinct resume latency
constraints for specific CPU ranges.
Syntax: pm_qos_resume_latency_us=range:value;range:value...
This boot parameter mirrors the sysfs interface behaviour: the special
string "n/a" imposes a 0us latency constraint (polling), while the
integer 0 removes the constraint entirely.
For example:
"pm_qos_resume_latency_us=0:n/a;1-15:20"
Forces CPU 0 to poll on idle and restricts CPUs 1-15 to idle states whose
exit latency does not exceed 20 us. All other CPUs keep the default (no
resume latency constraint).
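As an illustration of the value semantics described above, here is a
minimal userspace sketch. It is not the code from this patch:
parse_latency_value() and NO_CONSTRAINT are hypothetical names, and
NO_CONSTRAINT is assumed to equal the kernel's
PM_QOS_RESUME_LATENCY_NO_CONSTRAINT (S32_MAX). strtol() stands in for
kstrtos32() and is slightly more permissive (it accepts leading
whitespace).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT (S32_MAX). */
#define NO_CONSTRAINT INT32_MAX

/*
 * Hypothetical helper: map one <value> string to a latency limit.
 * Returns 0 and stores the limit in *out, or -EINVAL on bad input.
 */
static int parse_latency_value(const char *s, int32_t *out)
{
	char *end;
	long v;

	/* "n/a" requests a 0us limit, i.e. poll on idle. */
	if (strcmp(s, "n/a") == 0) {
		*out = 0;
		return 0;
	}

	errno = 0;
	v = strtol(s, &end, 0);
	/* Reject empty input, trailing junk, overflow and negatives. */
	if (errno || end == s || *end != '\0' || v < 0 || v > INT32_MAX)
		return -EINVAL;

	/* The integer 0 removes the constraint instead of imposing 0us. */
	*out = (v == 0) ? NO_CONSTRAINT : (int32_t)v;
	return 0;
}
```

With this mapping, "n/a" yields 0, "0" yields NO_CONSTRAINT, "20" yields
20, and "-5" or "abc" are rejected with -EINVAL.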
Implementation Details:
- The parameter string is captured via __setup() and parsed in
an early_initcall() to ensure suitable memory allocators are
available.
- Constraints are stored in a linked list that is built once at boot
and only read afterwards.
- The constraints are queried and applied in register_cpu().
This ensures the latency requirement is active immediately
upon CPU registration, effectively acting as a "birth"
constraint before the cpuidle governor takes over.
- The parsing logic enforces a "First Match Wins" policy: if a
CPU falls into multiple specified ranges, the latency value
from the first matching entry is used.
- The constraints persist across CPU hotplug events.
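The "First Match Wins" policy from the list above can be sketched in
userspace as follows. This is a simplified model under stated
assumptions, not the patch code: a 64-bit mask replaces struct cpumask,
a fixed array replaces the linked list, and struct boot_table,
boot_table_add(), boot_table_lookup(), MAX_ENTRIES and NO_CONSTRAINT are
all hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT (S32_MAX). */
#define NO_CONSTRAINT INT32_MAX
#define MAX_ENTRIES 8

struct boot_entry {
	uint64_t mask;		/* CPUs this entry applies to (bit n = CPU n) */
	int32_t latency;	/* limit in microseconds */
};

struct boot_table {
	struct boot_entry entries[MAX_ENTRIES];
	size_t count;
	uint64_t covered;	/* CPUs claimed by earlier entries */
};

/* Add an entry, dropping CPUs that an earlier entry already claimed. */
static int boot_table_add(struct boot_table *t, uint64_t mask, int32_t latency)
{
	mask &= ~t->covered;
	if (mask == 0 || t->count == MAX_ENTRIES)
		return -1;	/* fully overlapped, or table full */

	t->entries[t->count].mask = mask;
	t->entries[t->count].latency = latency;
	t->count++;
	/* Update coverage only after the entry has been accepted. */
	t->covered |= mask;
	return 0;
}

/* Return the limit for a CPU, or NO_CONSTRAINT if no entry matches. */
static int32_t boot_table_lookup(const struct boot_table *t, unsigned int cpu)
{
	size_t i;

	if (cpu >= 64)
		return NO_CONSTRAINT;

	for (i = 0; i < t->count; i++) {
		if (t->entries[i].mask & (UINT64_C(1) << cpu))
			return t->entries[i].latency;
	}
	return NO_CONSTRAINT;
}
```

Adding mask 0x1 with latency 0 and then mask 0xFFFF with latency 20
leaves CPU 0 at 0 and CPUs 1-15 at 20. A later entry for CPUs 0-1 is
rejected because both CPUs are already covered.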
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
Changes since v4 [1]:
- Modified the parsing logic so the boot parameter mirrors the existing
sysfs interface. Passing the integer 0 now removes the constraint
(i.e., maps to PM_QOS_RESUME_LATENCY_NO_CONSTRAINT), and the special
string "n/a" imposes a 0us latency constraint (polling on idle).
- Changed the outer tuple delimiter from a comma (",") to a semicolon
(";"). This prevents strsep() from breaking standard, non-contiguous
CPU lists (e.g., "0,2:n/a;4-7:20").
- Moved the cpumask_or() coverage update to the end of the parsing loop.
- Updated Documentation/admin-guide/kernel-parameters.txt to reflect the
new semicolon delimiter and the updated semantics.
Changes since v3 [2]:
- Moved pm_qos_get_boot_cpu_latency_limit() declaration out of the
CONFIG_PM #ifdef block, as qos.c is compiled regardless
Changes since v2 [3]:
- Add pr_fmt() to standardise log prefixes (Zhongqiu Han)
- Drop <asm/setup.h> by duplicating the command line with kstrdup()
(Zhongqiu Han)
- Fix init_pm_qos_resume_latency_us_setup() error path to return -ENOMEM
(Zhongqiu Han)
Changes since v1 [4]:
- Removed boot_option_idle_override == IDLE_POLL check
- Decoupled implementation from CONFIG_CPU_IDLE
- Added kernel-parameters.txt documentation
- Renamed internal setup functions for consistency
[1]: https://lore.kernel.org/lkml/20260308190421.46657-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20260307200736.4192234-1-atomlin@atomlin.com/
[3]: https://lore.kernel.org/lkml/20260128033143.3456074-2-atomlin@atomlin.com/
[4]: https://lore.kernel.org/lkml/20260123010024.3301276-1-atomlin@atomlin.com/
.../admin-guide/kernel-parameters.txt | 22 +++
drivers/base/cpu.c | 5 +-
include/linux/pm_qos.h | 1 +
kernel/power/qos.c | 153 ++++++++++++++++++
4 files changed, 179 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6a3d6bd0746c..1beb4f82e038 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2238,6 +2238,28 @@ Kernel parameters
icn= [HW,ISDN]
Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
+ pm_qos_resume_latency_us= [KNL,EARLY]
+ Format: <cpu-list>:<value>[;<cpu-list>:<value>...]
+
+ Establish per-CPU resume latency constraints. These constraints
+ are applied immediately upon CPU registration and persist
+ across CPU hotplug events.
+
+ For example:
+ "pm_qos_resume_latency_us=0:n/a;1-15:20"
+
+ This restricts CPU 0 to a 0us resume latency (effectively
+ forcing polling) and limits CPUs 1-15 to C-states with a
+ maximum exit latency of 20us. All other CPUs remain
+ unconstrained by this parameter.
+
+ This boot parameter mirrors the sysfs interface behaviour.
+ The special string "n/a" imposes a 0us latency constraint
+ (polling), while the integer 0 removes the constraint.
+
+ NOTE: The parsing logic enforces a "First Match Wins" policy.
+ If a CPU is included in multiple specified ranges, the latency
+ value from the first matching entry takes precedence.
idle= [X86,EARLY]
Format: idle=poll, idle=halt, idle=nomwait
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index c6c57b6f61c6..1dea5bcd76a0 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -416,6 +416,7 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
int register_cpu(struct cpu *cpu, int num)
{
int error;
+ s32 resume_latency;
cpu->node_id = cpu_to_node(num);
memset(&cpu->dev, 0x00, sizeof(struct device));
@@ -436,8 +437,8 @@ int register_cpu(struct cpu *cpu, int num)
per_cpu(cpu_sys_devices, num) = &cpu->dev;
register_cpu_under_node(num, cpu_to_node(num));
- dev_pm_qos_expose_latency_limit(&cpu->dev,
- PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+ resume_latency = pm_qos_get_boot_cpu_latency_limit(num);
+ dev_pm_qos_expose_latency_limit(&cpu->dev, resume_latency);
set_cpu_enabled(num, true);
return 0;
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 6cea4455f867..65ce276282e8 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -142,6 +142,7 @@ int pm_qos_update_target(struct pm_qos_constraints *c, struct plist_node *node,
bool pm_qos_update_flags(struct pm_qos_flags *pqf,
struct pm_qos_flags_request *req,
enum pm_qos_req_action action, s32 val);
+s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu);
#ifdef CONFIG_CPU_IDLE
s32 cpu_latency_qos_limit(void);
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index f7d8064e9adc..1c854e02ada0 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -18,6 +18,8 @@
* global CPU latency QoS requests and frequency QoS requests are provided.
*/
+#define pr_fmt(fmt) "pm_qos: " fmt
+
/*#define DEBUG*/
#include <linux/pm_qos.h>
@@ -34,6 +36,9 @@
#include <linux/kernel.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
+#include <linux/cpumask.h>
+#include <linux/cpu.h>
+#include <linux/list.h>
#include <linux/uaccess.h>
#include <linux/export.h>
@@ -209,6 +214,154 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
return prev_value != curr_value;
}
+static LIST_HEAD(pm_qos_boot_list);
+static char *pm_qos_resume_latency_cmdline __initdata;
+
+struct pm_qos_boot_entry {
+ struct list_head node;
+ struct cpumask mask;
+ s32 latency;
+};
+
+static int __init pm_qos_resume_latency_us_setup(char *str)
+{
+ pm_qos_resume_latency_cmdline = str;
+ return 1;
+}
+__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
+
+/**
+ * init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.
+ *
+ * Parses the kernel command line option "pm_qos_resume_latency_us=" to establish
+ * per-CPU resume latency constraints. These constraints are applied
+ * immediately when a CPU is registered.
+ *
+ * Syntax: pm_qos_resume_latency_us=<cpu-list>:<value>[;<cpu-list>:<value>...]
+ * Example: pm_qos_resume_latency_us=0-3:n/a;4-7:20
+ *
+ * The parsing logic enforces a "First Match Wins" policy. If a CPU is
+ * covered by multiple entries in the list, only the first valid entry
+ * applies. Any subsequent overlapping ranges for that CPU are ignored.
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int __init init_pm_qos_resume_latency_us_setup(void)
+{
+ char *token, *cmd, *cmd_copy;
+ struct pm_qos_boot_entry *entry, *tentry;
+ cpumask_var_t covered;
+ int ret = 0;
+
+ if (!pm_qos_resume_latency_cmdline)
+ return 0;
+
+ cmd_copy = kstrdup(pm_qos_resume_latency_cmdline, GFP_KERNEL);
+ if (!cmd_copy)
+ return -ENOMEM;
+
+ if (!zalloc_cpumask_var(&covered, GFP_KERNEL)) {
+ pr_warn("Failed to allocate memory for parsing boot parameter\n");
+ ret = -ENOMEM;
+ goto free_cmd_copy;
+ }
+
+ cmd = cmd_copy;
+ while ((token = strsep(&cmd, ";")) != NULL) {
+ char *str_range, *str_val;
+
+ str_range = strsep(&token, ":");
+ str_val = token;
+
+ if (!str_val) {
+ pr_warn("Missing value for range %s\n", str_range);
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ pr_warn("Failed to allocate memory for boot entry\n");
+ ret = -ENOMEM;
+ goto cleanup;
+ }
+
+ if (cpulist_parse(str_range, &entry->mask)) {
+ pr_warn("Failed to parse cpulist range %s\n", str_range);
+ kfree(entry);
+ continue;
+ }
+
+ cpumask_andnot(&entry->mask, &entry->mask, covered);
+ if (cpumask_empty(&entry->mask)) {
+ pr_warn("Entry %s already covered, ignoring\n", str_range);
+ kfree(entry);
+ continue;
+ }
+
+ if (!strcmp(str_val, "n/a")) {
+ entry->latency = 0;
+ } else if (kstrtos32(str_val, 0, &entry->latency)) {
+ pr_warn("Invalid latency requirement value %s\n", str_val);
+ kfree(entry);
+ continue;
+ } else if (entry->latency == 0) {
+ entry->latency = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+ }
+
+ if (entry->latency < 0) {
+ pr_warn("Latency requirement cannot be negative: %d\n", entry->latency);
+ kfree(entry);
+ continue;
+ }
+
+ cpumask_or(covered, covered, &entry->mask);
+
+ list_add_tail(&entry->node, &pm_qos_boot_list);
+ }
+
+ free_cpumask_var(covered);
+ kfree(cmd_copy);
+ return 0;
+
+cleanup:
+ list_for_each_entry_safe(entry, tentry, &pm_qos_boot_list, node) {
+ list_del(&entry->node);
+ kfree(entry);
+ }
+ free_cpumask_var(covered);
+free_cmd_copy:
+ kfree(cmd_copy);
+ return ret;
+}
+early_initcall(init_pm_qos_resume_latency_us_setup);
+
+/**
+ * pm_qos_get_boot_cpu_latency_limit - Get boot-time latency limit for a CPU.
+ * @cpu: Logical CPU number to check.
+ *
+ * Checks the read-only boot-time constraints list to see if a specific
+ * PM QoS latency override was requested for this CPU via the kernel
+ * command line.
+ *
+ * Return: The latency limit in microseconds if a constraint exists,
+ * or PM_QOS_RESUME_LATENCY_NO_CONSTRAINT if no boot override applies.
+ */
+s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
+{
+ struct pm_qos_boot_entry *entry;
+
+ if (list_empty(&pm_qos_boot_list))
+ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+
+ list_for_each_entry(entry, &pm_qos_boot_list, node) {
+ if (cpumask_test_cpu(cpu, &entry->mask))
+ return entry->latency;
+ }
+
+ return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+}
+EXPORT_SYMBOL_GPL(pm_qos_get_boot_cpu_latency_limit);
+
#ifdef CONFIG_CPU_IDLE
/* Definitions related to the CPU latency QoS. */
--
2.51.0