Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us


From: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
To: Aaron Tomlin <atomlin@atomlin.com>,
	rafael@kernel.org, dakr@kernel.org, pavel@kernel.org,
	lenb@kernel.org
Cc: akpm@linux-foundation.org, bp@alien8.de, pmladek@suse.com,
	rdunlap@infradead.org, feng.tang@linux.alibaba.com,
	pawan.kumar.gupta@linux.intel.com, kees@kernel.org,
	elver@google.com, arnd@arndb.de, fvdl@google.com,
	lirongqing@baidu.com, bhelgaas@google.com, neelx@suse.com,
	sean@ashe.io, mproche@gmail.com, chjohnst@gmail.com,
	nick.lange@gmail.com, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org, linux-doc@vger.kernel.org,
	zhongqiu.han@oss.qualcomm.com
Subject: Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us
Date: Wed, 4 Mar 2026 17:05:13 +0800
Message-ID: <d19eed6d-48ca-4df2-9739-0455f47f1485@oss.qualcomm.com>
In-Reply-To: <20260128033143.3456074-2-atomlin@atomlin.com>

On 1/28/2026 11:31 AM, Aaron Tomlin wrote:
> Users currently lack a mechanism to define granular, per-CPU PM QoS
> resume latency constraints during the early boot phase.
> 
> While the idle=poll boot parameter exists, it enforces a global
> override, forcing all CPUs in the system to "poll". This global approach
> is not suitable for asymmetric workloads where strict latency guarantees
> are required only on specific critical CPUs, while housekeeping or
> non-critical CPUs should be allowed to enter deeper idle states to save
> energy.
> 
> Additionally, the existing sysfs interface
> (/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us) becomes
> available only after userspace initialisation. This is too late to
> prevent deep C-state entry during the early kernel boot phase, which may
> be required for debugging early boot hangs related to C-state
> transitions or for workloads requiring strict latency guarantees
> immediately upon system start.
> 
> This patch introduces the pm_qos_resume_latency_us kernel boot
> parameter, which allows users to specify distinct resume latency
> constraints for specific CPU ranges.

Hello Aaron,

If I understand correctly, once a PM QoS constraint is set via the boot
parameter, it cannot be relaxed afterwards by any means; it can only be
overridden by a stricter constraint. Is that right?
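For context, a minimal user-space model of PM_QOS_MIN aggregation, the constraint type the device PM QoS code uses for resume latency: the effective value is the smallest active request, or the default when no request is active. The function name and MODEL_NO_CONSTRAINT are hypothetical, and the kernel keeps requests in a priority list (plist) rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT (S32_MAX). */
#define MODEL_NO_CONSTRAINT INT32_MAX

/*
 * Effective value of a PM_QOS_MIN-style constraint: the smallest active
 * request wins; with no requests, the default applies.
 */
static int32_t qos_min_aggregate(const int32_t *reqs, size_t n, int32_t def)
{
	int32_t v;
	size_t i;

	assert(n == 0 || reqs != NULL);
	if (n == 0)
		return def;

	v = reqs[0];
	for (i = 1; i < n; i++)
		if (reqs[i] < v)
			v = reqs[i];
	return v;
}
```

Under this model, if the boot value and a later setting were two separate requests, only a smaller value would change the result; whether a sysfs write adds a new request or updates the existing one is what the question above hinges on.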

> 
> 	Syntax: pm_qos_resume_latency_us=range:value,range:value...
> 
> Unlike the sysfs interface which accepts the special string "n/a" to
> remove a constraint, this boot parameter strictly requires integer
> values. The special value "n/a" is not supported; the integer 0 must be
> used to represent a 0 us latency constraint (polling).
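To make the accepted value format concrete, a user-space sketch of the validation described above: a non-negative integer is accepted, while "n/a", negative numbers and trailing garbage are rejected. It only approximates kstrtos32() (called with base 0 in the patch, so hex and octal prefixes are also accepted) using strtol(); unlike kstrtos32(), strtol() also skips leading whitespace. The helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a resume latency in microseconds.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 */
static int parse_latency_us(const char *s, int32_t *out)
{
	char *end;
	long v;

	assert(s && out);
	errno = 0;
	v = strtol(s, &end, 0);		/* base 0, like kstrtos32(str, 0, ...) */
	if (end == s || *end != '\0' || errno == ERANGE)
		return -EINVAL;		/* empty, "n/a", trailing junk, overflow */
	if (v < 0 || v > INT32_MAX)
		return -EINVAL;		/* negative or outside the s32 range */
	*out = (int32_t)v;
	return 0;
}
```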
> 
> For example:
> 
> 	"pm_qos_resume_latency_us=0:0,1-15:20"
> 
> Forces CPU 0 to poll on idle; constrains CPUs 1-15 to not enter a sleep
> state that takes longer than 20 us to wake up. All other CPUs will have
> the default (no resume latency) applied.
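The example above consists of two entries, "0:0" and "1-15:20". A user-space sketch of how one entry splits into a CPU range and a value string, assuming each range is either a single CPU or a simple "lo-hi" span (the kernel's cpulist_parse() accepts a richer syntax). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one "<range>:<value>" entry. */
struct boot_entry {
	unsigned int lo, hi;	/* inclusive CPU range */
	const char *value;	/* latency string, validated separately */
};

/* Parse a decimal CPU number; requires at least one digit. */
static int parse_cpu(const char *s, char **end, unsigned int *cpu)
{
	if (!isdigit((unsigned char)*s))
		return -1;
	*cpu = (unsigned int)strtoul(s, end, 10);
	return 0;
}

/* Split e.g. "1-15:20" in place; returns 0 on success, -1 if malformed. */
static int split_entry(char *tok, struct boot_entry *e)
{
	char *colon, *end;

	assert(tok && e);
	colon = strchr(tok, ':');
	if (!colon)
		return -1;		/* missing ":<value>" */
	*colon = '\0';
	e->value = colon + 1;

	if (parse_cpu(tok, &end, &e->lo))
		return -1;
	e->hi = e->lo;			/* single CPU unless "-hi" follows */
	if (*end == '-' && parse_cpu(end + 1, &end, &e->hi))
		return -1;
	if (*end != '\0' || e->hi < e->lo)
		return -1;		/* trailing junk or reversed range */
	return 0;
}
```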
> 
> Implementation Details:
> 
> 	- The parameter string is captured via __setup() and parsed in
> 	  an early_initcall() to ensure suitable memory allocators are
> 	  available.
> 
> 	- Constraints are stored in a read-only linked list.
> 
> 	- The constraints are queried and applied in register_cpu().
> 	  This ensures the latency requirement is active immediately
> 	  upon CPU registration, effectively acting as a "birth"
> 	  constraint before the cpuidle governor takes over.
> 
> 	- The parsing logic enforces a "First Match Wins" policy: if a
> 	  CPU falls into multiple specified ranges, the latency value
> 	  from the first matching entry is used.

Would it be more reasonable to apply the minimum (i.e. strictest)
constraint instead?
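To illustrate the difference, a user-space sketch comparing the "First Match Wins" lookup with a minimum-value policy when ranges overlap. The entry layout, the function names and MODEL_NO_CONSTRAINT (standing in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT, i.e. S32_MAX) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_CONSTRAINT INT32_MAX	/* stand-in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT */

struct range_entry {
	unsigned int lo, hi;	/* inclusive CPU range */
	int32_t latency_us;
};

/* Policy in the patch: the first entry covering @cpu wins. */
static int32_t lookup_first_match(const struct range_entry *e, size_t n,
				  unsigned int cpu)
{
	size_t i;

	assert(n == 0 || e);
	for (i = 0; i < n; i++)
		if (cpu >= e[i].lo && cpu <= e[i].hi)
			return e[i].latency_us;
	return MODEL_NO_CONSTRAINT;
}

/* Alternative policy: the strictest (smallest) covering value wins. */
static int32_t lookup_min(const struct range_entry *e, size_t n,
			  unsigned int cpu)
{
	int32_t v = MODEL_NO_CONSTRAINT;
	size_t i;

	assert(n == 0 || e);
	for (i = 0; i < n; i++)
		if (cpu >= e[i].lo && cpu <= e[i].hi && e[i].latency_us < v)
			v = e[i].latency_us;
	return v;
}
```

For "0-3:50,2-5:10", CPU 2 gets 50 under first match but 10 under the minimum policy. The minimum policy also makes the result independent of entry order, which arguably fits how PM_QOS_MIN constraints combine requests anyway.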

> 
> 	- The constraints persist across CPU hotplug events.
> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  23 +++
>   drivers/base/cpu.c                            |   5 +-
>   include/linux/pm_qos.h                        |   5 +
>   kernel/power/qos.c                            | 141 ++++++++++++++++++
>   4 files changed, 172 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6a3d6bd0746c..afba39ecfdee 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2238,6 +2238,29 @@ Kernel parameters
>   	icn=		[HW,ISDN]
>   			Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
>   
> +	pm_qos_resume_latency_us=	[KNL,EARLY]
> +			Format: <cpu-list>:<value>[,<cpu-list>:<value>...]
> +
> +			Establish per-CPU resume latency constraints. These constraints
> +			are applied immediately upon CPU registration and persist
> +			across CPU hotplug events.
> +
> +			For example:
> +				"pm_qos_resume_latency_us=0:0,1-15:20"
> +
> +			This restricts CPU 0 to a 0us resume latency (effectively
> +			forcing polling) and limits CPUs 1-15 to C-states with a
> +			maximum exit latency of 20us. All other CPUs remain
> +			unconstrained by this parameter.
> +
> +			Unlike the sysfs interface, which accepts the string "n/a" to
> +			remove a constraint, this boot parameter strictly requires
> +			integer values. To specify a 0us latency constraint (polling),
> +			the integer 0 must be used.
> +
> +			NOTE: The parsing logic enforces a "First Match Wins" policy.
> +			If a CPU is included in multiple specified ranges, the latency
> +			value from the first matching entry takes precedence.
>   
>   	idle=		[X86,EARLY]
>   			Format: idle=poll, idle=halt, idle=nomwait
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index c6c57b6f61c6..1dea5bcd76a0 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -416,6 +416,7 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
>   int register_cpu(struct cpu *cpu, int num)
>   {
>   	int error;
> +	s32 resume_latency;
>   
>   	cpu->node_id = cpu_to_node(num);
>   	memset(&cpu->dev, 0x00, sizeof(struct device));
> @@ -436,8 +437,8 @@ int register_cpu(struct cpu *cpu, int num)
>   
>   	per_cpu(cpu_sys_devices, num) = &cpu->dev;
>   	register_cpu_under_node(num, cpu_to_node(num));
> -	dev_pm_qos_expose_latency_limit(&cpu->dev,
> -					PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> +	resume_latency = pm_qos_get_boot_cpu_latency_limit(num);
> +	dev_pm_qos_expose_latency_limit(&cpu->dev, resume_latency);
>   	set_cpu_enabled(num, true);
>   
>   	return 0;
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index 6cea4455f867..556a7dff1419 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -174,6 +174,7 @@ static inline s32 cpu_wakeup_latency_qos_limit(void)
>   #ifdef CONFIG_PM
>   enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask);
>   enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask);
> +s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu);
>   s32 __dev_pm_qos_resume_latency(struct device *dev);
>   s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type);
>   int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> @@ -218,6 +219,10 @@ static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
>   		pm_qos_read_value(&dev->power.qos->resume_latency);
>   }
>   #else
> +static inline s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
> +{
> +	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +}
>   static inline enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev,
>   							  s32 mask)
>   			{ return PM_QOS_FLAGS_UNDEFINED; }
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index f7d8064e9adc..e23223e3c7e8 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -34,6 +34,11 @@
>   #include <linux/kernel.h>
>   #include <linux/debugfs.h>
>   #include <linux/seq_file.h>
> +#include <linux/cpumask.h>
> +#include <linux/cpu.h>
> +#include <linux/list.h>
> +
> +#include <asm/setup.h>

Including <asm/setup.h> in generic PM QoS code does not seem
appropriate: pulling in an arch-specific header creates unnecessary
coupling and potential portability issues.

>   
>   #include <linux/uaccess.h>
>   #include <linux/export.h>
> @@ -46,6 +51,10 @@
>    */
>   static DEFINE_SPINLOCK(pm_qos_lock);
>   
> +static LIST_HEAD(pm_qos_boot_list);
> +
> +static char pm_qos_resume_latency_cmdline[COMMAND_LINE_SIZE] __initdata;
> +
>   /**
>    * pm_qos_read_value - Return the current effective constraint value.
>    * @c: List of PM QoS constraint requests.
> @@ -209,6 +218,138 @@ bool pm_qos_update_flags(struct pm_qos_flags *pqf,
>   	return prev_value != curr_value;
>   }
>   
> +struct pm_qos_boot_entry {
> +	struct list_head node;

How about using an array instead of a list to speed up lookups and
improve cache locality? An array may use more memory, though.
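A rough user-space sketch of the array alternative: a per-CPU table filled once at parse time, so each lookup is a single index instead of a list walk. The cost is one s32 per possible CPU (for example 4 KiB for 1024 CPUs) even when only a few CPUs are constrained. MODEL_NR_CPUS and the helper names are hypothetical; in the kernel such a table would presumably be sized by nr_cpu_ids and allocated at init time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS		64		/* hypothetical; kernel would use nr_cpu_ids */
#define MODEL_NO_CONSTRAINT	INT32_MAX	/* stand-in for PM_QOS_RESUME_LATENCY_NO_CONSTRAINT */

struct range_entry {
	unsigned int lo, hi;	/* inclusive CPU range */
	int32_t latency_us;
};

static int32_t boot_latency[MODEL_NR_CPUS];

/*
 * Build the table once. Entries are applied last-to-first so that an
 * earlier entry overwrites a later one, preserving "First Match Wins".
 */
static void build_table(const struct range_entry *e, size_t n)
{
	unsigned int cpu;
	size_t i;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		boot_latency[cpu] = MODEL_NO_CONSTRAINT;

	for (i = n; i-- > 0; ) {
		assert(e[i].lo <= e[i].hi && e[i].hi < MODEL_NR_CPUS);
		for (cpu = e[i].lo; cpu <= e[i].hi; cpu++)
			boot_latency[cpu] = e[i].latency_us;
	}
}

/* O(1) lookup replacing the list walk. */
static int32_t lookup(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS ? boot_latency[cpu] : MODEL_NO_CONSTRAINT;
}
```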

> +	struct cpumask mask;
> +	s32 latency;
> +};
> +
> +static int __init pm_qos_resume_latency_us_setup(char *str)
> +{
> +	strscpy(pm_qos_resume_latency_cmdline, str,
> +		sizeof(pm_qos_resume_latency_cmdline));
> +	return 1;
> +}
> +__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
> +
> +/* init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.

Nit: kernel-doc style; this comment should start with "/**".

> + *
> + * Parses the kernel command line option "pm_qos_resume_resume_latency_us=" to establish

Typo? This should presumably be "pm_qos_resume_latency_us".

> + * per-CPU resume latency constraints. These constraints are applied
> + * immediately when a CPU is registered.
> + *
> + * Syntax: pm_qos_resume_latency_us=<cpu-list>:<value>[,<cpu-list>:<value>...]
> + * Example: pm_qos_resume_latency_us=0-3:0,4-7:20
> + *
> + * The parsing logic enforces a "First Match Wins" policy. If a CPU is
> + * covered by multiple entries in the list, only the first valid entry
> + * applies. Any subsequent overlapping ranges for that CPU are ignored.
> + *
> + * Return: 0 on success, or a negative error code on failure.
> + */
> +static int __init init_pm_qos_resume_latency_us_setup(void)
> +{
> +	char *token, *cmd = pm_qos_resume_latency_cmdline;
> +	struct pm_qos_boot_entry *entry, *tentry;
> +	cpumask_var_t covered;
> +
> +	if (!zalloc_cpumask_var(&covered, GFP_KERNEL)) {
> +		pr_warn("pm_qos: Failed to allocate memory for parsing boot parameter\n");
> +		return -ENOMEM;
> +	}
> +
> +	while ((token = strsep(&cmd, ",")) != NULL) {
> +		char *str_range, *str_val;
> +
> +		str_range = strsep(&token, ":");
> +		str_val = token;
> +
> +		if (!str_val) {
> +			pr_warn("pm_qos: Missing value range %s\n",
> +				str_range);
> +			continue;
> +		}
> +
> +		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +		if (!entry) {
> +			pr_warn("pm_qos: Failed to allocate memory for boot entry\n");
> +			goto cleanup;
> +		}
> +
> +		if (cpulist_parse(str_range, &entry->mask)) {
> +			pr_warn("pm_qos: Failed to parse cpulist range %s\n",
> +				str_range);
> +			kfree(entry);
> +			continue;
> +		}
> +
> +		cpumask_andnot(&entry->mask, &entry->mask, covered);
> +		if (cpumask_empty(&entry->mask)) {
> +			pr_warn("pm_qos: Entry %s already covered, ignoring\n",
> +				str_range);
> +			kfree(entry);
> +			continue;
> +		}
> +		cpumask_or(covered, covered, &entry->mask);
> +
> +		if (kstrtos32(str_val, 0, &entry->latency)) {
> +			pr_warn("pm_qos: Invalid latency requirement value %s\n",
> +				str_val);
> +			kfree(entry);
> +			continue;
> +		}
> +
> +		if (entry->latency < 0) {
> +			pr_warn("pm_qos: Latency requirement cannot be negative: %d\n",
> +				entry->latency);

Nit: it would be cleaner to define pr_fmt() for the log prefix rather
than embedding "pm_qos:" in every message, e.g.:

#define pr_fmt(fmt) "pm_qos_boot: " fmt
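For reference, pr_fmt() has to be defined before any #include (in practice at the very top of the file), because <linux/printk.h> only supplies its default definition when pr_fmt is not already defined. The user-space model below mimics how the pr_*() macros paste pr_fmt(fmt) in front of each format string; snprintf() stands in for printk(), and fmt_warn() and warn_invalid_latency() are hypothetical names.

```c
/* Defined first, mirroring where it must sit in a kernel source file. */
#define pr_fmt(fmt) "pm_qos_boot: " fmt

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* User-space stand-in for pr_warn(): the prefix is pasted in at compile time. */
#define fmt_warn(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* Hypothetical call site: the message no longer carries a "pm_qos:" literal. */
static int warn_invalid_latency(char *buf, size_t len, const char *val)
{
	assert(buf && len > 0);
	return fmt_warn(buf, len, "Invalid latency requirement value %s\n", val);
}
```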


> +			kfree(entry);
> +			continue;
> +		}
> +
> +		list_add_tail(&entry->node, &pm_qos_boot_list);

I see there is no locking for pm_qos_boot_list. My understanding is that
during early boot, pm_qos_boot_list is fully initialized before any
register_cpu() call. CPU hotplug only reads the list, which is never
modified afterwards, so there shouldn't be a race.


> +	}
> +
> +	free_cpumask_var(covered);
> +	return 0;
> +
> +cleanup:
> +	list_for_each_entry_safe(entry, tentry, &pm_qos_boot_list, node) {
> +		list_del(&entry->node);
> +		kfree(entry);
> +	}
> +
> +	free_cpumask_var(covered);
> +	return 0;

The function comment says it returns a negative error code on failure,
but the cleanup path unconditionally returns 0. Should it set
"ret = -ENOMEM;" before "goto cleanup" and then simply return ret?
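A user-space sketch of the suggested error path: ret starts at 0 and is set to -ENOMEM immediately before goto cleanup, so the cleanup label frees every entry already queued and propagates the error. The singly linked list (standing in for list_head), parse_entries() and the injectable allocator, used only to exercise the failure path, are hypothetical simplifications of the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct pm_qos_boot_entry. */
struct entry {
	struct entry *next;
	int latency_us;
};

typedef void *(*alloc_fn)(size_t size);

/* Test hook (hypothetical): succeeds alloc_budget times, then fails. */
static int alloc_budget;

static void *budget_alloc(size_t size)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return malloc(size);
}

static void free_entries(struct entry **head)
{
	struct entry *e;

	while ((e = *head) != NULL) {
		*head = e->next;
		free(e);
	}
}

/* Returns 0 on success, or -ENOMEM with *head left empty. */
static int parse_entries(const int *vals, size_t n, alloc_fn alloc,
			 struct entry **head)
{
	struct entry *e, **tail = head;
	size_t i;
	int ret = 0;

	assert(head && *head == NULL);
	for (i = 0; i < n; i++) {
		e = alloc(sizeof(*e));
		if (!e) {
			ret = -ENOMEM;	/* record the error before jumping */
			goto cleanup;
		}
		e->next = NULL;
		e->latency_us = vals[i];
		*tail = e;		/* append, like list_add_tail() */
		tail = &e->next;
	}
	return ret;

cleanup:
	free_entries(head);		/* undo the partially built list */
	return ret;
}
```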


> +}
> +early_initcall(init_pm_qos_resume_latency_us_setup);
> +
> +/**
> + * pm_qos_get_boot_cpu_latency_limit - Get boot-time latency limit for a CPU.
> + * @cpu: Logical CPU number to check.
> + *
> + * Checks the read-only boot-time constraints list to see if a specific
> + * PM QoS latency override was requested for this CPU via the kernel
> + * command line.
> + *
> + * Return: The latency limit in microseconds if a constraint exists,
> + * or PM_QOS_RESUME_LATENCY_NO_CONSTRAINT if no boot override applies.
> + */
> +s32 pm_qos_get_boot_cpu_latency_limit(unsigned int cpu)
> +{
> +	struct pm_qos_boot_entry *entry;
> +
> +	if (list_empty(&pm_qos_boot_list))
> +		return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +
> +	list_for_each_entry(entry, &pm_qos_boot_list, node) {
> +		if (cpumask_test_cpu(cpu, &entry->mask))
> +			return entry->latency;
> +	}
> +
> +	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +}
> +EXPORT_SYMBOL_GPL(pm_qos_get_boot_cpu_latency_limit);
> +
>   #ifdef CONFIG_CPU_IDLE
>   /* Definitions related to the CPU latency QoS. */
>   


-- 
Thx and BRs,
Zhongqiu Han


Thread overview: 6+ messages
2026-01-28  3:31 [v2 PATCH 0/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us Aaron Tomlin
2026-01-28  3:31 ` [v2 PATCH 1/1] " Aaron Tomlin
2026-03-04  2:02   ` Aaron Tomlin
2026-03-04  9:05   ` Zhongqiu Han [this message]
2026-03-06  2:35     ` Aaron Tomlin
2026-02-08 21:46 ` [v2 PATCH 0/1] " Aaron Tomlin
