linux-pm.vger.kernel.org archive mirror
* [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
@ 2025-10-16 15:19 Ulf Hansson
  2025-10-16 15:19 ` [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit Ulf Hansson
                   ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-16 15:19 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Kevin Hilman, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, Ulf Hansson, linux-kernel

Changes in v2:
	- Limit the new QoS to CPUs and make some corresponding renaming of the
	functions along with the name of the device node for user space.
	- Make sure we deal with the failure/error path correctly when there is
	no state available for s2idle.
	- Add documentation.

Some platforms support multiple low-power states for CPUs that can be used
when entering system-wide suspend and s2idle in particular. Currently we are
always selecting the deepest possible state for the CPUs, which can break the
system-wakeup latency constraint that may be required for some use-cases.

Therefore, this series suggests introducing a new interface for user-space,
allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
taken into account when selecting a suitable low-power state for s2idle.
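
As a rough illustration (not part of this series), a user-space consumer of
the new /dev/cpu_wakeup_latency node could look like the sketch below. The
2000 us value is made up and the error handling is minimal; note that the
request is dropped as soon as the file descriptor gets closed.

/* Sketch: request a 2000 us CPU system-wakeup latency limit and hold it. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t limit_us = 2000;	/* the value is interpreted in microseconds */
	int fd = open("/dev/cpu_wakeup_latency", O_RDWR);

	if (fd < 0) {
		perror("open /dev/cpu_wakeup_latency");
		return 1;
	}

	/* A raw s32 write updates this fd's request; a 10-char hex string
	 * such as "0x000007d0" is accepted as well. */
	if (write(fd, &limit_us, sizeof(limit_us)) != sizeof(limit_us)) {
		perror("write");
		close(fd);
		return 1;
	}

	pause();	/* keep the fd open; closing it removes the request */
	return 0;
}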

Kind regards
Ulf Hansson


Ulf Hansson (4):
  PM: QoS: Introduce a CPU system-wakeup QoS limit
  pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
  Documentation: power/cpuidle: Document the CPU system-wakeup latency
    QoS

 Documentation/admin-guide/pm/cpuidle.rst |   7 ++
 Documentation/power/pm_qos_interface.rst |   9 +-
 drivers/cpuidle/cpuidle.c                |  12 +--
 drivers/pmdomain/core.c                  |  10 ++-
 drivers/pmdomain/governor.c              |  27 ++++++
 include/linux/cpuidle.h                  |   6 +-
 include/linux/pm_domain.h                |   1 +
 include/linux/pm_qos.h                   |   5 ++
 kernel/power/qos.c                       | 102 +++++++++++++++++++++++
 kernel/sched/idle.c                      |  12 +--
 10 files changed, 173 insertions(+), 18 deletions(-)

-- 
2.43.0



* [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
@ 2025-10-16 15:19 ` Ulf Hansson
  2025-10-29  8:10   ` Dhruva Gole
  2025-10-16 15:19 ` [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle Ulf Hansson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-16 15:19 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Kevin Hilman, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, Ulf Hansson, linux-kernel

Some platforms support multiple low-power states for CPUs that can be used
when entering system-wide suspend. Currently we are always selecting the
deepest possible state for the CPUs, which can break the system-wakeup
latency constraint that may be required for some use-cases.

Let's take the first step towards addressing this problem by introducing
an interface for user-space that allows us to specify the CPU
system-wakeup QoS limit. Subsequent changes will start taking into account
the new QoS limit.
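
As an aside, reading the device node back returns the currently aggregated
limit as a raw (binary) s32 in microseconds. A minimal user-space sketch,
purely for illustration:

/* Sketch: read back the aggregated CPU system-wakeup latency limit. Note
 * that the open itself registers a no-constraint request, which does not
 * change the aggregated value. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t limit_us;
	int fd = open("/dev/cpu_wakeup_latency", O_RDONLY);

	if (fd < 0 || read(fd, &limit_us, sizeof(limit_us)) != sizeof(limit_us)) {
		perror("cpu_wakeup_latency");
		return 1;
	}

	printf("current limit: %d us\n", limit_us);
	close(fd);
	return 0;
}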

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v2:
	- Renaming to reflect that the QoS is limited to CPUs.
	- Move code inside "CONFIG_CPU_IDLE".

---
 include/linux/pm_qos.h |   5 ++
 kernel/power/qos.c     | 102 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 4a69d4af3ff8..bf7524d38933 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -149,6 +149,7 @@ bool cpu_latency_qos_request_active(struct pm_qos_request *req);
 void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value);
 void cpu_latency_qos_update_request(struct pm_qos_request *req, s32 new_value);
 void cpu_latency_qos_remove_request(struct pm_qos_request *req);
+s32 cpu_wakeup_latency_qos_limit(void);
 #else
 static inline s32 cpu_latency_qos_limit(void) { return INT_MAX; }
 static inline bool cpu_latency_qos_request_active(struct pm_qos_request *req)
@@ -160,6 +161,10 @@ static inline void cpu_latency_qos_add_request(struct pm_qos_request *req,
 static inline void cpu_latency_qos_update_request(struct pm_qos_request *req,
 						  s32 new_value) {}
 static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {}
+static inline s32 cpu_wakeup_latency_qos_limit(void)
+{
+	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
+}
 #endif
 
 #ifdef CONFIG_PM
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 4244b069442e..8c024d7dc43e 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -415,6 +415,103 @@ static struct miscdevice cpu_latency_qos_miscdev = {
 	.fops = &cpu_latency_qos_fops,
 };
 
+/* The CPU system wakeup latency QoS. */
+static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
+	.list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
+	.target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
+	.type = PM_QOS_MIN,
+};
+
+/**
+ * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
+ *
+ * Returns the current CPU system wakeup latency QoS limit that may have been
+ * requested by user-space.
+ */
+s32 cpu_wakeup_latency_qos_limit(void)
+{
+	return pm_qos_read_value(&cpu_wakeup_latency_constraints);
+}
+
+static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
+{
+	struct pm_qos_request *req;
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->qos = &cpu_wakeup_latency_constraints;
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
+			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+	filp->private_data = req;
+
+	return 0;
+}
+
+static int cpu_wakeup_latency_qos_release(struct inode *inode,
+					  struct file *filp)
+{
+	struct pm_qos_request *req = filp->private_data;
+
+	filp->private_data = NULL;
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
+			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+	kfree(req);
+
+	return 0;
+}
+
+static ssize_t cpu_wakeup_latency_qos_read(struct file *filp, char __user *buf,
+					   size_t count, loff_t *f_pos)
+{
+	s32 value = pm_qos_read_value(&cpu_wakeup_latency_constraints);
+
+	return simple_read_from_buffer(buf, count, f_pos, &value, sizeof(s32));
+}
+
+static ssize_t cpu_wakeup_latency_qos_write(struct file *filp,
+					    const char __user *buf,
+					    size_t count, loff_t *f_pos)
+{
+	struct pm_qos_request *req = filp->private_data;
+	s32 value;
+
+	if (count == sizeof(s32)) {
+		if (copy_from_user(&value, buf, sizeof(s32)))
+			return -EFAULT;
+	} else {
+		int ret;
+
+		ret = kstrtos32_from_user(buf, count, 16, &value);
+		if (ret)
+			return ret;
+	}
+
+	if (value < 0)
+		return -EINVAL;
+
+	pm_qos_update_target(req->qos, &req->node, PM_QOS_UPDATE_REQ, value);
+
+	return count;
+}
+
+static const struct file_operations cpu_wakeup_latency_qos_fops = {
+	.open = cpu_wakeup_latency_qos_open,
+	.release = cpu_wakeup_latency_qos_release,
+	.read = cpu_wakeup_latency_qos_read,
+	.write = cpu_wakeup_latency_qos_write,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice cpu_wakeup_latency_qos_miscdev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "cpu_wakeup_latency",
+	.fops = &cpu_wakeup_latency_qos_fops,
+};
+
 static int __init cpu_latency_qos_init(void)
 {
 	int ret;
@@ -424,6 +521,11 @@ static int __init cpu_latency_qos_init(void)
 		pr_err("%s: %s setup failed\n", __func__,
 		       cpu_latency_qos_miscdev.name);
 
+	ret = misc_register(&cpu_wakeup_latency_qos_miscdev);
+	if (ret < 0)
+		pr_err("%s: %s setup failed\n", __func__,
+		       cpu_wakeup_latency_qos_miscdev.name);
+
 	return ret;
 }
 late_initcall(cpu_latency_qos_init);
-- 
2.43.0



* [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
  2025-10-16 15:19 ` [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit Ulf Hansson
@ 2025-10-16 15:19 ` Ulf Hansson
  2025-10-30 10:44   ` Rafael J. Wysocki
  2025-11-01  0:11   ` Kevin Hilman
  2025-10-16 15:19 ` [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle Ulf Hansson
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-16 15:19 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Kevin Hilman, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, Ulf Hansson, linux-kernel

A CPU system-wakeup QoS limit may have been requested by user-space. To
avoid breaking this constraint when entering a low-power state during
s2idle through genpd, let's extend the corresponding genpd governor for
CPUs. More precisely, during s2idle let the genpd governor select a
suitable low-power state by taking the QoS limit into account.
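
To illustrate the selection with made-up numbers: given a 1000 us wakeup
latency limit and two domain states whose power-off plus power-on latencies
are 150 us and 2000 us respectively, the governor walks from the deepest
state towards the shallowest and picks the 150 us state. A standalone
sketch of that loop (user space, hypothetical latencies):

#include <stdio.h>

struct state { long long off_ns, on_ns; };

int main(void)
{
	struct state states[] = {		/* index 0 = shallowest */
		{  50000LL,  100000LL },	/* 150 us round trip */
		{ 500000LL, 1500000LL },	/* 2000 us round trip */
	};
	long long constraint_ns = 1000LL * 1000LL;	/* 1000 us in ns */

	/* Walk from the deepest state down, as cpu_system_power_down_ok()
	 * does, and pick the first one that fits the constraint. */
	for (int i = 1; i >= 0; i--) {
		if (states[i].off_ns + states[i].on_ns <= constraint_ns) {
			printf("selected state index %d\n", i);
			return 0;
		}
	}

	printf("no state fits the constraint; abort the power-off\n");
	return 1;
}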

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v2:
	- Limit the change to the genpd governor for CPUs.

---
 drivers/pmdomain/core.c     | 10 ++++++++--
 drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
 include/linux/pm_domain.h   |  1 +
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 61c2277c9ce3..4fd546ef0448 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
 			return;
 	}
 
-	/* Choose the deepest state when suspending */
-	genpd->state_idx = genpd->state_count - 1;
+	if (genpd->gov && genpd->gov->system_power_down_ok) {
+		if (!genpd->gov->system_power_down_ok(&genpd->domain))
+			return;
+	} else {
+		/* Default to the deepest state. */
+		genpd->state_idx = genpd->state_count - 1;
+	}
+
 	if (_genpd_power_off(genpd, false)) {
 		genpd->states[genpd->state_idx].rejected++;
 		return;
diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
index 39359811a930..bd1b9d66d4a5 100644
--- a/drivers/pmdomain/governor.c
+++ b/drivers/pmdomain/governor.c
@@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
 	return false;
 }
 
+static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
+{
+	s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
+	struct generic_pm_domain *genpd = pd_to_genpd(pd);
+	int state_idx = genpd->state_count - 1;
+
+	if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) {
+		genpd->state_idx = state_idx;
+		return true;
+	}
+
+	/* Find the deepest state for the latency constraint. */
+	while (state_idx >= 0) {
+		s64 latency_ns = genpd->states[state_idx].power_off_latency_ns +
+				 genpd->states[state_idx].power_on_latency_ns;
+
+		if (latency_ns <= constraint_ns) {
+			genpd->state_idx = state_idx;
+			return true;
+		}
+		state_idx--;
+	}
+
+	return false;
+}
+
 struct dev_power_governor pm_domain_cpu_gov = {
 	.suspend_ok = default_suspend_ok,
 	.power_down_ok = cpu_power_down_ok,
+	.system_power_down_ok = cpu_system_power_down_ok,
 };
 #endif
 
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index f67a2cb7d781..93ba0143ca47 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -153,6 +153,7 @@ enum genpd_sync_state {
 };
 
 struct dev_power_governor {
+	bool (*system_power_down_ok)(struct dev_pm_domain *domain);
 	bool (*power_down_ok)(struct dev_pm_domain *domain);
 	bool (*suspend_ok)(struct device *dev);
 };
-- 
2.43.0



* [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
  2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
  2025-10-16 15:19 ` [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit Ulf Hansson
  2025-10-16 15:19 ` [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle Ulf Hansson
@ 2025-10-16 15:19 ` Ulf Hansson
  2025-10-17 10:15   ` Peter Zijlstra
  2025-10-31 19:23   ` Dhruva Gole
  2025-10-16 15:19 ` [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS Ulf Hansson
  2025-10-29 14:52 ` [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Rafael J. Wysocki
  4 siblings, 2 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-16 15:19 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Kevin Hilman, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, Ulf Hansson, linux-kernel

A CPU system-wakeup QoS limit may have been requested by user-space. To
avoid breaking this constraint when entering a low-power state during
s2idle, let's start taking the QoS limit into account.
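
For reference, the QoS limit is aggregated as an s32 in microseconds while
cpuidle compares exit latencies in nanoseconds, so cpuidle_idle_call() scales
it by NSEC_PER_USEC before passing it on. A tiny standalone sketch of that
conversion, with an illustrative value:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_USEC 1000ULL

int main(void)
{
	int32_t limit_us = 2000;	/* as returned by the QoS aggregate */
	uint64_t max_latency_ns = (uint64_t)limit_us * NSEC_PER_USEC;

	/* Even the no-constraint default (S32_MAX us) fits in a u64. */
	printf("%d us -> %llu ns\n", limit_us,
	       (unsigned long long)max_latency_ns);
	return 0;
}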

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v2:
	- Rework the code to take into account the failure/error path, when we
	don't find an s2idle-specific state.

---
 drivers/cpuidle/cpuidle.c | 12 +++++++-----
 include/linux/cpuidle.h   |  6 ++++--
 kernel/sched/idle.c       | 12 +++++++-----
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 56132e843c99..c7876e9e024f 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -184,20 +184,22 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
  * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
  * @drv: cpuidle driver for the given CPU.
  * @dev: cpuidle device for the given CPU.
+ * @latency_limit_ns: Idle state exit latency limit
  *
  * If there are states with the ->enter_s2idle callback, find the deepest of
  * them and enter it with frozen tick.
  */
-int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+			 u64 latency_limit_ns)
 {
 	int index;
 
 	/*
-	 * Find the deepest state with ->enter_s2idle present, which guarantees
-	 * that interrupts won't be enabled when it exits and allows the tick to
-	 * be frozen safely.
+	 * Find the deepest state with ->enter_s2idle present that meets the
+	 * specified latency limit, which guarantees that interrupts won't be
+	 * enabled when it exits and allows the tick to be frozen safely.
 	 */
-	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
+	index = find_deepest_state(drv, dev, latency_limit_ns, 0, true);
 	if (index > 0) {
 		enter_s2idle_proper(drv, dev, index);
 		local_irq_enable();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index a9ee4fe55dcf..4073690504a7 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -248,7 +248,8 @@ extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 				      struct cpuidle_device *dev,
 				      u64 latency_limit_ns);
 extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
-				struct cpuidle_device *dev);
+				struct cpuidle_device *dev,
+				u64 latency_limit_ns);
 extern void cpuidle_use_deepest_state(u64 latency_limit_ns);
 #else
 static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
@@ -256,7 +257,8 @@ static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
 					     u64 latency_limit_ns)
 {return -ENODEV; }
 static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
-				       struct cpuidle_device *dev)
+				       struct cpuidle_device *dev,
+				       u64 latency_limit_ns)
 {return -ENODEV; }
 static inline void cpuidle_use_deepest_state(u64 latency_limit_ns)
 {
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c39b089d4f09..c1c3d0166610 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -131,12 +131,13 @@ void __cpuidle default_idle_call(void)
 }
 
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
-			       struct cpuidle_device *dev)
+			       struct cpuidle_device *dev,
+			       u64 max_latency_ns)
 {
 	if (current_clr_polling_and_test())
 		return -EBUSY;
 
-	return cpuidle_enter_s2idle(drv, dev);
+	return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
 }
 
 static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
@@ -205,12 +206,13 @@ static void cpuidle_idle_call(void)
 		u64 max_latency_ns;
 
 		if (idle_should_enter_s2idle()) {
+			max_latency_ns = cpu_wakeup_latency_qos_limit() *
+					 NSEC_PER_USEC;
 
-			entered_state = call_cpuidle_s2idle(drv, dev);
+			entered_state = call_cpuidle_s2idle(drv, dev,
+							    max_latency_ns);
 			if (entered_state > 0)
 				goto exit_idle;
-
-			max_latency_ns = U64_MAX;
 		} else {
 			max_latency_ns = dev->forced_idle_latency_limit_ns;
 		}
-- 
2.43.0



* [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS
  2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
                   ` (2 preceding siblings ...)
  2025-10-16 15:19 ` [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle Ulf Hansson
@ 2025-10-16 15:19 ` Ulf Hansson
  2025-10-31 10:57   ` Dhruva Gole
  2025-10-29 14:52 ` [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Rafael J. Wysocki
  4 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-16 15:19 UTC (permalink / raw)
  To: Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Kevin Hilman, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, Ulf Hansson, linux-kernel,
	Jonathan Corbet

Let's document how the new CPU system-wakeup latency QoS can be used from
user space, along with how the corresponding latency constraint gets
respected during s2idle.

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v2:
	- New patch.

---
 Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
 Documentation/power/pm_qos_interface.rst | 9 +++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 0c090b076224..3f6f79a9bc8f 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
 they are allowed to select for that CPU.  They should never select any idle
 states with exit latency beyond that limit.
 
+While the above CPU QoS constraints apply to a running system, user space may
+also request a CPU system-wakeup latency QoS limit, via the `cpu_wakeup_latency`
+file.  This QoS constraint is respected when selecting a suitable idle state
+for the CPUs while entering the system-wide suspend-to-idle sleep state.
+
+Note that, with regard to how the QoS request from user space is managed, the
+`cpu_wakeup_latency` file works in the same way as the `cpu_dma_latency` file.
 
 Idle States Control Via Kernel Command Line
 ===========================================
diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
index 5019c79c7710..723f4714691a 100644
--- a/Documentation/power/pm_qos_interface.rst
+++ b/Documentation/power/pm_qos_interface.rst
@@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
 
 From user space:
 
-The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
+The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
+the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup
 latency QoS.
 
 Only processes can register a PM QoS request.  To provide for automatic
@@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
 parameter requests as follows.
 
 To register the default PM QoS target for the CPU latency QoS, the process must
-open /dev/cpu_dma_latency.
+open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
+process must open /dev/cpu_wakeup_latency.
 
 As long as the device node is held open that process has a registered
 request on the parameter.
 
 To change the requested target value, the process needs to write an s32 value to
 the open device node.  Alternatively, it can write a hex string for the value
-using the 10 char long format e.g. "0x12345678".  This translates to a
-cpu_latency_qos_update_request() call.
+using the 10 char long format e.g. "0x12345678".
 
 To remove the user mode request for a target value simply close the device
 node.
-- 
2.43.0



* Re: [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
  2025-10-16 15:19 ` [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle Ulf Hansson
@ 2025-10-17 10:15   ` Peter Zijlstra
  2025-10-31 19:23   ` Dhruva Gole
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2025-10-17 10:15 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, Oct 16, 2025 at 05:19:23PM +0200, Ulf Hansson wrote:

No objections to this.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c39b089d4f09..c1c3d0166610 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -131,12 +131,13 @@ void __cpuidle default_idle_call(void)
>  }
>  
>  static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> -			       struct cpuidle_device *dev)
> +			       struct cpuidle_device *dev,
> +			       u64 max_latency_ns)
>  {
>  	if (current_clr_polling_and_test())
>  		return -EBUSY;
>  
> -	return cpuidle_enter_s2idle(drv, dev);
> +	return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
>  }
>  
>  static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> @@ -205,12 +206,13 @@ static void cpuidle_idle_call(void)
>  		u64 max_latency_ns;
>  
>  		if (idle_should_enter_s2idle()) {
> +			max_latency_ns = cpu_wakeup_latency_qos_limit() *
> +					 NSEC_PER_USEC;
>  
> -			entered_state = call_cpuidle_s2idle(drv, dev);
> +			entered_state = call_cpuidle_s2idle(drv, dev,
> +							    max_latency_ns);
>  			if (entered_state > 0)
>  				goto exit_idle;
> -
> -			max_latency_ns = U64_MAX;
>  		} else {
>  			max_latency_ns = dev->forced_idle_latency_limit_ns;
>  		}
> -- 
> 2.43.0
> 


* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-16 15:19 ` [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit Ulf Hansson
@ 2025-10-29  8:10   ` Dhruva Gole
  2025-10-29 14:28     ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Dhruva Gole @ 2025-10-29  8:10 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

Hi Ulf,

On Oct 16, 2025 at 17:19:21 +0200, Ulf Hansson wrote:
> Some platforms support multiple low-power states for CPUs that can be used
> when entering system-wide suspend. Currently we are always selecting the
> deepest possible state for the CPUs, which can break the system-wakeup
> latency constraint that may be required for some use-cases.
> 
> Let's take the first step towards addressing this problem by introducing
> an interface for user-space that allows us to specify the CPU
> system-wakeup QoS limit. Subsequent changes will start taking into account
> the new QoS limit.
> 
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes in v2:
> 	- Renaming to reflect that the QoS is limited to CPUs.
> 	- Move code inside "CONFIG_CPU_IDLE".
> 
> ---
>  include/linux/pm_qos.h |   5 ++
>  kernel/power/qos.c     | 102 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 107 insertions(+)
> 
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index 4a69d4af3ff8..bf7524d38933 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -149,6 +149,7 @@ bool cpu_latency_qos_request_active(struct pm_qos_request *req);
>  void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value);
>  void cpu_latency_qos_update_request(struct pm_qos_request *req, s32 new_value);
>  void cpu_latency_qos_remove_request(struct pm_qos_request *req);
> +s32 cpu_wakeup_latency_qos_limit(void);
>  #else
>  static inline s32 cpu_latency_qos_limit(void) { return INT_MAX; }
>  static inline bool cpu_latency_qos_request_active(struct pm_qos_request *req)
> @@ -160,6 +161,10 @@ static inline void cpu_latency_qos_add_request(struct pm_qos_request *req,
>  static inline void cpu_latency_qos_update_request(struct pm_qos_request *req,
>  						  s32 new_value) {}
>  static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {}
> +static inline s32 cpu_wakeup_latency_qos_limit(void)
> +{
> +	return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +}
>  #endif
>  
>  #ifdef CONFIG_PM
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index 4244b069442e..8c024d7dc43e 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -415,6 +415,103 @@ static struct miscdevice cpu_latency_qos_miscdev = {
>  	.fops = &cpu_latency_qos_fops,
>  };
>  
> +/* The CPU system wakeup latency QoS. */
> +static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
> +	.list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
> +	.target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> +	.default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> +	.no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> +	.type = PM_QOS_MIN,
> +};
> +
> +/**
> + * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
> + *
> + * Returns the current CPU system wakeup latency QoS limit that may have been
> + * requested by user-space.
> + */
> +s32 cpu_wakeup_latency_qos_limit(void)
> +{
> +	return pm_qos_read_value(&cpu_wakeup_latency_constraints);
> +}
> +
> +static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
> +{
> +	struct pm_qos_request *req;
> +
> +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->qos = &cpu_wakeup_latency_constraints;
> +	pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
> +			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> +	filp->private_data = req;
> +
> +	return 0;
> +}
> +
> +static int cpu_wakeup_latency_qos_release(struct inode *inode,
> +					  struct file *filp)
> +{
> +	struct pm_qos_request *req = filp->private_data;
> +
> +	filp->private_data = NULL;
> +	pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
> +			     PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);

Please excuse the delay in reviewing these patches.
I was wondering why we have decided, here in release, to reset the
constraints set by a user. For example, even when I was testing the previous
revision locally, I'd just commented out this release hook, since I
wanted to be able to just echo 0xABCD into /dev/cpu_wakeup_latency...

It seems overkill to me that a userspace program should be required to hold
this file open just to make sure the constraints are honoured for the
lifetime of the device. We should definitely give users the freedom to just
echo a value in, and also to cat the same place to read back the latency
constraint that has been set.

One other thing on my mind - probably unrelated to this specific
series, but I think we should have some sysfs entry appear in either
/sys/.../cpu0/cpuidle or s2idle/ where we can show the next feasible s2idle
state that the governor has chosen to enter, based on the value set in
cpu_wakeup_latency.

Thoughts?

> +	kfree(req);
> +
> +	return 0;
> +}
> +
> +static ssize_t cpu_wakeup_latency_qos_read(struct file *filp, char __user *buf,
> +					   size_t count, loff_t *f_pos)
> +{
> +	s32 value = pm_qos_read_value(&cpu_wakeup_latency_constraints);
> +
> +	return simple_read_from_buffer(buf, count, f_pos, &value, sizeof(s32));
> +}
> +
> +static ssize_t cpu_wakeup_latency_qos_write(struct file *filp,
> +					    const char __user *buf,
> +					    size_t count, loff_t *f_pos)
> +{
> +	struct pm_qos_request *req = filp->private_data;
> +	s32 value;
> +
> +	if (count == sizeof(s32)) {
> +		if (copy_from_user(&value, buf, sizeof(s32)))
> +			return -EFAULT;
> +	} else {
> +		int ret;
> +
> +		ret = kstrtos32_from_user(buf, count, 16, &value);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (value < 0)
> +		return -EINVAL;
> +
> +	pm_qos_update_target(req->qos, &req->node, PM_QOS_UPDATE_REQ, value);
> +
> +	return count;
> +}
> +
> +static const struct file_operations cpu_wakeup_latency_qos_fops = {
> +	.open = cpu_wakeup_latency_qos_open,
> +	.release = cpu_wakeup_latency_qos_release,

> +	.read = cpu_wakeup_latency_qos_read,
> +	.write = cpu_wakeup_latency_qos_write,
> +	.llseek = noop_llseek,
> +};
> +
> +static struct miscdevice cpu_wakeup_latency_qos_miscdev = {
> +	.minor = MISC_DYNAMIC_MINOR,
> +	.name = "cpu_wakeup_latency",
> +	.fops = &cpu_wakeup_latency_qos_fops,
> +};
> +
>  static int __init cpu_latency_qos_init(void)
>  {
>  	int ret;
> @@ -424,6 +521,11 @@ static int __init cpu_latency_qos_init(void)
>  		pr_err("%s: %s setup failed\n", __func__,
>  		       cpu_latency_qos_miscdev.name);
>  
> +	ret = misc_register(&cpu_wakeup_latency_qos_miscdev);
> +	if (ret < 0)
> +		pr_err("%s: %s setup failed\n", __func__,
> +		       cpu_wakeup_latency_qos_miscdev.name);
> +
>  	return ret;
>  }
>  late_initcall(cpu_latency_qos_init);
> -- 
> 2.43.0
> 

-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated


* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-29  8:10   ` Dhruva Gole
@ 2025-10-29 14:28     ` Rafael J. Wysocki
  2025-10-30 16:45       ` Dhruva Gole
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-29 14:28 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Ulf Hansson, Rafael J . Wysocki, linux-pm, Vincent Guittot,
	Peter Zijlstra, Kevin Hilman, Pavel Machek, Len Brown,
	Daniel Lezcano, Saravana Kannan, Maulik Shah, Prasad Sodagudi,
	linux-kernel

On Wed, Oct 29, 2025 at 9:10 AM Dhruva Gole <d-gole@ti.com> wrote:
>
> Hi Ulf,
>
> On Oct 16, 2025 at 17:19:21 +0200, Ulf Hansson wrote:
> > Some platforms support multiple low-power states for CPUs that can be used
> > when entering system-wide suspend. Currently we are always selecting the
> > deepest possible state for the CPUs, which can break the system-wakeup
> > latency constraint that may be required for some use-cases.
> >
> > Let's take the first step towards addressing this problem by introducing
> > an interface for user-space that allows us to specify the CPU
> > system-wakeup QoS limit. Subsequent changes will start taking into account
> > the new QoS limit.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v2:
> >       - Renaming to reflect that the QoS is limited to CPUs.
> >       - Move code inside "CONFIG_CPU_IDLE".
> >
> > ---
> >  include/linux/pm_qos.h |   5 ++
> >  kernel/power/qos.c     | 102 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 107 insertions(+)
> >
> > diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> > index 4a69d4af3ff8..bf7524d38933 100644
> > --- a/include/linux/pm_qos.h
> > +++ b/include/linux/pm_qos.h
> > @@ -149,6 +149,7 @@ bool cpu_latency_qos_request_active(struct pm_qos_request *req);
> >  void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value);
> >  void cpu_latency_qos_update_request(struct pm_qos_request *req, s32 new_value);
> >  void cpu_latency_qos_remove_request(struct pm_qos_request *req);
> > +s32 cpu_wakeup_latency_qos_limit(void);
> >  #else
> >  static inline s32 cpu_latency_qos_limit(void) { return INT_MAX; }
> >  static inline bool cpu_latency_qos_request_active(struct pm_qos_request *req)
> > @@ -160,6 +161,10 @@ static inline void cpu_latency_qos_add_request(struct pm_qos_request *req,
> >  static inline void cpu_latency_qos_update_request(struct pm_qos_request *req,
> >                                                 s32 new_value) {}
> >  static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {}
> > +static inline s32 cpu_wakeup_latency_qos_limit(void)
> > +{
> > +     return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> > +}
> >  #endif
> >
> >  #ifdef CONFIG_PM
> > diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> > index 4244b069442e..8c024d7dc43e 100644
> > --- a/kernel/power/qos.c
> > +++ b/kernel/power/qos.c
> > @@ -415,6 +415,103 @@ static struct miscdevice cpu_latency_qos_miscdev = {
> >       .fops = &cpu_latency_qos_fops,
> >  };
> >
> > +/* The CPU system wakeup latency QoS. */
> > +static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
> > +     .list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
> > +     .target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > +     .default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > +     .no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > +     .type = PM_QOS_MIN,
> > +};
> > +
> > +/**
> > + * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
> > + *
> > + * Returns the current CPU system wakeup latency QoS limit that may have been
> > + * requested by user-space.
> > + */
> > +s32 cpu_wakeup_latency_qos_limit(void)
> > +{
> > +     return pm_qos_read_value(&cpu_wakeup_latency_constraints);
> > +}
> > +
> > +static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
> > +{
> > +     struct pm_qos_request *req;
> > +
> > +     req = kzalloc(sizeof(*req), GFP_KERNEL);
> > +     if (!req)
> > +             return -ENOMEM;
> > +
> > +     req->qos = &cpu_wakeup_latency_constraints;
> > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
> > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> > +     filp->private_data = req;
> > +
> > +     return 0;
> > +}
> > +
> > +static int cpu_wakeup_latency_qos_release(struct inode *inode,
> > +                                       struct file *filp)
> > +{
> > +     struct pm_qos_request *req = filp->private_data;
> > +
> > +     filp->private_data = NULL;
> > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
> > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
>
> Please excuse the delay in reviewing these patches,
> I was wondering why we have decided here in release to reset the
> constraints set by a user. For eg. even when I was testing the previous
> revision locally I'd just commented out this release hook, since I
> wanted to be able to just echo 0xABCD into /dev/cpu_wakeup_latency...

If you want "fire and forget", that would be a different interface.
Device special files are not for that.

Cleaning up after closing a file descriptor is a safety measure and
CPU wakeup latency constraints are a big deal.  Leaving leftover requests
behind dead processes is not a good idea.

> It seems an overkill to me that a userspace program be required to hold
> open this file just to make sure the constraints are honoured for the
> lifetime of the device. We should definitely give the freedom to just be
> able to echo and also be able to cat and read back from the same place
> about the latency constraint being set.

So you'd want a sysfs attribute here, but that has its own issues (the
last writer "wins", so if there are multiple users of it with
different needs in user space, things get tricky).

> One other thing on my mind is - and probably unrelated to this specific
> series, but I think we must have some sysfs entry either appear in
> /sys/.../cpu0/cpuidle or s2idle/ where we can show next feesible s2idle
> state that the governor has chosen to enter based on the value set in
> cpu_wakeup_latency.

Exit latency values for all states are exposed via sysfs.  Since
s2idle always uses the deepest state it can use, it is quite
straightforward to figure out which of them will be used going
forward, given a specific latency constraint.
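
For instance, a rough user space sketch (assuming the standard cpuidle sysfs
layout, where states are ordered from shallowest to deepest and each state's
"latency" attribute is its exit latency in microseconds):

#include <stdio.h>

int main(void)
{
	long limit_us = 2000;	/* illustrative constraint */
	int best = -1;

	for (int i = 0; ; i++) {
		char path[128];
		long latency;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/latency", i);
		f = fopen(path, "r");
		if (!f)
			break;	/* no more states */
		if (fscanf(f, "%ld", &latency) == 1 && latency <= limit_us)
			best = i;	/* deeper states come later */
		fclose(f);
	}

	printf("deepest state within %ld us: state%d\n", limit_us, best);
	return 0;
}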


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
                   ` (3 preceding siblings ...)
  2025-10-16 15:19 ` [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS Ulf Hansson
@ 2025-10-29 14:52 ` Rafael J. Wysocki
  2025-10-30 12:22   ` Ulf Hansson
  4 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-29 14:52 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> Changes in v2:
>         - Limit the new QoS to CPUs and make some corresponding renaming of the
>         functions along with the name of the device node for user space.
>         - Make sure we deal with the failure/error path correctly when there is
>         no state available for s2idle.
>         - Add documentation.
>
> Some platforms support multiple low-power states for CPUs that can be used
> when entering system-wide suspend and s2idle in particular. Currently we are
> always selecting the deepest possible state for the CPUs, which can break the
> system-wakeup latency constraint that may be required for some use-cases.
>
> Therefore, this series suggests introducing a new interface for user-space,
> allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> taken into account when selecting a suitable low-power state for s2idle.

Last time we discussed this I said I would like the new limit to be
taken into account by regular "runtime" cpuidle because the "s2idle"
limit should not be less than the "runtime" limit (or at least it
would be illogical if that happened).

It looks like that could be implemented by making
cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
into account, couldn't it?
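
A minimal sketch of that idea (untested), based on the current helper in
drivers/cpuidle/governor.c:

s64 cpuidle_governor_latency_req(unsigned int cpu)
{
	struct device *device = get_cpu_device(cpu);
	int device_req = dev_pm_qos_raw_resume_latency(device);
	int global_req = cpu_latency_qos_limit();
	int wakeup_req = cpu_wakeup_latency_qos_limit();

	if (device_req > global_req)
		device_req = global_req;

	/* Never allow the "runtime" limit to exceed the "s2idle" one. */
	if (device_req > wakeup_req)
		device_req = wakeup_req;

	return (s64)device_req * NSEC_PER_USEC;
}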


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-16 15:19 ` [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle Ulf Hansson
@ 2025-10-30 10:44   ` Rafael J. Wysocki
  2025-10-30 12:00     ` Ulf Hansson
  2025-11-01  0:11   ` Kevin Hilman
  1 sibling, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 10:44 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> A CPU system-wakeup QoS limit may have been requested by user-space. To
> avoid breaking this constraint when entering a low-power state during
> s2idle through genpd, let's extend the corresponding genpd governor for
> CPUs. More precisely, during s2idle let the genpd governor select a
> suitable low-power state, by taking into account the QoS limit.
>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>
> Changes in v2:
>         - Limit the change to the genpd governor for CPUs.
>
> ---
>  drivers/pmdomain/core.c     | 10 ++++++++--
>  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
>  include/linux/pm_domain.h   |  1 +
>  3 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 61c2277c9ce3..4fd546ef0448 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
>                         return;
>         }
>
> -       /* Choose the deepest state when suspending */
> -       genpd->state_idx = genpd->state_count - 1;
> +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> +                       return;
> +       } else {
> +               /* Default to the deepest state. */
> +               genpd->state_idx = genpd->state_count - 1;
> +       }
> +
>         if (_genpd_power_off(genpd, false)) {
>                 genpd->states[genpd->state_idx].rejected++;
>                 return;
> diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> index 39359811a930..bd1b9d66d4a5 100644
> --- a/drivers/pmdomain/governor.c
> +++ b/drivers/pmdomain/governor.c
> @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
>         return false;
>  }
>
> +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> +{
> +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;

I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
into account directly.

It should be told by cpuidle which state has been selected on the CPU
side and it should not go any deeper than that anyway.

> +       struct generic_pm_domain *genpd = pd_to_genpd(pd);
> +       int state_idx = genpd->state_count - 1;
> +
> +       if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) {
> +               genpd->state_idx = state_idx;
> +               return true;
> +       }
> +
> +       /* Find the deepest state for the latency constraint. */
> +       while (state_idx >= 0) {
> +               s64 latency_ns = genpd->states[state_idx].power_off_latency_ns +
> +                                genpd->states[state_idx].power_on_latency_ns;
> +
> +               if (latency_ns <= constraint_ns) {
> +                       genpd->state_idx = state_idx;
> +                       return true;
> +               }
> +               state_idx--;
> +       }
> +
> +       return false;
> +}
> +
>  struct dev_power_governor pm_domain_cpu_gov = {
>         .suspend_ok = default_suspend_ok,
>         .power_down_ok = cpu_power_down_ok,
> +       .system_power_down_ok = cpu_system_power_down_ok,
>  };
>  #endif
>
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index f67a2cb7d781..93ba0143ca47 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -153,6 +153,7 @@ enum genpd_sync_state {
>  };
>
>  struct dev_power_governor {
> +       bool (*system_power_down_ok)(struct dev_pm_domain *domain);
>         bool (*power_down_ok)(struct dev_pm_domain *domain);
>         bool (*suspend_ok)(struct device *dev);
>  };
> --
> 2.43.0
>


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 10:44   ` Rafael J. Wysocki
@ 2025-10-30 12:00     ` Ulf Hansson
  2025-10-30 12:23       ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 12:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > avoid breaking this constraint when entering a low-power state during
> > s2idle through genpd, let's extend the corresponding genpd governor for
> > CPUs. More precisely, during s2idle let the genpd governor select a
> > suitable low-power state, by taking into account the QoS limit.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v2:
> >         - Limit the change to the genpd governor for CPUs.
> >
> > ---
> >  drivers/pmdomain/core.c     | 10 ++++++++--
> >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> >  include/linux/pm_domain.h   |  1 +
> >  3 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > index 61c2277c9ce3..4fd546ef0448 100644
> > --- a/drivers/pmdomain/core.c
> > +++ b/drivers/pmdomain/core.c
> > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> >                         return;
> >         }
> >
> > -       /* Choose the deepest state when suspending */
> > -       genpd->state_idx = genpd->state_count - 1;
> > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > +                       return;
> > +       } else {
> > +               /* Default to the deepest state. */
> > +               genpd->state_idx = genpd->state_count - 1;
> > +       }
> > +
> >         if (_genpd_power_off(genpd, false)) {
> >                 genpd->states[genpd->state_idx].rejected++;
> >                 return;
> > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > index 39359811a930..bd1b9d66d4a5 100644
> > --- a/drivers/pmdomain/governor.c
> > +++ b/drivers/pmdomain/governor.c
> > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> >         return false;
> >  }
> >
> > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > +{
> > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
>
> I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> into account directly.
>
> It should be told by cpuidle which state has been selected on the CPU
> side and it should not go any deeper than that anyway.

For PSCI OS-initiated mode, cpuidle doesn't know about the states that
may be shared among a group of CPUs.

Instead, those states are controlled through the PM domain topology by
genpd and its governor, hence this is needed too.

>
> > +       struct generic_pm_domain *genpd = pd_to_genpd(pd);
> > +       int state_idx = genpd->state_count - 1;
> > +
> > +       if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN)) {
> > +               genpd->state_idx = state_idx;
> > +               return true;
> > +       }
> > +
> > +       /* Find the deepest state for the latency constraint. */
> > +       while (state_idx >= 0) {
> > +               s64 latency_ns = genpd->states[state_idx].power_off_latency_ns +
> > +                                genpd->states[state_idx].power_on_latency_ns;
> > +
> > +               if (latency_ns <= constraint_ns) {
> > +                       genpd->state_idx = state_idx;
> > +                       return true;
> > +               }
> > +               state_idx--;
> > +       }
> > +
> > +       return false;
> > +}
> > +
> >  struct dev_power_governor pm_domain_cpu_gov = {
> >         .suspend_ok = default_suspend_ok,
> >         .power_down_ok = cpu_power_down_ok,
> > +       .system_power_down_ok = cpu_system_power_down_ok,
> >  };
> >  #endif
> >
> > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> > index f67a2cb7d781..93ba0143ca47 100644
> > --- a/include/linux/pm_domain.h
> > +++ b/include/linux/pm_domain.h
> > @@ -153,6 +153,7 @@ enum genpd_sync_state {
> >  };
> >
> >  struct dev_power_governor {
> > +       bool (*system_power_down_ok)(struct dev_pm_domain *domain);
> >         bool (*power_down_ok)(struct dev_pm_domain *domain);
> >         bool (*suspend_ok)(struct device *dev);
> >  };
> > --
> > 2.43.0
> >

Kind regards
Uffe


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-29 14:52 ` [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Rafael J. Wysocki
@ 2025-10-30 12:22   ` Ulf Hansson
  2025-10-30 12:26     ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 12:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > Changes in v2:
> >         - Limit the new QoS to CPUs and make some corresponding renaming of the
> >         functions along with the name of the device node for user space.
> >         - Make sure we deal with the failure/error path correctly when there is
> >         no state available for s2idle.
> >         - Add documentation.
> >
> > Some platforms support multiple low-power states for CPUs that can be used
> > when entering system-wide suspend and s2idle in particular. Currently we are
> > always selecting the deepest possible state for the CPUs, which can break the
> > system-wakeup latency constraint that may be required for some use-cases.
> >
> > Therefore, this series suggests introducing a new interface for user-space,
> > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > taken into account when selecting a suitable low-power state for s2idle.
>
> Last time we discussed this I said I would like the new limit to be
> taken into account by regular "runtime" cpuidle because the "s2idle"
> limit should not be less that the "runtime" limit (or at least it
> would be illogical if that happened).

Yes, we discussed this, but that was also before we concluded to add a
new file for user-space to operate on, after all.

To me, it looks unnecessarily limiting to not allow them to be
orthogonal, but I am not insisting that it needs to be like this. I
was just thinking that we do not necessarily have to care about the
same use-case in runtime as in the system-suspend state. Moreover,
nothing would prevent user-space from applying the same constraint to
both of them, if that is needed.

>
> It looks like that could be implemented by making
> cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> into account, couldn't it?

Right, but I am not sure we want that. See above.

Kind regards
Uffe


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 12:00     ` Ulf Hansson
@ 2025-10-30 12:23       ` Rafael J. Wysocki
  2025-10-30 12:32         ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 12:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > avoid breaking this constraint when entering a low-power state during
> > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > suitable low-power state, by taking into account the QoS limit.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> > >
> > > Changes in v2:
> > >         - Limit the change to the genpd governor for CPUs.
> > >
> > > ---
> > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > >  include/linux/pm_domain.h   |  1 +
> > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > index 61c2277c9ce3..4fd546ef0448 100644
> > > --- a/drivers/pmdomain/core.c
> > > +++ b/drivers/pmdomain/core.c
> > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > >                         return;
> > >         }
> > >
> > > -       /* Choose the deepest state when suspending */
> > > -       genpd->state_idx = genpd->state_count - 1;
> > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > +                       return;
> > > +       } else {
> > > +               /* Default to the deepest state. */
> > > +               genpd->state_idx = genpd->state_count - 1;
> > > +       }
> > > +
> > >         if (_genpd_power_off(genpd, false)) {
> > >                 genpd->states[genpd->state_idx].rejected++;
> > >                 return;
> > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > index 39359811a930..bd1b9d66d4a5 100644
> > > --- a/drivers/pmdomain/governor.c
> > > +++ b/drivers/pmdomain/governor.c
> > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > >         return false;
> > >  }
> > >
> > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > +{
> > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> >
> > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > into account directly.
> >
> > It should be told by cpuidle which state has been selected on the CPU
> > side and it should not go any deeper than that anyway.
>
> For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> may be shared among a group of CPUs.
>
> Instead, those states are controlled through the PM domain topology by
> genpd and its governor, hence this is needed too.

All right, but I'd like to understand how all of that works.

So cpuidle selects a state to enter for the given CPU and then genpd
is invoked.  It has to take the exit latency of that state into
account, so it doesn't go too deep.  How does it do that?


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 12:22   ` Ulf Hansson
@ 2025-10-30 12:26     ` Rafael J. Wysocki
  2025-10-30 12:29       ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 12:26 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > Changes in v2:
> > >         - Limit the new QoS to CPUs and make some corresponding renaming of the
> > >         functions along with the name of the device node for user space.
> > >         - Make sure we deal with the failure/error path correctly when there is
> > >         no state available for s2idle.
> > >         - Add documentation.
> > >
> > > Some platforms support multiple low-power states for CPUs that can be used
> > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > always selecting the deepest possible state for the CPUs, which can break the
> > > system-wakeup latency constraint that may be required for some use-cases.
> > >
> > > Therefore, this series suggests to introduce a new interface for user-space,
> > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > taken into account when selecting a suitable low-power state for s2idle.
> >
> > Last time we discussed this I said I would like the new limit to be
> > taken into account by regular "runtime" cpuidle because the "s2idle"
> > limit should not be less that the "runtime" limit (or at least it
> > would be illogical if that happened).
>
> Yes, we discussed this, but that was also before we concluded to add a
> new file for user-space to operate on after all.
>
> To me, it looks unnecessarily limiting to not allow them to be
> orthogonal,

So what's the use case in which it makes sense to have a lower latency
limit for s2idle than for runtime?

> but I am not insisting that it needs to be like this. I
> was just thinking that we do not necessarily have to care about the
> same use-case in runtime as in the system-suspend state. Moreover,
> nothing would prevent user-space from applying the same constraint to
> both of them, if that is needed.
>
> >
> > It looks like that could be implemented by making
> > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > into account, couldn't it?
>
> Right, but I am not sure we want that. See above.

I do, or I need to be convinced that this is a bad idea.


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 12:26     ` Rafael J. Wysocki
@ 2025-10-30 12:29       ` Rafael J. Wysocki
  2025-10-30 12:43         ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 12:29 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > Changes in v2:
> > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > >         functions along with name of the device node for user space.
> > > >         - Make sure we deal with the failure/error path correctly when there are
> > > >         no state available for s2idle.
> > > >         - Add documentation.
> > > >
> > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > system-wakeup latency constraint that may be required for some use-cases.
> > > >
> > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > taken into account when selecting a suitable low-power state for s2idle.
> > >
> > > Last time we discussed this I said I would like the new limit to be
> > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > limit should not be less than the "runtime" limit (or at least it
> > > would be illogical if that happened).
> >
> > Yes, we discussed this, but that was also before we concluded to add a
> > new file for user-space to operate on after all.
> >
> > To me, it looks unnecessarily limiting to not allow them to be
> > orthogonal,
>
> So what's the use case in which it makes sense to have a lower latency
> limit for s2idle than for runtime?
>
> > but I am not insisting that it needs to be like this. I
> > was just thinking that we do not necessarily have to care about the
> > same use-case in runtime as in the system-suspend state. Moreover,
> > nothing would prevent user-space from applying the same constraint to
> > both of them, if that is needed.
> >
> > >
> > > It looks like that could be implemented by making
> > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > into account, couldn't it?
> >
> > Right, but I am not sure we want that. See above.
>
> I do or I need to be convinced that this is a bad idea.

And there is a specific reason why I want that.

Namely, say somebody wants to set the same limit for both s2idle and
"runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
would need to open two device special files and write the same value
to both of them.  Otherwise, they just need to use the s2idle limit
and it will work for "runtime" automatically.
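
For illustration, a minimal user-space sketch of the two-file variant
(the /dev/cpu_wakeup_latency name is only an assumption based on this
thread; /dev/cpu_dma_latency is the existing runtime interface, and
both requests last only as long as the fds stay open):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t limit_us = 100;	/* example wakeup-latency limit in us */
	int runtime_fd, s2idle_fd;

	runtime_fd = open("/dev/cpu_dma_latency", O_WRONLY);
	s2idle_fd = open("/dev/cpu_wakeup_latency", O_WRONLY);	/* assumed name */
	if (runtime_fd < 0 || s2idle_fd < 0) {
		perror("open");
		return 1;
	}

	/* The same value has to be written twice, once per file. */
	write(runtime_fd, &limit_us, sizeof(limit_us));
	write(s2idle_fd, &limit_us, sizeof(limit_us));

	pause();	/* hold the fds open; closing drops the requests */
	return 0;
}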


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 12:23       ` Rafael J. Wysocki
@ 2025-10-30 12:32         ` Ulf Hansson
  2025-10-30 14:02           ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 12:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > avoid breaking this constraint when entering a low-power state during
> > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > suitable low-power state, by taking into account the QoS limit.
> > > >
> > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > ---
> > > >
> > > > Changes in v2:
> > > >         - Limit the change to the genpd governor for CPUs.
> > > >
> > > > ---
> > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > >  include/linux/pm_domain.h   |  1 +
> > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > --- a/drivers/pmdomain/core.c
> > > > +++ b/drivers/pmdomain/core.c
> > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > >                         return;
> > > >         }
> > > >
> > > > -       /* Choose the deepest state when suspending */
> > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > +                       return;
> > > > +       } else {
> > > > +               /* Default to the deepest state. */
> > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > +       }
> > > > +
> > > >         if (_genpd_power_off(genpd, false)) {
> > > >                 genpd->states[genpd->state_idx].rejected++;
> > > >                 return;
> > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > --- a/drivers/pmdomain/governor.c
> > > > +++ b/drivers/pmdomain/governor.c
> > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > >         return false;
> > > >  }
> > > >
> > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > +{
> > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > >
> > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > into account directly.
> > >
> > > It should be told by cpuidle which state has been selected on the CPU
> > > side and it should not go any deeper than that anyway.
> >
> > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > may be shared among a group of CPUs.
> >
> > Instead, those states are controlled through the PM domain topology by
> > genpd and its governor, hence this is needed too.
>
> All right, but I'd like to understand how all of that works.
>
> So cpuidle selects a state to enter for the given CPU and then genpd
> is invoked.  It has to take the exit latency of that state into
> account, so it doesn't go too deep.  How does it do that?

Depending on the state selected, in cpuidle-psci.c we may end up
calling __psci_enter_domain_idle_state() (only for the deepest
CPU-state).

For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
the reference counting of the PM domains via genpd. This may then lead
to genpd_sync_power_off() trying to select a state by calling the new
governor function above.
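
As a rough sketch of that flow (simplified, using only the calls named
above and in the diff):

/*
 * s2idle enter
 *   -> cpuidle selects the deepest CPU idle state
 *     -> __psci_enter_domain_idle_state()     [cpuidle-psci.c]
 *       -> dev_pm_genpd_suspend()             [drops the PM domain refcount]
 *         -> genpd_sync_power_off()
 *           -> gov->system_power_down_ok()    [new hook: picks a
 *              genpd->state_idx within the QoS limit, or returns false
 *              and the domain is left on]
 *           -> _genpd_power_off()
 */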

Did that make sense?

Kind regards
Uffe


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 12:29       ` Rafael J. Wysocki
@ 2025-10-30 12:43         ` Ulf Hansson
  2025-10-30 14:06           ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 12:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 13:29, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > Changes in v2:
> > > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > > >         functions along with name of the device node for user space.
> > > > >         - Make sure we deal with the failure/error path correctly when there are
> > > > >         no state available for s2idle.
> > > > >         - Add documentation.
> > > > >
> > > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > > system-wakeup latency constraint that may be required for some use-cases.
> > > > >
> > > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > > taken into account when selecting a suitable low-power state for s2idle.
> > > >
> > > > Last time we discussed this I said I would like the new limit to be
> > > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > > limit should not be less than the "runtime" limit (or at least it
> > > > would be illogical if that happened).
> > >
> > > Yes, we discussed this, but that was also before we concluded to add a
> > > new file for user-space to operate on after all.
> > >
> > > To me, it looks unnecessarily limiting to not allow them to be
> > > orthogonal,
> >
> > So what's the use case in which it makes sense to have a lower latency
> > limit for s2idle than for runtime?

Honestly, I don't know, but I just wanted to keep things more flexible.

> >
> > > but I am not insisting that it needs to be like this. I
> > > was just thinking that we do not necessarily have to care about the
> > > same use-case in runtime as in the system-suspend state. Moreover,
> > > nothing would prevent user-space from applying the same constraint to
> > > both of them, if that is needed.
> > >
> > > >
> > > > It looks like that could be implemented by making
> > > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > > into account, couldn't it?
> > >
> > > Right, but I am not sure we want that. See above.
> >
> > I do or I need to be convinced that this is a bad idea.
>
> And there is a specific reason why I want that.
>
> Namely, say somebody wants to set the same limit for both s2idle and
> "runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
> would need to open two device special files and write the same value
> to both of them.  Otherwise, they just need to use the s2idle limit
> and it will work for "runtime" automatically.

Right. User-space would need to open two files instead of one, but is
that really a problem?

What if user-space doesn't want to affect the runtime state-selection,
but cares only about a use-case that requires a cpu-wakeup constraint
when resuming from s2idle?

It's your call, I can change if you prefer, np!

Kind regards
Uffe


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 12:32         ` Ulf Hansson
@ 2025-10-30 14:02           ` Rafael J. Wysocki
  2025-10-30 15:06             ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 14:02 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 1:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > avoid breaking this constraint when entering a low-power state during
> > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > suitable low-power state, by taking into account the QoS limit.
> > > > >
> > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > >         - Limit the change to the genpd governor for CPUs.
> > > > >
> > > > > ---
> > > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > > >  include/linux/pm_domain.h   |  1 +
> > > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > > --- a/drivers/pmdomain/core.c
> > > > > +++ b/drivers/pmdomain/core.c
> > > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > > >                         return;
> > > > >         }
> > > > >
> > > > > -       /* Choose the deepest state when suspending */
> > > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > > +                       return;
> > > > > +       } else {
> > > > > +               /* Default to the deepest state. */
> > > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > > +       }
> > > > > +
> > > > >         if (_genpd_power_off(genpd, false)) {
> > > > >                 genpd->states[genpd->state_idx].rejected++;
> > > > >                 return;
> > > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > > --- a/drivers/pmdomain/governor.c
> > > > > +++ b/drivers/pmdomain/governor.c
> > > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > > >         return false;
> > > > >  }
> > > > >
> > > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > > +{
> > > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > > >
> > > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > > into account directly.
> > > >
> > > > It should be told by cpuidle which state has been selected on the CPU
> > > > side and it should not go any deeper than that anyway.
> > >
> > > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > > may be shared among a group of CPUs.
> > >
> > > Instead, those states are controlled through the PM domain topology by
> > > genpd and its governor, hence this is needed too.
> >
> > All right, but I'd like to understand how all of that works.
> >
> > So cpuidle selects a state to enter for the given CPU and then genpd
> > is invoked.  It has to take the exit latency of that state into
> > account, so it doesn't go too deep.  How does it do that?
>
> Depending on the state selected, in cpuidle-psci.c we may end up
> calling __psci_enter_domain_idle_state() (only for the deepest
> CPU-state).
>
> For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
> the reference counting of the PM domains via genpd. This then may lead
> to that genpd_sync_power_off() tries to select a state by calling the
> new governor function above.
>
> Did that make sense?

So IIUC this will only happen if the deepest idle state is selected, in
which case the cpu_wakeup_latency_qos_limit() value is greater than
the exit latency of that state, but it may still need to be taken into
account when selecting the domain state.  However, this means that the
exit latency number for the deepest idle state is too low (it should
represent the worst-case exit latency which means the maximum domain
exit latency in this particular case).

Moreover, it looks like the "runtime" cpuidle has the same problem, doesn't it?


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 12:43         ` Ulf Hansson
@ 2025-10-30 14:06           ` Rafael J. Wysocki
  2025-10-30 15:12             ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 14:06 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 1:44 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 30 Oct 2025 at 13:29, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > >
> > > > > > Changes in v2:
> > > > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > > > >         functions along with name of the device node for user space.
> > > > > >         - Make sure we deal with the failure/error path correctly when there are
> > > > > >         no state available for s2idle.
> > > > > >         - Add documentation.
> > > > > >
> > > > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > > > system-wakeup latency constraint that may be required for some use-cases.
> > > > > >
> > > > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > > > taken into account when selecting a suitable low-power state for s2idle.
> > > > >
> > > > > Last time we discussed this I said I would like the new limit to be
> > > > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > > > limit should not be less than the "runtime" limit (or at least it
> > > > > would be illogical if that happened).
> > > >
> > > > Yes, we discussed this, but that was also before we concluded to add a
> > > > new file for user-space to operate on after all.
> > > >
> > > > To me, it looks unnecessarily limiting to not allow them to be
> > > > orthogonal,
> > >
> > > So what's the use case in which it makes sense to have a lower latency
> > > limit for s2idle than for runtime?
>
> Honestly, I don't know, but I just wanted to keep things more flexible.
>
> > >
> > > > but I am not insisting that it needs to be like this. I
> > > > was just thinking that we do not necessarily have to care about the
> > > > same use-case in runtime as in the system-suspend state. Moreover,
> > > > nothing would prevent user-space from applying the same constraint to
> > > > both of them, if that is needed.
> > > >
> > > > >
> > > > > It looks like that could be implemented by making
> > > > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > > > into account, couldn't it?
> > > >
> > > > Right, but I am not sure we want that. See above.
> > >
> > > I do or I need to be convinced that this is a bad idea.
> >
> > And there is a specific reason why I want that.
> >
> > Namely, say somebody wants to set the same limit for both s2idle and
> > "runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
> > would need to open two device special files and write the same value
> > to both of them.  Otherwise, they just need to use the s2idle limit
> > and it will work for "runtime" automatically.
>
> Right. User-space would need to open two files instead of one, but is
> that really a problem?

It is potentially confusing and error-prone.

> What if user-space doesn't want to affect the runtime state-selection,
> but cares only about a use-case that requires a cpu-wakeup constraint
> when resuming from s2idle.

Well, I'm not sure if this use case exists at all, which is key here.
If it doesn't exist, why make provisions for it?


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 14:02           ` Rafael J. Wysocki
@ 2025-10-30 15:06             ` Ulf Hansson
  2025-10-30 18:11               ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 15:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 15:02, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 1:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > >
> > > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > > avoid breaking this constraint when entering a low-power state during
> > > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > > suitable low-power state, by taking into account the QoS limit.
> > > > > >
> > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > ---
> > > > > >
> > > > > > Changes in v2:
> > > > > >         - Limit the change to the genpd governor for CPUs.
> > > > > >
> > > > > > ---
> > > > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > > > >  include/linux/pm_domain.h   |  1 +
> > > > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > > > --- a/drivers/pmdomain/core.c
> > > > > > +++ b/drivers/pmdomain/core.c
> > > > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > > > >                         return;
> > > > > >         }
> > > > > >
> > > > > > -       /* Choose the deepest state when suspending */
> > > > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > > > +                       return;
> > > > > > +       } else {
> > > > > > +               /* Default to the deepest state. */
> > > > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > > > +       }
> > > > > > +
> > > > > >         if (_genpd_power_off(genpd, false)) {
> > > > > >                 genpd->states[genpd->state_idx].rejected++;
> > > > > >                 return;
> > > > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > > > --- a/drivers/pmdomain/governor.c
> > > > > > +++ b/drivers/pmdomain/governor.c
> > > > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > > > >         return false;
> > > > > >  }
> > > > > >
> > > > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > > > +{
> > > > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > > > >
> > > > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > > > into account directly.
> > > > >
> > > > > It should be told by cpuidle which state has been selected on the CPU
> > > > > side and it should not go any deeper than that anyway.
> > > >
> > > > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > > > may be shared among a group of CPUs.
> > > >
> > > > Instead, those states are controlled through the PM domain topology by
> > > > genpd and its governor, hence this is needed too.
> > >
> > > All right, but I'd like to understand how all of that works.
> > >
> > > So cpuidle selects a state to enter for the given CPU and then genpd
> > > is invoked.  It has to take the exit latency of that state into
> > > account, so it doesn't go too deep.  How does it do that?
> >
> > Depending on the state selected, in cpuidle-psci.c we may end up
> > calling __psci_enter_domain_idle_state() (only for the deepest
> > CPU-state).
> >
> > For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
> > the reference counting of the PM domains via genpd. This then may lead
> > to that genpd_sync_power_off() tries to select a state by calling the
> > new governor function above.
> >
> > Did that make sense?
>
> So IIUC this will only happen if the deepest idle state is selected in
> which case the cpu_wakeup_latency_qos_limit() value is greater than
> the exit latency of that state, but it may still need to be taken into
> account when selecting the domain state.  However, this means that the

Correct.

> exit latency number for the deepest idle state is too low (it should
> represent the worst-case exit latency which means the maximum domain
> exit latency in this particular case).

Yes, from the cpuidle state-selection point of view, but how is that a problem?

If the genpd governor doesn't find a suitable "domain-idle-state", we
fall back to using the one cpuidle selected.

>
> Moreover, it looks like the "runtime" cpuidle has the same problem, doesn't it?

It works in a very similar way, but I fail to understand why you think
there is a problem.

Kind regards
Uffe


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 14:06           ` Rafael J. Wysocki
@ 2025-10-30 15:12             ` Ulf Hansson
  2025-10-30 16:36               ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-30 15:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 15:06, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 1:44 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 30 Oct 2025 at 13:29, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > > > > >         functions along with name of the device node for user space.
> > > > > > >         - Make sure we deal with the failure/error path correctly when there are
> > > > > > >         no state available for s2idle.
> > > > > > >         - Add documentation.
> > > > > > >
> > > > > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > > > > system-wakeup latency constraint that may be required for some use-cases.
> > > > > > >
> > > > > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > > > > taken into account when selecting a suitable low-power state for s2idle.
> > > > > >
> > > > > > Last time we discussed this I said I would like the new limit to be
> > > > > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > > > > limit should not be less than the "runtime" limit (or at least it
> > > > > > would be illogical if that happened).
> > > > >
> > > > > Yes, we discussed this, but that was also before we concluded to add a
> > > > > new file for user-space to operate on after all.
> > > > >
> > > > > To me, it looks unnecessarily limiting to not allow them to be
> > > > > orthogonal,
> > > >
> > > > So what's the use case in which it makes sense to have a lower latency
> > > > limit for s2idle than for runtime?
> >
> > Honestly, I don't know, but I just wanted to keep things more flexible.
> >
> > > >
> > > > > but I am not insisting that it needs to be like this. I
> > > > > was just thinking that we do not necessarily have to care about the
> > > > > same use-case in runtime as in the system-suspend state. Moreover,
> > > > > nothing would prevent user-space from applying the same constraint to
> > > > > both of them, if that is needed.
> > > > >
> > > > > >
> > > > > > It looks like that could be implemented by making
> > > > > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > > > > into account, couldn't it?
> > > > >
> > > > > Right, but I am not sure we want that. See above.
> > > >
> > > > I do or I need to be convinced that this is a bad idea.
> > >
> > > And there is a specific reason why I want that.
> > >
> > > Namely, say somebody wants to set the same limit for both s2idle and
> > > "runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
> > > would need to open two device special files and write the same value
> > > to both of them.  Otherwise, they just need to use the s2idle limit
> > > and it will work for "runtime" automatically.
> >
> > Right. User-space would need to open two files instead of one, but is
> > that really a problem?
>
> It is potentially confusing and error-prone.
>
> > What if user-space doesn't want to affect the runtime state-selection,
> > but cares only about a use-case that requires a cpu-wakeup constraint
> > when resuming from s2idle.
>
> Well, I'm not sure if this use case exists at all, which is key here.
> If it doesn't exist, why make provisions for it?

Well, because it's not possible to change it afterwards, as it becomes ABI.

It would be silly having to add yet another file for userspace, down
the road, if it turns out to be needed.

Kind regards
Uffe


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 15:12             ` Ulf Hansson
@ 2025-10-30 16:36               ` Rafael J. Wysocki
  2025-10-31 10:03                 ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 16:36 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 4:13 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 30 Oct 2025 at 15:06, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:44 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 30 Oct 2025 at 13:29, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > >
> > > > > > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > >
> > > > > > > > Changes in v2:
> > > > > > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > > > > > >         functions along with name of the device node for user space.
> > > > > > > >         - Make sure we deal with the failure/error path correctly when there are
> > > > > > > >         no state available for s2idle.
> > > > > > > >         - Add documentation.
> > > > > > > >
> > > > > > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > > > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > > > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > > > > > system-wakeup latency constraint that may be required for some use-cases.
> > > > > > > >
> > > > > > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > > > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > > > > > taken into account when selecting a suitable low-power state for s2idle.
> > > > > > >
> > > > > > > Last time we discussed this I said I would like the new limit to be
> > > > > > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > > > > > limit should not be less than the "runtime" limit (or at least it
> > > > > > > would be illogical if that happened).
> > > > > >
> > > > > > Yes, we discussed this, but that was also before we concluded to add a
> > > > > > new file for user-space to operate on after all.
> > > > > >
> > > > > > To me, it looks unnecessarily limiting to not allow them to be
> > > > > > orthogonal,
> > > > >
> > > > > So what's the use case in which it makes sense to have a lower latency
> > > > > limit for s2idle than for runtime?
> > >
> > > Honestly, I don't know, but I just wanted to keep things more flexible.
> > >
> > > > >
> > > > > > but I am not insisting that it needs to be like this. I
> > > > > > was just thinking that we do not necessarily have to care about the
> > > > > > same use-case in runtime as in the system-suspend state. Moreover,
> > > > > > nothing would prevent user-space from applying the same constraint to
> > > > > > both of them, if that is needed.
> > > > > >
> > > > > > >
> > > > > > > It looks like that could be implemented by making
> > > > > > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > > > > > into account, couldn't it?
> > > > > >
> > > > > > Right, but I am not sure we want that. See above.
> > > > >
> > > > > I do or I need to be convinced that this is a bad idea.
> > > >
> > > > And there is a specific reason why I want that.
> > > >
> > > > Namely, say somebody wants to set the same limit for both s2idle and
> > > > "runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
> > > > would need to open two device special files and write the same value
> > > > to both of them.  Otherwise, they just need to use the s2idle limit
> > > > and it will work for "runtime" automatically.
> > >
> > > Right. User-space would need to open two files instead of one, but is
> > > that really a problem?
> >
> > It is potentially confusing and error-prone.
> >
> > > What if user-space doesn't want to affect the runtime state-selection,
> > > but cares only about a use-case that requires a cpu-wakeup constraint
> > > when resuming from s2idle.
> >
> > Well, I'm not sure if this use case exists at all, which is key here.
> > If it doesn't exist, why make provisions for it?
>
> Well, because it's not possible to change afterwards as it becomes ABI.
>
> It would be silly having to add yet another file for userspace, down
> the road, if it turns out to be needed.

OTOH doing things without a good reason is also not particularly wise.


* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-29 14:28     ` Rafael J. Wysocki
@ 2025-10-30 16:45       ` Dhruva Gole
  2025-10-31 13:47         ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Dhruva Gole @ 2025-10-30 16:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ulf Hansson, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

On Oct 29, 2025 at 15:28:22 +0100, Rafael J. Wysocki wrote:
> On Wed, Oct 29, 2025 at 9:10 AM Dhruva Gole <d-gole@ti.com> wrote:
> >
> > Hi Ulf,
> >
> > On Oct 16, 2025 at 17:19:21 +0200, Ulf Hansson wrote:
> > > Some platforms supports multiple low-power states for CPUs that can be used
> > > when entering system-wide suspend. Currently we are always selecting the
> > > deepest possible state for the CPUs, which can break the system-wakeup
> > > latency constraint that may be required for some use-cases.
> > >
> > > Let's take the first step towards addressing this problem, by introducing
> > > an interface for user-space, that allows us to specify the CPU
> > > system-wakeup QoS limit. Subsequent changes will start taking into account
> > > the new QoS limit.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> > >
> > > Changes in v2:
> > >       - Renamings to reflect the QoS are limited to CPUs.
> > >       - Move code inside "CONFIG_CPU_IDLE".
> > >
> > > ---
> > >  include/linux/pm_qos.h |   5 ++
> > >  kernel/power/qos.c     | 102 +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 107 insertions(+)
> > >
> > > diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> > > index 4a69d4af3ff8..bf7524d38933 100644
> > > --- a/include/linux/pm_qos.h
> > > +++ b/include/linux/pm_qos.h
> > > @@ -149,6 +149,7 @@ bool cpu_latency_qos_request_active(struct pm_qos_request *req);
> > >  void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value);
> > >  void cpu_latency_qos_update_request(struct pm_qos_request *req, s32 new_value);
> > >  void cpu_latency_qos_remove_request(struct pm_qos_request *req);
> > > +s32 cpu_wakeup_latency_qos_limit(void);
> > >  #else
> > >  static inline s32 cpu_latency_qos_limit(void) { return INT_MAX; }
> > >  static inline bool cpu_latency_qos_request_active(struct pm_qos_request *req)
> > > @@ -160,6 +161,10 @@ static inline void cpu_latency_qos_add_request(struct pm_qos_request *req,
> > >  static inline void cpu_latency_qos_update_request(struct pm_qos_request *req,
> > >                                                 s32 new_value) {}
> > >  static inline void cpu_latency_qos_remove_request(struct pm_qos_request *req) {}
> > > +static inline s32 cpu_wakeup_latency_qos_limit(void)
> > > +{
> > > +     return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> > > +}
> > >  #endif
> > >
> > >  #ifdef CONFIG_PM
> > > diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> > > index 4244b069442e..8c024d7dc43e 100644
> > > --- a/kernel/power/qos.c
> > > +++ b/kernel/power/qos.c
> > > @@ -415,6 +415,103 @@ static struct miscdevice cpu_latency_qos_miscdev = {
> > >       .fops = &cpu_latency_qos_fops,
> > >  };
> > >
> > > +/* The CPU system wakeup latency QoS. */
> > > +static struct pm_qos_constraints cpu_wakeup_latency_constraints = {
> > > +     .list = PLIST_HEAD_INIT(cpu_wakeup_latency_constraints.list),
> > > +     .target_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .default_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .no_constraint_value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT,
> > > +     .type = PM_QOS_MIN,
> > > +};
> > > +
> > > +/**
> > > + * cpu_wakeup_latency_qos_limit - Current CPU system wakeup latency QoS limit.
> > > + *
> > > + * Returns the current CPU system wakeup latency QoS limit that may have been
> > > + * requested by user-space.
> > > + */
> > > +s32 cpu_wakeup_latency_qos_limit(void)
> > > +{
> > > +     return pm_qos_read_value(&cpu_wakeup_latency_constraints);
> > > +}
> > > +
> > > +static int cpu_wakeup_latency_qos_open(struct inode *inode, struct file *filp)
> > > +{
> > > +     struct pm_qos_request *req;
> > > +
> > > +     req = kzalloc(sizeof(*req), GFP_KERNEL);
> > > +     if (!req)
> > > +             return -ENOMEM;
> > > +
> > > +     req->qos = &cpu_wakeup_latency_constraints;
> > > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_ADD_REQ,
> > > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> > > +     filp->private_data = req;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int cpu_wakeup_latency_qos_release(struct inode *inode,
> > > +                                       struct file *filp)
> > > +{
> > > +     struct pm_qos_request *req = filp->private_data;
> > > +
> > > +     filp->private_data = NULL;
> > > +     pm_qos_update_target(req->qos, &req->node, PM_QOS_REMOVE_REQ,
> > > +                          PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
> >
> > Please excuse the delay in reviewing these patches,
> > I was wondering why we have decided here in release to reset the
> > constraints set by a user. For eg. even when I was testing the previous
> > revision locally I'd just commented out this release hook, since I
> > wanted to be able to just echo 0xABCD into /dev/cpu_wakeup_latency...
> 
> If you want "fire and forget", that would be a different interface.
> Device special files are not for that.
> 
> Cleaning up after closing a file descriptor is a safety measure and
> CPU wakeup latency constraints are a big deal.  Leaving leftover ones
> behind dead processes is not a good idea.

Hmm okay ..

> 
> > It seems an overkill to me that a userspace program be required to hold
> > open this file just to make sure the constraints are honoured for the
> > lifetime of the device. We should definitely give the freedom to just be
> > able to echo and also be able to cat and read back from the same place
> > about the latency constraint being set.
> 
> So you'd want a sysfs attribute here, but that has its own issues (the
> last writer "wins", so if there are multiple users of it with
> different needs in user space, things get tricky).

sysfs makes sense. Then would it make sense to have something like a
/sys/devices/system/cpu/cpu0/power/cpu_wakeup_latency entry?

IMHO userspace should decide how to manage its users and whom to allow
to set the latency constraint. We already have the CPU latency QoS
entry, for example, which is sysfs too.
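
Just to sketch what I mean (entirely hypothetical, and the
last-writer-wins concern you mention would apply):

/*
 * Back the attribute with a single kernel-internal request against the
 * constraint object from patch 1/4 (assumed to have been added with
 * PM_QOS_ADD_REQ at init time); having only one request is exactly
 * what makes the last writer win:
 */
static struct pm_qos_request cpu_wakeup_latency_sysfs_req;

static ssize_t cpu_wakeup_latency_store(struct device *dev,
					struct device_attribute *attr,
					const char *buf, size_t count)
{
	s32 value;

	if (kstrtos32(buf, 0, &value))
		return -EINVAL;

	pm_qos_update_target(cpu_wakeup_latency_sysfs_req.qos,
			     &cpu_wakeup_latency_sysfs_req.node,
			     PM_QOS_UPDATE_REQ, value);
	return count;
}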

> 
> > One other thing on my mind is - and probably unrelated to this specific
> > series, but I think we must have some sysfs entry either appear in
> > /sys/.../cpu0/cpuidle or s2idle/ where we can show the next feasible s2idle
> > state that the governor has chosen to enter based on the value set in
> > cpu_wakeup_latency.
> 
> Exit latency values for all states are exposed via sysfs.  Since
> s2idle always uses the deepest state it can use, it is quite
> straightforward to figure out which of them will be used going
> forward, given a specific latency constraint.

I disagree regarding the straightforward part. There can be multiple
domain hierarchies in a system, for example, and all of these domains
would have their own set of domain-idle-states, each with its own
entry, exit, and residency latencies. While testing this series I have
myself been thoroughly confused at times about which idle-state the
kernel actually picked, and had to add prints just to figure that out.

When implementing these things for the first time, especially with
many complex domain-idle-states, it would help a lot if the kernel
could just advertise somewhere what the governor is going to pick as
the next s2idle state.

Also, I am not quite sure if these latencies are exposed in the
domain-idle-states scenario ... 
I tried checking in /sys/kernel/debug/pm_genpd/XXX/ but I only see
these:
active_time  current_state  devices  idle_states  sub_domains  total_idle_time

Maybe an additional s2idle_state or something appearing here is what I
was inclined toward.


-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated


* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 15:06             ` Ulf Hansson
@ 2025-10-30 18:11               ` Rafael J. Wysocki
  2025-10-31 10:19                 ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-10-30 18:11 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Thu, Oct 30, 2025 at 4:07 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 30 Oct 2025 at 15:02, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 30, 2025 at 1:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > >
> > > > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > > > avoid breaking this constraint when entering a low-power state during
> > > > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > > > suitable low-power state, by taking into account the QoS limit.
> > > > > > >
> > > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > >         - Limit the change to the genpd governor for CPUs.
> > > > > > >
> > > > > > > ---
> > > > > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > > > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > > > > >  include/linux/pm_domain.h   |  1 +
> > > > > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > > > > --- a/drivers/pmdomain/core.c
> > > > > > > +++ b/drivers/pmdomain/core.c
> > > > > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > > > > >                         return;
> > > > > > >         }
> > > > > > >
> > > > > > > -       /* Choose the deepest state when suspending */
> > > > > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > > > > +                       return;
> > > > > > > +       } else {
> > > > > > > +               /* Default to the deepest state. */
> > > > > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > > > > +       }
> > > > > > > +
> > > > > > >         if (_genpd_power_off(genpd, false)) {
> > > > > > >                 genpd->states[genpd->state_idx].rejected++;
> > > > > > >                 return;
> > > > > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > > > > --- a/drivers/pmdomain/governor.c
> > > > > > > +++ b/drivers/pmdomain/governor.c
> > > > > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > > > > >         return false;
> > > > > > >  }
> > > > > > >
> > > > > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > > > > +{
> > > > > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > > > > >
> > > > > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > > > > into account directly.
> > > > > >
> > > > > > It should be told by cpuidle which state has been selected on the CPU
> > > > > > side and it should not go any deeper than that anyway.
> > > > >
> > > > > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > > > > may be shared among a group of CPUs.
> > > > >
> > > > > Instead, those states are controlled through the PM domain topology by
> > > > > genpd and its governor, hence this is needed too.
> > > >
> > > > All right, but I'd like to understand how all of that works.
> > > >
> > > > So cpuidle selects a state to enter for the given CPU and then genpd
> > > > is invoked.  It has to take the exit latency of that state into
> > > > account, so it doesn't go too deep.  How does it do that?
> > >
> > > Depending on the state selected, in cpuidle-psci.c we may end up
> > > calling __psci_enter_domain_idle_state() (only for the deepest
> > > CPU-state).
> > >
> > > For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
> > > the reference counting of the PM domains via genpd. This then may lead
> > > to that genpd_sync_power_off() tries to select a state by calling the
> > > new governor function above.
> > >
> > > Did that make sense?
> >
> > So IIUC this will only happen if the deepest idle state is selected in
> > which case the cpu_wakeup_latency_qos_limit() value is greater than
> > the exit latency of that state, but it may still need to be taken into
> > account when selecting the domain state.  However, this means that the
>
> Correct.
>
> > exit latency number for the deepest idle state is too low (it should
> > represent the worst-case exit latency which means the maximum domain
> > exit latency in this particular case).
>
> Yes, from the cpuidle state-selection point of view, but how is that a problem?

It is confusing.  Otherwise, for s2idle, I guess it is not a big deal.

I guess what happens is that genpd has a range of states with
different latency values to choose from and it is not practical to
expose all of them as CPU idle states, so you end up exposing just one
of them with the lowest latency value to allow cpuidle to involve
genpd often enough.

If that's the case, I'd make a note of that somewhere if I were you,
or people will routinely get confused by it.

> If the genpd-governor doesn't find a suitable "domain-idle-state", we
> fallback to using the one cpuidle selected.
>
> >
> > Moreover, it looks like the "runtime" cpuidle has the same problem, doesn't it?
>
> It works in a very similar way, but I fail to understand why you think
> there is a problem.

There is a problem because it may violate a "runtime" latency constraint.

Say you expose 2 CPU idle states, a shallow one and a genpd one.  The
advertised exit latency of the genpd state is X and the current
latency constraint is Y > X.  The genpd state is selected and genpd
doesn't look at the cpuidle_governor_latency_req() return value, so it
chooses a real state with exit latency Z > Y.
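
With made-up numbers, purely for illustration:

/*
 * Hypothetical numbers, not from any real platform:
 *
 *   shallow CPU state:  exit latency         =   50 us
 *   genpd CPU state:    advertised latency X =  200 us
 *   runtime QoS limit:  Y                    =  500 us
 *   real domain state:  exit latency Z       = 2000 us
 *
 * cpuidle compares X with Y, sees 200 < 500 and selects the genpd
 * state; genpd then powers down to a domain state whose real exit
 * latency Z = 2000 us silently exceeds the 500 us constraint.
 */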

At a minimum, genpd should be made aware of
cpuidle_governor_latency_req(), but even then cpuidle governors take
exit latency into consideration in their computations, so things may
get confused somewhat.
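
Something along these lines, perhaps (just a sketch, not from the
series; the helper name is made up and it assumes
cpuidle_governor_latency_req() can be called from the genpd governor):

static s64 cpu_system_latency_limit_ns(struct generic_pm_domain *genpd)
{
	/* Start from the s2idle wakeup QoS limit, as in patch 2/4. */
	s64 limit_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
	int cpu;

	/*
	 * Clamp it by the "runtime" latency requirement of every online
	 * CPU in the domain, so the selected domain state can never be
	 * deeper than what runtime cpuidle has been told to honour.
	 */
	for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask)
		limit_ns = min(limit_ns, cpuidle_governor_latency_req(cpu));

	return limit_ns;
}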


* Re: [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle
  2025-10-30 16:36               ` Rafael J. Wysocki
@ 2025-10-31 10:03                 ` Ulf Hansson
  0 siblings, 0 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-31 10:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 17:36, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 4:13 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 30 Oct 2025 at 15:06, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 1:44 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Thu, 30 Oct 2025 at 13:29, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 30, 2025 at 1:26 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 30, 2025 at 1:23 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > >
> > > > > > > On Wed, 29 Oct 2025 at 15:53, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > > >
> > > > > > > > > Changes in v2:
> > > > > > > > >         - Limit the new QoS to CPUs  and make some corresponding renaming of the
> > > > > > > > >         functions along with name of the device node for user space.
> > > > > > > > >         - Make sure we deal with the failure/error path correctly when there are
> > > > > > > > >         no state available for s2idle.
> > > > > > > > >         - Add documentation.
> > > > > > > > >
> > > > > > > > > Some platforms supports multiple low-power states for CPUs that can be used
> > > > > > > > > when entering system-wide suspend and s2idle in particular. Currently we are
> > > > > > > > > always selecting the deepest possible state for the CPUs, which can break the
> > > > > > > > > system-wakeup latency constraint that may be required for some use-cases.
> > > > > > > > >
> > > > > > > > > Therefore, this series suggests to introduce a new interface for user-space,
> > > > > > > > > allowing us to specify the CPU system-wakeup QoS limit. The QoS limit is then
> > > > > > > > > taken into account when selecting a suitable low-power state for s2idle.
> > > > > > > >
> > > > > > > > Last time we discussed this I said I would like the new limit to be
> > > > > > > > taken into account by regular "runtime" cpuidle because the "s2idle"
> > > > > > > > limit should not be less than the "runtime" limit (or at least it
> > > > > > > > would be illogical if that happened).
> > > > > > >
> > > > > > > Yes, we discussed this, but that was also before we concluded to add a
> > > > > > > new file for user-space to operate on after all.
> > > > > > >
> > > > > > > To me, it looks unnecessarily limiting to not allow them to be
> > > > > > > orthogonal,
> > > > > >
> > > > > > So what's the use case in which it makes sense to have a lower latency
> > > > > > limit for s2idle than for runtime?
> > > >
> > > > Honestly, I don't know, but I just wanted to keep things more flexible.
> > > >
> > > > > >
> > > > > > > but I am not insisting that it needs to be like this. I
> > > > > > > was just thinking that we do not necessarily have to care about the
> > > > > > > same use-case in runtime as in the system-suspend state. Moreover,
> > > > > > > nothing would prevent user-space from applying the same constraint to
> > > > > > > both of them, if that is needed.
> > > > > > >
> > > > > > > >
> > > > > > > > It looks like that could be implemented by making
> > > > > > > > cpuidle_governor_latency_req() take cpu_wakeup_latency_qos_limit()
> > > > > > > > into account, couldn't it?
> > > > > > >
> > > > > > > Right, but I am not sure we want that. See above.
> > > > > >
> > > > > > I do or I need to be convinced that this is a bad idea.
> > > > >
> > > > > And there is a specific reason why I want that.
> > > > >
> > > > > Namely, say somebody wants to set the same limit for both s2idle and
> > > > > "runtime" cpuidle.  If the s2idle limit did not affect "runtime", they
> > > > > would need to open two device special files and write the same value
> > > > > to both of them.  Otherwise, they just need to use the s2idle limit
> > > > > and it will work for "runtime" automatically.
> > > >
> > > > Right. User-space would need to open two files instead of one, but is
> > > > that really a problem?
> > >
> > > It is potentially confusing and error-prone.
> > >
> > > > What if user-space doesn't want to affect the runtime state-selection,
> > > > but cares only about a use-case that requires a cpu-wakeup constraint
> > > > when resuming from s2idle?
> > >
> > > Well, I'm not sure if this use case exists at all, which is key here.
> > > If it doesn't exist, why make provisions for it?
> >
> > Well, because it's not possible to change it afterwards, as it becomes ABI.
> >
> > It would be silly having to add yet another file for userspace, down
> > the road, if it turns out to be needed.
>
> OTOH doing things without a good reason is also not particularly wise.

Right. As I said, I don't have a use-case at hand at this very moment
for why we shouldn't combine the constraints as you propose.

I will send a new version, unless someone speaks up and suggests a
different approach.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-30 18:11               ` Rafael J. Wysocki
@ 2025-10-31 10:19                 ` Ulf Hansson
  0 siblings, 0 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-31 10:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Kevin Hilman,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Thu, 30 Oct 2025 at 19:11, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 4:07 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 30 Oct 2025 at 15:02, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 30, 2025 at 1:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Thu, 30 Oct 2025 at 13:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 30, 2025 at 1:00 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > >
> > > > > > On Thu, 30 Oct 2025 at 11:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 16, 2025 at 5:19 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > > > > >
> > > > > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > > > > avoid breaking this constraint when entering a low-power state during
> > > > > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > > > > suitable low-power state, by taking into account the QoS limit.
> > > > > > > >
> > > > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Changes in v2:
> > > > > > > >         - Limit the change to the genpd governor for CPUs.
> > > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/pmdomain/core.c     | 10 ++++++++--
> > > > > > > >  drivers/pmdomain/governor.c | 27 +++++++++++++++++++++++++++
> > > > > > > >  include/linux/pm_domain.h   |  1 +
> > > > > > > >  3 files changed, 36 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> > > > > > > > index 61c2277c9ce3..4fd546ef0448 100644
> > > > > > > > --- a/drivers/pmdomain/core.c
> > > > > > > > +++ b/drivers/pmdomain/core.c
> > > > > > > > @@ -1425,8 +1425,14 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
> > > > > > > >                         return;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > -       /* Choose the deepest state when suspending */
> > > > > > > > -       genpd->state_idx = genpd->state_count - 1;
> > > > > > > > +       if (genpd->gov && genpd->gov->system_power_down_ok) {
> > > > > > > > +               if (!genpd->gov->system_power_down_ok(&genpd->domain))
> > > > > > > > +                       return;
> > > > > > > > +       } else {
> > > > > > > > +               /* Default to the deepest state. */
> > > > > > > > +               genpd->state_idx = genpd->state_count - 1;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         if (_genpd_power_off(genpd, false)) {
> > > > > > > >                 genpd->states[genpd->state_idx].rejected++;
> > > > > > > >                 return;
> > > > > > > > diff --git a/drivers/pmdomain/governor.c b/drivers/pmdomain/governor.c
> > > > > > > > index 39359811a930..bd1b9d66d4a5 100644
> > > > > > > > --- a/drivers/pmdomain/governor.c
> > > > > > > > +++ b/drivers/pmdomain/governor.c
> > > > > > > > @@ -415,9 +415,36 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
> > > > > > > >         return false;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
> > > > > > > > +{
> > > > > > > > +       s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
> > > > > > >
> > > > > > > I'm not sure why genpd needs to take cpu_wakeup_latency_qos_limit()
> > > > > > > into account directly.
> > > > > > >
> > > > > > > It should be told by cpuidle which state has been selected on the CPU
> > > > > > > side and it should not go any deeper than that anyway.
> > > > > >
> > > > > > For PSCI OS-initiated mode, cpuidle doesn't know about the states that
> > > > > > may be shared among a group of CPUs.
> > > > > >
> > > > > > Instead, those states are controlled through the PM domain topology by
> > > > > > genpd and its governor, hence this is needed too.
> > > > >
> > > > > All right, but I'd like to understand how all of that works.
> > > > >
> > > > > So cpuidle selects a state to enter for the given CPU and then genpd
> > > > > is invoked.  It has to take the exit latency of that state into
> > > > > account, so it doesn't go too deep.  How does it do that?
> > > >
> > > > Depending on the state selected, in cpuidle-psci.c we may end up
> > > > calling __psci_enter_domain_idle_state() (only for the deepest
> > > > CPU-state).
> > > >
> > > > For s2idle this means we call dev_pm_genpd_suspend|resume(), to manage
> > > > the reference counting of the PM domains via genpd. This may then lead
> > > > to genpd_sync_power_off() trying to select a state by calling the
> > > > new governor function above.
> > > >
> > > > Did that make sense?
> > >
> > > So IIUC this will only happen if the deepest idle state is selected in
> > > which case the cpu_wakeup_latency_qos_limit() value is greater than
> > > the exit latency of that state, but it may still need to be taken into
> > > account when selecting the domain state.  However, this means that the
> >
> > Correct.
> >
> > > exit latency number for the deepest idle state is too low (it should
> > > represent the worst-case exit latency which means the maximum domain
> > > exit latency in this particular case).
> >
> > Yes, from the cpuidle state-selection point of view, but how is that a problem?
>
> It is confusing.  Otherwise, for s2idle, I guess it is not a big deal.
>
> I guess what happens is that genpd has a range of states with
> different latency values to choose from and it is not practical to
> expose all of them as CPU idle states, so you end up exposing just one
> of them with the lowest latency value to allow cpuidle to involve
> genpd often enough.

Yes, the states that are CPU-specific are exposed to cpuidle.

The states that are shared with other CPUs are managed by genpd,
because those need reference counting. They are not even limited to CPUs.

>
> If that's the case, I'd make a note of that somewhere if I were you,
> or people will routinely get confused by it.

Documentation is always nice.

We have DT docs and the PSCI spec, but we lack proper documentation of
the whole genpd interface. Actually, I have started working on
documentation for genpd, but haven't reached the point of submitting a
patch for it.

>
> > If the genpd-governor doesn't find a suitable "domain-idle-state", we
> > fallback to using the one cpuidle selected.
> >
> > >
> > > Moreover, it looks like the "runtime" cpuidle has the same problem, doesn't it?
> >
> > It works in a very similar way, but I fail to understand why you think
> > there is a problem.
>
> There is a problem because it may violate a "runtime" latency constraint.
>
> Say you expose 2 CPU idle states, a shallow one and a genpd one.  The
> advertised exit latency of the genpd state is X and the current
> latency constraint is Y > X.  The genpd state is selected and genpd
> doesn't look at the cpuidle_governor_latency_req() return value, so it
> chooses a real state with exit latency Z > Y.
>
> At a minimum, genpd should be made aware of
> cpuidle_governor_latency_req(), but even then cpuidle governors take
> exit latency into consideration in their computations, so things may
> get confused somewhat.

Please have a look at cpu_power_down_ok(), which is the function that
runs to select the domain-idle-state. It does take the constraints
into account during runtime, even though it doesn't call
cpuidle_governor_latency_req() explicitly.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS
  2025-10-16 15:19 ` [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS Ulf Hansson
@ 2025-10-31 10:57   ` Dhruva Gole
  2025-10-31 13:29     ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Dhruva Gole @ 2025-10-31 10:57 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel,
	Jonathan Corbet

Hi Ulf,

On Oct 16, 2025 at 17:19:24 +0200, Ulf Hansson wrote:
> Let's document how the new CPU system-wakeup latency QoS can be used from
> user space, along with how the corresponding latency constraint gets
> respected during s2idle.
> 
> Cc: Jonathan Corbet <corbet@lwn.net>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes in v2:
> 	- New patch.

Similar to how I did for v1 RFC,
I have applied this series on a ti-linux-6.12 branch[1] and have been testing on
the TI K3 AM62L device, my 2 cents:

> 
> ---
>  Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
>  Documentation/power/pm_qos_interface.rst | 9 +++++----
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
> index 0c090b076224..3f6f79a9bc8f 100644
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
>  they are allowed to select for that CPU.  They should never select any idle
>  states with exit latency beyond that limit.
>  
> +While the above CPU QoS constraints applies to a running system, user space may
> +also request a CPU system-wakeup latency QoS limit, via the `cpu_wakeup_latency`
> +file.  This QoS constraint is respected when selecting a suitable idle state
> +for the CPUs, while entering the system-wide suspend-to-idle sleep state.
> +
> +Note that, in regards how to manage the QoS request from user space for the
> +`cpu_wakeup_latency` file, it works according to the `cpu_dma_latency` file.
>  
>  Idle States Control Via Kernel Command Line
>  ===========================================
> diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
> index 5019c79c7710..723f4714691a 100644
> --- a/Documentation/power/pm_qos_interface.rst
> +++ b/Documentation/power/pm_qos_interface.rst
> @@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
>  
>  From user space:
>  
> -The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
> +The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
> +the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup

If others are interested to test this out, I have a quick and dirty C
program here that you can compile on the target to test setting
constraints [2]
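
A minimal sketch of such a tool (hypothetical; it assumes the node
takes a raw s32 and, like cpu_dma_latency, interprets it in
microseconds, with the request held only for as long as the fd stays
open):

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char *argv[])
	{
		int32_t latency_us;
		int fd;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <latency-us>\n", argv[0]);
			return 1;
		}
		/* Base 0, so both "4096" and "0x1000" work. */
		latency_us = (int32_t)strtol(argv[1], NULL, 0);

		fd = open("/dev/cpu_wakeup_latency", O_WRONLY);
		if (fd < 0) {
			perror("open /dev/cpu_wakeup_latency");
			return 1;
		}
		if (write(fd, &latency_us, sizeof(latency_us)) !=
		    sizeof(latency_us)) {
			perror("write");
			close(fd);
			return 1;
		}

		/* The request is dropped when the fd is closed, so hold it. */
		printf("holding %d us, press Ctrl-C to release\n", latency_us);
		pause();

		close(fd);
		return 0;
	}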

>  latency QoS.
>  
>  Only processes can register a PM QoS request.  To provide for automatic
> @@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
>  parameter requests as follows.
>  
>  To register the default PM QoS target for the CPU latency QoS, the process must
> -open /dev/cpu_dma_latency.
> +open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
> +process must open /dev/cpu_wakeup_latency.
>  
>  As long as the device node is held open that process has a registered
>  request on the parameter.
>  
>  To change the requested target value, the process needs to write an s32 value to
>  the open device node.  Alternatively, it can write a hex string for the value
> -using the 10 char long format e.g. "0x12345678".  This translates to a
> -cpu_latency_qos_update_request() call.
> +using the 10 char long format e.g. "0x12345678".

Here, can we please also mention the units ns or msec? I see that you
might have changed from usec to nsec from v1->v2, which may not be obvious to
everyone at first glance.

Also, in my local setup I have a single-CPU system with the following
low-power states:

8<----------------------------------------------------------------------------
	idle-states {
		entry-method = "psci";

		CLST_STBY: STBY {
			compatible = "arm,idle-state";
			idle-state-name = "Standby";
			arm,psci-suspend-param = <0x00000001>;
			entry-latency-us = <300>;
			exit-latency-us = <600>;
			min-residency-us = <1000>;
		};
	};
[...]
	domain-idle-states {
		main_sleep_0: main-deep-sleep {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x13333>;
			entry-latency-us = <1000>;
			exit-latency-us = <1000>;
			min-residency-us = <500000>;
			local-timer-stop;
		};

		main_sleep_1: main-sleep-rtcddr {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x12333>;
			local-timer-stop;
			entry-latency-us = <300000>;
			exit-latency-us = <600000>;
			min-residency-us = <1000000>;
		};
	};


---------------------------------------------------------------------->8

Now, when I set the latency constraint 0x7a110 into cpu_wakeup_latency,
I expect it _not_ to pick main_sleep_0 because it has min-residency of
0x7A120 (500000 us), and since 0x7a110 < 0x7a120 I expect the governor
to pick the least-latency state of the cpu, which is CLST_STBY or
maybe just kernel WFI (which is the default lowest possible idle state?).

I decided to go even lower with just setting 0x1000 (4096), but even
then s2idle picked main_sleep_0!

Only after I set something very very low like 0x1 or 0x10 did it pick
a shallower state than main_sleep_0...

I haven't dug deeper into where things might be getting miscalculated
yet but just thought to share my experiments with you before you respin
the next rev. Curious to know if I may be just confusing the units or am
missing something obvious here?

One of the other things that I tried that _did_ work was setting the
constraint to 0x1312D00 (20000000), which is obviously much higher than
the highest min-residency, and then I can see s2idle pick the deepest
state, i.e. main_sleep_1. So that worked as expected.

In conclusion, I am happy that this still works in a way that I am able to
switch between low power states, but just not in the most explainable
way :(

[1] https://github.com/DhruvaG2000/dbg-linux/tree/tiL6.12-am62l-s2idle-prep-v2
[2] https://gist.github.com/DhruvaG2000/a902b815b5db296bb7096ad7cb093929

-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS
  2025-10-31 10:57   ` Dhruva Gole
@ 2025-10-31 13:29     ` Ulf Hansson
  0 siblings, 0 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-10-31 13:29 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel,
	Jonathan Corbet

On Fri, 31 Oct 2025 at 11:57, Dhruva Gole <d-gole@ti.com> wrote:
>
> Hi Ulf,
>
> On Oct 16, 2025 at 17:19:24 +0200, Ulf Hansson wrote:
> > Let's document how the new CPU system-wakeup latency QoS can be used from
> > user space, along with how the corresponding latency constraint gets
> > respected during s2idle.
> >
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v2:
> >       - New patch.
>
> Similar to how I did for v1 RFC,
> I have applied this series on a ti-linux-6.12 branch[1] and have been testing on
> the TI K3 AM62L device, my 2 cents:
>
> >
> > ---
> >  Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
> >  Documentation/power/pm_qos_interface.rst | 9 +++++----
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
> > index 0c090b076224..3f6f79a9bc8f 100644
> > --- a/Documentation/admin-guide/pm/cpuidle.rst
> > +++ b/Documentation/admin-guide/pm/cpuidle.rst
> > @@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
> >  they are allowed to select for that CPU.  They should never select any idle
> >  states with exit latency beyond that limit.
> >
> > +While the above CPU QoS constraints applies to a running system, user space may
> > +also request a CPU system-wakeup latency QoS limit, via the `cpu_wakeup_latency`
> > +file.  This QoS constraint is respected when selecting a suitable idle state
> > +for the CPUs, while entering the system-wide suspend-to-idle sleep state.
> > +
> > +Note that, in regards how to manage the QoS request from user space for the
> > +`cpu_wakeup_latency` file, it works according to the `cpu_dma_latency` file.
> >
> >  Idle States Control Via Kernel Command Line
> >  ===========================================
> > diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
> > index 5019c79c7710..723f4714691a 100644
> > --- a/Documentation/power/pm_qos_interface.rst
> > +++ b/Documentation/power/pm_qos_interface.rst
> > @@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
> >
> >  From user space:
> >
> > -The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
> > +The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
> > +the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup
>
> If others are interested to test this out, I have a quick and dirty C
> program here that you can compile on the target to test setting
> constraints [2]
>
> >  latency QoS.
> >
> >  Only processes can register a PM QoS request.  To provide for automatic
> > @@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
> >  parameter requests as follows.
> >
> >  To register the default PM QoS target for the CPU latency QoS, the process must
> > -open /dev/cpu_dma_latency.
> > +open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
> > +process must open /dev/cpu_wakeup_latency.
> >
> >  As long as the device node is held open that process has a registered
> >  request on the parameter.
> >
> >  To change the requested target value, the process needs to write an s32 value to
> >  the open device node.  Alternatively, it can write a hex string for the value
> > -using the 10 char long format e.g. "0x12345678".  This translates to a
> > -cpu_latency_qos_update_request() call.
> > +using the 10 char long format e.g. "0x12345678".
>
> Here, can we please also mention the units ns or msec? I see that you
> might have changed from usec to nsec from v1->v2, which may not be obvious to
> everyone at first glance.

I haven't changed the unit between the versions; I'm just using the
same format as cpu_dma_latency.

Yes, I agree the unit deserves to be described, but I suggest we make
that a separate change as the unit should be described for the
existing cpu_dma_latency too.

>
> Also, in my local setup I have a single-CPU system with the following
> low-power states:
>
> 8<----------------------------------------------------------------------------
>         idle-states {
>                 entry-method = "psci";
>
>                 CLST_STBY: STBY {
>                         compatible = "arm,idle-state";
>                         idle-state-name = "Standby";
>                         arm,psci-suspend-param = <0x00000001>;
>                         entry-latency-us = <300>;
>                         exit-latency-us = <600>;
>                         min-residency-us = <1000>;
>                 };
>         };
> [...]
>         domain-idle-states {
>                 main_sleep_0: main-deep-sleep {
>                         compatible = "domain-idle-state";
>                         arm,psci-suspend-param = <0x13333>;
>                         entry-latency-us = <1000>;
>                         exit-latency-us = <1000>;
>                         min-residency-us = <500000>;
>                         local-timer-stop;
>                 };
>
>                 main_sleep_1: main-sleep-rtcddr {
>                         compatible = "domain-idle-state";
>                         arm,psci-suspend-param = <0x12333>;
>                         local-timer-stop;
>                         entry-latency-us = <300000>;
>                         exit-latency-us = <600000>;
>                         min-residency-us = <1000000>;
>                 };
>         };
>
>
> ---------------------------------------------------------------------->8
>
> Now, when I set the latency constraint 0x7a110 into cpu_wakeup_latency,
> I expect it _not_ to pick main_sleep_0 because it has min-residency of
> 0x7A120 (500000 us), and since 0x7a110 < 0x7a120 I expect the governor
> to pick the least-latency state of the cpu, which is CLST_STBY or
> maybe just kernel WFI (which is the default lowest possible idle state?).
>
> I decided to go even lower with just setting 0x1000 (4096), but even
> then s2idle picked main_sleep_0!
>
> Only after I set something very very low like 0x1 or 0x10 did it pick
> a shallower state than main_sleep_0...

The residency has nothing to do with QoS.

It's only the entry+exit latency that matters during state selection.
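
Concretely, with your numbers above (taking the written value as
microseconds): main_sleep_0 costs 1000 + 1000 = 2000 us to enter and
exit, while main_sleep_1 costs 300000 + 600000 = 900000 us. So your
0x1312D00 (20000000 us) permits main_sleep_1, anything from 2000 us up
to 899999 us (both 0x7a110 and 0x1000 included) permits main_sleep_0
but not main_sleep_1, and only below 2000 us (your 0x1 and 0x10) does
main_sleep_0 get ruled out as well. The 500000 us min-residency never
enters into it.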

>
> I haven't dug deeper into where things might be getting miscalculated
> yet but just thought to share my experiments with you before you respin
> the next rev. Curious to know if I may be just confusing the units or am
> missing something obvious here?

See above.

>
> One of the other things that I tried that _did_ work was setting the
> constraint to 0x1312D00 (20000000), which is obviously much higher than
> the highest min-residency, and then I can see s2idle pick the deepest
> state, i.e. main_sleep_1. So that worked as expected.
>
> In conclusion, I am happy that this still works in a way that I am able to
> switch between low power states, but just not in the most explainable
> way :(

I hope the above makes sense to you - and thanks a lot for helping
out with testing!

Kind regards
Uffe

>
> [1] https://github.com/DhruvaG2000/dbg-linux/tree/tiL6.12-am62l-s2idle-prep-v2
> [2] https://gist.github.com/DhruvaG2000/a902b815b5db296bb7096ad7cb093929
>
> --
> Best regards,
> Dhruva Gole
> Texas Instruments Incorporated

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-30 16:45       ` Dhruva Gole
@ 2025-10-31 13:47         ` Ulf Hansson
  2025-10-31 18:37           ` Dhruva Gole
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-10-31 13:47 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

[...]

> >
> > > It seems overkill to me that a userspace program be required to hold
> > > open this file just to make sure the constraints are honoured for the
> > > lifetime of the device. We should definitely give the freedom to just be
> > > able to echo and also be able to cat and read back from the same place
> > > about the latency constraint being set.
> >
> > So you'd want a sysfs attribute here, but that has its own issues (the
> > last writer "wins", so if there are multiple users of it with
> > different needs in user space, things get tricky).
>
> sysfs makes sense, then would it make sense to have something like a
> /sys/devices/system/cpu/cpu0/power/cpu_wakeup_latency entry?
>
> IMHO userspace should decide accordingly to manage its users and how/whom to allow to
> set the latency constraint.
> We already have CPU latency QoS entry for example which is sysfs too.
>
> >
> > > One other thing on my mind is - and probably unrelated to this specific
> > > series, but I think we must have some sysfs entry either appear in
> > > /sys/.../cpu0/cpuidle or s2idle/ where we can show next feasible s2idle
> > > state that the governor has chosen to enter based on the value set in
> > > cpu_wakeup_latency.
> >
> > Exit latency values for all states are exposed via sysfs.  Since
> > s2idle always uses the deepest state it can use, it is quite
> > straightforward to figure out which of them will be used going
> > forward, given a specific latency constraint.
>
> I disagree regarding the straightforward part. There could be
> multiple domain hierarchies in a system, for example, and all these
> domains would have their own set of domain-idle-states, all of them
> having their own entry, exit, and residency latencies. While testing
> this series I have myself been thoroughly confused at times about
> which idle-state the kernel actually picked, and had to add prints
> just to figure that out.

If I understand correctly, most of that confusion comes from the
mistaken assumption that residency is included in the QoS-based state
selection.

Residency should not be accounted for; only the entry+exit latencies
should be.

>
> When implementing these things for the first
> time, especially when one has complex and many a domain idle-states it
> would indeed help alot if the kernel could just advertise somewhere what
> the governor is going to pick as the next s2idle state.

The problem with advertising upfront is that the state selection is
done dynamically. It simply can't work.

>
> Also, I am not quite sure if these latencies are exposed in the
> domain-idle-states scenario ...
> I tried checking in /sys/kernel/debug/pm_genpd/XXX/ but I only see
> these:
> active_time  current_state  devices  idle_states  sub_domains  total_idle_time
>
> Maybe an additional s2idle_state or something appearing here is what I
> was inclined toward.

That sounds like an idea that is worth exploring, if what you are
suggesting is to extend the idle state statistics. In principle we
want a new counter per idle state that indicates the number of times
we entered this state in s2idle, right?

While I was testing this feature, I used trace_printk - and afterward
it's easy to digest the trace buffer to see what happened.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-31 13:47         ` Ulf Hansson
@ 2025-10-31 18:37           ` Dhruva Gole
  2025-11-04 13:27             ` Ulf Hansson
  0 siblings, 1 reply; 39+ messages in thread
From: Dhruva Gole @ 2025-10-31 18:37 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

On Oct 31, 2025 at 14:47:29 +0100, Ulf Hansson wrote:
> [...]
> 
> > >
> > > > It seems overkill to me that a userspace program be required to hold
> > > > open this file just to make sure the constraints are honoured for the
> > > > lifetime of the device. We should definitely give the freedom to just be
> > > > able to echo and also be able to cat and read back from the same place
> > > > about the latency constraint being set.
> > >
> > > So you'd want a sysfs attribute here, but that has its own issues (the
> > > last writer "wins", so if there are multiple users of it with
> > > different needs in user space, things get tricky).
> >
> > sysfs makes sense, then would it make sense to have something like a
> > /sys/devices/system/cpu/cpu0/power/cpu_wakeup_latency entry?
> >
> > IMHO userspace should decide accordingly to manage its users and how/whom to allow to
> > set the latency constraint.
> > We already have CPU latency QoS entry for example which is sysfs too.
> >
> > >
> > > > One other thing on my mind is - and probably unrelated to this specific
> > > > series, but I think we must have some sysfs entry either appear in
> > > > /sys/.../cpu0/cpuidle or s2idle/ where we can show next feasible s2idle
> > > > state that the governor has chosen to enter based on the value set in
> > > > cpu_wakeup_latency.
> > >
> > > Exit latency values for all states are exposed via sysfs.  Since
> > > s2idle always uses the deepest state it can use, it is quite
> > > straightforward to figure out which of them will be used going
> > > forward, given a specific latency constraint.
> >
> > I disagree regarding the straightforward part. There could be
> > multiple domain hierarchies in a system, for example, and all these
> > domains would have their own set of domain-idle-states, all of them
> > having their own entry, exit, and residency latencies. While testing
> > this series I have myself been thoroughly confused at times about
> > which idle-state the kernel actually picked, and had to add prints
> > just to figure that out.
> 
> If I understand correctly, most of that confusion comes from the
> mistaken assumption that residency is included in the QoS-based state
> selection.
>
> Residency should not be accounted for; only the entry+exit latencies
> should be.

Understood your point on the latencies; however, the point remains that
in a multi-domain, multi-idle-state case, do we really have an easy way
to determine which idle-state the governor is going to pick next? We
don't even expose the entry/exit latencies in sysfs, btw...

> 
> >
> > When implementing these things for the first
> > time, especially when one has complex and many a domain idle-states it
> > would indeed help alot if the kernel could just advertise somewhere what
> > the governor is going to pick as the next s2idle state.
> 
> The problem with advertising upfront is that the state selection is
> done dynamically. It simply can't work.

I understand it might be done dynamically, but IIUC the only constraint
being taken into account really comes from userspace. I don't think
this series takes into account, or even exposes, any API for the kernel
world to modify the CPU wakeup latency (which I think it should, but
that's an entirely orthogonal discussion; I don't want to mix it in
here). So as far as "dynamic" is concerned, I feel that if userspace
controls which processes set the CPU wakeup constraints, then it's
entirely okay for the kernel to tell userspace that, at any given
moment, "this" is the next s2idle state it is going to pick if you do
a system s2idle right now.

> 
> >
> > Also, I am not quite sure if these latencies are exposed in the
> > domain-idle-states scenario ...
> > I tried checking in /sys/kernel/debug/pm_genpd/XXX/ but I only see
> > these:
> > active_time  current_state  devices  idle_states  sub_domains  total_idle_time
> >
> > Maybe an additional s2idle_state or something appearing here is what I
> > was inclined toward.
> 
> That sounds like an idea that is worth exploring, if what you are
> suggesting is to extend the idle state statistics. In principle we
> want a new counter per idle state that indicates the number of times
> we entered this state in s2idle, right?

Absolutely, having a "global" kind of a place to find out the s2idle
stats would really be useful.

> 
> While I was testing this feature, I used trace_printk - and afterward
> it's easy to digest the trace buffer to see what happened.
> 
> Kind regards
> Uffe

-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
  2025-10-16 15:19 ` [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle Ulf Hansson
  2025-10-17 10:15   ` Peter Zijlstra
@ 2025-10-31 19:23   ` Dhruva Gole
  2025-11-04 13:43     ` Ulf Hansson
  1 sibling, 1 reply; 39+ messages in thread
From: Dhruva Gole @ 2025-10-31 19:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

On Oct 16, 2025 at 17:19:23 +0200, Ulf Hansson wrote:
> A CPU system-wakeup QoS limit may have been requested by user-space. To
> avoid breaking this constraint when entering a low-power state during
> s2idle, let's start to take into account the QoS limit.
> 
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes in v2:
> 	- Rework the code to take into account the failure/error path, when we
> 	don't find a s2idle specific state.
> 
> ---
>  drivers/cpuidle/cpuidle.c | 12 +++++++-----
>  include/linux/cpuidle.h   |  6 ++++--
>  kernel/sched/idle.c       | 12 +++++++-----
>  3 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 56132e843c99..c7876e9e024f 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -184,20 +184,22 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
>   * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
>   * @drv: cpuidle driver for the given CPU.
>   * @dev: cpuidle device for the given CPU.
> + * @latency_limit_ns: Idle state exit latency limit
>   *
>   * If there are states with the ->enter_s2idle callback, find the deepest of
>   * them and enter it with frozen tick.
>   */
> -int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> +int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> +			 u64 latency_limit_ns)
>  {
>  	int index;
>  
>  	/*
> -	 * Find the deepest state with ->enter_s2idle present, which guarantees
> -	 * that interrupts won't be enabled when it exits and allows the tick to
> -	 * be frozen safely.
> +	 * Find the deepest state with ->enter_s2idle present that meets the
> +	 * specified latency limit, which guarantees that interrupts won't be
> +	 * enabled when it exits and allows the tick to be frozen safely.
>  	 */
> -	index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> +	index = find_deepest_state(drv, dev, latency_limit_ns, 0, true);
>  	if (index > 0) {
>  		enter_s2idle_proper(drv, dev, index);
>  		local_irq_enable();
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index a9ee4fe55dcf..4073690504a7 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -248,7 +248,8 @@ extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
>  				      struct cpuidle_device *dev,
>  				      u64 latency_limit_ns);
>  extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
> -				struct cpuidle_device *dev);
> +				struct cpuidle_device *dev,
> +				u64 latency_limit_ns);
>  extern void cpuidle_use_deepest_state(u64 latency_limit_ns);
>  #else
>  static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
> @@ -256,7 +257,8 @@ static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
>  					     u64 latency_limit_ns)
>  {return -ENODEV; }
>  static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
> -				       struct cpuidle_device *dev)
> +				       struct cpuidle_device *dev,
> +				       u64 latency_limit_ns)
>  {return -ENODEV; }
>  static inline void cpuidle_use_deepest_state(u64 latency_limit_ns)
>  {
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c39b089d4f09..c1c3d0166610 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -131,12 +131,13 @@ void __cpuidle default_idle_call(void)
>  }
>  
>  static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> -			       struct cpuidle_device *dev)
> +			       struct cpuidle_device *dev,
> +			       u64 max_latency_ns)
>  {
>  	if (current_clr_polling_and_test())
>  		return -EBUSY;
>  
> -	return cpuidle_enter_s2idle(drv, dev);
> +	return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
>  }
>  
>  static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> @@ -205,12 +206,13 @@ static void cpuidle_idle_call(void)
>  		u64 max_latency_ns;
>  
>  		if (idle_should_enter_s2idle()) {
> +			max_latency_ns = cpu_wakeup_latency_qos_limit() *
> +					 NSEC_PER_USEC;

This is only taking into account the new cpu_wakeup_latency_qos_limit
API; however, if someone has set cpu_latency_qos_limit, doesn't that
need to be honoured?
Just trying to understand the difference between the two QoS limits
here and why one is chosen over the other.

>  
> -			entered_state = call_cpuidle_s2idle(drv, dev);
> +			entered_state = call_cpuidle_s2idle(drv, dev,
> +							    max_latency_ns);
>  			if (entered_state > 0)
>  				goto exit_idle;
> -
> -			max_latency_ns = U64_MAX;
>  		} else {
>  			max_latency_ns = dev->forced_idle_latency_limit_ns;
>  		}
> -- 
> 2.43.0
> 

-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-10-16 15:19 ` [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle Ulf Hansson
  2025-10-30 10:44   ` Rafael J. Wysocki
@ 2025-11-01  0:11   ` Kevin Hilman
  2025-11-04 16:10     ` Ulf Hansson
  1 sibling, 1 reply; 39+ messages in thread
From: Kevin Hilman @ 2025-11-01  0:11 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J . Wysocki, linux-pm
  Cc: Vincent Guittot, Peter Zijlstra, Pavel Machek, Len Brown,
	Daniel Lezcano, Saravana Kannan, Maulik Shah, Prasad Sodagudi,
	Dhruva Gole, Ulf Hansson, linux-kernel

Ulf Hansson <ulf.hansson@linaro.org> writes:

> A CPU system-wakeup QoS limit may have been requested by user-space. To
> avoid breaking this constraint when entering a low-power state during
> s2idle through genpd, let's extend the corresponding genpd governor for
> CPUs. More precisely, during s2idle let the genpd governor select a
> suitable low-power state, by taking into account the QoS limit.

In addition to a QoS limit requested by userspace, shouldn't any
per-device QoS limits from devices in the PM domain with CPUs/clusters
also be considered?

Right now, if a device is in a PM domain that also contains CPUs, any
per-device QoS constraints for those devices should also impact the
state chosen by s2idle.

I just tried a quick hack of extending your cpu_system_power_down_ok()
function to look for any per-device QoS constraints set on all devices
in the PM domain (and subdomains).  It takes the min of all the
per-device QoS constraints, compares it to the one from userspace, and
uses the min of those to decide the genpd state.

That has the effect I'm after, but curious what you think about the
usecase and the idea?
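
Roughly along these lines (an illustrative sketch of the idea only, not
the actual hack; the "no constraint" special values, locking and
subdomains are all ignored here):

	static bool cpu_system_power_down_ok(struct dev_pm_domain *pd)
	{
		struct generic_pm_domain *genpd = pd_to_genpd(pd);
		s64 constraint_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;
		struct pm_domain_data *pdd;
		int i;

		/* Fold in each attached device's resume-latency QoS. */
		list_for_each_entry(pdd, &genpd->dev_list, list_node) {
			s32 dev_us = dev_pm_qos_read_value(pdd->dev,
						DEV_PM_QOS_RESUME_LATENCY);

			constraint_ns = min(constraint_ns,
					    (s64)dev_us * NSEC_PER_USEC);
		}

		/* Pick the deepest state that honours the combined limit. */
		for (i = genpd->state_count - 1; i >= 0; i--) {
			if (genpd->states[i].power_off_latency_ns +
			    genpd->states[i].power_on_latency_ns <= constraint_ns) {
				genpd->state_idx = i;
				return true;
			}
		}

		return false;
	}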

Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
  2025-10-31 18:37           ` Dhruva Gole
@ 2025-11-04 13:27             ` Ulf Hansson
  0 siblings, 0 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-11-04 13:27 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Rafael J. Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

On Fri, 31 Oct 2025 at 19:37, Dhruva Gole <d-gole@ti.com> wrote:
>
> On Oct 31, 2025 at 14:47:29 +0100, Ulf Hansson wrote:
> > [...]
> >
> > > >
> > > > > It seems overkill to me that a userspace program be required to hold
> > > > > open this file just to make sure the constraints are honoured for the
> > > > > lifetime of the device. We should definitely give the freedom to just be
> > > > > able to echo and also be able to cat and read back from the same place
> > > > > about the latency constraint being set.
> > > >
> > > > So you'd want a sysfs attribute here, but that has its own issues (the
> > > > last writer "wins", so if there are multiple users of it with
> > > > different needs in user space, things get tricky).
> > >
> > > sysfs makes sense, then would it make sense to have something like a
> > > /sys/devices/system/cpu/cpu0/power/cpu_wakeup_latency entry?
> > >
> > > IMHO userspace should decide accordingly to manage its users and how/whom to allow to
> > > set the latency constraint.
> > > We already have CPU latency QoS entry for example which is sysfs too.
> > >
> > > >
> > > > > One other thing on my mind is - and probably unrelated to this specific
> > > > > series, but I think we must have some sysfs entry either appear in
> > > > > /sys/.../cpu0/cpuidle or s2idle/ where we can show next feasible s2idle
> > > > > state that the governor has chosen to enter based on the value set in
> > > > > cpu_wakeup_latency.
> > > >
> > > > Exit latency values for all states are exposed via sysfs.  Since
> > > > s2idle always uses the deepest state it can use, it is quite
> > > > straightforward to figure out which of them will be used going
> > > > forward, given a specific latency constraint.
> > >
> > > I disagree regarding the straightforward part. There could be
> > > multiple domain hierarchies in a system, for example, and all these
> > > domains would have their own set of domain-idle-states, all of them
> > > having their own entry, exit, and residency latencies. While testing
> > > this series I have myself been thoroughly confused at times about
> > > which idle-state the kernel actually picked, and had to add prints
> > > just to figure that out.
> >
> > If I understand correctly, most of that confusion comes from the
> > mistaken assumption that residency is included in the QoS-based state
> > selection.
> >
> > Residency should not be accounted for; only the entry+exit latencies
> > should be.
>
> Understood your point on the latencies; however, the point remains that
> in a multi-domain, multi-idle-state case, do we really have an easy way
> to determine which idle-state the governor is going to pick next? We
> don't even expose the entry/exit latencies in sysfs, btw...

I agree, we should extend the sysfs/debugfs information about the
domain-idle-states with this too. Especially since we already have it
for the regular idle states that are managed by cpuidle.
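
For reference, the cpuidle-managed states already expose per state:

	/sys/devices/system/cpu/cpuN/cpuidle/stateM/latency    (exit latency, us)
	/sys/devices/system/cpu/cpuN/cpuidle/stateM/residency  (target residency, us)

so something similar for the domain-idle-states under pm_genpd seems
like the natural shape.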

>
> >
> > >
> > > When implementing these things for the first
> > > time, especially when one has complex and many a domain idle-states it
> > > would indeed help alot if the kernel could just advertise somewhere what
> > > the governor is going to pick as the next s2idle state.
> >
> > The problem with advertising upfront is that the state selection is
> > done dynamically. It simply can't work.
>
> I understand it might be done dynamically, but IIUC the only constraint
> being taken into account really comes from userspace. I don't think
> this series takes into account, or even exposes, any API for the kernel
> world to modify the CPU wakeup latency (which I think it should, but
> that's an entirely orthogonal discussion; I don't want to mix it in
> here). So as far as "dynamic" is concerned, I feel that if userspace
> controls which processes set the CPU wakeup constraints, then it's
> entirely okay for the kernel to tell userspace that, at any given
> moment, "this" is the next s2idle state it is going to pick if you do
> a system s2idle right now.
>
> >
> > >
> > > Also, I am not quite sure if these latencies are exposed in the
> > > domain-idle-states scenario ...
> > > I tried checking in /sys/kernel/debug/pm_genpd/XXX/ but I only see
> > > these:
> > > active_time  current_state  devices  idle_states  sub_domains  total_idle_time
> > >
> > > Maybe an additional s2idle_state or something appearing here is what I
> > > was inclined toward.
> >
> > That sounds like an idea that is worth exploring, if what you are
> > suggesting is to extend the idle state statistics. In principle we
> > want a new counter per idle state that indicates the number of times
> > we entered this state in s2idle, right?
>
> Absolutely, having a "global" kind of a place to find out the s2idle
> stats would really be useful.

For regular idle states that are managed by cpuidle, those have a
per-state directory called s2idle (if the state is supported for
s2idle), with usage/time counters.
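
That is, per state:

	/sys/devices/system/cpu/cpuN/cpuidle/stateM/s2idle/usage
	/sys/devices/system/cpu/cpuN/cpuidle/stateM/s2idle/time

where usage counts how many times the state was entered during s2idle
and time is the total time spent in it, in microseconds.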

That said, I agree, it's a good idea to add something similar for the
domain-idle-states that are managed by genpd.

Let me think about it and I will post a couple of patches that add
this information about the domain-idle-states.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
  2025-10-31 19:23   ` Dhruva Gole
@ 2025-11-04 13:43     ` Ulf Hansson
  0 siblings, 0 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-11-04 13:43 UTC (permalink / raw)
  To: Dhruva Gole
  Cc: Rafael J . Wysocki, linux-pm, Vincent Guittot, Peter Zijlstra,
	Kevin Hilman, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, linux-kernel

On Fri, 31 Oct 2025 at 20:23, Dhruva Gole <d-gole@ti.com> wrote:
>
> On Oct 16, 2025 at 17:19:23 +0200, Ulf Hansson wrote:
> > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > avoid breaking this constraint when entering a low-power state during
> > s2idle, let's start to take into account the QoS limit.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v2:
> >       - Rework the code to take into account the failure/error path, when we
> >       don't find a s2idle specific state.
> >
> > ---
> >  drivers/cpuidle/cpuidle.c | 12 +++++++-----
> >  include/linux/cpuidle.h   |  6 ++++--
> >  kernel/sched/idle.c       | 12 +++++++-----
> >  3 files changed, 18 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index 56132e843c99..c7876e9e024f 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -184,20 +184,22 @@ static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
> >   * cpuidle_enter_s2idle - Enter an idle state suitable for suspend-to-idle.
> >   * @drv: cpuidle driver for the given CPU.
> >   * @dev: cpuidle device for the given CPU.
> > + * @latency_limit_ns: Idle state exit latency limit
> >   *
> >   * If there are states with the ->enter_s2idle callback, find the deepest of
> >   * them and enter it with frozen tick.
> >   */
> > -int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> > +int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> > +                      u64 latency_limit_ns)
> >  {
> >       int index;
> >
> >       /*
> > -      * Find the deepest state with ->enter_s2idle present, which guarantees
> > -      * that interrupts won't be enabled when it exits and allows the tick to
> > -      * be frozen safely.
> > +      * Find the deepest state with ->enter_s2idle present that meets the
> > +      * specified latency limit, which guarantees that interrupts won't be
> > +      * enabled when it exits and allows the tick to be frozen safely.
> >        */
> > -     index = find_deepest_state(drv, dev, U64_MAX, 0, true);
> > +     index = find_deepest_state(drv, dev, latency_limit_ns, 0, true);
> >       if (index > 0) {
> >               enter_s2idle_proper(drv, dev, index);
> >               local_irq_enable();
> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> > index a9ee4fe55dcf..4073690504a7 100644
> > --- a/include/linux/cpuidle.h
> > +++ b/include/linux/cpuidle.h
> > @@ -248,7 +248,8 @@ extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
> >                                     struct cpuidle_device *dev,
> >                                     u64 latency_limit_ns);
> >  extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
> > -                             struct cpuidle_device *dev);
> > +                             struct cpuidle_device *dev,
> > +                             u64 latency_limit_ns);
> >  extern void cpuidle_use_deepest_state(u64 latency_limit_ns);
> >  #else
> >  static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
> > @@ -256,7 +257,8 @@ static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
> >                                            u64 latency_limit_ns)
> >  {return -ENODEV; }
> >  static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv,
> > -                                    struct cpuidle_device *dev)
> > +                                    struct cpuidle_device *dev,
> > +                                    u64 latency_limit_ns)
> >  {return -ENODEV; }
> >  static inline void cpuidle_use_deepest_state(u64 latency_limit_ns)
> >  {
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index c39b089d4f09..c1c3d0166610 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -131,12 +131,13 @@ void __cpuidle default_idle_call(void)
> >  }
> >
> >  static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> > -                            struct cpuidle_device *dev)
> > +                            struct cpuidle_device *dev,
> > +                            u64 max_latency_ns)
> >  {
> >       if (current_clr_polling_and_test())
> >               return -EBUSY;
> >
> > -     return cpuidle_enter_s2idle(drv, dev);
> > +     return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
> >  }
> >
> >  static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> > @@ -205,12 +206,13 @@ static void cpuidle_idle_call(void)
> >               u64 max_latency_ns;
> >
> >               if (idle_should_enter_s2idle()) {
> > +                     max_latency_ns = cpu_wakeup_latency_qos_limit() *
> > +                                      NSEC_PER_USEC;
>
> This is only taking into account the new cpu_wakeup_latency_qos_limit
> API; however, if someone has set cpu_latency_qos_limit, doesn't that
> need to be honoured?
> Just trying to understand the difference between the two QoS limits
> here and why one is chosen over the other.

cpu_latency_qos_limit is for runtime only, during regular cpuidle idle
state selection.

The new cpu_wakeup_latency_qos_limit is taken into account above for
s2idle, specifically.

That said, Rafael suggests that the new cpu_wakeup_latency_qos_limit
should be respected for runtime cpuidle state selection too, so I am
working on updating the series to take that into account.
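
In its simplest form, that could boil down to something like the
hypothetical helper below (exact shape still open; it assumes both
limits are reported as an s32 in usec):

	/* Tightest of the runtime and system-wakeup limits. */
	static inline s32 effective_cpu_latency_limit(void)
	{
		return min(cpu_latency_qos_limit(),
			   cpu_wakeup_latency_qos_limit());
	}

used wherever the runtime limit is computed today, e.g. in
cpuidle_governor_latency_req().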

>
> >
> > -                     entered_state = call_cpuidle_s2idle(drv, dev);
> > +                     entered_state = call_cpuidle_s2idle(drv, dev,
> > +                                                         max_latency_ns);
> >                       if (entered_state > 0)
> >                               goto exit_idle;
> > -
> > -                     max_latency_ns = U64_MAX;
> >               } else {
> >                       max_latency_ns = dev->forced_idle_latency_limit_ns;
> >               }
> > --
> > 2.43.0
> >
>

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-11-01  0:11   ` Kevin Hilman
@ 2025-11-04 16:10     ` Ulf Hansson
  2025-11-04 16:37       ` Rafael J. Wysocki
  2025-11-15  0:23       ` Kevin Hilman
  0 siblings, 2 replies; 39+ messages in thread
From: Ulf Hansson @ 2025-11-04 16:10 UTC (permalink / raw)
  To: Kevin Hilman, Rafael J . Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, linux-kernel

On Sat, 1 Nov 2025 at 01:11, Kevin Hilman <khilman@baylibre.com> wrote:
>
> Ulf Hansson <ulf.hansson@linaro.org> writes:
>
> > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > avoid breaking this constraint when entering a low-power state during
> > s2idle through genpd, let's extend the corresponding genpd governor for
> > CPUs. More precisely, during s2idle let the genpd governor select a
> > suitable low-power state, by taking into account the QoS limit.
>
> In addition to a QoS limit requested by userspace, shouldn't any
> per-device QoS limits from devices in the PM domain with CPUs/clusters
> also be considered?
>
> Right now, if a device is in a PM domain that also contains CPUs, any
> per-device QoS constraints for those devices should also impact the
> state chosen by s2idle.

I am not so sure about that. The existing dev PM QoS latency is
targeted towards runtime suspend of a device and the genpd governor
also takes it into account for this use case.

If we were to start taking the same dev PM QoS latency constraint into
account for system suspend (s2idle), it may not be what the user
really intended. Instead, I think we should consider extending the dev
PM QoS interface, to allow the user to set a specific latency limit
for system wakeup. Then the genpd governor should take that into
account for s2idle.

>
> I just tried a quick hack of extending your cpu_system_power_down_ok()
> function to look for any per-device QoS constraints set on all devices
> in the PM domain (and subdomains).  It takes the min of all the
> per-device QoS constraints, compares it to the one from userspace, and
> uses the min of those to decide the genpd state.
>
> That has the effect I'm after, but curious what you think about the
> usecase and the idea?

It makes sense, but as stated above, I think we need a new QoS limit
specific for system suspend.
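
For reference, I guess the aggregation you describe boils down to
something like the sketch below, where min_device_wakeup_qos_ns() is a
hypothetical helper standing in for the walk over devices in the
domain and its subdomains:

#include <linux/minmax.h>
#include <linux/pm_domain.h>
#include <linux/types.h>

static s64 effective_wakeup_limit_ns(struct generic_pm_domain *genpd,
				     s64 user_limit_ns)
{
	/* Min of all per-device constraints in the domain (hypothetical). */
	s64 dev_min_ns = min_device_wakeup_qos_ns(genpd);

	/* Min of the device constraints and the user-space limit. */
	return min(dev_min_ns, user_limit_ns);
}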

Rafael, what are your thoughts on this?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-11-04 16:10     ` Ulf Hansson
@ 2025-11-04 16:37       ` Rafael J. Wysocki
  2025-11-04 16:52         ` Ulf Hansson
  2025-11-15  0:23       ` Kevin Hilman
  1 sibling, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-11-04 16:37 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Kevin Hilman, Rafael J . Wysocki, linux-pm, Vincent Guittot,
	Peter Zijlstra, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Tue, Nov 4, 2025 at 5:10 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Sat, 1 Nov 2025 at 01:11, Kevin Hilman <khilman@baylibre.com> wrote:
> >
> > Ulf Hansson <ulf.hansson@linaro.org> writes:
> >
> > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > avoid breaking this constraint when entering a low-power state during
> > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > suitable low-power state, by taking into account the QoS limit.
> >
> > In addition to a QoS limit requested by userspace, shouldn't any
> > per-device QoS limits from devices in the PM domain with CPUs/clusters
> > also be considered?
> >
> > Right now, if a device is in a PM domain that also contains CPUs, any
> > per-device QoS constraints for those devices should also impact the
> > state chosen by s2idle.
>
> I am not so sure about that. The existing dev PM QoS latency is
> targeted towards runtime suspend of a device and the genpd governor
> also takes it into account for this use case.
>
> If we were to take the same dev PM QoS latency constraint into
> account for system suspend (s2idle), that may not be what the user
> really intended. Instead, I think we should consider extending the dev
> PM QoS interface, to allow the user to set a specific latency limit
> for system wakeup. Then the genpd governor should take that into
> account for s2idle.
>
> >
> > I just tried a quick hack of extending your cpu_system_power_down_ok()
> > function to look for any per-device QoS constraints set on all devices in
> > the PM domain (and subdomains).  It takes the min of all the per-device
> > QoS constraints, compares it to the one from userspace, and uses the min
> > of those to decide the genpd state.
> >
> > That has the effect I'm after, but curious what you think about the
> > usecase and the idea?
>
> It makes sense, but as stated above, I think we need a new QoS limit
> specific for system suspend.
>
> Rafael, what are your thoughts on this?

Well, it's analogous to the CPU latency limit case, so if a new
"system suspend" QoS limit is introduced for CPUs, that also needs to
be done for the other devices.

However, as in the CPU case, my personal view is that the "system
suspend" latency limits should be greater than or equal to the
corresponding latency limits for runtime PM.
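
In other words, something along the lines of this sketch, where the
effective "system suspend" limit is never allowed to be tighter than
the runtime PM one (names hypothetical):

#include <linux/minmax.h>
#include <linux/types.h>

static s32 effective_system_suspend_limit(s32 system_limit,
					  s32 runtime_limit)
{
	/* Clamp the system-suspend limit so it is >= the runtime limit. */
	return max(system_limit, runtime_limit);
}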

One more thing that has just occurred to me is that there are systems
in which I don't want to enable the "system suspend" limits at all.
IOW, all of this needs to be disabled unless the platform opts in.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-11-04 16:37       ` Rafael J. Wysocki
@ 2025-11-04 16:52         ` Ulf Hansson
  2025-11-04 17:53           ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Ulf Hansson @ 2025-11-04 16:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kevin Hilman, linux-pm, Vincent Guittot, Peter Zijlstra,
	Pavel Machek, Len Brown, Daniel Lezcano, Saravana Kannan,
	Maulik Shah, Prasad Sodagudi, Dhruva Gole, linux-kernel

On Tue, 4 Nov 2025 at 17:37, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Nov 4, 2025 at 5:10 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Sat, 1 Nov 2025 at 01:11, Kevin Hilman <khilman@baylibre.com> wrote:
> > >
> > > Ulf Hansson <ulf.hansson@linaro.org> writes:
> > >
> > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > avoid breaking this constraint when entering a low-power state during
> > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > suitable low-power state, by taking into account the QoS limit.
> > >
> > > In addition to a QoS limit requested by userspace, shouldn't any
> > > per-device QoS limits from devices in the PM domain with CPUs/clusters
> > > also be considered?
> > >
> > > Right now, if a device is in a PM domain that also contains CPUs, any
> > > per-device QoS constraints for those devices should also impact the
> > > state chosen by s2idle.
> >
> > I am not so sure about that. The existing dev PM QoS latency is
> > targeted towards runtime suspend of a device and the genpd governor
> > also takes it into account for this use case.
> >
> > If we were to take the same dev PM QoS latency constraint into
> > account for system suspend (s2idle), that may not be what the user
> > really intended. Instead, I think we should consider extending the dev
> > PM QoS interface, to allow the user to set a specific latency limit
> > for system wakeup. Then the genpd governor should take that into
> > account for s2idle.
> >
> > >
> > > I just tried a quick hack of extending your cpu_system_power_down_ok()
> > > function to look for any per-device QoS constraints set on all devices in
> > > the PM domain (and subdomains).  It takes the min of all the per-device
> > > QoS constraints, compares it to the one from userspace, and uses the min
> > > of those to decide the genpd state.
> > >
> > > That has the effect I'm after, but curious what you think about the
> > > usecase and the idea?
> >
> > It makes sense, but as stated above, I think we need a new QoS limit
> > specific for system suspend.
> >
> > Rafael, what are your thoughts on this?
>
> Well, it's analogous to the CPU latency limit case, so if a new
> "system suspend" QoS limit is introduced for CPUs, that also needs to
> be done for the other devices.

Agreed!

>
> However, as in the CPU case, my personal view is that the "system
> suspend" latency limits should be greater than or equal to the
> corresponding latency limits for runtime PM.

Right, we should treat general devices similar to CPUs.

>
> One more thing that has just occurred to me is that there are systems
> in which I don't want to enable the "system suspend" limits at all.
> IOW, all of this needs to be disabled unless the platform opts in.

Okay. So are you thinking of using a Kconfig option for this, or
would it be better to manage this at runtime?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-11-04 16:52         ` Ulf Hansson
@ 2025-11-04 17:53           ` Rafael J. Wysocki
  0 siblings, 0 replies; 39+ messages in thread
From: Rafael J. Wysocki @ 2025-11-04 17:53 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Kevin Hilman, linux-pm, Vincent Guittot,
	Peter Zijlstra, Pavel Machek, Len Brown, Daniel Lezcano,
	Saravana Kannan, Maulik Shah, Prasad Sodagudi, Dhruva Gole,
	linux-kernel

On Tue, Nov 4, 2025 at 5:52 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 4 Nov 2025 at 17:37, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Nov 4, 2025 at 5:10 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Sat, 1 Nov 2025 at 01:11, Kevin Hilman <khilman@baylibre.com> wrote:
> > > >
> > > > Ulf Hansson <ulf.hansson@linaro.org> writes:
> > > >
> > > > > A CPU system-wakeup QoS limit may have been requested by user-space. To
> > > > > avoid breaking this constraint when entering a low-power state during
> > > > > s2idle through genpd, let's extend the corresponding genpd governor for
> > > > > CPUs. More precisely, during s2idle let the genpd governor select a
> > > > > suitable low-power state, by taking into account the QoS limit.
> > > >
> > > > In addition to a QoS limit requested by userspace, shouldn't any
> > > > per-device QoS limits from devices in the PM domain with CPUs/clusters
> > > > also be considered?
> > > >
> > > > Right now, if a device is in a PM domain that also contains CPUs, any
> > > > per-device QoS constraints for those devices should also impact the
> > > > state chosen by s2idle.
> > >
> > > I am not so sure about that. The existing dev PM QoS latency is
> > > targeted towards runtime suspend of a device and the genpd governor
> > > also takes it into account for this use case.
> > >
> > > If we were to take the same dev PM QoS latency constraint into
> > > account for system suspend (s2idle), that may not be what the user
> > > really intended. Instead, I think we should consider extending the dev
> > > PM QoS interface, to allow the user to set a specific latency limit
> > > for system wakeup. Then the genpd governor should take that into
> > > account for s2idle.
> > >
> > > >
> > > > I just tried a quick hack of extending your cpu_system_power_down_ok()
> > > > function to look for any per-device QoS constraints set on all devices in
> > > > the PM domain (and subdomains).  It takes the min of all the per-device
> > > > QoS constraints, compares it to the one from userspace, and uses the min
> > > > of those to decide the genpd state.
> > > >
> > > > That has the effect I'm after, but curious what you think about the
> > > > usecase and the idea?
> > >
> > > It makes sense, but as stated above, I think we need a new QoS limit
> > > specific for system suspend.
> > >
> > > Rafael, what are your thoughts on this?
> >
> > Well, it's analogous to the CPU latency limit case, so if a new
> > "system suspend" QoS limit is introduced for CPUs, that also needs to
> > be done for the other devices.
>
> Agreed!
>
> >
> > However, as in the CPU case, my personal view is that the "system
> > suspend" latency limits should be greater than or equal to the
> > corresponding latency limits for runtime PM.
>
> Right, we should treat general devices similar to CPUs.
>
> >
> > One more thing that has just occurred to me is that there are systems
> > in which I don't want to enable the "system suspend" limits at all.
> > IOW, all of this needs to be disabled unless the platform opts in.
>
> Okay. So are you thinking of using a Kconfig option for this, or
> would it be better to manage this at runtime?

Hmm, Kconfig might be a good fit because, for instance, I don't quite
see how this can be made to work generally on x86, except in some
corner cases.
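
As a sketch of what such an opt-in could look like, assuming a
hypothetical CONFIG_PM_SYSTEM_WAKEUP_QOS symbol that platforms would
select, and assuming this series' cpu_wakeup_latency_qos_limit()
returns a value in nanoseconds:

#include <linux/kconfig.h>
#include <linux/limits.h>
#include <linux/types.h>

static u64 cpu_wakeup_latency_limit_ns(void)
{
	/* No constraint at all unless the platform opts in. */
	if (!IS_ENABLED(CONFIG_PM_SYSTEM_WAKEUP_QOS))
		return U64_MAX;

	return cpu_wakeup_latency_qos_limit();
}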

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
  2025-11-04 16:10     ` Ulf Hansson
  2025-11-04 16:37       ` Rafael J. Wysocki
@ 2025-11-15  0:23       ` Kevin Hilman
  1 sibling, 0 replies; 39+ messages in thread
From: Kevin Hilman @ 2025-11-15  0:23 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J . Wysocki
  Cc: linux-pm, Vincent Guittot, Peter Zijlstra, Pavel Machek,
	Len Brown, Daniel Lezcano, Saravana Kannan, Maulik Shah,
	Prasad Sodagudi, Dhruva Gole, linux-kernel

Ulf Hansson <ulf.hansson@linaro.org> writes:

> On Sat, 1 Nov 2025 at 01:11, Kevin Hilman <khilman@baylibre.com> wrote:
>>
>> Ulf Hansson <ulf.hansson@linaro.org> writes:
>>
>> > A CPU system-wakeup QoS limit may have been requested by user-space. To
>> > avoid breaking this constraint when entering a low-power state during
>> > s2idle through genpd, let's extend the corresponding genpd governor for
>> > CPUs. More precisely, during s2idle let the genpd governor select a
>> > suitable low-power state, by taking into account the QoS limit.
>>
>> In addition to a QoS limit requested by userspace, shouldn't any
>> per-device QoS limits from devices in the PM domain with CPUs/clusters
>> also be considered?
>>
>> Right now, if a device is in a PM domain that also contains CPUs, any
>> per-device QoS constraints for those devices should also impact the
>> state chosen by s2idle.
>
> I am not so sure about that. The existing dev PM QoS latency is
> targeted towards runtime suspend of a device and the genpd governor
> also takes it into account for this use case.
>
> If we were to take the same dev PM QoS latency constraint into
> account for system suspend (s2idle), that may not be what the user
> really intended. Instead, I think we should consider extending the dev
> PM QoS interface, to allow the user to set a specific latency limit
> for system wakeup. Then the genpd governor should take that into
> account for s2idle.
>>
>> I just tried a quick hack of extending your cpu_system_power_down_ok()
>> function to look for any per-device QoS constraints set on all devices in
>> the PM domain (and subdomains).  It takes the min of all the per-device
>> QoS constraints, compares it to the one from userspace, and uses the min
>> of those to decide the genpd state.
>>
>> That has the effect I'm after, but curious what you think about the
>> usecase and the idea?
>
> It makes sense, but as stated above, I think we need a new QoS limit
> specific for system suspend.

OK, got it.  I understand the need to distinguish between QoS
constraints set for runtime PM and those set for system-wide
suspend/resume.

So, instead of adding a completely separate entry for system-wide
suspend, and duplicating a bunch of code/API, what about using the QoS
flags, and adding a new flag that indicates whether the resume_latency
constraint applies only to runtime PM (the default) or if it *also*
applies to system-wide suspend? (RFC patch below[1])

With this, I was able to extend your new cpu_system_power_down_ok()
function to check for any devices in the domain with a resume_latency
*and* this new flag set before applying it to s2idle, and that's
working great locally.
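
For illustration, the check I added looks roughly like the sketch
below (simplified; the helper name is made up, but dev_pm_qos_flags()
and dev_pm_qos_read_value() are existing APIs):

#include <linux/pm_qos.h>

static s32 s2idle_resume_latency_us(struct device *dev)
{
	/* Only apply resume_latency to s2idle when the new flag is set. */
	if (dev_pm_qos_flags(dev, PM_QOS_FLAG_LATENCY_SYS) !=
	    PM_QOS_FLAGS_ALL)
		return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;

	return dev_pm_qos_read_value(dev, DEV_PM_QOS_RESUME_LATENCY);
}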

Kevin

[1]
From d25b871c2044ee0e993fd75f5f1df36a35633d1f Mon Sep 17 00:00:00 2001
From: "Kevin Hilman (TI.com)" <khilman@baylibre.com>
Date: Thu, 13 Nov 2025 17:02:38 -0800
Subject: [RFC/PATCH] PM / QoS: add flag to indicate latency applies
 system-wide

Add a new PM_QOS_FLAG_LATENCY_SYS flag to indicate that the
resume_latency QoS constraint applies not just to runtime PM
transitions but also to system-wide suspend/s2idle.

Signed-off-by: Kevin Hilman (TI.com) <khilman@baylibre.com>
---
 drivers/base/power/sysfs.c | 27 +++++++++++++++++++++++++++
 include/linux/pm_qos.h     |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 13b31a3adc77..9136c13c17e4 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -316,6 +316,32 @@ static ssize_t pm_qos_no_power_off_store(struct device *dev,
 
 static DEVICE_ATTR_RW(pm_qos_no_power_off);
 
+static ssize_t pm_qos_latency_sys_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	return sysfs_emit(buf, "%d\n", !!(dev_pm_qos_requested_flags(dev)
+					  & PM_QOS_FLAG_LATENCY_SYS));
+}
+
+static ssize_t pm_qos_latency_sys_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t n)
+{
+	int val, ret;
+
+	if (kstrtoint(buf, 0, &val))
+		return -EINVAL;
+
+	if (val != 0 && val != 1)
+		return -EINVAL;
+
+	ret = dev_pm_qos_update_flags(dev, PM_QOS_FLAG_LATENCY_SYS, val);
+	return ret < 0 ? ret : n;
+}
+
+static DEVICE_ATTR_RW(pm_qos_latency_sys);
+
 #ifdef CONFIG_PM_SLEEP
 static const char _enabled[] = "enabled";
 static const char _disabled[] = "disabled";
@@ -681,6 +707,7 @@ static const struct attribute_group pm_qos_latency_tolerance_attr_group = {
 
 static struct attribute *pm_qos_flags_attrs[] = {
 	&dev_attr_pm_qos_no_power_off.attr,
+	&dev_attr_pm_qos_latency_sys.attr,
 	NULL,
 };
 static const struct attribute_group pm_qos_flags_attr_group = {
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 4a69d4af3ff8..b95f52775dfb 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -37,6 +37,8 @@ enum pm_qos_flags_status {
 #define PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT	(-1)
 
 #define PM_QOS_FLAG_NO_POWER_OFF	(1 << 0)
+/* latency value applies to system-wide suspend/s2idle */
+#define PM_QOS_FLAG_LATENCY_SYS		(1 << 1)
 
 enum pm_qos_type {
 	PM_QOS_UNITIALIZED,
-- 
2.51.0




^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2025-11-15  0:23 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-16 15:19 [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Ulf Hansson
2025-10-16 15:19 ` [PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit Ulf Hansson
2025-10-29  8:10   ` Dhruva Gole
2025-10-29 14:28     ` Rafael J. Wysocki
2025-10-30 16:45       ` Dhruva Gole
2025-10-31 13:47         ` Ulf Hansson
2025-10-31 18:37           ` Dhruva Gole
2025-11-04 13:27             ` Ulf Hansson
2025-10-16 15:19 ` [PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle Ulf Hansson
2025-10-30 10:44   ` Rafael J. Wysocki
2025-10-30 12:00     ` Ulf Hansson
2025-10-30 12:23       ` Rafael J. Wysocki
2025-10-30 12:32         ` Ulf Hansson
2025-10-30 14:02           ` Rafael J. Wysocki
2025-10-30 15:06             ` Ulf Hansson
2025-10-30 18:11               ` Rafael J. Wysocki
2025-10-31 10:19                 ` Ulf Hansson
2025-11-01  0:11   ` Kevin Hilman
2025-11-04 16:10     ` Ulf Hansson
2025-11-04 16:37       ` Rafael J. Wysocki
2025-11-04 16:52         ` Ulf Hansson
2025-11-04 17:53           ` Rafael J. Wysocki
2025-11-15  0:23       ` Kevin Hilman
2025-10-16 15:19 ` [PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle Ulf Hansson
2025-10-17 10:15   ` Peter Zijlstra
2025-10-31 19:23   ` Dhruva Gole
2025-11-04 13:43     ` Ulf Hansson
2025-10-16 15:19 ` [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS Ulf Hansson
2025-10-31 10:57   ` Dhruva Gole
2025-10-31 13:29     ` Ulf Hansson
2025-10-29 14:52 ` [PATCH v2 0/4] PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle Rafael J. Wysocki
2025-10-30 12:22   ` Ulf Hansson
2025-10-30 12:26     ` Rafael J. Wysocki
2025-10-30 12:29       ` Rafael J. Wysocki
2025-10-30 12:43         ` Ulf Hansson
2025-10-30 14:06           ` Rafael J. Wysocki
2025-10-30 15:12             ` Ulf Hansson
2025-10-30 16:36               ` Rafael J. Wysocki
2025-10-31 10:03                 ` Ulf Hansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).