linux-pm.vger.kernel.org archive mirror
* [PATCH v1 0/5] Optimize async device suspend/resume
@ 2024-11-14 22:09 Saravana Kannan
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

A lot of the details are in patch 4/5 and 5/5. The summary is that
there's a lot of overhead and wasted work in how async device
suspend/resume is handled today. I talked about this and other
suspend/resume issues at LPC 2024[1].

You can remove a lot of the overhead by doing breadth-first queuing of
async suspend/resumes. That's what this patch series does. I also
noticed that during resume, because of EAS, we don't use the bigger CPUs
as quickly. This was leading to a lot of scheduling latency and
preemption of runnable threads, increasing the resume latency. So we
also disable EAS for the tiny period of resume where we know there'll
be a lot of parallelism.

On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
series yields significant improvements:
+---------------------------+---------------+----------------+----------------+
| Phase                     | Old full sync | Old full async | New full async |
|                           |               |                | + EAS disabled |
+---------------------------+---------------+----------------+----------------+
| Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
+---------------------------+---------------+----------------+----------------+
| Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
+---------------------------+---------------+----------------+----------------+
| Sum                       |        182 ms |         162 ms |         123 ms |
+---------------------------+---------------+----------------+----------------+

There might be room for some more optimizations in the future, but I'm
keeping this patch series simple enough that it's easier to review and
to verify that it doesn't break anything. If this series lands and stays
stable with no bug reports for a few months, I can work on optimizing
this a bit further.

Thanks,
Saravana
P.S: Cc-ing some of the usual suspects who might be interested in
testing this out.

[1] - https://lpc.events/event/18/contributions/1845/

Saravana Kannan (5):
  PM: sleep: Fix runtime PM issue in dpm_resume()
  PM: sleep: Remove unnecessary mutex lock when waiting on parent
  PM: sleep: Add helper functions to loop through superior/subordinate
    devs
  PM: sleep: Do breadth first suspend/resume for async suspend/resume
  PM: sleep: Spread out async kworker threads during dpm_resume*()
    phases

 drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
 kernel/power/suspend.c    |  16 ++
 kernel/sched/topology.c   |  13 ++
 3 files changed, 276 insertions(+), 78 deletions(-)

-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
  2024-11-16  7:43   ` Greg Kroah-Hartman
                     ` (2 more replies)
  2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

Some devices might have their is_suspended flag set to false. In these
cases, dpm_resume() should skip doing anything for those devices.
However, runtime PM is enabled and a few other steps are done before
this flag is checked. Fix it so that things are done in the right order.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4a67e83300e1..86e51b9fefab 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.syscore)
 		goto Complete;
 
+	if (!dev->power.is_suspended)
+		goto Unlock;
+
 	if (dev->power.direct_complete) {
 		/* Match the pm_runtime_disable() in __device_suspend(). */
 		pm_runtime_enable(dev);
@@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
 	 */
 	dev->power.is_prepared = false;
 
-	if (!dev->power.is_suspended)
-		goto Unlock;
-
 	if (dev->pm_domain) {
 		info = "power domain ";
 		callback = pm_op(&dev->pm_domain->ops, state);
-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
  2024-12-02 20:11   ` Rafael J. Wysocki
  2024-11-14 22:09 ` [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs Saravana Kannan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

Locking is not needed for get_device(dev->parent). We either get NULL
(if the parent was cleared) or the actual parent. Also, when a device is
deleted (device_del()) and removed from the dpm_list, its completion
variable is complete_all()-ed. So we don't have to worry about waiting
indefinitely on a deleted parent device.
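
Not part of the patch, just for illustration: a minimal userspace toy
(pthreads) of the property the paragraph above relies on, namely that
waiting on a completion that has already been complete_all()-ed returns
immediately instead of hanging. The struct/function names mimic the
kernel API but are reimplemented here for illustration; build with
-pthread.

#include <pthread.h>
#include <stdio.h>

/* Toy stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

int main(void)
{
	struct completion parent = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
	};

	/* device_del() on the parent would have done this already... */
	complete_all(&parent);

	/* ...so waiting afterwards returns right away instead of hanging. */
	wait_for_completion(&parent);
	printf("did not block on the deleted parent\n");
	return 0;
}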

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 86e51b9fefab..9b9b6088e56a 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
 	 * counting the parent once more unless the device has been deleted
 	 * already (in which case return right away).
 	 */
-	mutex_lock(&dpm_list_mtx);
-
-	if (!device_pm_initialized(dev)) {
-		mutex_unlock(&dpm_list_mtx);
-		return false;
-	}
-
 	parent = get_device(dev->parent);
-
-	mutex_unlock(&dpm_list_mtx);
-
-	dpm_wait(parent, async);
+	if (device_pm_initialized(dev))
+		dpm_wait(parent, async);
 	put_device(parent);
 
 	dpm_wait_for_suppliers(dev, async);
-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
  2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
  2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

We have a lot of code that does or will loop through superior/subordinate
devices and take action on them. Refactor the code to pull out the common
"loop through" functionality into helper functions to avoid repeating the
logic.
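
Not part of the patch, just to illustrate the refactor pattern in
isolation: one traversal helper takes a callback plus opaque data, so
the various wait/check/queue actions can share the same walk. The toy
types and names below are made up; the real helpers are
dpm_for_each_superior()/dpm_for_each_subordinate() in the diff.

#include <stdio.h>

/* Toy stand-in for struct device with only a parent pointer. */
struct toy_dev {
	const char *name;
	struct toy_dev *parent;
};

/* Generic traversal helper; callers supply the per-device action. */
static int toy_for_each_superior(struct toy_dev *dev, void *data,
				 int (*fn)(struct toy_dev *dev, void *data))
{
	if (dev && dev->parent)
		return fn(dev->parent, data);
	return 0;
}

/* Example callback: print the superior's name. */
static int toy_print_fn(struct toy_dev *dev, void *data)
{
	printf("%s: %s\n", (const char *)data, dev->name);
	return 0;
}

int main(void)
{
	struct toy_dev root = { "root", NULL };
	struct toy_dev child = { "child", &root };

	/* Same walk, different callbacks: wait, readiness check, queue, ... */
	toy_for_each_superior(&child, "superior of child", toy_print_fn);
	return 0;
}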

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 70 ++++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 9b9b6088e56a..aa1470ef6ac0 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -247,15 +247,22 @@ static int dpm_wait_fn(struct device *dev, void *async_ptr)
 	return 0;
 }
 
-static void dpm_wait_for_children(struct device *dev, bool async)
-{
-       device_for_each_child(dev, &async, dpm_wait_fn);
-}
-
-static void dpm_wait_for_suppliers(struct device *dev, bool async)
+static int dpm_for_each_superior(struct device *dev, void *data,
+				 int (*fn)(struct device *dev, void *data))
 {
+	struct device *parent;
+	int ret = 0, idx;
 	struct device_link *link;
-	int idx;
+
+	if (!dev)
+		return 0;
+
+	parent = get_device(dev->parent);
+	if (parent)
+		ret = fn(parent, data);
+	put_device(parent);
+	if (ret)
+		return ret;
 
 	idx = device_links_read_lock();
 
@@ -267,29 +274,20 @@ static void dpm_wait_for_suppliers(struct device *dev, bool async)
 	 * walking.
 	 */
 	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
-		if (READ_ONCE(link->status) != DL_STATE_DORMANT)
-			dpm_wait(link->supplier, async);
+		if (READ_ONCE(link->status) != DL_STATE_DORMANT) {
+			ret = fn(link->supplier, data);
+			if (ret)
+				break;
+		}
 
 	device_links_read_unlock(idx);
+
+	return ret;
 }
 
 static bool dpm_wait_for_superior(struct device *dev, bool async)
 {
-	struct device *parent;
-
-	/*
-	 * If the device is resumed asynchronously and the parent's callback
-	 * deletes both the device and the parent itself, the parent object may
-	 * be freed while this function is running, so avoid that by reference
-	 * counting the parent once more unless the device has been deleted
-	 * already (in which case return right away).
-	 */
-	parent = get_device(dev->parent);
-	if (device_pm_initialized(dev))
-		dpm_wait(parent, async);
-	put_device(parent);
-
-	dpm_wait_for_suppliers(dev, async);
+	dpm_for_each_superior(dev, &async, dpm_wait_fn);
 
 	/*
 	 * If the parent's callback has deleted the device, attempting to resume
@@ -298,10 +296,18 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
 	return device_pm_initialized(dev);
 }
 
-static void dpm_wait_for_consumers(struct device *dev, bool async)
+static int dpm_for_each_subordinate(struct device *dev, void *data,
+				    int (*fn)(struct device *dev, void *data))
 {
+	int ret, idx;
 	struct device_link *link;
-	int idx;
+
+	if (!dev)
+		return 0;
+
+	ret = device_for_each_child(dev, data, fn);
+	if (ret)
+		return ret;
 
 	idx = device_links_read_lock();
 
@@ -315,16 +321,20 @@ static void dpm_wait_for_consumers(struct device *dev, bool async)
 	 * unregistration).
 	 */
 	list_for_each_entry_rcu_locked(link, &dev->links.consumers, s_node)
-		if (READ_ONCE(link->status) != DL_STATE_DORMANT)
-			dpm_wait(link->consumer, async);
+		if (READ_ONCE(link->status) != DL_STATE_DORMANT) {
+			ret = fn(link->consumer, data);
+			if (ret)
+				break;
+		}
 
 	device_links_read_unlock(idx);
+
+	return ret;
 }
 
 static void dpm_wait_for_subordinate(struct device *dev, bool async)
 {
-	dpm_wait_for_children(dev, async);
-	dpm_wait_for_consumers(dev, async);
+	dpm_for_each_subordinate(dev, &async, dpm_wait_fn);
 }
 
 /**
-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
                   ` (2 preceding siblings ...)
  2024-11-14 22:09 ` [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
  2025-03-11 11:25   ` Geert Uytterhoeven
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
  2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
  5 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

The dpm_list used for suspend/resume ensures that all superior devices
(parents and suppliers) precede subordinate devices (children and
consumers).

Current async resume logic:
-------------------------------
* For each resume phase (except the "complete" phase, which is always
  serialized), the resume logic first queues all async devices in the
  dpm_list. It then loops through the dpm_list again to resume the sync
  devices one by one.

* Async devices wait for all their superior devices to resume before
  starting their own resume operation.

* This process results in multiple sleep and wake-up cycles before an
  async device actually resumes. This sleeping also causes kworker
  threads to stall with work for a period. Consequently, the workqueue
  framework spins up more kworker threads to handle the other async
  devices.

* The end result is excessive thread creation, wake-ups, sleeps, and
  context switches for every async device. This overhead makes a full
  async resume (with all devices marked as async-capable) much slower
  than a synchronous resume.

Current async suspend logic:
--------------------------------
* The async suspend logic differs from the async resume logic. The
  suspend logic loops through the dpm_list. When it finds an async
  device, it queues the work and moves on. However, if it encounters a
  sync device, it waits until the sync device (and all its subordinate
  devices) have suspended before proceeding to the next device.
  Therefore, an async suspend device can be left waiting on an
  unrelated device before even being queued.

* Once queued, an async device experiences the same inefficiencies as
  in the resume logic (thread creation, wake-ups, sleeps, and context
  switches).

On a Pixel 6, averaging over 100 suspend/resume cycles, the data is as
follows:

+---------------------------+-----------+------------+----------+
| Phase                     | Full sync | Full async | % change |
+---------------------------+-----------+------------+----------+
| Total dpm_suspend*() time |    107 ms |      72 ms |     -33% |
+---------------------------+-----------+------------+----------+
| Total dpm_resume*() time  |     75 ms |      90 ms |     +20% |
+---------------------------+-----------+------------+----------+
| Sum                       |    182 ms |     162 ms |     -11% |
+---------------------------+-----------+------------+----------+

This shows that full async suspend/resume is not a viable option. It
makes the user-visible resume phase slower and only improves the
overall time by 11%.

To fix all this, this patch introduces new async suspend/resume
logic.

New suspend/resume logic:
-------------------------
* For each suspend/resume phase (except "complete" and "prepare,"
  which are always serialized), the logic first queues only the async
  devices that don't have to wait for any subordinates (for suspend)
  or superiors (for resume). It then loops through the dpm_list again
  to suspend/resume the sync devices one by one.

* When a device (sync or async) successfully suspends/resumes, it
  examines its superiors/subordinates and queues only the async
  devices that don't need to wait for any subordinates/superiors.

With this new logic:

* Queued async devices don't have to wait for completion and are
  always ready to perform their suspend/resume operation.

* The queue of async devices remains short.

* kworkers never sleep for extended periods, and the workqueue
  framework doesn't spin up many new threads to handle a backlog of
  async devices.

* The result is approximately NCPU kworker threads running in parallel
  without sleeping until all async devices finish. A toy model of this
  queuing scheme is sketched below.
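
Not part of the patch: a minimal userspace sketch of the queuing scheme
described above, roughly mirroring what dpm_async_resume_loop() and
dpm_async_queue_resume_ready_fn() do in the diff below. The device
names and the dependency graph are made up, and a static array stands
in for dpm_list, device links and completions.

#include <stdio.h>

#define NDEV 5

/* superior[i][j] != 0 means device j must resume before device i. */
static const int superior[NDEV][NDEV] = {
	{ 0, 0, 0, 0, 0 },	/* dev0: no superiors */
	{ 1, 0, 0, 0, 0 },	/* dev1 depends on dev0 */
	{ 1, 0, 0, 0, 0 },	/* dev2 depends on dev0 */
	{ 0, 1, 1, 0, 0 },	/* dev3 depends on dev1 and dev2 */
	{ 0, 0, 1, 0, 0 },	/* dev4 depends on dev2 */
};

static int done[NDEV], queued[NDEV];
static int queue[NDEV], head, tail;

static int all_superiors_done(int dev)
{
	int i;

	for (i = 0; i < NDEV; i++)
		if (superior[dev][i] && !done[i])
			return 0;
	return 1;
}

/* Rough analogue of dpm_async_queue_resume_ready_fn(): queue only if ready. */
static void queue_if_ready(int dev)
{
	if (!queued[dev] && all_superiors_done(dev)) {
		queued[dev] = 1;
		queue[tail++] = dev;
	}
}

int main(void)
{
	int dev, sub;

	/* Initial pass: only devices with no unresumed superiors are queued. */
	for (dev = 0; dev < NDEV; dev++)
		queue_if_ready(dev);

	/* "Resume" queued devices; each completion re-checks its subordinates. */
	while (head < tail) {
		dev = queue[head++];
		done[dev] = 1;
		printf("resumed dev%d\n", dev);

		for (sub = 0; sub < NDEV; sub++)
			if (superior[sub][dev])
				queue_if_ready(sub);
	}
	return 0;
}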

On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
yields improved results:
+---------------------------+---------------+----------------+----------+
| Phase                     | Old full sync | New full async | % change |
+---------------------------+---------------+----------------+----------+
| Total dpm_suspend*() time |        107 ms |          60 ms |     -44% |
+---------------------------+---------------+----------------+----------+
| Total dpm_resume*() time  |         75 ms |          74 ms |      -1% |
+---------------------------+---------------+----------------+----------+
| Sum                       |        182 ms |         134 ms |     -26% |
+---------------------------+---------------+----------------+----------+

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 242 ++++++++++++++++++++++++++++++++------
 1 file changed, 205 insertions(+), 37 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index aa1470ef6ac0..65c195be48b8 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -121,6 +121,12 @@ void device_pm_unlock(void)
 	mutex_unlock(&dpm_list_mtx);
 }
 
+static bool is_async(struct device *dev)
+{
+	return dev->power.async_suspend && pm_async_enabled
+		&& !pm_trace_is_enabled();
+}
+
 /**
  * device_pm_add - Add a device to the PM core's list of active devices.
  * @dev: Device to add to the list.
@@ -287,6 +293,9 @@ static int dpm_for_each_superior(struct device *dev, void *data,
 
 static bool dpm_wait_for_superior(struct device *dev, bool async)
 {
+	if (is_async(dev))
+		return true;
+
 	dpm_for_each_superior(dev, &async, dpm_wait_fn);
 
 	/*
@@ -334,9 +343,22 @@ static int dpm_for_each_subordinate(struct device *dev, void *data,
 
 static void dpm_wait_for_subordinate(struct device *dev, bool async)
 {
+	if (is_async(dev))
+		return;
+
 	dpm_for_each_subordinate(dev, &async, dpm_wait_fn);
 }
 
+static int dpm_check_fn(struct device *dev, void *data)
+{
+	return completion_done(&dev->power.completion) ? 0 : -EBUSY;
+}
+
+static int dpm_check_subordinate(struct device *dev)
+{
+	return dpm_for_each_subordinate(dev, NULL, dpm_check_fn);
+}
+
 /**
  * pm_op - Return the PM operation appropriate for given PM event.
  * @ops: PM operations to choose from.
@@ -578,33 +600,39 @@ bool dev_pm_skip_resume(struct device *dev)
 	return !dev->power.must_resume;
 }
 
-static bool is_async(struct device *dev)
+static void dpm_async_fn(struct device *dev, async_func_t func)
 {
-	return dev->power.async_suspend && pm_async_enabled
-		&& !pm_trace_is_enabled();
-}
-
-static bool dpm_async_fn(struct device *dev, async_func_t func)
-{
-	reinit_completion(&dev->power.completion);
-
-	if (is_async(dev)) {
-		dev->power.async_in_progress = true;
-
-		get_device(dev);
+	/*
+	 * Multiple async devices could trigger the async queuing of this
+	 * device. Make sure not to double queue it.
+	 */
+	spin_lock(&dev->power.lock);
+	if (dev->power.async_in_progress) {
+		spin_unlock(&dev->power.lock);
+		return;
+	}
+	dev->power.async_in_progress = true;
+	spin_unlock(&dev->power.lock);
 
-		if (async_schedule_dev_nocall(func, dev))
-			return true;
+	get_device(dev);
 
-		put_device(dev);
-	}
 	/*
-	 * Because async_schedule_dev_nocall() above has returned false or it
-	 * has not been called at all, func() is not running and it is safe to
-	 * update the async_in_progress flag without extra synchronization.
+	 * We aren't going to call any callbacks if the device has none.  Also,
+	 * direct_complete means all the resume and suspend callbacks should be
+	 * skipped and the device should go straight to dpm_complete().  In both
+	 * of these case, calling the func() synchronously will be a lot quicker
+	 * of these cases, calling the func() synchronously will be a lot quicker
 	 */
-	dev->power.async_in_progress = false;
-	return false;
+	if (dev->power.no_pm_callbacks || dev->power.direct_complete) {
+		func(dev, 0);
+		return;
+	}
+
+	if (async_schedule_dev_nocall(func, dev))
+		return;
+
+	WARN(1, "Unable to schedule async suspend/resume calls!\n");
+	put_device(dev);
 }
 
 /**
@@ -692,12 +720,47 @@ static void device_resume_noirq(struct device *dev, pm_message_t state, bool asy
 	}
 }
 
+static void dpm_reinit_dev_state(struct list_head *list)
+{
+	struct device *dev;
+
+	list_for_each_entry(dev, list, power.entry) {
+		reinit_completion(&dev->power.completion);
+		dev->power.async_in_progress = false;
+	}
+}
+
+static int dpm_check_superior(struct device *dev)
+{
+	return dpm_for_each_superior(dev, NULL, dpm_check_fn);
+}
+
+static int dpm_async_queue_resume_ready_fn(struct device *dev, void *data)
+{
+	if (!is_async(dev) || dpm_check_superior(dev))
+		return 0;
+
+	dpm_async_fn(dev, data);
+	return 0;
+}
+
+static void dpm_async_resume_loop(struct list_head *from, async_func_t func)
+{
+	struct device *dev;
+
+	list_for_each_entry(dev, from, power.entry)
+		dpm_async_queue_resume_ready_fn(dev, func);
+}
+
 static void async_resume_noirq(void *data, async_cookie_t cookie)
 {
 	struct device *dev = data;
 
 	device_resume_noirq(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume_noirq,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 static void dpm_noirq_resume_devices(pm_message_t state)
@@ -712,23 +775,26 @@ static void dpm_noirq_resume_devices(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_noirq_list);
+
 	/*
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_noirq_list, power.entry)
-		dpm_async_fn(dev, async_resume_noirq);
+	dpm_async_resume_loop(&dpm_noirq_list, async_resume_noirq);
 
 	while (!list_empty(&dpm_noirq_list)) {
 		dev = to_device(dpm_noirq_list.next);
 		list_move_tail(&dev->power.entry, &dpm_late_early_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!is_async(dev)) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
 
 			device_resume_noirq(dev, state, false);
+			dpm_for_each_subordinate(dev, async_resume_noirq,
+						 dpm_async_queue_resume_ready_fn);
 
 			put_device(dev);
 
@@ -834,6 +900,9 @@ static void async_resume_early(void *data, async_cookie_t cookie)
 
 	device_resume_early(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume_early,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 /**
@@ -852,23 +921,26 @@ void dpm_resume_early(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_late_early_list);
+
 	/*
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_late_early_list, power.entry)
-		dpm_async_fn(dev, async_resume_early);
+	dpm_async_resume_loop(&dpm_late_early_list, async_resume_early);
 
 	while (!list_empty(&dpm_late_early_list)) {
 		dev = to_device(dpm_late_early_list.next);
 		list_move_tail(&dev->power.entry, &dpm_suspended_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!is_async(dev)) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
 
 			device_resume_early(dev, state, false);
+			dpm_for_each_subordinate(dev, async_resume_early,
+					dpm_async_queue_resume_ready_fn);
 
 			put_device(dev);
 
@@ -996,6 +1068,9 @@ static void async_resume(void *data, async_cookie_t cookie)
 
 	device_resume(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 /**
@@ -1018,23 +1093,26 @@ void dpm_resume(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_suspended_list);
+
 	/*
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_suspended_list, power.entry)
-		dpm_async_fn(dev, async_resume);
+	dpm_async_resume_loop(&dpm_suspended_list, async_resume);
 
 	while (!list_empty(&dpm_suspended_list)) {
 		dev = to_device(dpm_suspended_list.next);
 		list_move_tail(&dev->power.entry, &dpm_prepared_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!is_async(dev)) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
 
 			device_resume(dev, state, false);
+			dpm_for_each_subordinate(dev, async_resume,
+					dpm_async_queue_resume_ready_fn);
 
 			put_device(dev);
 
@@ -1274,12 +1352,35 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy
 	return error;
 }
 
+static int dpm_async_queue_suspend_ready_fn(struct device *dev, void *data)
+{
+	if (!is_async(dev) || dpm_check_subordinate(dev))
+		return 0;
+
+	dpm_async_fn(dev, data);
+	return 0;
+}
+
+static void dpm_async_suspend_loop(struct list_head *from, async_func_t func)
+{
+	struct device *dev;
+
+	list_for_each_entry_reverse(dev, from, power.entry)
+		dpm_async_queue_suspend_ready_fn(dev, func);
+}
+
 static void async_suspend_noirq(void *data, async_cookie_t cookie)
 {
 	struct device *dev = data;
 
 	device_suspend_noirq(dev, pm_transition, true);
 	put_device(dev);
+
+	if (async_error)
+		return;
+
+	dpm_for_each_superior(dev, async_suspend_noirq,
+			      dpm_async_queue_suspend_ready_fn);
 }
 
 static int dpm_noirq_suspend_devices(pm_message_t state)
@@ -1294,12 +1395,20 @@ static int dpm_noirq_suspend_devices(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_late_early_list);
+
+	/*
+	 * Trigger the "async" devices upfront so they don't have to wait for
+	 * the "non-async" ones they don't depend on.
+	 */
+	dpm_async_suspend_loop(&dpm_late_early_list, async_suspend_noirq);
+
 	while (!list_empty(&dpm_late_early_list)) {
 		struct device *dev = to_device(dpm_late_early_list.prev);
 
 		list_move(&dev->power.entry, &dpm_noirq_list);
 
-		if (dpm_async_fn(dev, async_suspend_noirq))
+		if (is_async(dev))
 			continue;
 
 		get_device(dev);
@@ -1307,13 +1416,20 @@ static int dpm_noirq_suspend_devices(pm_message_t state)
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend_noirq(dev, state, false);
+		if (!error && !async_error)
+			dpm_for_each_superior(dev, async_suspend_noirq,
+					      dpm_async_queue_suspend_ready_fn);
 
 		put_device(dev);
 
 		mutex_lock(&dpm_list_mtx);
 
-		if (error || async_error)
+		if (error || async_error) {
+			/* See explanation in dpm_suspend() */
+			list_splice_init(&dpm_late_early_list, &dpm_noirq_list);
 			break;
+		}
+
 	}
 
 	mutex_unlock(&dpm_list_mtx);
@@ -1447,6 +1563,12 @@ static void async_suspend_late(void *data, async_cookie_t cookie)
 
 	device_suspend_late(dev, pm_transition, true);
 	put_device(dev);
+
+	if (async_error)
+		return;
+
+	dpm_for_each_superior(dev, async_suspend_late,
+			      dpm_async_queue_suspend_ready_fn);
 }
 
 /**
@@ -1467,12 +1589,20 @@ int dpm_suspend_late(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_suspended_list);
+
+	/*
+	 * Trigger the "async" devices upfront so they don't have to wait for
+	 * the "non-async" ones they don't depend on.
+	 */
+	dpm_async_suspend_loop(&dpm_suspended_list, async_suspend_late);
+
 	while (!list_empty(&dpm_suspended_list)) {
 		struct device *dev = to_device(dpm_suspended_list.prev);
 
 		list_move(&dev->power.entry, &dpm_late_early_list);
 
-		if (dpm_async_fn(dev, async_suspend_late))
+		if (is_async(dev))
 			continue;
 
 		get_device(dev);
@@ -1480,13 +1610,19 @@ int dpm_suspend_late(pm_message_t state)
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend_late(dev, state, false);
+		if (!error && !async_error)
+			dpm_for_each_superior(dev, async_suspend_late,
+					      dpm_async_queue_suspend_ready_fn);
 
 		put_device(dev);
 
 		mutex_lock(&dpm_list_mtx);
 
-		if (error || async_error)
+		if (error || async_error) {
+			/* See explanation in dpm_suspend() */
+			list_splice_init(&dpm_suspended_list, &dpm_late_early_list);
 			break;
+		}
 	}
 
 	mutex_unlock(&dpm_list_mtx);
@@ -1712,6 +1848,12 @@ static void async_suspend(void *data, async_cookie_t cookie)
 
 	device_suspend(dev, pm_transition, true);
 	put_device(dev);
+
+	if (async_error)
+		return;
+
+	dpm_for_each_superior(dev, async_suspend,
+			      dpm_async_queue_suspend_ready_fn);
 }
 
 /**
@@ -1734,12 +1876,20 @@ int dpm_suspend(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_prepared_list);
+
+	/*
+	 * Trigger the "async" devices upfront so they don't have to wait for
+	 * the "non-async" ones they don't depend on.
+	 */
+	dpm_async_suspend_loop(&dpm_prepared_list, async_suspend);
+
 	while (!list_empty(&dpm_prepared_list)) {
 		struct device *dev = to_device(dpm_prepared_list.prev);
 
 		list_move(&dev->power.entry, &dpm_suspended_list);
 
-		if (dpm_async_fn(dev, async_suspend))
+		if (is_async(dev))
 			continue;
 
 		get_device(dev);
@@ -1747,13 +1897,31 @@ int dpm_suspend(pm_message_t state)
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend(dev, state, false);
+		if (!error && !async_error)
+			dpm_for_each_superior(dev, async_suspend,
+					      dpm_async_queue_suspend_ready_fn);
 
 		put_device(dev);
 
 		mutex_lock(&dpm_list_mtx);
 
-		if (error || async_error)
+		if (error || async_error) {
+			/*
+			 * async devices that come after the current sync device
+			 * in the list might have already suspended. We need to
+			 * make sure those async devices are resumed again. By
+			 * moving all devices to the next list, we make sure the
+			 * error handling path resumes all suspended devices.
+			 *
+			 * However, we also need to avoid resuming devices that
+			 * haven't been suspended yet. Fortunately, the existing
+			 * is_suspended/is_noirq_suspended/is_late_suspended
+			 * flags take care of skipping the resume of a device if
+			 * it hasn't been suspended yet.
+			 */
+			list_splice_init(&dpm_prepared_list, &dpm_suspended_list);
 			break;
+		}
 	}
 
 	mutex_unlock(&dpm_list_mtx);
-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
                   ` (3 preceding siblings ...)
  2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
  2024-11-15  5:25   ` Saravana Kannan
                     ` (5 more replies)
  2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
  5 siblings, 6 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

As of today, the scheduler doesn't spread out all the kworker threads
across all the available CPUs during suspend/resume. This causes
significant resume latency during the dpm_resume*() phases.

System resume latency is a very user-visible event. Reducing the
latency is more important than trying to be energy aware during that
period.

Since there are no userspace processes running during this time and
this is a very short time window, we can simply disable EAS during
resume so that the parallel resume of the devices is spread across all
the CPUs.
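
Not part of the patch, just for context: the knob this helper flips is
already exposed to userspace through the sched_energy_aware sysctl
(handled by sched_energy_aware_handler(), visible in the diff below);
the patch effectively toggles it from inside the kernel around resume.
A rough userspace illustration, assuming an EAS-capable kernel and
root:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Toggle EAS via the existing sysctl. */
static int set_eas(int enable)
{
	int fd = open("/proc/sys/kernel/sched_energy_aware", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, enable ? "1" : "0", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

int main(void)
{
	if (set_eas(0))
		perror("disable EAS");

	/* ... latency-sensitive work would run here ... */

	if (set_eas(1))
		perror("re-enable EAS");
	return 0;
}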

On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
plus disabling EAS for resume yields significant improvements:
+---------------------------+---------------+----------------+----------+
| Phase                     | Old full sync | New full async | % change |
|                           |               | + EAS disabled |          |
+---------------------------+---------------+----------------+----------+
| Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
+---------------------------+---------------+----------------+----------+
| Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
+---------------------------+---------------+----------------+----------+
| Sum                       |        182 ms |         123 ms |     -32% |
+---------------------------+---------------+----------------+----------+

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 kernel/power/suspend.c  | 16 ++++++++++++++++
 kernel/sched/topology.c | 13 +++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 09f8397bae15..7304dc39958f 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
 	local_irq_enable();
 }
 
+/*
+ * Intentionally not part of a header file to avoid risk of abuse by other
+ * drivers.
+ */
+void sched_set_energy_aware(unsigned int enable);
+
 /**
  * suspend_enter - Make the system enter the given sleep state.
  * @state: System sleep state to enter.
@@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 
  Platform_wake:
 	platform_resume_noirq(state);
+	/*
+	 * We do this only for resume instead of suspend and resume for these
+	 * reasons:
+	 * - Performance is more important than power for resume.
+	 * - Power spent entering suspend is more important for suspend. Also,
+	 *   strangely, disabling EAS was making suspend a few milliseconds
+	 *   slower in my testing.
+	 */
+	sched_set_energy_aware(0);
 	dpm_resume_noirq(PMSG_RESUME);
 
  Platform_early_resume:
@@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
  Resume_devices:
 	suspend_test_start();
 	dpm_resume_end(PMSG_RESUME);
+	sched_set_energy_aware(1);
 	suspend_test_finish("resume devices");
 	trace_suspend_resume(TPS("resume_console"), state, true);
 	resume_console();
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9748a4c8d668..c069c0b17cbf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
 	mutex_unlock(&sched_energy_mutex);
 }
 
+void sched_set_energy_aware(unsigned int enable)
+{
+	int state;
+
+	if (!sched_is_eas_possible(cpu_active_mask))
+		return;
+
+	sysctl_sched_energy_aware = enable;
+	state = static_branch_unlikely(&sched_energy_present);
+	if (state != sysctl_sched_energy_aware)
+		rebuild_sched_domains_energy();
+}
+
 #ifdef CONFIG_PROC_SYSCTL
 static int sched_energy_aware_handler(const struct ctl_table *table, int write,
 		void *buffer, size_t *lenp, loff_t *ppos)
-- 
2.47.0.338.g60cca15819-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
@ 2024-11-15  5:25   ` Saravana Kannan
  2024-11-15  9:25     ` Geert Uytterhoeven
  2024-11-15 15:30     ` Vincent Guittot
  2024-11-15 16:12   ` Vincent Guittot
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-15  5:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);
>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

  CC      kernel/sched/build_utility.o
In file included from kernel/sched/build_utility.c:88:
kernel/sched/topology.c:287:6: warning: no previous prototype for
‘sched_set_energy_aware’ [-Wmissing-prototypes]
  287 | void sched_set_energy_aware(unsigned int enable)
      |      ^~~~~~~~~~~~~~~~~~~~~~

Peter/Vincent,

I noticed that I'm getting a warning for this line. But I'm not sure
what to do about it. I intentionally didn't put this in a header file
because I'm guessing we don't want to make this available to
drivers/frameworks in general.

Let me know how you want me to handle this.

-Saravana

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-15  5:25   ` Saravana Kannan
@ 2024-11-15  9:25     ` Geert Uytterhoeven
  2024-11-15 15:30     ` Vincent Guittot
  1 sibling, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2024-11-15  9:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

Hi Saravana,

On Fri, Nov 15, 2024 at 6:25 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.

Put the prototype in kernel/sched/sched.h, and include
../sched/sched.h from kernel/power/suspend.c?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-15  5:25   ` Saravana Kannan
  2024-11-15  9:25     ` Geert Uytterhoeven
@ 2024-11-15 15:30     ` Vincent Guittot
  1 sibling, 0 replies; 33+ messages in thread
From: Vincent Guittot @ 2024-11-15 15:30 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Fri, 15 Nov 2024 at 06:25, Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);

extern void sched_set_energy_aware(unsigned int enable);

This clears the warning.

> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.
>
> -Saravana
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
  2024-11-15  5:25   ` Saravana Kannan
@ 2024-11-15 16:12   ` Vincent Guittot
  2024-11-15 18:33     ` Saravana Kannan
  2024-11-17  0:34   ` kernel test robot
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Vincent Guittot @ 2024-11-15 16:12 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+

In the cover letter you have figures for:
 - Old full sync
 - New full async
 - New full async + EAS disabled

You should rather compare New full async against New full async + EAS
disabled to show the impact of disabling EAS.

I would be interested in figures for the impact of disabling it during
the full suspend sequence as well, as I'm not convinced it's worth the
complexity, especially with a fixed OPP during suspend.

>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);

If we end up having a special scheduling mode during suspend, we
should make the function more generic and not only EAS/ smartphone
specific

Like a sched_suspend and sched_resume
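
As a rough illustration only (the hook names below are hypothetical and not
part of the posted series), the PM core would then call opaque scheduler
entry points and the EAS detail would stay inside kernel/sched:

        /* kernel/sched/: the scheduler decides what "fast resume" means */
        void sched_resume_begin(void)
        {
                sched_set_energy_aware(0);
        }

        void sched_resume_end(void)
        {
                sched_set_energy_aware(1);
        }

        /* kernel/power/suspend.c would then only know about the hooks */
        sched_resume_begin();
        dpm_resume_noirq(PMSG_RESUME);
        ...
        dpm_resume_end(PMSG_RESUME);
        sched_resume_end();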

>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

This is a copy/paste of sched_energy_aware_handler() below; we should
have one helper for both.
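
For illustration, the common part could be factored into one helper that both
this function and the sysctl handler call after updating
sysctl_sched_energy_aware (sketch only, not the actual patch):

        /* rebuild the sched domains only if the EAS state actually changes */
        static void sched_energy_aware_update(void)
        {
                int state = static_branch_unlikely(&sched_energy_present);

                if (state != sysctl_sched_energy_aware)
                        rebuild_sched_domains_energy();
        }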

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-15 16:12   ` Vincent Guittot
@ 2024-11-15 18:33     ` Saravana Kannan
  0 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-15 18:33 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Fri, Nov 15, 2024 at 8:13 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+---------------+----------------+----------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+---------------+----------------+----------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+---------------+----------------+----------+
>
> In the cover letter you have figures for:
>  - Old full sync
>  - New full async
>  - New full async + EAS disabled
>
> It would be better to use the figures for New full async vs. New full
> async + EAS disabled to show the impact of disabling EAS.

I do give those numbers in the commit text of each patch making the changes.

The patch 4 commit text shows how it improves things compared to the
old full sync logic (this is the baseline) - resume is 1% faster.
The patch 5 commit text shows how disabling EAS improves the numbers
compared to that baseline - resume is 19% faster.

So, yeah, all the numbers are there in one of these emails. Patch 5
(which is the only one touching EAS) is the one that has the
comparison you are asking for.

> I would be interested in figures for the impact of disabling it
> during the full suspend sequence, as I'm not convinced that it's worth
> the complexity, especially with a fixed OPP during suspend.

1. Device suspend actually got worse by 5ms or so. I already provided that.

2. As I said in Patch 5, suspend is more about reducing the energy
spent going into suspend. It's a balance of how quick you can be
against how much power you use to be quick. So, disabling EAS across
all of suspend/resume will have a huge impact on power because
userspace is still running, there are a ton of threads, and userspace
could get preempted between disabling EAS and kicking off suspend.
Lots of obvious power concerns overall.

Thanks,
Saravana

>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
>
> If we end up having a special scheduling mode during suspend, we
> should make the function more generic and not only EAS/ smartphone
> specific
>
> Like a sched_suspend and sched_resume
>
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> This is a copy/paste of sched_energy_aware_handler() below; we should
> have one helper for both.
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
@ 2024-11-16  7:43   ` Greg Kroah-Hartman
  2024-11-16 21:06     ` Saravana Kannan
  2024-12-04 12:53   ` Rafael J. Wysocki
  2025-03-14 20:47   ` Pavel Machek
  2 siblings, 1 reply; 33+ messages in thread
From: Greg Kroah-Hartman @ 2024-11-16  7:43 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few others steps are done before
> checking for this flag. Fix it so that we do things in the right order.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This looks like a nice generic fix as well, should it go to older
kernels?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-16  7:43   ` Greg Kroah-Hartman
@ 2024-11-16 21:06     ` Saravana Kannan
  0 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-16 21:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Fri, Nov 15, 2024 at 11:43 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
> > However, runtime PM enable and a few others steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This looks like a nice generic fix as well, should it go to older
> kernels?

Yeah, it's meant to be a generic fix. But I'll feel better about it if
someone familiar with this code can give it a Reviewed-by.

And as I look at it... I have a bug in there. I think it should be
goto Complete and not Unlock! No idea how my devices didn't get freed
after a few suspend aborts!

I can send it out as a separate patch too if you want. And depending
on when it lands, I can keep it or drop it from v2 of this series.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
  2024-11-15  5:25   ` Saravana Kannan
  2024-11-15 16:12   ` Vincent Guittot
@ 2024-11-17  0:34   ` kernel test robot
  2024-11-17  1:17   ` kernel test robot
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2024-11-17  0:34 UTC (permalink / raw)
  To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut,
	Bird, Tim, kernel-team, linux-pm, linux-kernel

Hi Saravana,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: arm-s3c6400_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411170802.SnHuptE7-lkp@intel.com/

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_enter':
>> kernel/power/suspend.c:485:(.text+0x2f4): undefined reference to `sched_set_energy_aware'
>> arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x48c): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4c0): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4d0): undefined reference to `sched_set_energy_aware'
   arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter':
   kernel/power/suspend.c:538:(.text+0x748): undefined reference to `sched_set_energy_aware'


vim +485 kernel/power/suspend.c

   401	
   402	/**
   403	 * suspend_enter - Make the system enter the given sleep state.
   404	 * @state: System sleep state to enter.
   405	 * @wakeup: Returns information that the sleep state should not be re-entered.
   406	 *
   407	 * This function should be called after devices have been suspended.
   408	 */
   409	static int suspend_enter(suspend_state_t state, bool *wakeup)
   410	{
   411		int error;
   412	
   413		error = platform_suspend_prepare(state);
   414		if (error)
   415			goto Platform_finish;
   416	
   417		error = dpm_suspend_late(PMSG_SUSPEND);
   418		if (error) {
   419			pr_err("late suspend of devices failed\n");
   420			goto Platform_finish;
   421		}
   422		error = platform_suspend_prepare_late(state);
   423		if (error)
   424			goto Devices_early_resume;
   425	
   426		error = dpm_suspend_noirq(PMSG_SUSPEND);
   427		if (error) {
   428			pr_err("noirq suspend of devices failed\n");
   429			goto Platform_early_resume;
   430		}
   431		error = platform_suspend_prepare_noirq(state);
   432		if (error)
   433			goto Platform_wake;
   434	
   435		if (suspend_test(TEST_PLATFORM))
   436			goto Platform_wake;
   437	
   438		if (state == PM_SUSPEND_TO_IDLE) {
   439			s2idle_loop();
   440			goto Platform_wake;
   441		}
   442	
   443		error = pm_sleep_disable_secondary_cpus();
   444		if (error || suspend_test(TEST_CPUS))
   445			goto Enable_cpus;
   446	
   447		arch_suspend_disable_irqs();
   448		BUG_ON(!irqs_disabled());
   449	
   450		system_state = SYSTEM_SUSPEND;
   451	
   452		error = syscore_suspend();
   453		if (!error) {
   454			*wakeup = pm_wakeup_pending();
   455			if (!(suspend_test(TEST_CORE) || *wakeup)) {
   456				trace_suspend_resume(TPS("machine_suspend"),
   457					state, true);
   458				error = suspend_ops->enter(state);
   459				trace_suspend_resume(TPS("machine_suspend"),
   460					state, false);
   461			} else if (*wakeup) {
   462				error = -EBUSY;
   463			}
   464			syscore_resume();
   465		}
   466	
   467		system_state = SYSTEM_RUNNING;
   468	
   469		arch_suspend_enable_irqs();
   470		BUG_ON(irqs_disabled());
   471	
   472	 Enable_cpus:
   473		pm_sleep_enable_secondary_cpus();
   474	
   475	 Platform_wake:
   476		platform_resume_noirq(state);
   477		/*
   478		 * We do this only for resume instead of suspend and resume for these
   479		 * reasons:
   480		 * - Performance is more important than power for resume.
   481		 * - Power spent entering suspend is more important for suspend. Also,
   482		 *   stangely, disabling EAS was making suspent a few milliseconds
   483		 *   slower in my testing.
   484		 */
 > 485		sched_set_energy_aware(0);
   486		dpm_resume_noirq(PMSG_RESUME);
   487	
   488	 Platform_early_resume:
   489		platform_resume_early(state);
   490	
   491	 Devices_early_resume:
   492		dpm_resume_early(PMSG_RESUME);
   493	
   494	 Platform_finish:
   495		platform_resume_finish(state);
   496		return error;
   497	}
   498	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
                     ` (2 preceding siblings ...)
  2024-11-17  0:34   ` kernel test robot
@ 2024-11-17  1:17   ` kernel test robot
  2024-11-17 13:34   ` kernel test robot
  2024-11-18  9:52   ` Christian Loehle
  5 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2024-11-17  1:17 UTC (permalink / raw)
  To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut,
	Bird, Tim, kernel-team, linux-pm, linux-kernel

Hi Saravana,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: powerpc-tqm8560_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411170920.Pv29JceD-lkp@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_enter':
   suspend.c:(.text+0x414): undefined reference to `sched_set_energy_aware'
>> powerpc-linux-ld: suspend.c:(.text+0x63c): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter':
   suspend.c:(.text+0x834): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: suspend.c:(.text+0x950): undefined reference to `sched_set_energy_aware'
   powerpc-linux-ld: suspend.c:(.text+0x970): undefined reference to `sched_set_energy_aware'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
                     ` (3 preceding siblings ...)
  2024-11-17  1:17   ` kernel test robot
@ 2024-11-17 13:34   ` kernel test robot
  2024-11-18  9:52   ` Christian Loehle
  5 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2024-11-17 13:34 UTC (permalink / raw)
  To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut,
	Bird, Tim, kernel-team, linux-pm, linux-kernel

Hi Saravana,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com
patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
config: arm-randconfig-003-20241117 (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411172344.QqFap290-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from kernel/sched/build_utility.c:88:
>> kernel/sched/topology.c:287:6: warning: no previous prototype for 'sched_set_energy_aware' [-Wmissing-prototypes]
     287 | void sched_set_energy_aware(unsigned int enable)
         |      ^~~~~~~~~~~~~~~~~~~~~~


vim +/sched_set_energy_aware +287 kernel/sched/topology.c

   286	
 > 287	void sched_set_energy_aware(unsigned int enable)
   288	{
   289		int state;
   290	
   291		if (!sched_is_eas_possible(cpu_active_mask))
   292			return;
   293	
   294		sysctl_sched_energy_aware = enable;
   295		state = static_branch_unlikely(&sched_energy_present);
   296		if (state != sysctl_sched_energy_aware)
   297			rebuild_sched_domains_energy();
   298	}
   299	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
                     ` (4 preceding siblings ...)
  2024-11-17 13:34   ` kernel test robot
@ 2024-11-18  9:52   ` Christian Loehle
  2024-11-18 17:18     ` Saravana Kannan
  5 siblings, 1 reply; 33+ messages in thread
From: Christian Loehle @ 2024-11-18  9:52 UTC (permalink / raw)
  To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On 11/14/24 22:09, Saravana Kannan wrote:
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
> 
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
> 
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
> 
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+---------------+----------------+----------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+---------------+----------------+----------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+---------------+----------------+----------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+---------------+----------------+----------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+---------------+----------------+----------+
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>  	local_irq_enable();
>  }
>  
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>  
>   Platform_wake:
>  	platform_resume_noirq(state);
> +	/*
> +	 * We do this only for resume instead of suspend and resume for these
> +	 * reasons:
> +	 * - Performance is more important than power for resume.
> +	 * - Power spent entering suspend is more important for suspend. Also,
> +	 *   stangely, disabling EAS was making suspent a few milliseconds
> +	 *   slower in my testing.

s/stangely/strangely
s/suspent/suspend
I'd also be curious why that is. Disabling EAS shouldn't be that expensive.
What if you just hack the static branch switch (without the sd rebuild)?
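
E.g., purely as an experiment (sketch only, using the static key visible in
the hunk below), flip just the key and skip the domain rebuild:

        static_branch_disable(&sched_energy_present);   /* start of resume */
        ...
        static_branch_enable(&sched_energy_present);    /* end of resume */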

> +	 */
> +	sched_set_energy_aware(0);
>  	dpm_resume_noirq(PMSG_RESUME);
>  
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>  	suspend_test_start();
>  	dpm_resume_end(PMSG_RESUME);
> +	sched_set_energy_aware(1);
>  	suspend_test_finish("resume devices");
>  	trace_suspend_resume(TPS("resume_console"), state, true);
>  	resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>  	mutex_unlock(&sched_energy_mutex);
>  }
>  
> +void sched_set_energy_aware(unsigned int enable)

bool enable?

> +{
> +	int state;
> +
> +	if (!sched_is_eas_possible(cpu_active_mask))
> +		return;
> +
> +	sysctl_sched_energy_aware = enable;
> +	state = static_branch_unlikely(&sched_energy_present);
> +	if (state != sysctl_sched_energy_aware)
> +		rebuild_sched_domains_energy();
> +}
> +

This definitely shouldn't just overwrite
sysctl_sched_energy_aware, otherwise you enable EAS
for users that explicitly disabled it.

If it ever comes to other users wanting this, we might
need an eas_pause counter so this can be nested, but
let's just hope that's never needed.
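
A minimal sketch of what such a nestable pause could look like (names are
hypothetical; it also assumes the code deciding whether to attach the energy
model during a rebuild checks sched_eas_paused() in addition to
sysctl_sched_energy_aware, which the posted series does not do):

        static atomic_t sched_eas_pause_cnt = ATOMIC_INIT(0);

        bool sched_eas_paused(void)
        {
                return atomic_read(&sched_eas_pause_cnt) > 0;
        }

        void sched_eas_pause(void)
        {
                /* only the first pause needs to rebuild the domains */
                if (atomic_inc_return(&sched_eas_pause_cnt) == 1)
                        rebuild_sched_domains_energy();
        }

        void sched_eas_resume(void)
        {
                /* the last resume restores whatever the sysctl asks for */
                if (atomic_dec_return(&sched_eas_pause_cnt) == 0)
                        rebuild_sched_domains_energy();
        }

That way sysctl_sched_energy_aware is never touched, and a user who
explicitly disabled EAS stays with it disabled.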

Regards,
Christian


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-18  9:52   ` Christian Loehle
@ 2024-11-18 17:18     ` Saravana Kannan
  0 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-18 17:18 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

On Mon, Nov 18, 2024 at 1:52 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/14/24 22:09, Saravana Kannan wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+---------------+----------------+----------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+---------------+----------------+----------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+---------------+----------------+----------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >       local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >       platform_resume_noirq(state);
> > +     /*
> > +      * We do this only for resume instead of suspend and resume for these
> > +      * reasons:
> > +      * - Performance is more important than power for resume.
> > +      * - Power spent entering suspend is more important for suspend. Also,
> > +      *   stangely, disabling EAS was making suspent a few milliseconds
> > +      *   slower in my testing.
>
> s/stangely/strangely
> s/suspent/suspend

Will fix it in the next version.

> I'd also be curious why that is. Disabling EAS shouldn't be that expensive.
> What if you just hack the static branch switch (without the sd rebuild)?

I don't think the enabling/disabling is the expensive part, because I
do it around dpm_resume*() and it helps performance there. I tried to
spot a reason by looking at the trace, but nothing stood out.

My educated guess is that when going into suspend, the "thundering
herd" happens early (all the leaf nodes suspend first) and then peters
out. Whereas, during resume it's a slow ramp up until the "thundering
herd" happens at the end (all the leaf nodes resume last).  Spreading
out the threads immediately (no EAS) probably has a different impact
on these two styles of thundering herds.

>
> > +      */
> > +     sched_set_energy_aware(0);
> >       dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >       suspend_test_start();
> >       dpm_resume_end(PMSG_RESUME);
> > +     sched_set_energy_aware(1);
> >       suspend_test_finish("resume devices");
> >       trace_suspend_resume(TPS("resume_console"), state, true);
> >       resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >       mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> bool enable?

Will do.

>
> > +{
> > +     int state;
> > +
> > +     if (!sched_is_eas_possible(cpu_active_mask))
> > +             return;
> > +
> > +     sysctl_sched_energy_aware = enable;
> > +     state = static_branch_unlikely(&sched_energy_present);
> > +     if (state != sysctl_sched_energy_aware)
> > +             rebuild_sched_domains_energy();
> > +}
> > +
>
> This definitely shouldn't just overwrite
> sysctl_sched_energy_aware, otherwise you enable EAS
> for users that explicitly disabled it.

Good point. Will fix it in the next version.

Thanks for the review!

-Saravana

>
> If it ever comes to other users wanting this, we might
> need an eas_pause counter so this can be nested, but
> let's just hope that's never needed.
>
> Regards,
> Christian
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 0/5] Optimize async device suspend/resume
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
                   ` (4 preceding siblings ...)
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
@ 2024-11-19  4:04 ` Saravana Kannan
  2024-11-19  9:51   ` Greg Kroah-Hartman
  5 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-11-19  4:04 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> A lot of the details are in patch 4/5 and 5/5. The summary is that
> there's a lot of overhead and wasted work in how async device
> suspend/resume is handled today. I talked about this and other
> suspend/resume issues at LPC 2024[1].
>
> You can remove a lot of the overhead by doing a breadth first queuing of
> async suspend/resumes. That's what this patch series does. I also
> noticed that during resume, because of EAS, we don't use the bigger CPUs
> as quickly. This was leading to a lot of scheduling latency and
> preemption of runnable threads and increasing the resume latency. So, we
> also disable EAS for that tiny period of resume where we know there'll
> be a lot of parallelism.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> series yields significant improvements:
> +---------------------------+---------------+----------------+----------------+
> | Phase                     | Old full sync | Old full async | New full async |
> |                           |               |                | + EAS disabled |
> +---------------------------+---------------+----------------+----------------+
> | Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
> +---------------------------+---------------+----------------+----------------+
> | Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
> +---------------------------+---------------+----------------+----------------+
> | Sum                       |        182 ms |         162 ms |         123 ms |
> +---------------------------+---------------+----------------+----------------+
>
> There might be room for some more optimizations in the future, but I'm
> keeping this patch series simple enough that it's easier to review and
> check that it's not breaking anything. If this series lands and is
> stable with no bug reports for a few months, I can work on optimizing
> this a bit further.
>
> Thanks,
> Saravana
> P.S: Cc-ing some usual suspects who might be interested in testing this
> out.
>
> [1] - https://lpc.events/event/18/contributions/1845/
>
> Saravana Kannan (5):
>   PM: sleep: Fix runtime PM issue in dpm_resume()
>   PM: sleep: Remove unnecessary mutex lock when waiting on parent
>   PM: sleep: Add helper functions to loop through superior/subordinate
>     devs
>   PM: sleep: Do breadth first suspend/resume for async suspend/resume
>   PM: sleep: Spread out async kworker threads during dpm_resume*()
>     phases
>
>  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------

Hi Rafael/Greg,

I'm waiting for one of your reviews before I send out the next version.

-Saravana

>  kernel/power/suspend.c    |  16 ++
>  kernel/sched/topology.c   |  13 ++
>  3 files changed, 276 insertions(+), 78 deletions(-)
>
> --
> 2.47.0.338.g60cca15819-goog
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 0/5] Optimize async device suspend/resume
  2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
@ 2024-11-19  9:51   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2024-11-19  9:51 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Mon, Nov 18, 2024 at 08:04:26PM -0800, Saravana Kannan wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > A lot of the details are in patch 4/5 and 5/5. The summary is that
> > there's a lot of overhead and wasted work in how async device
> > suspend/resume is handled today. I talked about this and other
> > suspend/resume issues at LPC 2024[1].
> >
> > You can remove a lot of the overhead by doing a breadth first queuing of
> > async suspend/resumes. That's what this patch series does. I also
> > noticed that during resume, because of EAS, we don't use the bigger CPUs
> > as quickly. This was leading to a lot of scheduling latency and
> > preemption of runnable threads and increasing the resume latency. So, we
> > also disable EAS for that tiny period of resume where we know there'll
> > be a lot of parallelism.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> > series yields significant improvements:
> > +---------------------------+---------------+----------------+----------------+
> > | Phase                     | Old full sync | Old full async | New full async |
> > |                           |               |                | + EAS disabled |
> > +---------------------------+---------------+----------------+----------------+
> > | Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
> > +---------------------------+---------------+----------------+----------------+
> > | Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
> > +---------------------------+---------------+----------------+----------------+
> > | Sum                       |        182 ms |         162 ms |         123 ms |
> > +---------------------------+---------------+----------------+----------------+
> >
> > There might be room for some more optimizations in the future, but I'm
> > keeping this patch series simple enough that it's easier to review and
> > check that it's not breaking anything. If this series lands and is
> > stable with no bug reports for a few months, I can work on optimizing
> > this a bit further.
> >
> > Thanks,
> > Saravana
> > P.S: Cc-ing some usual suspects who might be interested in testing this
> > out.
> >
> > [1] - https://lpc.events/event/18/contributions/1845/
> >
> > Saravana Kannan (5):
> >   PM: sleep: Fix runtime PM issue in dpm_resume()
> >   PM: sleep: Remove unnecessary mutex lock when waiting on parent
> >   PM: sleep: Add helper functions to loop through superior/subordinate
> >     devs
> >   PM: sleep: Do breadth first suspend/resume for async suspend/resume
> >   PM: sleep: Spread out async kworker threads during dpm_resume*()
> >     phases
> >
> >  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
> 
> Hi Rafael/Greg,
> 
> I'm waiting for one of your reviews before I send out the next version.

Please feel free to send, it's the middle of the merge window now, and
I'm busy with that for the next 2 weeks, so I can't do anything until
after that.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
@ 2024-12-02 20:11   ` Rafael J. Wysocki
  2024-12-02 20:16     ` Rafael J. Wysocki
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2024-12-02 20:11 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

Sorry for the delay.

On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Locking is not needed to do get_device(dev->parent). We either get a NULL
> (if the parent was cleared) or the actual parent. Also, when a device is
> deleted (device_del()) and removed from the dpm_list, its completion
> variable is also complete_all()-ed. So, we don't have to worry about
> waiting indefinitely on a deleted parent device.

The device_pm_initialized(dev) check before get_device(dev->parent)
doesn't make sense without the locking and that's the whole point of
it.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 86e51b9fefab..9b9b6088e56a 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
>          * counting the parent once more unless the device has been deleted
>          * already (in which case return right away).
>          */
> -       mutex_lock(&dpm_list_mtx);
> -
> -       if (!device_pm_initialized(dev)) {
> -               mutex_unlock(&dpm_list_mtx);
> -               return false;
> -       }
> -
>         parent = get_device(dev->parent);
> -
> -       mutex_unlock(&dpm_list_mtx);
> -
> -       dpm_wait(parent, async);
> +       if (device_pm_initialized(dev))
> +               dpm_wait(parent, async);

This is racy, so what's the point?

You can just do

parent = get_device(dev->parent);
dpm_wait(parent, async);

and please update the comment above this.

>         put_device(parent);
>
>         dpm_wait_for_suppliers(dev, async);
> --

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-12-02 20:11   ` Rafael J. Wysocki
@ 2024-12-02 20:16     ` Rafael J. Wysocki
  2024-12-02 20:46       ` Saravana Kannan
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2024-12-02 20:16 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Sorry for the delay.
>
> On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > (if the parent was cleared) or the actual parent. Also, when a device is
> > deleted (device_del()) and removed from the dpm_list, its completion
> > variable is also complete_all()-ed. So, we don't have to worry about
> > waiting indefinitely on a deleted parent device.
>
> The device_pm_initialized(dev) check before get_device(dev->parent)
> doesn't make sense without the locking and that's the whole point of
> it.

Hmm, not really.

How is the parent prevented from going away in get_device() right
after the initial dev check without the locking?

> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 13 ++-----------
> >  1 file changed, 2 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 86e51b9fefab..9b9b6088e56a 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> >          * counting the parent once more unless the device has been deleted
> >          * already (in which case return right away).
> >          */
> > -       mutex_lock(&dpm_list_mtx);
> > -
> > -       if (!device_pm_initialized(dev)) {
> > -               mutex_unlock(&dpm_list_mtx);
> > -               return false;
> > -       }
> > -
> >         parent = get_device(dev->parent);
> > -
> > -       mutex_unlock(&dpm_list_mtx);
> > -
> > -       dpm_wait(parent, async);
> > +       if (device_pm_initialized(dev))
> > +               dpm_wait(parent, async);
>
> This is racy, so what's the point?
>
> You can just do
>
> parent = get_device(dev->parent);
> dpm_wait(parent, async);
>
> and please update the comment above this.
>
> >         put_device(parent);
> >
> >         dpm_wait_for_suppliers(dev, async);
> > --

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-12-02 20:16     ` Rafael J. Wysocki
@ 2024-12-02 20:46       ` Saravana Kannan
  2024-12-02 21:14         ` Rafael J. Wysocki
  0 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-12-02 20:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > Sorry for the delay.
> >
> > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > > (if the parent was cleared) or the actual parent. Also, when a device is
> > > deleted (device_del()) and removed from the dpm_list, its completion
> > > variable is also complete_all()-ed. So, we don't have to worry about
> > > waiting indefinitely on a deleted parent device.
> >
> > The device_pm_initialized(dev) check before get_device(dev->parent)
> > doesn't make sense without the locking and that's the whole point of
> > it.
>
> Hmm, not really.
>
> How is the parent prevented from going away in get_device() right
> after the initial dev check without the locking?

Not sure what you mean by "go away"? But get_device() is going to keep
a non-zero refcount on the parent so that struct doesn't get freed.
The parent itself can still "go away" in terms of unbinding or
removing the children from the dpm_list(). But that's what the
device_pm_initialized() check is for. When a device_del() is called,
it's removed from the dpm_list. The actual freeing comes later. But we
aren't/weren't checking for that anyway.

>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/base/power/main.c | 13 ++-----------
> > >  1 file changed, 2 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index 86e51b9fefab..9b9b6088e56a 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> > >          * counting the parent once more unless the device has been deleted
> > >          * already (in which case return right away).
> > >          */
> > > -       mutex_lock(&dpm_list_mtx);
> > > -
> > > -       if (!device_pm_initialized(dev)) {
> > > -               mutex_unlock(&dpm_list_mtx);
> > > -               return false;
> > > -       }
> > > -
> > >         parent = get_device(dev->parent);
> > > -
> > > -       mutex_unlock(&dpm_list_mtx);
> > > -
> > > -       dpm_wait(parent, async);
> > > +       if (device_pm_initialized(dev))
> > > +               dpm_wait(parent, async);
> >
> > This is racy, so what's the point?
> >
> > You can just do
> >
> > parent = get_device(dev->parent);
> > dpm_wait(parent, async);

The parent struct device still being around isn't the same as this
device being in the dpm_list, though. So, shouldn't we still check this?

Also, is it really racy anymore with the new algorithm? We don't kick
off the subordinates until after the parent is done.

> >
> > and please update the comment above this.

Will do.

Thanks,
Saravana
> >
> > >         put_device(parent);
> > >
> > >         dpm_wait_for_suppliers(dev, async);
> > > --

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-12-02 20:46       ` Saravana Kannan
@ 2024-12-02 21:14         ` Rafael J. Wysocki
  2024-12-02 23:27           ` Saravana Kannan
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2024-12-02 21:14 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > Sorry for the delay.
> > >
> > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > > > (if the parent was cleared) or the actual parent. Also, when a device is
> > > > deleted (device_del()) and removed from the dpm_list, its completion
> > > > variable is also complete_all()-ed. So, we don't have to worry about
> > > > waiting indefinitely on a deleted parent device.
> > >
> > > The device_pm_initialized(dev) check before get_device(dev->parent)
> > > doesn't make sense without the locking and that's the whole point of
> > > it.
> >
> > Hmm, not really.
> >
> > How is the parent prevented from going away in get_device() right
> > after the initial dev check without the locking?
>
> Not sure what you mean by "go away"? But get_device() is going to keep
> a non-zero refcount on the parent so that struct doesn't get freed.

That's after it has returned.

There is nothing to prevent dev from being freed in get_device()
itself before the kobject_get(&dev->kobj) call.

So when get_device() is called, there needs to be an active ref on the
device already or something else to prevent the target device object
from being freed.

In this particular case it is the lock along with the
device_pm_initialized(dev) check on the child.
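
In other words, the pattern the removed hunk relies on is roughly:

        mutex_lock(&dpm_list_mtx);
        if (!device_pm_initialized(dev)) {      /* device_del() already ran */
                mutex_unlock(&dpm_list_mtx);
                return false;
        }
        /* still on dpm_list, so the child's reference on the parent is live */
        parent = get_device(dev->parent);
        mutex_unlock(&dpm_list_mtx);

and the removal path takes the same mutex, so the check and the get_device()
cannot race with it.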

> The parent itself can still "go away" in terms of unbinding or
> removing the children from the dpm_list(). But that's what the
> device_pm_initialized() check is for. When a device_del() is called,
> it's removed from the dpm_list. The actual freeing comes later. But we
> aren't/weren't checking for that anyway.
>
> >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/base/power/main.c | 13 ++-----------
> > > >  1 file changed, 2 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index 86e51b9fefab..9b9b6088e56a 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> > > >          * counting the parent once more unless the device has been deleted
> > > >          * already (in which case return right away).
> > > >          */
> > > > -       mutex_lock(&dpm_list_mtx);
> > > > -
> > > > -       if (!device_pm_initialized(dev)) {
> > > > -               mutex_unlock(&dpm_list_mtx);
> > > > -               return false;
> > > > -       }
> > > > -
> > > >         parent = get_device(dev->parent);
> > > > -
> > > > -       mutex_unlock(&dpm_list_mtx);
> > > > -
> > > > -       dpm_wait(parent, async);
> > > > +       if (device_pm_initialized(dev))
> > > > +               dpm_wait(parent, async);
> > >
> > > This is racy, so what's the point?
> > >
> > > You can just do
> > >
> > > parent = get_device(dev->parent);
> > > dpm_wait(parent, async);
>
> The parent struct device still being around isn't the same as this
> device being in the dpm_list, though. So, shouldn't we still check this?
>
> Also, is it really racy anymore with the new algorithm? We don't kick
> off the subordinates until after the parent is done.
>
> > >
> > > and please update the comment above this.
>
> Will do.
>
> Thanks,
> Saravana
> > >
> > > >         put_device(parent);
> > > >
> > > >         dpm_wait_for_suppliers(dev, async);
> > > > --

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-12-02 21:14         ` Rafael J. Wysocki
@ 2024-12-02 23:27           ` Saravana Kannan
  2024-12-04 12:21             ` Rafael J. Wysocki
  0 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-12-02 23:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Mon, Dec 2, 2024 at 1:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > Sorry for the delay.
> > > >
> > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > > > > (if the parent was cleared) or the actual parent. Also, when a device is
> > > > > deleted (device_del()) and removed from the dpm_list, its completion
> > > > > variable is also complete_all()-ed. So, we don't have to worry about
> > > > > waiting indefinitely on a deleted parent device.
> > > >
> > > > The device_pm_initialized(dev) check before get_device(dev->parent)
> > > > doesn't make sense without the locking and that's the whole point of
> > > > it.
> > >
> > > Hmm, not really.
> > >
> > > How is the parent prevented from going away in get_device() right
> > > after the initial dev check without the locking?
> >
> > Not sure what you mean by "go away"? But get_device() is going to keep
> > a non-zero refcount on the parent so that struct doesn't get freed.
>
> That's after it has returned.
>
> There is nothing to prevent dev from being freed in get_device()
> itself before the kobject_get(&dev->kobj) call.
>
> So when get_device() is called, there needs to be an active ref on the
> device already or something else to prevent the target device object
> from being freed.
>
> In this particular case it is the lock along with the
> device_pm_initialized(dev) check on the child.

Ugh... my head hurts whenever I think about get_device(). Yeah, I
think you are right.

Hmm... I wanted to have this function be replaced by a call to a
generic helper function dpm_for_each_superior() in the next patch. But
that helper function could be called from places where the dpm_list
lock is held. Also, I was planning on making it even more generic (not
specific to dpm) in the future. So, depending on the dpm_list lock didn't
make sense.

Any other ideas on how I could do this without grabbing the dpm_list mutex?

Actually, with the rewrite and at the end of this series, we don't
have to worry about this race because each device's suspend/resume is
only started after all the dependencies are done. So, if the parent
deletes a child and itself, the child device's suspend wouldn't have
been triggered at all.

So, another option is to just squash the series and call it a day. Or
add a comment to the commit text that this adds a race that's fixed by
the time we get to the end of the series.

Thoughts?

Thanks,
Saravana


>
> > The parent itself can still "go away" in terms of unbinding or
> > removing the children from the dpm_list. But that's what the
> > device_pm_initialized() check is for. When a device_del() is called,
> > it's removed from the dpm_list. The actual freeing comes later. But we
> > aren't/weren't checking for that anyway.
> >
> > >
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > ---
> > > > >  drivers/base/power/main.c | 13 ++-----------
> > > > >  1 file changed, 2 insertions(+), 11 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > > index 86e51b9fefab..9b9b6088e56a 100644
> > > > > --- a/drivers/base/power/main.c
> > > > > +++ b/drivers/base/power/main.c
> > > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> > > > >          * counting the parent once more unless the device has been deleted
> > > > >          * already (in which case return right away).
> > > > >          */
> > > > > -       mutex_lock(&dpm_list_mtx);
> > > > > -
> > > > > -       if (!device_pm_initialized(dev)) {
> > > > > -               mutex_unlock(&dpm_list_mtx);
> > > > > -               return false;
> > > > > -       }
> > > > > -
> > > > >         parent = get_device(dev->parent);
> > > > > -
> > > > > -       mutex_unlock(&dpm_list_mtx);
> > > > > -
> > > > > -       dpm_wait(parent, async);
> > > > > +       if (device_pm_initialized(dev))
> > > > > +               dpm_wait(parent, async);
> > > >
> > > > This is racy, so what's the point?
> > > >
> > > > You can just do
> > > >
> > > > parent = get_device(dev->parent);
> > > > dpm_wait(parent, async);
> >
> > Parent struct device being there isn't the same as whether this device
> > is in the dpm_list? So, shouldn't we still check this?
> >
> > Also, is it really racy anymore with the new algorithm? We don't kick
> > off the subordinates until after the parent is done.
> >
> > > >
> > > > and please update the comment above this.
> >
> > Will do.
> >
> > Thanks,
> > Saravana
> > > >
> > > > >         put_device(parent);
> > > > >
> > > > >         dpm_wait_for_suppliers(dev, async);
> > > > > --

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-12-02 23:27           ` Saravana Kannan
@ 2024-12-04 12:21             ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2024-12-04 12:21 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

On Tue, Dec 3, 2024 at 12:28 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Dec 2, 2024 at 1:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > Sorry for the delay.
> > > > >
> > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > > > > > (if the parent was cleared) or the actual parent. Also, when a device is
> > > > > > deleted (device_del()) and removed from the dpm_list, its completion
> > > > > > variable is also complete_all()-ed. So, we don't have to worry about
> > > > > > waiting indefinitely on a deleted parent device.
> > > > >
> > > > > The device_pm_initialized(dev) check before get_device(dev->parent)
> > > > > doesn't make sense without the locking and that's the whole point of
> > > > > it.
> > > >
> > > > Hmm, not really.
> > > >
> > > > How is the parent prevented from going away in get_device() right
> > > > after the initial dev check without the locking?
> > >
> > > Not sure what you mean by "go away"? But get_device() is going to keep
> > > a non-zero refcount on the parent so that struct doesn't get freed.
> >
> > That's after it has returned.
> >
> > There is nothing to prevent dev from being freed in get_device()
> > itself before the kobject_get(&dev->kobj) call.
> >
> > So when get_device() is called, there needs to be an active ref on the
> > device already or something else to prevent the target device object
> > from being freed.
> >
> > In this particular case it is the lock along with the
> > device_pm_initialized(dev) check on the child.
>
> Ugh... my head hurts whenever I think about get_device(). Yeah, I
> think you are right.
>
> Hmm... I wanted to have this function be replaced by a call to a
> generic helper function dpm_for_each_superior() in the next patch. But
> that helper function could be called from places where the dpm_list
> lock is held. Also, I was planning on making it even more generic (not
> specific to dpm) in the future. So, depending on the dpm_list lock didn't
> make sense.
>
> Any other ideas on how I could do this without grabbing the dpm_list mutex?

You don't need to replace the existing function with a new helper.

Just add the helper, use it going forward and drop the old function in
a separate patch when there are no more users of it.
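
For illustration, a dpm_for_each_superior() built along the lines of the
existing dpm_wait_for_superior()/dpm_wait_for_suppliers() pair could look
roughly like this (a sketch only -- the callback shape, locking and
lifetime handling are illustrative, not a proposal):

	static void dpm_for_each_superior(struct device *dev, bool async,
					  void (*fn)(struct device *, bool))
	{
		struct device_link *link;
		struct device *parent;
		int idx;

		/*
		 * Assumes the caller guarantees @dev itself cannot be
		 * unregistered concurrently (see the discussion above).
		 */
		parent = get_device(dev->parent);
		if (parent)
			fn(parent, async);
		put_device(parent);

		/* Visit the suppliers under the device-links read lock. */
		idx = device_links_read_lock();

		list_for_each_entry_rcu(link, &dev->links.suppliers, node,
					device_links_read_lock_held())
			if (READ_ONCE(link->status) != DL_STATE_DORMANT)
				fn(link->supplier, async);

		device_links_read_unlock(idx);
	}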

> Actually, with the rewrite and at the end of this series, we don't
> have to worry about this race because each device's suspend/resume is
> only started after all the dependencies are done. So, if the parent
> deletes a child and itself, the child device's suspend wouldn't have
> been triggered at all.
>
> So, another option is to just squash the series and call it a day.

No, no.  This is complicated enough and the code is super-sensitive.

I think that you need to split the changes even more.

> Or add a comment to the commit text that this adds a race that's fixed by
> the time we get to the end of the series.

That would create a bisection trap, so not really.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
  2024-11-16  7:43   ` Greg Kroah-Hartman
@ 2024-12-04 12:53   ` Rafael J. Wysocki
  2025-03-11 10:47     ` Rafael J. Wysocki
  2025-03-14 20:47   ` Pavel Machek
  2 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2024-12-04 12:53 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
	Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

Trim CC list.

On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.

Not really.  This is particularly untrue for devices with
power.direct_complete set that have power.is_suspended clear.

> However, runtime PM enable and a few other steps are done before
> checking for this flag. Fix it so that we do things in the right order.

I don't see the bug this is fixing, but I can see bugs introduced by it.

I think that you want power.is_suspended to be checked before waiting
for the superiors.  Fair enough, since for devices with
power.is_suspended clear, there should be no superiors to wait for, so
the two checks can be done in any order and checking
power.is_suspended first would be less overhead.  And that's it
AFAICS.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>         if (dev->power.syscore)
>                 goto Complete;
>
> +       if (!dev->power.is_suspended)
> +               goto Unlock;
> +
>         if (dev->power.direct_complete) {
>                 /* Match the pm_runtime_disable() in __device_suspend(). */
>                 pm_runtime_enable(dev);
> @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>          */
>         dev->power.is_prepared = false;
>
> -       if (!dev->power.is_suspended)
> -               goto Unlock;
> -
>         if (dev->pm_domain) {
>                 info = "power domain ";
>                 callback = pm_op(&dev->pm_domain->ops, state);
> --
> 2.47.0.338.g60cca15819-goog
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-12-04 12:53   ` Rafael J. Wysocki
@ 2025-03-11 10:47     ` Rafael J. Wysocki
  2025-03-13  1:49       ` Saravana Kannan
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2025-03-11 10:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
	Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Trim CC list.
>
> On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
>
> Not really.  This is particularly untrue for devices with
> power.direct_complete set that have power.is_suspended clear.
>
> > However, runtime PM enable and a few other steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
>
> I don't see the bug this is fixing, but I can see bugs introduced by it.

So AFAICS the bug is in the error path when dpm_suspend() fails in
which case some devices with direct_complete set may not have been
handled by device_suspend().  Since runtime PM has not been disabled
for those devices yet, it doesn't make sense to re-enable runtime PM
for them (and if they had runtime PM disabled to start with, this will
inadvertently enable runtime PM for them).

However, two changes are needed to fix this issue:
(1) power.is_suspended needs to be set for the devices with
direct_complete set in device_suspend().
(2) The power.is_suspended check needs to be moved after the
power.syscore one in device_resume().

The patch below only does (2) which is insufficient and it introduces
a functional issue for the direct_complete devices with runtime PM
disabled because it will cause runtime PM to remain disabled for them
permanently.
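
Roughly, the combination of (1) and (2) would look like this (a sketch of
the direction only, not the final patch):

	/* (1) device_suspend(): count direct_complete as suspended too. */
	if (dev->power.direct_complete) {
		if (pm_runtime_status_suspended(dev)) {
			pm_runtime_disable(dev);
			if (pm_runtime_status_suspended(dev)) {
				dev->power.is_suspended = true;	/* new */
				goto Complete;
			}
			pm_runtime_enable(dev);
		}
		dev->power.direct_complete = false;
	}

	/* (2) device_resume(): check it right after the syscore check and
	 * jump to Complete (not Unlock) so the device lock stays balanced.
	 */
	if (dev->power.syscore)
		goto Complete;

	if (!dev->power.is_suspended)
		goto Complete;

	if (dev->power.direct_complete) {
		/* Match the pm_runtime_disable() in device_suspend(). */
		pm_runtime_enable(dev);
		goto Complete;
	}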

> I think that you want power.is_suspended to be checked before waiting
> for the superiors.  Fair enough, since for devices with
> power.is_suspended clear, there should be no superiors to wait for, so
> the two checks can be done in any order and checking
> power.is_suspended first would be less overhead.  And that's it
> AFAICS.
>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4a67e83300e1..86e51b9fefab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >         if (dev->power.syscore)
> >                 goto Complete;
> >
> > +       if (!dev->power.is_suspended)
> > +               goto Unlock;

And this should be "goto Complete" because jumping to Unlock
introduces a device locking imbalance.

> > +
> >         if (dev->power.direct_complete) {
> >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> >                 pm_runtime_enable(dev);
> > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >          */
> >         dev->power.is_prepared = false;
> >
> > -       if (!dev->power.is_suspended)
> > -               goto Unlock;
> > -
> >         if (dev->pm_domain) {
> >                 info = "power domain ";
> >                 callback = pm_op(&dev->pm_domain->ops, state);
> > --

If you want to submit a new version of this patch, please do so by the
end of the week or I will send my fix because I want this issue to be
addressed in 6.15.

BTW, please note that this is orthogonal to the recent async
suspend-resume series

https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/

so there is no reason why it should be addressed in that series.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume
  2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
@ 2025-03-11 11:25   ` Geert Uytterhoeven
  0 siblings, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2025-03-11 11:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

Hi Saravana,

On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
> The dpm_list used for suspend/resume ensures that all superior devices
> (parents and suppliers) precede subordinate devices (children and
> consumers).
>
> Current async resume logic:
> -------------------------------
> * For each resume phase (except the "complete" phase, which is always
>   serialized), the resume logic first queues all async devices in the
>   dpm_list. It then loops through the dpm_list again to resume the sync
>   devices one by one.
>
> * Async devices wait for all their superior devices to resume before
>   starting their own resume operation.
>
> * This process results in multiple sleep and wake-up cycles before an
>   async device actually resumes. This sleeping also causes kworker
>   threads to stall with work for a period. Consequently, the workqueue
>   framework spins up more kworker threads to handle the other async
>   devices.
>
> * The end result is excessive thread creation, wake-ups, sleeps, and
>   context switches for every async device. This overhead makes a full
>   async resume (with all devices marked as async-capable) much slower
>   than a synchronous resume.
>
> Current async suspend logic:
> --------------------------------
> * The async suspend logic differs from the async resume logic. The
>   suspend logic loops through the dpm_list. When it finds an async
>   device, it queues the work and moves on. However, if it encounters a
>   sync device, it waits until the sync device (and all its subordinate
>   devices) have suspended before proceeding to the next device.
>   Therefore, an async suspend device can be left waiting on an
>   unrelated device before even being queued.
>
> * Once queued, an async device experiences the same inefficiencies as
>   in the resume logic (thread creation, wake-ups, sleeps, and context
>   switches).
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the data is as
> follows:
>
> +---------------------------+-----------+------------+----------+
> | Phase                     | Full sync | Full async | % change |
> +---------------------------+-----------+------------+----------+
> | Total dpm_suspend*() time |    107 ms |      72 ms |     -33% |
> +---------------------------+-----------+------------+----------+
> | Total dpm_resume*() time  |     75 ms |      90 ms |     +20% |
> +---------------------------+-----------+------------+----------+
> | Sum                       |    182 ms |     162 ms |     -11% |
> +---------------------------+-----------+------------+----------+
>
> This shows that full async suspend/resume is not a viable option. It
> makes the user-visible resume phase slower and only improves the
> overall time by 11%.
>
> To fix all this, this patch introduces new async suspend/resume
> logic.
>
> New suspend/resume logic:
> -------------------------
> * For each suspend/resume phase (except "complete" and "prepare,"
>   which are always serialized), the logic first queues only the async
>   devices that don't have to wait for any subordinates (for suspend)
>   or superiors (for resume). It then loops through the dpm_list again
>   to suspend/resume the sync devices one by one.
>
> * When a device (sync or async) successfully suspends/resumes, it
>   examines its superiors/subordinates and queues only the async
>   devices that don't need to wait for any subordinates/superiors.
>
> With this new logic:
>
> * Queued async devices don't have to wait for completion and are
>   always ready to perform their suspend/resume operation.
>
> * The queue of async devices remains short.
>
> * kworkers never sleep for extended periods, and the workqueue
>   framework doesn't spin up many new threads to handle a backlog of
>   async devices.
>
> * The result is approximately NCPU kworker threads running in parallel
>   without sleeping until all async devices finish.
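
[ The core of the new logic described above is essentially dependency
  counting: a device is queued only once all of its subordinates (for
  suspend) or superiors (for resume) have finished.  A generic
  illustration -- not the code in this patch, and the identifiers below
  are made up for the sketch -- could look like: ]

	static void dpm_async_suspend_fn(void *data, async_cookie_t cookie);

	/* Called whenever one of @dev's subordinates finishes suspending. */
	static void dpm_subordinate_suspended(struct device *dev)
	{
		/* power.unfinished_subordinates is an invented field. */
		if (atomic_dec_and_test(&dev->power.unfinished_subordinates) &&
		    dev->power.async_suspend)
			dpm_async_fn(dev, dpm_async_suspend_fn);
	}

	static void dpm_async_suspend_fn(void *data, async_cookie_t cookie)
	{
		struct device *dev = data;

		/* All subordinates are already done, nothing to wait for. */
		suspend_one_device(dev);  /* stand-in for the suspend step */

		/* Queue any superior that just became ready. */
		if (dev->parent)
			dpm_subordinate_suspended(dev->parent);
		/* ...and similarly for each supplier via device links. */
	}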
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> yields improved results:
> +---------------------------+---------------+----------------+----------+
> | Phase                     | Old full sync | New full async | % change |
> +---------------------------+---------------+----------------+----------+
> | Total dpm_suspend*() time |        107 ms |          60 ms |     -44% |
> +---------------------------+---------------+----------------+----------+
> | Total dpm_resume*() time  |         75 ms |          74 ms |      -1% |
> +---------------------------+---------------+----------------+----------+
> | Sum                       |        182 ms |         134 ms |     -26% |
> +---------------------------+---------------+----------------+----------+
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Thanks for your patch!

On Renesas Gray Hawk Single (R-Car V4M) during s2idle:

PM: suspend entry (s2idle)
Filesystems sync: 0.055 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.004 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.003 seconds)

================================
WARNING: inconsistent lock state
6.14.0-rc5-rcar3-05904-g1ec95427acf9 #261 Tainted: G        W
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
s2idle/764 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffffff844392c190 (&dev->power.lock){?.-.}-{3:3}, at: dpm_async_fn+0x24/0xa8
{IN-HARDIRQ-W} state was registered at:
  lock_acquire+0x26c/0x2c4
  _raw_spin_lock_irqsave+0x54/0x70
  pm_suspend_timer_fn+0x20/0x78
  __hrtimer_run_queues+0x204/0x330
  hrtimer_interrupt+0xa8/0x1b0
  arch_timer_handler_virt+0x28/0x3c
  handle_percpu_devid_irq+0x64/0x110
  handle_irq_desc+0x3c/0x50
  generic_handle_domain_irq+0x18/0x20
  gic_handle_irq+0x50/0xbc
  call_on_irq_stack+0x24/0x34
  do_interrupt_handler+0x60/0x88
  el1_interrupt+0x30/0x48
  el1h_64_irq_handler+0x14/0x1c
  el1h_64_irq+0x70/0x74
  cpuidle_enter_state+0x1a4/0x2d0
  cpuidle_enter+0x34/0x48
  do_idle+0x21c/0x240
  cpu_startup_entry+0x30/0x34
  kernel_init+0x0/0x124
  console_on_rootfs+0x0/0x64
  __primary_switched+0x88/0x90
irq event stamp: 17055
hardirqs last  enabled at (17055): [<ffffffc080a0782c>]
_raw_spin_unlock_irqrestore+0x34/0x54
hardirqs last disabled at (17054): [<ffffffc080a075a4>]
_raw_spin_lock_irqsave+0x28/0x70
softirqs last  enabled at (14360): [<ffffffc080096bcc>]
handle_softirqs+0x1b0/0x3b4
softirqs last disabled at (14355): [<ffffffc080010168>] __do_softirq+0x10/0x18

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev->power.lock);
  <Interrupt>
    lock(&dev->power.lock);

 *** DEADLOCK ***

5 locks held by s2idle/764:
 #0: ffffff84400983f0 (sb_writers#5){.+.+}-{0:0}, at:
file_start_write.isra.0+0x24/0x30
 #1: ffffff8446802288 (&of->mutex#2){+.+.}-{4:4}, at:
kernfs_fop_write_iter+0xf8/0x180
 #2: ffffff8440d016e8 (kn->active#39){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0x100/0x180
 #3: ffffffc0812f4780 (system_transition_mutex){+.+.}-{4:4}, at:
pm_suspend+0x84/0x248
 #4: ffffffc084bb7f78 (dpm_list_mtx){+.+.}-{4:4}, at: dpm_suspend+0x84/0x1a8

stack backtrace:
CPU: 0 UID: 0 PID: 764 Comm: s2idle Tainted: G        W
6.14.0-rc5-rcar3-05904-g1ec95427acf9 #261
Tainted: [W]=WARN
Hardware name: Renesas Gray Hawk Single board based on r8a779h0 (DT)
Call trace:
 show_stack+0x14/0x1c (C)
 dump_stack_lvl+0x78/0xa8
 dump_stack+0x14/0x1c
 print_usage_bug+0x1dc/0x1f8
 mark_lock+0x1c4/0x3a4
 __lock_acquire+0x560/0x1038
 lock_acquire+0x26c/0x2c4
 _raw_spin_lock+0x40/0x54
 dpm_async_fn+0x24/0xa8
 dpm_async_queue_suspend_ready_fn+0x40/0x50
 dpm_async_suspend_loop+0x48/0x50
 dpm_suspend+0x94/0x1a8
 dpm_suspend_start+0x68/0x70
 suspend_devices_and_enter+0xd8/0x59c
 pm_suspend+0x214/0x248
 state_store+0xa8/0xe8
 kobj_attr_store+0x14/0x24
 sysfs_kf_write+0x4c/0x64
 kernfs_fop_write_iter+0x138/0x180
 vfs_write+0x148/0x1b4
 ksys_write+0x78/0xe0
 __arm64_sys_write+0x14/0x1c
 invoke_syscall+0x68/0xf0
 el0_svc_common.constprop.0+0xb0/0xcc
 do_el0_svc+0x18/0x20
 el0_svc+0x38/0x90
 el0t_64_sync_handler+0x80/0x130
 el0t_64_sync+0x158/0x15c


Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2025-03-11 10:47     ` Rafael J. Wysocki
@ 2025-03-13  1:49       ` Saravana Kannan
  2025-03-13 10:58         ` Rafael J. Wysocki
  0 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2025-03-13  1:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
	Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > Trim CC list.
> >
> > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Some devices might have their is_suspended flag set to false. In these
> > > cases, dpm_resume() should skip doing anything for those devices.
> >
> > Not really.  This is particularly untrue for devices with
> > power.direct_complete set that have power.is_suspended clear.
> >
> > > However, runtime PM enable and a few other steps are done before
> > > checking for this flag. Fix it so that we do things in the right order.
> >
> > I don't see the bug this is fixing, but I can see bugs introduced by it.
>
> So AFAICS the bug is in the error path when dpm_suspend() fails in
> which case some devices with direct_complete set may not have been
> handled by device_suspend().  Since runtime PM has not been disabled
> for those devices yet, it doesn't make sense to re-enable runtime PM
> for them (and if they had runtime PM disabled to start with, this will
> inadvertently enable runtime PM for them).
>
> However, two changes are needed to fix this issue:
> (1) power.is_suspended needs to be set for the devices with
> direct_complete set in device_suspend().
> (2) The power.is_suspended check needs to be moved after the
> power.syscore one in device_resume().
>
> The patch below only does (2) which is insufficient and it introduces
> a functional issue for the direct_complete devices with runtime PM
> disabled because it will cause runtime PM to remain disabled for them
> permanently.
>
> > I think that you want power.is_suspended to be checked before waiting
> > for the superiors.  Fair enough, since for devices with
> > power.is_suspended clear, there should be no superiors to wait for, so
> > the two checks can be done in any order and checking
> > power.is_suspended first would be less overhead.  And that's it
> > AFAICS.
> >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/base/power/main.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index 4a67e83300e1..86e51b9fefab 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >         if (dev->power.syscore)
> > >                 goto Complete;
> > >
> > > +       if (!dev->power.is_suspended)
> > > +               goto Unlock;
>
> And this should be "goto Complete" because jumping to Unlock
> introduces a device locking imbalance.
>
> > > +
> > >         if (dev->power.direct_complete) {
> > >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> > >                 pm_runtime_enable(dev);
> > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >          */
> > >         dev->power.is_prepared = false;
> > >
> > > -       if (!dev->power.is_suspended)
> > > -               goto Unlock;
> > > -
> > >         if (dev->pm_domain) {
> > >                 info = "power domain ";
> > >                 callback = pm_op(&dev->pm_domain->ops, state);
> > > --
>
> If you want to submit a new version of this patch, please do so by the
> end of the week or I will send my fix because I want this issue to be
> addressed in 6.15.

Please go ahead with the fix for this. I'm not too comfortable with
the direct_complete logic yet.

-Saravana

>
> BTW, please note that this is orthogonal to the recent async
> suspend-resume series
>
> https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/
>
> so there is no reason why it should be addressed in that series.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2025-03-13  1:49       ` Saravana Kannan
@ 2025-03-13 10:58         ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2025-03-13 10:58 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ben Segall, Geert Uytterhoeven, kernel-team, linux-pm,
	linux-kernel

On Thu, Mar 13, 2025 at 2:50 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > Trim CC list.
> > >
> > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Some devices might have their is_suspended flag set to false. In these
> > > > cases, dpm_resume() should skip doing anything for those devices.
> > >
> > > Not really.  This is particularly untrue for devices with
> > > power.direct_complete set that have power.is_suspended clear.
> > >
> > > > However, runtime PM enable and a few other steps are done before
> > > > checking for this flag. Fix it so that we do things in the right order.
> > >
> > > I don't see the bug this is fixing, but I can see bugs introduced by it.
> >
> > So AFAICS the bug is in the error path when dpm_suspend() fails in
> > which case some devices with direct_complete set may not have been
> > handled by device_suspend().  Since runtime PM has not been disabled
> > for those devices yet, it doesn't make sense to re-enable runtime PM
> > for them (and if they had runtime PM disabled to start with, this will
> > inadvertently enable runtime PM for them).
> >
> > However, two changes are needed to fix this issue:
> > (1) power.is_suspended needs to be set for the devices with
> > direct_complete set in device_suspend().
> > (2) The power.is_suspended check needs to be moved after the
> > power.syscore one in device_resume().
> >
> > The patch below only does (2) which is insufficient and it introduces
> > a functional issue for the direct_complete devices with runtime PM
> > disabled because it will cause runtime PM to remain disabled for them
> > permanently.
> >
> > > I think that you want power.is_suspended to be checked before waiting
> > > for the superiors.  Fair enough, since for devices with
> > > power.is_suspended clear, there should be no superiors to wait for, so
> > > the two checks can be done in any order and checking
> > > power.is_suspended first would be less overhead.  And that's it
> > > AFAICS.
> > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/base/power/main.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index 4a67e83300e1..86e51b9fefab 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >         if (dev->power.syscore)
> > > >                 goto Complete;
> > > >
> > > > +       if (!dev->power.is_suspended)
> > > > +               goto Unlock;
> >
> > And this should be "goto Complete" because jumping to Unlock
> > introduces a device locking imbalance.
> >
> > > > +
> > > >         if (dev->power.direct_complete) {
> > > >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> > > >                 pm_runtime_enable(dev);
> > > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >          */
> > > >         dev->power.is_prepared = false;
> > > >
> > > > -       if (!dev->power.is_suspended)
> > > > -               goto Unlock;
> > > > -
> > > >         if (dev->pm_domain) {
> > > >                 info = "power domain ";
> > > >                 callback = pm_op(&dev->pm_domain->ops, state);
> > > > --
> >
> > If you want to submit a new version of this patch, please do so by the
> > end of the week or I will send my fix because I want this issue to be
> > addressed in 6.15.
>
> Please go ahead with the fix for this. I'm not too comfortable with
> the direct_complete logic yet.

OK, I will, thank you!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
  2024-11-16  7:43   ` Greg Kroah-Hartman
  2024-12-04 12:53   ` Rafael J. Wysocki
@ 2025-03-14 20:47   ` Pavel Machek
  2025-03-14 20:49     ` Saravana Kannan
  2 siblings, 1 reply; 33+ messages in thread
From: Pavel Machek @ 2025-03-14 20:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

Hi!

> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few other steps are done before
> checking for this flag. Fix it so that we do things in the right order.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>  	if (dev->power.syscore)
>  		goto Complete;
>  
> +	if (!dev->power.is_suspended)
> +		goto Unlock;
> +

This needs to be goto Complete, right?
								Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2025-03-14 20:47   ` Pavel Machek
@ 2025-03-14 20:49     ` Saravana Kannan
  0 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2025-03-14 20:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
	linux-kernel

On Fri, Mar 14, 2025 at 1:47 PM Pavel Machek <pavel@ucw.cz> wrote:
>
> Hi!
>
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
> > However, runtime PM enable and a few other steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4a67e83300e1..86e51b9fefab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >       if (dev->power.syscore)
> >               goto Complete;
> >
> > +     if (!dev->power.is_suspended)
> > +             goto Unlock;
> > +
>
> This needs to be goto Complete, right?

Yeah, it's a bug I also pointed out in another email. But Rafael sent
a new, properly fixed-up patch that I gave my Reviewed-by.

-Saravana

>                                                                 Pavel
> --
> People of Russia, stop Putin before his war on Ukraine escalates.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2025-03-14 20:54 UTC | newest]

Thread overview: 33+ messages:
2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
2024-11-16  7:43   ` Greg Kroah-Hartman
2024-11-16 21:06     ` Saravana Kannan
2024-12-04 12:53   ` Rafael J. Wysocki
2025-03-11 10:47     ` Rafael J. Wysocki
2025-03-13  1:49       ` Saravana Kannan
2025-03-13 10:58         ` Rafael J. Wysocki
2025-03-14 20:47   ` Pavel Machek
2025-03-14 20:49     ` Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
2024-12-02 20:11   ` Rafael J. Wysocki
2024-12-02 20:16     ` Rafael J. Wysocki
2024-12-02 20:46       ` Saravana Kannan
2024-12-02 21:14         ` Rafael J. Wysocki
2024-12-02 23:27           ` Saravana Kannan
2024-12-04 12:21             ` Rafael J. Wysocki
2024-11-14 22:09 ` [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
2025-03-11 11:25   ` Geert Uytterhoeven
2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
2024-11-15  5:25   ` Saravana Kannan
2024-11-15  9:25     ` Geert Uytterhoeven
2024-11-15 15:30     ` Vincent Guittot
2024-11-15 16:12   ` Vincent Guittot
2024-11-15 18:33     ` Saravana Kannan
2024-11-17  0:34   ` kernel test robot
2024-11-17  1:17   ` kernel test robot
2024-11-17 13:34   ` kernel test robot
2024-11-18  9:52   ` Christian Loehle
2024-11-18 17:18     ` Saravana Kannan
2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
2024-11-19  9:51   ` Greg Kroah-Hartman
