* [PATCH v1 0/5] Optimize async device suspend/resume
@ 2024-11-14 22:09 Saravana Kannan
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider
Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
kernel-team, linux-pm, linux-kernel
A lot of the details are in patch 4/5 and 5/5. The summary is that
there's a lot of overhead and wasted work in how async device
suspend/resume is handled today. I talked about this and other
suspend/resume issues at LPC 2024[1].
You can remove a lot of the overhead by doing a breadth first queuing of
async suspend/resumes. That's what this patch series does. I also
noticed that during resume, because of EAS, we don't use the bigger CPUs
as quickly. This was leading to a lot of scheduling latency and
preemption of runnable threads, which increased the resume latency. So we
also disable EAS for that tiny period of resume where we know there'll
be a lot of parallelism.
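As a rough illustration of the breadth-first idea (a toy model with an index-based parent array — not the kernel's dpm_list or struct device), devices whose parents have already resumed are queued wave by wave, so independent subtrees become runnable in the same pass:

```c
#include <assert.h>

/* Toy device tree: parent_of[i] is the parent of device i, -1 for roots.
 * Purely illustrative -- not the kernel's data structures. */
#define NDEV 6
static const int parent_of[NDEV] = { -1, 0, 0, 1, 1, 2 };

/*
 * Fill order[] with a breadth-first resume order: on each pass, pick
 * every not-yet-resumed device whose parent has already been resumed.
 * All devices in one "wave" could be handed to async workers at once.
 * Returns the number of devices ordered.
 */
static int bfs_resume_order(int order[NDEV])
{
	int done[NDEV] = { 0 };
	int n = 0;

	while (n < NDEV) {
		int wave[NDEV], w = 0;

		/* Snapshot the ready set first so a child never lands
		 * in the same wave as its parent. */
		for (int i = 0; i < NDEV; i++)
			if (!done[i] &&
			    (parent_of[i] < 0 || done[parent_of[i]]))
				wave[w++] = i;

		if (w == 0)	/* defensive: no progress (cycle) */
			break;

		for (int j = 0; j < w; j++) {
			done[wave[j]] = 1;
			order[n++] = wave[j];
		}
	}
	return n;
}
```

With the sample tree above, the root comes out first, then both of its children together, then all three grandchildren — rather than walking one branch to the bottom before starting the next.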
On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
series yields significant improvements:
+---------------------------+---------------+----------------+----------------+
| Phase                     | Old full sync | Old full async | New full async |
|                           |               |                | + EAS disabled |
+---------------------------+---------------+----------------+----------------+
| Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
+---------------------------+---------------+----------------+----------------+
| Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
+---------------------------+---------------+----------------+----------------+
| Sum                       |        182 ms |         162 ms |         123 ms |
+---------------------------+---------------+----------------+----------------+
There might be room for more optimizations in the future, but I'm
keeping this patch series simple enough that it's easier to review and
verify that it's not breaking anything. If this series lands and stays
stable with no bug reports for a few months, I can work on optimizing
it a bit further.
Thanks,
Saravana
P.S: Cc-ing some usual suspects who might be interested in testing this
out.
[1] - https://lpc.events/event/18/contributions/1845/
Saravana Kannan (5):
PM: sleep: Fix runtime PM issue in dpm_resume()
PM: sleep: Remove unnecessary mutex lock when waiting on parent
PM: sleep: Add helper functions to loop through superior/subordinate
devs
PM: sleep: Do breadth first suspend/resume for async suspend/resume
PM: sleep: Spread out async kworker threads during dpm_resume*()
phases
drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
kernel/power/suspend.c | 16 ++
kernel/sched/topology.c | 13 ++
3 files changed, 276 insertions(+), 78 deletions(-)
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan

From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
    Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
    kernel-team, linux-pm, linux-kernel

Some devices might have their is_suspended flag set to false. In these
cases, dpm_resume() should skip doing anything for those devices.
However, runtime PM enable and a few other steps are done before
checking for this flag. Fix it so that we do things in the right order.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4a67e83300e1..86e51b9fefab 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.syscore)
 		goto Complete;
 
+	if (!dev->power.is_suspended)
+		goto Unlock;
+
 	if (dev->power.direct_complete) {
 		/* Match the pm_runtime_disable() in __device_suspend(). */
 		pm_runtime_enable(dev);
@@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
 	 */
 	dev->power.is_prepared = false;
 
-	if (!dev->power.is_suspended)
-		goto Unlock;
-
 	if (dev->pm_domain) {
 		info = "power domain ";
 		callback = pm_op(&dev->pm_domain->ops, state);
-- 
2.47.0.338.g60cca15819-goog
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Greg Kroah-Hartman @ 2024-11-16  7:43 UTC (permalink / raw)
To: Saravana Kannan
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
    linux-kernel

On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few other steps are done before
> checking for this flag. Fix it so that we do things in the right order.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This looks like a nice generic fix as well, should it go to older
kernels?

thanks,

greg k-h
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Saravana Kannan @ 2024-11-16 21:06 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
    linux-kernel

On Fri, Nov 15, 2024 at 11:43 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
> > However, runtime PM enable and a few other steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This looks like a nice generic fix as well, should it go to older
> kernels?

Yeah, it's meant to be a generic fix. But I'll feel better about it if
someone familiar with this code can give it a Reviewed-by.

And as I look at it... I have a bug in there. I think it should be
goto Complete and not Unlock! No idea how my devices didn't get freed
after a few suspend aborts!

I can send it out as a separate patch too if you want. And depending
on when it lands, I can keep it or drop it from v2 of this series.

Thanks,
Saravana
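The imbalance being discussed above can be seen with a tiny lock-balance model (illustrative names only, not the kernel's actual device_resume() code): a bail-out taken before the device lock is acquired has to jump past the unlock, i.e. to Complete rather than Unlock.

```c
#include <assert.h>

/* Toy lock with a depth counter so an imbalance is observable. */
static int lock_depth;

static void toy_lock(void)
{
	lock_depth++;
}

static void toy_unlock(void)
{
	assert(lock_depth > 0);	/* unlocking an un-held lock is the bug */
	lock_depth--;
}

/*
 * Mimics the label flow in device_resume(): with the is_suspended
 * check moved before the lock is taken, bailing out must skip the
 * unlock ("goto Complete"), not run it ("goto Unlock").
 * Returns the final lock depth; 0 means the path was balanced.
 */
static int toy_device_resume(int is_suspended)
{
	if (!is_suspended)
		goto Complete;	/* early exit: lock was never taken */

	toy_lock();
	/* ... resume callbacks would run here ... */
	toy_unlock();		/* the "Unlock" label's work */
 Complete:
	return lock_depth;
}
```

Had the early exit jumped to the unlock step instead, the depth counter would go negative — the locking imbalance pointed out in the thread.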
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Rafael J. Wysocki @ 2024-12-04 12:53 UTC (permalink / raw)
To: Saravana Kannan
Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
    Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

Trim CC list.

On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.

Not really.  This is particularly untrue for devices with
power.direct_complete set that have power.is_suspended clear.

> However, runtime PM enable and a few other steps are done before
> checking for this flag. Fix it so that we do things in the right order.

I don't see the bug this is fixing, but I can see bugs introduced by it.

I think that you want power.is_suspended to be checked before waiting
for the superiors.  Fair enough, since for devices with
power.is_suspended clear, there should be no superiors to wait for, so
the two checks can be done in any order and checking
power.is_suspended first would be less overhead.  And that's it
AFAICS.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>  	if (dev->power.syscore)
>  		goto Complete;
>
> +	if (!dev->power.is_suspended)
> +		goto Unlock;
> +
>  	if (dev->power.direct_complete) {
>  		/* Match the pm_runtime_disable() in __device_suspend(). */
>  		pm_runtime_enable(dev);
> @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>  	 */
>  	dev->power.is_prepared = false;
>
> -	if (!dev->power.is_suspended)
> -		goto Unlock;
> -
>  	if (dev->pm_domain) {
>  		info = "power domain ";
>  		callback = pm_op(&dev->pm_domain->ops, state);
> --
> 2.47.0.338.g60cca15819-goog
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Rafael J. Wysocki @ 2025-03-11 10:47 UTC (permalink / raw)
To: Saravana Kannan
Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
    Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Trim CC list.
>
> On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
>
> Not really.  This is particularly untrue for devices with
> power.direct_complete set that have power.is_suspended clear.
>
> > However, runtime PM enable and a few other steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
>
> I don't see the bug this is fixing, but I can see bugs introduced by it.

So AFAICS the bug is in the error path when dpm_suspend() fails in
which case some devices with direct_complete set may not have been
handled by device_suspend(). Since runtime PM has not been disabled
for those devices yet, it doesn't make sense to re-enable runtime PM
for them (and if they had runtime PM disabled to start with, this will
inadvertently enable runtime PM for them).

However, two changes are needed to fix this issue:

(1) power.is_suspended needs to be set for the devices with
direct_complete set in device_suspend().

(2) The power.is_suspended check needs to be moved after the
power.syscore one in device_resume().

The patch below only does (2) which is insufficient and it introduces
a functional issue for the direct_complete devices with runtime PM
disabled because it will cause runtime PM to remain disabled for them
permanently.

> I think that you want power.is_suspended to be checked before waiting
> for the superiors.  Fair enough, since for devices with
> power.is_suspended clear, there should be no superiors to wait for, so
> the two checks can be done in any order and checking
> power.is_suspended first would be less overhead.  And that's it
> AFAICS.
>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4a67e83300e1..86e51b9fefab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >  	if (dev->power.syscore)
> >  		goto Complete;
> >
> > +	if (!dev->power.is_suspended)
> > +		goto Unlock;

And this should be "goto Complete" because jumping to Unlock
introduces a device locking imbalance.

> > +
> >  	if (dev->power.direct_complete) {
> >  		/* Match the pm_runtime_disable() in __device_suspend(). */
> >  		pm_runtime_enable(dev);
> > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >  	 */
> >  	dev->power.is_prepared = false;
> >
> > -	if (!dev->power.is_suspended)
> > -		goto Unlock;
> > -
> >  	if (dev->pm_domain) {
> >  		info = "power domain ";
> >  		callback = pm_op(&dev->pm_domain->ops, state);
> > --

If you want to submit a new version of this patch, please do so by the
end of the week or I will send my fix because I want this issue to be
addressed in 6.15.

BTW, please note that this is orthogonal to the recent async
suspend-resume series

https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/

so there is no reason why it should be addressed in that series.
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Saravana Kannan @ 2025-03-13 1:49 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ben Segall,
    Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > Trim CC list.
> >
> > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Some devices might have their is_suspended flag set to false. In these
> > > cases, dpm_resume() should skip doing anything for those devices.
> >
> > Not really.  This is particularly untrue for devices with
> > power.direct_complete set that have power.is_suspended clear.
> >
> > > However, runtime PM enable and a few other steps are done before
> > > checking for this flag. Fix it so that we do things in the right order.
> >
> > I don't see the bug this is fixing, but I can see bugs introduced by it.
>
> So AFAICS the bug is in the error path when dpm_suspend() fails in
> which case some devices with direct_complete set may not have been
> handled by device_suspend(). Since runtime PM has not been disabled
> for those devices yet, it doesn't make sense to re-enable runtime PM
> for them (and if they had runtime PM disabled to start with, this will
> inadvertently enable runtime PM for them).
>
> However, two changes are needed to fix this issue:
>
> (1) power.is_suspended needs to be set for the devices with
> direct_complete set in device_suspend().
>
> (2) The power.is_suspended check needs to be moved after the
> power.syscore one in device_resume().
>
> The patch below only does (2) which is insufficient and it introduces
> a functional issue for the direct_complete devices with runtime PM
> disabled because it will cause runtime PM to remain disabled for them
> permanently.
>
> > I think that you want power.is_suspended to be checked before waiting
> > for the superiors.  Fair enough, since for devices with
> > power.is_suspended clear, there should be no superiors to wait for, so
> > the two checks can be done in any order and checking
> > power.is_suspended first would be less overhead.  And that's it
> > AFAICS.
> >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/base/power/main.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index 4a67e83300e1..86e51b9fefab 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >  	if (dev->power.syscore)
> > >  		goto Complete;
> > >
> > > +	if (!dev->power.is_suspended)
> > > +		goto Unlock;
>
> And this should be "goto Complete" because jumping to Unlock
> introduces a device locking imbalance.
>
> > > +
> > >  	if (dev->power.direct_complete) {
> > >  		/* Match the pm_runtime_disable() in __device_suspend(). */
> > >  		pm_runtime_enable(dev);
> > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >  	 */
> > >  	dev->power.is_prepared = false;
> > >
> > > -	if (!dev->power.is_suspended)
> > > -		goto Unlock;
> > > -
> > >  	if (dev->pm_domain) {
> > >  		info = "power domain ";
> > >  		callback = pm_op(&dev->pm_domain->ops, state);
> > > --
>
> If you want to submit a new version of this patch, please do so by the
> end of the week or I will send my fix because I want this issue to be
> addressed in 6.15.

Please go ahead with the fix for this. I'm not too comfortable with
the direct_complete logic yet.

-Saravana

> BTW, please note that this is orthogonal to the recent async
> suspend-resume series
>
> https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/
>
> so there is no reason why it should be addressed in that series.
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Rafael J. Wysocki @ 2025-03-13 10:58 UTC (permalink / raw)
To: Saravana Kannan
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
    Ben Segall, Geert Uytterhoeven, kernel-team, linux-pm, linux-kernel

On Thu, Mar 13, 2025 at 2:50 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > Trim CC list.
> > >
> > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Some devices might have their is_suspended flag set to false. In these
> > > > cases, dpm_resume() should skip doing anything for those devices.
> > >
> > > Not really.  This is particularly untrue for devices with
> > > power.direct_complete set that have power.is_suspended clear.
> > >
> > > > However, runtime PM enable and a few other steps are done before
> > > > checking for this flag. Fix it so that we do things in the right order.
> > >
> > > I don't see the bug this is fixing, but I can see bugs introduced by it.
> >
> > So AFAICS the bug is in the error path when dpm_suspend() fails in
> > which case some devices with direct_complete set may not have been
> > handled by device_suspend(). Since runtime PM has not been disabled
> > for those devices yet, it doesn't make sense to re-enable runtime PM
> > for them (and if they had runtime PM disabled to start with, this will
> > inadvertently enable runtime PM for them).
> >
> > However, two changes are needed to fix this issue:
> >
> > (1) power.is_suspended needs to be set for the devices with
> > direct_complete set in device_suspend().
> >
> > (2) The power.is_suspended check needs to be moved after the
> > power.syscore one in device_resume().
> >
> > The patch below only does (2) which is insufficient and it introduces
> > a functional issue for the direct_complete devices with runtime PM
> > disabled because it will cause runtime PM to remain disabled for them
> > permanently.
> >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/base/power/main.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index 4a67e83300e1..86e51b9fefab 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >  	if (dev->power.syscore)
> > > >  		goto Complete;
> > > >
> > > > +	if (!dev->power.is_suspended)
> > > > +		goto Unlock;
> >
> > And this should be "goto Complete" because jumping to Unlock
> > introduces a device locking imbalance.
> >
> > > > +
> > > >  	if (dev->power.direct_complete) {
> > > >  		/* Match the pm_runtime_disable() in __device_suspend(). */
> > > >  		pm_runtime_enable(dev);
> > > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >  	 */
> > > >  	dev->power.is_prepared = false;
> > > >
> > > > -	if (!dev->power.is_suspended)
> > > > -		goto Unlock;
> > > > -
> > > >  	if (dev->pm_domain) {
> > > >  		info = "power domain ";
> > > >  		callback = pm_op(&dev->pm_domain->ops, state);
> > > > --
> >
> > If you want to submit a new version of this patch, please do so by the
> > end of the week or I will send my fix because I want this issue to be
> > addressed in 6.15.
>
> Please go ahead with the fix for this. I'm not too comfortable with
> the direct_complete logic yet.

OK, I will, thank you!
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Pavel Machek @ 2025-03-14 20:47 UTC (permalink / raw)
To: Saravana Kannan
Cc: Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
    linux-kernel

Hi!

> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few other steps are done before
> checking for this flag. Fix it so that we do things in the right order.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>  	if (dev->power.syscore)
>  		goto Complete;
>
> +	if (!dev->power.is_suspended)
> +		goto Unlock;
> +

This needs to be goto Complete, right?

Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
* Re: [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

From: Saravana Kannan @ 2025-03-14 20:49 UTC (permalink / raw)
To: Pavel Machek
Cc: Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
    linux-kernel

On Fri, Mar 14, 2025 at 1:47 PM Pavel Machek <pavel@ucw.cz> wrote:
>
> Hi!
>
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
> > However, runtime PM enable and a few other steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4a67e83300e1..86e51b9fefab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >  	if (dev->power.syscore)
> >  		goto Complete;
> >
> > +	if (!dev->power.is_suspended)
> > +		goto Unlock;
> > +
>
> This needs to be goto Complete, right?

Yeah, it's a bug that I also pointed out in another email. But Rafael
sent a proper fixed-up patch that I gave my Reviewed-by.

-Saravana
* [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan

From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
    Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
    kernel-team, linux-pm, linux-kernel

Locking is not needed to do get_device(dev->parent). We either get a
NULL (if the parent was cleared) or the actual parent. Also, when a
device is deleted (device_del()) and removed from the dpm_list, its
completion variable is also complete_all()-ed. So, we don't have to
worry about waiting indefinitely on a deleted parent device.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 86e51b9fefab..9b9b6088e56a 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
 	 * counting the parent once more unless the device has been deleted
 	 * already (in which case return right away).
 	 */
-	mutex_lock(&dpm_list_mtx);
-
-	if (!device_pm_initialized(dev)) {
-		mutex_unlock(&dpm_list_mtx);
-		return false;
-	}
-
 	parent = get_device(dev->parent);
-
-	mutex_unlock(&dpm_list_mtx);
-
-	dpm_wait(parent, async);
+	if (device_pm_initialized(dev))
+		dpm_wait(parent, async);
 
 	put_device(parent);
 
 	dpm_wait_for_suppliers(dev, async);
-- 
2.47.0.338.g60cca15819-goog
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent

From: Rafael J. Wysocki @ 2024-12-02 20:11 UTC (permalink / raw)
To: Saravana Kannan
Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
    Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
    kernel-team, linux-pm, linux-kernel

Sorry for the delay.

On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Locking is not needed to do get_device(dev->parent). We either get a NULL
> (if the parent was cleared) or the actual parent. Also, when a device is
> deleted (device_del()) and removed from the dpm_list, its completion
> variable is also complete_all()-ed. So, we don't have to worry about
> waiting indefinitely on a deleted parent device.

The device_pm_initialized(dev) check before get_device(dev->parent)
doesn't make sense without the locking and that's the whole point of
it.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 86e51b9fefab..9b9b6088e56a 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
>  	 * counting the parent once more unless the device has been deleted
>  	 * already (in which case return right away).
>  	 */
> -	mutex_lock(&dpm_list_mtx);
> -
> -	if (!device_pm_initialized(dev)) {
> -		mutex_unlock(&dpm_list_mtx);
> -		return false;
> -	}
> -
>  	parent = get_device(dev->parent);
> -
> -	mutex_unlock(&dpm_list_mtx);
> -
> -	dpm_wait(parent, async);
> +	if (device_pm_initialized(dev))
> +		dpm_wait(parent, async);

This is racy, so what's the point?

You can just do

parent = get_device(dev->parent);
dpm_wait(parent, async);

and please update the comment above this.

>  	put_device(parent);
>
>  	dpm_wait_for_suppliers(dev, async);
> --
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent

From: Rafael J. Wysocki @ 2024-12-02 20:16 UTC (permalink / raw)
To: Saravana Kannan
Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm,
    linux-kernel

On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Sorry for the delay.
>
> On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > (if the parent was cleared) or the actual parent. Also, when a device is
> > deleted (device_del()) and removed from the dpm_list, its completion
> > variable is also complete_all()-ed. So, we don't have to worry about
> > waiting indefinitely on a deleted parent device.
>
> The device_pm_initialized(dev) check before get_device(dev->parent)
> doesn't make sense without the locking and that's the whole point of
> it.

Hmm, not really.

How is the parent prevented from going away in get_device() right
after the initial dev check without the locking?

> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 13 ++-----------
> >  1 file changed, 2 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 86e51b9fefab..9b9b6088e56a 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> >  	 * counting the parent once more unless the device has been deleted
> >  	 * already (in which case return right away).
> >  	 */
> > -	mutex_lock(&dpm_list_mtx);
> > -
> > -	if (!device_pm_initialized(dev)) {
> > -		mutex_unlock(&dpm_list_mtx);
> > -		return false;
> > -	}
> > -
> >  	parent = get_device(dev->parent);
> > -
> > -	mutex_unlock(&dpm_list_mtx);
> > -
> > -	dpm_wait(parent, async);
> > +	if (device_pm_initialized(dev))
> > +		dpm_wait(parent, async);
>
> This is racy, so what's the point?
>
> You can just do
>
> parent = get_device(dev->parent);
> dpm_wait(parent, async);
>
> and please update the comment above this.
>
> >  	put_device(parent);
> >
> >  	dpm_wait_for_suppliers(dev, async);
> > --
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent 2024-12-02 20:16 ` Rafael J. Wysocki @ 2024-12-02 20:46 ` Saravana Kannan 2024-12-02 21:14 ` Rafael J. Wysocki 0 siblings, 1 reply; 33+ messages in thread From: Saravana Kannan @ 2024-12-02 20:46 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > Sorry for the delay. > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL > > > (if the parent was cleared) or the actual parent. Also, when a device is > > > deleted (device_del()) and removed from the dpm_list, its completion > > > variable is also complete_all()-ed. So, we don't have to worry about > > > waiting indefinitely on a deleted parent device. > > > > The device_pm_initialized(dev) check before get_device(dev->parent) > > doesn't make sense without the locking and that's the whole point of > > it. > > Hmm, not really. > > How is the parent prevented from going away in get_device() right > after the initial dev check without the locking? Not sure what you mean by "go away"? But get_device() is going to keep a non-zero refcount on the parent so that struct doesn't get freed. The parent itself can still "go away" in terms of unbinding or removing the children from the dpm_list(). But that's what the device_pm_initialized() check is for. When a device_del() is called, it's removed from the dpm_list. The actual freeing comes later. But we aren't/weren't checking for that anyway. 
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > --- > > > drivers/base/power/main.c | 13 ++----------- > > > 1 file changed, 2 insertions(+), 11 deletions(-) > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > index 86e51b9fefab..9b9b6088e56a 100644 > > > --- a/drivers/base/power/main.c > > > +++ b/drivers/base/power/main.c > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async) > > > * counting the parent once more unless the device has been deleted > > > * already (in which case return right away). > > > */ > > > - mutex_lock(&dpm_list_mtx); > > > - > > > - if (!device_pm_initialized(dev)) { > > > - mutex_unlock(&dpm_list_mtx); > > > - return false; > > > - } > > > - > > > parent = get_device(dev->parent); > > > - > > > - mutex_unlock(&dpm_list_mtx); > > > - > > > - dpm_wait(parent, async); > > > + if (device_pm_initialized(dev)) > > > + dpm_wait(parent, async); > > > > This is racy, so what's the point? > > > > You can just do > > > > parent = get_device(dev->parent); > > dpm_wait(parent, async); Parent struct device being there isn't the same as whether this device is in the dpm_list? So, shouldn't we still check this? Also, is it really racy anymore with the new algorithm? We don't kick off the subordinates until after the parent is done. > > > > and please update the comment above this. Will do. Thanks, Saravana > > > > > put_device(parent); > > > > > > dpm_wait_for_suppliers(dev, async); > > > -- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent 2024-12-02 20:46 ` Saravana Kannan @ 2024-12-02 21:14 ` Rafael J. Wysocki 2024-12-02 23:27 ` Saravana Kannan 0 siblings, 1 reply; 33+ messages in thread From: Rafael J. Wysocki @ 2024-12-02 21:14 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote: > > On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > Sorry for the delay. > > > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL > > > > (if the parent was cleared) or the actual parent. Also, when a device is > > > > deleted (device_del()) and removed from the dpm_list, its completion > > > > variable is also complete_all()-ed. So, we don't have to worry about > > > > waiting indefinitely on a deleted parent device. > > > > > > The device_pm_initialized(dev) check before get_device(dev->parent) > > > doesn't make sense without the locking and that's the whole point of > > > it. > > > > Hmm, not really. > > > > How is the parent prevented from going away in get_device() right > > after the initial dev check without the locking? > > Not sure what you mean by "go away"? But get_device() is going to keep > a non-zero refcount on the parent so that struct doesn't get freed. That's after it has returned. There is nothing to prevent dev from being freed in get_device() itself before the kobject_get(&dev->kobj) call. 
So when get_device() is called, there needs to be an active ref on the device already or something else to prevent the target device object from being freed. In this particular case it is the lock along with the device_pm_initialized(dev) check on the child. > The parent itself can still "go away" in terms of unbinding or > removing the children from the dpm_list(). But that's what the > device_pm_initialized() check is for. When a device_del() is called, > it's removed from the dpm_list. The actual freeing comes later. But we > aren't/weren't checking for that anyway. > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > > --- > > > > drivers/base/power/main.c | 13 ++----------- > > > > 1 file changed, 2 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > > index 86e51b9fefab..9b9b6088e56a 100644 > > > > --- a/drivers/base/power/main.c > > > > +++ b/drivers/base/power/main.c > > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async) > > > > * counting the parent once more unless the device has been deleted > > > > * already (in which case return right away). > > > > */ > > > > - mutex_lock(&dpm_list_mtx); > > > > - > > > > - if (!device_pm_initialized(dev)) { > > > > - mutex_unlock(&dpm_list_mtx); > > > > - return false; > > > > - } > > > > - > > > > parent = get_device(dev->parent); > > > > - > > > > - mutex_unlock(&dpm_list_mtx); > > > > - > > > > - dpm_wait(parent, async); > > > > + if (device_pm_initialized(dev)) > > > > + dpm_wait(parent, async); > > > > > > This is racy, so what's the point? > > > > > > You can just do > > > > > > parent = get_device(dev->parent); > > > dpm_wait(parent, async); > > Parent struct device being there isn't the same as whether this device > is in the dpm_list? So, shouldn't we still check this? > > Also, is it really racy anymore with the new algorithm? 
We don't kick > off the subordinates until after the parent is done. > > > > > > > and please update the comment above this. > > Will do. > > Thanks, > Saravana > > > > > > > put_device(parent); > > > > > > > > dpm_wait_for_suppliers(dev, async); > > > > -- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent 2024-12-02 21:14 ` Rafael J. Wysocki @ 2024-12-02 23:27 ` Saravana Kannan 2024-12-04 12:21 ` Rafael J. Wysocki 0 siblings, 1 reply; 33+ messages in thread From: Saravana Kannan @ 2024-12-02 23:27 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Mon, Dec 2, 2024 at 1:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote: > > > > On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > Sorry for the delay. > > > > > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL > > > > > (if the parent was cleared) or the actual parent. Also, when a device is > > > > > deleted (device_del()) and removed from the dpm_list, its completion > > > > > variable is also complete_all()-ed. So, we don't have to worry about > > > > > waiting indefinitely on a deleted parent device. > > > > > > > > The device_pm_initialized(dev) check before get_device(dev->parent) > > > > doesn't make sense without the locking and that's the whole point of > > > > it. > > > > > > Hmm, not really. > > > > > > How is the parent prevented from going away in get_device() right > > > after the initial dev check without the locking? > > > > Not sure what you mean by "go away"? But get_device() is going to keep > > a non-zero refcount on the parent so that struct doesn't get freed. > > That's after it has returned. 
> > There is nothing to prevent dev from being freed in get_device() > itself before the kobject_get(&dev->kobj) call. > > So when get_device() is called, there needs to be an active ref on the > device already or something else to prevent the target device object > from being freed. > > In this particular case it is the lock along with the > device_pm_initialized(dev) check on the child. Ugh... my head hurts whenever I think about get_device(). Yeah, I think you are right. Hmm... I wanted to have this function be replaced by a call to a generic helper function dpm_for_each_superior() in the next patch. But that helper function could be called from places where the dpm_list lock is held. Also, I was planning on making it even more generic (not specific to dpm) in the future. So, depending on dpm_list lock didn't make sense. Any other ideas on how I could do this without grabbing the dpm_list mutex? Actually, with the rewrite and at the end of this series, we don't have to worry about this race because each device's suspend/resume is only started after all the dependencies are done. So, if the parent deletes a child and itself, the child device's suspend wouldn't have been triggered at all. So, another option is to just squash the series and call it a day. Or add a comment to the commit text that this adds a race that's fixed by the time we get to the end of the series. Thoughts? Thanks, Saravana > > > The parent itself can still "go away" in terms of unbinding or > > removing the children from the dpm_list(). But that's what the > > device_pm_initialized() check is for. When a device_del() is called, > > it's removed from the dpm_list. The actual freeing comes later. But we > > aren't/weren't checking for that anyway. 
> > > > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > > > --- > > > > > drivers/base/power/main.c | 13 ++----------- > > > > > 1 file changed, 2 insertions(+), 11 deletions(-) > > > > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > > > index 86e51b9fefab..9b9b6088e56a 100644 > > > > > --- a/drivers/base/power/main.c > > > > > +++ b/drivers/base/power/main.c > > > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async) > > > > > * counting the parent once more unless the device has been deleted > > > > > * already (in which case return right away). > > > > > */ > > > > > - mutex_lock(&dpm_list_mtx); > > > > > - > > > > > - if (!device_pm_initialized(dev)) { > > > > > - mutex_unlock(&dpm_list_mtx); > > > > > - return false; > > > > > - } > > > > > - > > > > > parent = get_device(dev->parent); > > > > > - > > > > > - mutex_unlock(&dpm_list_mtx); > > > > > - > > > > > - dpm_wait(parent, async); > > > > > + if (device_pm_initialized(dev)) > > > > > + dpm_wait(parent, async); > > > > > > > > This is racy, so what's the point? > > > > > > > > You can just do > > > > > > > > parent = get_device(dev->parent); > > > > dpm_wait(parent, async); > > > > Parent struct device being there isn't the same as whether this device > > is in the dpm_list? So, shouldn't we still check this? > > > > Also, is it really racy anymore with the new algorithm? We don't kick > > off the subordinates until after the parent is done. > > > > > > > > > > and please update the comment above this. > > > > Will do. > > > > Thanks, > > Saravana > > > > > > > > > put_device(parent); > > > > > > > > > > dpm_wait_for_suppliers(dev, async); > > > > > -- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent 2024-12-02 23:27 ` Saravana Kannan @ 2024-12-04 12:21 ` Rafael J. Wysocki 0 siblings, 0 replies; 33+ messages in thread From: Rafael J. Wysocki @ 2024-12-04 12:21 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Tue, Dec 3, 2024 at 12:28 AM Saravana Kannan <saravanak@google.com> wrote: > > On Mon, Dec 2, 2024 at 1:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > Sorry for the delay. > > > > > > > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > > > > > Locking is not needed to do get_device(dev->parent). We either get a NULL > > > > > > (if the parent was cleared) or the actual parent. Also, when a device is > > > > > > deleted (device_del()) and removed from the dpm_list, its completion > > > > > > variable is also complete_all()-ed. So, we don't have to worry about > > > > > > waiting indefinitely on a deleted parent device. > > > > > > > > > > The device_pm_initialized(dev) check before get_device(dev->parent) > > > > > doesn't make sense without the locking and that's the whole point of > > > > > it. > > > > > > > > Hmm, not really. > > > > > > > > How is the parent prevented from going away in get_device() right > > > > after the initial dev check without the locking? > > > > > > Not sure what you mean by "go away"? 
But get_device() is going to keep > > > a non-zero refcount on the parent so that struct doesn't get freed. > > > > That's after it has returned. > > > > There is nothing to prevent dev from being freed in get_device() > > itself before the kobject_get(&dev->kobj) call. > > > > So when get_device() is called, there needs to be an active ref on the > > device already or something else to prevent the target device object > > from being freed. > > > > In this particular case it is the lock along with the > > device_pm_initialized(dev) check on the child. > > Ugh... my head hurts whenever I think about get_device(). Yeah, I > think you are right. > > Hmm... I wanted to have this function be replaced by a call to a > generic helper function dpm_for_each_superior() in the next patch. But > that helper function could be called from places where the dpm_list > lock is held. Also, I was planning on making it even more generic (not > specific to dpm) in the future. So, depending on dpm_list lock didn't > make sense. > > Any other ideas on how I could do this without grabbing the dpm_list mutex? You don't need to replace the existing function with a new helper. Just add the helper, use it going forward and drop the old function in a separate patch when there are no more users of it. > Actually, with the rewrite and at the end of this series, we don't > have to worry about this race because each device's suspend/resume is > only started after all the dependencies are done. So, if the parent > deletes a child and itself, the child device's suspend wouldn't have > been triggered at all. > > So, another option is to just squash the series and call it a day. No, no. This is complicated enough and the code is super-sensitive. I think that you need to split the changes even more. > Or add a comment to the commit text that this adds a race that's fixed by > the time we get to the end of the series. That would create a bisection trap, so not really. 
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs
2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
` (2 subsequent siblings)
5 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider
Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
kernel-team, linux-pm, linux-kernel

We have a lot of code that does or will loop through superior/subordinate
devices and take action on them. Refactor the code to pull out the common
"loop through" functionality into helper functions to avoid repeating the
logic.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 70 ++++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 9b9b6088e56a..aa1470ef6ac0 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -247,15 +247,22 @@ static int dpm_wait_fn(struct device *dev, void *async_ptr)
 	return 0;
 }
 
-static void dpm_wait_for_children(struct device *dev, bool async)
-{
-	device_for_each_child(dev, &async, dpm_wait_fn);
-}
-
-static void dpm_wait_for_suppliers(struct device *dev, bool async)
+static int dpm_for_each_superior(struct device *dev, void *data,
+				 int (*fn)(struct device *dev, void *data))
 {
+	struct device *parent;
+	int ret = 0, idx;
 	struct device_link *link;
-	int idx;
+
+	if (!dev)
+		return 0;
+
+	parent = get_device(dev->parent);
+	if (parent)
+		ret = fn(parent, data);
+	put_device(parent);
+	if (ret)
+		return ret;
 
 	idx = device_links_read_lock();
 
@@ -267,29 +274,20 @@ static void dpm_wait_for_suppliers(struct device *dev, bool async)
 	 * walking.
 	 */
 	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
-		if (READ_ONCE(link->status) != DL_STATE_DORMANT)
-			dpm_wait(link->supplier, async);
+		if (READ_ONCE(link->status) != DL_STATE_DORMANT) {
+			ret = fn(link->supplier, data);
+			if (ret)
+				break;
+		}
 
 	device_links_read_unlock(idx);
+
+	return ret;
 }
 
 static bool dpm_wait_for_superior(struct device *dev, bool async)
 {
-	struct device *parent;
-
-	/*
-	 * If the device is resumed asynchronously and the parent's callback
-	 * deletes both the device and the parent itself, the parent object may
-	 * be freed while this function is running, so avoid that by reference
-	 * counting the parent once more unless the device has been deleted
-	 * already (in which case return right away).
-	 */
-	parent = get_device(dev->parent);
-	if (device_pm_initialized(dev))
-		dpm_wait(parent, async);
-	put_device(parent);
-
-	dpm_wait_for_suppliers(dev, async);
+	dpm_for_each_superior(dev, &async, dpm_wait_fn);
 
 	/*
 	 * If the parent's callback has deleted the device, attempting to resume
@@ -298,10 +296,18 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
 	return device_pm_initialized(dev);
 }
 
-static void dpm_wait_for_consumers(struct device *dev, bool async)
+static int dpm_for_each_subordinate(struct device *dev, void *data,
+				    int (*fn)(struct device *dev, void *data))
 {
+	int ret, idx;
 	struct device_link *link;
-	int idx;
+
+	if (!dev)
+		return 0;
+
+	ret = device_for_each_child(dev, data, fn);
+	if (ret)
+		return ret;
 
 	idx = device_links_read_lock();
 
@@ -315,16 +321,20 @@ static void dpm_wait_for_consumers(struct device *dev, bool async)
 	 * unregistration).
 	 */
 	list_for_each_entry_rcu_locked(link, &dev->links.consumers, s_node)
-		if (READ_ONCE(link->status) != DL_STATE_DORMANT)
-			dpm_wait(link->consumer, async);
+		if (READ_ONCE(link->status) != DL_STATE_DORMANT) {
+			ret = fn(link->consumer, data);
+			if (ret)
+				break;
+		}
 
 	device_links_read_unlock(idx);
+
+	return ret;
 }
 
 static void dpm_wait_for_subordinate(struct device *dev, bool async)
 {
-	dpm_wait_for_children(dev, async);
-	dpm_wait_for_consumers(dev, async);
+	dpm_for_each_subordinate(dev, &async, dpm_wait_fn);
 }
 
 /**
-- 
2.47.0.338.g60cca15819-goog

^ permalink raw reply related	[flat|nested] 33+ messages in thread
* [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume
2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
` (2 preceding siblings ...)
2024-11-14 22:09 ` [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs Saravana Kannan
@ 2024-11-14 22:09 ` Saravana Kannan
2025-03-11 11:25 ` Geert Uytterhoeven
2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
2024-11-19 4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
5 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw)
To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider
Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
kernel-team, linux-pm, linux-kernel

The dpm_list used for suspend/resume ensures that all superior devices
(parents and suppliers) precede subordinate devices (children and
consumers).

Current async resume logic:
---------------------------
* For each resume phase (except the "complete" phase, which is always
  serialized), the resume logic first queues all async devices in the
  dpm_list. It then loops through the dpm_list again to resume the sync
  devices one by one.
* Async devices wait for all their superior devices to resume before
  starting their own resume operation.
* This process results in multiple sleep and wake-up cycles before an
  async device actually resumes. This sleeping also causes kworker
  threads to stall with work for a period. Consequently, the workqueue
  framework spins up more kworker threads to handle the other async
  devices.
* The end result is excessive thread creation, wake-ups, sleeps, and
  context switches for every async device. This overhead makes a full
  async resume (with all devices marked as async-capable) much slower
  than a synchronous resume.

Current async suspend logic:
----------------------------
* The async suspend logic differs from the async resume logic. The
  suspend logic loops through the dpm_list. When it finds an async
  device, it queues the work and moves on. However, if it encounters a
  sync device, it waits until the sync device (and all its subordinate
  devices) have suspended before proceeding to the next device.
  Therefore, an async suspend device can be left waiting on an
  unrelated device before even being queued.
* Once queued, an async device experiences the same inefficiencies as
  in the resume logic (thread creation, wake-ups, sleeps, and context
  switches).

On a Pixel 6, averaging over 100 suspend/resume cycles, the data is as
follows:

+---------------------------+-----------+------------+----------+
| Phase                     | Full sync | Full async | % change |
+---------------------------+-----------+------------+----------+
| Total dpm_suspend*() time | 107 ms    | 72 ms      | -33%     |
+---------------------------+-----------+------------+----------+
| Total dpm_resume*() time  | 75 ms     | 90 ms      | +20%     |
+---------------------------+-----------+------------+----------+
| Sum                       | 182 ms    | 162 ms     | -11%     |
+---------------------------+-----------+------------+----------+

This shows that full async suspend/resume is not a viable option. It
makes the user-visible resume phase slower and only improves the
overall time by 11%.

To fix all this, this patch introduces a new async suspend/resume
logic.

New suspend/resume logic:
-------------------------
* For each suspend/resume phase (except "complete" and "prepare," which
  are always serialized), the logic first queues only the async devices
  that don't have to wait for any subordinates (for suspend) or
  superiors (for resume). It then loops through the dpm_list again to
  suspend/resume the sync devices one by one.
* When a device (sync or async) successfully suspends/resumes, it
  examines its superiors/subordinates and queues only the async devices
  that don't need to wait for any subordinates/superiors.

With this new logic:
* Queued async devices don't have to wait for completion and are always
  ready to perform their suspend/resume operation.
* The queue of async devices remains short.
* kworkers never sleep for extended periods, and the workqueue
  framework doesn't spin up many new threads to handle a backlog of
  async devices.
* The result is approximately NCPU kworker threads running in parallel
  without sleeping until all async devices finish.

On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
yields improved results:

+---------------------------+---------------+----------------+----------+
| Phase                     | Old full sync | New full async | % change |
+---------------------------+---------------+----------------+----------+
| Total dpm_suspend*() time | 107 ms        | 60 ms          | -44%     |
+---------------------------+---------------+----------------+----------+
| Total dpm_resume*() time  | 75 ms         | 74 ms          | -1%      |
+---------------------------+---------------+----------------+----------+
| Sum                       | 182 ms        | 134 ms         | -26%     |
+---------------------------+---------------+----------------+----------+

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 242 ++++++++++++++++++++++++++++++++------
 1 file changed, 205 insertions(+), 37 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index aa1470ef6ac0..65c195be48b8 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -121,6 +121,12 @@ void device_pm_unlock(void)
 	mutex_unlock(&dpm_list_mtx);
 }
 
+static bool is_async(struct device *dev)
+{
+	return dev->power.async_suspend && pm_async_enabled
+		&& !pm_trace_is_enabled();
+}
+
 /**
  * device_pm_add - Add a device to the PM core's list of active devices.
  * @dev: Device to add to the list.
@@ -287,6 +293,9 @@ static int dpm_for_each_superior(struct device *dev, void *data,
 
 static bool dpm_wait_for_superior(struct device *dev, bool async)
 {
+	if (is_async(dev))
+		return true;
+
 	dpm_for_each_superior(dev, &async, dpm_wait_fn);
 
 	/*
@@ -334,9 +343,22 @@ static int dpm_for_each_subordinate(struct device *dev, void *data,
 
 static void dpm_wait_for_subordinate(struct device *dev, bool async)
 {
+	if (is_async(dev))
+		return;
+
 	dpm_for_each_subordinate(dev, &async, dpm_wait_fn);
 }
 
+static int dpm_check_fn(struct device *dev, void *data)
+{
+	return completion_done(&dev->power.completion) ? 0 : -EBUSY;
+}
+
+static int dpm_check_subordinate(struct device *dev)
+{
+	return dpm_for_each_subordinate(dev, NULL, dpm_check_fn);
+}
+
 /**
  * pm_op - Return the PM operation appropriate for given PM event.
  * @ops: PM operations to choose from.
@@ -578,33 +600,39 @@ bool dev_pm_skip_resume(struct device *dev)
 	return !dev->power.must_resume;
 }
 
-static bool is_async(struct device *dev)
+static void dpm_async_fn(struct device *dev, async_func_t func)
 {
-	return dev->power.async_suspend && pm_async_enabled
-		&& !pm_trace_is_enabled();
-}
-
-static bool dpm_async_fn(struct device *dev, async_func_t func)
-{
-	reinit_completion(&dev->power.completion);
-
-	if (is_async(dev)) {
-		dev->power.async_in_progress = true;
-
-		get_device(dev);
+	/*
+	 * Multiple async devices could trigger the async queuing of this
+	 * device. Make sure not to double queue it.
+	 */
+	spin_lock(&dev->power.lock);
+	if (dev->power.async_in_progress) {
+		spin_unlock(&dev->power.lock);
+		return;
+	}
+	dev->power.async_in_progress = true;
+	spin_unlock(&dev->power.lock);
 
-		if (async_schedule_dev_nocall(func, dev))
-			return true;
+	get_device(dev);
 
-		put_device(dev);
-	}
 	/*
-	 * Because async_schedule_dev_nocall() above has returned false or it
-	 * has not been called at all, func() is not running and it is safe to
-	 * update the async_in_progress flag without extra synchronization.
+	 * We aren't going to call any callbacks if the device has none. Also,
+	 * direct_complete means all the resume and suspend callbacks should be
+	 * skipped and the device should go straight to dpm_complete(). In both
+	 * of these case, calling the func() synchronously will be a lot quicker
+	 * than even queuing the async work. So, do that.
 	 */
-	dev->power.async_in_progress = false;
-	return false;
+	if (dev->power.no_pm_callbacks || dev->power.direct_complete) {
+		func(dev, 0);
+		return;
+	}
+
+	if (async_schedule_dev_nocall(func, dev))
+		return;
+
+	WARN(1, "Unable to schedule async suspend/resume calls!\n");
+	put_device(dev);
 }
 
 /**
@@ -692,12 +720,47 @@ static void device_resume_noirq(struct device *dev, pm_message_t state, bool asy
 	}
 }
 
+static void dpm_reinit_dev_state(struct list_head *list)
+{
+	struct device *dev;
+
+	list_for_each_entry(dev, list, power.entry) {
+		reinit_completion(&dev->power.completion);
+		dev->power.async_in_progress = false;
+	}
+}
+
+static int dpm_check_superior(struct device *dev)
+{
+	return dpm_for_each_superior(dev, NULL, dpm_check_fn);
+}
+
+static int dpm_async_queue_resume_ready_fn(struct device *dev, void *data)
+{
+	if (!is_async(dev) || dpm_check_superior(dev))
+		return 0;
+
+	dpm_async_fn(dev, data);
+	return 0;
+}
+
+static void dpm_async_resume_loop(struct list_head *from, async_func_t func)
+{
+	struct device *dev;
+
+	list_for_each_entry(dev, from, power.entry)
+		dpm_async_queue_resume_ready_fn(dev, func);
+}
+
 static void async_resume_noirq(void *data, async_cookie_t cookie)
 {
 	struct device *dev = data;
 
 	device_resume_noirq(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume_noirq,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 static void dpm_noirq_resume_devices(pm_message_t state)
@@ -712,23 +775,26 @@ static void dpm_noirq_resume_devices(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_noirq_list);
+
 	/*
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_noirq_list, power.entry)
-		dpm_async_fn(dev, async_resume_noirq);
+	dpm_async_resume_loop(&dpm_noirq_list, async_resume_noirq);
 
 	while (!list_empty(&dpm_noirq_list)) {
 		dev = to_device(dpm_noirq_list.next);
 		list_move_tail(&dev->power.entry, &dpm_late_early_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!is_async(dev)) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
 
 			device_resume_noirq(dev, state, false);
+			dpm_for_each_subordinate(dev, async_resume_noirq,
+						 dpm_async_queue_resume_ready_fn);
 
 			put_device(dev);
 
@@ -834,6 +900,9 @@ static void async_resume_early(void *data, async_cookie_t cookie)
 
 	device_resume_early(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume_early,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 /**
@@ -852,23 +921,26 @@ void dpm_resume_early(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_late_early_list);
+
 	/*
	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_late_early_list, power.entry)
-		dpm_async_fn(dev, async_resume_early);
+	dpm_async_resume_loop(&dpm_late_early_list, async_resume_early);
 
 	while (!list_empty(&dpm_late_early_list)) {
 		dev = to_device(dpm_late_early_list.next);
 		list_move_tail(&dev->power.entry, &dpm_suspended_list);
 
-		if (!is_async(dev)) {
+		if (!is_async(dev)) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
 
 			device_resume_early(dev, state, false);
+			dpm_for_each_subordinate(dev, async_resume_early,
+						 dpm_async_queue_resume_ready_fn);
 
 			put_device(dev);
 
@@ -996,6 +1068,9 @@ static void async_resume(void *data, async_cookie_t cookie)
 
 	device_resume(dev, pm_transition, true);
 	put_device(dev);
+
+	dpm_for_each_subordinate(dev, async_resume,
+				 dpm_async_queue_resume_ready_fn);
 }
 
 /**
@@ -1018,23 +1093,26 @@ void dpm_resume(pm_message_t state)
 
 	mutex_lock(&dpm_list_mtx);
 
+	dpm_reinit_dev_state(&dpm_suspended_list);
+
 	/*
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
*/ - list_for_each_entry(dev, &dpm_suspended_list, power.entry) - dpm_async_fn(dev, async_resume); + dpm_async_resume_loop(&dpm_suspended_list, async_resume); while (!list_empty(&dpm_suspended_list)) { dev = to_device(dpm_suspended_list.next); list_move_tail(&dev->power.entry, &dpm_prepared_list); - if (!dev->power.async_in_progress) { + if (!is_async(dev)) { get_device(dev); mutex_unlock(&dpm_list_mtx); device_resume(dev, state, false); + dpm_for_each_subordinate(dev, async_resume, + dpm_async_queue_resume_ready_fn); put_device(dev); @@ -1274,12 +1352,35 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy return error; } +static int dpm_async_queue_suspend_ready_fn(struct device *dev, void *data) +{ + if (!is_async(dev) || dpm_check_subordinate(dev)) + return 0; + + dpm_async_fn(dev, data); + return 0; +} + +static void dpm_async_suspend_loop(struct list_head *from, async_func_t func) +{ + struct device *dev; + + list_for_each_entry_reverse(dev, from, power.entry) + dpm_async_queue_suspend_ready_fn(dev, func); +} + static void async_suspend_noirq(void *data, async_cookie_t cookie) { struct device *dev = data; device_suspend_noirq(dev, pm_transition, true); put_device(dev); + + if (async_error) + return; + + dpm_for_each_superior(dev, async_suspend_noirq, + dpm_async_queue_suspend_ready_fn); } static int dpm_noirq_suspend_devices(pm_message_t state) @@ -1294,12 +1395,20 @@ static int dpm_noirq_suspend_devices(pm_message_t state) mutex_lock(&dpm_list_mtx); + dpm_reinit_dev_state(&dpm_late_early_list); + + /* + * Trigger the "async" devices upfront so they don't have to wait for + * the "non-async" ones they don't depend on. 
+ */ + dpm_async_suspend_loop(&dpm_late_early_list, async_suspend_noirq); + while (!list_empty(&dpm_late_early_list)) { struct device *dev = to_device(dpm_late_early_list.prev); list_move(&dev->power.entry, &dpm_noirq_list); - if (dpm_async_fn(dev, async_suspend_noirq)) + if (is_async(dev)) continue; get_device(dev); @@ -1307,13 +1416,20 @@ static int dpm_noirq_suspend_devices(pm_message_t state) mutex_unlock(&dpm_list_mtx); error = device_suspend_noirq(dev, state, false); + if (!error && !async_error) + dpm_for_each_superior(dev, async_suspend_noirq, + dpm_async_queue_suspend_ready_fn); put_device(dev); mutex_lock(&dpm_list_mtx); - if (error || async_error) + if (error || async_error) { + /* See explanation in dpm_suspend() */ + list_splice_init(&dpm_late_early_list, &dpm_noirq_list); break; + } + } mutex_unlock(&dpm_list_mtx); @@ -1447,6 +1563,12 @@ static void async_suspend_late(void *data, async_cookie_t cookie) device_suspend_late(dev, pm_transition, true); put_device(dev); + + if (async_error) + return; + + dpm_for_each_superior(dev, async_suspend_late, + dpm_async_queue_suspend_ready_fn); } /** @@ -1467,12 +1589,20 @@ int dpm_suspend_late(pm_message_t state) mutex_lock(&dpm_list_mtx); + dpm_reinit_dev_state(&dpm_suspended_list); + + /* + * Trigger the "async" devices upfront so they don't have to wait for + * the "non-async" ones they don't depend on. 
+ */ + dpm_async_suspend_loop(&dpm_suspended_list, async_suspend_late); + while (!list_empty(&dpm_suspended_list)) { struct device *dev = to_device(dpm_suspended_list.prev); list_move(&dev->power.entry, &dpm_late_early_list); - if (dpm_async_fn(dev, async_suspend_late)) + if (is_async(dev)) continue; get_device(dev); @@ -1480,13 +1610,19 @@ int dpm_suspend_late(pm_message_t state) mutex_unlock(&dpm_list_mtx); error = device_suspend_late(dev, state, false); + if (!error && !async_error) + dpm_for_each_superior(dev, async_suspend_late, + dpm_async_queue_suspend_ready_fn); put_device(dev); mutex_lock(&dpm_list_mtx); - if (error || async_error) + if (error || async_error) { + /* See explanation in dpm_suspend() */ + list_splice_init(&dpm_suspended_list, &dpm_late_early_list); break; + } } mutex_unlock(&dpm_list_mtx); @@ -1712,6 +1848,12 @@ static void async_suspend(void *data, async_cookie_t cookie) device_suspend(dev, pm_transition, true); put_device(dev); + + if (async_error) + return; + + dpm_for_each_superior(dev, async_suspend, + dpm_async_queue_suspend_ready_fn); } /** @@ -1734,12 +1876,20 @@ int dpm_suspend(pm_message_t state) mutex_lock(&dpm_list_mtx); + dpm_reinit_dev_state(&dpm_prepared_list); + + /* + * Trigger the "async" devices upfront so they don't have to wait for + * the "non-async" ones they don't depend on. 
+ */ + dpm_async_suspend_loop(&dpm_prepared_list, async_suspend); + while (!list_empty(&dpm_prepared_list)) { struct device *dev = to_device(dpm_prepared_list.prev); list_move(&dev->power.entry, &dpm_suspended_list); - if (dpm_async_fn(dev, async_suspend)) + if (is_async(dev)) continue; get_device(dev); @@ -1747,13 +1897,31 @@ int dpm_suspend(pm_message_t state) mutex_unlock(&dpm_list_mtx); error = device_suspend(dev, state, false); + if (!error && !async_error) + dpm_for_each_superior(dev, async_suspend, + dpm_async_queue_suspend_ready_fn); put_device(dev); mutex_lock(&dpm_list_mtx); - if (error || async_error) + if (error || async_error) { + /* + * async devices that come after the current sync device + * in the list might have already suspended. We need to + * make sure those async devices are resumed again. By + * moving all devices to the next list, we make sure the + * error handling path resumes all suspended devices. + * + * However, we also need to avoid resuming devices that + * haven't been suspended yet. Fortunately, the existing + * is_suspended/is_noirq_suspended/is_late_suspended + * flags take care of skipping the resume of a device if + * it hasn't been suspended yet. + */ + list_splice_init(&dpm_prepared_list, &dpm_suspended_list); break; + } } mutex_unlock(&dpm_list_mtx); -- 2.47.0.338.g60cca15819-goog ^ permalink raw reply related [flat|nested] 33+ messages in thread
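The breadth-first queuing that patch 4 implements can be sketched as a small userspace model (toy code, not kernel code; the four-device graph and every name below are invented for illustration). Devices with no superiors are queued upfront; each device that finishes queues exactly the subordinates that just became ready:

```c
#include <assert.h>

/*
 * Toy userspace model of the breadth-first queuing above (NOT kernel
 * code; the four-device graph and names are invented for illustration).
 * superior[d] lists the devices that must resume before device d.
 */
#define NDEV 4
enum { SOC, BUS, I2C, CODEC };

static const int nsup[NDEV]        = { 0, 1, 1, 2 };
static const int superior[NDEV][2] = { { 0 }, { SOC }, { BUS }, { BUS, I2C } };

/* Fill order[] with a valid resume order; return how many devices ran. */
static int resume_order(int order[NDEV])
{
	int pending[NDEV], queue[NDEV];
	int head = 0, tail = 0, done = 0;

	/* Queue upfront only the devices with no superiors to wait for. */
	for (int d = 0; d < NDEV; d++) {
		pending[d] = nsup[d];
		if (!pending[d])
			queue[tail++] = d;
	}
	while (head < tail) {
		int d = queue[head++];

		order[done++] = d;	/* "resume" the device ... */
		/* ... then queue each subordinate that just became ready */
		for (int s = 0; s < NDEV; s++)
			for (int i = 0; i < nsup[s]; i++)
				if (superior[s][i] == d && --pending[s] == 0)
					queue[tail++] = s;
	}
	return done;
}
```

Each device is queued exactly once, at the moment its last superior finishes, which is the property the patch gets from the dpm_for_each_subordinate() callbacks instead of the rescan in this toy.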
* Re: [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume 2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan @ 2025-03-11 11:25 ` Geert Uytterhoeven 0 siblings, 0 replies; 33+ messages in thread From: Geert Uytterhoeven @ 2025-03-11 11:25 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel Hi Saravana, On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote: > The dpm_list used for suspend/resume ensures that all superior devices > (parents and suppliers) precede subordinate devices (children and > consumers). > > Current async resume logic: > ------------------------------- > * For each resume phase (except the "complete" phase, which is always > serialized), the resume logic first queues all async devices in the > dpm_list. It then loops through the dpm_list again to resume the sync > devices one by one. > > * Async devices wait for all their superior devices to resume before > starting their own resume operation. > > * This process results in multiple sleep and wake-up cycles before an > async device actually resumes. This sleeping also causes kworker > threads to stall with work for a period. Consequently, the workqueue > framework spins up more kworker threads to handle the other async > devices. > > * The end result is excessive thread creation, wake-ups, sleeps, and > context switches for every async device. This overhead makes a full > async resume (with all devices marked as async-capable) much slower > than a synchronous resume. > > Current async suspend logic: > -------------------------------- > * The async suspend logic differs from the async resume logic. 
The > suspend logic loops through the dpm_list. When it finds an async > device, it queues the work and moves on. However, if it encounters a > sync device, it waits until the sync device (and all its subordinate > devices) have suspended before proceeding to the next device. > Therefore, an async suspend device can be left waiting on an > unrelated device before even being queued. > > * Once queued, an async device experiences the same inefficiencies as > in the resume logic (thread creation, wake-ups, sleeps, and context > switches). > > On a Pixel 6, averaging over 100 suspend/resume cycles, the data is as > follows: > > +---------------------------+-----------+------------+----------+ > | Phase | Full sync | Full async | % change | > +---------------------------+-----------+------------+----------+ > | Total dpm_suspend*() time | 107 ms | 72 ms | -33% | > +---------------------------+-----------+------------+----------+ > | Total dpm_resume*() time | 75 ms | 90 ms | +20% | > +---------------------------+-----------+------------+----------+ > | Sum | 182 ms | 162 ms | -11% | > +---------------------------+-----------+------------+----------+ > > This shows that full async suspend/resume is not a viable option. It > makes the user-visible resume phase slower and only improves the > overall time by 11%. > > To fix all this, this patch introduces a new async suspend/resume > logic. > > New suspend/resume logic: > ------------------------- > * For each suspend/resume phase (except "complete" and "prepare," > which are always serialized), the logic first queues only the async > devices that don't have to wait for any subordinates (for suspend) > or superiors (for resume). It then loops through the dpm_list again > to suspend/resume the sync devices one by one. > > * When a device (sync or async) successfully suspends/resumes, it > examines its superiors/subordinates and queues only the async > devices that don't need to wait for any subordinates/superiors. 
> > With this new logic: > > * Queued async devices don't have to wait for completion and are > always ready to perform their suspend/resume operation. > > * The queue of async devices remains short. > > * kworkers never sleep for extended periods, and the workqueue > framework doesn't spin up many new threads to handle a backlog of > async devices. > > * The result is approximately NCPU kworker threads running in parallel > without sleeping until all async devices finish. > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > yields improved results: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 60 ms | -44% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 74 ms | -1% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 134 ms | -26% | > +---------------------------+-----------+------------+------------------+ > > Signed-off-by: Saravana Kannan <saravanak@google.com> Thanks for your patch! On Renesas Gray Hawk Single (R-Car V4M) during s2idle: PM: suspend entry (s2idle) Filesystems sync: 0.055 seconds Freezing user space processes Freezing user space processes completed (elapsed 0.004 seconds) OOM killer disabled. Freezing remaining freezable tasks Freezing remaining freezable tasks completed (elapsed 0.003 seconds) ================================ WARNING: inconsistent lock state 6.14.0-rc5-rcar3-05904-g1ec95427acf9 #261 Tainted: G W -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. 
s2idle/764 [HC0[0]:SC0[0]:HE1:SE1] takes: ffffff844392c190 (&dev->power.lock){?.-.}-{3:3}, at: dpm_async_fn+0x24/0xa8 {IN-HARDIRQ-W} state was registered at: lock_acquire+0x26c/0x2c4 _raw_spin_lock_irqsave+0x54/0x70 pm_suspend_timer_fn+0x20/0x78 __hrtimer_run_queues+0x204/0x330 hrtimer_interrupt+0xa8/0x1b0 arch_timer_handler_virt+0x28/0x3c handle_percpu_devid_irq+0x64/0x110 handle_irq_desc+0x3c/0x50 generic_handle_domain_irq+0x18/0x20 gic_handle_irq+0x50/0xbc call_on_irq_stack+0x24/0x34 do_interrupt_handler+0x60/0x88 el1_interrupt+0x30/0x48 el1h_64_irq_handler+0x14/0x1c el1h_64_irq+0x70/0x74 cpuidle_enter_state+0x1a4/0x2d0 cpuidle_enter+0x34/0x48 do_idle+0x21c/0x240 cpu_startup_entry+0x30/0x34 kernel_init+0x0/0x124 console_on_rootfs+0x0/0x64 __primary_switched+0x88/0x90 irq event stamp: 17055 hardirqs last enabled at (17055): [<ffffffc080a0782c>] _raw_spin_unlock_irqrestore+0x34/0x54 hardirqs last disabled at (17054): [<ffffffc080a075a4>] _raw_spin_lock_irqsave+0x28/0x70 softirqs last enabled at (14360): [<ffffffc080096bcc>] handle_softirqs+0x1b0/0x3b4 softirqs last disabled at (14355): [<ffffffc080010168>] __do_softirq+0x10/0x18 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&dev->power.lock); <Interrupt> lock(&dev->power.lock); *** DEADLOCK *** 5 locks held by s2idle/764: #0: ffffff84400983f0 (sb_writers#5){.+.+}-{0:0}, at: file_start_write.isra.0+0x24/0x30 #1: ffffff8446802288 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x180 #2: ffffff8440d016e8 (kn->active#39){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x180 #3: ffffffc0812f4780 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x84/0x248 #4: ffffffc084bb7f78 (dpm_list_mtx){+.+.}-{4:4}, at: dpm_suspend+0x84/0x1a8 stack backtrace: CPU: 0 UID: 0 PID: 764 Comm: s2idle Tainted: G W 6.14.0-rc5-rcar3-05904-g1ec95427acf9 #261 Tainted: [W]=WARN Hardware name: Renesas Gray Hawk Single board based on r8a779h0 (DT) Call trace: show_stack+0x14/0x1c (C) 
dump_stack_lvl+0x78/0xa8 dump_stack+0x14/0x1c print_usage_bug+0x1dc/0x1f8 mark_lock+0x1c4/0x3a4 __lock_acquire+0x560/0x1038 lock_acquire+0x26c/0x2c4 _raw_spin_lock+0x40/0x54 dpm_async_fn+0x24/0xa8 dpm_async_queue_suspend_ready_fn+0x40/0x50 dpm_async_suspend_loop+0x48/0x50 dpm_suspend+0x94/0x1a8 dpm_suspend_start+0x68/0x70 suspend_devices_and_enter+0xd8/0x59c pm_suspend+0x214/0x248 state_store+0xa8/0xe8 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write_iter+0x138/0x180 vfs_write+0x148/0x1b4 ksys_write+0x78/0xe0 __arm64_sys_write+0x14/0x1c invoke_syscall+0x68/0xf0 el0_svc_common.constprop.0+0xb0/0xcc do_el0_svc+0x18/0x20 el0_svc+0x38/0x90 el0t_64_sync_handler+0x80/0x130 el0t_64_sync+0x158/0x15c Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 33+ messages in thread
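The splat above says dev->power.lock is also taken from hard-irq context (pm_suspend_timer_fn() runs from an hrtimer with spin_lock_irqsave()), while the new dpm_async_fn() takes it with a plain spin_lock(). A minimal sketch of the likely fix, assuming the rest of the function stays as in patch 4 (untested, for illustration only):

```c
static void dpm_async_fn(struct device *dev, async_func_t func)
{
	unsigned long flags;

	/*
	 * dev->power.lock is also taken from hard-irq context
	 * (e.g. pm_suspend_timer_fn()), so interrupts must be
	 * disabled while it is held here.
	 */
	spin_lock_irqsave(&dev->power.lock, flags);
	if (dev->power.async_in_progress) {
		spin_unlock_irqrestore(&dev->power.lock, flags);
		return;
	}
	dev->power.async_in_progress = true;
	spin_unlock_irqrestore(&dev->power.lock, flags);
	/* ... rest of the function unchanged ... */
}
```

With the irqsave variant the critical section is safe regardless of the caller's IRQ state, which is what lockdep's {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} check demands.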
* [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan ` (3 preceding siblings ...) 2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan @ 2024-11-14 22:09 ` Saravana Kannan 2024-11-15 5:25 ` Saravana Kannan ` (5 more replies) 2024-11-19 4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan 5 siblings, 6 replies; 33+ messages in thread From: Saravana Kannan @ 2024-11-14 22:09 UTC (permalink / raw) To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel As of today, the scheduler doesn't spread out all the kworker threads across all the available CPUs during suspend/resume. This causes significant resume latency during the dpm_resume*() phases. System resume latency is a very user-visible event. Reducing the latency is more important than trying to be energy aware during that period. Since there are no userspace processes running during this time and this is a very short time window, we can simply disable EAS during resume so that the parallel resume of the devices is spread across all the CPUs. 
On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic plus disabling EAS for resume yields significant improvements: +---------------------------+-----------+------------+------------------+ | Phase | Old full sync | New full async | % change | | | | + EAS disabled | | +---------------------------+-----------+------------+------------------+ | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | +---------------------------+-----------+------------+------------------+ | Total dpm_resume*() time | 75 ms | 61 ms | -19% | +---------------------------+-----------+------------+------------------+ | Sum | 182 ms | 123 ms | -32% | +---------------------------+-----------+------------+------------------+ Signed-off-by: Saravana Kannan <saravanak@google.com> --- kernel/power/suspend.c | 16 ++++++++++++++++ kernel/sched/topology.c | 13 +++++++++++++ 2 files changed, 29 insertions(+) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 09f8397bae15..7304dc39958f 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) local_irq_enable(); } +/* + * Intentionally not part of a header file to avoid risk of abuse by other + * drivers. + */ +void sched_set_energy_aware(unsigned int enable); + /** * suspend_enter - Make the system enter the given sleep state. * @state: System sleep state to enter. @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) Platform_wake: platform_resume_noirq(state); + /* + * We do this only for resume instead of suspend and resume for these + * reasons: + * - Performance is more important than power for resume. + * - Power spent entering suspend is more important for suspend. Also, + * strangely, disabling EAS was making suspend a few milliseconds + * slower in my testing. 
+ */ + sched_set_energy_aware(0); dpm_resume_noirq(PMSG_RESUME); Platform_early_resume: @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) Resume_devices: suspend_test_start(); dpm_resume_end(PMSG_RESUME); + sched_set_energy_aware(1); suspend_test_finish("resume devices"); trace_suspend_resume(TPS("resume_console"), state, true); resume_console(); diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 9748a4c8d668..c069c0b17cbf 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) mutex_unlock(&sched_energy_mutex); } +void sched_set_energy_aware(unsigned int enable) +{ + int state; + + if (!sched_is_eas_possible(cpu_active_mask)) + return; + + sysctl_sched_energy_aware = enable; + state = static_branch_unlikely(&sched_energy_present); + if (state != sysctl_sched_energy_aware) + rebuild_sched_domains_energy(); +} + #ifdef CONFIG_PROC_SYSCTL static int sched_energy_aware_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) -- 2.47.0.338.g60cca15819-goog ^ permalink raw reply related [flat|nested] 33+ messages in thread
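Why spreading the kworkers helps can be illustrated with a toy makespan model (plain userspace C, not the scheduler; the CPU counts and the 3 ms per-callback cost below are made-up numbers). Work is placed greedily on the least-loaded CPU, so the phase length is roughly total work divided by the number of CPUs actually used:

```c
#include <assert.h>

/*
 * Toy makespan model (NOT the kernel scheduler): nwork independent
 * device callbacks of cost_ms each, placed greedily on the least-loaded
 * of ncpus CPUs. EAS packing work onto few CPUs corresponds to a small
 * ncpus; disabling EAS corresponds to using all of them.
 */
static int makespan_ms(int nwork, int cost_ms, int ncpus)
{
	int finish[16] = { 0 };	/* per-CPU completion time; ncpus <= 16 */

	for (int w = 0; w < nwork; w++) {
		int min = 0;

		for (int c = 1; c < ncpus; c++)
			if (finish[c] < finish[min])
				min = c;
		finish[min] += cost_ms;	/* place on least-loaded CPU */
	}

	int worst = 0;
	for (int c = 0; c < ncpus; c++)
		if (finish[c] > worst)
			worst = finish[c];
	return worst;
}
```

With 32 callbacks of 3 ms each, packing onto 2 CPUs gives a 48 ms phase versus 12 ms when spread across 8 CPUs; the measured table above shows the same effect in a much noisier form.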
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan @ 2024-11-15 5:25 ` Saravana Kannan 2024-11-15 9:25 ` Geert Uytterhoeven 2024-11-15 15:30 ` Vincent Guittot 2024-11-15 16:12 ` Vincent Guittot ` (4 subsequent siblings) 5 siblings, 2 replies; 33+ messages in thread From: Saravana Kannan @ 2024-11-15 5:25 UTC (permalink / raw) To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. 
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. 
> + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) CC kernel/sched/build_utility.o In file included from kernel/sched/build_utility.c:88: kernel/sched/topology.c:287:6: warning: no previous prototype for ‘sched_set_energy_aware’ [-Wmissing-prototypes] 287 | void sched_set_energy_aware(unsigned int enable) | ^~~~~~~~~~~~~~~~~~~~~~ Peter/Vincent, I noticed that I'm getting a warning for this line. But I'm not sure what to do about it. I intentionally didn't put this in a header file because I'm guessing we don't want to make this available to drivers/frameworks in general. Let me know how you want me to handle this. -Saravana > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + > #ifdef CONFIG_PROC_SYSCTL > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > void *buffer, size_t *lenp, loff_t *ppos) > -- > 2.47.0.338.g60cca15819-goog > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-15 5:25 ` Saravana Kannan @ 2024-11-15 9:25 ` Geert Uytterhoeven 2024-11-15 15:30 ` Vincent Guittot 1 sibling, 0 replies; 33+ messages in thread From: Geert Uytterhoeven @ 2024-11-15 9:25 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel Hi Saravana, On Fri, Nov 15, 2024 at 6:25 AM Saravana Kannan <saravanak@google.com> wrote: > On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. 
> > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. 
Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > CC kernel/sched/build_utility.o > In file included from kernel/sched/build_utility.c:88: > kernel/sched/topology.c:287:6: warning: no previous prototype for > ‘sched_set_energy_aware’ [-Wmissing-prototypes] > 287 | void sched_set_energy_aware(unsigned int enable) > | ^~~~~~~~~~~~~~~~~~~~~~ > > Peter/Vincent, > > I noticed that I'm getting a warning for this line. But I'm not sure > what to do about it. I intentionally didn't put this in a header file > because I'm guessing we don't want to make this available to > drivers/frameworks in general. > > Let me know how you want me to handle this. Put the prototype in kernel/sched/sched.h, and include ../sched/sched.h from kernel/power/suspend.c? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 33+ messages in thread
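Geert's suggestion comes down to making one declaration visible at both the definition site and the call site, which is exactly what -Wmissing-prototypes checks for: the warning fires at the definition in topology.c because no prior prototype is in scope there. A minimal userspace model of the pattern (hypothetical toy_* names; one file stands in for the shared header and both translation units):

```c
#include <assert.h>

/* --- stand-in for kernel/sched/sched.h (shared internal header) --- */
void toy_set_energy_aware(unsigned int enable);
unsigned int toy_get_energy_aware(void);

/* --- stand-in for kernel/sched/topology.c --- */
static unsigned int toy_energy_aware = 1;

/* The definition now has a previous prototype, so the warning is quiet. */
void toy_set_energy_aware(unsigned int enable)
{
	toy_energy_aware = enable;
}

/* --- stand-in for kernel/power/suspend.c (the caller's side) --- */
unsigned int toy_get_energy_aware(void)
{
	return toy_energy_aware;
}
```

Keeping the prototype in an internal header under kernel/sched/ also preserves the original intent: the symbol stays out of the public headers that ordinary drivers include.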
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-15 5:25 ` Saravana Kannan 2024-11-15 9:25 ` Geert Uytterhoeven @ 2024-11-15 15:30 ` Vincent Guittot 1 sibling, 0 replies; 33+ messages in thread From: Vincent Guittot @ 2024-11-15 15:30 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Fri, 15 Nov 2024 at 06:25, Saravana Kannan <saravanak@google.com> wrote: > > On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. 
> > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); extern void sched_set_energy_aware(unsigned int enable); clear the warning > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. 
> > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > CC kernel/sched/build_utility.o > In file included from kernel/sched/build_utility.c:88: > kernel/sched/topology.c:287:6: warning: no previous prototype for > ‘sched_set_energy_aware’ [-Wmissing-prototypes] > 287 | void sched_set_energy_aware(unsigned int enable) > | ^~~~~~~~~~~~~~~~~~~~~~ > > Peter/Vincent, > > I noticed that I'm getting a warning for this line. But I'm not sure > what to do about it. I intentionally didn't put this in a header file > because I'm guessing we don't want to make this available to > drivers/frameworks in general. > > Let me know how you want me to handle this. 
> > -Saravana > > > +{ > > + int state; > > + > > + if (!sched_is_eas_possible(cpu_active_mask)) > > + return; > > + > > + sysctl_sched_energy_aware = enable; > > + state = static_branch_unlikely(&sched_energy_present); > > + if (state != sysctl_sched_energy_aware) > > + rebuild_sched_domains_energy(); > > +} > > + > > #ifdef CONFIG_PROC_SYSCTL > > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > > void *buffer, size_t *lenp, loff_t *ppos) > > -- > > 2.47.0.338.g60cca15819-goog > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan 2024-11-15 5:25 ` Saravana Kannan @ 2024-11-15 16:12 ` Vincent Guittot 2024-11-15 18:33 ` Saravana Kannan 2024-11-17 0:34 ` kernel test robot ` (3 subsequent siblings) 5 siblings, 1 reply; 33+ messages in thread From: Vincent Guittot @ 2024-11-15 16:12 UTC (permalink / raw) To: Saravana Kannan Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. 
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ in cover letter you have figures for - Old full sync - New full async - New full async + EAS disabled you should better use the figures for New full async vs New full async + EAS disabled to show EAS disabled impact I would be interested to get figures about the impact of disabling it during full suspend sequence as I'm not convince that it's worth the complexity especially with fix OPP during suspend > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. 
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. > + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); If we end up having a special scheduling mode during suspend, we should make the function more generic and not only EAS/ smartphone specific Like a sched_suspend and sched_resume > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) This is a copy/paste of sched_energy_aware_handler() below, we should have 1 helper for both > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + > #ifdef CONFIG_PROC_SYSCTL > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > void *buffer, size_t *lenp, loff_t *ppos) > -- > 2.47.0.338.g60cca15819-goog > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-15 16:12 ` Vincent Guittot @ 2024-11-15 18:33 ` Saravana Kannan 0 siblings, 0 replies; 33+ messages in thread From: Saravana Kannan @ 2024-11-15 18:33 UTC (permalink / raw) To: Vincent Guittot Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On Fri, Nov 15, 2024 at 8:13 AM Vincent Guittot <vincent.guittot@linaro.org> wrote: > > On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote: > > > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. 
> > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > in cover letter you have figures for > - Old full sync > - New full async > - New full async + EAS disabled > > you should better use the figures for New full async vs New full > async + EAS disabled to show EAS disabled impact I do give those numbers in the commit text of each patch making the changes. Patch 4 commit text shows how it's improving things compared to the older logic full sync (this is the baseline) - resume is 1% faster. Patch 5 commit text shows you how disabling EAS is improving numbers compared to baseline - resume 19% faster. So, yeah, all the numbers are there in one of these emails. Patch 5 (which is the only one touching EAS) is the one that has the comparison you are asking for. > I would be interested to get figures about the impact of disabling it > during full suspend sequence as I'm not convince that it's worth the > complexity especially with fix OPP during suspend 1. Device suspend actually got worse by 5ms or so. I already provided that. 2. As I said in the Patch 5, suspend is more about reducing the energy going into suspend. It's a balance of how quick you can be to how much power you use to be quick. 
So, disabling EAS across all of suspend/resume will have a huge impact on power because userspace is still running, there are a ton of threads and userspace could get preempted between disabling suspend and kicking off suspend. Lots of obvious power concerns overall. Thanks, Saravana > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. 
> > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > If we end up having a special scheduling mode during suspend, we > should make the function more generic and not only EAS/ smartphone > specific > > Like a sched_suspend and sched_resume > > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > This is a copy/paste of sched_energy_aware_handler() below, we should > have 1 helper for both > > > +{ > > + int state; > > + > > + if (!sched_is_eas_possible(cpu_active_mask)) > > + return; > > + > > + sysctl_sched_energy_aware = enable; > > + state = static_branch_unlikely(&sched_energy_present); > > + if (state != sysctl_sched_energy_aware) > > + rebuild_sched_domains_energy(); > > +} > > + > > #ifdef CONFIG_PROC_SYSCTL > > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > > void *buffer, size_t *lenp, loff_t *ppos) > > -- > > 2.47.0.338.g60cca15819-goog > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan 2024-11-15 5:25 ` Saravana Kannan 2024-11-15 16:12 ` Vincent Guittot @ 2024-11-17 0:34 ` kernel test robot 2024-11-17 1:17 ` kernel test robot ` (2 subsequent siblings) 5 siblings, 0 replies; 33+ messages in thread From: kernel test robot @ 2024-11-17 0:34 UTC (permalink / raw) To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel Hi Saravana, kernel test robot noticed the following build errors: [auto build test ERROR on rafael-pm/linux-next] [also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. 
And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: arm-s3c6400_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/config) compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411170802.SnHuptE7-lkp@intel.com/ All errors (new ones prefixed by >>): arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_enter': >> kernel/power/suspend.c:485:(.text+0x2f4): undefined reference to `sched_set_energy_aware' >> arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x48c): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4c0): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4d0): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter': kernel/power/suspend.c:538:(.text+0x748): undefined reference to `sched_set_energy_aware' vim +485 kernel/power/suspend.c 401 402 /** 403 * suspend_enter - Make the system enter the given sleep state. 404 * @state: System sleep state to enter. 
405 * @wakeup: Returns information that the sleep state should not be re-entered. 406 * 407 * This function should be called after devices have been suspended. 408 */ 409 static int suspend_enter(suspend_state_t state, bool *wakeup) 410 { 411 int error; 412 413 error = platform_suspend_prepare(state); 414 if (error) 415 goto Platform_finish; 416 417 error = dpm_suspend_late(PMSG_SUSPEND); 418 if (error) { 419 pr_err("late suspend of devices failed\n"); 420 goto Platform_finish; 421 } 422 error = platform_suspend_prepare_late(state); 423 if (error) 424 goto Devices_early_resume; 425 426 error = dpm_suspend_noirq(PMSG_SUSPEND); 427 if (error) { 428 pr_err("noirq suspend of devices failed\n"); 429 goto Platform_early_resume; 430 } 431 error = platform_suspend_prepare_noirq(state); 432 if (error) 433 goto Platform_wake; 434 435 if (suspend_test(TEST_PLATFORM)) 436 goto Platform_wake; 437 438 if (state == PM_SUSPEND_TO_IDLE) { 439 s2idle_loop(); 440 goto Platform_wake; 441 } 442 443 error = pm_sleep_disable_secondary_cpus(); 444 if (error || suspend_test(TEST_CPUS)) 445 goto Enable_cpus; 446 447 arch_suspend_disable_irqs(); 448 BUG_ON(!irqs_disabled()); 449 450 system_state = SYSTEM_SUSPEND; 451 452 error = syscore_suspend(); 453 if (!error) { 454 *wakeup = pm_wakeup_pending(); 455 if (!(suspend_test(TEST_CORE) || *wakeup)) { 456 trace_suspend_resume(TPS("machine_suspend"), 457 state, true); 458 error = suspend_ops->enter(state); 459 trace_suspend_resume(TPS("machine_suspend"), 460 state, false); 461 } else if (*wakeup) { 462 error = -EBUSY; 463 } 464 syscore_resume(); 465 } 466 467 system_state = SYSTEM_RUNNING; 468 469 arch_suspend_enable_irqs(); 470 BUG_ON(irqs_disabled()); 471 472 Enable_cpus: 473 pm_sleep_enable_secondary_cpus(); 474 475 Platform_wake: 476 platform_resume_noirq(state); 477 /* 478 * We do this only for resume instead of suspend and resume for these 479 * reasons: 480 * - Performance is more important than power for resume. 
481 * - Power spent entering suspend is more important for suspend. Also, 482 * stangely, disabling EAS was making suspent a few milliseconds 483 * slower in my testing. 484 */ > 485 sched_set_energy_aware(0); 486 dpm_resume_noirq(PMSG_RESUME); 487 488 Platform_early_resume: 489 platform_resume_early(state); 490 491 Devices_early_resume: 492 dpm_resume_early(PMSG_RESUME); 493 494 Platform_finish: 495 platform_resume_finish(state); 496 return error; 497 } 498 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan ` (2 preceding siblings ...) 2024-11-17 0:34 ` kernel test robot @ 2024-11-17 1:17 ` kernel test robot 2024-11-17 13:34 ` kernel test robot 2024-11-18 9:52 ` Christian Loehle 5 siblings, 0 replies; 33+ messages in thread From: kernel test robot @ 2024-11-17 1:17 UTC (permalink / raw) To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel Hi Saravana, kernel test robot noticed the following build errors: [auto build test ERROR on rafael-pm/linux-next] [also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. 
And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: powerpc-tqm8560_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/config) compiler: powerpc-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411170920.Pv29JceD-lkp@intel.com/ All errors (new ones prefixed by >>): powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_enter': suspend.c:(.text+0x414): undefined reference to `sched_set_energy_aware' >> powerpc-linux-ld: suspend.c:(.text+0x63c): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter': suspend.c:(.text+0x834): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: suspend.c:(.text+0x950): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: suspend.c:(.text+0x970): undefined reference to `sched_set_energy_aware' -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan ` (3 preceding siblings ...) 2024-11-17 1:17 ` kernel test robot @ 2024-11-17 13:34 ` kernel test robot 2024-11-18 9:52 ` Christian Loehle 5 siblings, 0 replies; 33+ messages in thread From: kernel test robot @ 2024-11-17 13:34 UTC (permalink / raw) To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: oe-kbuild-all, Saravana Kannan, Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel Hi Saravana, kernel test robot noticed the following build warnings: [auto build test WARNING on rafael-pm/linux-next] [also build test WARNING on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. 
And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: arm-randconfig-003-20241117 (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/config) compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411172344.QqFap290-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from kernel/sched/build_utility.c:88: >> kernel/sched/topology.c:287:6: warning: no previous prototype for 'sched_set_energy_aware' [-Wmissing-prototypes] 287 | void sched_set_energy_aware(unsigned int enable) | ^~~~~~~~~~~~~~~~~~~~~~ vim +/sched_set_energy_aware +287 kernel/sched/topology.c 286 > 287 void sched_set_energy_aware(unsigned int enable) 288 { 289 int state; 290 291 if (!sched_is_eas_possible(cpu_active_mask)) 292 return; 293 294 sysctl_sched_energy_aware = enable; 295 state = static_branch_unlikely(&sched_energy_present); 296 if (state != sysctl_sched_energy_aware) 297 rebuild_sched_domains_energy(); 298 } 299 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases 2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan ` (4 preceding siblings ...) 2024-11-17 13:34 ` kernel test robot @ 2024-11-18 9:52 ` Christian Loehle 2024-11-18 17:18 ` Saravana Kannan 5 siblings, 1 reply; 33+ messages in thread From: Christian Loehle @ 2024-11-18 9:52 UTC (permalink / raw) To: Saravana Kannan, Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team, linux-pm, linux-kernel On 11/14/24 22:09, Saravana Kannan wrote: > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. 
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. 
s/stangely/strangely s/suspent/suspend I'd also be curious why that is. Disabling EAS shouldn't be that expensive. What if you just hack the static branch switch (without the sd rebuild)? > + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) bool enable? > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + This definitely shouldn't just overwrite sysctl_sched_energy_aware, otherwise you enable EAS for users that explicitly disabled it. If it ever comes to other users wanting this we might need a eas_pause counter so this can be nested, but let's just hope that's never needed. Regards, Christian ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases
  2024-11-18  9:52 ` Christian Loehle
@ 2024-11-18 17:18   ` Saravana Kannan
  0 siblings, 0 replies; 33+ messages in thread
From: Saravana Kannan @ 2024-11-18 17:18 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Geert Uytterhoeven, Marek Vasut, Bird, Tim,
	kernel-team, linux-pm, linux-kernel

On Mon, Nov 18, 2024 at 1:52 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 11/14/24 22:09, Saravana Kannan wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+---------------+----------------+----------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_suspend*() time | 107 ms        | 62 ms          | -42%     |
> > +---------------------------+---------------+----------------+----------+
> > | Total dpm_resume*() time  | 75 ms         | 61 ms          | -19%     |
> > +---------------------------+---------------+----------------+----------+
> > | Sum                       | 182 ms        | 123 ms         | -32%     |
> > +---------------------------+---------------+----------------+----------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >          local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >  Platform_wake:
> >          platform_resume_noirq(state);
> > +        /*
> > +         * We do this only for resume instead of suspend and resume for these
> > +         * reasons:
> > +         * - Performance is more important than power for resume.
> > +         * - Power spent entering suspend is more important for suspend. Also,
> > +         *   stangely, disabling EAS was making suspent a few milliseconds
> > +         *   slower in my testing.
>
> s/stangely/strangely
> s/suspent/suspend

Will fix it in the next version.

> I'd also be curious why that is. Disabling EAS shouldn't be that expensive.
> What if you just hack the static branch switch (without the sd rebuild)?

I don't think the enabling/disabling is the expensive part. Because I do
it around dpm_resume*() and it helps performance.

I tried to see if I could spot a reason, looking at the trace. But
nothing stood out. My educated guess is that when going into suspend,
the "thundering herd" happens early (all the leaf nodes suspend first)
and then peters out. Whereas, during resume it's a slow ramp up until
the "thundering herd" happens at the end (all the leaf nodes resume
last). Spreading out the threads immediately (no EAS) probably has a
different impact on these two styles of thundering herds.

>
> > +         */
> > +        sched_set_energy_aware(0);
> >          dpm_resume_noirq(PMSG_RESUME);
> >
> >  Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >  Resume_devices:
> >          suspend_test_start();
> >          dpm_resume_end(PMSG_RESUME);
> > +        sched_set_energy_aware(1);
> >          suspend_test_finish("resume devices");
> >          trace_suspend_resume(TPS("resume_console"), state, true);
> >          resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >          mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> bool enable?

Will do.
>
> > +{
> > +        int state;
> > +
> > +        if (!sched_is_eas_possible(cpu_active_mask))
> > +                return;
> > +
> > +        sysctl_sched_energy_aware = enable;
> > +        state = static_branch_unlikely(&sched_energy_present);
> > +        if (state != sysctl_sched_energy_aware)
> > +                rebuild_sched_domains_energy();
> > +}
> > +
>
> This definitely shouldn't just overwrite
> sysctl_sched_energy_aware, otherwise you enable EAS
> for users that explicitly disabled it.

Good point. Will fix it in the next version.

Thanks for the review!

-Saravana

>
> If it ever comes to other users wanting this we might
> need a eas_pause counter so this can be nested, but
> let's just hope that's never needed.
>
> Regards,
> Christian
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
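The fix Christian is asking for can be sketched in isolation: pause EAS behind a counter and restore the user's choice, rather than writing the sysctl. The model below is purely illustrative — `sched_pause_eas()`, `sched_resume_eas()`, and `eas_pause` are hypothetical names taken from the discussion, and the static branch and sched-domain rebuild are reduced to plain variables so the save/restore logic can be seen on its own:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel state involved (illustrative, not kernel code). */
static bool sysctl_sched_energy_aware = true; /* the user's sysctl choice */
static bool sched_energy_present;             /* models the static branch */

static void rebuild_sched_domains_energy(void)
{
        /* After a rebuild, EAS is active iff the user wants it. */
        sched_energy_present = sysctl_sched_energy_aware;
}

/*
 * Hypothetical nestable pause, per the review: the outermost pause turns
 * EAS off, the matching resume restores whatever the user configured.
 * The sysctl itself is only read, never written.
 */
static int eas_pause;

static void sched_pause_eas(void)
{
        if (eas_pause++ == 0 && sched_energy_present)
                sched_energy_present = false; /* stands in for a rebuild */
}

static void sched_resume_eas(void)
{
        if (--eas_pause == 0 && sysctl_sched_energy_aware &&
            !sched_energy_present)
                rebuild_sched_domains_energy();
}
```

The key property is that `sysctl_sched_energy_aware` is only read on resume, so a user who explicitly disabled EAS stays disabled across any pause/resume pairing, and nested pauses only toggle the state at the outermost level.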
* Re: [PATCH v1 0/5] Optimize async device suspend/resume
  2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
                   ` (4 preceding siblings ...)
  2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
@ 2024-11-19  4:04 ` Saravana Kannan
  2024-11-19  9:51   ` Greg Kroah-Hartman
  5 siblings, 1 reply; 33+ messages in thread
From: Saravana Kannan @ 2024-11-19  4:04 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
  Cc: Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team,
	linux-pm, linux-kernel

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> A lot of the details are in patch 4/5 and 5/5. The summary is that
> there's a lot of overhead and wasted work in how async device
> suspend/resume is handled today. I talked about this and otther
> suspend/resume issues at LPC 2024[1].
>
> You can remove a lot of the overhead by doing a breadth first queuing of
> async suspend/resumes. That's what this patch series does. I also
> noticed that during resume, because of EAS, we don't use the bigger CPUs
> as quickly. This was leading to a lot of scheduling latency and
> preemption of runnable threads and increasing the resume latency. So, we
> also disable EAS for that tiny period of resume where we know there'll
> be a lot of parallelism.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> series yields significant improvements:
> +---------------------------+---------------+----------------+----------------+
> | Phase                     | Old full sync | Old full async | New full async |
> |                           |               |                | + EAS disabled |
> +---------------------------+---------------+----------------+----------------+
> | Total dpm_suspend*() time | 107 ms        | 72 ms          | 62 ms          |
> +---------------------------+---------------+----------------+----------------+
> | Total dpm_resume*() time  | 75 ms         | 90 ms          | 61 ms          |
> +---------------------------+---------------+----------------+----------------+
> | Sum                       | 182 ms        | 162 ms         | 123 ms         |
> +---------------------------+---------------+----------------+----------------+
>
> There might be room for some more optimizations in the future, but I'm
> keep this patch series simple enough so that it's easier to review and
> check that it's not breaking anything. If this series lands and is
> stable and no bug reports for a few months, I can work on optimizing
> this a bit further.
>
> Thanks,
> Saravana
> P.S: Cc-ing some usual suspects you might be interested in testing this
> out.
>
> [1] - https://lpc.events/event/18/contributions/1845/
>
> Saravana Kannan (5):
>   PM: sleep: Fix runtime PM issue in dpm_resume()
>   PM: sleep: Remove unnecessary mutex lock when waiting on parent
>   PM: sleep: Add helper functions to loop through superior/subordinate
>     devs
>   PM: sleep: Do breadth first suspend/resume for async suspend/resume
>   PM: sleep: Spread out async kworker threads during dpm_resume*()
>     phases
>
>  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------

Hi Rafael/Greg,

I'm waiting for one of your reviews before I send out the next version.

-Saravana

>  kernel/power/suspend.c    |  16 ++
>  kernel/sched/topology.c   |  13 ++
>  3 files changed, 276 insertions(+), 78 deletions(-)
>
> --
> 2.47.0.338.g60cca15819-goog
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
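The cover letter's "breadth first queuing" can be modelled outside the driver core: rather than one async worker per device that blocks waiting for its parent to finish, only devices whose superiors have already completed are queued, and each remaining device is queued the moment its last superior completes. Below is a minimal, single-threaded sketch of that ordering — the six-device tree and all names are illustrative, not the patch's actual code in drivers/base/power/main.c:

```c
#include <assert.h>

#define NDEV 6

/* parent[i] is the index of device i's parent, or -1 for a root. */
static const int parent[NDEV] = { -1, 0, 0, 1, 1, 2 };

static int pending_parents[NDEV]; /* superiors not yet resumed */
static int order[NDEV];           /* completion order, for inspection */
static int completed;

/* FIFO of devices whose superiors have all finished. */
static int queue[NDEV], head, tail;

static void enqueue(int dev)
{
        queue[tail++] = dev;
}

static void resume_all(void)
{
        int i;

        /* Seed the queue breadth-first with the root devices. */
        for (i = 0; i < NDEV; i++) {
                pending_parents[i] = (parent[i] >= 0);
                if (!pending_parents[i])
                        enqueue(i);
        }

        /*
         * No worker ever blocks waiting on a superior: a device is only
         * queued once its last unresumed superior has completed.
         */
        while (head < tail) {
                int dev = queue[head++];

                order[completed++] = dev; /* "resume" the device */
                for (i = 0; i < NDEV; i++)
                        if (parent[i] == dev && --pending_parents[i] == 0)
                                enqueue(i);
        }
}
```

In the real series the queue entries would be handed to async kworkers, but the ordering invariant is the same: a child is never started before its parent completes, and no thread is parked waiting, which is where the claimed overhead reduction comes from.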
* Re: [PATCH v1 0/5] Optimize async device suspend/resume
  2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
@ 2024-11-19  9:51   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2024-11-19  9:51 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Geert Uytterhoeven, Marek Vasut, Bird, Tim, kernel-team,
	linux-pm, linux-kernel

On Mon, Nov 18, 2024 at 08:04:26PM -0800, Saravana Kannan wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > A lot of the details are in patch 4/5 and 5/5. The summary is that
> > there's a lot of overhead and wasted work in how async device
> > suspend/resume is handled today. I talked about this and otther
> > suspend/resume issues at LPC 2024[1].
> >
> > You can remove a lot of the overhead by doing a breadth first queuing of
> > async suspend/resumes. That's what this patch series does. I also
> > noticed that during resume, because of EAS, we don't use the bigger CPUs
> > as quickly. This was leading to a lot of scheduling latency and
> > preemption of runnable threads and increasing the resume latency. So, we
> > also disable EAS for that tiny period of resume where we know there'll
> > be a lot of parallelism.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> > series yields significant improvements:
> > +---------------------------+---------------+----------------+----------------+
> > | Phase                     | Old full sync | Old full async | New full async |
> > |                           |               |                | + EAS disabled |
> > +---------------------------+---------------+----------------+----------------+
> > | Total dpm_suspend*() time | 107 ms        | 72 ms          | 62 ms          |
> > +---------------------------+---------------+----------------+----------------+
> > | Total dpm_resume*() time  | 75 ms         | 90 ms          | 61 ms          |
> > +---------------------------+---------------+----------------+----------------+
> > | Sum                       | 182 ms        | 162 ms         | 123 ms         |
> > +---------------------------+---------------+----------------+----------------+
> >
> > There might be room for some more optimizations in the future, but I'm
> > keep this patch series simple enough so that it's easier to review and
> > check that it's not breaking anything. If this series lands and is
> > stable and no bug reports for a few months, I can work on optimizing
> > this a bit further.
> >
> > Thanks,
> > Saravana
> > P.S: Cc-ing some usual suspects you might be interested in testing this
> > out.
> >
> > [1] - https://lpc.events/event/18/contributions/1845/
> >
> > Saravana Kannan (5):
> >   PM: sleep: Fix runtime PM issue in dpm_resume()
> >   PM: sleep: Remove unnecessary mutex lock when waiting on parent
> >   PM: sleep: Add helper functions to loop through superior/subordinate
> >     devs
> >   PM: sleep: Do breadth first suspend/resume for async suspend/resume
> >   PM: sleep: Spread out async kworker threads during dpm_resume*()
> >     phases
> >
> >  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
>
> Hi Rafael/Greg,
>
> I'm waiting for one of your reviews before I send out the next version.

Please feel free to send, it's the middle of the merge window now, and
I'm busy with that for the next 2 weeks, so I can't do anything until
after that.
thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2025-03-14 20:54 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-14 22:09 [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 1/5] PM: sleep: Fix runtime PM issue in dpm_resume() Saravana Kannan
2024-11-16  7:43 ` Greg Kroah-Hartman
2024-11-16 21:06 ` Saravana Kannan
2024-12-04 12:53 ` Rafael J. Wysocki
2025-03-11 10:47 ` Rafael J. Wysocki
2025-03-13  1:49 ` Saravana Kannan
2025-03-13 10:58 ` Rafael J. Wysocki
2025-03-14 20:47 ` Pavel Machek
2025-03-14 20:49 ` Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent Saravana Kannan
2024-12-02 20:11 ` Rafael J. Wysocki
2024-12-02 20:16 ` Rafael J. Wysocki
2024-12-02 20:46 ` Saravana Kannan
2024-12-02 21:14 ` Rafael J. Wysocki
2024-12-02 23:27 ` Saravana Kannan
2024-12-04 12:21 ` Rafael J. Wysocki
2024-11-14 22:09 ` [PATCH v1 3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs Saravana Kannan
2024-11-14 22:09 ` [PATCH v1 4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume Saravana Kannan
2025-03-11 11:25 ` Geert Uytterhoeven
2024-11-14 22:09 ` [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases Saravana Kannan
2024-11-15  5:25 ` Saravana Kannan
2024-11-15  9:25 ` Geert Uytterhoeven
2024-11-15 15:30 ` Vincent Guittot
2024-11-15 16:12 ` Vincent Guittot
2024-11-15 18:33 ` Saravana Kannan
2024-11-17  0:34 ` kernel test robot
2024-11-17  1:17 ` kernel test robot
2024-11-17 13:34 ` kernel test robot
2024-11-18  9:52 ` Christian Loehle
2024-11-18 17:18 ` Saravana Kannan
2024-11-19  4:04 ` [PATCH v1 0/5] Optimize async device suspend/resume Saravana Kannan
2024-11-19  9:51 ` Greg Kroah-Hartman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).