* [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort
@ 2025-11-17 9:31 Rose Wu
2025-11-17 16:59 ` Rafael J. Wysocki
0 siblings, 1 reply; 14+ messages in thread
From: Rose Wu @ 2025-11-17 9:31 UTC (permalink / raw)
To: rafael.j.wysocki, linux-pm, regressions
Cc: saravanak, len.brown, pavel, linux-kernel, wsd_upstream,
linux-mediatek, 士顏 邱,
靖智 高
Hi Rafael and All,
I am reporting a regression introduced by the commit
443046d1ad66607f324c604b9fbdf11266fa8aad (PM: sleep: Make suspend of
devices more asynchronous), which can lead to a kernel panic (data
abort) if a late suspend aborts.
The commit modifies list handling during suspend. When a device suspend
aborts at the "late" stage, `dpm_suspended_list` is spliced into
`dpm_late_early_list`.
This creates an imbalance. Devices on this list that had not yet
executed `pm_runtime_disable()` in `device_suspend_late()` are now
incorrectly subjected to `pm_runtime_enable()` during the subsequent
`device_resume_early()` sequence.
This causes two issues:
1. Numerous error messages in dmesg: "Attempt to enable runtime PM when
it is blocked."
2. A critical failure for simple-bus devices: When
`simple_pm_bus_runtime_resume()` is called for a device whose bus is
`NULL`, the kernel attempts to access the null bus struct, triggering a
data abort.
Steps to Reproduce:
The issue can be reliably reproduced by forcing a late suspend to
abort.
1. Apply the following modification to the `device_suspend_late()`
function to simulate a wakeup event:
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1568,7 +1568,7 @@ static int device_suspend_late(struct device
*dev, pm_message_t state, bool asyn
if (async_error)
goto Complete;
- if (pm_wakeup_pending()) {
+ if (1) { /* Force abort for testing */
async_error = -EBUSY;
goto Complete;
}
2. Trigger a system suspend.
3. The system will attempt to suspend, abort at the late stage, and
then trigger the data abort during the resume sequence.
Call Trace:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
pc : [0xffffffe3988e81e4] simple_pm_bus_runtime_resume+0x1c/0x90
lr : [0xffffffe398a848d0] pm_generic_runtime_resume+0x40/0x58
As a potential fix, I am wondering if a conditional check is needed in
`device_resume_early()` before invoking `pm_runtime_enable()` for a
device?
Best Regards,
Rose
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort 2025-11-17 9:31 [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort Rose Wu @ 2025-11-17 16:59 ` Rafael J. Wysocki 2025-11-17 18:57 ` [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() Rafael J. Wysocki 0 siblings, 1 reply; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-17 16:59 UTC (permalink / raw) To: Rose Wu Cc: rafael.j.wysocki, linux-pm, regressions, saravanak, len.brown, pavel, linux-kernel, wsd_upstream, linux-mediatek, 士顏 邱, 靖智 高 [-- Attachment #1: Type: text/plain, Size: 966 bytes --] Hi, On Mon, Nov 17, 2025 at 10:31 AM Rose Wu <ya-jou.wu@mediatek.com> wrote: > > Hi Rafael and All, > > I am reporting a regression introduced by the commit > 443046d1ad66607f324c604b9fbdf11266fa8aad (PM: sleep: Make suspend of > devices more asynchronous), which can lead to a kernel panic (data > abort) if a late suspend aborts. > The commit modifies list handling during suspend. When a device suspend > aborts at the "late" stage, `dpm_suspended_list` is spliced into > `dpm_late_early_list`. > This creates an imbalance. Devices on this list that had not yet > executed `pm_runtime_disable()` in `device_suspend_late()` are now > incorrectly subjected to `pm_runtime_enable()` during the subsequent > `device_resume_early()` sequence. Ah, obviously. Does the attached patch (that should apply on top of 6.18-rc6) help? If this patch doesn't apply to your kernel, making an analogous change to it should be straightforward enough. [-- Attachment #2: pm-sleep-fix-early-resume.patch --] [-- Type: text/x-patch, Size: 371 bytes --] --- drivers/base/power/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -941,11 +941,11 @@ Run: Skip: dev->power.is_late_suspended = false; + pm_runtime_enable(dev); Out: TRACE_RESUME(error); - pm_runtime_enable(dev); complete_all(&dev->power.completion); if (error) { ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-17 16:59 ` Rafael J. Wysocki @ 2025-11-17 18:57 ` Rafael J. Wysocki 2025-11-18 8:31 ` Rose Wu 0 siblings, 1 reply; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-17 18:57 UTC (permalink / raw) To: Rose Wu, linux-pm Cc: rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, 士顏 邱, 靖智 高, Ulf Hansson On Monday, November 17, 2025 5:59:05 PM CET Rafael J. Wysocki wrote: > Hi, > > On Mon, Nov 17, 2025 at 10:31 AM Rose Wu <ya-jou.wu@mediatek.com> wrote: > > > > Hi Rafael and All, > > > > I am reporting a regression introduced by the commit > > 443046d1ad66607f324c604b9fbdf11266fa8aad (PM: sleep: Make suspend of > > devices more asynchronous), which can lead to a kernel panic (data > > abort) if a late suspend aborts. > > The commit modifies list handling during suspend. When a device suspend > > aborts at the "late" stage, `dpm_suspended_list` is spliced into > > `dpm_late_early_list`. > > This creates an imbalance. Devices on this list that had not yet > > executed `pm_runtime_disable()` in `device_suspend_late()` are now > > incorrectly subjected to `pm_runtime_enable()` during the subsequent > > `device_resume_early()` sequence. > > Ah, obviously. > > Does the attached patch (that should apply on top of 6.18-rc6) help? > > If this patch doesn't apply to your kernel, making an analogous change > to it should be straightforward enough. That patch was incomplete because it was missing a complementary change in device_suspend_late() to ensure that power.is_suspended will be set for all devices with disabled runtime PM. Please try the one below instead. Thanks! --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Runtime PM should only be enabled in device_resume_early() if it has been disabled for the given device by device_suspend_late(). Otherwise, it may cause runtime PM callbacks to run prematurely in some cases or cause runtime PM to be enabled for devices without runtime PM support. That leads to further functional issues. Make two changes to address this problem. First, reorder device_suspend_late() to only disable runtime PM for a device if the power.is_late_suspended flag is going to be set for it. In all of the other cases, disabling runtime PM for the device is not in fact necessary. Second, make device_resume_early() only enable runtime PM for the devices with the power.is_late_suspended flag set. Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Reported-by: Rose Wu <ya-jou.wu@mediatek.com> Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- drivers/base/power/main.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -941,11 +941,11 @@ Run: Skip: dev->power.is_late_suspended = false; + pm_runtime_enable(dev); Out: TRACE_RESUME(error); - pm_runtime_enable(dev); complete_all(&dev->power.completion); if (error) { @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d TRACE_DEVICE(dev); TRACE_SUSPEND(0); - /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. - */ - __pm_runtime_disable(dev, false); - dpm_wait_for_subordinate(dev, async); if (READ_ONCE(async_error)) @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d if (dev->power.syscore || dev->power.direct_complete) goto Complete; + /* + * Disable runtime PM for the device without checking if there is a + * pending resume request for it. + */ + __pm_runtime_disable(dev, false); + if (dev->pm_domain) { info = "late power domain "; callback = pm_late_early_op(&dev->pm_domain->ops, state); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-17 18:57 ` [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() Rafael J. Wysocki @ 2025-11-18 8:31 ` Rose Wu 2025-11-18 11:48 ` [PATCH v2] " Rafael J. Wysocki 0 siblings, 1 reply; 14+ messages in thread From: Rose Wu @ 2025-11-18 8:31 UTC (permalink / raw) To: Rafael J. Wysocki, linux-pm Cc: rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis.chiu, Johnny-cc.Kao, Ulf Hansson Hi, On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > Make two changes to address this problem. > > First, reorder device_suspend_late() to only disable runtime PM for a > device if the power.is_late_suspended flag is going to be set for it. > In all of the other cases, disabling runtime PM for the device is not > in fact necessary. > > Second, make device_resume_early() only enable runtime PM for the > devices with the power.is_late_suspended flag set. > My concern is with the error path in device_suspend_late(). If a device fails its dpm_run_callback(), it appears that its power.is_late_suspended flag is not set, potentially leaving its runtime PM disabled during the resume sequence. Regards, Rose ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 8:31 ` Rose Wu @ 2025-11-18 11:48 ` Rafael J. Wysocki 2025-11-18 12:17 ` Ulf Hansson 0 siblings, 1 reply; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 11:48 UTC (permalink / raw) To: linux-pm, Rose Wu Cc: rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao, Ulf Hansson On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > Hi, > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > Make two changes to address this problem. > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > device if the power.is_late_suspended flag is going to be set for it. > > In all of the other cases, disabling runtime PM for the device is not > > in fact necessary. > > > > Second, make device_resume_early() only enable runtime PM for the > > devices with the power.is_late_suspended flag set. > > > > My concern is with the error path in device_suspend_late(). > If a device fails its dpm_run_callback(), it appears that its > power.is_late_suspended flag is not set, potentially leaving its runtime > PM disabled during the resume sequence. Right, pm_runtime_enable() is missing in the error path after calling dpm_run_callback(). --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Runtime PM should only be enabled in device_resume_early() if it has been disabled for the given device by device_suspend_late(). Otherwise, it may cause runtime PM callbacks to run prematurely in some cases which leads to further functional issues. Make two changes to address this problem. First, reorder device_suspend_late() to only disable runtime PM for a device when it is going to look for the device's callback. In all of the other cases, disabling runtime PM for the device is not in fact necessary. However, if the device's callback returns an error and the power.is_late_suspended flag is not going to be set, enable runtime PM so it only remains disabled when power.is_late_suspended is set. Second, make device_resume_early() only enable runtime PM for the devices with the power.is_late_suspended flag set. Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Reported-by: Rose Wu <ya-jou.wu@mediatek.com> Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). --- drivers/base/power/main.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -941,11 +941,11 @@ Run: Skip: dev->power.is_late_suspended = false; + pm_runtime_enable(dev); Out: TRACE_RESUME(error); - pm_runtime_enable(dev); complete_all(&dev->power.completion); if (error) { @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d TRACE_DEVICE(dev); TRACE_SUSPEND(0); - /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. - */ - __pm_runtime_disable(dev, false); - dpm_wait_for_subordinate(dev, async); if (READ_ONCE(async_error)) @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d if (dev->power.syscore || dev->power.direct_complete) goto Complete; + /* + * Disable runtime PM for the device without checking if there is a + * pending resume request for it. + */ + __pm_runtime_disable(dev, false); + if (dev->pm_domain) { info = "late power domain "; callback = pm_late_early_op(&dev->pm_domain->ops, state); @@ -1679,6 +1679,7 @@ Run: WRITE_ONCE(async_error, error); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async late" : " late", error); + pm_runtime_enable(dev); goto Complete; } dpm_propagate_wakeup_to_parent(dev); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 11:48 ` [PATCH v2] " Rafael J. Wysocki @ 2025-11-18 12:17 ` Ulf Hansson 2025-11-18 12:26 ` Rafael J. Wysocki 0 siblings, 1 reply; 14+ messages in thread From: Ulf Hansson @ 2025-11-18 12:17 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-pm, Rose Wu, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > Hi, > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > Make two changes to address this problem. > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > device if the power.is_late_suspended flag is going to be set for it. > > > In all of the other cases, disabling runtime PM for the device is not > > > in fact necessary. > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > devices with the power.is_late_suspended flag set. > > > > > > > My concern is with the error path in device_suspend_late(). > > If a device fails its dpm_run_callback(), it appears that its > > power.is_late_suspended flag is not set, potentially leaving its runtime > > PM disabled during the resume sequence. > > Right, pm_runtime_enable() is missing in the error path after calling > dpm_run_callback(). > > --- > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Runtime PM should only be enabled in device_resume_early() if it has > been disabled for the given device by device_suspend_late(). Otherwise, > it may cause runtime PM callbacks to run prematurely in some cases > which leads to further functional issues. > > Make two changes to address this problem. > > First, reorder device_suspend_late() to only disable runtime PM for a > device when it is going to look for the device's callback. In all of > the other cases, disabling runtime PM for the device is not in fact > necessary. However, if the device's callback returns an error and the > power.is_late_suspended flag is not going to be set, enable runtime > PM so it only remains disabled when power.is_late_suspended is set. > > Second, make device_resume_early() only enable runtime PM for the > devices with the power.is_late_suspended flag set. > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > --- > drivers/base/power/main.c | 15 ++++++++------- > 1 file changed, 8 insertions(+), 7 deletions(-) > > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -941,11 +941,11 @@ Run: > > Skip: > dev->power.is_late_suspended = false; > + pm_runtime_enable(dev); > > Out: > TRACE_RESUME(error); > > - pm_runtime_enable(dev); > complete_all(&dev->power.completion); > > if (error) { > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > TRACE_DEVICE(dev); > TRACE_SUSPEND(0); > > - /* > - * Disable runtime PM for the device without checking if there is a > - * pending resume request for it. > - */ > - __pm_runtime_disable(dev, false); > - > dpm_wait_for_subordinate(dev, async); > > if (READ_ONCE(async_error)) > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > if (dev->power.syscore || dev->power.direct_complete) > goto Complete; > > + /* > + * Disable runtime PM for the device without checking if there is a > + * pending resume request for it. > + */ > + __pm_runtime_disable(dev, false); > + Moving this here means we are going to keep runtime pm enabled for syscore devices during system wide suspend/resume. That's quite a change in behaviour for a fix for a regression, I think. Not saying that it can't work though. Although, perhaps better to call __pm_runtime_disable() a few lines earlier and use a separate flag to track that we need to call pm_runtime_enable() in device_resume_early()? > if (dev->pm_domain) { > info = "late power domain "; > callback = pm_late_early_op(&dev->pm_domain->ops, state); > @@ -1679,6 +1679,7 @@ Run: > WRITE_ONCE(async_error, error); > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async late" : " late", error); > + pm_runtime_enable(dev); > goto Complete; > } > dpm_propagate_wakeup_to_parent(dev); > > > Kind regards Uffe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:17 ` Ulf Hansson @ 2025-11-18 12:26 ` Rafael J. Wysocki 2025-11-18 12:45 ` [PATCH v3] " Rafael J. Wysocki 2025-11-18 12:49 ` [PATCH v2] " Ulf Hansson 0 siblings, 2 replies; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 12:26 UTC (permalink / raw) To: Ulf Hansson Cc: Rafael J. Wysocki, linux-pm, Rose Wu, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > Hi, > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > Make two changes to address this problem. > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > In all of the other cases, disabling runtime PM for the device is not > > > > in fact necessary. > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > If a device fails its dpm_run_callback(), it appears that its > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > PM disabled during the resume sequence. > > > > Right, pm_runtime_enable() is missing in the error path after calling > > dpm_run_callback(). > > > > --- > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Runtime PM should only be enabled in device_resume_early() if it has > > been disabled for the given device by device_suspend_late(). Otherwise, > > it may cause runtime PM callbacks to run prematurely in some cases > > which leads to further functional issues. > > > > Make two changes to address this problem. > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > device when it is going to look for the device's callback. In all of > > the other cases, disabling runtime PM for the device is not in fact > > necessary. However, if the device's callback returns an error and the > > power.is_late_suspended flag is not going to be set, enable runtime > > PM so it only remains disabled when power.is_late_suspended is set. > > > > Second, make device_resume_early() only enable runtime PM for the > > devices with the power.is_late_suspended flag set. > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > --- > > drivers/base/power/main.c | 15 ++++++++------- > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > --- a/drivers/base/power/main.c > > +++ b/drivers/base/power/main.c > > @@ -941,11 +941,11 @@ Run: > > > > Skip: > > dev->power.is_late_suspended = false; > > + pm_runtime_enable(dev); > > > > Out: > > TRACE_RESUME(error); > > > > - pm_runtime_enable(dev); > > complete_all(&dev->power.completion); > > > > if (error) { > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > TRACE_DEVICE(dev); > > TRACE_SUSPEND(0); > > > > - /* > > - * Disable runtime PM for the device without checking if there is a > > - * pending resume request for it. > > - */ > > - __pm_runtime_disable(dev, false); > > - > > dpm_wait_for_subordinate(dev, async); > > > > if (READ_ONCE(async_error)) > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > if (dev->power.syscore || dev->power.direct_complete) > > goto Complete; > > > > + /* > > + * Disable runtime PM for the device without checking if there is a > > + * pending resume request for it. > > + */ > > + __pm_runtime_disable(dev, false); > > + > > Moving this here means we are going to keep runtime pm enabled for > syscore devices during system wide suspend/resume. That's quite a > change in behaviour for a fix for a regression, I think. Not saying > that it can't work though. syscore devices can be a special case, but I thought it wouldn't be necessary to special-case them. Do you actually know about any of them needing special casing? > Although, perhaps better to call __pm_runtime_disable() a few lines > earlier and use a separate flag to track that we need to call > pm_runtime_enable() in device_resume_early()? I don't think it is necessary or even useful to introduce new flags for this. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:26 ` Rafael J. Wysocki @ 2025-11-18 12:45 ` Rafael J. Wysocki 2025-11-18 12:57 ` Ulf Hansson 2025-11-18 14:16 ` [PATCH v4] " Rafael J. Wysocki 2025-11-18 12:49 ` [PATCH v2] " Ulf Hansson 1 sibling, 2 replies; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 12:45 UTC (permalink / raw) To: Ulf Hansson, Rose Wu Cc: linux-pm, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tuesday, November 18, 2025 1:26:06 PM CET Rafael J. Wysocki wrote: > On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > > Hi, > > > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > > In all of the other cases, disabling runtime PM for the device is not > > > > > in fact necessary. > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > > If a device fails its dpm_run_callback(), it appears that its > > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > > PM disabled during the resume sequence. > > > > > > Right, pm_runtime_enable() is missing in the error path after calling > > > dpm_run_callback(). > > > > > > --- > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > Runtime PM should only be enabled in device_resume_early() if it has > > > been disabled for the given device by device_suspend_late(). Otherwise, > > > it may cause runtime PM callbacks to run prematurely in some cases > > > which leads to further functional issues. > > > > > > Make two changes to address this problem. > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > device when it is going to look for the device's callback. In all of > > > the other cases, disabling runtime PM for the device is not in fact > > > necessary. However, if the device's callback returns an error and the > > > power.is_late_suspended flag is not going to be set, enable runtime > > > PM so it only remains disabled when power.is_late_suspended is set. > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > devices with the power.is_late_suspended flag set. > > > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > > > --- > > > drivers/base/power/main.c | 15 ++++++++------- > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > --- a/drivers/base/power/main.c > > > +++ b/drivers/base/power/main.c > > > @@ -941,11 +941,11 @@ Run: > > > > > > Skip: > > > dev->power.is_late_suspended = false; > > > + pm_runtime_enable(dev); > > > > > > Out: > > > TRACE_RESUME(error); > > > > > > - pm_runtime_enable(dev); > > > complete_all(&dev->power.completion); > > > > > > if (error) { > > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > > TRACE_DEVICE(dev); > > > TRACE_SUSPEND(0); > > > > > > - /* > > > - * Disable runtime PM for the device without checking if there is a > > > - * pending resume request for it. > > > - */ > > > - __pm_runtime_disable(dev, false); > > > - > > > dpm_wait_for_subordinate(dev, async); > > > > > > if (READ_ONCE(async_error)) > > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > > if (dev->power.syscore || dev->power.direct_complete) > > > goto Complete; > > > > > > + /* > > > + * Disable runtime PM for the device without checking if there is a > > > + * pending resume request for it. > > > + */ > > > + __pm_runtime_disable(dev, false); > > > + > > > > Moving this here means we are going to keep runtime pm enabled for > > syscore devices during system wide suspend/resume. That's quite a > > change in behaviour for a fix for a regression, I think. Not saying > > that it can't work though. > > syscore devices can be a special case, but I thought it wouldn't be > necessary to special-case them. > > Do you actually know about any of them needing special casing? In any case, below is a v3 that special-cases syscore devices. Fortunately, it is not much more complicated than the v2. Thanks! --- From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Runtime PM should only be enabled in device_resume_early() if it has been disabled for the given device by device_suspend_late(). Otherwise, it may cause runtime PM callbacks to run prematurely in some cases which leads to further functional issues. Make two changes to address this problem. First, reorder device_suspend_late() to only disable runtime PM for a device when it is going to look for the device's callback or if the device is a "syscore" one. In all of the other cases, disabling runtime PM for the device is not in fact necessary. However, if the device's callback returns an error and the power.is_late_suspended flag is not going to be set, enable runtime PM so it only remains disabled when power.is_late_suspended is set. Second, make device_resume_early() only enable runtime PM for the devices with the power.is_late_suspended flag set. Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Reported-by: Rose Wu <ya-jou.wu@mediatek.com> Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- v2 -> v1: * Also set is_late_suspended for syscore devices to avoid skipping runtime PM disabling for them (Ulf). * Update the changelog to reflect the above. v1 -> v2: * Add pm_runtime_enable() to device_suspend_late() error path (Rose). * Update the changelog to reflect the above. --- drivers/base/power/main.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -903,7 +903,10 @@ static void device_resume_early(struct d TRACE_DEVICE(dev); TRACE_RESUME(0); - if (dev->power.syscore || dev->power.direct_complete) + if (dev->power.syscore) + goto Skip; + + if (dev->power.direct_complete) goto Out; if (!dev->power.is_late_suspended) @@ -941,11 +944,11 @@ Run: Skip: dev->power.is_late_suspended = false; + pm_runtime_enable(dev); Out: TRACE_RESUME(error); - pm_runtime_enable(dev); complete_all(&dev->power.completion); if (error) { @@ -1630,12 +1633,6 @@ static void device_suspend_late(struct d TRACE_DEVICE(dev); TRACE_SUSPEND(0); - /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. - */ - __pm_runtime_disable(dev, false); - dpm_wait_for_subordinate(dev, async); if (READ_ONCE(async_error)) @@ -1646,9 +1643,18 @@ static void device_suspend_late(struct d goto Complete; } - if (dev->power.syscore || dev->power.direct_complete) + if (dev->power.direct_complete) goto Complete; + /* + * Disable runtime PM for the device without checking if there is a + * pending resume request for it. + */ + __pm_runtime_disable(dev, false); + + if (dev->power.syscore) + goto Skip; + if (dev->pm_domain) { info = "late power domain "; callback = pm_late_early_op(&dev->pm_domain->ops, state); @@ -1679,6 +1685,7 @@ Run: WRITE_ONCE(async_error, error); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async late" : " late", error); + pm_runtime_enable(dev); goto Complete; } dpm_propagate_wakeup_to_parent(dev); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:45 ` [PATCH v3] " Rafael J. Wysocki @ 2025-11-18 12:57 ` Ulf Hansson 2025-11-18 13:01 ` Rafael J. Wysocki 2025-11-18 14:16 ` [PATCH v4] " Rafael J. Wysocki 1 sibling, 1 reply; 14+ messages in thread From: Ulf Hansson @ 2025-11-18 12:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rose Wu, linux-pm, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, 18 Nov 2025 at 13:45, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tuesday, November 18, 2025 1:26:06 PM CET Rafael J. Wysocki wrote: > > On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > > > Hi, > > > > > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > > > In all of the other cases, disabling runtime PM for the device is not > > > > > > in fact necessary. > > > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > > > If a device fails its dpm_run_callback(), it appears that its > > > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > > > PM disabled during the resume sequence. > > > > > > > > Right, pm_runtime_enable() is missing in the error path after calling > > > > dpm_run_callback(). > > > > > > > > --- > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > Runtime PM should only be enabled in device_resume_early() if it has > > > > been disabled for the given device by device_suspend_late(). Otherwise, > > > > it may cause runtime PM callbacks to run prematurely in some cases > > > > which leads to further functional issues. > > > > > > > > Make two changes to address this problem. > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > device when it is going to look for the device's callback. In all of > > > > the other cases, disabling runtime PM for the device is not in fact > > > > necessary. However, if the device's callback returns an error and the > > > > power.is_late_suspended flag is not going to be set, enable runtime > > > > PM so it only remains disabled when power.is_late_suspended is set. > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > devices with the power.is_late_suspended flag set. > > > > > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > --- > > > > > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > > > > > --- > > > > drivers/base/power/main.c | 15 ++++++++------- > > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > > > --- a/drivers/base/power/main.c > > > > +++ b/drivers/base/power/main.c > > > > @@ -941,11 +941,11 @@ Run: > > > > > > > > Skip: > > > > dev->power.is_late_suspended = false; > > > > + pm_runtime_enable(dev); > > > > > > > > Out: > > > > TRACE_RESUME(error); > > > > > > > > - pm_runtime_enable(dev); > > > > complete_all(&dev->power.completion); > > > > > > > > if (error) { > > > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > > > TRACE_DEVICE(dev); > > > > TRACE_SUSPEND(0); > > > > > > > > - /* > > > > - * Disable runtime PM for the device without checking if there is a > > > > - * pending resume request for it. > > > > - */ > > > > - __pm_runtime_disable(dev, false); > > > > - > > > > dpm_wait_for_subordinate(dev, async); > > > > > > > > if (READ_ONCE(async_error)) > > > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > > > if (dev->power.syscore || dev->power.direct_complete) > > > > goto Complete; > > > > > > > > + /* > > > > + * Disable runtime PM for the device without checking if there is a > > > > + * pending resume request for it. > > > > + */ > > > > + __pm_runtime_disable(dev, false); > > > > + > > > > > > Moving this here means we are going to keep runtime pm enabled for > > > syscore devices during system wide suspend/resume. That's quite a > > > change in behaviour for a fix for a regression, I think. Not saying > > > that it can't work though. > > > > syscore devices can be a special case, but I thought it wouldn't be > > necessary to special-case them. > > > > Do you actually know about any of them needing special casing? > > In any case, below is a v3 that special-cases syscore devices. Fortunately, > it is not much more complicated than the v2. > > Thanks! > > --- > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Runtime PM should only be enabled in device_resume_early() if it has > been disabled for the given device by device_suspend_late(). Otherwise, > it may cause runtime PM callbacks to run prematurely in some cases > which leads to further functional issues. > > Make two changes to address this problem. > > First, reorder device_suspend_late() to only disable runtime PM for a > device when it is going to look for the device's callback or if the > device is a "syscore" one. In all of the other cases, disabling runtime > PM for the device is not in fact necessary. However, if the device's > callback returns an error and the power.is_late_suspended flag is not > going to be set, enable runtime PM so it only remains disabled when > power.is_late_suspended is set. > > Second, make device_resume_early() only enable runtime PM for the > devices with the power.is_late_suspended flag set. > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > v2 -> v1: > * Also set is_late_suspended for syscore devices to avoid skipping runtime PM > disabling for them (Ulf). > * Update the changelog to reflect the above. > > v1 -> v2: > * Add pm_runtime_enable() to device_suspend_late() error path (Rose). > * Update the changelog to reflect the above. > > --- > drivers/base/power/main.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -903,7 +903,10 @@ static void device_resume_early(struct d > TRACE_DEVICE(dev); > TRACE_RESUME(0); > > - if (dev->power.syscore || dev->power.direct_complete) > + if (dev->power.syscore) > + goto Skip; This will enable runtime PM for a syscore device, even if it wasn't disabled when device_suspend_late() wasn't called for it. > + > + if (dev->power.direct_complete) > goto Out; > > if (!dev->power.is_late_suspended) > @@ -941,11 +944,11 @@ Run: > > Skip: > dev->power.is_late_suspended = false; > + pm_runtime_enable(dev); > > Out: > TRACE_RESUME(error); > > - pm_runtime_enable(dev); > complete_all(&dev->power.completion); > > if (error) { > @@ -1630,12 +1633,6 @@ static void device_suspend_late(struct d > TRACE_DEVICE(dev); > TRACE_SUSPEND(0); > > - /* > - * Disable runtime PM for the device without checking if there is a > - * pending resume request for it. > - */ > - __pm_runtime_disable(dev, false); > - > dpm_wait_for_subordinate(dev, async); > > if (READ_ONCE(async_error)) > @@ -1646,9 +1643,18 @@ static void device_suspend_late(struct d > goto Complete; > } > > - if (dev->power.syscore || dev->power.direct_complete) > + if (dev->power.direct_complete) > goto Complete; > > + /* > + * Disable runtime PM for the device without checking if there is a > + * pending resume request for it. > + */ > + __pm_runtime_disable(dev, false); > + > + if (dev->power.syscore) > + goto Skip; > + > if (dev->pm_domain) { > info = "late power domain "; > callback = pm_late_early_op(&dev->pm_domain->ops, state); > @@ -1679,6 +1685,7 @@ Run: > WRITE_ONCE(async_error, error); > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async late" : " late", error); > + pm_runtime_enable(dev); > goto Complete; > } > dpm_propagate_wakeup_to_parent(dev); > > > Kind regards Uffe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:57 ` Ulf Hansson @ 2025-11-18 13:01 ` Rafael J. Wysocki 0 siblings, 0 replies; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 13:01 UTC (permalink / raw) To: Ulf Hansson Cc: Rafael J. Wysocki, Rose Wu, linux-pm, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, Nov 18, 2025 at 1:58 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Tue, 18 Nov 2025 at 13:45, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tuesday, November 18, 2025 1:26:06 PM CET Rafael J. Wysocki wrote: > > > On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > > > > Hi, > > > > > > > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > > > > In all of the other cases, disabling runtime PM for the device is not > > > > > > > in fact necessary. > > > > > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > > > > If a device fails its dpm_run_callback(), it appears that its > > > > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > > > > PM disabled during the resume sequence. > > > > > > > > > > Right, pm_runtime_enable() is missing in the error path after calling > > > > > dpm_run_callback(). > > > > > > > > > > --- > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > > > Runtime PM should only be enabled in device_resume_early() if it has > > > > > been disabled for the given device by device_suspend_late(). Otherwise, > > > > > it may cause runtime PM callbacks to run prematurely in some cases > > > > > which leads to further functional issues. > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > device when it is going to look for the device's callback. In all of > > > > > the other cases, disabling runtime PM for the device is not in fact > > > > > necessary. However, if the device's callback returns an error and the > > > > > power.is_late_suspended flag is not going to be set, enable runtime > > > > > PM so it only remains disabled when power.is_late_suspended is set. > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > > > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > > > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > > > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > --- > > > > > > > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > > > > > > > --- > > > > > drivers/base/power/main.c | 15 ++++++++------- > > > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > > > > > --- a/drivers/base/power/main.c > > > > > +++ b/drivers/base/power/main.c > > > > > @@ -941,11 +941,11 @@ Run: > > > > > > > > > > Skip: > > > > > dev->power.is_late_suspended = false; > > > > > + pm_runtime_enable(dev); > > > > > > > > > > Out: > > > > > TRACE_RESUME(error); > > > > > > > > > > - pm_runtime_enable(dev); > > > > > complete_all(&dev->power.completion); > > > > > > > > > > if (error) { > > > > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > > > > TRACE_DEVICE(dev); > > > > > TRACE_SUSPEND(0); > > > > > > > > > > - /* > > > > > - * Disable runtime PM for the device without checking if there is a > > > > > - * pending resume request for it. > > > > > - */ > > > > > - __pm_runtime_disable(dev, false); > > > > > - > > > > > dpm_wait_for_subordinate(dev, async); > > > > > > > > > > if (READ_ONCE(async_error)) > > > > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > > > > if (dev->power.syscore || dev->power.direct_complete) > > > > > goto Complete; > > > > > > > > > > + /* > > > > > + * Disable runtime PM for the device without checking if there is a > > > > > + * pending resume request for it. > > > > > + */ > > > > > + __pm_runtime_disable(dev, false); > > > > > + > > > > > > > > Moving this here means we are going to keep runtime pm enabled for > > > > syscore devices during system wide suspend/resume. That's quite a > > > > change in behaviour for a fix for a regression, I think. Not saying > > > > that it can't work though. > > > > > > syscore devices can be a special case, but I thought it wouldn't be > > > necessary to special-case them. > > > > > > Do you actually know about any of them needing special casing? > > > > In any case, below is a v3 that special-cases syscore devices. Fortunately, > > it is not much more complicated than the v2. > > > > Thanks! > > > > --- > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Runtime PM should only be enabled in device_resume_early() if it has > > been disabled for the given device by device_suspend_late(). Otherwise, > > it may cause runtime PM callbacks to run prematurely in some cases > > which leads to further functional issues. > > > > Make two changes to address this problem. > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > device when it is going to look for the device's callback or if the > > device is a "syscore" one. In all of the other cases, disabling runtime > > PM for the device is not in fact necessary. However, if the device's > > callback returns an error and the power.is_late_suspended flag is not > > going to be set, enable runtime PM so it only remains disabled when > > power.is_late_suspended is set. > > > > Second, make device_resume_early() only enable runtime PM for the > > devices with the power.is_late_suspended flag set. > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > v2 -> v1: > > * Also set is_late_suspended for syscore devices to avoid skipping runtime PM > > disabling for them (Ulf). > > * Update the changelog to reflect the above. > > > > v1 -> v2: > > * Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > * Update the changelog to reflect the above. > > > > --- > > drivers/base/power/main.c | 25 ++++++++++++++++--------- > > 1 file changed, 16 insertions(+), 9 deletions(-) > > > > --- a/drivers/base/power/main.c > > +++ b/drivers/base/power/main.c > > @@ -903,7 +903,10 @@ static void device_resume_early(struct d > > TRACE_DEVICE(dev); > > TRACE_RESUME(0); > > > > - if (dev->power.syscore || dev->power.direct_complete) > > + if (dev->power.syscore) > > + goto Skip; > > This will enable runtime PM for a syscore device, even if it wasn't > disabled when device_suspend_late() wasn't called for it. Yeah, I need to reorder this after the power.is_late_suspended check. I'll send a v4 later today. Thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:45 ` [PATCH v3] " Rafael J. Wysocki 2025-11-18 12:57 ` Ulf Hansson @ 2025-11-18 14:16 ` Rafael J. Wysocki 2025-11-18 14:44 ` Ulf Hansson 1 sibling, 1 reply; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 14:16 UTC (permalink / raw) To: Ulf Hansson, Rose Wu, linux-pm Cc: rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Runtime PM should only be enabled in device_resume_early() if it has been disabled for the given device by device_suspend_late(). Otherwise, it may cause runtime PM callbacks to run prematurely in some cases which leads to further functional issues. Make two changes to address this problem. First, reorder device_suspend_late() to only disable runtime PM for a device when it is going to look for the device's callback or if the device is a "syscore" one. In all of the other cases, disabling runtime PM for the device is not in fact necessary. However, if the device's callback returns an error and the power.is_late_suspended flag is not going to be set, enable runtime PM so it only remains disabled when power.is_late_suspended is set. Second, make device_resume_early() only enable runtime PM for the devices with the power.is_late_suspended flag set. Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Reported-by: Rose Wu <ya-jou.wu@mediatek.com> Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- v3 -> v4: * Reorder the syscore check in device_resume_early() after the is_late_suspended check (Ulf) * Update the changelog to reflect the above. v2 -> v3: * Also set is_late_suspended for syscore devices to avoid skipping runtime PM disabling for them (Ulf). * Update the changelog to reflect the above. v1 -> v2: * Add pm_runtime_enable() to device_suspend_late() error path (Rose). * Update the changelog to reflect the above. --- drivers/base/power/main.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -903,12 +903,15 @@ static void device_resume_early(struct d TRACE_DEVICE(dev); TRACE_RESUME(0); - if (dev->power.syscore || dev->power.direct_complete) + if (dev->power.direct_complete) goto Out; if (!dev->power.is_late_suspended) goto Out; + if (dev->power.syscore) + goto Skip; + if (!dpm_wait_for_superior(dev, async)) goto Out; @@ -941,11 +944,11 @@ Run: Skip: dev->power.is_late_suspended = false; + pm_runtime_enable(dev); Out: TRACE_RESUME(error); - pm_runtime_enable(dev); complete_all(&dev->power.completion); if (error) { @@ -1630,12 +1633,6 @@ static void device_suspend_late(struct d TRACE_DEVICE(dev); TRACE_SUSPEND(0); - /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. - */ - __pm_runtime_disable(dev, false); - dpm_wait_for_subordinate(dev, async); if (READ_ONCE(async_error)) @@ -1646,9 +1643,18 @@ static void device_suspend_late(struct d goto Complete; } - if (dev->power.syscore || dev->power.direct_complete) + if (dev->power.direct_complete) goto Complete; + /* + * Disable runtime PM for the device without checking if there is a + * pending resume request for it. + */ + __pm_runtime_disable(dev, false); + + if (dev->power.syscore) + goto Skip; + if (dev->pm_domain) { info = "late power domain "; callback = pm_late_early_op(&dev->pm_domain->ops, state); @@ -1679,6 +1685,7 @@ Run: WRITE_ONCE(async_error, error); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async late" : " late", error); + pm_runtime_enable(dev); goto Complete; } dpm_propagate_wakeup_to_parent(dev); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 14:16 ` [PATCH v4] " Rafael J. Wysocki @ 2025-11-18 14:44 ` Ulf Hansson 0 siblings, 0 replies; 14+ messages in thread From: Ulf Hansson @ 2025-11-18 14:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rose Wu, linux-pm, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, 18 Nov 2025 at 15:16, Rafael J. Wysocki <rafael@kernel.org> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Runtime PM should only be enabled in device_resume_early() if it has > been disabled for the given device by device_suspend_late(). Otherwise, > it may cause runtime PM callbacks to run prematurely in some cases > which leads to further functional issues. > > Make two changes to address this problem. > > First, reorder device_suspend_late() to only disable runtime PM for a > device when it is going to look for the device's callback or if the > device is a "syscore" one. In all of the other cases, disabling runtime > PM for the device is not in fact necessary. However, if the device's > callback returns an error and the power.is_late_suspended flag is not > going to be set, enable runtime PM so it only remains disabled when > power.is_late_suspended is set. > > Second, make device_resume_early() only enable runtime PM for the > devices with the power.is_late_suspended flag set. > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Kind regards Uffe > --- > > v3 -> v4: > * Reorder the syscore check in device_resume_early() after the > is_late_suspended check (Ulf) > * Update the changelog to reflect the above. > > v2 -> v3: > * Also set is_late_suspended for syscore devices to avoid skipping runtime PM > disabling for them (Ulf). > * Update the changelog to reflect the above. > > v1 -> v2: > * Add pm_runtime_enable() to device_suspend_late() error path (Rose). > * Update the changelog to reflect the above. > > --- > drivers/base/power/main.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -903,12 +903,15 @@ static void device_resume_early(struct d > TRACE_DEVICE(dev); > TRACE_RESUME(0); > > - if (dev->power.syscore || dev->power.direct_complete) > + if (dev->power.direct_complete) > goto Out; > > if (!dev->power.is_late_suspended) > goto Out; > > + if (dev->power.syscore) > + goto Skip; > + > if (!dpm_wait_for_superior(dev, async)) > goto Out; > > @@ -941,11 +944,11 @@ Run: > > Skip: > dev->power.is_late_suspended = false; > + pm_runtime_enable(dev); > > Out: > TRACE_RESUME(error); > > - pm_runtime_enable(dev); > complete_all(&dev->power.completion); > > if (error) { > @@ -1630,12 +1633,6 @@ static void device_suspend_late(struct d > TRACE_DEVICE(dev); > TRACE_SUSPEND(0); > > - /* > - * Disable runtime PM for the device without checking if there is a > - * pending resume request for it. > - */ > - __pm_runtime_disable(dev, false); > - > dpm_wait_for_subordinate(dev, async); > > if (READ_ONCE(async_error)) > @@ -1646,9 +1643,18 @@ static void device_suspend_late(struct d > goto Complete; > } > > - if (dev->power.syscore || dev->power.direct_complete) > + if (dev->power.direct_complete) > goto Complete; > > + /* > + * Disable runtime PM for the device without checking if there is a > + * pending resume request for it. > + */ > + __pm_runtime_disable(dev, false); > + > + if (dev->power.syscore) > + goto Skip; > + > if (dev->pm_domain) { > info = "late power domain "; > callback = pm_late_early_op(&dev->pm_domain->ops, state); > @@ -1679,6 +1685,7 @@ Run: > WRITE_ONCE(async_error, error); > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async late" : " late", error); > + pm_runtime_enable(dev); > goto Complete; > } > dpm_propagate_wakeup_to_parent(dev); > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:26 ` Rafael J. Wysocki 2025-11-18 12:45 ` [PATCH v3] " Rafael J. Wysocki @ 2025-11-18 12:49 ` Ulf Hansson 2025-11-18 12:52 ` Rafael J. Wysocki 1 sibling, 1 reply; 14+ messages in thread From: Ulf Hansson @ 2025-11-18 12:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-pm, Rose Wu, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, 18 Nov 2025 at 13:26, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > > Hi, > > > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > > In all of the other cases, disabling runtime PM for the device is not > > > > > in fact necessary. > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > > If a device fails its dpm_run_callback(), it appears that its > > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > > PM disabled during the resume sequence. > > > > > > Right, pm_runtime_enable() is missing in the error path after calling > > > dpm_run_callback(). > > > > > > --- > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > Runtime PM should only be enabled in device_resume_early() if it has > > > been disabled for the given device by device_suspend_late(). Otherwise, > > > it may cause runtime PM callbacks to run prematurely in some cases > > > which leads to further functional issues. > > > > > > Make two changes to address this problem. > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > device when it is going to look for the device's callback. In all of > > > the other cases, disabling runtime PM for the device is not in fact > > > necessary. However, if the device's callback returns an error and the > > > power.is_late_suspended flag is not going to be set, enable runtime > > > PM so it only remains disabled when power.is_late_suspended is set. > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > devices with the power.is_late_suspended flag set. > > > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > > > --- > > > drivers/base/power/main.c | 15 ++++++++------- > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > --- a/drivers/base/power/main.c > > > +++ b/drivers/base/power/main.c > > > @@ -941,11 +941,11 @@ Run: > > > > > > Skip: > > > dev->power.is_late_suspended = false; > > > + pm_runtime_enable(dev); > > > > > > Out: > > > TRACE_RESUME(error); > > > > > > - pm_runtime_enable(dev); > > > complete_all(&dev->power.completion); > > > > > > if (error) { > > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > > TRACE_DEVICE(dev); > > > TRACE_SUSPEND(0); > > > > > > - /* > > > - * Disable runtime PM for the device without checking if there is a > > > - * pending resume request for it. > > > - */ > > > - __pm_runtime_disable(dev, false); > > > - > > > dpm_wait_for_subordinate(dev, async); > > > > > > if (READ_ONCE(async_error)) > > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > > if (dev->power.syscore || dev->power.direct_complete) > > > goto Complete; > > > > > > + /* > > > + * Disable runtime PM for the device without checking if there is a > > > + * pending resume request for it. > > > + */ > > > + __pm_runtime_disable(dev, false); > > > + > > > > Moving this here means we are going to keep runtime pm enabled for > > syscore devices during system wide suspend/resume. That's quite a > > change in behaviour for a fix for a regression, I think. Not saying > > that it can't work though. > > syscore devices can be a special case, but I thought it wouldn't be > necessary to special-case them. > > Do you actually know about any of them needing special casing? There are a couple of clocksource drivers, cpuidle-psci, cpuidle-riscv-sbi and a usb driver that marks their devices as syscore devices. It *probably* works to keep runtime pm enabled for all of them, but I am not sure. > > > Although, perhaps better to call __pm_runtime_disable() a few lines > > earlier and use a separate flag to track that we need to call > > pm_runtime_enable() in device_resume_early()? > > I don't think it is necessary or even useful to introduce new flags for this. Kind regards Uffe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] PM: sleep: core: Fix runtime PM enabling in device_resume_early() 2025-11-18 12:49 ` [PATCH v2] " Ulf Hansson @ 2025-11-18 12:52 ` Rafael J. Wysocki 0 siblings, 0 replies; 14+ messages in thread From: Rafael J. Wysocki @ 2025-11-18 12:52 UTC (permalink / raw) To: Ulf Hansson Cc: Rafael J. Wysocki, linux-pm, Rose Wu, rafael.j.wysocki, regressions, linux-kernel, wsd_upstream, linux-mediatek, artis. chiu, Johnny-cc. Kao On Tue, Nov 18, 2025 at 1:49 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Tue, 18 Nov 2025 at 13:26, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tue, Nov 18, 2025 at 1:18 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > > > > On Tue, 18 Nov 2025 at 12:48, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Tuesday, November 18, 2025 9:31:08 AM CET Rose Wu wrote: > > > > > Hi, > > > > > > > > > > On Mon, 2025-11-17 at 19:57 +0100, Rafael J. Wysocki wrote: > > > > > > > > > > > > Make two changes to address this problem. > > > > > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > > > device if the power.is_late_suspended flag is going to be set for it. > > > > > > In all of the other cases, disabling runtime PM for the device is not > > > > > > in fact necessary. > > > > > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > > > devices with the power.is_late_suspended flag set. > > > > > > > > > > > > > > > > My concern is with the error path in device_suspend_late(). > > > > > If a device fails its dpm_run_callback(), it appears that its > > > > > power.is_late_suspended flag is not set, potentially leaving its runtime > > > > > PM disabled during the resume sequence. > > > > > > > > Right, pm_runtime_enable() is missing in the error path after calling > > > > dpm_run_callback(). > > > > > > > > --- > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > Runtime PM should only be enabled in device_resume_early() if it has > > > > been disabled for the given device by device_suspend_late(). Otherwise, > > > > it may cause runtime PM callbacks to run prematurely in some cases > > > > which leads to further functional issues. > > > > > > > > Make two changes to address this problem. > > > > > > > > First, reorder device_suspend_late() to only disable runtime PM for a > > > > device when it is going to look for the device's callback. In all of > > > > the other cases, disabling runtime PM for the device is not in fact > > > > necessary. However, if the device's callback returns an error and the > > > > power.is_late_suspended flag is not going to be set, enable runtime > > > > PM so it only remains disabled when power.is_late_suspended is set. > > > > > > > > Second, make device_resume_early() only enable runtime PM for the > > > > devices with the power.is_late_suspended flag set. > > > > > > > > Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") > > > > Reported-by: Rose Wu <ya-jou.wu@mediatek.com> > > > > Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/ > > > > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > --- > > > > > > > > v1 -> v2: Add pm_runtime_enable() to device_suspend_late() error path (Rose). > > > > > > > > --- > > > > drivers/base/power/main.c | 15 ++++++++------- > > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > > > --- a/drivers/base/power/main.c > > > > +++ b/drivers/base/power/main.c > > > > @@ -941,11 +941,11 @@ Run: > > > > > > > > Skip: > > > > dev->power.is_late_suspended = false; > > > > + pm_runtime_enable(dev); > > > > > > > > Out: > > > > TRACE_RESUME(error); > > > > > > > > - pm_runtime_enable(dev); > > > > complete_all(&dev->power.completion); > > > > > > > > if (error) { > > > > @@ -1630,12 +1630,6 @@ static void device_suspend_late(struct d > > > > TRACE_DEVICE(dev); > > > > TRACE_SUSPEND(0); > > > > > > > > - /* > > > > - * Disable runtime PM for the device without checking if there is a > > > > - * pending resume request for it. > > > > - */ > > > > - __pm_runtime_disable(dev, false); > > > > - > > > > dpm_wait_for_subordinate(dev, async); > > > > > > > > if (READ_ONCE(async_error)) > > > > @@ -1649,6 +1643,12 @@ static void device_suspend_late(struct d > > > > if (dev->power.syscore || dev->power.direct_complete) > > > > goto Complete; > > > > > > > > + /* > > > > + * Disable runtime PM for the device without checking if there is a > > > > + * pending resume request for it. > > > > + */ > > > > + __pm_runtime_disable(dev, false); > > > > + > > > > > > Moving this here means we are going to keep runtime pm enabled for > > > syscore devices during system wide suspend/resume. That's quite a > > > change in behaviour for a fix for a regression, I think. Not saying > > > that it can't work though. > > > > syscore devices can be a special case, but I thought it wouldn't be > > necessary to special-case them. > > > > Do you actually know about any of them needing special casing? > > There are a couple of clocksource drivers, cpuidle-psci, > cpuidle-riscv-sbi and a usb driver that marks their devices as syscore > devices. > > It *probably* works to keep runtime pm enabled for all of them, but I > am not sure. Well, all of the syscore devices with enabled runtime PM are potentially buggy. Anyway, please see the v3: https://lore.kernel.org/linux-pm/5941318.DvuYhMxLoT@rafael.j.wysocki/ ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-11-18 14:45 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-17 9:31 [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort Rose Wu 2025-11-17 16:59 ` Rafael J. Wysocki 2025-11-17 18:57 ` [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() Rafael J. Wysocki 2025-11-18 8:31 ` Rose Wu 2025-11-18 11:48 ` [PATCH v2] " Rafael J. Wysocki 2025-11-18 12:17 ` Ulf Hansson 2025-11-18 12:26 ` Rafael J. Wysocki 2025-11-18 12:45 ` [PATCH v3] " Rafael J. Wysocki 2025-11-18 12:57 ` Ulf Hansson 2025-11-18 13:01 ` Rafael J. Wysocki 2025-11-18 14:16 ` [PATCH v4] " Rafael J. Wysocki 2025-11-18 14:44 ` Ulf Hansson 2025-11-18 12:49 ` [PATCH v2] " Ulf Hansson 2025-11-18 12:52 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).