linux-mediatek.lists.infradead.org archive mirror
* [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort
@ 2025-11-17  9:31 Rose Wu
  2025-11-17 16:59 ` Rafael J. Wysocki
From: Rose Wu @ 2025-11-17  9:31 UTC
  To: rafael.j.wysocki, linux-pm, regressions
  Cc: saravanak, len.brown, pavel, linux-kernel, wsd_upstream,
	linux-mediatek, 士顏 邱,
	靖智 高

Hi Rafael and All,

I am reporting a regression introduced by commit
443046d1ad66607f324c604b9fbdf11266fa8aad ("PM: sleep: Make suspend of
devices more asynchronous"), which can lead to a kernel panic (data
abort) when a suspend aborts in the late phase.

The commit changes list handling during suspend. When a device suspend
aborts at the "late" stage, `dpm_suspended_list` is spliced into
`dpm_late_early_list`.

This creates an imbalance: devices on that list that had not yet
executed `pm_runtime_disable()` in `device_suspend_late()` are still
subjected to `pm_runtime_enable()` during the subsequent
`device_resume_early()` sequence.
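
To make the imbalance concrete, here is a minimal userspace model of it.
This is only a sketch, not kernel code: `struct toy_dev`, the `toy_*`
helpers and the simple depth counter are hypothetical stand-ins, and the
model ignores list ordering and async handling.

```c
#include <assert.h>

/* Toy stand-in for per-device runtime PM state (not struct device). */
struct toy_dev {
	int disable_depth;	/* > 0 means runtime PM is disabled */
	int unbalanced;		/* enables that had no matching disable */
};

static void toy_rpm_disable(struct toy_dev *d)
{
	d->disable_depth++;
}

static void toy_rpm_enable(struct toy_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
	else
		d->unbalanced++;	/* nothing to undo: unbalanced enable */
}

/*
 * Late suspend that aborts before device index abort_at: only devices
 * [0, abort_at) get their runtime PM disabled. Returns how many did.
 */
static int toy_suspend_late(struct toy_dev *devs, int n, int abort_at)
{
	int i;

	assert(abort_at >= 0);
	for (i = 0; i < n && i < abort_at; i++)
		toy_rpm_disable(&devs[i]);
	return i;
}

/*
 * Early resume over every device, modelling the case where the whole
 * list was spliced and all devices are resumed regardless of how far
 * each one got. Returns the total number of unbalanced enables.
 */
static int toy_resume_early_all(struct toy_dev *devs, int n)
{
	int i, bad = 0;

	for (i = 0; i < n; i++) {
		toy_rpm_enable(&devs[i]);
		bad += devs[i].unbalanced;
	}
	return bad;
}
```

With four devices and an abort before the third one, the first two are
restored correctly and the remaining two each see an enable that has no
matching disable.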

This causes two issues:

1. Numerous error messages in dmesg: "Attempt to enable runtime PM when
it is blocked."
2. A critical failure for simple-bus devices: when
`simple_pm_bus_runtime_resume()` is called for a device whose bus is
`NULL`, the kernel dereferences the NULL bus pointer, triggering a
data abort.

Steps to Reproduce:

The issue can be reliably reproduced by forcing a late suspend to
abort.

1. Apply the following modification to the `device_suspend_late()`
function to simulate a wakeup event:
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1568,7 +1568,7 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
 	if (async_error)
 		goto Complete;
 
-	if (pm_wakeup_pending()) {
+	if (1) { /* Force abort for testing */
 		async_error = -EBUSY;
 		goto Complete;
 	}
2. Trigger a system suspend.
3. The system will attempt to suspend, abort at the late stage, and
then trigger the data abort during the resume sequence.

Call Trace:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
pc : [0xffffffe3988e81e4] simple_pm_bus_runtime_resume+0x1c/0x90
lr : [0xffffffe398a848d0] pm_generic_runtime_resume+0x40/0x58

As a potential fix, I am wondering if a conditional check is needed in
`device_resume_early()` before invoking `pm_runtime_enable()` for a
device?
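
One way to read that suggestion, sketched as a userspace model rather
than a kernel patch: remember per device whether the late phase actually
disabled runtime PM, and only re-enable it on early resume if it did.
The `late_disabled` flag and all `toy_*` names are hypothetical, and
this is not necessarily how the fix posted later in the thread works.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-device runtime PM state (not struct device). */
struct toy_dev {
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool late_disabled;	/* hypothetical: late suspend disabled it */
};

static void toy_suspend_late_one(struct toy_dev *d)
{
	d->disable_depth++;
	d->late_disabled = true;
}

static void toy_resume_early_one(struct toy_dev *d)
{
	if (!d->late_disabled)
		return;		/* late suspend never ran: nothing to undo */

	assert(d->disable_depth > 0);
	d->disable_depth--;
	d->late_disabled = false;
}

/*
 * Abort late suspend before device index abort_at, then run early
 * resume over all devices. Returns how many devices end up with a
 * non-zero disable depth, i.e. in an unbalanced state.
 */
static int toy_abort_and_resume(struct toy_dev *devs, int n, int abort_at)
{
	int i, unbalanced = 0;

	for (i = 0; i < n && i < abort_at; i++)
		toy_suspend_late_one(&devs[i]);

	for (i = 0; i < n; i++) {
		toy_resume_early_one(&devs[i]);
		if (devs[i].disable_depth != 0)
			unbalanced++;
	}
	return unbalanced;
}
```

In this model every device returns to its original state no matter
where the abort happens, because the enable is skipped for devices that
never reached the disable.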

Best Regards,
Rose

Thread overview: 14+ messages
2025-11-17  9:31 [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort Rose Wu
2025-11-17 16:59 ` Rafael J. Wysocki
2025-11-17 18:57   ` [PATCH v1] PM: sleep: core: Fix runtime PM enabling in device_resume_early() Rafael J. Wysocki
2025-11-18  8:31     ` Rose Wu
2025-11-18 11:48       ` [PATCH v2] " Rafael J. Wysocki
2025-11-18 12:17         ` Ulf Hansson
2025-11-18 12:26           ` Rafael J. Wysocki
2025-11-18 12:45             ` [PATCH v3] " Rafael J. Wysocki
2025-11-18 12:57               ` Ulf Hansson
2025-11-18 13:01                 ` Rafael J. Wysocki
2025-11-18 14:16               ` [PATCH v4] " Rafael J. Wysocki
2025-11-18 14:44                 ` Ulf Hansson
2025-11-18 12:49             ` [PATCH v2] " Ulf Hansson
2025-11-18 12:52               ` Rafael J. Wysocki