From: Askar Safin <safinaskar@gmail.com>
To: linux-pm@vger.kernel.org, "Rafael J. Wysocki" <rafael@kernel.org>,
Mario Limonciello <mario.limonciello@amd.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: pm: hibernation bug: wake up event during restore
Date: Sun, 21 Dec 2025 09:32:14 +0300
Message-ID: <20251221063214.3276685-1-safinaskar@gmail.com>
Hi, PM people! I found (yet another) hibernation bug on my laptop:
sometimes wakeup events abort resuming from hibernation.
I think I know how to fix this, and I have already started writing a patch,
but I need your help to finish it.
The bug is already fixed by 2d967310c49e (i.e. by denylisting
VEN_0488:00@355), but this fix is unsatisfactory for the reasons described
below.
====
So, I found a bug on my laptop. This is still the same laptop as with
my previous bug reports: Dell Precision 7780.
When I resume from hibernation, sometimes the resume doesn't work,
and I see this in my logs:
Dec 20 02:04:55 comp kernel: PM: Loading and decompressing image data (811211 pages)...
Dec 20 02:04:55 comp kernel: PM: Image loading progress: 0%
Dec 20 02:04:55 comp kernel: PM: Image loading progress: 10%
[...]
Dec 20 02:04:55 comp kernel: PM: Image loading progress: 100%
Dec 20 02:04:55 comp kernel: PM: Image loading done
Dec 20 02:04:55 comp kernel: PM: hibernation: Read 3244844 kbytes in 1.62 seconds (2002.99 MB/s)
Dec 20 02:04:55 comp kernel: PM: Image successfully loaded
Dec 20 02:04:55 comp kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 20 02:04:55 comp kernel: ACPI: EC: interrupt blocked
Dec 20 02:04:55 comp kernel: ACPI: EC: event blocked
Dec 20 02:04:55 comp kernel: ACPI: EC: EC stopped
Dec 20 02:04:55 comp kernel: Disabling non-boot CPUs ...
Dec 20 02:04:55 comp kernel: smpboot: CPU 31 is now offline
Dec 20 02:04:55 comp kernel: smpboot: CPU 30 is now offline
[...]
Dec 20 02:04:55 comp kernel: smpboot: CPU 11 is now offline
Dec 20 02:04:55 comp kernel: Wakeup pending. Abort CPU freeze
Note that I don't touch anything while this happens: not the mouse,
keyboard, power button, etc.
This is observed on Linux 6.18 from Debian.
This bug is very bad. It cancels the resume, and thus I lose my hibernation
image with all unsaved data, i.e. the bug causes data loss.
This bug has the same cause as another similar bug I discovered recently,
i.e. wakeup event VEN_0488:00@355, so it is already fixed by mainline commit
2d967310c49e (authored by me).
But 2d967310c49e fixes this bug for my particular laptop only. What if
I buy another laptop in the future? Wakeup events should simply be
ignored during restore from hibernation, because otherwise
we lose the hibernation image, i.e. we get data loss!
Moreover, I want to be sure that the resume is not cancelled even if I
actively touch the mouse, keyboard, etc. Touching the mouse should not cause
data loss!
Looking at the logs, I see that the cause is the call to pm_wakeup_pending in
freeze_secondary_cpus. So, one option is simply to remove this call.
I already checked: in all non-resume code paths, the call to
freeze_secondary_cpus is always followed by a call to syscore_suspend, which
calls pm_wakeup_pending, too.
Thus removing the pm_wakeup_pending call from freeze_secondary_cpus will not
break anything, i.e. responding to wakeup events in non-resume code
paths will continue to work.
The only problem is this: in non-resume code paths we will ignore wakeup
events while CPUs are being frozen, and check for them later instead.
But I think this is okay, because freezing CPUs should not take a lot of time.
On my computer it takes less than a millisecond.
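The ordering argument above can be sketched as a small userspace model. This is not kernel code: every model_* name is a hypothetical stand-in for the kernel function it is named after, and the stubs contain none of the real logic. It only illustrates that, in a non-resume path, a pending wakeup is still caught by the later syscore_suspend check once the check in freeze_secondary_cpus is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only. The model_* names are hypothetical stand-ins for
 * the kernel functions named in the text; none of the real logic is here.
 */

/* Hypothetical flag: true once a wakeup event has been reported. */
static bool model_wakeup_flag;

static bool model_pm_wakeup_pending(void)
{
	return model_wakeup_flag;
}

/* freeze_secondary_cpus() with the pm_wakeup_pending() check removed. */
static int model_freeze_secondary_cpus(void)
{
	return 0;
}

/* syscore_suspend() keeps its own pm_wakeup_pending() check. */
static int model_syscore_suspend(void)
{
	return model_pm_wakeup_pending() ? -EBUSY : 0;
}

/* Non-resume ordering: freeze CPUs first, then call syscore_suspend(). */
static int model_suspend_path(void)
{
	int error = model_freeze_secondary_cpus();

	if (error)
		return error;
	return model_syscore_suspend();
}

/* Self-check: a pending wakeup still aborts the modelled suspend path. */
static void model_check_suspend_path(void)
{
	model_wakeup_flag = false;
	assert(model_suspend_path() == 0);

	model_wakeup_flag = true;
	assert(model_suspend_path() == -EBUSY);
}
```

Calling model_check_suspend_path() from a test program exercises both cases. The model deliberately covers only the non-resume ordering described above and says nothing about the restore path.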
But, as I said above, I want to fix this problem not only on my laptop,
but on all others, too. What if other calls to pm_wakeup_pending are
problematic, too? So, we should make sure that either pm_wakeup_pending is
not called during resume, or pm_wakeup_pending always returns false
during resume.
====
I see 3 ways to achieve this.
Way #1. Audit all calls to pm_wakeup_pending and make sure that it is
never called during resume.
For the call in freeze_secondary_cpus, we can simply remove it, as explained above:
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1919,12 +1919,6 @@ int freeze_secondary_cpus(int primary)
 		if (!cpu_online(cpu) || cpu == primary)
 			continue;
 
-		if (pm_wakeup_pending()) {
-			pr_info("Wakeup pending. Abort CPU freeze\n");
-			error = -EBUSY;
-			break;
-		}
-
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
 		error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, false);
Then we have the call in device_suspend_late. device_suspend_late is called
in both resume and non-resume code paths. So, we can do something like this:
--- i/drivers/base/power/main.c
+++ w/drivers/base/power/main.c
@@ -1638,7 +1638,7 @@ static void device_suspend_late(struct device *dev, pm_message_t state, bool asy
 	if (READ_ONCE(async_error))
 		goto Complete;
 
-	if (pm_wakeup_pending()) {
+	if (!(state.event & PM_EVENT_QUIESCE) && pm_wakeup_pending()) {
 		WRITE_ONCE(async_error, -EBUSY);
 		goto Complete;
 	}
There are other calls; all of them should be dealt with, too.
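The patched condition in device_suspend_late can be modelled in userspace as follows. This is a sketch under stated assumptions: the PM_EVENT_* values are quoted from memory of include/linux/pm.h and should be checked against the tree, model_abort_on_wakeup is a hypothetical helper rather than a kernel function, and the model assumes, as the diff does, that the restore path passes a message with PM_EVENT_QUIESCE set.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Event bits mirroring include/linux/pm.h. The numeric values are quoted
 * from memory and are an assumption; check them against your tree.
 */
#define PM_EVENT_FREEZE		0x0001
#define PM_EVENT_SUSPEND	0x0002
#define PM_EVENT_HIBERNATE	0x0004
#define PM_EVENT_QUIESCE	0x0008

/*
 * Hypothetical helper (not a kernel function): models the patched
 * condition in device_suspend_late(). A pending wakeup aborts the
 * transition unless the transition is a quiesce before image restore.
 */
static bool model_abort_on_wakeup(int event, bool wakeup_pending)
{
	return !(event & PM_EVENT_QUIESCE) && wakeup_pending;
}

/* Self-check covering the cases discussed in the text. */
static void model_check_quiesce(void)
{
	assert(model_abort_on_wakeup(PM_EVENT_SUSPEND, true));
	assert(model_abort_on_wakeup(PM_EVENT_FREEZE, true));
	assert(!model_abort_on_wakeup(PM_EVENT_QUIESCE, true));
	assert(!model_abort_on_wakeup(PM_EVENT_SUSPEND, false));
}
```

The same guard would have to be repeated at every other pm_wakeup_pending call site reachable during restore, which is the main cost of this approach.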
Way #2. Ensure that pm_transition is set correctly in the resume code path and
check it in pm_wakeup_pending. (Not a good idea, because pm_transition is
file-private, and pm_wakeup_pending is defined in another file.)
Way #3. Ensure that events_check_enabled is false in the resume code path and
make sure that pm_wakeup_pending always returns false if events_check_enabled
is false:
--- i/drivers/base/power/wakeup.c
+++ w/drivers/base/power/wakeup.c
@@ -890,7 +890,10 @@ bool pm_wakeup_pending(void)
 		pm_print_active_wakeup_sources();
 	}
 
-	return ret || atomic_read(&pm_abort_suspend) > 0;
+	if (events_check_enabled && atomic_read(&pm_abort_suspend) > 0)
+		return true;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pm_wakeup_pending);
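To see what the changed return statement does, here is a minimal userspace model of only the final expression, before and after the diff. The function names are hypothetical, the three inputs are treated as independent values, and the rest of pm_wakeup_pending (how ret is computed, locking) is intentionally left out, so this is a simplification rather than the kernel's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Return expression before the change: pm_abort_suspend always counts. */
static bool model_old_result(bool ret, bool events_check_enabled,
			     int abort_count)
{
	(void)events_check_enabled;	/* not consulted by the old expression */
	return ret || abort_count > 0;
}

/* Return expression after the change: pm_abort_suspend counts only
 * while events_check_enabled is true. */
static bool model_new_result(bool ret, bool events_check_enabled,
			     int abort_count)
{
	if (events_check_enabled && abort_count > 0)
		return true;

	return ret;
}

/* Self-check: with events_check_enabled set, old and new always agree. */
static void model_check_agreement(void)
{
	for (int ret = 0; ret <= 1; ret++)
		for (int abort_count = 0; abort_count <= 1; abort_count++)
			assert(model_old_result(ret, true, abort_count) ==
			       model_new_result(ret, true, abort_count));
}
```

In this model the two versions differ in exactly one case: ret is false, events_check_enabled is false, and the abort counter is positive. The old expression reports a pending wakeup there and the new one does not, which is the behaviour Way #3 relies on during resume.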
====
What should I choose?
Please, fix the bug. Or help me to do this.
--
Askar Safin