From: tuhaowen <tuhaowen@uniontech.com>
To: saravanak@google.com
Cc: huangbibo@uniontech.com, len.brown@intel.com,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
pavel@kernel.org, rafael@kernel.org, tuhaowen@uniontech.com,
wusamuel@google.com
Subject: Re: [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
Date: Tue, 9 Sep 2025 10:45:38 +0800 [thread overview]
Message-ID: <20250909024538.15946-1-tuhaowen@uniontech.com> (raw)
In-Reply-To: <CAGETcx_C_UcjjPOfUip=L28P3PWjMvmSc4nZJ5ML=tScUXfk2w@mail.gmail.com>
Hi Saravana,
Thank you for your detailed feedback and suggestions. I appreciate you
taking the time to review my approach. Let me address your concerns and
explain the real-world issues I'm trying to solve.
> This doesn't really fix our issue where we want to abort suspend only
> if we have to stay awake. If there's no need to abort the suspend (to
> deal with some user/wake source request), then it's okay to take the
> time to sync and then suspend. If you abort the suspend, it's just
> going to get attempted again and in fact waste power.
I understand your perspective, but I'm addressing a different class of
problems. The key issue is user experience and system reliability when
sync operations become indefinitely blocked.
> I also don't understand how your patch fixes anything. If anything
> this makes things worse because the user might have expected their
> device to have hibernated only to come back hours later to see their
> battery dead. And even if the user is actively monitoring it, you
> still need to sync the file system fully before you eventually
> suspend. So, this doesn't really save any time or the need to sync.
This highlights exactly the problem I'm trying to solve. When a user
initiates suspend/hibernation, the screen goes black immediately, giving
the impression that the system has successfully entered sleep state.
However, if ksys_sync() is blocked (which can happen in several scenarios
I'll describe below), the system appears to be suspended but is actually
still running at full power consumption in the background.
The user has no way to know the system is stuck in sync - they see a
black screen and assume suspension succeeded. This leads to exactly the
scenario you mentioned: coming back hours later to find the battery dead,
but with the added risk of data corruption since the sync never completed.
> Can you explain the actual real world issue you are trying to fix? If
> it's the UI hanging and not responding to keypress/mouse move, then to
> me it looks like those should be marked as wake sources.
Here are the specific real-world scenarios I've encountered:
1. **Device removal during file operations**: When copying large files
to USB drives and then initiating suspend, if the USB device is
removed during the sync process, some filesystems may not properly
handle the device disappearance. The sync can become indefinitely
blocked waiting for I/O operations on a device that no longer exists.
2. **Faulty storage devices**: Slow or failing storage devices can cause
sync operations to hang for extended periods, making the system appear
unresponsive.
I've observed stack traces like this when the block device is removed during sync:
```
[<0>] __switch_to+0xd0/0x168
[<0>] iterate_supers+0x88/0x118
[<0>] ksys_sync+0x48/0xb8
[<0>] ksys_sync_helper+0x18/0xa0
[<0>] pm_suspend+0x260/0x3e8
```
In this case, ksys_sync() is permanently blocked and will never complete,
even though the user believes the system has suspended.
My timeout approach provides several benefits:
1. **User feedback**: If I set a 1-minute timeout and sync doesn't
complete, the system fails suspend gracefully and returns control to
the user. This gives them clear indication that something went wrong,
rather than leaving them with a black screen and unknown system state.
2. **Prevents false assumptions**: Users won't mistakenly believe their
system is suspended when it's actually consuming full power.
3. **Allows recovery**: Users can investigate the issue, safely eject
devices, or take other corrective actions.
> However, if you are really set on proving the need for a timeout and
> implementing it, you can always add it as a separate patch after our
> patch lands. You can set up a timer to call pm_system_wakeup(). Or
> just grab a wake source after a time period.
I appreciate this suggestion, and I'm certainly willing to coordinate our
approaches. However, I believe the sync timeout addresses a fundamentally
different problem than wakeup-event-based abortion.
Regarding your concern about sync completion after timeout, I want to
clarify that in my updated implementation, I've removed kthread_stop() to
ensure the sync operation continues in the background even after timeout.
This means:
1. The suspend operation fails immediately with a timeout error, giving
the user feedback
2. The sync operation continues running in the background to completion
3. Data integrity is preserved while providing responsive user experience
This approach ensures that we get both the user experience benefits
(immediate feedback) and data safety (background sync completion).
I believe both our approaches serve different but complementary purposes:
- Your approach: Handle cases where wakeup events should abort sync
- My approach: Handle cases where sync becomes pathologically slow or
blocked
Would you be open to discussing how we might integrate both approaches?
I'm happy to work on this as a follow-up patch series after your changes
land, or we could explore a unified solution that handles both scenarios.
> In fact, thinking more about this, you are generally having a problem
> with suspend taking too long to complete. Doesn't matter if it's due
> to file system sync or anything else. In which case, you should just
> use a timer to call pm_system_wakeup() and not fix just file system
> sync (which happens to be the current long pole).
This is an interesting perspective. However, I believe filesystem sync
deserves special attention because:
1. It's currently the most common source of indefinite hangs during
suspend
2. The consequences of interrupted vs. timed-out sync are different
(data integrity)
3. Sync has specific requirements for background completion that other
suspend operations may not have
I'd be very interested in your thoughts on this approach and whether we
can find a path forward that addresses both our use cases. I really
appreciate your patience in this discussion and look forward to
continuing our collaboration on this important issue.
Thanks again for taking the time to review and provide such detailed
feedback.
Best regards,
Haowen Tu
next prev parent reply other threads:[~2025-09-09 2:46 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-08 2:46 Sync timeout mechanisms - Request for coordination tuhaowen
2025-09-09 2:00 ` Saravana Kannan
2025-09-09 2:45 ` tuhaowen [this message]
2025-09-09 6:18 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation Saravana Kannan
2025-09-09 7:13 ` Different approaches to sync timeout: Desktop vs Mobile use cases tuhaowen
2025-09-09 8:51 ` Oliver Neukum
-- strict thread matches above, loose matches on Subject: below --
2025-09-05 2:47 [PATCH] PM: Add configurable sync timeout during suspend and hibernate to prevent hang tuhaowen
2025-09-05 9:24 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
2025-09-05 19:27 ` Rafael J. Wysocki
2025-09-08 2:22 ` tuhaowen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250909024538.15946-1-tuhaowen@uniontech.com \
--to=tuhaowen@uniontech.com \
--cc=huangbibo@uniontech.com \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=pavel@kernel.org \
--cc=rafael@kernel.org \
--cc=saravanak@google.com \
--cc=wusamuel@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox