* Sync timeout mechanisms - Request for coordination
@ 2025-09-08 2:46 tuhaowen
2025-09-09 2:00 ` Saravana Kannan
0 siblings, 1 reply; 9+ messages in thread
From: tuhaowen @ 2025-09-08 2:46 UTC (permalink / raw)
To: wusamuel, saravanak
Cc: rafael, len.brown, pavel, linux-pm, linux-kernel, huangbibo
Hi Samuel and Saravana,
I hope this email finds you well. I'm reaching out regarding the sync
timeout mechanisms for suspend/hibernation that we've both been working on.
Rafael from the upstream kernel has indicated that he would prefer us to
coordinate our approaches rather than having two separate implementations.
He mentioned your patch series "PM: Support aborting suspend during
filesystem sync" and suggested we work together on a unified solution.
I would like to discuss how we can merge our approaches. Below is a
summary of my current implementation:
**My approach - Time-based timeout mechanism:**
- Introduces a configurable timeout for sync operations during both
suspend and hibernation
- Uses kthread with wait_for_completion_timeout() to implement timeout
- Provides sysfs interface /sys/power/sleep_sync_timeout for runtime
configuration
- Default behavior unchanged (timeout disabled by default)
- Addresses scenarios where sync is excessively slow without wakeup events
You can see the detailed implementation and Rafael's feedback at:
https://lore.kernel.org/linux-pm/CAJZ5v0jBRy=CvZiWQQaorvc-zT+kkaB6+S2TErGmkaPAGmHLOQ@mail.gmail.com/
**Key differences I see between our approaches:**
1. Your solution focuses on aborting sync when wakeup events occur
2. My solution addresses cases where there are no wakeup events but sync
is excessively slow (e.g., slow/faulty storage)
3. Your approach uses workqueue + completion, mine uses kthread + timeout
4. Both aim to prevent indefinite hangs but cover different scenarios
**Potential unified approach:**
I believe both mechanisms could complement each other:
- Event-driven abort (your approach) for responsive wakeup handling
- Time-based timeout (my approach) for proactive protection against
slow storage
- Single, unified implementation in kernel/power/main.c
Would you be interested in discussing how we can combine these approaches?
I'm open to:
1. Merging the implementations into a single solution
2. Adopting your workqueue approach with added timeout functionality
3. Any other technical approach you think would work better
Looking forward to your thoughts and collaboration.
Best regards,
Haowen Tu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Sync timeout mechanisms - Request for coordination
2025-09-08 2:46 Sync timeout mechanisms - Request for coordination tuhaowen
@ 2025-09-09 2:00 ` Saravana Kannan
2025-09-09 2:45 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2025-09-09 2:00 UTC (permalink / raw)
To: tuhaowen
Cc: wusamuel, rafael, len.brown, pavel, linux-pm, linux-kernel,
huangbibo
Hi Haowen,
On Sun, Sep 7, 2025 at 7:47 PM tuhaowen <tuhaowen@uniontech.com> wrote:
>
> Hi Samuel and Saravana,
>
> I hope this email finds you well. I'm reaching out regarding the sync
> timeout mechanisms for suspend/hibernation that we've both been working on.
I hope you are well too and thanks for reaching out to us.
> Rafael from the upstream kernel has indicated that he would prefer us to
> coordinate our approaches rather than having two separate implementations.
> He mentioned your patch series "PM: Support aborting suspend during
> filesystem sync" and suggested we work together on a unified solution.
>
> I would like to discuss how we can merge our approaches. Below is a
> summary of my current implementation:
>
> **My approach - Time-based timeout mechanism:**
> - Introduces a configurable timeout for sync operations during both
> suspend and hibernation
> - Uses kthread with wait_for_completion_timeout() to implement timeout
> - Provides sysfs interface /sys/power/sleep_sync_timeout for runtime
> configuration
> - Default behavior unchanged (timeout disabled by default)
> - Addresses scenarios where sync is excessively slow without wakeup events
This doesn't really fix our issue where we want to abort suspend only
if we have to stay awake. If there's no need to abort the suspend (to
deal with some user/wake source request), then it's okay to take the
time to sync and then suspend. If you abort the suspend, it's just
going to get attempted again and in fact waste power.
I also don't understand how your patch fixes anything. If anything
this makes things worse because the user might have expected their
device to have hibernated only to come back hours later to see their
battery dead. And even if the user is actively monitoring it, you
still need to sync the file system fully before you eventually
suspend. So, this doesn't really save any time or the need to sync.
>
> You can see the detailed implementation and Rafael's feedback at:
> https://lore.kernel.org/linux-pm/CAJZ5v0jBRy=CvZiWQQaorvc-zT+kkaB6+S2TErGmkaPAGmHLOQ@mail.gmail.com/
>
> **Key differences I see between our approaches:**
> 1. Your solution focuses on aborting sync when wakeup events occur
> 2. My solution addresses cases where there are no wakeup events but sync
> is excessively slow (e.g., slow/faulty storage)
> 3. Your approach uses workqueue + completion, mine uses kthread + timeout
I don't think the workqueue vs kthread matters -- trivial
implementation detail. The important point is when we want to abort.
> 4. Both aim to prevent indefinite hangs but cover different scenarios
I don't see the point of your change though. Why abort a suspend if
there's no need to wake up? I think whatever use case issue you are
hitting, it's more of an issue of not grabbing a wake source when it's
needed.
Can you explain the actual real world issue you are trying to fix? If
it's the UI hanging and not responding to keypress/mouse move, then to
me it looks like those should be marked as wake sources.
>
> **Potential unified approach:**
> I believe both mechanisms could complement each other:
> - Event-driven abort (your approach) for responsive wakeup handling
> - Time-based timeout (my approach) for proactive protection against
> slow storage
> - Single, unified implementation in kernel/power/main.c
>
> Would you be interested in discussing how we can combine these approaches?
> I'm open to:
> 1. Merging the implementations into a single solution
> 2. Adopting your workqueue approach with added timeout functionality
> 3. Any other technical approach you think would work better
I'm not convinced adding a timeout is the right solution. It just adds
another point of failure and the need for a retry mechanism.
However, if you are really set on proving the need for a timeout and
implementing it, you can always add it as a separate patch after our
patch lands. You can set up a timer to call pm_system_wakeup(). Or
just grab a wake source after a time period.
In fact, thinking more about this, you are generally having a problem
with suspend taking too long to complete. Doesn't matter if it's due
to file system sync or anything else. In which case, you should just
use a timer to call pm_system_wakeup() and not fix just file system
sync (which happens to be the current long pole).
> Looking forward to your thoughts and collaboration.
Hope my response helps.
-Saravana
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
2025-09-09 2:00 ` Saravana Kannan
@ 2025-09-09 2:45 ` tuhaowen
2025-09-09 6:18 ` Saravana Kannan
0 siblings, 1 reply; 9+ messages in thread
From: tuhaowen @ 2025-09-09 2:45 UTC (permalink / raw)
To: saravanak
Cc: huangbibo, len.brown, linux-kernel, linux-pm, pavel, rafael,
tuhaowen, wusamuel
Hi Saravana,
Thank you for your detailed feedback and suggestions. I appreciate you
taking the time to review my approach. Let me address your concerns and
explain the real-world issues I'm trying to solve.
> This doesn't really fix our issue where we want to abort suspend only
> if we have to stay awake. If there's no need to abort the suspend (to
> deal with some user/wake source request), then it's okay to take the
> time to sync and then suspend. If you abort the suspend, it's just
> going to get attempted again and in fact waste power.
I understand your perspective, but I'm addressing a different class of
problems. The key issue is user experience and system reliability when
sync operations become indefinitely blocked.
> I also don't understand how your patch fixes anything. If anything
> this makes things worse because the user might have expected their
> device to have hibernated only to come back hours later to see their
> battery dead. And even if the user is actively monitoring it, you
> still need to sync the file system fully before you eventually
> suspend. So, this doesn't really save any time or the need to sync.
This highlights exactly the problem I'm trying to solve. When a user
initiates suspend/hibernation, the screen goes black immediately, giving
the impression that the system has successfully entered sleep state.
However, if ksys_sync() is blocked (which can happen in several scenarios
I'll describe below), the system appears to be suspended but is actually
still running at full power consumption in the background.
The user has no way to know the system is stuck in sync - they see a
black screen and assume suspension succeeded. This leads to exactly the
scenario you mentioned: coming back hours later to find the battery dead,
but with the added risk of data corruption since the sync never completed.
> Can you explain the actual real world issue you are trying to fix? If
> it's the UI hanging and not responding to keypress/mouse move, then to
> me it looks like those should be marked as wake sources.
Here are the specific real-world scenarios I've encountered:
1. **Device removal during file operations**: When copying large files
to USB drives and then initiating suspend, if the USB device is
removed during the sync process, some filesystems may not properly
handle the device disappearance. The sync can become indefinitely
blocked waiting for I/O operations on a device that no longer exists.
2. **Faulty storage devices**: Slow or failing storage devices can cause
sync operations to hang for extended periods, making the system appear
unresponsive.
I've observed stack traces like this when the block device is removed during sync:
```
[<0>] __switch_to+0xd0/0x168
[<0>] iterate_supers+0x88/0x118
[<0>] ksys_sync+0x48/0xb8
[<0>] ksys_sync_helper+0x18/0xa0
[<0>] pm_suspend+0x260/0x3e8
```
In this case, ksys_sync() is permanently blocked and will never complete,
even though the user believes the system has suspended.
My timeout approach provides several benefits:
1. **User feedback**: If I set a 1-minute timeout and sync doesn't
complete, the system fails suspend gracefully and returns control to
the user. This gives them clear indication that something went wrong,
rather than leaving them with a black screen and unknown system state.
2. **Prevents false assumptions**: Users won't mistakenly believe their
system is suspended when it's actually consuming full power.
3. **Allows recovery**: Users can investigate the issue, safely eject
devices, or take other corrective actions.
> However, if you are really set on proving the need for a timeout and
> implementing it, you can always add it as a separate patch after our
> patch lands. You can set up a timer to call pm_system_wakeup(). Or
> just grab a wake source after a time period.
I appreciate this suggestion, and I'm certainly willing to coordinate our
approaches. However, I believe the sync timeout addresses a fundamentally
different problem than wakeup-event-based abortion.
Regarding your concern about sync completion after timeout, I want to
clarify that in my updated implementation, I've removed kthread_stop() to
ensure the sync operation continues in the background even after timeout.
This means:
1. The suspend operation fails immediately with a timeout error, giving
the user feedback
2. The sync operation continues running in the background to completion
3. Data integrity is preserved while providing responsive user experience
This approach ensures that we get both the user experience benefits
(immediate feedback) and data safety (background sync completion).
I believe both our approaches serve different but complementary purposes:
- Your approach: Handle cases where wakeup events should abort sync
- My approach: Handle cases where sync becomes pathologically slow or
blocked
Would you be open to discussing how we might integrate both approaches?
I'm happy to work on this as a follow-up patch series after your changes
land, or we could explore a unified solution that handles both scenarios.
> In fact, thinking more about this, you are generally having a problem
> with suspend taking too long to complete. Doesn't matter if it's due
> to file system sync or anything else. In which case, you should just
> use a timer to call pm_system_wakeup() and not fix just file system
> sync (which happens to be the current long pole).
This is an interesting perspective. However, I believe filesystem sync
deserves special attention because:
1. It's currently the most common source of indefinite hangs during
suspend
2. The consequences of interrupted vs. timed-out sync are different
(data integrity)
3. Sync has specific requirements for background completion that other
suspend operations may not have
I'd be very interested in your thoughts on this approach and whether we
can find a path forward that addresses both our use cases. I really
appreciate your patience in this discussion and look forward to
continuing our collaboration on this important issue.
Thanks again for taking the time to review and provide such detailed
feedback.
Best regards,
Haowen Tu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
2025-09-09 2:45 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
@ 2025-09-09 6:18 ` Saravana Kannan
2025-09-09 7:13 ` Different approaches to sync timeout: Desktop vs Mobile use cases tuhaowen
0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2025-09-09 6:18 UTC (permalink / raw)
To: tuhaowen
Cc: huangbibo, len.brown, linux-kernel, linux-pm, pavel, rafael,
wusamuel
On Mon, Sep 8, 2025 at 7:46 PM tuhaowen <tuhaowen@uniontech.com> wrote:
>
> Hi Saravana,
>
> Thank you for your detailed feedback and suggestions. I appreciate you
> taking the time to review my approach. Let me address your concerns and
> explain the real-world issues I'm trying to solve.
>
> > This doesn't really fix our issue where we want to abort suspend only
> > if we have to stay awake. If there's no need to abort the suspend (to
> > deal with some user/wake source request), then it's okay to take the
> > time to sync and then suspend. If you abort the suspend, it's just
> > going to get attempted again and in fact waste power.
>
> I understand your perspective, but I'm addressing a different class of
> problems. The key issue is user experience and system reliability when
> sync operations become indefinitely blocked.
>
> > I also don't understand how your patch fixes anything. If anything
> > this makes things worse because the user might have expected their
> > device to have hibernated only to come back hours later to see their
> > battery dead. And even if the user is actively monitoring it, you
> > still need to sync the file system fully before you eventually
> > suspend. So, this doesn't really save any time or the need to sync.
>
> This highlights exactly the problem I'm trying to solve. When a user
> initiates suspend/hibernation, the screen goes black immediately, giving
> the impression that the system has successfully entered sleep state.
> However, if ksys_sync() is blocked (which can happen in several scenarios
> I'll describe below), the system appears to be suspended but is actually
> still running at full power consumption in the background.
>
> The user has no way to know the system is stuck in sync - they see a
> black screen and assume suspension succeeded. This leads to exactly the
> scenario you mentioned: coming back hours later to find the battery dead,
> but with the added risk of data corruption since the sync never completed.
>
> > Can you explain the actual real world issue you are trying to fix? If
> > it's the UI hanging and not responding to keypress/mouse move, then to
> > me it looks like those should be marked as wake sources.
>
> Here are the specific real-world scenarios I've encountered:
>
> 1. **Device removal during file operations**: When copying large files
> to USB drives and then initiating suspend, if the USB device is
> removed during the sync process, some filesystems may not properly
> handle the device disappearance. The sync can become indefinitely
> blocked waiting for I/O operations on a device that no longer exists.
>
> 2. **Faulty storage devices**: Slow or failing storage devices can cause
> sync operations to hang for extended periods, making the system appear
> unresponsive.
>
> I've observed stack traces like this when the block device is removed during sync:
>
> ```
> [<0>] __switch_to+0xd0/0x168
> [<0>] iterate_supers+0x88/0x118
> [<0>] ksys_sync+0x48/0xb8
> [<0>] ksys_sync_helper+0x18/0xa0
> [<0>] pm_suspend+0x260/0x3e8
> ```
This seems like a fundamental sync issue you are trying to paper
over/work around. I think the right fix should be in the ksys_sync()
code.
> In this case, ksys_sync() is permanently blocked and will never complete,
> even though the user believes the system has suspended.
>
> My timeout approach provides several benefits:
>
> 1. **User feedback**: If I set a 1-minute timeout and sync doesn't
> complete, the system fails suspend gracefully and returns control to
> the user. This gives them clear indication that something went wrong,
> rather than leaving them with a black screen and unknown system state.
>
> 2. **Prevents false assumptions**: Users won't mistakenly believe their
> system is suspended when it's actually consuming full power.
>
> 3. **Allows recovery**: Users can investigate the issue, safely eject
> devices, or take other corrective actions.
If the user has closed the laptop, none of these will be noticed by
the user. And the laptop will still consume power since it's not
suspended and cause a dead battery hours later.
> > However, if you are really set on proving the need for a timeout and
> > implementing it, you can always add it as a separate patch after our
> > patch lands. You can set up a timer to call pm_system_wakeup(). Or
> > just grab a wake source after a time period.
>
> I appreciate this suggestion, and I'm certainly willing to coordinate our
> approaches. However, I believe the sync timeout addresses a fundamentally
> different problem than wakeup-event-based abortion.
>
> Regarding your concern about sync completion after timeout, I want to
> clarify that in my updated implementation, I've removed kthread_stop() to
> ensure the sync operation continues in the background even after timeout.
I don't think the kthread_stop() is even correct to do. You are just
randomly killing the sync thread where data structures and locks could
be left in a bad state.
> This means:
>
> 1. The suspend operation fails immediately with a timeout error, giving
> the user feedback
> 2. The sync operation continues running in the background to completion
> 3. Data integrity is preserved while providing responsive user experience
>
> This approach ensures that we get both the user experience benefits
> (immediate feedback) and data safety (background sync completion).
>
> I believe both our approaches serve different but complementary purposes:
> - Your approach: Handle cases where wakeup events should abort sync
> - My approach: Handle cases where sync becomes pathologically slow or
> blocked
I'm still not sold on the timeout being the right solution to your
issue, but I'm not going to block the feature.
> Would you be open to discussing how we might integrate both approaches?
> I'm happy to work on this as a follow-up patch series after your changes
> land, or we could explore a unified solution that handles both scenarios.
I think a follow up patch is the best approach. We'll give you the
APIs to abort as needed (timer in your case) and you can try to
convince the community that the timeout is needed.
>
> > In fact, thinking more about this, you are generally having a problem
> > with suspend taking too long to complete. Doesn't matter if it's due
> > to file system sync or anything else. In which case, you should just
> > use a timer to call pm_system_wakeup() and not fix just file system
> > sync (which happens to be the current long pole).
>
> This is an interesting perspective. However, I believe filesystem sync
> deserves special attention because:
>
> 1. It's currently the most common source of indefinite hangs during
> suspend
This is not a reason for special casing for the file system.
> 2. The consequences of interrupted vs. timed-out sync are different
> (data integrity)
Not really. You do realize that even in our patch we let the sync
continue in the background, right?
> 3. Sync has specific requirements for background completion that other
> suspend operations may not have
Not sure what you mean.
> I'd be very interested in your thoughts on this approach and whether we
> can find a path forward that addresses both our use cases. I really
> appreciate your patience in this discussion and look forward to
> continuing our collaboration on this important issue.
I don't want to block your effort, but I also can't support it. So a
unified patch doesn't seem like the right way to go. As I mentioned
above, I'd recommend building on top of what we land.
> Thanks again for taking the time to review and provide such detailed
> feedback.
Same to you!
Thanks,
Saravana
^ permalink raw reply [flat|nested] 9+ messages in thread
* Different approaches to sync timeout: Desktop vs Mobile use cases
2025-09-09 6:18 ` Saravana Kannan
@ 2025-09-09 7:13 ` tuhaowen
2025-09-09 8:51 ` Oliver Neukum
0 siblings, 1 reply; 9+ messages in thread
From: tuhaowen @ 2025-09-09 7:13 UTC (permalink / raw)
To: pavel, rafael
Cc: saravanak, huangbibo, len.brown, linux-kernel, linux-pm, tuhaowen,
wusamuel
Hi Rafael and Linux PM maintainers,
I hope this email finds you well. I'm writing to discuss the different
approaches that have emerged regarding sync timeout mechanisms during
suspend/hibernation, and to seek guidance on how to proceed.
## Background
We (UnionTech) and the Google Android team (Saravana, Samuel) have been
working on similar but fundamentally different solutions to address sync
hangs during suspend operations. After discussions with Saravana, it has
become clear that our approaches diverge significantly due to different
use cases and environments.
## Our Approach - Desktop/PC Focus
Our patch: https://lore.kernel.org/all/20250909065836.32534-1-tuhaowen@uniontech.com/
**Use Case**: Desktop/laptop systems where users expect immediate feedback
**Problem**: When large file operations are in progress (e.g., copying
files to USB drives) and users initiate suspend, ksys_sync() can hang
indefinitely, especially when block devices are removed during the process.
This results in:
- Black screen with no user feedback
- Users assuming the system has frozen
- Need for forced power cycles
- Poor user experience
**Solution**: Configurable sync timeout (default disabled for compatibility)
- If sync exceeds timeout, suspend fails gracefully with user feedback
- Sync continues in background after system resumes
- User can take corrective action (eject devices, etc.)
- Prevents false impression of system freeze
## Google's Approach - Mobile/Android Focus
Their patch: https://lore.kernel.org/all/CAGETcx8x5G75sQ9zVkxe+BpK7WsEk+LS6KDMd5BBR=xWPMtevg@mail.gmail.com/
**Use Case**: Mobile devices where power consumption is critical
**Problem**: Unnecessary wake-ups during sync operations
**Solution**: Abort sync only when wakeup events occur
- Wait indefinitely for sync if no wakeup sources are active
- Prioritize power efficiency over user responsiveness
- Suitable for devices where black screen is expected behavior
## Key Differences
1. **User Expectations**:
- Desktop: Users expect responsive feedback and ability to recover
- Mobile: Users expect device to "just work" when they next unlock
2. **Hardware Context**:
- Desktop: Removable storage, active file operations, power adapter available
- Mobile: Integrated storage, controlled app lifecycle, battery dependent
3. **Problem Focus**:
- Desktop: Prevent indefinite hangs and provide user feedback
- Mobile: Prevent unnecessary power consumption
4. **Timeout Philosophy**:
- Desktop: Proactive timeout to maintain responsiveness
- Mobile: Reactive abort only when wakeup events require it
## Technical Considerations
Both solutions address legitimate but different problems:
- **Desktop scenario**: User copying large files, removes USB drive, initiates
suspend -> ksys_sync() hangs indefinitely -> black screen -> user thinks
system froze
- **Mobile scenario**: Background sync during suspend -> unnecessary wake-ups
-> battery drain
## Request for Guidance
Given these different environments and requirements, we believe both
approaches have merit for their respective use cases. However, we want
to ensure we follow the kernel community's preferred approach for handling
such divergent requirements.
Would you prefer:
1. **Unified solution**: Combine both approaches in a single patchset
(more complex but comprehensive)
2. **Separate solutions**: Allow both patches to coexist, targeting
different use cases
3. **Staged approach**: Land one solution first, then build upon it
(Saravana suggested this approach)
4. **Different direction**: An alternative approach we haven't considered
## Our Position
We believe the desktop use case requires proactive timeout mechanisms
because:
- Users directly interact with the system and expect feedback
- Removable storage creates unique failure scenarios
- The cost of a failed suspend attempt is much lower than an unresponsive system
- Data integrity is preserved through background sync continuation
We're happy to collaborate on a unified solution, but we also recognize
that desktop and mobile environments may legitimately require different
approaches to the same underlying kernel mechanisms.
Thank you for your time and guidance on this matter. We look forward to
your feedback on how to best serve both desktop and mobile use cases
within the kernel's power management subsystem.
Best regards,
Haowen Tu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Different approaches to sync timeout: Desktop vs Mobile use cases
2025-09-09 7:13 ` Different approaches to sync timeout: Desktop vs Mobile use cases tuhaowen
@ 2025-09-09 8:51 ` Oliver Neukum
0 siblings, 0 replies; 9+ messages in thread
From: Oliver Neukum @ 2025-09-09 8:51 UTC (permalink / raw)
To: tuhaowen, pavel, rafael
Cc: saravanak, huangbibo, len.brown, linux-kernel, linux-pm, wusamuel
Hi,
On 09.09.25 09:13, tuhaowen wrote:
> **Use Case**: Desktop/laptop systems where users expect immediate feedback
> **Problem**: When large file operations are in progress (e.g., copying
> files to USB drives) and users initiate suspend, ksys_sync() can hang
It seems to me like both your approaches seek to mitigate
the consequences of the issue rather than solve it.
Your issue is that you have too many dirty blocks associated
with a slow device.
In other words, you should sync and need to limit the number
of dirty blocks created after you've started syncing without
freezing all of user space.
Regards
Oliver
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] PM: Add configurable sync timeout during suspend and hibernate to prevent hang
@ 2025-09-05 2:47 tuhaowen
2025-09-05 9:24 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
0 siblings, 1 reply; 9+ messages in thread
From: tuhaowen @ 2025-09-05 2:47 UTC (permalink / raw)
To: Rafael J . Wysocki, Len Brown, Pavel Machek
Cc: linux-pm, linux-kernel, tuhaowen, huangbibo
When users initiate suspend (S3) or hibernate (S4) while large file
copy operations are in progress (especially to USB storage devices),
the system can appear to hang with a black screen for an extended period.
This occurs because the filesystem sync in the suspend/hibernate path
blocks until all pending I/O operations complete,
which can take several minutes for large file transfers or slow devices.
This patch introduces an optional, configurable timeout mechanism for
the sync operation during both suspend and hibernate.
If the timeout is reached, the operation is aborted,
and a clear error message is provided to the user, improving user
experience by preventing indefinite system hangs.
The timeout is disabled by default for both suspend and hibernate
to maintain backward compatibility. If enabled, the default
timeout value is 10000 ms (10 seconds), which can be changed
at runtime via module parameters:
- sync_on_suspend_timeout_enable (bool, default: false)
- sync_on_suspend_timeout_ms (uint, default: 10000)
- sync_on_hibernation_timeout_enable (bool, default: false)
- sync_on_hibernation_timeout_ms (uint, default: 10000)
Compared to [PATCH v3 0/3] PM: Support aborting suspend during
filesystem sync (see: https://lore.kernel.org/linux-pm/20250821004237.
2712312-1-wusamuel@google.com/), this patch addresses scenarios where
there may be no wakeup event, but the sync operation is excessively
slow (e.g., due to slow or faulty storage). By introducing a configurable
timeout, it proactively prevents indefinite hangs and improves user
experience in a wider range of real-world cases. The implementation
is also simpler and gives users or system integrators more flexibility
to tune behavior for different devices and requirements.
Signed-off-by: tuhaowen <tuhaowen@uniontech.com>
---
kernel/power/hibernate.c | 55 ++++++++++++++++++++++++++++++++++++++-
kernel/power/suspend.c | 56 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 23c0f4e6c..2c173a0e5 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -33,10 +33,27 @@
#include <linux/ktime.h>
#include <linux/security.h>
#include <linux/secretmem.h>
+#include <linux/moduleparam.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
+#include <linux/jiffies.h>
#include <trace/events/power.h>
#include "power.h"
+/* Sync timeout parameters for hibernation */
+static bool sync_on_hibernation_timeout_enable;
+module_param(sync_on_hibernation_timeout_enable, bool, 0644);
+MODULE_PARM_DESC(sync_on_hibernation_timeout_enable, "Enable sync timeout during hibernation (default: false)");
+
+static unsigned int sync_on_hibernation_timeout_ms = 10000;
+module_param(sync_on_hibernation_timeout_ms, uint, 0644);
+MODULE_PARM_DESC(sync_on_hibernation_timeout_ms, "Sync timeout in milliseconds during hibernation (default: 10000)");
+
+/* Sync timeout implementation for hibernation */
+static struct completion hibernation_sync_completion;
+static struct task_struct *hibernation_sync_task;
+
static int nocompress;
static int noresume;
@@ -97,6 +114,40 @@ bool hibernation_available(void)
!secretmem_active() && !cxl_mem_active();
}
+static int hibernation_sync_thread_func(void *data)
+{
+ ksys_sync_helper();
+ complete(&hibernation_sync_completion);
+ return 0;
+}
+
+static int hibernation_sync_with_timeout(void)
+{
+ unsigned long timeout_jiffies;
+
+ if (!sync_on_hibernation_timeout_enable) {
+ ksys_sync_helper();
+ return 0;
+ }
+
+ init_completion(&hibernation_sync_completion);
+ hibernation_sync_task = kthread_run(hibernation_sync_thread_func, NULL, "hibernation_sync");
+ if (IS_ERR(hibernation_sync_task)) {
+ pr_warn("PM: hibernation: Failed to create sync thread, performing sync directly\n");
+ ksys_sync_helper();
+ return 0;
+ }
+
+ timeout_jiffies = msecs_to_jiffies(sync_on_hibernation_timeout_ms);
+ if (!wait_for_completion_timeout(&hibernation_sync_completion, timeout_jiffies)) {
+ pr_warn("PM: hibernation: Sync operation timed out after %u ms, aborting hibernation\n",
+ sync_on_hibernation_timeout_ms);
+ kthread_stop(hibernation_sync_task);
+ return -ETIMEDOUT;
+ }
+ return 0;
+}
+
/**
* hibernation_set_ops - Set the global hibernate operations.
* @ops: Hibernation operations to use in subsequent hibernation transitions.
@@ -777,7 +828,9 @@ int hibernate(void)
if (error)
goto Restore;
- ksys_sync_helper();
+ error = hibernation_sync_with_timeout();
+ if (error)
+ goto Exit;
error = freeze_processes();
if (error)
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8eaec4ab1..144caf525 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -30,9 +30,25 @@
#include <trace/events/power.h>
#include <linux/compiler.h>
#include <linux/moduleparam.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
+#include <linux/jiffies.h>
#include "power.h"
+/* Sync timeout parameters for suspend */
+static bool sync_on_suspend_timeout_enable;
+module_param(sync_on_suspend_timeout_enable, bool, 0644);
+MODULE_PARM_DESC(sync_on_suspend_timeout_enable, "Enable sync timeout during suspend (default: false)");
+
+static unsigned int sync_on_suspend_timeout_ms = 10000;
+module_param(sync_on_suspend_timeout_ms, uint, 0644);
+MODULE_PARM_DESC(sync_on_suspend_timeout_ms, "Sync timeout in milliseconds during suspend (default: 10000)");
+
+/* Sync timeout implementation for suspend */
+static struct completion suspend_sync_completion;
+static struct task_struct *suspend_sync_task;
+
const char * const pm_labels[] = {
[PM_SUSPEND_TO_IDLE] = "freeze",
[PM_SUSPEND_STANDBY] = "standby",
@@ -61,6 +77,40 @@ static DECLARE_SWAIT_QUEUE_HEAD(s2idle_wait_head);
enum s2idle_states __read_mostly s2idle_state;
static DEFINE_RAW_SPINLOCK(s2idle_lock);
+static int suspend_sync_thread_func(void *data)
+{
+ ksys_sync_helper();
+ complete(&suspend_sync_completion);
+ return 0;
+}
+
+static int suspend_sync_with_timeout(void)
+{
+ unsigned long timeout_jiffies;
+
+ if (!sync_on_suspend_timeout_enable) {
+ ksys_sync_helper();
+ return 0;
+ }
+
+ init_completion(&suspend_sync_completion);
+ suspend_sync_task = kthread_run(suspend_sync_thread_func, NULL, "suspend_sync");
+ if (IS_ERR(suspend_sync_task)) {
+ pr_warn("PM: suspend: Failed to create sync thread, performing sync directly\n");
+ ksys_sync_helper();
+ return 0;
+ }
+
+ timeout_jiffies = msecs_to_jiffies(sync_on_suspend_timeout_ms);
+ if (!wait_for_completion_timeout(&suspend_sync_completion, timeout_jiffies)) {
+ pr_warn("PM: suspend: Sync operation timed out after %u ms, aborting suspend\n",
+ sync_on_suspend_timeout_ms);
+ kthread_stop(suspend_sync_task);
+ return -ETIMEDOUT;
+ }
+ return 0;
+}
+
/**
* pm_suspend_default_s2idle - Check if suspend-to-idle is the default suspend.
*
@@ -585,8 +635,12 @@ static int enter_state(suspend_state_t state)
if (sync_on_suspend_enabled) {
trace_suspend_resume(TPS("sync_filesystems"), 0, true);
- ksys_sync_helper();
+ error = suspend_sync_with_timeout();
trace_suspend_resume(TPS("sync_filesystems"), 0, false);
+ if (error) {
+ pr_err("PM: Sync timeout, aborting suspend\n");
+ goto Unlock;
+ }
}
pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
--
2.20.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
2025-09-05 2:47 [PATCH] PM: Add configurable sync timeout during suspend and hibernate to prevent hang tuhaowen
@ 2025-09-05 9:24 ` tuhaowen
2025-09-05 19:27 ` Rafael J. Wysocki
0 siblings, 1 reply; 9+ messages in thread
From: tuhaowen @ 2025-09-05 9:24 UTC (permalink / raw)
To: len.brown, pavel, rafael; +Cc: tuhaowen, huangbibo, linux-kernel, linux-pm
When large file operations are in progress during system suspend or
hibernation, the ksys_sync() call can hang for extended periods,
leading to unresponsive system behavior. Users copying large files
to USB drives may experience black screen hangs when attempting to
suspend, requiring forced power cycles.
This patch introduces a unified sync timeout mechanism for both
suspend-to-RAM (S3) and hibernation (S4) to prevent indefinite
hangs while maintaining data integrity.
Key features:
- Configurable timeout via sysfs interface
- Default behavior unchanged (timeout disabled by default)
- Unified implementation for both suspend and hibernation paths
- Graceful fallback to direct sync on thread creation failure
Sysfs interface:
- /sys/power/sleep_sync_timeout: Runtime configuration (0-600000ms)
When timeout is enabled and exceeded, the sync operation is aborted
and suspend/hibernation fails gracefully with -ETIMEDOUT, preventing
system hangs.
Implementation creates a separate kthread for sync operations when
timeout is enabled, allowing the main suspend path to maintain
control and abort if necessary.
Compared to [PATCH v3 0/3] PM: Support aborting suspend during
filesystem sync (see: https://lore.kernel.org/linux-pm/20250821004237.
2712312-1-wusamuel@google.com/), this patch addresses scenarios where
there may be no wakeup event, but the sync operation is excessively
slow (e.g., due to slow or faulty storage). By introducing a configurable
timeout, it proactively prevents indefinite hangs and improves user
experience in a wider range of real-world cases. The implementation
is also simpler and gives users or system integrators more flexibility
to tune behavior for different devices and requirements. Additionally,
the ksys_sync_helper_timeout() interface is designed as a reusable
generic function that other kernel subsystems can leverage when they
need sync operations with timeout control, promoting code reuse and
reducing maintenance overhead across the kernel.
Signed-off-by: tuhaowen <tuhaowen@uniontech.com>
---
Changes in v2:
- Unified timeout logic in kernel/power/main.c
- Removed duplicate code from suspend.c and hibernate.c
- Added sysfs interface for runtime configuration
- Changed default to 0 (disabled) for backward compatibility
- Increased maximum timeout to 10 minutes
- Simplified parameter control (removed separate enable flag)
---
include/linux/suspend.h | 3 ++
kernel/power/hibernate.c | 4 +-
kernel/power/main.c | 82 ++++++++++++++++++++++++++++++++++++++++
kernel/power/suspend.c | 6 ++-
4 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index da6ebca3f..976c8f8a1 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -439,6 +439,8 @@ void restore_processor_state(void);
extern int register_pm_notifier(struct notifier_block *nb);
extern int unregister_pm_notifier(struct notifier_block *nb);
extern void ksys_sync_helper(void);
+extern int ksys_sync_helper_timeout(unsigned int timeout_ms);
+extern unsigned int sync_timeout_ms;
extern void pm_report_hw_sleep_time(u64 t);
extern void pm_report_max_hw_sleep(u64 t);
@@ -486,6 +488,7 @@ static inline void pm_report_hw_sleep_time(u64 t) {};
static inline void pm_report_max_hw_sleep(u64 t) {};
static inline void ksys_sync_helper(void) {}
+static inline int ksys_sync_helper_timeout(unsigned int timeout_ms) { return 0; }
#define pm_notifier(fn, pri) do { (void)(fn); } while (0)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 23c0f4e6c..2678181a5 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -777,7 +777,9 @@ int hibernate(void)
if (error)
goto Restore;
- ksys_sync_helper();
+ error = ksys_sync_helper_timeout(sync_timeout_ms);
+ if (error)
+ goto Exit;
error = freeze_processes();
if (error)
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6254814d4..3912f221a 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -17,10 +17,21 @@
#include <linux/suspend.h>
#include <linux/syscalls.h>
#include <linux/pm_runtime.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
+#include <linux/jiffies.h>
#include "power.h"
#ifdef CONFIG_PM_SLEEP
+/* Sync timeout parameters */
+unsigned int sync_timeout_ms;
+EXPORT_SYMBOL_GPL(sync_timeout_ms);
+
+/* Sync timeout implementation */
+static struct completion sync_completion;
+static struct task_struct *sync_task;
+
/*
* The following functions are used by the suspend/hibernate code to temporarily
* change gfp_allowed_mask in order to avoid using I/O during memory allocations
@@ -79,6 +90,45 @@ void ksys_sync_helper(void)
}
EXPORT_SYMBOL_GPL(ksys_sync_helper);
+static int sync_thread_func(void *data)
+{
+ ksys_sync_helper();
+ complete(&sync_completion);
+ return 0;
+}
+
+int ksys_sync_helper_timeout(unsigned int timeout_ms)
+{
+ unsigned long timeout_jiffies;
+
+ /* If timeout is 0, use regular sync without timeout */
+ if (timeout_ms == 0) {
+ ksys_sync_helper();
+ return 0;
+ }
+
+ init_completion(&sync_completion);
+ sync_task = kthread_run(sync_thread_func, NULL, "sync_timeout");
+ if (IS_ERR(sync_task)) {
+ pr_warn("%s: Failed to create sync thread, performing sync directly\n",
+ __func__);
+ ksys_sync_helper();
+ return 0;
+ }
+
+ timeout_jiffies = msecs_to_jiffies(timeout_ms);
+ if (!wait_for_completion_timeout(&sync_completion, timeout_jiffies)) {
+ pr_warn("%s: Sync operation timed out after %u ms, aborting\n",
+ __func__, timeout_ms);
+ kthread_stop(sync_task);
+ return -ETIMEDOUT;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ksys_sync_helper_timeout);
+
+
+
/* Routines for PM-transition notifications */
static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
@@ -240,6 +290,37 @@ static ssize_t sync_on_suspend_store(struct kobject *kobj,
}
power_attr(sync_on_suspend);
+
+/*
+ * sleep_sync_timeout: configure sync timeout during suspend/hibernation.
+ *
+ * show() returns the current sync timeout in milliseconds.
+ * store() accepts timeout value in milliseconds. 0 disables timeout.
+ */
+static ssize_t sleep_sync_timeout_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%u\n", sync_timeout_ms);
+}
+
+static ssize_t sleep_sync_timeout_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ unsigned long val;
+
+ if (kstrtoul(buf, 10, &val))
+ return -EINVAL;
+
+ /* Allow any reasonable timeout value */
+ if (val > 600000) /* Max 10 minutes */
+ return -EINVAL;
+
+ sync_timeout_ms = val;
+ return n;
+}
+
+power_attr(sleep_sync_timeout);
#endif /* CONFIG_SUSPEND */
#ifdef CONFIG_PM_SLEEP_DEBUG
@@ -974,6 +1055,7 @@ static struct attribute * g[] = {
#ifdef CONFIG_SUSPEND
&mem_sleep_attr.attr,
&sync_on_suspend_attr.attr,
+ &sleep_sync_timeout_attr.attr,
#endif
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8eaec4ab1..4f8015a75 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -585,8 +585,12 @@ static int enter_state(suspend_state_t state)
if (sync_on_suspend_enabled) {
trace_suspend_resume(TPS("sync_filesystems"), 0, true);
- ksys_sync_helper();
+ error = ksys_sync_helper_timeout(sync_timeout_ms);
trace_suspend_resume(TPS("sync_filesystems"), 0, false);
+ if (error) {
+ pr_err("PM: Sync timeout, aborting suspend\n");
+ goto Unlock;
+ }
}
pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
--
2.20.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
2025-09-05 9:24 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
@ 2025-09-05 19:27 ` Rafael J. Wysocki
2025-09-08 2:22 ` tuhaowen
0 siblings, 1 reply; 9+ messages in thread
From: Rafael J. Wysocki @ 2025-09-05 19:27 UTC (permalink / raw)
To: tuhaowen
Cc: len.brown, pavel, rafael, huangbibo, linux-kernel, linux-pm,
Saravana Kannan, Samuel Wu, Cc: Android Kernel
On Fri, Sep 5, 2025 at 11:25 AM tuhaowen <tuhaowen@uniontech.com> wrote:
>
> When large file operations are in progress during system suspend or
> hibernation, the ksys_sync() call can hang for extended periods,
> leading to unresponsive system behavior. Users copying large files
> to USB drives may experience black screen hangs when attempting to
> suspend, requiring forced power cycles.
>
> This patch introduces a unified sync timeout mechanism for both
> suspend-to-RAM (S3) and hibernation (S4) to prevent indefinite
> hangs while maintaining data integrity.
>
> Key features:
> - Configurable timeout via sysfs interface
> - Default behavior unchanged (timeout disabled by default)
> - Unified implementation for both suspend and hibernation paths
> - Graceful fallback to direct sync on thread creation failure
>
> Sysfs interface:
> - /sys/power/sleep_sync_timeout: Runtime configuration (0-600000ms)
>
> When timeout is enabled and exceeded, the sync operation is aborted
> and suspend/hibernation fails gracefully with -ETIMEDOUT, preventing
> system hangs.
>
> Implementation creates a separate kthread for sync operations when
> timeout is enabled, allowing the main suspend path to maintain
> control and abort if necessary.
>
> Compared to [PATCH v3 0/3] PM: Support aborting suspend during
> filesystem sync (see: https://lore.kernel.org/linux-pm/20250821004237.
> 2712312-1-wusamuel@google.com/), this patch addresses scenarios where
> there may be no wakeup event, but the sync operation is excessively
> slow (e.g., due to slow or faulty storage). By introducing a configurable
> timeout, it proactively prevents indefinite hangs and improves user
> experience in a wider range of real-world cases. The implementation
> is also simpler and gives users or system integrators more flexibility
> to tune behavior for different devices and requirements. Additionally,
> the ksys_sync_helper_timeout() interface is designed as a reusable
> generic function that other kernel subsystems can leverage when they
> need sync operations with timeout control, promoting code reuse and
> reducing maintenance overhead across the kernel.
You need to talk to the authors of the series mentioned above (now
CCed) and come up with a common approach. I have no strong preference
and I'm not going to choose one over the other unless I'm told by
everybody interested that this is the way to go.
I personally think that syncing filesystems during system suspend, in
contrast with hibernation, is rather pointless and hibernation users
can be expected to be sufficiently patient.
There's already /sys/power/sync_on_suspend, so why not use it to
disable the sync on suspend altogether?
> Signed-off-by: tuhaowen <tuhaowen@uniontech.com>
> ---
> Changes in v2:
> - Unified timeout logic in kernel/power/main.c
> - Removed duplicate code from suspend.c and hibernate.c
> - Added sysfs interface for runtime configuration
> - Changed default to 0 (disabled) for backward compatibility
> - Increased maximum timeout to 10 minutes
> - Simplified parameter control (removed separate enable flag)
> ---
> include/linux/suspend.h | 3 ++
> kernel/power/hibernate.c | 4 +-
> kernel/power/main.c | 82 ++++++++++++++++++++++++++++++++++++++++
> kernel/power/suspend.c | 6 ++-
> 4 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index da6ebca3f..976c8f8a1 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -439,6 +439,8 @@ void restore_processor_state(void);
> extern int register_pm_notifier(struct notifier_block *nb);
> extern int unregister_pm_notifier(struct notifier_block *nb);
> extern void ksys_sync_helper(void);
> +extern int ksys_sync_helper_timeout(unsigned int timeout_ms);
> +extern unsigned int sync_timeout_ms;
> extern void pm_report_hw_sleep_time(u64 t);
> extern void pm_report_max_hw_sleep(u64 t);
>
> @@ -486,6 +488,7 @@ static inline void pm_report_hw_sleep_time(u64 t) {};
> static inline void pm_report_max_hw_sleep(u64 t) {};
>
> static inline void ksys_sync_helper(void) {}
> +static inline int ksys_sync_helper_timeout(unsigned int timeout_ms) { return 0; }
>
> #define pm_notifier(fn, pri) do { (void)(fn); } while (0)
>
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 23c0f4e6c..2678181a5 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -777,7 +777,9 @@ int hibernate(void)
> if (error)
> goto Restore;
>
> - ksys_sync_helper();
> + error = ksys_sync_helper_timeout(sync_timeout_ms);
> + if (error)
> + goto Exit;
>
> error = freeze_processes();
> if (error)
> diff --git a/kernel/power/main.c b/kernel/power/main.c
> index 6254814d4..3912f221a 100644
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -17,10 +17,21 @@
> #include <linux/suspend.h>
> #include <linux/syscalls.h>
> #include <linux/pm_runtime.h>
> +#include <linux/completion.h>
> +#include <linux/kthread.h>
> +#include <linux/jiffies.h>
>
> #include "power.h"
>
> #ifdef CONFIG_PM_SLEEP
> +/* Sync timeout parameters */
> +unsigned int sync_timeout_ms;
> +EXPORT_SYMBOL_GPL(sync_timeout_ms);
> +
> +/* Sync timeout implementation */
> +static struct completion sync_completion;
> +static struct task_struct *sync_task;
> +
> /*
> * The following functions are used by the suspend/hibernate code to temporarily
> * change gfp_allowed_mask in order to avoid using I/O during memory allocations
> @@ -79,6 +90,45 @@ void ksys_sync_helper(void)
> }
> EXPORT_SYMBOL_GPL(ksys_sync_helper);
>
> +static int sync_thread_func(void *data)
> +{
> + ksys_sync_helper();
> + complete(&sync_completion);
> + return 0;
> +}
> +
> +int ksys_sync_helper_timeout(unsigned int timeout_ms)
> +{
> + unsigned long timeout_jiffies;
> +
> + /* If timeout is 0, use regular sync without timeout */
> + if (timeout_ms == 0) {
> + ksys_sync_helper();
> + return 0;
> + }
> +
> + init_completion(&sync_completion);
> + sync_task = kthread_run(sync_thread_func, NULL, "sync_timeout");
> + if (IS_ERR(sync_task)) {
> + pr_warn("%s: Failed to create sync thread, performing sync directly\n",
> + __func__);
> + ksys_sync_helper();
> + return 0;
> + }
> +
> + timeout_jiffies = msecs_to_jiffies(timeout_ms);
> + if (!wait_for_completion_timeout(&sync_completion, timeout_jiffies)) {
> + pr_warn("%s: Sync operation timed out after %u ms, aborting\n",
> + __func__, timeout_ms);
> + kthread_stop(sync_task);
> + return -ETIMEDOUT;
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ksys_sync_helper_timeout);
> +
> +
> +
> /* Routines for PM-transition notifications */
>
> static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
> @@ -240,6 +290,37 @@ static ssize_t sync_on_suspend_store(struct kobject *kobj,
> }
>
> power_attr(sync_on_suspend);
> +
> +/*
> + * sleep_sync_timeout: configure sync timeout during suspend/hibernation.
> + *
> + * show() returns the current sync timeout in milliseconds.
> + * store() accepts timeout value in milliseconds. 0 disables timeout.
> + */
> +static ssize_t sleep_sync_timeout_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + return sysfs_emit(buf, "%u\n", sync_timeout_ms);
> +}
> +
> +static ssize_t sleep_sync_timeout_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + unsigned long val;
> +
> + if (kstrtoul(buf, 10, &val))
> + return -EINVAL;
> +
> + /* Allow any reasonable timeout value */
> + if (val > 600000) /* Max 10 minutes */
> + return -EINVAL;
> +
> + sync_timeout_ms = val;
> + return n;
> +}
> +
> +power_attr(sleep_sync_timeout);
> #endif /* CONFIG_SUSPEND */
>
> #ifdef CONFIG_PM_SLEEP_DEBUG
> @@ -974,6 +1055,7 @@ static struct attribute * g[] = {
> #ifdef CONFIG_SUSPEND
> &mem_sleep_attr.attr,
> &sync_on_suspend_attr.attr,
> + &sleep_sync_timeout_attr.attr,
> #endif
> #ifdef CONFIG_PM_AUTOSLEEP
> &autosleep_attr.attr,
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 8eaec4ab1..4f8015a75 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -585,8 +585,12 @@ static int enter_state(suspend_state_t state)
>
> if (sync_on_suspend_enabled) {
> trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> - ksys_sync_helper();
> + error = ksys_sync_helper_timeout(sync_timeout_ms);
> trace_suspend_resume(TPS("sync_filesystems"), 0, false);
> + if (error) {
> + pr_err("PM: Sync timeout, aborting suspend\n");
> + goto Unlock;
> + }
> }
>
> pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation
2025-09-05 19:27 ` Rafael J. Wysocki
@ 2025-09-08 2:22 ` tuhaowen
0 siblings, 0 replies; 9+ messages in thread
From: tuhaowen @ 2025-09-08 2:22 UTC (permalink / raw)
To: rafael
Cc: huangbibo, kernel-team, len.brown, linux-kernel, linux-pm, pavel,
saravanak, tuhaowen, wusamuel
On Fri, Sep 5, 2025 at 9:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Sep 5, 2025 at 11:25 AM tuhaowen <tuhaowen@uniontech.com> wrote:
> >
> > When large file operations are in progress during system suspend or
> > hibernation, the ksys_sync() call can hang for extended periods,
> > leading to unresponsive system behavior. Users copying large files
> > to USB drives may experience black screen hangs when attempting to
> > suspend, requiring forced power cycles.
>
> You need to talk to the authors of the series mentioned above (now
> CCed) and come up with a common approach. I have no strong preference
> and I'm not going to choose one over the other unless I'm told by
> everybody interested that this is the way to go.
>
> I personally think that syncing filesystems during system suspend, in
> contrast with hibernation, is rather pointless and hibernation users
> can be expected to be sufficiently patient.
>
> There's already /sys/power/sync_on_suspend, so why not use it to
> disable the sync on suspend altogether?
Hi Rafael,
Thank you for your feedback on the sync timeout patch.
Regarding your suggestion to use /sys/power/sync_on_suspend to disable sync
altogether, I have concerns about this approach. When resuming from S3
suspend or hibernation, USB and other removable block devices often
disconnect and get rescanned by the kernel. If we disable sync completely
before suspend, any pending writes in the page cache will be lost when
these devices disconnect during the sleep cycle. This can lead to
filesystem metadata corruption, partition table corruption on USB drives,
and data loss for users who were writing files before suspend.
No major Linux distribution dares to enable the sync_on_suspend disable
option by default precisely because of these data corruption risks.
Regarding coordination with the patch series mentioned above, I understand
the need for alignment and I'm actively working on this. I have initiated
discussions with Samuel Wu and the Google team to explore how we can merge
our approaches into a unified solution.
I will coordinate with them to present a unified patch series that
incorporates the best aspects of both approaches.
Best regards,
Haowen Tu
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-09-09 8:51 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-08 2:46 Sync timeout mechanisms - Request for coordination tuhaowen
2025-09-09 2:00 ` Saravana Kannan
2025-09-09 2:45 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
2025-09-09 6:18 ` Saravana Kannan
2025-09-09 7:13 ` Different approaches to sync timeout: Desktop vs Mobile use cases tuhaowen
2025-09-09 8:51 ` Oliver Neukum
-- strict thread matches above, loose matches on Subject: below --
2025-09-05 2:47 [PATCH] PM: Add configurable sync timeout during suspend and hibernate to prevent hang tuhaowen
2025-09-05 9:24 ` [PATCH v2] PM: Add configurable sync timeout for suspend and hibernation tuhaowen
2025-09-05 19:27 ` Rafael J. Wysocki
2025-09-08 2:22 ` tuhaowen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox