[PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
@ 2025-10-16  9:42 Qu Wenruo
  2025-10-16  9:42 ` [PATCH v2 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  9:42 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Remove copy-pasted comments that are too obvious in the first place
  Also remove a stale comment in the old code.

- Add extra explanation on why both fs and process freezing need to be
  checked
  Mostly due to the configurable behavior of pm suspension/hiberation,
  thus we have to handle both cases.

It's a long known bug that when scrub/dev-replace is running, power
management suspension will time out and fail.

After more debugging and helps from Askar Safin, it turns out there are
at least 3 components involved:

- Process freezing
  This is at the preparation for suspension, which requires all user
  space processes (and some kthreads) to be frozen, which requires the
  process return to user space.

  Thus if the process (normally btrfs command) is falling into a long
  running ioctl (like scrub/dev-replace) it will not be frozen thus
  breaking the pm suspension.

  This mean paused scrub is not feasible, as paused scrub will still
  make the ioctl executing process trapped inside kernel space.

- Filesystem freezing
  It's an optional behavior during pm suspension, previously I submitted
  one patch detecting such situation, and so far it works as expected.
  But this fs freezing is only optional, not yet default behavior of pm
  suspension.

- Systemd slice freezing
  This is the most complex part that I have not yet fully pinned down,
  but during the tests it looks like systemd is sending some signals to
  the processes under the user slice.

  Thus if the process is falling into the kernel for a long time, it will
  not return to the user space and no chance to handle the signal.

To address all those problems, the series will:

- Add extra cancel/pause/removed bg checks for raid56 parity stripes
  Mostly to reduce delay for RAID56 cases, and make the behavior more
  consistent.

- Cancel the scrub if the fs or process is being frozen
  Please note that here we have to check both fs and process freezing,
  please refer to the changelog and comment of the second patch for the
  reason.

- Cancel the scrub if there is a pending signal
  This is mostly for the systemd slice handling, which affects users
  running the scrub inside a user slice. This can cause an obvious
  delay during pm suspension/hiberation and power off/restart.

Qu Wenruo (3):
  btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity
    stripes
  btrfs: scrub: cancel the run if the process or fs is being frozen
  btrfs: scrub: cancel the run if there is a pending signal

 fs/btrfs/scrub.c | 64 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 56 insertions(+), 8 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes
  2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
@ 2025-10-16  9:42 ` Qu Wenruo
  2025-10-16  9:42 ` [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  9:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

For raid56, data and parity stripes are handled differently.

For data stripes they are handled just like regular RAID1/RAID10 stripes,
going through the regular scrub_simple_mirror().

But for parity stripes we have to read out all involved data stripes and
do any needed verification and repair, then scrub the parity stripe.

This process will take a much longer time than a regular stripe, but
unlike scrub_simple_mirror(), we do not check if we should cancel/pause
or the block group is already removed.

Aligned the behavior of scrub_raid56_parity_stripe() to
scrub_simple_mirror(), by adding:

- Cancel check
- Pause check
- Removed block group check

Since those checks are the same from the scrub_simple_mirror(), also
update the comments of scrub_simple_mirror() by:

- Remove too obvious comments
  We do not need extra comments on what we're checking, it's really too
  obvious.

- Remove a stale comment about pausing
  Now the scrub is always queuing all involved stripes, and submit them
  in one go, there is no more submission part during pausing.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index fe266785804e..c3e7e543d350 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2091,6 +2091,20 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 
 	ASSERT(sctx->raid56_data_stripes);
 
+	if (atomic_read(&fs_info->scrub_cancel_req) ||
+	    atomic_read(&sctx->cancel_req))
+		return -ECANCELED;
+
+	if (atomic_read(&fs_info->scrub_pause_req))
+		scrub_blocked_if_needed(fs_info);
+
+	spin_lock(&bg->lock);
+	if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
+		spin_unlock(&bg->lock);
+		return 0;
+	}
+	spin_unlock(&bg->lock);
+
 	/*
 	 * For data stripe search, we cannot reuse the same extent/csum paths,
 	 * as the data stripe bytenr may be smaller than previous extent.  Thus
@@ -2261,18 +2275,15 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		u64 found_logical = U64_MAX;
 		u64 cur_physical = physical + cur_logical - logical_start;
 
-		/* Canceled? */
 		if (atomic_read(&fs_info->scrub_cancel_req) ||
 		    atomic_read(&sctx->cancel_req)) {
 			ret = -ECANCELED;
 			break;
 		}
-		/* Paused? */
-		if (atomic_read(&fs_info->scrub_pause_req)) {
-			/* Push queued extents */
+
+		if (atomic_read(&fs_info->scrub_pause_req))
 			scrub_blocked_if_needed(fs_info);
-		}
-		/* Block group removed? */
+
 		spin_lock(&bg->lock);
 		if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
 			spin_unlock(&bg->lock);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
  2025-10-16  9:42 ` [PATCH v2 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
@ 2025-10-16  9:42 ` Qu Wenruo
  2025-10-16 10:02   ` Filipe Manana
  2025-10-16  9:42 ` [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  9:42 UTC (permalink / raw)
  To: linux-btrfs

It's a known bug that btrfs scrub/dev-replace can prevent the system
from suspending.

There are at least two factors involved:

- Holding super_block::s_writers for the whole scrub/dev-replace
  duration
  We hold that percpu rw semaphore through mnt_want_write_file() for the
  whole scrub/dev-replace duration.

  That will prevent the fs being frozen, which can be initiated by
  either the user (e.g. fsfreeze) or power management suspension/hiberation.

- Stuck in the kernel space for a long time
  During suspension all user processes (and some kernel threads) will
  be frozen.
  But if a user space progress has fallen into kernel (scrub ioctl) and
  do not return for a long time, it will make process freezing time out.

  Unfortunately scrub/dev-replace is a long running ioctl, and it will
  prevent the btrfs process from returning to the user space, thus make pm
  suspension/hiberation time out.

Address them in one go:

- Introduce a new helper should_cancel_scrub()
  Which includes the existing cancel request and new fs/process freezing
  checks.

  Here we have to check both fs and process freezing for pm
  suspension/hiberation.

  Pm can be configured to freeze fses before processes.
  (The current default is not to freeze fses, but planned to freeze the
  fses as the new default)

  Checking only fs freezing will fail pm without fs freezing, as the
  process freezing will time out.

  Checking only process freezing will fail pm with fs freezing since the
  fs freezing happens before process freezing.

- Cancel the run if should_cancel_scrub() is true
  Unfortunately canceling is the only feasible solution here, pausing is
  not possible as we will still stay in the kernel space thus will still
  prevent the process from being frozen.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index c3e7e543d350..214285b216ec 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2069,6 +2069,38 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
 	return 0;
 }
 
+static bool should_cancel_scrub(struct scrub_ctx *sctx)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+
+	if (atomic_read(&fs_info->scrub_cancel_req) ||
+	    atomic_read(&sctx->cancel_req))
+		return true;
+
+	/*
+	 * The user (e.g. fsfreeze command) or power management (pm)
+	 * suspension/hibernation can freeze the fs.
+	 * And pm suspension/hibernation will also freeze all user processes.
+	 *
+	 * A process can only be frozen when it is in the user space, thus we
+	 * have to cancel the run so that the process can return to the user
+	 * space.
+	 *
+	 * Furthermore we have to check both fs and process freezing, as pm can
+	 * be configured to freeze the fses before processes.
+	 *
+	 * If we only check fs freezing, then suspension without fs freezing
+	 * will timeout, as the process is still in the kernel space.
+	 *
+	 * If we only check process freezing, then suspension with fs freezing
+	 * will timeout, as the running scrub will prevent the fs from being frozen.
+	 */
+	if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
+	    freezing(current))
+		return true;
+	return false;
+}
+
 static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 				      struct btrfs_device *scrub_dev,
 				      struct btrfs_block_group *bg,
@@ -2091,8 +2123,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 
 	ASSERT(sctx->raid56_data_stripes);
 
-	if (atomic_read(&fs_info->scrub_cancel_req) ||
-	    atomic_read(&sctx->cancel_req))
+	if (should_cancel_scrub(sctx))
 		return -ECANCELED;
 
 	if (atomic_read(&fs_info->scrub_pause_req))
@@ -2275,8 +2306,7 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		u64 found_logical = U64_MAX;
 		u64 cur_physical = physical + cur_logical - logical_start;
 
-		if (atomic_read(&fs_info->scrub_cancel_req) ||
-		    atomic_read(&sctx->cancel_req)) {
+		if (should_cancel_scrub(sctx)) {
 			ret = -ECANCELED;
 			break;
 		}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal
  2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
  2025-10-16  9:42 ` [PATCH v2 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
  2025-10-16  9:42 ` [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
@ 2025-10-16  9:42 ` Qu Wenruo
  2025-10-17  9:47   ` Askar Safin
  2025-10-17  9:39 ` [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
  2025-10-18  4:16 ` Askar Safin
  4 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  9:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

Unlike relocation, scrub never checks pending signals, and even for
relocation is only explicitly checking for fatal signal (SIGKILL), not
for regular ones.

Thankfully relocation can still be interrupted by regular signals by
the usage of wait_on_bit(), which is interruptible by regular signals.

So add the proper signal pending checks for scrub, by checking all
pending signals, and if we hit one signal cancel the scrub run since the
signal handling can only be done in the user space.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 214285b216ec..eec76055d245 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2098,6 +2098,13 @@ static bool should_cancel_scrub(struct scrub_ctx *sctx)
 	if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
 	    freezing(current))
 		return true;
+	/*
+	 * Signal handling is also done in user space, so we have to return
+	 * to user space by canceling the run when there is a pending signal.
+	 * (including non-fatal ones like SIGINT)
+	 */
+	if (signal_pending(current))
+		return true;
 	return false;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  9:42 ` [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
@ 2025-10-16 10:02   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2025-10-16 10:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 16, 2025 at 10:43 AM Qu Wenruo <wqu@suse.com> wrote:
>
> It's a known bug that btrfs scrub/dev-replace can prevent the system
> from suspending.
>
> There are at least two factors involved:
>
> - Holding super_block::s_writers for the whole scrub/dev-replace
>   duration
>   We hold that percpu rw semaphore through mnt_want_write_file() for the
>   whole scrub/dev-replace duration.
>
>   That will prevent the fs being frozen, which can be initiated by
>   either the user (e.g. fsfreeze) or power management suspension/hiberation.
>
> - Stuck in the kernel space for a long time
>   During suspension all user processes (and some kernel threads) will
>   be frozen.
>   But if a user space progress has fallen into kernel (scrub ioctl) and
>   do not return for a long time, it will make process freezing time out.
>
>   Unfortunately scrub/dev-replace is a long running ioctl, and it will
>   prevent the btrfs process from returning to the user space, thus make pm
>   suspension/hiberation time out.
>
> Address them in one go:
>
> - Introduce a new helper should_cancel_scrub()
>   Which includes the existing cancel request and new fs/process freezing
>   checks.
>
>   Here we have to check both fs and process freezing for pm
>   suspension/hiberation.
>
>   Pm can be configured to freeze fses before processes.
>   (The current default is not to freeze fses, but planned to freeze the
>   fses as the new default)
>
>   Checking only fs freezing will fail pm without fs freezing, as the
>   process freezing will time out.
>
>   Checking only process freezing will fail pm with fs freezing since the
>   fs freezing happens before process freezing.
>
> - Cancel the run if should_cancel_scrub() is true
>   Unfortunately canceling is the only feasible solution here, pausing is
>   not possible as we will still stay in the kernel space thus will still
>   prevent the process from being frozen.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index c3e7e543d350..214285b216ec 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2069,6 +2069,38 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
>         return 0;
>  }
>
> +static bool should_cancel_scrub(struct scrub_ctx *sctx)

I think it can be made const.

Anyway, it looks good.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Also, don't forget the Reported-by tag and Link when pushing this.

Thanks.

> +{
> +       struct btrfs_fs_info *fs_info = sctx->fs_info;
> +
> +       if (atomic_read(&fs_info->scrub_cancel_req) ||
> +           atomic_read(&sctx->cancel_req))
> +               return true;
> +
> +       /*
> +        * The user (e.g. fsfreeze command) or power management (pm)
> +        * suspension/hibernation can freeze the fs.
> +        * And pm suspension/hibernation will also freeze all user processes.
> +        *
> +        * A process can only be frozen when it is in the user space, thus we
> +        * have to cancel the run so that the process can return to the user
> +        * space.
> +        *
> +        * Furthermore we have to check both fs and process freezing, as pm can
> +        * be configured to freeze the fses before processes.
> +        *
> +        * If we only check fs freezing, then suspension without fs freezing
> +        * will timeout, as the process is still in the kernel space.
> +        *
> +        * If we only check process freezing, then suspension with fs freezing
> +        * will timeout, as the running scrub will prevent the fs from being frozen.
> +        */
> +       if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
> +           freezing(current))
> +               return true;
> +       return false;
> +}
> +
>  static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>                                       struct btrfs_device *scrub_dev,
>                                       struct btrfs_block_group *bg,
> @@ -2091,8 +2123,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>
>         ASSERT(sctx->raid56_data_stripes);
>
> -       if (atomic_read(&fs_info->scrub_cancel_req) ||
> -           atomic_read(&sctx->cancel_req))
> +       if (should_cancel_scrub(sctx))
>                 return -ECANCELED;
>
>         if (atomic_read(&fs_info->scrub_pause_req))
> @@ -2275,8 +2306,7 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
>                 u64 found_logical = U64_MAX;
>                 u64 cur_physical = physical + cur_logical - logical_start;
>
> -               if (atomic_read(&fs_info->scrub_cancel_req) ||
> -                   atomic_read(&sctx->cancel_req)) {
> +               if (should_cancel_scrub(sctx)) {
>                         ret = -ECANCELED;
>                         break;
>                 }
> --
> 2.51.0
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
                   ` (2 preceding siblings ...)
  2025-10-16  9:42 ` [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
@ 2025-10-17  9:39 ` Askar Safin
  2025-10-17 13:03   ` Askar Safin
  2025-10-18  4:16 ` Askar Safin
  4 siblings, 1 reply; 13+ messages in thread
From: Askar Safin @ 2025-10-17  9:39 UTC (permalink / raw)
  To: wqu; +Cc: linux-btrfs

Qu Wenruo <wqu@suse.com>:
> - Systemd slice freezing
>   This is the most complex part that I have not yet fully pinned down,
>   but during the tests it looks like systemd is sending some signals to
>   the processes under the user slice.

Systemd doesn't send any signals. First it freezes user slices using
cgroup freezer. This happens here:

https://github.com/systemd/systemd/blob/28aa0a1f25470b21fbe3b8af2d9d99e616c000a4/src/sleep/sleep.c#L665

Ultimately this code does cgroup freezing, see "cgroup.freeze" in
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html .

Then systemd initiates actual kernel-based suspend.

But it is still useful to catch signals, because systemd does
send signals when we poweroff.

(Please, CC me, if you send future versions of this patchset.)

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal
  2025-10-16  9:42 ` [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
@ 2025-10-17  9:47   ` Askar Safin
  2025-10-17 10:34     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Askar Safin @ 2025-10-17  9:47 UTC (permalink / raw)
  To: wqu; +Cc: fdmanana, linux-btrfs

Qu Wenruo <wqu@suse.com>:
> +	/*
> +	 * Signal handling is also done in user space, so we have to return
> +	 * to user space by canceling the run when there is a pending signal.
> +	 * (including non-fatal ones like SIGINT)
> +	 */

SIGINT is bad example here. "btrfs scrub" somehow is already able to catch
SIGINT on unpatched kernel.

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal
  2025-10-17  9:47   ` Askar Safin
@ 2025-10-17 10:34     ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-17 10:34 UTC (permalink / raw)
  To: Askar Safin, wqu; +Cc: fdmanana, linux-btrfs

在 2025/10/17 20:17, Askar Safin 写道:
> Qu Wenruo <wqu@suse.com>:
>> +	/*
>> +	 * Signal handling is also done in user space, so we have to return
>> +	 * to user space by canceling the run when there is a pending signal.
>> +	 * (including non-fatal ones like SIGINT)
>> +	 */
> 
> SIGINT is bad example here. "btrfs scrub" somehow is already able to catch
> SIGINT on unpatched kernel.
> 

I can not find any kernel code that checks the signal before this patchset.

But btrfs-progs are no longer doing straightforward single ioctl 
anymore, even with -B option it will still start one thread for each 
device to do the ioctl.

Thus there is always the main thread which is not falling into the 
kernel space and can handle the signal.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-17  9:39 ` [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
@ 2025-10-17 13:03   ` Askar Safin
  0 siblings, 0 replies; 13+ messages in thread
From: Askar Safin @ 2025-10-17 13:03 UTC (permalink / raw)
  To: safinaskar; +Cc: linux-btrfs, wqu

I did some more testing. I found that when systemd freezes processes using
cgroup freezer, then:
- fatal_signal_pending(current) returns false
- freezing(current) returns false
- signal_pending(current) returns true

So, systemd cgroup freezer indeed looks very similar to signal.

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
                   ` (3 preceding siblings ...)
  2025-10-17  9:39 ` [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
@ 2025-10-18  4:16 ` Askar Safin
  2025-10-18  4:26   ` Qu Wenruo
  4 siblings, 1 reply; 13+ messages in thread
From: Askar Safin @ 2025-10-18  4:16 UTC (permalink / raw)
  To: wqu; +Cc: linux-btrfs

Also, please, add "Fixes" and cc stable to these patches. They should be
backported to stable.

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-18  4:16 ` Askar Safin
@ 2025-10-18  4:26   ` Qu Wenruo
  2025-10-18  6:43     ` Askar Safin
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-18  4:26 UTC (permalink / raw)
  To: Askar Safin; +Cc: linux-btrfs



在 2025/10/18 14:46, Askar Safin 写道:
> Also, please, add "Fixes" and cc stable to these patches. They should be
> backported to stable.
> 

Unfortunately it's not a fix, but a behavior change.
Previously we put priority beyond pm, but not anymore.

So no backport to stable.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-18  4:26   ` Qu Wenruo
@ 2025-10-18  6:43     ` Askar Safin
  2025-10-18  7:14       ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Askar Safin @ 2025-10-18  6:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Sat, Oct 18, 2025 at 7:26 AM Qu Wenruo <wqu@suse.com> wrote:
> Unfortunately it's not a fix, but a behavior change.
> Previously we put priority beyond pm, but not anymore.
>
> So no backport to stable.

I absolutely disagree. This is actual fix for actual bug.
Many distros use btrfs for root filesystem by default, so
this bug impacts a lot of users.

I will wait till this fix will reach mainline. Then I will directly
ask Greg KH to port
the fix to stable.

Note this comment
https://github.com/systemd/systemd/issues/33626#issuecomment-2323415732
:
> Failing suspend may not just end up in data loss but also in permanent hardware damage.
> If you close the lid of a laptop and store it in a bag expecting the laptop to suspend and
> the laptop does not, in a short time it will reach crazy temperatures in that bag,
> that can permanently damage the screen

(That comment was not about this btrfs bug, but it still applies.)

-- 
Askar Safin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling
  2025-10-18  6:43     ` Askar Safin
@ 2025-10-18  7:14       ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-18  7:14 UTC (permalink / raw)
  To: Askar Safin; +Cc: linux-btrfs



在 2025/10/18 17:13, Askar Safin 写道:
> On Sat, Oct 18, 2025 at 7:26 AM Qu Wenruo <wqu@suse.com> wrote:
>> Unfortunately it's not a fix, but a behavior change.
>> Previously we put priority beyond pm, but not anymore.
>>
>> So no backport to stable.
> 
> I absolutely disagree. This is actual fix for actual bug.
> Many distros use btrfs for root filesystem by default, so
> this bug impacts a lot of users.
> 
> I will wait till this fix will reach mainline. Then I will directly
> ask Greg KH to port
> the fix to stable.

Sure go ahead and check who Greg trust more.

Let me be clear, it will affect existing users, especially for 
dev-replace cases.

If a user expects the dev-replace to finish no matter pm is involved or 
not, and now pm disrupted canceled it, it's a behavior change or even 
regression.

This will need quite some changes, not only to the docs, but even some 
new btrfs-progs options (e.g. --inhibit to inhibit the pm operations).

I appreciate your help so far on testing, but I'm afraid you're not the 
one making the final call, not even me.

Thanks,
Qu
> 
> Note this comment
> https://github.com/systemd/systemd/issues/33626#issuecomment-2323415732
> :
>> Failing suspend may not just end up in data loss but also in permanent hardware damage.
>> If you close the lid of a laptop and store it in a bag expecting the laptop to suspend and
>> the laptop does not, in a short time it will reach crazy temperatures in that bag,
>> that can permanently damage the screen
> 
> (That comment was not about this btrfs bug, but it still applies.)
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-10-18  7:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-16  9:42 [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
2025-10-16  9:42 ` [PATCH v2 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
2025-10-16  9:42 ` [PATCH v2 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
2025-10-16 10:02   ` Filipe Manana
2025-10-16  9:42 ` [PATCH v2 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
2025-10-17  9:47   ` Askar Safin
2025-10-17 10:34     ` Qu Wenruo
2025-10-17  9:39 ` [PATCH v2 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
2025-10-17 13:03   ` Askar Safin
2025-10-18  4:16 ` Askar Safin
2025-10-18  4:26   ` Qu Wenruo
2025-10-18  6:43     ` Askar Safin
2025-10-18  7:14       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox