[PATCH 0/3] btrfs: scrub: enhance freezing and signal handling

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] btrfs: scrub: enhance freezing and signal handling
@ 2025-10-16  4:32 Qu Wenruo
  2025-10-16  4:32 ` [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  4:32 UTC (permalink / raw)
  To: linux-btrfs

It's a long known bug that when scrub/dev-replace is running, power
management suspension will time out and fail.

After more debugging and helps from Askar Safin, it turns out there are
at least 3 components involved:

- Process freezing
  This is at the preparation for suspension, which requires all user
  space processes (and some kthreads) to be frozen, which requires the
  process return to user space.

  Thus if the process (normally btrfs command) is falling into a long
  running ioctl (like scrub/dev-replace) it will not be frozen thus
  breaking the pm suspension.

  This mean paused scrub is not feasible, as paused scrub will still
  make the ioctl executing process trapped inside kernel space.

- Filesystem freezing
  It's an optional behavior during pm suspension, previously I submitted
  one patch detecting such situation, and so far it works as expected.
  But this fs freezing is only optional, not yet default behavior of pm
  suspension.

- Systemd slice freezing
  This is the most complex part that I have not yet fully pinned down.

To address the first 2 problems, the series will:

- Add extra cancel/pause/removed bg checks for raid56 parity stripes
  Mostly to reduce delay for RAID56 cases, and make the behavior more
  consistent.

- Cancel the scrub if the fs/process is being frozen

- Cancel the scrub if there is a pending signal
  For the systemd slice situation I have no idea how slice freezing is
  done, but at least we should check pending signals (not only fatal
  ones), which will align the behavior to relocation.

Qu Wenruo (3):
  btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity
    stripes
  btrfs: scrub: cancel the run if the process or fs is being frozen
  btrfs: scrub: cancel the run if there is a pending signal

 fs/btrfs/scrub.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes
  2025-10-16  4:32 [PATCH 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
@ 2025-10-16  4:32 ` Qu Wenruo
  2025-10-16  7:55   ` Filipe Manana
  2025-10-16  4:32 ` [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
  2025-10-16  4:32 ` [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
  2 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  4:32 UTC (permalink / raw)
  To: linux-btrfs

For raid56, data and parity stripes are handled differently.

For data stripes they are handled just like regular RAID1/RAID10 stripes,
going through the regular scrub_simple_mirror().

But for parity stripes we have to read out all involved data stripes and
do any needed verification and repair, then scrub the parity stripe.

This process will take a much longer time than a regular stripe, but
unlike scrub_simple_mirror(), we do not check if we should cancel/pause
or the block group is already removed.

Aligned the behavior of scrub_raid56_parity_stripe() to
scrub_simple_mirror(), by adding:

- Cancel check
- Pause check
- Removed block group check

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index fe266785804e..facbaf3cc231 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2091,6 +2091,24 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 
 	ASSERT(sctx->raid56_data_stripes);
 
+	/* Canceled? */
+	if (atomic_read(&fs_info->scrub_cancel_req) ||
+	    atomic_read(&sctx->cancel_req))
+		return -ECANCELED;
+
+	/* Paused? */
+	if (atomic_read(&fs_info->scrub_pause_req))
+		/* Push queued extents */
+		scrub_blocked_if_needed(fs_info);
+
+	/* Block group removed? */
+	spin_lock(&bg->lock);
+	if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
+		spin_unlock(&bg->lock);
+		return 0;
+	}
+	spin_unlock(&bg->lock);
+
 	/*
 	 * For data stripe search, we cannot reuse the same extent/csum paths,
 	 * as the data stripe bytenr may be smaller than previous extent.  Thus
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  4:32 [PATCH 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
  2025-10-16  4:32 ` [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
@ 2025-10-16  4:32 ` Qu Wenruo
  2025-10-16  7:51   ` Filipe Manana
  2025-10-16 16:42   ` David Sterba
  2025-10-16  4:32 ` [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
  2 siblings, 2 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  4:32 UTC (permalink / raw)
  To: linux-btrfs

It's a known bug that btrfs scrub/dev-replace can prevent the system
from suspending.

There are at least two factors involved:

- Holding super_block::s_writers for the whole scrub/dev-replace
  duration
  We hold that mutex through mnt_want_write_file() for the whole
  scrub/dev-replace duration.

  That will prevent the fs being frozen.
  It's tunable for the kernel to suspend all fses before suspending, if
  that's the case, a running scrub will refuse to be frozen and prevent
  suspension.

- Stuck in kernel space for a long time
  During suspension all user processes (and some kernel threads) will
  be frozen.
  But if a user space progress has fallen into kernel (scrub ioctl) and
  do not return for a long time, it will make suspension time out.

  Unfortunately scrub/dev-replace is a long running ioctl, and it will
  prevent the btrfs process from returning to user space.

Address them in one go:

- Introduce a new helper should_cancel_scrub()
  Which checks both fs and process freezing.

- Cancel the run if should_cancel_scrub() is true
  The check is done at scrub_simple_mirror() and
  scrub_raid56_parity_stripe().

  Unfortunately canceling is the only feasible solution here, pausing is
  not possible as we will still stay in the kernel state thus will still
  prevent the process from being frozen.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index facbaf3cc231..728d4e666054 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2069,6 +2069,20 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
 	return 0;
 }
 
+static bool should_cancel_scrub(struct btrfs_fs_info *fs_info)
+{
+	/*
+	 * For fs and process freezing case, it can be preparation
+	 * for a incoming pm suspension.
+	 * In that case we have to return to the user space, thus
+	 * canceling is the only feasible solution.
+	 */
+	if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
+	    freezing(current))
+		return true;
+	return false;
+}
+
 static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 				      struct btrfs_device *scrub_dev,
 				      struct btrfs_block_group *bg,
@@ -2093,7 +2107,8 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 
 	/* Canceled? */
 	if (atomic_read(&fs_info->scrub_cancel_req) ||
-	    atomic_read(&sctx->cancel_req))
+	    atomic_read(&sctx->cancel_req) ||
+	    should_cancel_scrub(fs_info))
 		return -ECANCELED;
 
 	/* Paused? */
@@ -2281,7 +2296,8 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 
 		/* Canceled? */
 		if (atomic_read(&fs_info->scrub_cancel_req) ||
-		    atomic_read(&sctx->cancel_req)) {
+		    atomic_read(&sctx->cancel_req) ||
+		    should_cancel_scrub(fs_info)) {
 			ret = -ECANCELED;
 			break;
 		}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal
  2025-10-16  4:32 [PATCH 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
  2025-10-16  4:32 ` [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
  2025-10-16  4:32 ` [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
@ 2025-10-16  4:32 ` Qu Wenruo
  2025-10-16  7:58   ` Filipe Manana
  2 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  4:32 UTC (permalink / raw)
  To: linux-btrfs

Unlike relocation, scrub never checks pending signals, and even for
relocation is only explicitly checking for fatal signal (SIGKILL), not
for regular ones.

Thankfully relocation can still be interrupted by regular signals by
the usage of wait_on_bit(), which is interruptible by regular signals.

So add the proper signal pending checks for scrub, by checking all
pending signals, and if we hit one signal cancel the scrub run since the
signal handling can only be done in the user space.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 728d4e666054..f3dfb7023499 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2080,6 +2080,13 @@ static bool should_cancel_scrub(struct btrfs_fs_info *fs_info)
 	if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
 	    freezing(current))
 		return true;
+	/*
+	 * Signal handling is also done in user space, so we have to return
+	 * to user space by canceling the run when there is a pending signal.
+	 * (including non-fatal ones like SIGINT)
+	 */
+	if (signal_pending(current))
+		return true;
 	return false;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  4:32 ` [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
@ 2025-10-16  7:51   ` Filipe Manana
  2025-10-16  8:00     ` Qu Wenruo
  2025-10-16 16:42   ` David Sterba
  1 sibling, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2025-10-16  7:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 16, 2025 at 5:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
> It's a known bug that btrfs scrub/dev-replace can prevent the system
> from suspending.
>
> There are at least two factors involved:
>
> - Holding super_block::s_writers for the whole scrub/dev-replace
>   duration
>   We hold that mutex through mnt_want_write_file() for the whole

It's not a mutex, it's a percpu rw semaphore.

>   scrub/dev-replace duration.
>
>   That will prevent the fs being frozen.
>   It's tunable for the kernel to suspend all fses before suspending, if
>   that's the case, a running scrub will refuse to be frozen and prevent
>   suspension.
>
> - Stuck in kernel space for a long time
>   During suspension all user processes (and some kernel threads) will
>   be frozen.
>   But if a user space progress has fallen into kernel (scrub ioctl) and
>   do not return for a long time, it will make suspension time out.
>
>   Unfortunately scrub/dev-replace is a long running ioctl, and it will
>   prevent the btrfs process from returning to user space.
>
> Address them in one go:
>
> - Introduce a new helper should_cancel_scrub()
>   Which checks both fs and process freezing.
>
> - Cancel the run if should_cancel_scrub() is true
>   The check is done at scrub_simple_mirror() and
>   scrub_raid56_parity_stripe().
>
>   Unfortunately canceling is the only feasible solution here, pausing is
>   not possible as we will still stay in the kernel state thus will still
>   prevent the process from being frozen.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index facbaf3cc231..728d4e666054 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2069,6 +2069,20 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
>         return 0;
>  }
>
> +static bool should_cancel_scrub(struct btrfs_fs_info *fs_info)
> +{
> +       /*
> +        * For fs and process freezing case, it can be preparation
> +        * for a incoming pm suspension.
> +        * In that case we have to return to the user space, thus
> +        * canceling is the only feasible solution.
> +        */
> +       if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
> +           freezing(current))

Why the check for sb->s_writers.frozen > SB_UNFROZEN?
I don't see why freezing() is not enough and would prefer to not
access such low level details of the vfs directly.

> +               return true;
> +       return false;
> +}
> +
>  static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>                                       struct btrfs_device *scrub_dev,
>                                       struct btrfs_block_group *bg,
> @@ -2093,7 +2107,8 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>
>         /* Canceled? */
>         if (atomic_read(&fs_info->scrub_cancel_req) ||
> -           atomic_read(&sctx->cancel_req))
> +           atomic_read(&sctx->cancel_req) ||
> +           should_cancel_scrub(fs_info))

If we now have a should_cancel_scrub(), the checks for the cancel
atomics should be moved there, otherwise it's confusing why some
cancel checks are in the helper and others are not.

>                 return -ECANCELED;
>
>         /* Paused? */
> @@ -2281,7 +2296,8 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
>
>                 /* Canceled? */
>                 if (atomic_read(&fs_info->scrub_cancel_req) ||
> -                   atomic_read(&sctx->cancel_req)) {
> +                   atomic_read(&sctx->cancel_req) ||
> +                   should_cancel_scrub(fs_info)) {

And it would avoid duplicating the atomic checks by having them in the
new helper.

Thanks.
>                         ret = -ECANCELED;
>                         break;
>                 }
> --
> 2.51.0
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes
  2025-10-16  4:32 ` [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
@ 2025-10-16  7:55   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2025-10-16  7:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 16, 2025 at 5:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
> For raid56, data and parity stripes are handled differently.
>
> For data stripes they are handled just like regular RAID1/RAID10 stripes,
> going through the regular scrub_simple_mirror().
>
> But for parity stripes we have to read out all involved data stripes and
> do any needed verification and repair, then scrub the parity stripe.
>
> This process will take a much longer time than a regular stripe, but
> unlike scrub_simple_mirror(), we do not check if we should cancel/pause
> or the block group is already removed.
>
> Aligned the behavior of scrub_raid56_parity_stripe() to
> scrub_simple_mirror(), by adding:
>
> - Cancel check
> - Pause check
> - Removed block group check
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index fe266785804e..facbaf3cc231 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2091,6 +2091,24 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>
>         ASSERT(sctx->raid56_data_stripes);
>
> +       /* Canceled? */

This comment is useless, it's obvious from the checks belows we are
checking if we're canceled, as the atomics have "cancel" in their
name.

> +       if (atomic_read(&fs_info->scrub_cancel_req) ||
> +           atomic_read(&sctx->cancel_req))
> +               return -ECANCELED;
> +
> +       /* Paused? */

Same here, the atomic has "pause" in its name.

> +       if (atomic_read(&fs_info->scrub_pause_req))
> +               /* Push queued extents */

Please always end comments with punctuation.
Also add { }.

> +               scrub_blocked_if_needed(fs_info);
> +
> +       /* Block group removed? */

Same here, we 're testing for a flag with "removed" in its name in the
block group's flags.

With those changes:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       spin_lock(&bg->lock);
> +       if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
> +               spin_unlock(&bg->lock);
> +               return 0;
> +       }
> +       spin_unlock(&bg->lock);
> +
>         /*
>          * For data stripe search, we cannot reuse the same extent/csum paths,
>          * as the data stripe bytenr may be smaller than previous extent.  Thus
> --
> 2.51.0
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal
  2025-10-16  4:32 ` [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
@ 2025-10-16  7:58   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2025-10-16  7:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 16, 2025 at 5:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Unlike relocation, scrub never checks pending signals, and even for
> relocation is only explicitly checking for fatal signal (SIGKILL), not
> for regular ones.
>
> Thankfully relocation can still be interrupted by regular signals by
> the usage of wait_on_bit(), which is interruptible by regular signals.
>
> So add the proper signal pending checks for scrub, by checking all
> pending signals, and if we hit one signal cancel the scrub run since the
> signal handling can only be done in the user space.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/scrub.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 728d4e666054..f3dfb7023499 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2080,6 +2080,13 @@ static bool should_cancel_scrub(struct btrfs_fs_info *fs_info)
>         if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
>             freezing(current))
>                 return true;
> +       /*
> +        * Signal handling is also done in user space, so we have to return
> +        * to user space by canceling the run when there is a pending signal.
> +        * (including non-fatal ones like SIGINT)
> +        */
> +       if (signal_pending(current))
> +               return true;
>         return false;
>  }
>
> --
> 2.51.0
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  7:51   ` Filipe Manana
@ 2025-10-16  8:00     ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16  8:00 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



在 2025/10/16 18:21, Filipe Manana 写道:
> On Thu, Oct 16, 2025 at 5:33 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> It's a known bug that btrfs scrub/dev-replace can prevent the system
>> from suspending.
>>
>> There are at least two factors involved:
>>
>> - Holding super_block::s_writers for the whole scrub/dev-replace
>>    duration
>>    We hold that mutex through mnt_want_write_file() for the whole
> 
> It's not a mutex, it's a percpu rw semaphore.
> 
>>    scrub/dev-replace duration.
>>
>>    That will prevent the fs being frozen.
>>    It's tunable for the kernel to suspend all fses before suspending, if
>>    that's the case, a running scrub will refuse to be frozen and prevent
>>    suspension.
>>
>> - Stuck in kernel space for a long time
>>    During suspension all user processes (and some kernel threads) will
>>    be frozen.
>>    But if a user space progress has fallen into kernel (scrub ioctl) and
>>    do not return for a long time, it will make suspension time out.
>>
>>    Unfortunately scrub/dev-replace is a long running ioctl, and it will
>>    prevent the btrfs process from returning to user space.
>>
>> Address them in one go:
>>
>> - Introduce a new helper should_cancel_scrub()
>>    Which checks both fs and process freezing.
>>
>> - Cancel the run if should_cancel_scrub() is true
>>    The check is done at scrub_simple_mirror() and
>>    scrub_raid56_parity_stripe().
>>
>>    Unfortunately canceling is the only feasible solution here, pausing is
>>    not possible as we will still stay in the kernel state thus will still
>>    prevent the process from being frozen.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/scrub.c | 20 ++++++++++++++++++--
>>   1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
>> index facbaf3cc231..728d4e666054 100644
>> --- a/fs/btrfs/scrub.c
>> +++ b/fs/btrfs/scrub.c
>> @@ -2069,6 +2069,20 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
>>          return 0;
>>   }
>>
>> +static bool should_cancel_scrub(struct btrfs_fs_info *fs_info)
>> +{
>> +       /*
>> +        * For fs and process freezing case, it can be preparation
>> +        * for a incoming pm suspension.
>> +        * In that case we have to return to the user space, thus
>> +        * canceling is the only feasible solution.
>> +        */
>> +       if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
>> +           freezing(current))
> 
> Why the check for sb->s_writers.frozen > SB_UNFROZEN?
> I don't see why freezing() is not enough and would prefer to not
> access such low level details of the vfs directly.

It's possible to configure suspension to freeze fses first, before 
freezing processes.
Although not yet the default behavior, it's going to be the new default.

So we have to check both fs and process freezing.

> 
>> +               return true;
>> +       return false;
>> +}
>> +
>>   static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>>                                        struct btrfs_device *scrub_dev,
>>                                        struct btrfs_block_group *bg,
>> @@ -2093,7 +2107,8 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
>>
>>          /* Canceled? */
>>          if (atomic_read(&fs_info->scrub_cancel_req) ||
>> -           atomic_read(&sctx->cancel_req))
>> +           atomic_read(&sctx->cancel_req) ||
>> +           should_cancel_scrub(fs_info))
> 
> If we now have a should_cancel_scrub(), the checks for the cancel
> atomics should be moved there, otherwise it's confusing why some
> cancel checks are in the helper and others are not.

That sounds good.

Thanks,
Qu
> 
>>                  return -ECANCELED;
>>
>>          /* Paused? */
>> @@ -2281,7 +2296,8 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
>>
>>                  /* Canceled? */
>>                  if (atomic_read(&fs_info->scrub_cancel_req) ||
>> -                   atomic_read(&sctx->cancel_req)) {
>> +                   atomic_read(&sctx->cancel_req) ||
>> +                   should_cancel_scrub(fs_info)) {
> 
> And it would avoid duplicating the atomic checks by having them in the
> new helper.
> 
> Thanks.
>>                          ret = -ECANCELED;
>>                          break;
>>                  }
>> --
>> 2.51.0
>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16  4:32 ` [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
  2025-10-16  7:51   ` Filipe Manana
@ 2025-10-16 16:42   ` David Sterba
  2025-10-16 20:35     ` Qu Wenruo
  1 sibling, 1 reply; 13+ messages in thread
From: David Sterba @ 2025-10-16 16:42 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Oct 16, 2025 at 03:02:30PM +1030, Qu Wenruo wrote:
> It's a known bug that btrfs scrub/dev-replace can prevent the system
> from suspending.
> 
> There are at least two factors involved:
> 
> - Holding super_block::s_writers for the whole scrub/dev-replace
>   duration
>   We hold that mutex through mnt_want_write_file() for the whole
>   scrub/dev-replace duration.
> 
>   That will prevent the fs being frozen.
>   It's tunable for the kernel to suspend all fses before suspending, if
>   that's the case, a running scrub will refuse to be frozen and prevent
>   suspension.
> 
> - Stuck in kernel space for a long time
>   During suspension all user processes (and some kernel threads) will
>   be frozen.
>   But if a user space progress has fallen into kernel (scrub ioctl) and
>   do not return for a long time, it will make suspension time out.
> 
>   Unfortunately scrub/dev-replace is a long running ioctl, and it will
>   prevent the btrfs process from returning to user space.
> 
> Address them in one go:
> 
> - Introduce a new helper should_cancel_scrub()
>   Which checks both fs and process freezing.
> 
> - Cancel the run if should_cancel_scrub() is true
>   The check is done at scrub_simple_mirror() and
>   scrub_raid56_parity_stripe().
> 
>   Unfortunately canceling is the only feasible solution here, pausing is
>   not possible as we will still stay in the kernel state thus will still
>   prevent the process from being frozen.

I don't recall the details but the solution I was working on allowed
waiting in kernel and not cancelling the whole scrub, which I think is
the wrong solution and bad usability. That way scrub may never finish
going over the whole filesystem.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16 16:42   ` David Sterba
@ 2025-10-16 20:35     ` Qu Wenruo
  2025-10-17  7:54       ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-16 20:35 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



在 2025/10/17 03:12, David Sterba 写道:
> On Thu, Oct 16, 2025 at 03:02:30PM +1030, Qu Wenruo wrote:
>> It's a known bug that btrfs scrub/dev-replace can prevent the system
>> from suspending.
>>
>> There are at least two factors involved:
>>
>> - Holding super_block::s_writers for the whole scrub/dev-replace
>>    duration
>>    We hold that mutex through mnt_want_write_file() for the whole
>>    scrub/dev-replace duration.
>>
>>    That will prevent the fs being frozen.
>>    It's tunable for the kernel to suspend all fses before suspending, if
>>    that's the case, a running scrub will refuse to be frozen and prevent
>>    suspension.
>>
>> - Stuck in kernel space for a long time
>>    During suspension all user processes (and some kernel threads) will
>>    be frozen.
>>    But if a user space progress has fallen into kernel (scrub ioctl) and
>>    do not return for a long time, it will make suspension time out.
>>
>>    Unfortunately scrub/dev-replace is a long running ioctl, and it will
>>    prevent the btrfs process from returning to user space.
>>
>> Address them in one go:
>>
>> - Introduce a new helper should_cancel_scrub()
>>    Which checks both fs and process freezing.
>>
>> - Cancel the run if should_cancel_scrub() is true
>>    The check is done at scrub_simple_mirror() and
>>    scrub_raid56_parity_stripe().
>>
>>    Unfortunately canceling is the only feasible solution here, pausing is
>>    not possible as we will still stay in the kernel state thus will still
>>    prevent the process from being frozen.
> 
> I don't recall the details but the solution I was working on allowed
> waiting in kernel and not cancelling the whole scrub,

That will only allow the fs to be frozen, but not pm suspension/hibernation.

As I explained, pm requires all user space processes to return to user 
space.

That's why your RFC patch doesn't pass Askar Safin's test at all, no 
matter if the pm is configured to freeze the fs or not.

> which I think is
> the wrong solution and bad usability. That way scrub may never finish
> going over the whole filesystem.
> 

That can be solved in user space, with minor changes to the return value 
of the patches.

E.g. only return -ECANCELED for real cancelled runs, and return -EINTR 
for interrupted runs, and let btrfs-progs to catch the -EINTR cases and 
restart from the last scrubbed bytes.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-16 20:35     ` Qu Wenruo
@ 2025-10-17  7:54       ` Qu Wenruo
  2025-10-17 17:00         ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-10-17  7:54 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



在 2025/10/17 07:05, Qu Wenruo 写道:
> 
> 
> 在 2025/10/17 03:12, David Sterba 写道:
>> On Thu, Oct 16, 2025 at 03:02:30PM +1030, Qu Wenruo wrote:
>>> It's a known bug that btrfs scrub/dev-replace can prevent the system
>>> from suspending.
>>>
>>> There are at least two factors involved:
>>>
>>> - Holding super_block::s_writers for the whole scrub/dev-replace
>>>    duration
>>>    We hold that mutex through mnt_want_write_file() for the whole
>>>    scrub/dev-replace duration.
>>>
>>>    That will prevent the fs being frozen.
>>>    It's tunable for the kernel to suspend all fses before suspending, if
>>>    that's the case, a running scrub will refuse to be frozen and prevent
>>>    suspension.
>>>
>>> - Stuck in kernel space for a long time
>>>    During suspension all user processes (and some kernel threads) will
>>>    be frozen.
>>>    But if a user space progress has fallen into kernel (scrub ioctl) and
>>>    do not return for a long time, it will make suspension time out.
>>>
>>>    Unfortunately scrub/dev-replace is a long running ioctl, and it will
>>>    prevent the btrfs process from returning to user space.
>>>
>>> Address them in one go:
>>>
>>> - Introduce a new helper should_cancel_scrub()
>>>    Which checks both fs and process freezing.
>>>
>>> - Cancel the run if should_cancel_scrub() is true
>>>    The check is done at scrub_simple_mirror() and
>>>    scrub_raid56_parity_stripe().
>>>
>>>    Unfortunately canceling is the only feasible solution here, 
>>> pausing is
>>>    not possible as we will still stay in the kernel state thus will 
>>> still
>>>    prevent the process from being frozen.
>>
>> I don't recall the details but the solution I was working on allowed
>> waiting in kernel and not cancelling the whole scrub,
> 
> That will only allow the fs to be frozen, but not pm suspension/ 
> hibernation.
> 
> As I explained, pm requires all user space processes to return to user 
> space.
> 
> That's why your RFC patch doesn't pass Askar Safin's test at all, no 
> matter if the pm is configured to freeze the fs or not.

If you really want to a resumable scrub/dev-replace between pm 
operations, it's possible but not in the current scrub/replace ioctl.

You can do it in a kthread, and freeze the kthread during pm operations, 
and resume from whatever it was.

The problem is, this requires a new ioctl, and we should also take the 
time reconsider how the new scrub ioctl should be.

E.g. there are already some existing problems, like scrub cancel is 
global, that you can not cancel scrub of a single device without 
affecting others.

And of course there are some other minor things like the timing of 
holding s_writers, but that's way small details compared to running 
scrub/replace in a kthread.

If you really want to go that path, I'm totally fine and happy with 
that, but that will be a way larger user interface changes.


For now, my series addresses the more user impacting problem (long time 
out and all other processes frozen, just acting like the system is 
hanging), and I'll add extra notes to scrub/dev-replace about the 
behavior change.

Thanks,
Qu
> 
>> which I think is
>> the wrong solution and bad usability. That way scrub may never finish
>> going over the whole filesystem.
>>
> 
> That can be solved in user space, with minor changes to the return value 
> of the patches.
> 
> E.g. only return -ECANCELED for real cancelled runs, and return -EINTR 
> for interrupted runs, and let btrfs-progs to catch the -EINTR cases and 
> restart from the last scrubbed bytes.
> 
> Thanks,
> Qu
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-17  7:54       ` Qu Wenruo
@ 2025-10-17 17:00         ` David Sterba
  2025-10-17 20:49           ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2025-10-17 17:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Oct 17, 2025 at 06:24:33PM +1030, Qu Wenruo wrote:
> 在 2025/10/17 07:05, Qu Wenruo 写道:
> > 在 2025/10/17 03:12, David Sterba 写道:
> >> On Thu, Oct 16, 2025 at 03:02:30PM +1030, Qu Wenruo wrote:
> >>> It's a known bug that btrfs scrub/dev-replace can prevent the system
> >>> from suspending.
> >>>
> >>> There are at least two factors involved:
> >>>
> >>> - Holding super_block::s_writers for the whole scrub/dev-replace
> >>>    duration
> >>>    We hold that mutex through mnt_want_write_file() for the whole
> >>>    scrub/dev-replace duration.
> >>>
> >>>    That will prevent the fs being frozen.
> >>>    It's tunable for the kernel to suspend all fses before suspending, if
> >>>    that's the case, a running scrub will refuse to be frozen and prevent
> >>>    suspension.
> >>>
> >>> - Stuck in kernel space for a long time
> >>>    During suspension all user processes (and some kernel threads) will
> >>>    be frozen.
> >>>    But if a user space progress has fallen into kernel (scrub ioctl) and
> >>>    do not return for a long time, it will make suspension time out.
> >>>
> >>>    Unfortunately scrub/dev-replace is a long running ioctl, and it will
> >>>    prevent the btrfs process from returning to user space.
> >>>
> >>> Address them in one go:
> >>>
> >>> - Introduce a new helper should_cancel_scrub()
> >>>    Which checks both fs and process freezing.
> >>>
> >>> - Cancel the run if should_cancel_scrub() is true
> >>>    The check is done at scrub_simple_mirror() and
> >>>    scrub_raid56_parity_stripe().
> >>>
> >>>    Unfortunately canceling is the only feasible solution here, 
> >>> pausing is
> >>>    not possible as we will still stay in the kernel state thus will 
> >>> still
> >>>    prevent the process from being frozen.
> >>
> >> I don't recall the details but the solution I was working on allowed
> >> waiting in kernel and not cancelling the whole scrub,
> > 
> > That will only allow the fs to be frozen, but not pm suspension/ 
> > hibernation.
> > 
> > As I explained, pm requires all user space processes to return to user 
> > space.
> > 
> > That's why your RFC patch doesn't pass Askar Safin's test at all, no 
> > matter if the pm is configured to freeze the fs or not.
> 
> If you really want to a resumable scrub/dev-replace between pm 
> operations, it's possible but not in the current scrub/replace ioctl.

Cancelling scrub is not that bad and it also keeps the last status
(assuming we can store it during suspend phase). I'd be more worried
about replace, I'm not sure if it's restartable from the last recorded
point or not.

> You can do it in a kthread, and freeze the kthread during pm operations, 
> and resume from whatever it was.
> 
> The problem is, this requires a new ioctl, and we should also take the 
> time reconsider how the new scrub ioctl should be.

The scrub and replace ioctls are extensible, we don't need a completely
new one. We can add new flag or command if we'd want to start a kthread.

> E.g. there are already some existing problems, like scrub cancel is 
> global, that you can not cancel scrub of a single device without 
> affecting others.

The scrub cancel ioctl is no extensible so in this case we'd need a new
one to cancel just a given device.

> And of course there are some other minor things like the timing of 
> holding s_writers, but that's way small details compared to running 
> scrub/replace in a kthread.
> 
> If you really want to go that path, I'm totally fine and happy with 
> that, but that will be a way larger user interface changes.
> 
> 
> For now, my series addresses the more user impacting problem (long time 
> out and all other processes frozen, just acting like the system is 
> hanging), and I'll add extra notes to scrub/dev-replace about the 
> behavior change.

On one side I agree that allowing the suspend continue at the cost of
cancelled scrub is an improvement. And we can eventually improve that.
The concerns about cancelling the operations are usability and
documenting the behaviour won't be enough, so at lest a message to log
would be good so there's a trace. I don't know if we can catch the
reason, ideally when it would be caused by suspend/freezing then print
something informative and suggest restarting it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
  2025-10-17 17:00         ` David Sterba
@ 2025-10-17 20:49           ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-10-17 20:49 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



在 2025/10/18 03:30, David Sterba 写道:
> On Fri, Oct 17, 2025 at 06:24:33PM +1030, Qu Wenruo wrote:
>> 在 2025/10/17 07:05, Qu Wenruo 写道:
[...]
>>>
>>> As I explained, pm requires all user space processes to return to user
>>> space.
>>>
>>> That's why your RFC patch doesn't pass Askar Safin's test at all, no
>>> matter if the pm is configured to freeze the fs or not.
>>
>> If you really want to a resumable scrub/dev-replace between pm
>> operations, it's possible but not in the current scrub/replace ioctl.
> 
> Cancelling scrub is not that bad and it also keeps the last status
> (assuming we can store it during suspend phase). I'd be more worried
> about replace, I'm not sure if it's restartable from the last recorded
> point or not.

Dev-replace is unfortunately non-restartable.

After interruption, the new device will be removed from the fs, thus 
there is no way that it can be resumed.

On the other hand, if user really want to finish the dev-replace, we may 
want to add some optional support to use things like systemd-inhibit.

(I don't want to include the full systemd dependency, thus at most some 
system() calls to systemd-inhibit directly)

But for systemd services files provided from btrfs maintenance package, 
you may want to add different versions so that end users can determine 
if they want to priority pm or scrub/replace.
>>
>> For now, my series addresses the more user impacting problem (long time
>> out and all other processes frozen, just acting like the system is
>> hanging), and I'll add extra notes to scrub/dev-replace about the
>> behavior change.
> 
> On one side I agree that allowing the suspend continue at the cost of
> cancelled scrub is an improvement. And we can eventually improve that.
> The concerns about cancelling the operations are usability and
> documenting the behaviour won't be enough, so at lest a message to log
> would be good so there's a trace.

There is already one, every ended scrub (no matter finished properly or 
interrupted) will show a message line indicating the reason.

> I don't know if we can catch the
> reason,

I can improve the return value so that when pm interrupted the 
scrub/replace, we return -EAGAIN/-EINTR other than -ECANCELED.

However I'm not sure how to distinguish pm interruption and signals.

Should they all return -EINTR? Or should the pm ones return differently?


Either way, I'll also try to add some optional inhibit options to 
btrfs-progs, so at least users can determine what policy they want.

Thanks,
Qu

> ideally when it would be caused by suspend/freezing then print
> something informative and suggest restarting it.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-10-17 20:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-16  4:32 [PATCH 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
2025-10-16  4:32 ` [PATCH 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
2025-10-16  7:55   ` Filipe Manana
2025-10-16  4:32 ` [PATCH 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
2025-10-16  7:51   ` Filipe Manana
2025-10-16  8:00     ` Qu Wenruo
2025-10-16 16:42   ` David Sterba
2025-10-16 20:35     ` Qu Wenruo
2025-10-17  7:54       ` Qu Wenruo
2025-10-17 17:00         ` David Sterba
2025-10-17 20:49           ` Qu Wenruo
2025-10-16  4:32 ` [PATCH 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
2025-10-16  7:58   ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox