* [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling
@ 2025-10-19 0:45 Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-10-19 0:45 UTC (permalink / raw)
To: linux-btrfs; +Cc: safinaskar
[CHANGELOG]
v3:
- Update the commit message and cover letter to indicate a behavior
change
Btrfs-progs updates will follow soon after merge into for-next.
This will includes docs update, and maybe an automatic resume behavior
(with new options to toggle).
- Update the commit message of the last patch to explain why v2 cgroup
freezing is depending on signals
Unlike legacy freezer of v1 cgroup, v2 cgroup freezer is fully based
on signal handling, thus freezing through v2 cgroup looks exactly like
a pending non-fatal signal. And freezing() will not return true in
this case.
- Add the reviewed-by tag from Filipe
v2:
- Remove copy-pasted comments that are too obvious in the first place
Also remove a stale comment in the old code.
- Add extra explanation on why both fs and process freezing need to be
checked
Mostly due to the configurable behavior of pm suspension/hiberation,
thus we have to handle both cases.
It's a long known bug that when scrub/dev-replace is running, power
management suspension will time out and fail.
After more debugging and helps from Askar Safin, it turns out there are
at least 3 components involved:
- Process freezing
This is at the preparation for suspension, which requires all user
space processes (and some kthreads) to be frozen, which requires the
process return to user space.
Thus if the process (normally btrfs command) is falling into a long
running ioctl (like scrub/dev-replace) it will not be frozen thus
breaking the pm suspension.
This mean paused scrub is not feasible, as paused scrub will still
make the ioctl executing process trapped inside kernel space.
- Filesystem freezing
It's an optional behavior during pm suspension, previously I submitted
one patch detecting such situation, and so far it works as expected.
But this fs freezing is only optional, not yet default behavior of pm
suspension.
- Systemd slice freezing
Systemd slice freezing utilize cgroup freezer, and the freezing()
checks will not return true until the whole cgroup is marked frozen.
But before that a wakeup signal is sent to the user process, and
during the kernel signal handling that process will be frozen.
So until the process returned to user space, it will not be marked
frozen.
Thus we have to do regular pending signal checks to prevent cgroup
freezing time out.
To address all those problems, the series will:
- Add extra cancel/pause/removed bg checks for raid56 parity stripes
Mostly to reduce delay for RAID56 cases, and make the behavior more
consistent.
- Cancel the scrub if the fs or process is being frozen
Please note that here we have to check both fs and process freezing,
please refer to the changelog and comment of the second patch for the
reason.
- Cancel the scrub if there is a pending signal
This allows regular signal to interrupt scrub/dev-replace, without the
extra signal handling hack in btrfs-progs (but that existing handling
won't hurt either).
This also address the time out during systemd slice freezing. Unlike
pm freezing, v2 cgroup freezing is fully based on signal handling thus
freezing() function will not return true during v2 cgroup freeing.
So when v2 cgroup is freezing the processing running scrub/replace, we
will properly detect a pending signal and abort the scrub/replace so
that freezing can be done when the process returned to the user space.
[BEHAVIOR CHANGE]
Unfortunately this will bring a behavior change, and will mostly
affecting dev-replace:
Dev-replace will be interrupted by pm, and can not be resumed but only
starts from the beginning.
The same dev-replace now will fail due to pm, breaking the old
expectation. End users will need extra steps to prevent
suspension/hibernation instead.
And btrfs-progs also needs to be updated to handle the new -EINTR
error (maybe restart the operation if possible).
Qu Wenruo (3):
btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity
stripes
btrfs: scrub: cancel the run if the process or fs is being frozen
btrfs: scrub: cancel the run if there is a pending signal
fs/btrfs/scrub.c | 70 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 60 insertions(+), 10 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
@ 2025-10-19 0:45 ` Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-10-19 0:45 UTC (permalink / raw)
To: linux-btrfs; +Cc: safinaskar, Filipe Manana
For raid56, data and parity stripes are handled differently.
For data stripes they are handled just like regular RAID1/RAID10 stripes,
going through the regular scrub_simple_mirror().
But for parity stripes we have to read out all involved data stripes and
do any needed verification and repair, then scrub the parity stripe.
This process will take a much longer time than a regular stripe, but
unlike scrub_simple_mirror(), we do not check if we should cancel/pause
or the block group is already removed.
Aligned the behavior of scrub_raid56_parity_stripe() to
scrub_simple_mirror(), by adding:
- Cancel check
- Pause check
- Removed block group check
Since those checks are the same from the scrub_simple_mirror(), also
update the comments of scrub_simple_mirror() by:
- Remove too obvious comments
We do not need extra comments on what we're checking, it's really too
obvious.
- Remove a stale comment about pausing
Now the scrub is always queuing all involved stripes, and submit them
in one go, there is no more submission part during pausing.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/scrub.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index fe266785804e..c3e7e543d350 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2091,6 +2091,20 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
ASSERT(sctx->raid56_data_stripes);
+ if (atomic_read(&fs_info->scrub_cancel_req) ||
+ atomic_read(&sctx->cancel_req))
+ return -ECANCELED;
+
+ if (atomic_read(&fs_info->scrub_pause_req))
+ scrub_blocked_if_needed(fs_info);
+
+ spin_lock(&bg->lock);
+ if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
+ spin_unlock(&bg->lock);
+ return 0;
+ }
+ spin_unlock(&bg->lock);
+
/*
* For data stripe search, we cannot reuse the same extent/csum paths,
* as the data stripe bytenr may be smaller than previous extent. Thus
@@ -2261,18 +2275,15 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
u64 found_logical = U64_MAX;
u64 cur_physical = physical + cur_logical - logical_start;
- /* Canceled? */
if (atomic_read(&fs_info->scrub_cancel_req) ||
atomic_read(&sctx->cancel_req)) {
ret = -ECANCELED;
break;
}
- /* Paused? */
- if (atomic_read(&fs_info->scrub_pause_req)) {
- /* Push queued extents */
+
+ if (atomic_read(&fs_info->scrub_pause_req))
scrub_blocked_if_needed(fs_info);
- }
- /* Block group removed? */
+
spin_lock(&bg->lock);
if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
spin_unlock(&bg->lock);
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
@ 2025-10-19 0:45 ` Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-10-19 0:45 UTC (permalink / raw)
To: linux-btrfs; +Cc: safinaskar, Filipe Manana, Chris Murphy
It's a known bug that btrfs scrub/dev-replace can prevent the system
from suspending.
There are at least two factors involved:
- Holding super_block::s_writers for the whole scrub/dev-replace
duration
We hold that percpu rw semaphore through mnt_want_write_file() for the
whole scrub/dev-replace duration.
That will prevent the fs being frozen, which can be initiated by
either the user (e.g. fsfreeze) or power management suspension/hiberation.
- Stuck in the kernel space for a long time
During suspension all user processes (and some kernel threads) will
be frozen.
But if a user space progress has fallen into kernel (scrub ioctl) and
do not return for a long time, it will make process freezing time out.
Unfortunately scrub/dev-replace is a long running ioctl, and it will
prevent the btrfs process from returning to the user space, thus make pm
suspension/hiberation time out.
Address them in one go:
- Introduce a new helper should_cancel_scrub()
Which includes the existing cancel request and new fs/process freezing
checks.
Here we have to check both fs and process freezing for pm
suspension/hiberation.
Pm can be configured to freeze fses before processes.
(The current default is not to freeze fses, but planned to freeze the
fses as the new default)
Checking only fs freezing will fail pm without fs freezing, as the
process freezing will time out.
Checking only process freezing will fail pm with fs freezing since the
fs freezing happens before process freezing.
And the return value will indicate the reason, -ECANCLED for the
explicitly canceled runs, and -EINTR for fs freeze or pm reasons.
- Cancel the run if should_cancel_scrub() is true
Unfortunately canceling is the only feasible solution here, pausing is
not possible as we will still stay in the kernel space thus will still
prevent the process from being frozen.
This will cause a user impacting behavior change:
Dev-replace can be interrupted by pm, and there is no way to resume
but start from the beginning again.
This means dev-replace may fail on newer kernels, and end users will
need extra steps like using systemd-inhibit to prevent
suspension/hibernation, to get back the old uninterrupted behavior.
This behavior change will need extra documentation updates and
communication with projects involving scrub/dev-replace including
btrfs-progs.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Link: https://lore.kernel.org/linux-btrfs/d93b2a2d-6ad9-4c49-809f-11d769a6f30a@app.fastmail.com/
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/scrub.c | 53 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 46 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index c3e7e543d350..bbd5793c2a16 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2069,6 +2069,47 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
return 0;
}
+/*
+ * Return 0 if we should not cancel the scrub.
+ * Return <0 if we need to cancel the scrub, returned value will
+ * indicate the reason:
+ * - -ECANCELED
+ * Being explicitly canceled through ioctl.
+ * - -EINTR
+ * Being interrupted by fs/process freezing.
+ */
+static int should_cancel_scrub(const struct scrub_ctx *sctx)
+{
+ struct btrfs_fs_info *fs_info = sctx->fs_info;
+
+ if (atomic_read(&fs_info->scrub_cancel_req) ||
+ atomic_read(&sctx->cancel_req))
+ return -ECANCELED;
+
+ /*
+ * The user (e.g. fsfreeze command) or power management (pm)
+ * suspension/hibernation can freeze the fs.
+ * And pm suspension/hibernation will also freeze all user processes.
+ *
+ * A user process can only be frozen when it is in the user space, thus
+ * we have to cancel the run so that the process can return to the user
+ * space.
+ *
+ * Furthermore we have to check both fs and process freezing, as pm can
+ * be configured to freeze the fses before processes.
+ *
+ * If we only check fs freezing, then suspension without fs freezing
+ * will timeout, as the process is still in the kernel space.
+ *
+ * If we only check process freezing, then suspension with fs freezing
+ * will timeout, as the running scrub will prevent the fs from being frozen.
+ */
+ if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
+ freezing(current))
+ return -EINTR;
+ return 0;
+}
+
static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
struct btrfs_device *scrub_dev,
struct btrfs_block_group *bg,
@@ -2091,9 +2132,9 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
ASSERT(sctx->raid56_data_stripes);
- if (atomic_read(&fs_info->scrub_cancel_req) ||
- atomic_read(&sctx->cancel_req))
- return -ECANCELED;
+ ret = should_cancel_scrub(sctx);
+ if (ret < 0)
+ return ret;
if (atomic_read(&fs_info->scrub_pause_req))
scrub_blocked_if_needed(fs_info);
@@ -2275,11 +2316,9 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
u64 found_logical = U64_MAX;
u64 cur_physical = physical + cur_logical - logical_start;
- if (atomic_read(&fs_info->scrub_cancel_req) ||
- atomic_read(&sctx->cancel_req)) {
- ret = -ECANCELED;
+ ret = should_cancel_scrub(sctx);
+ if (ret < 0)
break;
- }
if (atomic_read(&fs_info->scrub_pause_req))
scrub_blocked_if_needed(fs_info);
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 3/3] btrfs: scrub: cancel the run if there is a pending signal
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
@ 2025-10-19 0:45 ` Qu Wenruo
2025-10-19 2:43 ` [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
2025-12-04 20:06 ` Askar Safin
4 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2025-10-19 0:45 UTC (permalink / raw)
To: linux-btrfs; +Cc: safinaskar, Filipe Manana
Unlike relocation, scrub never checks pending signals, and even for
relocation is only explicitly checking for fatal signal (SIGKILL), not
for regular ones.
Thankfully relocation can still be interrupted by regular signals by
the usage of wait_on_bit(), which is called with TASK_INTERRUPTIBLE.
Do the same for scrub/dev-replace, so that regular signals can also
cancel the scrub/replace run, and more importantly handle v2 cgroup
freezing which is based on signal handling code inside the kernel, and
freezing() function will not return true for v2 cgroup freezing.
This will address the problem that systemd slice freezing will timeout
on long running scrub/dev-replace.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/scrub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bbd5793c2a16..05241324edd3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2076,7 +2076,7 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
* - -ECANCELED
* Being explicitly canceled through ioctl.
* - -EINTR
- * Being interrupted by fs/process freezing.
+ * Being interrupted by signal or fs/process freezing.
*/
static int should_cancel_scrub(const struct scrub_ctx *sctx)
{
@@ -2105,7 +2105,7 @@ static int should_cancel_scrub(const struct scrub_ctx *sctx)
* will timeout, as the running scrub will prevent the fs from being frozen.
*/
if (fs_info->sb->s_writers.frozen > SB_UNFROZEN ||
- freezing(current))
+ freezing(current) || signal_pending(current))
return -EINTR;
return 0;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
` (2 preceding siblings ...)
2025-10-19 0:45 ` [PATCH v3 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
@ 2025-10-19 2:43 ` Askar Safin
2025-12-04 20:06 ` Askar Safin
4 siblings, 0 replies; 6+ messages in thread
From: Askar Safin @ 2025-10-19 2:43 UTC (permalink / raw)
To: wqu; +Cc: linux-btrfs, safinaskar
I like this version. Especially EINTR/ECANCELLED distinction. Thank you!
--
Askar Safin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
` (3 preceding siblings ...)
2025-10-19 2:43 ` [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
@ 2025-12-04 20:06 ` Askar Safin
4 siblings, 0 replies; 6+ messages in thread
From: Askar Safin @ 2025-12-04 20:06 UTC (permalink / raw)
To: wqu; +Cc: linux-btrfs
Qu Wenruo <wqu@suse.com>:
> It's a long known bug that when scrub/dev-replace is running, power
> management suspension will time out and fail.
Please, also fix btrfs trim.
I just tested hibernation with trim (with current mainline 559e608c4655), and I see that hibernation works
if /sys/power/freeze_filesystems is 0 (current default), but doesn't work if
/sys/power/freeze_filesystems is 1 (future default).
So, I think we need similar thing for trim (i. e. check for superblock
freeze and "signal_pending").
--
Askar Safin
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-12-04 20:06 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-19 0:45 [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 1/3] btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 2/3] btrfs: scrub: cancel the run if the process or fs is being frozen Qu Wenruo
2025-10-19 0:45 ` [PATCH v3 3/3] btrfs: scrub: cancel the run if there is a pending signal Qu Wenruo
2025-10-19 2:43 ` [PATCH v3 0/3] btrfs: scrub: enhance freezing and signal handling Askar Safin
2025-12-04 20:06 ` Askar Safin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.