Linux XFS filesystem development
 help / color / mirror / Atom feed
* [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3
@ 2026-06-25 22:54 Darrick J. Wong
  2026-06-25 22:54 ` [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte Darrick J. Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:54 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

Hi all,

Here's a third batch of xfs_scrub fixes resulting from Codex reviews.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-codex-fixes3
---
Commits in this patchset:
 * xfs_scrub: stop user file scan if caller already aborte
 * xfs_scrub: don't flatten error numbers in read_verify_schedule_now
 * xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4
 * xfs_scrub: fix spacemap external log device scan dev key
 * xfs_scrub: fix estimate of work items for phase 4
 * xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add
 * xfs_scrub: fix phase 8 debug reporting
 * xfs_scrub: always finish cleanup, even if reporting healthy state fails
---
 scrub/repair.h      |   14 ++++++++++++++
 scrub/inodes.c      |    2 +-
 scrub/phase1.c      |   11 ++++++-----
 scrub/phase4.c      |    4 ++++
 scrub/phase5.c      |    8 ++++++--
 scrub/phase8.c      |    5 ++---
 scrub/read_verify.c |    2 +-
 scrub/repair.c      |   16 +++++++++++-----
 scrub/spacemap.c    |    9 +++++++--
 9 files changed, 52 insertions(+), 19 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
@ 2026-06-25 22:54 ` Darrick J. Wong
  2026-06-26  4:58   ` Christoph Hellwig
  2026-06-25 22:54 ` [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now Darrick J. Wong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:54 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex points out that scrub_scan_user_files keeps calling
scan_user_bulkstat in a loop even if something else aborts the scan.
That's rather pointless, so fix that.

Cc: <linux-xfs@vger.kernel.org> # v6.14.0
Fixes: 279b0d0e8d73f1 ("xfs_scrub: call bulkstat directly if we're only scanning user files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/inodes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 7ce5f7ffeb62a1..7dee938c0a45cb 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -879,7 +879,7 @@ scrub_scan_user_files(
 		return -1;
 	}
 
-	while ((ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
+	while (!si.aborted && (ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
 		/* empty */
 	}
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
  2026-06-25 22:54 ` [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte Darrick J. Wong
@ 2026-06-25 22:54 ` Darrick J. Wong
  2026-06-26  4:58   ` Christoph Hellwig
  2026-06-25 22:54 ` [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4 Darrick J. Wong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:54 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex complains about using a bool variable to convey the results of
workqueue_add (which returns int) to rvp->runtime_error (which is
declared int).  As originally written in commit 2000470d5376e4, this
type inconsistency was a benign thinko because code branching decisions
were made based on whether or not workqueue_add returned nonzero but the
error itself was not preserved.  However, commit 5c657f1e5a9329 started
preserving the errors but didn't change the @ret declaration, so that's
where the bug was introduced.

Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: 5c657f1e5a9329 ("xfs_scrub: fix handling of read-verify pool runtime errors")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/read_verify.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index efdf6d544858b8..311ea1c29008f7 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -471,7 +471,7 @@ read_verify_schedule_now(
 {
 	struct read_verify_pool		*rvp = rs->rvp;
 	struct read_verify		*tmp;
-	bool				ret;
+	int				ret;
 
 	if (!rvp)
 		return 0;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
  2026-06-25 22:54 ` [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte Darrick J. Wong
  2026-06-25 22:54 ` [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now Darrick J. Wong
@ 2026-06-25 22:54 ` Darrick J. Wong
  2026-06-26  4:59   ` Christoph Hellwig
  2026-06-25 22:55 ` [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key Darrick J. Wong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:54 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

While reading the code in repair.c while triaging another Codex
complaint, I remembered that phases 2 and 3 react to an un-scannable
metadata object by dumping them on the repair list for phase 4 with the
"NEEDSCHECK" flag set.  However, repair_item() doesn't actually process
those NEEDSCHECK items, which means that we could silently fail to
complete a full scan.

There are two ways that repair_item() gets called -- one is the action
list processing in phase 4; and the other is repair_item_completely,
which is called directly from phases 2, 5, and 7.  Action list items
never have NEEDSCHECK set because repair_item_to_action_item promotes
them into "corruptions" to force a check; and the repair_item_completely
callers only ever pass in actual corruptions.

Therefore, we should amend repair_item() to process NEEDSCHECK action
items so as not to leave a logic bomb for future authors.  We should
also set sri_inconsistent so that any errors that are found during the
NEEDSCHECK scans result in a revalidation after the repair.

However, there's a real bug in repair_item_to_action_item.  If @sri says
there's at least one NEEDSCHECK item but no corruption elsewhere, then
we exit early instead of queuing an action list item.  This apparently
doesn't happen much in the online fsck QA suite becuase phases 2 and 3
aggressively try to clear NEEDSCHECK items and only deferring scans to
phase 4 if they cannot make any forward progress.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 83ffb5b454b172 ("xfs_scrub: start tracking scrub state in scrub_item")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/repair.h |   14 ++++++++++++++
 scrub/repair.c |   16 +++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)


diff --git a/scrub/repair.h b/scrub/repair.h
index ec4aa381a82d36..af3a9d7186242a 100644
--- a/scrub/repair.h
+++ b/scrub/repair.h
@@ -96,6 +96,20 @@ repair_item_count_needsrepair(
 	return nr;
 }
 
+static inline unsigned int
+repair_item_count_needswork(
+	const struct scrub_item	*sri)
+{
+	unsigned int		scrub_type;
+	unsigned int		nr = 0;
+
+	foreach_scrub_type(scrub_type)
+		if (sri->sri_state[scrub_type] & (SCRUB_ITEM_REPAIR_ANY |
+						  SCRUB_ITEM_NEEDSCHECK))
+			nr++;
+	return nr;
+}
+
 static inline int
 repair_item_completely(
 	struct scrub_ctx	*ctx,
diff --git a/scrub/repair.c b/scrub/repair.c
index fdefc2131aa453..c118938fe21303 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -260,10 +260,11 @@ repair_item_dependencies_ok(
 		if (!(dep_mask & 1))
 			continue;
 		/*
-		 * If this lower level object also needs repair, we can't fix
-		 * the higher level item.
+		 * If this lower level object also needs repair or hasn't been
+		 * scanned yet, we can't fix the higher level item.
 		 */
-		if (sri->sri_state[b] & SCRUB_ITEM_NEEDSREPAIR)
+		if (sri->sri_state[b] & (SCRUB_ITEM_NEEDSREPAIR |
+					 SCRUB_ITEM_NEEDSCHECK))
 			return false;
 	}
 
@@ -855,7 +856,11 @@ repair_item(
 	if (ret)
 		return ret;
 
-	return repair_item_class(ctx, sri, -1, SCRUB_ITEM_PREEN, flags);
+	ret = repair_item_class(ctx, sri, -1, SCRUB_ITEM_PREEN, flags);
+	if (ret)
+		return ret;
+
+	return repair_item_class(ctx, sri, -1, SCRUB_ITEM_NEEDSCHECK, flags);
 }
 
 /* Create an action item around a scrub item that needs repairs. */
@@ -868,7 +873,7 @@ repair_item_to_action_item(
 	struct action_item	*aitem;
 	unsigned int		scrub_type;
 
-	if (repair_item_count_needsrepair(sri) == 0)
+	if (repair_item_count_needswork(sri) == 0)
 		return 0;
 
 	aitem = malloc(sizeof(struct action_item));
@@ -893,6 +898,7 @@ repair_item_to_action_item(
 		if (state[scrub_type] & SCRUB_ITEM_NEEDSCHECK) {
 			state[scrub_type] &= ~SCRUB_ITEM_NEEDSCHECK;
 			state[scrub_type] |= SCRUB_ITEM_CORRUPT;
+			aitem->sri.sri_inconsistent = true;
 		}
 	}
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2026-06-25 22:54 ` [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4 Darrick J. Wong
@ 2026-06-25 22:55 ` Darrick J. Wong
  2026-06-26  5:03   ` Christoph Hellwig
  2026-06-25 22:55 ` [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4 Darrick J. Wong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:55 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex bizarrely complains about the rtstart>0 logic in scan_log_rmaps,
because it gets confused about the difference between internal logs
(which are scanned by scan_ag_rmaps) and external logs, which this
function handles.  I'm ignoring the report, but we shouldn't open-code
the synthetic device keys so let's fix that.

Cc: <linux-xfs@vger.kernel.org> # v6.15.0
Fixes: 37591ef3f4f14c ("xfs_scrub: support internal RT device")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/spacemap.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index 8f595ad94c5991..5f6399821dc45a 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -213,7 +213,7 @@ scan_rt_rmaps(
 	scan_dev_rmaps(ctx, ctx->fsinfo.fs_rtdev, arg);
 }
 
-/* Iterate all the reverse mappings of the log device. */
+/* Iterate all the reverse mappings of the external log device. */
 static void
 scan_log_rmaps(
 	struct workqueue	*wq,
@@ -222,7 +222,12 @@ scan_log_rmaps(
 {
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 
-	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? 2 : ctx->fsinfo.fs_logdev,
+	/*
+	 * Internal rt sections (rtstart != 0) means we use synthetic device
+	 * keys for external devices.
+	 */
+	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? XFS_DEV_LOG :
+						      ctx->fsinfo.fs_logdev,
 			arg);
 }
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2026-06-25 22:55 ` [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key Darrick J. Wong
@ 2026-06-25 22:55 ` Darrick J. Wong
  2026-06-26  5:04   ` Christoph Hellwig
  2026-06-25 22:55 ` [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add Darrick J. Wong
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:55 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex complains that the number of work items computed when estimating
the amount of work for phase 4 doesn't include the FSCOUNTERS and
QUOTACHECK items.  Add them back in.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase4.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/scrub/phase4.c b/scrub/phase4.c
index 6bd16c23939eb4..744fd92ab195f4 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -297,6 +297,10 @@ phase4_estimate(
 	need_fixing = action_list_length(ctx->fs_repair_list) +
 		      action_list_length(ctx->file_repair_list);
 
+	/* fscounters and quotacheck */
+	if (need_fixing)
+		need_fixing += 2;
+
 	*items = need_fixing;
 	*nr_threads = scrub_nproc(ctx) + 1;
 	*rshift = 0;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2026-06-25 22:55 ` [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4 Darrick J. Wong
@ 2026-06-25 22:55 ` Darrick J. Wong
  2026-06-26  5:04   ` Christoph Hellwig
  2026-06-25 22:56 ` [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting Darrick J. Wong
  2026-06-25 22:56 ` [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails Darrick J. Wong
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:55 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex points out that queue_metapath_scan and queue_fs_scan fail to free
the allocated items if they are not consumed by workqueue_add().

Cc: <linux-xfs@vger.kernel.org> # v6.9.0
Fixes: 8fb4b471322e65 ("xfs_scrub: use multiple threads to run in-kernel metadata scrubs that scan inodes")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase5.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


diff --git a/scrub/phase5.c b/scrub/phase5.c
index 329d208d882ef1..fd1b70332deeee 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -687,8 +687,10 @@ queue_fs_scan(
 	item->abortedp = abortedp;
 
 	ret = -workqueue_add(wq, fs_scan_worker, nr, item);
-	if (ret)
+	if (ret) {
 		str_liberror(ctx, ret, _("queuing fs scan work"));
+		free(item);
+	}
 
 	return ret;
 }
@@ -764,8 +766,10 @@ queue_metapath_scan(
 	item->abortedp = abortedp;
 
 	ret = -workqueue_add(wq, fs_scan_worker, 0, item);
-	if (ret)
+	if (ret) {
 		str_liberror(ctx, ret, _("queuing metapath scan work"));
+		free(item);
+	}
 
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2026-06-25 22:55 ` [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add Darrick J. Wong
@ 2026-06-25 22:56 ` Darrick J. Wong
  2026-06-26  5:05   ` Christoph Hellwig
  2026-06-25 22:56 ` [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails Darrick J. Wong
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:56 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex observes that the debug printf in fstrim_compute_minlen always
prints a threshold of 0 because we never actually set the block
threshold variable.  Fix that by using the results of the
multiplication that's passed into minlen_for_threshold.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 34bed605490f93 ("xfs_scrub: tune fstrim minlen parameter based on free space histograms")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase8.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index 5bfe0ffb4868a2..61f412f7139ae9 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -141,7 +141,6 @@ fstrim_compute_minlen(
 	const struct histogram	*freesp_hist)
 {
 	uint64_t		ret;
-	double			blk_threshold = 0;
 	unsigned int		ag_max_usable;
 
 	/*
@@ -164,9 +163,9 @@ fstrim_compute_minlen(
 			freesp_hist->tot_sum * ctx->fstrim_block_pct);
 
 	if (debug > 1)
-		printf(_("fstrim minlen %lld threshold %lld ag_max_usable %u\n"),
+		printf(_("fstrim minlen %lld threshold %.2f ag_max_usable %u\n"),
 				(unsigned long long)ret,
-				(unsigned long long)blk_threshold,
+				freesp_hist->tot_sum * ctx->fstrim_block_pct,
 				ag_max_usable);
 	if (ret > ag_max_usable)
 		ret = ag_max_usable;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails
  2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
                   ` (6 preceding siblings ...)
  2026-06-25 22:56 ` [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting Darrick J. Wong
@ 2026-06-25 22:56 ` Darrick J. Wong
  2026-06-26  5:05   ` Christoph Hellwig
  7 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-25 22:56 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: linux-xfs, hch

From: Darrick J. Wong <djwong@kernel.org>

Codex points out here that we should always release resources used by
the program, even if reporting healthy state to the kernel itself fails.
We're exiting soon anyway so this probably doesn't make much of a
difference.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase1.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)


diff --git a/scrub/phase1.c b/scrub/phase1.c
index 620a12393b3658..f1c9247e68a084 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -86,11 +86,9 @@ int
 scrub_cleanup(
 	struct scrub_ctx	*ctx)
 {
-	int			error;
+	int			error, error2;
 
 	error = report_to_kernel(ctx);
-	if (error)
-		return error;
 
 	action_list_free(&ctx->file_repair_list);
 	action_list_free(&ctx->fs_repair_list);
@@ -105,9 +103,12 @@ scrub_cleanup(
 	    ctx->verify_disks[XFS_DEV_RT] != ctx->verify_disks[XFS_DEV_DATA])
 		disk_close(ctx->verify_disks[XFS_DEV_RT]);
 	fshandle_destroy();
-	error = -xfd_close(&ctx->mnt);
-	if (error)
+	error2 = -xfd_close(&ctx->mnt);
+	if (error2) {
+		if (!error)
+			error = error2;
 		str_liberror(ctx, error, _("closing mountpoint fd"));
+	}
 	fs_table_destroy();
 
 	return error;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte
  2026-06-25 22:54 ` [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte Darrick J. Wong
@ 2026-06-26  4:58   ` Christoph Hellwig
  2026-06-26 16:46     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  4:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

On Thu, Jun 25, 2026 at 03:54:28PM -0700, Darrick J. Wong wrote:
> +	while (!si.aborted && (ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
>  		/* empty */
>  	}

Style nit, but wouldn't this read better as:

	while (!si.aborted) {
		ret = scan_user_bulkstat(ctx, &si, &ino)
		if (ret != 1)
			break;
 	}

?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now
  2026-06-25 22:54 ` [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now Darrick J. Wong
@ 2026-06-26  4:58   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  4:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4
  2026-06-25 22:54 ` [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4 Darrick J. Wong
@ 2026-06-26  4:59   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  4:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key
  2026-06-25 22:55 ` [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key Darrick J. Wong
@ 2026-06-26  5:03   ` Christoph Hellwig
  2026-06-26 17:09     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  5:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

On Thu, Jun 25, 2026 at 03:55:15PM -0700, Darrick J. Wong wrote:
> --- a/scrub/spacemap.c
> +++ b/scrub/spacemap.c
> @@ -213,7 +213,7 @@ scan_rt_rmaps(
>  	scan_dev_rmaps(ctx, ctx->fsinfo.fs_rtdev, arg);
>  }
>  
> -/* Iterate all the reverse mappings of the log device. */
> +/* Iterate all the reverse mappings of the external log device. */
>  static void
>  scan_log_rmaps(
>  	struct workqueue	*wq,
> @@ -222,7 +222,12 @@ scan_log_rmaps(
>  {
>  	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
>  
> -	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? 2 : ctx->fsinfo.fs_logdev,
> +	/*
> +	 * Internal rt sections (rtstart != 0) means we use synthetic device
> +	 * keys for external devices.
> +	 */
> +	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? XFS_DEV_LOG :
> +						      ctx->fsinfo.fs_logdev,

Maybe this should be using to_fsmap_dev instead?  Move the call
to to_fsmap_dev into scan_dev_rmaps for further simplification.

scan_rt_rmaps seems to have similar issues.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4
  2026-06-25 22:55 ` [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4 Darrick J. Wong
@ 2026-06-26  5:04   ` Christoph Hellwig
  2026-06-26 16:55     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  5:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

On Thu, Jun 25, 2026 at 03:55:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Codex complains that the number of work items computed when estimating
> the amount of work for phase 4 doesn't include the FSCOUNTERS and
> QUOTACHECK items.  Add them back in.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  scrub/phase4.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> 
> diff --git a/scrub/phase4.c b/scrub/phase4.c
> index 6bd16c23939eb4..744fd92ab195f4 100644
> --- a/scrub/phase4.c
> +++ b/scrub/phase4.c
> @@ -297,6 +297,10 @@ phase4_estimate(
>  	need_fixing = action_list_length(ctx->fs_repair_list) +
>  		      action_list_length(ctx->file_repair_list);
>  
> +	/* fscounters and quotacheck */
> +	if (need_fixing)
> +		need_fixing += 2;
> +

This matches the commit message, but there does this come from?
Why are fscountれrs and quotachck special?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add
  2026-06-25 22:55 ` [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add Darrick J. Wong
@ 2026-06-26  5:04   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  5:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

On Thu, Jun 25, 2026 at 03:55:46PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Codex points out that queue_metapath_scan and queue_fs_scan fail to free
> the allocated items if they are not consumed by workqueue_add().

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting
  2026-06-25 22:56 ` [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting Darrick J. Wong
@ 2026-06-26  5:05   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  5:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails
  2026-06-25 22:56 ` [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails Darrick J. Wong
@ 2026-06-26  5:05   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2026-06-26  5:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

On Thu, Jun 25, 2026 at 03:56:17PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Codex points out here that we should always release resources used by
> the program, even if reporting healthy state to the kernel itself fails.
> We're exiting soon anyway so this probably doesn't make much of a
> difference.

Yeah, this seems a bit pointless.  But also easy enough to not care
if it makes checkers happy:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte
  2026-06-26  4:58   ` Christoph Hellwig
@ 2026-06-26 16:46     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-26 16:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Fri, Jun 26, 2026 at 06:58:16AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 03:54:28PM -0700, Darrick J. Wong wrote:
> > +	while (!si.aborted && (ret = scan_user_bulkstat(ctx, &si, &ino)) == 1) {
> >  		/* empty */
> >  	}
> 
> Style nit, but wouldn't this read better as:
> 
> 	while (!si.aborted) {
> 		ret = scan_user_bulkstat(ctx, &si, &ino)
> 		if (ret != 1)
> 			break;
>  	}
> 
> ?

Yeah, that's fine too.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks for reviewing!

--D

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4
  2026-06-26  5:04   ` Christoph Hellwig
@ 2026-06-26 16:55     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-26 16:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Fri, Jun 26, 2026 at 07:04:26AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 03:55:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Codex complains that the number of work items computed when estimating
> > the amount of work for phase 4 doesn't include the FSCOUNTERS and
> > QUOTACHECK items.  Add them back in.
> > 
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> >  scrub/phase4.c |    4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > 
> > diff --git a/scrub/phase4.c b/scrub/phase4.c
> > index 6bd16c23939eb4..744fd92ab195f4 100644
> > --- a/scrub/phase4.c
> > +++ b/scrub/phase4.c
> > @@ -297,6 +297,10 @@ phase4_estimate(
> >  	need_fixing = action_list_length(ctx->fs_repair_list) +
> >  		      action_list_length(ctx->file_repair_list);
> >  
> > +	/* fscounters and quotacheck */
> > +	if (need_fixing)
> > +		need_fixing += 2;
> > +
> 
> This matches the commit message, but there does this come from?
> Why are fscountれrs and quotachck special?

They're run explicitly by phase4_func so that repairs don't fail with
ENOSPC or EDQUOT if the summary counters (or quota usage numbers) are
wrong.  How about I expand on this in the comments?

phase4_func:

	/*
	 * Check the resource usage counters early.  Normally we do this
	 * during phase 7, but some of the cross-referencing requires
	 * fairly accurate summary counters.  Check and try to repair
	 * them now to minimize the chance that repairs of primary
	 * metadata fail due to secondary metadata or ENOSPC on broken
	 * counters.  If repairs fails, we'll come back during phase 7.
	 */
	scrub_item_init_fs(&sri);
	scrub_item_schedule(&sri, XFS_SCRUB_TYPE_FSCOUNTERS);

and:

	/*
	 * Try to fix the quota usage counts so that online repair
	 * doesn't fail with EDQUOT (or worse shut down the fs) due to
	 * bad counts.
	 */
	if (fsgeom.sick & XFS_FSOP_GEOM_SICK_QUOTACHECK)
		scrub_item_schedule(&sri, XFS_SCRUB_TYPE_QUOTACHECK);

phase4_estimate:

	/*
	 * fscounters and quotacheck are run directly by phase4_func
	 * independent of the repair lists, so put that in the item
	 * count.  See phase4_func for why.
	 */
	if (need_fixing)
		need_fixing += 2;

How does that sound?

--D

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key
  2026-06-26  5:03   ` Christoph Hellwig
@ 2026-06-26 17:09     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2026-06-26 17:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: aalbersh, linux-xfs

On Fri, Jun 26, 2026 at 07:03:45AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 03:55:15PM -0700, Darrick J. Wong wrote:
> > --- a/scrub/spacemap.c
> > +++ b/scrub/spacemap.c
> > @@ -213,7 +213,7 @@ scan_rt_rmaps(
> >  	scan_dev_rmaps(ctx, ctx->fsinfo.fs_rtdev, arg);
> >  }
> >  
> > -/* Iterate all the reverse mappings of the log device. */
> > +/* Iterate all the reverse mappings of the external log device. */
> >  static void
> >  scan_log_rmaps(
> >  	struct workqueue	*wq,
> > @@ -222,7 +222,12 @@ scan_log_rmaps(
> >  {
> >  	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
> >  
> > -	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? 2 : ctx->fsinfo.fs_logdev,
> > +	/*
> > +	 * Internal rt sections (rtstart != 0) means we use synthetic device
> > +	 * keys for external devices.
> > +	 */
> > +	scan_dev_rmaps(ctx, ctx->mnt.fsgeom.rtstart ? XFS_DEV_LOG :
> > +						      ctx->fsinfo.fs_logdev,
> 
> Maybe this should be using to_fsmap_dev instead?  Move the call
> to to_fsmap_dev into scan_dev_rmaps for further simplification.

Yeah.  phase7 could use from_fsmap_dev, so I'll hoist both to spacemap.h
and clean up the callsites.

> scan_rt_rmaps seems to have similar issues.

I don't agree with that since it's only used for pre-rtgroups rt volumes
which can't be internal, but it's trivial to clean it up along with the
other callsites so in it goes.

--D

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-06-26 17:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-25 22:54 [PATCHSET] xfs_scrub: codex-inspired bug fixes, part 3 Darrick J. Wong
2026-06-25 22:54 ` [PATCH 1/8] xfs_scrub: stop user file scan if caller already aborte Darrick J. Wong
2026-06-26  4:58   ` Christoph Hellwig
2026-06-26 16:46     ` Darrick J. Wong
2026-06-25 22:54 ` [PATCH 2/8] xfs_scrub: don't flatten error numbers in read_verify_schedule_now Darrick J. Wong
2026-06-26  4:58   ` Christoph Hellwig
2026-06-25 22:54 ` [PATCH 3/8] xfs_scrub: actually handle NEEDSCHECK scrub items in phase 4 Darrick J. Wong
2026-06-26  4:59   ` Christoph Hellwig
2026-06-25 22:55 ` [PATCH 4/8] xfs_scrub: fix spacemap external log device scan dev key Darrick J. Wong
2026-06-26  5:03   ` Christoph Hellwig
2026-06-26 17:09     ` Darrick J. Wong
2026-06-25 22:55 ` [PATCH 5/8] xfs_scrub: fix estimate of work items for phase 4 Darrick J. Wong
2026-06-26  5:04   ` Christoph Hellwig
2026-06-26 16:55     ` Darrick J. Wong
2026-06-25 22:55 ` [PATCH 6/8] xfs_scrub: don't leak phase 5 scan items after a failed workqueue_add Darrick J. Wong
2026-06-26  5:04   ` Christoph Hellwig
2026-06-25 22:56 ` [PATCH 7/8] xfs_scrub: fix phase 8 debug reporting Darrick J. Wong
2026-06-26  5:05   ` Christoph Hellwig
2026-06-25 22:56 ` [PATCH 8/8] xfs_scrub: always finish cleanup, even if reporting healthy state fails Darrick J. Wong
2026-06-26  5:05   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox