public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/2] xfs: fix broken MAXREFCOUNT handling
@ 2022-11-29 22:01 Darrick J. Wong
  2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-29 22:01 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

Hi all,

This quick series fixes a bug in the refcount code where we don't merge
records correctly if the refcount is hovering around MAXREFCOUNT.  This
fixes regressions in xfs/179 when fsdax is enabled.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=maxrefcount-fixes-6.2
---
 fs/xfs/libxfs/xfs_refcount.c |  143 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 127 insertions(+), 16 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] xfs: hoist refcount record merge predicates
  2022-11-29 22:01 [PATCHSET 0/2] xfs: fix broken MAXREFCOUNT handling Darrick J. Wong
@ 2022-11-29 22:01 ` Darrick J. Wong
  2022-11-29 22:35   ` Dave Chinner
  2022-11-30  9:24   ` Yang, Xiao/杨 晓
  2022-11-29 22:01 ` [PATCH 2/2] xfs: estimate post-merge refcounts correctly Darrick J. Wong
  2022-11-29 22:06 ` [RFC PATCH] xfs/179: modify test to trigger refcount update bugs Darrick J. Wong
  2 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-29 22:01 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Hoist these multiline conditionals into separate static inline helpers
to improve readability and set the stage for corruption fixes that will
be introduced in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |  126 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 110 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 3f34bafe18dd..8c68994d09f3 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -815,11 +815,116 @@ xfs_refcount_find_right_extents(
 /* Is this extent valid? */
 static inline bool
 xfs_refc_valid(
-	struct xfs_refcount_irec	*rc)
+	const struct xfs_refcount_irec	*rc)
 {
 	return rc->rc_startblock != NULLAGBLOCK;
 }
 
+static inline bool
+xfs_refc_want_merge_center(
+	const struct xfs_refcount_irec	*left,
+	const struct xfs_refcount_irec	*cleft,
+	const struct xfs_refcount_irec	*cright,
+	const struct xfs_refcount_irec	*right,
+	bool				cleft_is_cright,
+	enum xfs_refc_adjust_op		adjust,
+	unsigned long long		*ulenp)
+{
+	unsigned long long		ulen = left->rc_blockcount;
+
+	/*
+	 * To merge with a center record, both shoulder records must be
+	 * adjacent to the record we want to adjust.  This is only true if
+	 * find_left and find_right made all four records valid.
+	 */
+	if (!xfs_refc_valid(left)  || !xfs_refc_valid(right) ||
+	    !xfs_refc_valid(cleft) || !xfs_refc_valid(cright))
+		return false;
+
+	/* There must only be one record for the entire range. */
+	if (!cleft_is_cright)
+		return false;
+
+	/* The shoulder record refcounts must match the new refcount. */
+	if (left->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+	if (right->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length.  The funny computation
+	 * of ulen avoids casting.
+	 */
+	ulen += cleft->rc_blockcount + right->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	*ulenp = ulen;
+	return true;
+}
+
+static inline bool
+xfs_refc_want_merge_left(
+	const struct xfs_refcount_irec	*left,
+	const struct xfs_refcount_irec	*cleft,
+	enum xfs_refc_adjust_op		adjust)
+{
+	unsigned long long		ulen = left->rc_blockcount;
+
+	/*
+	 * For a left merge, the left shoulder record must be adjacent to the
+	 * start of the range.  If this is true, find_left made left and cleft
+	 * contain valid contents.
+	 */
+	if (!xfs_refc_valid(left) || !xfs_refc_valid(cleft))
+		return false;
+
+	/* Left shoulder record refcount must match the new refcount. */
+	if (left->rc_refcount != cleft->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length.  The funny computation
+	 * of ulen avoids casting.
+	 */
+	ulen += cleft->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	return true;
+}
+
+static inline bool
+xfs_refc_want_merge_right(
+	const struct xfs_refcount_irec	*cright,
+	const struct xfs_refcount_irec	*right,
+	enum xfs_refc_adjust_op		adjust)
+{
+	unsigned long long		ulen = right->rc_blockcount;
+
+	/*
+	 * For a right merge, the right shoulder record must be adjacent to the
+	 * end of the range.  If this is true, find_right made cright and right
+	 * contain valid contents.
+	 */
+	if (!xfs_refc_valid(right) || !xfs_refc_valid(cright))
+		return false;
+
+	/* Right shoulder record refcount must match the new refcount. */
+	if (right->rc_refcount != cright->rc_refcount + adjust)
+		return false;
+
+	/*
+	 * The new record cannot exceed the max length.  The funny computation
+	 * of ulen avoids casting.
+	 */
+	ulen += cright->rc_blockcount;
+	if (ulen >= MAXREFCEXTLEN)
+		return false;
+
+	return true;
+}
+
 /*
  * Try to merge with any extents on the boundaries of the adjustment range.
  */
@@ -861,23 +966,15 @@ xfs_refcount_merge_extents(
 		 (cleft.rc_blockcount == cright.rc_blockcount);
 
 	/* Try to merge left, cleft, and right.  cleft must == cright. */
-	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
-			right.rc_blockcount;
-	if (xfs_refc_valid(&left) && xfs_refc_valid(&right) &&
-	    xfs_refc_valid(&cleft) && xfs_refc_valid(&cright) && cequal &&
-	    left.rc_refcount == cleft.rc_refcount + adjust &&
-	    right.rc_refcount == cleft.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_center(&left, &cleft, &cright, &right, cequal,
+				adjust, &ulen)) {
 		*shape_changed = true;
 		return xfs_refcount_merge_center_extents(cur, &left, &cleft,
 				&right, ulen, aglen);
 	}
 
 	/* Try to merge left and cleft. */
-	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
-	if (xfs_refc_valid(&left) && xfs_refc_valid(&cleft) &&
-	    left.rc_refcount == cleft.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_left(&left, &cleft, adjust)) {
 		*shape_changed = true;
 		error = xfs_refcount_merge_left_extent(cur, &left, &cleft,
 				agbno, aglen);
@@ -893,10 +990,7 @@ xfs_refcount_merge_extents(
 	}
 
 	/* Try to merge cright and right. */
-	ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
-	if (xfs_refc_valid(&right) && xfs_refc_valid(&cright) &&
-	    right.rc_refcount == cright.rc_refcount + adjust &&
-	    ulen < MAXREFCEXTLEN) {
+	if (xfs_refc_want_merge_right(&cright, &right, adjust)) {
 		*shape_changed = true;
 		return xfs_refcount_merge_right_extent(cur, &right, &cright,
 				aglen);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] xfs: estimate post-merge refcounts correctly
  2022-11-29 22:01 [PATCHSET 0/2] xfs: fix broken MAXREFCOUNT handling Darrick J. Wong
  2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
@ 2022-11-29 22:01 ` Darrick J. Wong
  2022-11-29 22:37   ` Dave Chinner
  2022-11-30  9:32   ` Yang, Xiao/杨 晓
  2022-11-29 22:06 ` [RFC PATCH] xfs/179: modify test to trigger refcount update bugs Darrick J. Wong
  2 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-29 22:01 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
metadata corruptions after being run.  Specifically, xfs_repair noticed
single-block refcount records that could be combined but had not been.

The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents.  When we're trying to find candidates for a
refcount btree record merge, we compute the refcount attribute of the
merged record, but we fail to account for the fact that once a record
hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
the computed refcount is wrong, and we fail to merge the extents.

Fix this by adjusting the merge predicates to compute the adjusted
refcount correctly.

Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 8c68994d09f3..0d68a9230386 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -820,6 +820,17 @@ xfs_refc_valid(
 	return rc->rc_startblock != NULLAGBLOCK;
 }
 
+static inline xfs_nlink_t
+xfs_refc_merge_refcount(
+	const struct xfs_refcount_irec	*irec,
+	enum xfs_refc_adjust_op		adjust)
+{
+	/* Once a record hits MAXREFCOUNT, it is pinned there forever */
+	if (irec->rc_refcount == MAXREFCOUNT)
+		return MAXREFCOUNT;
+	return irec->rc_refcount + adjust;
+}
+
 static inline bool
 xfs_refc_want_merge_center(
 	const struct xfs_refcount_irec	*left,
@@ -831,6 +842,7 @@ xfs_refc_want_merge_center(
 	unsigned long long		*ulenp)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * To merge with a center record, both shoulder records must be
@@ -846,9 +858,10 @@ xfs_refc_want_merge_center(
 		return false;
 
 	/* The shoulder record refcounts must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
-	if (right->rc_refcount != cleft->rc_refcount + adjust)
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -870,6 +883,7 @@ xfs_refc_want_merge_left(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = left->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a left merge, the left shoulder record must be adjacent to the
@@ -880,7 +894,8 @@ xfs_refc_want_merge_left(
 		return false;
 
 	/* Left shoulder record refcount must match the new refcount. */
-	if (left->rc_refcount != cleft->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cleft, adjust);
+	if (left->rc_refcount != new_refcount)
 		return false;
 
 	/*
@@ -901,6 +916,7 @@ xfs_refc_want_merge_right(
 	enum xfs_refc_adjust_op		adjust)
 {
 	unsigned long long		ulen = right->rc_blockcount;
+	xfs_nlink_t			new_refcount;
 
 	/*
 	 * For a right merge, the right shoulder record must be adjacent to the
@@ -911,7 +927,8 @@ xfs_refc_want_merge_right(
 		return false;
 
 	/* Right shoulder record refcount must match the new refcount. */
-	if (right->rc_refcount != cright->rc_refcount + adjust)
+	new_refcount = xfs_refc_merge_refcount(cright, adjust);
+	if (right->rc_refcount != new_refcount)
 		return false;
 
 	/*


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH] xfs/179: modify test to trigger refcount update bugs
  2022-11-29 22:01 [PATCHSET 0/2] xfs: fix broken MAXREFCOUNT handling Darrick J. Wong
  2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
  2022-11-29 22:01 ` [PATCH 2/2] xfs: estimate post-merge refcounts correctly Darrick J. Wong
@ 2022-11-29 22:06 ` Darrick J. Wong
  2022-11-29 22:42   ` Dave Chinner
  2022-11-30 10:07   ` Yang, Xiao/杨 晓
  2 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-29 22:06 UTC (permalink / raw)
  To: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Upon enabling fsdax + reflink for XFS, this test began to report
refcount metadata corruptions after being run.  Specifically, xfs_repair
noticed single-block refcount records that could be combined but had not
been.

The root cause of this is improper MAXREFCOUNT edge case handling in
xfs_refcount_merge_extents.  When we're trying to find candidates for a
record merge, we compute the refcount of the merged record, but without
accounting for the fact that once a record hits rc_refcount ==
MAXREFCOUNT, it is pinned that way forever.

Adjust this test to use a sub-filesize write for one of the COW writes,
because this is how we force the extent merge code to run.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/179 |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/tests/xfs/179 b/tests/xfs/179
index ec0cb7e5b4..214558f694 100755
--- a/tests/xfs/179
+++ b/tests/xfs/179
@@ -21,17 +21,28 @@ _require_scratch_nocheck
 _require_cp_reflink
 _require_test_program "punch-alternating"
 
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
+
 echo "Format and mount"
 _scratch_mkfs -d agcount=1 > $seqres.full 2>&1
 _scratch_mount >> $seqres.full 2>&1
 
+# This test modifies the refcount btree on the data device, so we must force
+# rtinherit off so that the test files are created there.
+_xfs_force_bdev data $SCRATCH_MNT
+
 testdir=$SCRATCH_MNT/test-$seq
 mkdir $testdir
 
+# Set the file size to 10x the block size to guarantee that the COW writes will
+# touch multiple blocks and exercise the refcount extent merging code.  This is
+# necessary to catch a bug in the refcount extent merging code that handles
+# MAXREFCOUNT edge cases.
 blksz=65536
+filesz=$((blksz * 10))
 
 echo "Create original files"
-_pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full
+_pwrite_byte 0x61 0 $filesz $testdir/file1 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file2 >> $seqres.full
 
 echo "Change reference count"
@@ -56,9 +67,20 @@ _scratch_xfs_db -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' >> $seqres.full
 _scratch_mount >> $seqres.full
 
 echo "CoW a couple files"
-_pwrite_byte 0x62 0 $blksz $testdir/file3 >> $seqres.full
-_pwrite_byte 0x62 0 $blksz $testdir/file5 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file3 >> $seqres.full
+_pwrite_byte 0x62 0 $filesz $testdir/file5 >> $seqres.full
+
+# For the last COW test, write single blocks at the start, middle, and end of
+# the shared file to exercise a refcount btree update that targets a single
+# block of the multiblock refcount record that we just modified.
+#
+# This trips a bug where XFS didn't correctly identify refcount record merge
+# candidates when any of the records are pinned at MAXREFCOUNT.  The bug was
+# originally discovered by enabling fsdax + reflink, but the bug can be
+# triggered by any COW that doesn't target the entire extent.
 _pwrite_byte 0x62 0 $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((blksz * 4)) $blksz $testdir/file7 >> $seqres.full
+_pwrite_byte 0x62 $((filesz - blksz)) $blksz $testdir/file7 >> $seqres.full
 
 echo "Check scratch fs"
 _scratch_unmount

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: hoist refcount record merge predicates
  2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
@ 2022-11-29 22:35   ` Dave Chinner
  2022-11-30  0:13     ` Darrick J. Wong
  2022-11-30  9:24   ` Yang, Xiao/杨 晓
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2022-11-29 22:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Nov 29, 2022 at 02:01:31PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Hoist these multiline conditionals into separate static inline helpers
> to improve readability and set the stage for corruption fixes that will
> be introduced in the next patch.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_refcount.c |  126 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 110 insertions(+), 16 deletions(-)

Looks OK. Minor nit below.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

> 
> 
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index 3f34bafe18dd..8c68994d09f3 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -815,11 +815,116 @@ xfs_refcount_find_right_extents(
>  /* Is this extent valid? */
>  static inline bool
>  xfs_refc_valid(
> -	struct xfs_refcount_irec	*rc)
> +	const struct xfs_refcount_irec	*rc)
>  {
>  	return rc->rc_startblock != NULLAGBLOCK;
>  }
>  
> +static inline bool
> +xfs_refc_want_merge_center(
> +	const struct xfs_refcount_irec	*left,
> +	const struct xfs_refcount_irec	*cleft,
> +	const struct xfs_refcount_irec	*cright,
> +	const struct xfs_refcount_irec	*right,
> +	bool				cleft_is_cright,
> +	enum xfs_refc_adjust_op		adjust,
> +	unsigned long long		*ulenp)
> +{
> +	unsigned long long		ulen = left->rc_blockcount;
> +
> +	/*
> +	 * To merge with a center record, both shoulder records must be
> +	 * adjacent to the record we want to adjust.  This is only true if
> +	 * find_left and find_right made all four records valid.
> +	 */
> +	if (!xfs_refc_valid(left)  || !xfs_refc_valid(right) ||
> +	    !xfs_refc_valid(cleft) || !xfs_refc_valid(cright))
> +		return false;
> +
> +	/* There must only be one record for the entire range. */
> +	if (!cleft_is_cright)
> +		return false;
> +
> +	/* The shoulder record refcounts must match the new refcount. */
> +	if (left->rc_refcount != cleft->rc_refcount + adjust)
> +		return false;
> +	if (right->rc_refcount != cleft->rc_refcount + adjust)
> +		return false;
> +
> +	/*
> +	 * The new record cannot exceed the max length.  The funny computation
> +	 * of ulen avoids casting.
> +	 */
> +	ulen += cleft->rc_blockcount + right->rc_blockcount;
> +	if (ulen >= MAXREFCEXTLEN)
> +		return false;

The comment took me a bit of spelunking to decipher what the "funny
computation" was. Better to spell it out directly (catch u32
overflows) than just hint that there's somethign special about it.
Say:

	/*
	 * The new record cannot exceed the max length. ulen is a
	 * ULL as the individual record block counts can be up to
	 * (u32 - 1) in length hence we need to catch u32 addition
	 * overflows here.
	 */

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: estimate post-merge refcounts correctly
  2022-11-29 22:01 ` [PATCH 2/2] xfs: estimate post-merge refcounts correctly Darrick J. Wong
@ 2022-11-29 22:37   ` Dave Chinner
  2022-11-30  9:32   ` Yang, Xiao/杨 晓
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2022-11-29 22:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Nov 29, 2022 at 02:01:36PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> metadata corruptions after being run.  Specifically, xfs_repair noticed
> single-block refcount records that could be combined but had not been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> refcount btree record merge, we compute the refcount attribute of the
> merged record, but we fail to account for the fact that once a record
> hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
> the computed refcount is wrong, and we fail to merge the extents.
> 
> Fix this by adjusting the merge predicates to compute the adjusted
> refcount correctly.
> 
> Fixes: 3172725814f9 ("xfs: adjust refcount of an extent of blocks in refcount btree")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_refcount.c |   25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)

Looks good to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] xfs/179: modify test to trigger refcount update bugs
  2022-11-29 22:06 ` [RFC PATCH] xfs/179: modify test to trigger refcount update bugs Darrick J. Wong
@ 2022-11-29 22:42   ` Dave Chinner
  2022-11-30  0:19     ` Darrick J. Wong
  2022-11-30 10:07   ` Yang, Xiao/杨 晓
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2022-11-29 22:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run.  Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
> 
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Seems like a reasonable modification to the test....

> ---
>  tests/xfs/179 |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/xfs/179 b/tests/xfs/179
> index ec0cb7e5b4..214558f694 100755
> --- a/tests/xfs/179
> +++ b/tests/xfs/179
> @@ -21,17 +21,28 @@ _require_scratch_nocheck
>  _require_cp_reflink
>  _require_test_program "punch-alternating"
>  
> +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"

Though I really don't like these annotation because when the test
fails in future as I'm developing new code it's going to tell me I
need a fix I already have in the kernel. This is just extra noise
that I have to filter out of the results output. IMO a comment for
this information or a line in the commit message is fine - it
just doesn't belong in the test output....

Other than that:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: hoist refcount record merge predicates
  2022-11-29 22:35   ` Dave Chinner
@ 2022-11-30  0:13     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-30  0:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 30, 2022 at 09:35:31AM +1100, Dave Chinner wrote:
> On Tue, Nov 29, 2022 at 02:01:31PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Hoist these multiline conditionals into separate static inline helpers
> > to improve readability and set the stage for corruption fixes that will
> > be introduced in the next patch.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_refcount.c |  126 +++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 110 insertions(+), 16 deletions(-)
> 
> Looks OK. Minor nit below.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> > index 3f34bafe18dd..8c68994d09f3 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.c
> > +++ b/fs/xfs/libxfs/xfs_refcount.c
> > @@ -815,11 +815,116 @@ xfs_refcount_find_right_extents(
> >  /* Is this extent valid? */
> >  static inline bool
> >  xfs_refc_valid(
> > -	struct xfs_refcount_irec	*rc)
> > +	const struct xfs_refcount_irec	*rc)
> >  {
> >  	return rc->rc_startblock != NULLAGBLOCK;
> >  }
> >  
> > +static inline bool
> > +xfs_refc_want_merge_center(
> > +	const struct xfs_refcount_irec	*left,
> > +	const struct xfs_refcount_irec	*cleft,
> > +	const struct xfs_refcount_irec	*cright,
> > +	const struct xfs_refcount_irec	*right,
> > +	bool				cleft_is_cright,
> > +	enum xfs_refc_adjust_op		adjust,
> > +	unsigned long long		*ulenp)
> > +{
> > +	unsigned long long		ulen = left->rc_blockcount;
> > +
> > +	/*
> > +	 * To merge with a center record, both shoulder records must be
> > +	 * adjacent to the record we want to adjust.  This is only true if
> > +	 * find_left and find_right made all four records valid.
> > +	 */
> > +	if (!xfs_refc_valid(left)  || !xfs_refc_valid(right) ||
> > +	    !xfs_refc_valid(cleft) || !xfs_refc_valid(cright))
> > +		return false;
> > +
> > +	/* There must only be one record for the entire range. */
> > +	if (!cleft_is_cright)
> > +		return false;
> > +
> > +	/* The shoulder record refcounts must match the new refcount. */
> > +	if (left->rc_refcount != cleft->rc_refcount + adjust)
> > +		return false;
> > +	if (right->rc_refcount != cleft->rc_refcount + adjust)
> > +		return false;
> > +
> > +	/*
> > +	 * The new record cannot exceed the max length.  The funny computation
> > +	 * of ulen avoids casting.
> > +	 */
> > +	ulen += cleft->rc_blockcount + right->rc_blockcount;
> > +	if (ulen >= MAXREFCEXTLEN)
> > +		return false;
> 
> The comment took me a bit of spelunking to decipher what the "funny
> computation" was. Better to spell it out directly (catch u32
> overflows) than just hint that there's somethign special about it.
> Say:
> 
> 	/*
> 	 * The new record cannot exceed the max length. ulen is a
> 	 * ULL as the individual record block counts can be up to
> 	 * (u32 - 1) in length hence we need to catch u32 addition
> 	 * overflows here.
> 	 */

Done; thanks for the quick review!

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] xfs/179: modify test to trigger refcount update bugs
  2022-11-29 22:42   ` Dave Chinner
@ 2022-11-30  0:19     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-30  0:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Nov 30, 2022 at 09:42:27AM +1100, Dave Chinner wrote:
> On Tue, Nov 29, 2022 at 02:06:39PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Upon enabling fsdax + reflink for XFS, this test began to report
> > refcount metadata corruptions after being run.  Specifically, xfs_repair
> > noticed single-block refcount records that could be combined but had not
> > been.
> > 
> > The root cause of this is improper MAXREFCOUNT edge case handling in
> > xfs_refcount_merge_extents.  When we're trying to find candidates for a
> > record merge, we compute the refcount of the merged record, but without
> > accounting for the fact that once a record hits rc_refcount ==
> > MAXREFCOUNT, it is pinned that way forever.
> > 
> > Adjust this test to use a sub-filesize write for one of the COW writes,
> > because this is how we force the extent merge code to run.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Seems like a reasonable modification to the test....
> 
> > ---
> >  tests/xfs/179 |   28 +++++++++++++++++++++++++---
> >  1 file changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tests/xfs/179 b/tests/xfs/179
> > index ec0cb7e5b4..214558f694 100755
> > --- a/tests/xfs/179
> > +++ b/tests/xfs/179
> > @@ -21,17 +21,28 @@ _require_scratch_nocheck
> >  _require_cp_reflink
> >  _require_test_program "punch-alternating"
> >  
> > +_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: estimate post-merge refcounts correctly"
> 
> Though I really don't like these annotation because when the test
> fails in future as I'm developing new code it's going to tell me I
> need a fix I already have in the kernel. This is just extra noise
> that I have to filter out of the results output. IMO a comment for
> this information or a line in the commit message is fine - it
> just doesn't belong in the test output....

I'll turn that into a comment, since this originally was a functional
test, not a regression test.

> Other than that:
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Ok thanks!

--D

> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: hoist refcount record merge predicates
  2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
  2022-11-29 22:35   ` Dave Chinner
@ 2022-11-30  9:24   ` Yang, Xiao/杨 晓
  1 sibling, 0 replies; 13+ messages in thread
From: Yang, Xiao/杨 晓 @ 2022-11-30  9:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On 2022/11/30 6:01, Darrick J. Wong wrote:
> From: Darrick J. Wong<djwong@kernel.org>
> 
> Hoist these multiline conditionals into separate static inline helpers
> to improve readability and set the stage for corruption fixes that will
> be introduced in the next patch.
Hi Darrick,

It's a good refactoring. LGTM.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>

Best Regards,
Xiao Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: estimate post-merge refcounts correctly
  2022-11-29 22:01 ` [PATCH 2/2] xfs: estimate post-merge refcounts correctly Darrick J. Wong
  2022-11-29 22:37   ` Dave Chinner
@ 2022-11-30  9:32   ` Yang, Xiao/杨 晓
  2022-11-30 18:49     ` Darrick J. Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Yang, Xiao/杨 晓 @ 2022-11-30  9:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On 2022/11/30 6:01, Darrick J. Wong wrote:
> From: Darrick J. Wong<djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> metadata corruptions after being run.  Specifically, xfs_repair noticed
> single-block refcount records that could be combined but had not been.
Hi Darrick,

I am investigating the issue as well. Thanks a lot for your quick fix.
I have confirmed that xfs/179 with the fix patch can works well in DAX mode.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>

> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> refcount btree record merge, we compute the refcount attribute of the
> merged record, but we fail to account for the fact that once a record
> hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence

One question:
Is it reansonable/expected to pin rc_refcount forever when a record hits 
rc_refcount == MAXREFCOUNT?

> the computed refcount is wrong, and we fail to merge the extents.
> 
> Fix this by adjusting the merge predicates to compute the adjusted
> refcount correctly.

Best Regards,
Xiao Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] xfs/179: modify test to trigger refcount update bugs
  2022-11-29 22:06 ` [RFC PATCH] xfs/179: modify test to trigger refcount update bugs Darrick J. Wong
  2022-11-29 22:42   ` Dave Chinner
@ 2022-11-30 10:07   ` Yang, Xiao/杨 晓
  1 sibling, 0 replies; 13+ messages in thread
From: Yang, Xiao/杨 晓 @ 2022-11-30 10:07 UTC (permalink / raw)
  To: Darrick J. Wong, linux-xfs, david

On 2022/11/30 6:06, Darrick J. Wong wrote:
> From: Darrick J. Wong<djwong@kernel.org>
> 
> Upon enabling fsdax + reflink for XFS, this test began to report
> refcount metadata corruptions after being run.  Specifically, xfs_repair
> noticed single-block refcount records that could be combined but had not
> been.
> 
> The root cause of this is improper MAXREFCOUNT edge case handling in
> xfs_refcount_merge_extents.  When we're trying to find candidates for a
> record merge, we compute the refcount of the merged record, but without
> accounting for the fact that once a record hits rc_refcount ==
> MAXREFCOUNT, it is pinned that way forever.
> 
> Adjust this test to use a sub-filesize write for one of the COW writes,
> because this is how we force the extent merge code to run.
Hi Darrick,

Cool, it is reliable to reproduce the same issue in non-DAX mode.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
Tested-by: Xiao Yang <yangx.jy@fujitsu.com>

Best Regards,
Xiao Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: estimate post-merge refcounts correctly
  2022-11-30  9:32   ` Yang, Xiao/杨 晓
@ 2022-11-30 18:49     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2022-11-30 18:49 UTC (permalink / raw)
  To: Yang, Xiao/杨 晓; +Cc: linux-xfs, david

On Wed, Nov 30, 2022 at 05:32:38PM +0800, Yang, Xiao/杨 晓 wrote:
> On 2022/11/30 6:01, Darrick J. Wong wrote:
> > From: Darrick J. Wong<djwong@kernel.org>
> > 
> > Upon enabling fsdax + reflink for XFS, xfs/179 began to report refcount
> > metadata corruptions after being run.  Specifically, xfs_repair noticed
> > single-block refcount records that could be combined but had not been.
> Hi Darrick,
> 
> I am investigating the issue as well. Thanks a lot for your quick fix.
> I have confirmed that xfs/179 with the fix patch can works well in DAX mode.
> Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>
> 
> > 
> > The root cause of this is improper MAXREFCOUNT edge case handling in
> > xfs_refcount_merge_extents.  When we're trying to find candidates for a
> > refcount btree record merge, we compute the refcount attribute of the
> > merged record, but we fail to account for the fact that once a record
> > hits rc_refcount == MAXREFCOUNT, it is pinned that way forever.  Hence
> 
> One question:
> Is it reansonable/expected to pin rc_refcount forever when a record hits
> rc_refcount == MAXREFCOUNT?

In the ideal world we'd have a way for refcount_adjust to return early
if it hit a MAXREFCOUNT record, and stop the reflink operation right
there and then.

*Unfortunately* due to the way defer ops work, by the time we're walking
through the refcount btree, we've already committed to adding the entire
mapping.  There's no good way to back out once we start, since
transactions do not log undo items, only redo items.  Augmenting the log
to support undo items is a very big ask.

We could try to work around that by walking the refcount records
*before* committing log items, but that second walk will increase the
overhead for a fairly difficult to hit corner case.  We'd also need to
hold the AGF buffer lock from the start of the chain all the way until
we start the deferred refcount processing.  Or we'd need to add a
separate AG lock.

Even if we did all that, FICLONE* doesn't have any means to communicate
that a short reflink occurred.  copy_file_range does, but then it has
other weird warts, and cannot reflink more than 2^32 bytes in one go.

The simplest solution is to pin the extent if its refcount hits
MAXREFCOUNT, so that's what I went with. :/

--D

> > the computed refcount is wrong, and we fail to merge the extents.
> > 
> > Fix this by adjusting the merge predicates to compute the adjusted
> > refcount correctly.
> 
> Best Regards,
> Xiao Yang

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-30 18:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-29 22:01 [PATCHSET 0/2] xfs: fix broken MAXREFCOUNT handling Darrick J. Wong
2022-11-29 22:01 ` [PATCH 1/2] xfs: hoist refcount record merge predicates Darrick J. Wong
2022-11-29 22:35   ` Dave Chinner
2022-11-30  0:13     ` Darrick J. Wong
2022-11-30  9:24   ` Yang, Xiao/杨 晓
2022-11-29 22:01 ` [PATCH 2/2] xfs: estimate post-merge refcounts correctly Darrick J. Wong
2022-11-29 22:37   ` Dave Chinner
2022-11-30  9:32   ` Yang, Xiao/杨 晓
2022-11-30 18:49     ` Darrick J. Wong
2022-11-29 22:06 ` [RFC PATCH] xfs/179: modify test to trigger refcount update bugs Darrick J. Wong
2022-11-29 22:42   ` Dave Chinner
2022-11-30  0:19     ` Darrick J. Wong
2022-11-30 10:07   ` Yang, Xiao/杨 晓

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox