[PATCHSET] xfs: bug fixes for 6.13

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCHSET] xfs: bug fixes for 6.13
@ 2024-11-18 23:01 Darrick J. Wong
  2024-11-18 23:04 ` [PATCH 01/10] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:01 UTC (permalink / raw)
  To: djwong, cem; +Cc: dan.carpenter, hch, stable, wozizhi, linux-xfs

Hi all,

Bug fixes for 6.13.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-fixes-6.13
---
Commits in this patchset:
 * xfs: fix off-by-one error in fsmap's end_daddr usage
 * xfs: metapath scrubber should use the already loaded inodes
 * xfs: keep quota directory inode loaded
 * xfs: return a 64-bit block count from xfs_btree_count_blocks
 * xfs: don't drop errno values when we fail to ficlone the entire range
 * xfs: separate healthy clearing mask during repair
 * xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
 * xfs: mark metadir repair tempfiles with IRECOVERY
 * xfs: fix null bno_hint handling in xfs_rtallocate_rtg
 * xfs: fix error bailout in xfs_rtginode_create
---
 fs/xfs/libxfs/xfs_btree.c        |    4 +-
 fs/xfs/libxfs/xfs_btree.h        |    2 +
 fs/xfs/libxfs/xfs_ialloc_btree.c |    4 ++
 fs/xfs/libxfs/xfs_rtgroup.c      |    2 +
 fs/xfs/scrub/agheader.c          |    6 ++-
 fs/xfs/scrub/agheader_repair.c   |    6 ++-
 fs/xfs/scrub/fscounters.c        |    2 +
 fs/xfs/scrub/health.c            |   57 ++++++++++++++++++--------------
 fs/xfs/scrub/ialloc.c            |    4 +-
 fs/xfs/scrub/metapath.c          |   68 +++++++++++++++-----------------------
 fs/xfs/scrub/refcount.c          |    2 +
 fs/xfs/scrub/scrub.h             |    6 +++
 fs/xfs/scrub/symlink_repair.c    |    3 +-
 fs/xfs/scrub/tempfile.c          |   10 ++++--
 fs/xfs/xfs_bmap_util.c           |    2 +
 fs/xfs/xfs_file.c                |    8 ++++
 fs/xfs/xfs_fsmap.c               |   38 ++++++++++++---------
 fs/xfs/xfs_inode.h               |    2 +
 fs/xfs/xfs_qm.c                  |   47 ++++++++++++++------------
 fs/xfs/xfs_qm.h                  |    1 +
 fs/xfs/xfs_rtalloc.c             |    2 +
 21 files changed, 151 insertions(+), 125 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/10] xfs: fix off-by-one error in fsmap's end_daddr usage
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
@ 2024-11-18 23:04 ` Darrick J. Wong
  2024-11-18 23:05 ` [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:04 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, wozizhi, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit ca6448aed4f10a, we created an "end_daddr" variable to fix
fsmap reporting when the end of the range requested falls in the middle
of an unknown (aka free on the rmapbt) region.  Unfortunately, I didn't
notice that the the code sets end_daddr to the last sector of the device
but then uses that quantity to compute the length of the synthesized
mapping.

Zizhi Wo later observed that when end_daddr isn't set, we still don't
report the last fsblock on a device because in that case (aka when
info->last is true), the info->high mapping that we pass to
xfs_getfsmap_group_helper has a startblock that points to the last
fsblock.  This is also wrong because the code uses startblock to
compute the length of the synthesized mapping.

Fix the second problem by setting end_daddr unconditionally, and fix the
first problem by setting start_daddr to one past the end of the range to
query.

Cc: <stable@vger.kernel.org> # v6.11
Fixes: ca6448aed4f10a ("xfs: Fix missing interval for missing_owner in xfs fsmap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsmap.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 82f2e0dd224997..3290dd8524a69a 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -163,7 +163,8 @@ struct xfs_getfsmap_info {
 	xfs_daddr_t		next_daddr;	/* next daddr we expect */
 	/* daddr of low fsmap key when we're using the rtbitmap */
 	xfs_daddr_t		low_daddr;
-	xfs_daddr_t		end_daddr;	/* daddr of high fsmap key */
+	/* daddr of high fsmap key, or the last daddr on the device */
+	xfs_daddr_t		end_daddr;
 	u64			missing_owner;	/* owner of holes */
 	u32			dev;		/* device id */
 	/*
@@ -387,8 +388,8 @@ xfs_getfsmap_group_helper(
 	 * we calculated from userspace's high key to synthesize the record.
 	 * Note that if the btree query found a mapping, there won't be a gap.
 	 */
-	if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL)
-		frec->start_daddr = info->end_daddr;
+	if (info->last)
+		frec->start_daddr = info->end_daddr + 1;
 	else
 		frec->start_daddr = xfs_gbno_to_daddr(xg, startblock);
 
@@ -736,11 +737,10 @@ xfs_getfsmap_rtdev_rtbitmap_helper(
 	 * we calculated from userspace's high key to synthesize the record.
 	 * Note that if the btree query found a mapping, there won't be a gap.
 	 */
-	if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL) {
-		frec.start_daddr = info->end_daddr;
-	} else {
+	if (info->last)
+		frec.start_daddr = info->end_daddr + 1;
+	else
 		frec.start_daddr = xfs_rtb_to_daddr(mp, start_rtb);
-	}
 
 	frec.len_daddr = XFS_FSB_TO_BB(mp, rtbcount);
 	return xfs_getfsmap_helper(tp, info, &frec);
@@ -933,7 +933,10 @@ xfs_getfsmap(
 	struct xfs_trans		*tp = NULL;
 	struct xfs_fsmap		dkeys[2];	/* per-dev keys */
 	struct xfs_getfsmap_dev		handlers[XFS_GETFSMAP_DEVS];
-	struct xfs_getfsmap_info	info = { NULL };
+	struct xfs_getfsmap_info	info = {
+		.fsmap_recs		= fsmap_recs,
+		.head			= head,
+	};
 	bool				use_rmap;
 	int				i;
 	int				error = 0;
@@ -998,9 +1001,6 @@ xfs_getfsmap(
 
 	info.next_daddr = head->fmh_keys[0].fmr_physical +
 			  head->fmh_keys[0].fmr_length;
-	info.end_daddr = XFS_BUF_DADDR_NULL;
-	info.fsmap_recs = fsmap_recs;
-	info.head = head;
 
 	/* For each device we support... */
 	for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
@@ -1013,17 +1013,23 @@ xfs_getfsmap(
 			break;
 
 		/*
-		 * If this device number matches the high key, we have
-		 * to pass the high key to the handler to limit the
-		 * query results.  If the device number exceeds the
-		 * low key, zero out the low key so that we get
-		 * everything from the beginning.
+		 * If this device number matches the high key, we have to pass
+		 * the high key to the handler to limit the query results, and
+		 * set the end_daddr so that we can synthesize records at the
+		 * end of the query range or device.
 		 */
 		if (handlers[i].dev == head->fmh_keys[1].fmr_device) {
 			dkeys[1] = head->fmh_keys[1];
 			info.end_daddr = min(handlers[i].nr_sectors - 1,
 					     dkeys[1].fmr_physical);
+		} else {
+			info.end_daddr = handlers[i].nr_sectors - 1;
 		}
+
+		/*
+		 * If the device number exceeds the low key, zero out the low
+		 * key so that we get everything from the beginning.
+		 */
 		if (handlers[i].dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
  2024-11-18 23:04 ` [PATCH 01/10] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
@ 2024-11-18 23:05 ` Darrick J. Wong
  2024-11-19  5:45   ` Christoph Hellwig
  2024-11-18 23:05 ` [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:05 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c        |    4 ++--
 fs/xfs/libxfs/xfs_btree.h        |    2 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c |    4 +++-
 fs/xfs/scrub/agheader.c          |    6 +++---
 fs/xfs/scrub/agheader_repair.c   |    6 +++---
 fs/xfs/scrub/fscounters.c        |    2 +-
 fs/xfs/scrub/ialloc.c            |    4 ++--
 fs/xfs/scrub/refcount.c          |    2 +-
 fs/xfs/xfs_bmap_util.c           |    2 +-
 9 files changed, 17 insertions(+), 15 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2b5fc5fd16435d..c748866ef92368 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -5144,7 +5144,7 @@ xfs_btree_count_blocks_helper(
 	int			level,
 	void			*data)
 {
-	xfs_extlen_t		*blocks = data;
+	xfs_filblks_t		*blocks = data;
 	(*blocks)++;
 
 	return 0;
@@ -5154,7 +5154,7 @@ xfs_btree_count_blocks_helper(
 int
 xfs_btree_count_blocks(
 	struct xfs_btree_cur	*cur,
-	xfs_extlen_t		*blocks)
+	xfs_filblks_t		*blocks)
 {
 	*blocks = 0;
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 3b739459ebb0f4..c5bff273cae255 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -484,7 +484,7 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
 int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 		xfs_btree_visit_blocks_fn fn, unsigned int flags, void *data);
 
-int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
+int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_filblks_t *blocks);
 
 union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n,
 		struct xfs_btree_block *block);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 9b34896dd1a32f..6f270d8f4270cb 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -744,6 +744,7 @@ xfs_finobt_count_blocks(
 {
 	struct xfs_buf		*agbp = NULL;
 	struct xfs_btree_cur	*cur;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xfs_ialloc_read_agi(pag, tp, 0, &agbp);
@@ -751,9 +752,10 @@ xfs_finobt_count_blocks(
 		return error;
 
 	cur = xfs_finobt_init_cursor(pag, tp, agbp);
-	error = xfs_btree_count_blocks(cur, tree_blocks);
+	error = xfs_btree_count_blocks(cur, &blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);
+	*tree_blocks = blocks;
 
 	return error;
 }
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 61f80a6410c738..1d41b85478da9d 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -458,7 +458,7 @@ xchk_agf_xref_btreeblks(
 {
 	struct xfs_agf		*agf = sc->sa.agf_bp->b_addr;
 	struct xfs_mount	*mp = sc->mp;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	xfs_agblock_t		btreeblks;
 	int			error;
 
@@ -507,7 +507,7 @@ xchk_agf_xref_refcblks(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_agf		*agf = sc->sa.agf_bp->b_addr;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	if (!sc->sa.refc_cur)
@@ -840,7 +840,7 @@ xchk_agi_xref_fiblocks(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_agi		*agi = sc->sa.agi_bp->b_addr;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error = 0;
 
 	if (!xfs_has_inobtcounts(sc->mp))
diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 0fad0baaba2f69..b45d2b32051a63 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -256,7 +256,7 @@ xrep_agf_calc_from_btrees(
 	struct xfs_agf		*agf = agf_bp->b_addr;
 	struct xfs_mount	*mp = sc->mp;
 	xfs_agblock_t		btreeblks;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	/* Update the AGF counters from the bnobt. */
@@ -946,7 +946,7 @@ xrep_agi_calc_from_btrees(
 	if (error)
 		goto err;
 	if (xfs_has_inobtcounts(mp)) {
-		xfs_agblock_t	blocks;
+		xfs_filblks_t	blocks;
 
 		error = xfs_btree_count_blocks(cur, &blocks);
 		if (error)
@@ -959,7 +959,7 @@ xrep_agi_calc_from_btrees(
 	agi->agi_freecount = cpu_to_be32(freecount);
 
 	if (xfs_has_finobt(mp) && xfs_has_inobtcounts(mp)) {
-		xfs_agblock_t	blocks;
+		xfs_filblks_t	blocks;
 
 		cur = xfs_finobt_init_cursor(sc->sa.pag, sc->tp, agi_bp);
 		error = xfs_btree_count_blocks(cur, &blocks);
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 4a50f8e0004092..ca23cf4db6c5ef 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -261,7 +261,7 @@ xchk_fscount_btreeblks(
 	struct xchk_fscounters	*fsc,
 	xfs_agnumber_t		agno)
 {
-	xfs_extlen_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xchk_ag_init_existing(sc, agno, &sc->sa);
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index abad54c3621d44..4dc7c83dc08a40 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -650,8 +650,8 @@ xchk_iallocbt_xref_rmap_btreeblks(
 	struct xfs_scrub	*sc)
 {
 	xfs_filblks_t		blocks;
-	xfs_extlen_t		inobt_blocks = 0;
-	xfs_extlen_t		finobt_blocks = 0;
+	xfs_filblks_t		inobt_blocks = 0;
+	xfs_filblks_t		finobt_blocks = 0;
 	int			error;
 
 	if (!sc->sa.ino_cur || !sc->sa.rmap_cur ||
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 2b6be75e942415..1c5e45cc64190c 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -491,7 +491,7 @@ xchk_refcount_xref_rmap(
 	struct xfs_scrub	*sc,
 	xfs_filblks_t		cow_blocks)
 {
-	xfs_extlen_t		refcbt_blocks = 0;
+	xfs_filblks_t		refcbt_blocks = 0;
 	xfs_filblks_t		blocks;
 	int			error;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 1fe676710394e1..d08505bdfe17a3 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -112,7 +112,7 @@ xfs_bmap_count_blocks(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	struct xfs_btree_cur	*cur;
-	xfs_extlen_t		btblocks = 0;
+	xfs_filblks_t		btblocks = 0;
 	int			error;
 
 	*nextents = 0;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
  2024-11-18 23:04 ` [PATCH 01/10] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
  2024-11-18 23:05 ` [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
@ 2024-11-18 23:05 ` Darrick J. Wong
  2024-11-19  5:46   ` Christoph Hellwig
  2024-11-18 23:06 ` [PATCH 06/10] xfs: separate healthy clearing mask during repair Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:05 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Way back when we first implemented FICLONE for XFS, life was simple --
either the the entire remapping completed, or something happened and we
had to return an errno explaining what happened.  Neither of those
ioctls support returning partial results, so it's all or nothing.

Then things got complicated when copy_file_range came along, because it
actually can return the number of bytes copied, so commit 3f68c1f562f1e4
tried to make it so that we could return a partial result if the
REMAP_FILE_CAN_SHORTEN flag is set.  This is also how FIDEDUPERANGE can
indicate that the kernel performed a partial deduplication.

Unfortunately, the logic is wrong if an error stops the remapping and
CAN_SHORTEN is not set.  Because those callers cannot return partial
results, it is an error for ->remap_file_range to return a positive
quantity that is less than the @len passed in.  Implementations really
should be returning a negative errno in this case, because that's what
btrfs (which introduced FICLONE{,RANGE}) did.

Therefore, ->remap_range implementations cannot silently drop an errno
that they might have when the number of bytes remapped is less than the
number of bytes requested and CAN_SHORTEN is not set.

Found by running generic/562 on a 64k fsblock filesystem and wondering
why it reported corrupt files.

Cc: <stable@vger.kernel.org> # v4.20
Fixes: 3fc9f5e409319e ("xfs: remove xfs_reflink_remap_range")
Really-Fixes: 3f68c1f562f1e4 ("xfs: support returning partial reflink results")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b19916b11fd563..aba54e3c583661 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1228,6 +1228,14 @@ xfs_file_remap_range(
 	xfs_iunlock2_remapping(src, dest);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
+	/*
+	 * If the caller did not set CAN_SHORTEN, then it is not prepared to
+	 * handle partial results -- either the whole remap succeeds, or we
+	 * must say why it did not.  In this case, any error should be returned
+	 * to the caller.
+	 */
+	if (ret && remapped < len && !(remap_flags & REMAP_FILE_CAN_SHORTEN))
+		return ret;
 	return remapped > 0 ? remapped : ret;
 }
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/10] xfs: separate healthy clearing mask during repair
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-11-18 23:05 ` [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
@ 2024-11-18 23:06 ` Darrick J. Wong
  2024-11-19  5:47   ` Christoph Hellwig
  2024-11-18 23:06 ` [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
  2024-11-18 23:06 ` [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit d9041681dd2f53 we introduced some XFS_SICK_*ZAPPED flags so
that the inode record repair code could clean up a damaged inode record
enough to iget the inode but still be able to remember that the higher
level repair code needs to be called.  As part of that, we introduced a
xchk_mark_healthy_if_clean helper that is supposed to cause the ZAPPED
state to be removed if that higher level metadata actually checks out.
This was done by setting additional bits in sick_mask hoping that
xchk_update_health will clear all those bits after a healthy scrub.

Unfortunately, that's not quite what sick_mask means -- bits in that
mask are indeed cleared if the metadata is healthy, but they're set if
the metadata is NOT healthy.  fsck is only intended to set the ZAPPED
bits explicitly.

If something else sets the CORRUPT/XCORRUPT state after the
xchk_mark_healthy_if_clean call, we end up marking the metadata zapped.
This can happen if the following sequence happens:

1. Scrub runs, discovers that the metadata is fine but could be
   optimized and calls xchk_mark_healthy_if_clean on a ZAPPED flag.
   That causes the ZAPPED flag to be set in sick_mask because the
   metadata is not CORRUPT or XCORRUPT.

2. Repair runs to optimize the metadata.

3. Some other metadata used for cross-referencing in (1) becomes
   corrupt.

4. Post-repair scrub runs, but this time it sets CORRUPT or XCORRUPT due
   to the events in (3).

5. Now the xchk_health_update sets the ZAPPED flag on the metadata we
   just repaired.  This is not the correct state.

Fix this by moving the "if healthy" mask to a separate field, and only
ever using it to clear the sick state.

Cc: <stable@vger.kernel.org> # v6.8
Fixes: d9041681dd2f53 ("xfs: set inode sick state flags when we zap either ondisk fork")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/health.c |   57 ++++++++++++++++++++++++++++---------------------
 fs/xfs/scrub/scrub.h  |    6 +++++
 2 files changed, 39 insertions(+), 24 deletions(-)


diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
index ce86bdad37fa42..ccc6ca5934ca6a 100644
--- a/fs/xfs/scrub/health.c
+++ b/fs/xfs/scrub/health.c
@@ -71,7 +71,8 @@
 /* Map our scrub type to a sick mask and a set of health update functions. */
 
 enum xchk_health_group {
-	XHG_FS = 1,
+	XHG_NONE = 1,
+	XHG_FS,
 	XHG_AG,
 	XHG_INO,
 	XHG_RTGROUP,
@@ -83,6 +84,7 @@ struct xchk_health_map {
 };
 
 static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_PROBE]		= { XHG_NONE,  0 },
 	[XFS_SCRUB_TYPE_SB]		= { XHG_AG,  XFS_SICK_AG_SB },
 	[XFS_SCRUB_TYPE_AGF]		= { XHG_AG,  XFS_SICK_AG_AGF },
 	[XFS_SCRUB_TYPE_AGFL]		= { XHG_AG,  XFS_SICK_AG_AGFL },
@@ -133,7 +135,7 @@ xchk_mark_healthy_if_clean(
 {
 	if (!(sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
 				  XFS_SCRUB_OFLAG_XCORRUPT)))
-		sc->sick_mask |= mask;
+		sc->healthy_mask |= mask;
 }
 
 /*
@@ -189,6 +191,7 @@ xchk_update_health(
 {
 	struct xfs_perag	*pag;
 	struct xfs_rtgroup	*rtg;
+	unsigned int		mask = sc->sick_mask;
 	bool			bad;
 
 	/*
@@ -203,50 +206,56 @@ xchk_update_health(
 		return;
 	}
 
-	if (!sc->sick_mask)
-		return;
-
 	bad = (sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
 				   XFS_SCRUB_OFLAG_XCORRUPT));
+	if (!bad)
+		mask |= sc->healthy_mask;
 	switch (type_to_health_flag[sc->sm->sm_type].group) {
+	case XHG_NONE:
+		break;
 	case XHG_AG:
+		if (!mask)
+			return;
 		pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
 		if (bad)
-			xfs_group_mark_corrupt(pag_group(pag), sc->sick_mask);
+			xfs_group_mark_corrupt(pag_group(pag), mask);
 		else
-			xfs_group_mark_healthy(pag_group(pag), sc->sick_mask);
+			xfs_group_mark_healthy(pag_group(pag), mask);
 		xfs_perag_put(pag);
 		break;
 	case XHG_INO:
 		if (!sc->ip)
 			return;
-		if (bad) {
-			unsigned int	mask = sc->sick_mask;
-
-			/*
-			 * If we're coming in for repairs then we don't want
-			 * sickness flags to propagate to the incore health
-			 * status if the inode gets inactivated before we can
-			 * fix it.
-			 */
-			if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
-				mask |= XFS_SICK_INO_FORGET;
+		/*
+		 * If we're coming in for repairs then we don't want sickness
+		 * flags to propagate to the incore health status if the inode
+		 * gets inactivated before we can fix it.
+		 */
+		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+			mask |= XFS_SICK_INO_FORGET;
+		if (!mask)
+			return;
+		if (bad)
 			xfs_inode_mark_corrupt(sc->ip, mask);
-		} else
-			xfs_inode_mark_healthy(sc->ip, sc->sick_mask);
+		else
+			xfs_inode_mark_healthy(sc->ip, mask);
 		break;
 	case XHG_FS:
+		if (!mask)
+			return;
 		if (bad)
-			xfs_fs_mark_corrupt(sc->mp, sc->sick_mask);
+			xfs_fs_mark_corrupt(sc->mp, mask);
 		else
-			xfs_fs_mark_healthy(sc->mp, sc->sick_mask);
+			xfs_fs_mark_healthy(sc->mp, mask);
 		break;
 	case XHG_RTGROUP:
+		if (!mask)
+			return;
 		rtg = xfs_rtgroup_get(sc->mp, sc->sm->sm_agno);
 		if (bad)
-			xfs_group_mark_corrupt(rtg_group(rtg), sc->sick_mask);
+			xfs_group_mark_corrupt(rtg_group(rtg), mask);
 		else
-			xfs_group_mark_healthy(rtg_group(rtg), sc->sick_mask);
+			xfs_group_mark_healthy(rtg_group(rtg), mask);
 		xfs_rtgroup_put(rtg);
 		break;
 	default:
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index a7fda3e2b01377..5dbbe93cb49bfa 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -184,6 +184,12 @@ struct xfs_scrub {
 	 */
 	unsigned int			sick_mask;
 
+	/*
+	 * Clear these XFS_SICK_* flags but only if the scan is ok.  Useful for
+	 * removing ZAPPED flags after a repair.
+	 */
+	unsigned int			healthy_mask;
+
 	/* next time we want to cond_resched() */
 	struct xchk_relax		relax;
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2024-11-18 23:06 ` [PATCH 06/10] xfs: separate healthy clearing mask during repair Darrick J. Wong
@ 2024-11-18 23:06 ` Darrick J. Wong
  2024-11-19  5:47   ` Christoph Hellwig
  2024-11-18 23:06 ` [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we need to reset a symlink target to the "durr it's busted" string,
then we clear the zapped flag as well.  However, this should be using
the provided helper so that we don't set the zapped state on an
otherwise ok symlink.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/symlink_repair.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c
index d015a86ef460fb..953ce7be78dc2f 100644
--- a/fs/xfs/scrub/symlink_repair.c
+++ b/fs/xfs/scrub/symlink_repair.c
@@ -36,6 +36,7 @@
 #include "scrub/tempfile.h"
 #include "scrub/tempexch.h"
 #include "scrub/reap.h"
+#include "scrub/health.h"
 
 /*
  * Symbolic Link Repair
@@ -233,7 +234,7 @@ xrep_symlink_salvage(
 	 * target zapped flag.
 	 */
 	if (buflen == 0) {
-		sc->sick_mask |= XFS_SICK_INO_SYMLINK_ZAPPED;
+		xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_SYMLINK_ZAPPED);
 		sprintf(target_buf, DUMMY_TARGET);
 	}
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg
  2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2024-11-18 23:06 ` [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
@ 2024-11-18 23:06 ` Darrick J. Wong
  2024-11-19  5:48   ` Christoph Hellwig
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2024-11-18 23:06 UTC (permalink / raw)
  To: djwong, cem; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka
NULLFSBLOCK).  If the allocation request is for a file range that's
adjacent to an existing mapping, it will then change bno_hint to the
blkno hint in the bmalloca structure.

In other words, bno_hint is either a rt block number, or it's all 1s.
Unfortunately, commit ec12f97f1b8a8f didn't take the NULLRTBLOCK state
into account, which means that it tries to translate that into a
realtime extent number.  We then end up with an obnoxiously high rtx
number and pointlessly feed that to the near allocator.  This often
fails and falls back to the by-size allocator.  Seeing as we had no
locality hint anyway, this is a waste of time.

Fix the code to detect a lack of bno_hint correctly.  This was detected
by running xfs/009 with metadir enabled and a 28k rt extent size.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: ec12f97f1b8a8f ("xfs: make the rtalloc start hint a xfs_rtblock_t")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 0cb534d71119a5..fcfa6e0eb3ad2a 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1827,7 +1827,7 @@ xfs_rtallocate_rtg(
 	 * For an allocation to an empty file at offset 0, pick an extent that
 	 * will space things out in the rt area.
 	 */
-	if (bno_hint)
+	if (bno_hint != NULLFSBLOCK)
 		start = xfs_rtb_to_rtx(args.mp, bno_hint);
 	else if (!xfs_has_rtgroups(args.mp) && initial_user_data)
 		start = xfs_rtpick_extent(args.rtg, tp, maxlen);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks
  2024-11-18 23:05 ` [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
@ 2024-11-19  5:45   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-19  5:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range
  2024-11-18 23:05 ` [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
@ 2024-11-19  5:46   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-19  5:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 06/10] xfs: separate healthy clearing mask during repair
  2024-11-18 23:06 ` [PATCH 06/10] xfs: separate healthy clearing mask during repair Darrick J. Wong
@ 2024-11-19  5:47   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-19  5:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
  2024-11-18 23:06 ` [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
@ 2024-11-19  5:47   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-19  5:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg
  2024-11-18 23:06 ` [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
@ 2024-11-19  5:48   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-11-19  5:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-11-19  5:48 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-18 23:01 [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
2024-11-18 23:04 ` [PATCH 01/10] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
2024-11-18 23:05 ` [PATCH 04/10] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
2024-11-19  5:45   ` Christoph Hellwig
2024-11-18 23:05 ` [PATCH 05/10] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
2024-11-19  5:46   ` Christoph Hellwig
2024-11-18 23:06 ` [PATCH 06/10] xfs: separate healthy clearing mask during repair Darrick J. Wong
2024-11-19  5:47   ` Christoph Hellwig
2024-11-18 23:06 ` [PATCH 07/10] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
2024-11-19  5:47   ` Christoph Hellwig
2024-11-18 23:06 ` [PATCH 09/10] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
2024-11-19  5:48   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox