public inbox for stable@vger.kernel.org
* [PATCHSET] xfs: bug fixes for 6.13
       [not found] <20241126011838.GI9438@frogsfrogsfrogs>
@ 2024-11-26  1:20 ` Darrick J. Wong
  2024-11-26  1:24   ` [PATCH 01/21] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
                     ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:20 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, wozizhi, dan.carpenter, linux-xfs

Hi all,

Bug fixes for 6.13.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-6.13-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=xfs-6.13-fixes
---
Commits in this patchset:
 * xfs: fix off-by-one error in fsmap's end_daddr usage
 * xfs: metapath scrubber should use the already loaded inodes
 * xfs: keep quota directory inode loaded
 * xfs: return a 64-bit block count from xfs_btree_count_blocks
 * xfs: don't drop errno values when we fail to ficlone the entire range
 * xfs: separate healthy clearing mask during repair
 * xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
 * xfs: mark metadir repair tempfiles with IRECOVERY
 * xfs: fix null bno_hint handling in xfs_rtallocate_rtg
 * xfs: fix error bailout in xfs_rtginode_create
 * xfs: update btree keys correctly when _insrec splits an inode root block
 * xfs: fix scrub tracepoints when inode-rooted btrees are involved
 * xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
 * xfs: only run precommits once per transaction object
 * xfs: remove recursion in __xfs_trans_commit
 * xfs: don't lose solo superblock counter update transactions
 * xfs: don't lose solo dquot update transactions
 * xfs: separate dquot buffer reads from xfs_dqflush
 * xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
 * xfs: attach dquot buffer to dquot log item buffer
 * xfs: convert quotacheck to attach dquot buffers
---
 fs/xfs/libxfs/xfs_btree.c        |   33 +++++--
 fs/xfs/libxfs/xfs_btree.h        |    2 
 fs/xfs/libxfs/xfs_ialloc_btree.c |    4 +
 fs/xfs/libxfs/xfs_rtgroup.c      |    2 
 fs/xfs/scrub/agheader.c          |    6 +
 fs/xfs/scrub/agheader_repair.c   |    6 +
 fs/xfs/scrub/fscounters.c        |    2 
 fs/xfs/scrub/health.c            |   57 +++++++-----
 fs/xfs/scrub/ialloc.c            |    4 -
 fs/xfs/scrub/metapath.c          |   68 +++++---------
 fs/xfs/scrub/refcount.c          |    2 
 fs/xfs/scrub/scrub.h             |    6 +
 fs/xfs/scrub/symlink_repair.c    |    3 -
 fs/xfs/scrub/tempfile.c          |   10 ++
 fs/xfs/scrub/trace.h             |    2 
 fs/xfs/xfs_bmap_util.c           |    2 
 fs/xfs/xfs_dquot.c               |  185 ++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_dquot.h               |    5 +
 fs/xfs/xfs_dquot_item.c          |   51 ++++++++--
 fs/xfs/xfs_dquot_item.h          |    7 +
 fs/xfs/xfs_file.c                |    8 ++
 fs/xfs/xfs_fsmap.c               |   38 +++++---
 fs/xfs/xfs_inode.h               |    2 
 fs/xfs/xfs_qm.c                  |   92 +++++++++++++------
 fs/xfs/xfs_qm.h                  |    1 
 fs/xfs/xfs_quota.h               |    7 +
 fs/xfs/xfs_rtalloc.c             |    2 
 fs/xfs/xfs_trans.c               |   58 ++++++------
 fs/xfs/xfs_trans_ail.c           |    2 
 fs/xfs/xfs_trans_dquot.c         |   31 +++++-
 30 files changed, 475 insertions(+), 223 deletions(-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/21] xfs: fix off-by-one error in fsmap's end_daddr usage
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
@ 2024-11-26  1:24   ` Darrick J. Wong
  2024-11-26  1:25   ` [PATCH 04/21] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:24 UTC (permalink / raw)
  To: djwong; +Cc: stable, wozizhi, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit ca6448aed4f10a, we created an "end_daddr" variable to fix
fsmap reporting when the end of the range requested falls in the middle
of an unknown (aka free on the rmapbt) region.  Unfortunately, I didn't
notice that the code sets end_daddr to the last sector of the device
but then uses that quantity to compute the length of the synthesized
mapping.

Zizhi Wo later observed that when end_daddr isn't set, we still don't
report the last fsblock on a device because in that case (aka when
info->last is true), the info->high mapping that we pass to
xfs_getfsmap_group_helper has a startblock that points to the last
fsblock.  This is also wrong because the code uses startblock to
compute the length of the synthesized mapping.

Fix the second problem by setting end_daddr unconditionally, and fix the
first problem by setting start_daddr to one past the end of the range to
query.

Cc: <stable@vger.kernel.org> # v6.11
Fixes: ca6448aed4f10a ("xfs: Fix missing interval for missing_owner in xfs fsmap")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reported-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsmap.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 82f2e0dd224997..3290dd8524a69a 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -163,7 +163,8 @@ struct xfs_getfsmap_info {
 	xfs_daddr_t		next_daddr;	/* next daddr we expect */
 	/* daddr of low fsmap key when we're using the rtbitmap */
 	xfs_daddr_t		low_daddr;
-	xfs_daddr_t		end_daddr;	/* daddr of high fsmap key */
+	/* daddr of high fsmap key, or the last daddr on the device */
+	xfs_daddr_t		end_daddr;
 	u64			missing_owner;	/* owner of holes */
 	u32			dev;		/* device id */
 	/*
@@ -387,8 +388,8 @@ xfs_getfsmap_group_helper(
 	 * we calculated from userspace's high key to synthesize the record.
 	 * Note that if the btree query found a mapping, there won't be a gap.
 	 */
-	if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL)
-		frec->start_daddr = info->end_daddr;
+	if (info->last)
+		frec->start_daddr = info->end_daddr + 1;
 	else
 		frec->start_daddr = xfs_gbno_to_daddr(xg, startblock);
 
@@ -736,11 +737,10 @@ xfs_getfsmap_rtdev_rtbitmap_helper(
 	 * we calculated from userspace's high key to synthesize the record.
 	 * Note that if the btree query found a mapping, there won't be a gap.
 	 */
-	if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL) {
-		frec.start_daddr = info->end_daddr;
-	} else {
+	if (info->last)
+		frec.start_daddr = info->end_daddr + 1;
+	else
 		frec.start_daddr = xfs_rtb_to_daddr(mp, start_rtb);
-	}
 
 	frec.len_daddr = XFS_FSB_TO_BB(mp, rtbcount);
 	return xfs_getfsmap_helper(tp, info, &frec);
@@ -933,7 +933,10 @@ xfs_getfsmap(
 	struct xfs_trans		*tp = NULL;
 	struct xfs_fsmap		dkeys[2];	/* per-dev keys */
 	struct xfs_getfsmap_dev		handlers[XFS_GETFSMAP_DEVS];
-	struct xfs_getfsmap_info	info = { NULL };
+	struct xfs_getfsmap_info	info = {
+		.fsmap_recs		= fsmap_recs,
+		.head			= head,
+	};
 	bool				use_rmap;
 	int				i;
 	int				error = 0;
@@ -998,9 +1001,6 @@ xfs_getfsmap(
 
 	info.next_daddr = head->fmh_keys[0].fmr_physical +
 			  head->fmh_keys[0].fmr_length;
-	info.end_daddr = XFS_BUF_DADDR_NULL;
-	info.fsmap_recs = fsmap_recs;
-	info.head = head;
 
 	/* For each device we support... */
 	for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
@@ -1013,17 +1013,23 @@ xfs_getfsmap(
 			break;
 
 		/*
-		 * If this device number matches the high key, we have
-		 * to pass the high key to the handler to limit the
-		 * query results.  If the device number exceeds the
-		 * low key, zero out the low key so that we get
-		 * everything from the beginning.
+		 * If this device number matches the high key, we have to pass
+		 * the high key to the handler to limit the query results, and
+		 * set the end_daddr so that we can synthesize records at the
+		 * end of the query range or device.
 		 */
 		if (handlers[i].dev == head->fmh_keys[1].fmr_device) {
 			dkeys[1] = head->fmh_keys[1];
 			info.end_daddr = min(handlers[i].nr_sectors - 1,
 					     dkeys[1].fmr_physical);
+		} else {
+			info.end_daddr = handlers[i].nr_sectors - 1;
 		}
+
+		/*
+		 * If the device number exceeds the low key, zero out the low
+		 * key so that we get everything from the beginning.
+		 */
 		if (handlers[i].dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
 



* [PATCH 04/21] xfs: return a 64-bit block count from xfs_btree_count_blocks
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
  2024-11-26  1:24   ` [PATCH 01/21] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
@ 2024-11-26  1:25   ` Darrick J. Wong
  2024-11-26  1:26   ` [PATCH 05/21] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:25 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_btree.c        |    4 ++--
 fs/xfs/libxfs/xfs_btree.h        |    2 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c |    4 +++-
 fs/xfs/scrub/agheader.c          |    6 +++---
 fs/xfs/scrub/agheader_repair.c   |    6 +++---
 fs/xfs/scrub/fscounters.c        |    2 +-
 fs/xfs/scrub/ialloc.c            |    4 ++--
 fs/xfs/scrub/refcount.c          |    2 +-
 fs/xfs/xfs_bmap_util.c           |    2 +-
 9 files changed, 17 insertions(+), 15 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2b5fc5fd16435d..c748866ef92368 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -5144,7 +5144,7 @@ xfs_btree_count_blocks_helper(
 	int			level,
 	void			*data)
 {
-	xfs_extlen_t		*blocks = data;
+	xfs_filblks_t		*blocks = data;
 	(*blocks)++;
 
 	return 0;
@@ -5154,7 +5154,7 @@ xfs_btree_count_blocks_helper(
 int
 xfs_btree_count_blocks(
 	struct xfs_btree_cur	*cur,
-	xfs_extlen_t		*blocks)
+	xfs_filblks_t		*blocks)
 {
 	*blocks = 0;
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 3b739459ebb0f4..c5bff273cae255 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -484,7 +484,7 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
 int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 		xfs_btree_visit_blocks_fn fn, unsigned int flags, void *data);
 
-int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
+int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_filblks_t *blocks);
 
 union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n,
 		struct xfs_btree_block *block);
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 9b34896dd1a32f..6f270d8f4270cb 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -744,6 +744,7 @@ xfs_finobt_count_blocks(
 {
 	struct xfs_buf		*agbp = NULL;
 	struct xfs_btree_cur	*cur;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xfs_ialloc_read_agi(pag, tp, 0, &agbp);
@@ -751,9 +752,10 @@ xfs_finobt_count_blocks(
 		return error;
 
 	cur = xfs_finobt_init_cursor(pag, tp, agbp);
-	error = xfs_btree_count_blocks(cur, tree_blocks);
+	error = xfs_btree_count_blocks(cur, &blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);
+	*tree_blocks = blocks;
 
 	return error;
 }
diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index 61f80a6410c738..1d41b85478da9d 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -458,7 +458,7 @@ xchk_agf_xref_btreeblks(
 {
 	struct xfs_agf		*agf = sc->sa.agf_bp->b_addr;
 	struct xfs_mount	*mp = sc->mp;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	xfs_agblock_t		btreeblks;
 	int			error;
 
@@ -507,7 +507,7 @@ xchk_agf_xref_refcblks(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_agf		*agf = sc->sa.agf_bp->b_addr;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	if (!sc->sa.refc_cur)
@@ -840,7 +840,7 @@ xchk_agi_xref_fiblocks(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_agi		*agi = sc->sa.agi_bp->b_addr;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error = 0;
 
 	if (!xfs_has_inobtcounts(sc->mp))
diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 0fad0baaba2f69..b45d2b32051a63 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -256,7 +256,7 @@ xrep_agf_calc_from_btrees(
 	struct xfs_agf		*agf = agf_bp->b_addr;
 	struct xfs_mount	*mp = sc->mp;
 	xfs_agblock_t		btreeblks;
-	xfs_agblock_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	/* Update the AGF counters from the bnobt. */
@@ -946,7 +946,7 @@ xrep_agi_calc_from_btrees(
 	if (error)
 		goto err;
 	if (xfs_has_inobtcounts(mp)) {
-		xfs_agblock_t	blocks;
+		xfs_filblks_t	blocks;
 
 		error = xfs_btree_count_blocks(cur, &blocks);
 		if (error)
@@ -959,7 +959,7 @@ xrep_agi_calc_from_btrees(
 	agi->agi_freecount = cpu_to_be32(freecount);
 
 	if (xfs_has_finobt(mp) && xfs_has_inobtcounts(mp)) {
-		xfs_agblock_t	blocks;
+		xfs_filblks_t	blocks;
 
 		cur = xfs_finobt_init_cursor(sc->sa.pag, sc->tp, agi_bp);
 		error = xfs_btree_count_blocks(cur, &blocks);
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 4a50f8e0004092..ca23cf4db6c5ef 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -261,7 +261,7 @@ xchk_fscount_btreeblks(
 	struct xchk_fscounters	*fsc,
 	xfs_agnumber_t		agno)
 {
-	xfs_extlen_t		blocks;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xchk_ag_init_existing(sc, agno, &sc->sa);
diff --git a/fs/xfs/scrub/ialloc.c b/fs/xfs/scrub/ialloc.c
index abad54c3621d44..4dc7c83dc08a40 100644
--- a/fs/xfs/scrub/ialloc.c
+++ b/fs/xfs/scrub/ialloc.c
@@ -650,8 +650,8 @@ xchk_iallocbt_xref_rmap_btreeblks(
 	struct xfs_scrub	*sc)
 {
 	xfs_filblks_t		blocks;
-	xfs_extlen_t		inobt_blocks = 0;
-	xfs_extlen_t		finobt_blocks = 0;
+	xfs_filblks_t		inobt_blocks = 0;
+	xfs_filblks_t		finobt_blocks = 0;
 	int			error;
 
 	if (!sc->sa.ino_cur || !sc->sa.rmap_cur ||
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 2b6be75e942415..1c5e45cc64190c 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -491,7 +491,7 @@ xchk_refcount_xref_rmap(
 	struct xfs_scrub	*sc,
 	xfs_filblks_t		cow_blocks)
 {
-	xfs_extlen_t		refcbt_blocks = 0;
+	xfs_filblks_t		refcbt_blocks = 0;
 	xfs_filblks_t		blocks;
 	int			error;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index a59bbe767a7dc4..0836fea2d6d814 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -103,7 +103,7 @@ xfs_bmap_count_blocks(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	struct xfs_btree_cur	*cur;
-	xfs_extlen_t		btblocks = 0;
+	xfs_filblks_t		btblocks = 0;
 	int			error;
 
 	*nextents = 0;



* [PATCH 05/21] xfs: don't drop errno values when we fail to ficlone the entire range
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
  2024-11-26  1:24   ` [PATCH 01/21] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
  2024-11-26  1:25   ` [PATCH 04/21] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
@ 2024-11-26  1:26   ` Darrick J. Wong
  2024-11-26  1:26   ` [PATCH 06/21] xfs: separate healthy clearing mask during repair Darrick J. Wong
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:26 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Way back when we first implemented FICLONE for XFS, life was simple --
either the entire remapping completed, or something happened and we
had to return an errno explaining what happened.  Neither of those
ioctls supports returning partial results, so it's all or nothing.

Then things got complicated when copy_file_range came along, because it
actually can return the number of bytes copied, so commit 3f68c1f562f1e4
tried to make it so that we could return a partial result if the
REMAP_FILE_CAN_SHORTEN flag is set.  This is also how FIDEDUPERANGE can
indicate that the kernel performed a partial deduplication.

Unfortunately, the logic is wrong if an error stops the remapping and
CAN_SHORTEN is not set.  Because those callers cannot return partial
results, it is an error for ->remap_file_range to return a positive
quantity that is less than the @len passed in.  Implementations really
should be returning a negative errno in this case, because that's what
btrfs (which introduced FICLONE{,RANGE}) did.

Therefore, ->remap_range implementations cannot silently drop an errno
that they might have when the number of bytes remapped is less than the
number of bytes requested and CAN_SHORTEN is not set.

Found by running generic/562 on a 64k fsblock filesystem and wondering
why it reported corrupt files.

Cc: <stable@vger.kernel.org> # v4.20
Fixes: 3fc9f5e409319e ("xfs: remove xfs_reflink_remap_range")
Really-Fixes: 3f68c1f562f1e4 ("xfs: support returning partial reflink results")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c6de6b865ef11c..73562ff1c956f0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1228,6 +1228,14 @@ xfs_file_remap_range(
 	xfs_iunlock2_remapping(src, dest);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
+	/*
+	 * If the caller did not set CAN_SHORTEN, then it is not prepared to
+	 * handle partial results -- either the whole remap succeeds, or we
+	 * must say why it did not.  In this case, any error should be returned
+	 * to the caller.
+	 */
+	if (ret && remapped < len && !(remap_flags & REMAP_FILE_CAN_SHORTEN))
+		return ret;
 	return remapped > 0 ? remapped : ret;
 }
 



* [PATCH 06/21] xfs: separate healthy clearing mask during repair
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-11-26  1:26   ` [PATCH 05/21] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
@ 2024-11-26  1:26   ` Darrick J. Wong
  2024-11-26  1:26   ` [PATCH 07/21] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:26 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit d9041681dd2f53 we introduced some XFS_SICK_*ZAPPED flags so
that the inode record repair code could clean up a damaged inode record
enough to iget the inode but still be able to remember that the higher
level repair code needs to be called.  As part of that, we introduced a
xchk_mark_healthy_if_clean helper that is supposed to cause the ZAPPED
state to be removed if that higher level metadata actually checks out.
This was done by setting additional bits in sick_mask hoping that
xchk_update_health will clear all those bits after a healthy scrub.

Unfortunately, that's not quite what sick_mask means -- bits in that
mask are indeed cleared if the metadata is healthy, but they're set if
the metadata is NOT healthy.  fsck is only intended to set the ZAPPED
bits explicitly.

If something else sets the CORRUPT/XCORRUPT state after the
xchk_mark_healthy_if_clean call, we end up marking the metadata zapped.
This can happen if the following sequence happens:

1. Scrub runs, discovers that the metadata is fine but could be
   optimized and calls xchk_mark_healthy_if_clean on a ZAPPED flag.
   That causes the ZAPPED flag to be set in sick_mask because the
   metadata is not CORRUPT or XCORRUPT.

2. Repair runs to optimize the metadata.

3. Some other metadata used for cross-referencing in (1) becomes
   corrupt.

4. Post-repair scrub runs, but this time it sets CORRUPT or XCORRUPT due
   to the events in (3).

5. Now xchk_update_health sets the ZAPPED flag on the metadata we
   just repaired.  This is not the correct state.

Fix this by moving the "if healthy" mask to a separate field, and only
ever using it to clear the sick state.

Cc: <stable@vger.kernel.org> # v6.8
Fixes: d9041681dd2f53 ("xfs: set inode sick state flags when we zap either ondisk fork")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/health.c |   57 ++++++++++++++++++++++++++++---------------------
 fs/xfs/scrub/scrub.h  |    6 +++++
 2 files changed, 39 insertions(+), 24 deletions(-)


diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
index ce86bdad37fa42..ccc6ca5934ca6a 100644
--- a/fs/xfs/scrub/health.c
+++ b/fs/xfs/scrub/health.c
@@ -71,7 +71,8 @@
 /* Map our scrub type to a sick mask and a set of health update functions. */
 
 enum xchk_health_group {
-	XHG_FS = 1,
+	XHG_NONE = 1,
+	XHG_FS,
 	XHG_AG,
 	XHG_INO,
 	XHG_RTGROUP,
@@ -83,6 +84,7 @@ struct xchk_health_map {
 };
 
 static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
+	[XFS_SCRUB_TYPE_PROBE]		= { XHG_NONE,  0 },
 	[XFS_SCRUB_TYPE_SB]		= { XHG_AG,  XFS_SICK_AG_SB },
 	[XFS_SCRUB_TYPE_AGF]		= { XHG_AG,  XFS_SICK_AG_AGF },
 	[XFS_SCRUB_TYPE_AGFL]		= { XHG_AG,  XFS_SICK_AG_AGFL },
@@ -133,7 +135,7 @@ xchk_mark_healthy_if_clean(
 {
 	if (!(sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
 				  XFS_SCRUB_OFLAG_XCORRUPT)))
-		sc->sick_mask |= mask;
+		sc->healthy_mask |= mask;
 }
 
 /*
@@ -189,6 +191,7 @@ xchk_update_health(
 {
 	struct xfs_perag	*pag;
 	struct xfs_rtgroup	*rtg;
+	unsigned int		mask = sc->sick_mask;
 	bool			bad;
 
 	/*
@@ -203,50 +206,56 @@ xchk_update_health(
 		return;
 	}
 
-	if (!sc->sick_mask)
-		return;
-
 	bad = (sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
 				   XFS_SCRUB_OFLAG_XCORRUPT));
+	if (!bad)
+		mask |= sc->healthy_mask;
 	switch (type_to_health_flag[sc->sm->sm_type].group) {
+	case XHG_NONE:
+		break;
 	case XHG_AG:
+		if (!mask)
+			return;
 		pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
 		if (bad)
-			xfs_group_mark_corrupt(pag_group(pag), sc->sick_mask);
+			xfs_group_mark_corrupt(pag_group(pag), mask);
 		else
-			xfs_group_mark_healthy(pag_group(pag), sc->sick_mask);
+			xfs_group_mark_healthy(pag_group(pag), mask);
 		xfs_perag_put(pag);
 		break;
 	case XHG_INO:
 		if (!sc->ip)
 			return;
-		if (bad) {
-			unsigned int	mask = sc->sick_mask;
-
-			/*
-			 * If we're coming in for repairs then we don't want
-			 * sickness flags to propagate to the incore health
-			 * status if the inode gets inactivated before we can
-			 * fix it.
-			 */
-			if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
-				mask |= XFS_SICK_INO_FORGET;
+		/*
+		 * If we're coming in for repairs then we don't want sickness
+		 * flags to propagate to the incore health status if the inode
+		 * gets inactivated before we can fix it.
+		 */
+		if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
+			mask |= XFS_SICK_INO_FORGET;
+		if (!mask)
+			return;
+		if (bad)
 			xfs_inode_mark_corrupt(sc->ip, mask);
-		} else
-			xfs_inode_mark_healthy(sc->ip, sc->sick_mask);
+		else
+			xfs_inode_mark_healthy(sc->ip, mask);
 		break;
 	case XHG_FS:
+		if (!mask)
+			return;
 		if (bad)
-			xfs_fs_mark_corrupt(sc->mp, sc->sick_mask);
+			xfs_fs_mark_corrupt(sc->mp, mask);
 		else
-			xfs_fs_mark_healthy(sc->mp, sc->sick_mask);
+			xfs_fs_mark_healthy(sc->mp, mask);
 		break;
 	case XHG_RTGROUP:
+		if (!mask)
+			return;
 		rtg = xfs_rtgroup_get(sc->mp, sc->sm->sm_agno);
 		if (bad)
-			xfs_group_mark_corrupt(rtg_group(rtg), sc->sick_mask);
+			xfs_group_mark_corrupt(rtg_group(rtg), mask);
 		else
-			xfs_group_mark_healthy(rtg_group(rtg), sc->sick_mask);
+			xfs_group_mark_healthy(rtg_group(rtg), mask);
 		xfs_rtgroup_put(rtg);
 		break;
 	default:
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index a7fda3e2b01377..5dbbe93cb49bfa 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -184,6 +184,12 @@ struct xfs_scrub {
 	 */
 	unsigned int			sick_mask;
 
+	/*
+	 * Clear these XFS_SICK_* flags but only if the scan is ok.  Useful for
+	 * removing ZAPPED flags after a repair.
+	 */
+	unsigned int			healthy_mask;
+
 	/* next time we want to cond_resched() */
 	struct xchk_relax		relax;
 



* [PATCH 07/21] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-11-26  1:26   ` [PATCH 06/21] xfs: separate healthy clearing mask during repair Darrick J. Wong
@ 2024-11-26  1:26   ` Darrick J. Wong
  2024-11-26  1:27   ` [PATCH 09/21] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:26 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we need to reset a symlink target to the "durr it's busted" string,
then we clear the zapped flag as well.  However, this should be using
the provided helper so that we don't set the zapped state on an
otherwise ok symlink.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/symlink_repair.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c
index d015a86ef460fb..953ce7be78dc2f 100644
--- a/fs/xfs/scrub/symlink_repair.c
+++ b/fs/xfs/scrub/symlink_repair.c
@@ -36,6 +36,7 @@
 #include "scrub/tempfile.h"
 #include "scrub/tempexch.h"
 #include "scrub/reap.h"
+#include "scrub/health.h"
 
 /*
  * Symbolic Link Repair
@@ -233,7 +234,7 @@ xrep_symlink_salvage(
 	 * target zapped flag.
 	 */
 	if (buflen == 0) {
-		sc->sick_mask |= XFS_SICK_INO_SYMLINK_ZAPPED;
+		xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_SYMLINK_ZAPPED);
 		sprintf(target_buf, DUMMY_TARGET);
 	}
 



* [PATCH 09/21] xfs: fix null bno_hint handling in xfs_rtallocate_rtg
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-11-26  1:26   ` [PATCH 07/21] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
@ 2024-11-26  1:27   ` Darrick J. Wong
  2024-11-26  1:27   ` [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:27 UTC (permalink / raw)
  To: djwong; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka
NULLFSBLOCK).  If the allocation request is for a file range that's
adjacent to an existing mapping, it will then change bno_hint to the
blkno hint in the bmalloca structure.

In other words, bno_hint is either a rt block number, or it's all 1s.
Unfortunately, commit ec12f97f1b8a8f didn't take the NULLRTBLOCK state
into account, which means that it tries to translate that into a
realtime extent number.  We then end up with an obnoxiously high rtx
number and pointlessly feed that to the near allocator.  This often
fails and falls back to the by-size allocator.  Seeing as we had no
locality hint anyway, this is a waste of time.

Fix the code to detect a lack of bno_hint correctly.  This was detected
by running xfs/009 with metadir enabled and a 28k rt extent size.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: ec12f97f1b8a8f ("xfs: make the rtalloc start hint a xfs_rtblock_t")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_rtalloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 0cb534d71119a5..fcfa6e0eb3ad2a 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1827,7 +1827,7 @@ xfs_rtallocate_rtg(
 	 * For an allocation to an empty file at offset 0, pick an extent that
 	 * will space things out in the rt area.
 	 */
-	if (bno_hint)
+	if (bno_hint != NULLFSBLOCK)
 		start = xfs_rtb_to_rtx(args.mp, bno_hint);
 	else if (!xfs_has_rtgroups(args.mp) && initial_user_data)
 		start = xfs_rtpick_extent(args.rtg, tp, maxlen);



* [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-11-26  1:27   ` [PATCH 09/21] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
@ 2024-11-26  1:27   ` Darrick J. Wong
  2024-11-26  5:01     ` Christoph Hellwig
  2024-11-26  1:27   ` [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong
                     ` (6 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:27 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index c748866ef92368..68ee1c299c25fd 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -3557,14 +3557,31 @@ xfs_btree_insrec(
 	xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
 
 	/*
-	 * If we just inserted into a new tree block, we have to
-	 * recalculate nkey here because nkey is out of date.
+	 * Update btree keys to reflect the newly added record or keyptr.
+	 * There are three cases here to be aware of.  Normally, all we have to
+	 * do is walk towards the root, updating keys as necessary.
 	 *
-	 * Otherwise we're just updating an existing block (having shoved
-	 * some records into the new tree block), so use the regular key
-	 * update mechanism.
+	 * If the caller had us target a full block for the insertion, we dealt
+	 * with that by calling the _make_block_unfull function.  If the
+	 * "make unfull" function splits the block, it'll hand us back the key
+	 * and pointer of the new block.  We haven't yet added the new block to
+	 * the next level up, so if we decide to add the new record to the new
+	 * block (bp->b_bn != old_bn), we have to update the caller's pointer
+	 * so that the caller adds the new block with the correct key.
+	 *
+	 * However, there is a third possibility-- if the selected block is the
+	 * root block of an inode-rooted btree and cannot be expanded further,
+	 * the "make unfull" function moves the root block contents to a new
+	 * block and updates the root block to point to the new block.  In this
+	 * case, no block pointer is passed back because the block has already
+	 * been added to the btree.  In this case, we need to use the regular
+	 * key update function, just like the first case.  This is critical for
+	 * overlapping btrees, because the high key must be updated to reflect
+	 * the entire tree, not just the subtree accessible through the first
+	 * child of the root (which is now two levels down from the root).
 	 */
-	if (bp && xfs_buf_daddr(bp) != old_bn) {
+	if (!xfs_btree_ptr_is_null(cur, &nptr) &&
+	    bp && xfs_buf_daddr(bp) != old_bn) {
 		xfs_btree_get_keys(cur, block, lkey);
 	} else if (xfs_btree_needs_key_update(cur, optr)) {
 		error = xfs_btree_update_keys(cur, level);



* [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-11-26  1:27   ` [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
@ 2024-11-26  1:27   ` Darrick J. Wong
  2024-11-26  5:01     ` Christoph Hellwig
  2024-11-26  1:28   ` [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir Darrick J. Wong
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:27 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Fix a minor mistake in the scrub tracepoints that can manifest when
inode-rooted btrees are enabled.  The existing code worked fine for bmap
btrees, but we should tighten the code up to be less sloppy.

Cc: <stable@vger.kernel.org> # v5.7
Fixes: 92219c292af8dd ("xfs: convert btree cursor inode-private member names")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/scrub/trace.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 9b38f5ad1eaf07..d2ae7e93acb08e 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -605,7 +605,7 @@ TRACE_EVENT(xchk_ifork_btree_op_error,
 	TP_fast_assign(
 		xfs_fsblock_t fsbno = xchk_btree_cur_fsbno(cur, level);
 		__entry->dev = sc->mp->m_super->s_dev;
-		__entry->ino = sc->ip->i_ino;
+		__entry->ino = cur->bc_ino.ip->i_ino;
 		__entry->whichfork = cur->bc_ino.whichfork;
 		__entry->type = sc->sm->sm_type;
 		__assign_str(name);



* [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-11-26  1:27   ` [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong
@ 2024-11-26  1:28   ` Darrick J. Wong
  2024-11-26  5:03     ` Christoph Hellwig
  2024-11-26  1:28   ` [PATCH 14/21] xfs: only run precommits once per transaction object Darrick J. Wong
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:28 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Debugging a filesystem patch with generic/475 caused the system to hang
after observing the following sequences in dmesg:

 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x491520 len 32 error 5
 XFS (dm-0): metadata I/O error in "xfs_btree_read_buf_block+0xba/0x160 [xfs]" at daddr 0x3445608 len 8 error 5
 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x138e1c0 len 32 error 5
 XFS (dm-0): log I/O error -5
 XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ea/0x4b0 [xfs] (fs/xfs/xfs_trans_buf.c:311).  Shutting down filesystem.
 XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
 XFS (dm-0): Internal error dqp->q_ino.reserved < dqp->q_ino.count at line 869 of file fs/xfs/xfs_trans_dquot.c.  Caller xfs_trans_dqresv+0x236/0x440 [xfs]
 XFS (dm-0): Corruption detected. Unmount and run xfs_repair
 XFS (dm-0): Unmounting Filesystem be6bcbcc-9921-4deb-8d16-7cc94e335fa7

The system is stuck in unmount trying to lock a couple of inodes so that
they can be purged.  The dquot corruption notice above is a clue to what
happened -- a link() call tried to set up a transaction to link a child
into a directory.  Quota reservation for the transaction failed after IO
errors shut down the filesystem, but then we forgot to unlock the inodes
on our way out.  Fix that.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: bd5562111d5839 ("xfs: Hold inode locks in xfs_trans_alloc_dir")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 30fbed27cf05cc..05b18e30368e4b 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1435,5 +1435,8 @@ xfs_trans_alloc_dir(
 
 out_cancel:
 	xfs_trans_cancel(tp);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	if (dp != ip)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }



* [PATCH 14/21] xfs: only run precommits once per transaction object
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-11-26  1:28   ` [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir Darrick J. Wong
@ 2024-11-26  1:28   ` Darrick J. Wong
  2024-11-26  5:09     ` Christoph Hellwig
  2024-11-26  1:28   ` [PATCH 16/21] xfs: don't lose solo superblock counter update transactions Darrick J. Wong
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:28 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Committing a transaction tx0 with a defer ops chain of (A, B, C)
creates a chain of transactions that looks like this:

tx0 -> txA -> txB -> txC

Prior to commit cb042117488dbf, __xfs_trans_commit would run precommits
on tx0, then call xfs_defer_finish_noroll to convert A-C to tx[A-C].
Unfortunately, after the finish_noroll loop we forgot to run precommits
on txC.  That was fixed by adding the second precommit call.

Unfortunately, none of us remembered that xfs_defer_finish_noroll
calls __xfs_trans_commit a second time to commit tx0 before finishing
work A in txA and committing that.  In other words, we run precommits
twice on tx0:

xfs_trans_commit(tx0)
    __xfs_trans_commit(tx0, false)
        xfs_trans_run_precommits(tx0)
        xfs_defer_finish_noroll(tx0)
            xfs_trans_roll(tx0)
                txA = xfs_trans_dup(tx0)
                __xfs_trans_commit(tx0, true)
                xfs_trans_run_precommits(tx0)

This currently isn't an issue because the inode item precommit is
idempotent; the iunlink item precommit deletes itself so it can't be
called again; and the buffer/dquot item precommits only check the incore
objects for corruption.  However, it doesn't make sense to run
precommits twice.

Fix this situation by only running precommits after finish_noroll.

Cc: <stable@vger.kernel.org> # v6.4
Fixes: cb042117488dbf ("xfs: defered work could create precommits")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 05b18e30368e4b..4a517250efc911 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -860,13 +860,6 @@ __xfs_trans_commit(
 
 	trace_xfs_trans_commit(tp, _RET_IP_);
 
-	error = xfs_trans_run_precommits(tp);
-	if (error) {
-		if (tp->t_flags & XFS_TRANS_PERM_LOG_RES)
-			xfs_defer_cancel(tp);
-		goto out_unreserve;
-	}
-
 	/*
 	 * Finish deferred items on final commit. Only permanent transactions
 	 * should ever have deferred ops.
@@ -877,13 +870,12 @@ __xfs_trans_commit(
 		error = xfs_defer_finish_noroll(&tp);
 		if (error)
 			goto out_unreserve;
-
-		/* Run precommits from final tx in defer chain. */
-		error = xfs_trans_run_precommits(tp);
-		if (error)
-			goto out_unreserve;
 	}
 
+	error = xfs_trans_run_precommits(tp);
+	if (error)
+		goto out_unreserve;
+
 	/*
 	 * If there is nothing to be logged by the transaction,
 	 * then unlock all of the items associated with the



* [PATCH 16/21] xfs: don't lose solo superblock counter update transactions
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-11-26  1:28   ` [PATCH 14/21] xfs: only run precommits once per transaction object Darrick J. Wong
@ 2024-11-26  1:28   ` Darrick J. Wong
  2024-11-26  5:14     ` Christoph Hellwig
  2024-11-26  1:29   ` [PATCH 17/21] xfs: don't lose solo dquot " Darrick J. Wong
                     ` (2 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:28 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Superblock counter updates are tracked via per-transaction counters in
the xfs_trans object.  These changes are then turned into dirty log
items in xfs_trans_apply_sb_deltas just prior to committing the log items
to the CIL.

However, updating the per-transaction counter deltas does not cause
XFS_TRANS_DIRTY to be set on the transaction.  In other words, a pure sb
counter update will be silently discarded if there are no other dirty
log items attached to the transaction.

This is currently not the case anywhere in the filesystem because sb
counter updates always dirty at least one other metadata item, but let's
not leave a logic bomb.

Cc: <stable@vger.kernel.org> # v2.6.35
Fixes: 0924378a689ccb ("xfs: split out iclog writing from xfs_trans_commit()")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 26bb2343082af4..427a8ba0ab99e2 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -860,6 +860,13 @@ __xfs_trans_commit(
 
 	trace_xfs_trans_commit(tp, _RET_IP_);
 
+	/*
+	 * Commit per-transaction changes that are not already tracked through
+	 * log items.  This can add dirty log items to the transaction.
+	 */
+	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
+		xfs_trans_apply_sb_deltas(tp);
+
 	error = xfs_trans_run_precommits(tp);
 	if (error)
 		goto out_unreserve;
@@ -890,8 +897,6 @@ __xfs_trans_commit(
 	/*
 	 * If we need to update the superblock, then do it now.
 	 */
-	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
-		xfs_trans_apply_sb_deltas(tp);
 	xfs_trans_apply_dquot_deltas(tp);
 
 	xlog_cil_commit(log, tp, &commit_seq, regrant);



* [PATCH 17/21] xfs: don't lose solo dquot update transactions
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-11-26  1:28   ` [PATCH 16/21] xfs: don't lose solo superblock counter update transactions Darrick J. Wong
@ 2024-11-26  1:29   ` Darrick J. Wong
  2024-11-26  5:18     ` Christoph Hellwig
  2024-11-26  1:29   ` [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer Darrick J. Wong
  2024-11-26  1:30   ` [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers Darrick J. Wong
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:29 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Quota counter updates are tracked via incore objects which hang off the
xfs_trans object.  These changes are then turned into dirty log items in
xfs_trans_apply_dquot_deltas just prior to committing the log items to
the CIL.

However, updating the incore deltas does not cause XFS_TRANS_DIRTY to be
set on the transaction.  In other words, a pure quota counter update
will be silently discarded if there are no other dirty log items
attached to the transaction.

This is currently not the case anywhere in the filesystem because quota
updates always dirty at least one other metadata item, but a subsequent
bug fix will add dquot log item precommits, so we actually need a dirty
dquot log item prior to xfs_trans_run_precommits.  Also let's not leave
a logic bomb.

Cc: <stable@vger.kernel.org> # v2.6.35
Fixes: 0924378a689ccb ("xfs: split out iclog writing from xfs_trans_commit()")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_quota.h       |    7 ++++---
 fs/xfs/xfs_trans.c       |   10 +++-------
 fs/xfs/xfs_trans_dquot.c |   31 ++++++++++++++++++++++++++-----
 3 files changed, 33 insertions(+), 15 deletions(-)


diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index fa1317cc396c96..b864ed59787780 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -101,7 +101,8 @@ extern void xfs_trans_free_dqinfo(struct xfs_trans *);
 extern void xfs_trans_mod_dquot_byino(struct xfs_trans *, struct xfs_inode *,
 		uint, int64_t);
 extern void xfs_trans_apply_dquot_deltas(struct xfs_trans *);
-extern void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *);
+void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *tp,
+		bool already_locked);
 int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip,
 		int64_t dblocks, int64_t rblocks, bool force);
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
@@ -172,8 +173,8 @@ static inline void xfs_trans_mod_dquot_byino(struct xfs_trans *tp,
 		struct xfs_inode *ip, uint field, int64_t delta)
 {
 }
-#define xfs_trans_apply_dquot_deltas(tp)
-#define xfs_trans_unreserve_and_mod_dquots(tp)
+#define xfs_trans_apply_dquot_deltas(tp, a)
+#define xfs_trans_unreserve_and_mod_dquots(tp, a)
 static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp,
 		struct xfs_inode *ip, int64_t dblocks, int64_t rblocks,
 		bool force)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 427a8ba0ab99e2..4cd25717c9d130 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -866,6 +866,7 @@ __xfs_trans_commit(
 	 */
 	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
 		xfs_trans_apply_sb_deltas(tp);
+	xfs_trans_apply_dquot_deltas(tp);
 
 	error = xfs_trans_run_precommits(tp);
 	if (error)
@@ -894,11 +895,6 @@ __xfs_trans_commit(
 
 	ASSERT(tp->t_ticket != NULL);
 
-	/*
-	 * If we need to update the superblock, then do it now.
-	 */
-	xfs_trans_apply_dquot_deltas(tp);
-
 	xlog_cil_commit(log, tp, &commit_seq, regrant);
 
 	xfs_trans_free(tp);
@@ -924,7 +920,7 @@ __xfs_trans_commit(
 	 * the dqinfo portion to be.  All that means is that we have some
 	 * (non-persistent) quota reservations that need to be unreserved.
 	 */
-	xfs_trans_unreserve_and_mod_dquots(tp);
+	xfs_trans_unreserve_and_mod_dquots(tp, true);
 	if (tp->t_ticket) {
 		if (regrant && !xlog_is_shutdown(log))
 			xfs_log_ticket_regrant(log, tp->t_ticket);
@@ -1018,7 +1014,7 @@ xfs_trans_cancel(
 	}
 #endif
 	xfs_trans_unreserve_and_mod_sb(tp);
-	xfs_trans_unreserve_and_mod_dquots(tp);
+	xfs_trans_unreserve_and_mod_dquots(tp, false);
 
 	if (tp->t_ticket) {
 		xfs_log_ticket_ungrant(log, tp->t_ticket);
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 481ba3dc9f190d..713b6d243e5631 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -606,6 +606,24 @@ xfs_trans_apply_dquot_deltas(
 			ASSERT(dqp->q_blk.reserved >= dqp->q_blk.count);
 			ASSERT(dqp->q_ino.reserved >= dqp->q_ino.count);
 			ASSERT(dqp->q_rtb.reserved >= dqp->q_rtb.count);
+
+			/*
+			 * We've applied the count changes and given back
+			 * whatever reservation we didn't use.  Zero out the
+			 * dqtrx fields.
+			 */
+			qtrx->qt_blk_res = 0;
+			qtrx->qt_bcount_delta = 0;
+			qtrx->qt_delbcnt_delta = 0;
+
+			qtrx->qt_rtblk_res = 0;
+			qtrx->qt_rtblk_res_used = 0;
+			qtrx->qt_rtbcount_delta = 0;
+			qtrx->qt_delrtb_delta = 0;
+
+			qtrx->qt_ino_res = 0;
+			qtrx->qt_ino_res_used = 0;
+			qtrx->qt_icount_delta = 0;
 		}
 	}
 }
@@ -642,7 +660,8 @@ xfs_trans_unreserve_and_mod_dquots_hook(
  */
 void
 xfs_trans_unreserve_and_mod_dquots(
-	struct xfs_trans	*tp)
+	struct xfs_trans	*tp,
+	bool			already_locked)
 {
 	int			i, j;
 	struct xfs_dquot	*dqp;
@@ -671,10 +690,12 @@ xfs_trans_unreserve_and_mod_dquots(
 			 * about the number of blocks used field, or deltas.
 			 * Also we don't bother to zero the fields.
 			 */
-			locked = false;
+			locked = already_locked;
 			if (qtrx->qt_blk_res) {
-				xfs_dqlock(dqp);
-				locked = true;
+				if (!locked) {
+					xfs_dqlock(dqp);
+					locked = true;
+				}
 				dqp->q_blk.reserved -=
 					(xfs_qcnt_t)qtrx->qt_blk_res;
 			}
@@ -695,7 +716,7 @@ xfs_trans_unreserve_and_mod_dquots(
 				dqp->q_rtb.reserved -=
 					(xfs_qcnt_t)qtrx->qt_rtblk_res;
 			}
-			if (locked)
+			if (locked && !already_locked)
 				xfs_dqunlock(dqp);
 
 		}



* [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-11-26  1:29   ` [PATCH 17/21] xfs: don't lose solo dquot " Darrick J. Wong
@ 2024-11-26  1:29   ` Darrick J. Wong
  2024-11-26  5:42     ` Christoph Hellwig
  2024-11-26  1:30   ` [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers Darrick J. Wong
  13 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:29 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Ever since 6.12-rc1, I've observed a pile of warnings from the kernel
when running fstests with quotas enabled:

WARNING: CPU: 1 PID: 458580 at mm/page_alloc.c:4221 __alloc_pages_noprof+0xc9c/0xf18
CPU: 1 UID: 0 PID: 458580 Comm: xfsaild/sda3 Tainted: G        W          6.12.0-rc6-djwa #rc6 6ee3e0e531f6457e2d26aa008a3b65ff184b377c
<snip>
Call trace:
 __alloc_pages_noprof+0xc9c/0xf18
 alloc_pages_mpol_noprof+0x94/0x240
 alloc_pages_noprof+0x68/0xf8
 new_slab+0x3e0/0x568
 ___slab_alloc+0x5a0/0xb88
 __slab_alloc.constprop.0+0x7c/0xf8
 __kmalloc_noprof+0x404/0x4d0
 xfs_buf_get_map+0x594/0xde0 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_buf_read_map+0x64/0x2e0 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_trans_read_buf_map+0x1dc/0x518 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_qm_dqflush+0xac/0x468 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_qm_dquot_logitem_push+0xe4/0x148 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfsaild+0x3f4/0xde8 [xfs 384cb02810558b4c490343c164e9407332118f88]
 kthread+0x110/0x128
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---

This corresponds to the line:

	WARN_ON_ONCE(current->flags & PF_MEMALLOC);

within the NOFAIL checks.  What's happening here is that the XFS AIL is
trying to write a disk quota update back into the filesystem, but for
that it needs to read the ondisk buffer for the dquot.  The buffer is
not in memory anymore, probably because it was evicted.  Regardless, the
buffer cache tries to allocate a new buffer, but those allocations are
NOFAIL.  The AIL thread has marked itself PF_MEMALLOC (aka noreclaim)
since commit 43ff2122e6492b ("xfs: on-stack delayed write buffer lists")
presumably because memory reclaim can push on XFS, which in turn pushes on
the AIL.

An easy way to fix this probably would have been to drop the NOFAIL flag
from the xfs_buf allocation and open code a retry loop, but then there's
still the problem that for bs>ps filesystems, the buffer itself could
require up to 64k worth of pages.

Inode items had similar behavior (multi-page cluster buffers that we
don't want to allocate in the AIL) which we solved by making transaction
precommit attach the inode cluster buffers to the dirty log item.  Let's
solve the dquot problem in the same way.

So: Make a real precommit handler to read the dquot buffer and attach it
to the log item; pass it to dqflush in the push method; and have the
iodone function detach the buffer once we've flushed everything.  Add a
state flag to the log item to track when a thread has entered the
precommit -> push mechanism, and skip the detaching if it turns out that
the dquot is very busy (we don't hold the dquot lock between log item
commit and AIL push).

Reading and attaching the dquot buffer in the precommit hook is inspired
by the work done for inode cluster buffers some time ago.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_dquot.c      |  120 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_dquot.h      |    5 ++
 fs/xfs/xfs_dquot_item.c |   39 ++++++++++-----
 fs/xfs/xfs_dquot_item.h |    7 +++
 fs/xfs/xfs_qm.c         |    6 +-
 fs/xfs/xfs_trans_ail.c  |    2 -
 6 files changed, 155 insertions(+), 24 deletions(-)


diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 4ba042786cfb7b..c495f7ad80018f 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -75,8 +75,24 @@ void
 xfs_qm_dqdestroy(
 	struct xfs_dquot	*dqp)
 {
+	struct xfs_dq_logitem	*qlip = &dqp->q_logitem;
+	struct xfs_buf		*bp = NULL;
+
 	ASSERT(list_empty(&dqp->q_lru));
 
+	/*
+	 * Detach the dquot buffer if it's still attached, because we can get
+	 * called through dqpurge after a log shutdown.
+	 */
+	spin_lock(&qlip->qli_lock);
+	if (qlip->qli_item.li_buf) {
+		bp = qlip->qli_item.li_buf;
+		qlip->qli_item.li_buf = NULL;
+	}
+	spin_unlock(&qlip->qli_lock);
+	if (bp)
+		xfs_buf_rele(bp);
+
 	kvfree(dqp->q_logitem.qli_item.li_lv_shadow);
 	mutex_destroy(&dqp->q_qlock);
 
@@ -1146,6 +1162,7 @@ xfs_qm_dqflush_done(
 			container_of(lip, struct xfs_dq_logitem, qli_item);
 	struct xfs_dquot	*dqp = qlip->qli_dquot;
 	struct xfs_ail		*ailp = lip->li_ailp;
+	struct xfs_buf		*bp = NULL;
 	xfs_lsn_t		tail_lsn;
 
 	/*
@@ -1175,6 +1192,19 @@ xfs_qm_dqflush_done(
 	 * Release the dq's flush lock since we're done with it.
 	 */
 	xfs_dqfunlock(dqp);
+
+	/*
+	 * If this dquot hasn't been dirtied since initiating the last dqflush,
+	 * release the buffer reference.
+	 */
+	spin_lock(&qlip->qli_lock);
+	if (!qlip->qli_dirty) {
+		bp = lip->li_buf;
+		lip->li_buf = NULL;
+	}
+	spin_unlock(&qlip->qli_lock);
+	if (bp)
+		xfs_buf_rele(bp);
 }
 
 void
@@ -1197,7 +1227,7 @@ xfs_buf_dquot_io_fail(
 
 	spin_lock(&bp->b_mount->m_ail->ail_lock);
 	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
-		xfs_set_li_failed(lip, bp);
+		set_bit(XFS_LI_FAILED, &lip->li_flags);
 	spin_unlock(&bp->b_mount->m_ail->ail_lock);
 }
 
@@ -1249,6 +1279,7 @@ int
 xfs_dquot_read_buf(
 	struct xfs_trans	*tp,
 	struct xfs_dquot	*dqp,
+	xfs_buf_flags_t		xbf_flags,
 	struct xfs_buf		**bpp)
 {
 	struct xfs_mount	*mp = dqp->q_mount;
@@ -1256,7 +1287,7 @@ xfs_dquot_read_buf(
 	int			error;
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno,
-				   mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK,
+				   mp->m_quotainfo->qi_dqchunklen, xbf_flags,
 				   &bp, &xfs_dquot_buf_ops);
 	if (error == -EAGAIN)
 		return error;
@@ -1275,6 +1306,77 @@ xfs_dquot_read_buf(
 	return error;
 }
 
+/*
+ * Attach a dquot buffer to this dquot to avoid allocating a buffer during a
+ * dqflush, since dqflush can be called from reclaim context.
+ */
+int
+xfs_dquot_attach_buf(
+	struct xfs_trans	*tp,
+	struct xfs_dquot	*dqp)
+{
+	struct xfs_dq_logitem	*qlip = &dqp->q_logitem;
+	struct xfs_log_item	*lip = &qlip->qli_item;
+	int			error;
+
+	spin_lock(&qlip->qli_lock);
+	if (!lip->li_buf) {
+		struct xfs_buf	*bp = NULL;
+
+		spin_unlock(&qlip->qli_lock);
+		error = xfs_dquot_read_buf(tp, dqp, 0, &bp);
+		if (error)
+			return error;
+
+		/*
+		 * Attach the dquot to the buffer so that the AIL does not have
+		 * to read the dquot buffer to push this item.
+		 */
+		xfs_buf_hold(bp);
+		spin_lock(&qlip->qli_lock);
+		lip->li_buf = bp;
+		xfs_trans_brelse(tp, bp);
+	}
+	qlip->qli_dirty = true;
+	spin_unlock(&qlip->qli_lock);
+
+	return 0;
+}
+
+/*
+ * Get a new reference to the dquot buffer attached to this dquot for a
+ * dqflush operation.
+ *
+ * Returns 0 and a NULL bp if none was attached to the dquot; 0 and a locked
+ * bp; or -EAGAIN if the buffer could not be locked.
+ */
+int
+xfs_dquot_use_attached_buf(
+	struct xfs_dquot	*dqp,
+	struct xfs_buf		**bpp)
+{
+	struct xfs_buf		*bp = dqp->q_logitem.qli_item.li_buf;
+
+	/*
+	 * A NULL buffer can happen if the dquot dirty flag was set but the
+	 * filesystem shut down before transaction commit happened.  In that
+	 * case we're not going to flush anyway.
+	 */
+	if (!bp) {
+		ASSERT(xfs_is_shutdown(dqp->q_mount));
+
+		*bpp = NULL;
+		return 0;
+	}
+
+	if (!xfs_buf_trylock(bp))
+		return -EAGAIN;
+
+	xfs_buf_hold(bp);
+	*bpp = bp;
+	return 0;
+}
+
 /*
  * Write a modified dquot to disk.
  * The dquot must be locked and the flush lock too taken by caller.
@@ -1289,7 +1391,8 @@ xfs_qm_dqflush(
 	struct xfs_buf		*bp)
 {
 	struct xfs_mount	*mp = dqp->q_mount;
-	struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
+	struct xfs_dq_logitem	*qlip = &dqp->q_logitem;
+	struct xfs_log_item	*lip = &qlip->qli_item;
 	struct xfs_dqblk	*dqblk;
 	xfs_failaddr_t		fa;
 	int			error;
@@ -1319,8 +1422,15 @@ xfs_qm_dqflush(
 	 */
 	dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
 
-	xfs_trans_ail_copy_lsn(mp->m_ail, &dqp->q_logitem.qli_flush_lsn,
-			&lip->li_lsn);
+	/*
+	 * We hold the dquot lock, so nobody can dirty it while we're
+	 * scheduling the write out.  Clear the dirty-since-flush flag.
+	 */
+	spin_lock(&qlip->qli_lock);
+	qlip->qli_dirty = false;
+	spin_unlock(&qlip->qli_lock);
+
+	xfs_trans_ail_copy_lsn(mp->m_ail, &qlip->qli_flush_lsn, &lip->li_lsn);
 
 	/*
 	 * copy the lsn into the on-disk dquot now while we have the in memory
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 50f8404c41176c..362ca34f7c248b 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -215,7 +215,7 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
 
 void		xfs_qm_dqdestroy(struct xfs_dquot *dqp);
 int		xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp,
-				struct xfs_buf **bpp);
+				xfs_buf_flags_t flags, struct xfs_buf **bpp);
 int		xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp);
 void		xfs_qm_dqunpin_wait(struct xfs_dquot *dqp);
 void		xfs_qm_adjust_dqtimers(struct xfs_dquot *d);
@@ -239,6 +239,9 @@ void		xfs_dqlockn(struct xfs_dqtrx *q);
 
 void		xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
 
+int xfs_dquot_attach_buf(struct xfs_trans *tp, struct xfs_dquot *dqp);
+int xfs_dquot_use_attached_buf(struct xfs_dquot *dqp, struct xfs_buf **bpp);
+
 static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp)
 {
 	xfs_dqlock(dqp);
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 56ecc5ed01934d..271b195ebb9326 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -123,8 +123,9 @@ xfs_qm_dquot_logitem_push(
 		__releases(&lip->li_ailp->ail_lock)
 		__acquires(&lip->li_ailp->ail_lock)
 {
-	struct xfs_dquot	*dqp = DQUOT_ITEM(lip)->qli_dquot;
-	struct xfs_buf		*bp = lip->li_buf;
+	struct xfs_dq_logitem	*qlip = DQUOT_ITEM(lip);
+	struct xfs_dquot	*dqp = qlip->qli_dquot;
+	struct xfs_buf		*bp;
 	uint			rval = XFS_ITEM_SUCCESS;
 	int			error;
 
@@ -155,11 +156,10 @@ xfs_qm_dquot_logitem_push(
 
 	spin_unlock(&lip->li_ailp->ail_lock);
 
-	error = xfs_dquot_read_buf(NULL, dqp, &bp);
-	if (error) {
-		if (error == -EAGAIN)
-			rval = XFS_ITEM_LOCKED;
+	error = xfs_dquot_use_attached_buf(dqp, &bp);
+	if (error == -EAGAIN) {
 		xfs_dqfunlock(dqp);
+		rval = XFS_ITEM_LOCKED;
 		goto out_relock_ail;
 	}
 
@@ -207,12 +207,10 @@ xfs_qm_dquot_logitem_committing(
 }
 
 #ifdef DEBUG_EXPENSIVE
-static int
-xfs_qm_dquot_logitem_precommit(
-	struct xfs_trans	*tp,
-	struct xfs_log_item	*lip)
+static void
+xfs_qm_dquot_logitem_precommit_check(
+	struct xfs_dquot	*dqp)
 {
-	struct xfs_dquot	*dqp = DQUOT_ITEM(lip)->qli_dquot;
 	struct xfs_mount	*mp = dqp->q_mount;
 	struct xfs_disk_dquot	ddq = { };
 	xfs_failaddr_t		fa;
@@ -228,13 +226,24 @@ xfs_qm_dquot_logitem_precommit(
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 		ASSERT(fa == NULL);
 	}
-
-	return 0;
 }
 #else
-# define xfs_qm_dquot_logitem_precommit	NULL
+# define xfs_qm_dquot_logitem_precommit_check(...)	((void)0)
 #endif
 
+static int
+xfs_qm_dquot_logitem_precommit(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	struct xfs_dq_logitem	*qlip = DQUOT_ITEM(lip);
+	struct xfs_dquot	*dqp = qlip->qli_dquot;
+
+	xfs_qm_dquot_logitem_precommit_check(dqp);
+
+	return xfs_dquot_attach_buf(tp, dqp);
+}
+
 static const struct xfs_item_ops xfs_dquot_item_ops = {
 	.iop_size	= xfs_qm_dquot_logitem_size,
 	.iop_precommit	= xfs_qm_dquot_logitem_precommit,
@@ -259,5 +268,7 @@ xfs_qm_dquot_logitem_init(
 
 	xfs_log_item_init(dqp->q_mount, &lp->qli_item, XFS_LI_DQUOT,
 					&xfs_dquot_item_ops);
+	spin_lock_init(&lp->qli_lock);
 	lp->qli_dquot = dqp;
+	lp->qli_dirty = false;
 }
diff --git a/fs/xfs/xfs_dquot_item.h b/fs/xfs/xfs_dquot_item.h
index 794710c2447493..d66e52807d76d5 100644
--- a/fs/xfs/xfs_dquot_item.h
+++ b/fs/xfs/xfs_dquot_item.h
@@ -14,6 +14,13 @@ struct xfs_dq_logitem {
 	struct xfs_log_item	qli_item;	/* common portion */
 	struct xfs_dquot	*qli_dquot;	/* dquot ptr */
 	xfs_lsn_t		qli_flush_lsn;	/* lsn at last flush */
+
+	/*
+	 * We use this spinlock to coordinate access to the li_buf pointer in
+	 * the log item and the qli_dirty flag.
+	 */
+	spinlock_t		qli_lock;
+	bool			qli_dirty;	/* dirtied since last flush? */
 };
 
 void xfs_qm_dquot_logitem_init(struct xfs_dquot *dqp);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 341fe4821c2d77..a79c4a1bf27fab 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -148,7 +148,7 @@ xfs_qm_dqpurge(
 		 * We don't care about getting disk errors here. We need
 		 * to purge this dquot anyway, so we go ahead regardless.
 		 */
-		error = xfs_dquot_read_buf(NULL, dqp, &bp);
+		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
 		if (error == -EAGAIN) {
 			xfs_dqfunlock(dqp);
 			dqp->q_flags &= ~XFS_DQFLAG_FREEING;
@@ -506,7 +506,7 @@ xfs_qm_dquot_isolate(
 		/* we have to drop the LRU lock to flush the dquot */
 		spin_unlock(lru_lock);
 
-		error = xfs_dquot_read_buf(NULL, dqp, &bp);
+		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
 		if (error) {
 			xfs_dqfunlock(dqp);
 			goto out_unlock_dirty;
@@ -1512,7 +1512,7 @@ xfs_qm_flush_one(
 		goto out_unlock;
 	}
 
-	error = xfs_dquot_read_buf(NULL, dqp, &bp);
+	error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
 	if (error)
 		goto out_unlock;
 
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 8ede9d099d1fea..f56d62dced97b1 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -360,7 +360,7 @@ xfsaild_resubmit_item(
 
 	/* protected by ail_lock */
 	list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
-		if (bp->b_flags & _XBF_INODES)
+		if (bp->b_flags & (_XBF_INODES | _XBF_DQUOTS))
 			clear_bit(XFS_LI_FAILED, &lip->li_flags);
 		else
 			xfs_clear_li_failed(lip);


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers
  2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-11-26  1:29   ` [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer Darrick J. Wong
@ 2024-11-26  1:30   ` Darrick J. Wong
  2024-11-26  5:42     ` Christoph Hellwig
  2024-12-17  2:06     ` Lai, Yi
  13 siblings, 2 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26  1:30 UTC (permalink / raw)
  To: djwong; +Cc: stable, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we've converted the dquot logging machinery to attach the dquot
buffer to the li_buf pointer so that the AIL dqflush doesn't have to
allocate or read buffers in a reclaim path, do the same for the
quotacheck code so that the reclaim shrinker dqflush call doesn't have
to do that either.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_dquot.c |    9 +++------
 fs/xfs/xfs_dquot.h |    2 --
 fs/xfs/xfs_qm.c    |   18 +++++++++++++-----
 3 files changed, 16 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index c495f7ad80018f..c47f95c96fe0cf 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1275,11 +1275,10 @@ xfs_qm_dqflush_check(
  * Requires dquot flush lock, will clear the dirty flag, delete the quota log
  * item from the AIL, and shut down the system if something goes wrong.
  */
-int
+static int
 xfs_dquot_read_buf(
 	struct xfs_trans	*tp,
 	struct xfs_dquot	*dqp,
-	xfs_buf_flags_t		xbf_flags,
 	struct xfs_buf		**bpp)
 {
 	struct xfs_mount	*mp = dqp->q_mount;
@@ -1287,10 +1286,8 @@ xfs_dquot_read_buf(
 	int			error;
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno,
-				   mp->m_quotainfo->qi_dqchunklen, xbf_flags,
+				   mp->m_quotainfo->qi_dqchunklen, 0,
 				   &bp, &xfs_dquot_buf_ops);
-	if (error == -EAGAIN)
-		return error;
 	if (xfs_metadata_is_sick(error))
 		xfs_dquot_mark_sick(dqp);
 	if (error)
@@ -1324,7 +1321,7 @@ xfs_dquot_attach_buf(
 		struct xfs_buf	*bp = NULL;
 
 		spin_unlock(&qlip->qli_lock);
-		error = xfs_dquot_read_buf(tp, dqp, 0, &bp);
+		error = xfs_dquot_read_buf(tp, dqp, &bp);
 		if (error)
 			return error;
 
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 362ca34f7c248b..1c5c911615bf7f 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -214,8 +214,6 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
 #define XFS_DQ_IS_DIRTY(dqp)	((dqp)->q_flags & XFS_DQFLAG_DIRTY)
 
 void		xfs_qm_dqdestroy(struct xfs_dquot *dqp);
-int		xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp,
-				xfs_buf_flags_t flags, struct xfs_buf **bpp);
 int		xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp);
 void		xfs_qm_dqunpin_wait(struct xfs_dquot *dqp);
 void		xfs_qm_adjust_dqtimers(struct xfs_dquot *d);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index a79c4a1bf27fab..e073ad51af1a3d 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -148,13 +148,13 @@ xfs_qm_dqpurge(
 		 * We don't care about getting disk errors here. We need
 		 * to purge this dquot anyway, so we go ahead regardless.
 		 */
-		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
+		error = xfs_dquot_use_attached_buf(dqp, &bp);
 		if (error == -EAGAIN) {
 			xfs_dqfunlock(dqp);
 			dqp->q_flags &= ~XFS_DQFLAG_FREEING;
 			goto out_unlock;
 		}
-		if (error)
+		if (!bp)
 			goto out_funlock;
 
 		/*
@@ -506,8 +506,8 @@ xfs_qm_dquot_isolate(
 		/* we have to drop the LRU lock to flush the dquot */
 		spin_unlock(lru_lock);
 
-		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
-		if (error) {
+		error = xfs_dquot_use_attached_buf(dqp, &bp);
+		if (!bp || error == -EAGAIN) {
 			xfs_dqfunlock(dqp);
 			goto out_unlock_dirty;
 		}
@@ -1330,6 +1330,10 @@ xfs_qm_quotacheck_dqadjust(
 		return error;
 	}
 
+	error = xfs_dquot_attach_buf(NULL, dqp);
+	if (error)
+		return error;
+
 	trace_xfs_dqadjust(dqp);
 
 	/*
@@ -1512,9 +1516,13 @@ xfs_qm_flush_one(
 		goto out_unlock;
 	}
 
-	error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
+	error = xfs_dquot_use_attached_buf(dqp, &bp);
 	if (error)
 		goto out_unlock;
+	if (!bp) {
+		error = -EFSCORRUPTED;
+		goto out_unlock;
+	}
 
 	error = xfs_qm_dqflush(dqp, bp);
 	if (!error)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block
  2024-11-26  1:27   ` [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
@ 2024-11-26  5:01     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved
  2024-11-26  1:27   ` [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong
@ 2024-11-26  5:01     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
  2024-11-26  1:28   ` [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir Darrick J. Wong
@ 2024-11-26  5:03     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/21] xfs: only run precommits once per transaction object
  2024-11-26  1:28   ` [PATCH 14/21] xfs: only run precommits once per transaction object Darrick J. Wong
@ 2024-11-26  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 16/21] xfs: don't lose solo superblock counter update transactions
  2024-11-26  1:28   ` [PATCH 16/21] xfs: don't lose solo superblock counter update transactions Darrick J. Wong
@ 2024-11-26  5:14     ` Christoph Hellwig
  2024-11-26 18:23       ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

On Mon, Nov 25, 2024 at 05:28:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Superblock counter updates are tracked via per-transaction counters in
> the xfs_trans object.  These changes are then turned into dirty log
> items in xfs_trans_apply_sb_deltas just prior to committing the log items
> to the CIL.
> 
> However, updating the per-transaction counter deltas does not cause
> XFS_TRANS_DIRTY to be set on the transaction.  In other words, a pure sb
> counter update will be silently discarded if there are no other dirty
> log items attached to the transaction.
> 
> This is currently not the case anywhere in the filesystem because sb
> counter updates always dirty at least one other metadata item, but let's
> not leave a logic bomb.

xfs_trans_mod_sb is the only place that sets XFS_TRANS_SB_DIRTY, and
always forces XFS_TRANS_DIRTY.  So this seems kinda intentional, even
if this new version is much cleaner.  So the change looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But I don't think the Fixes tag is really warranted.



* Re: [PATCH 17/21] xfs: don't lose solo dquot update transactions
  2024-11-26  1:29   ` [PATCH 17/21] xfs: don't lose solo dquot " Darrick J. Wong
@ 2024-11-26  5:18     ` Christoph Hellwig
  2024-11-26 18:23       ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

On Mon, Nov 25, 2024 at 05:29:08PM -0800, Darrick J. Wong wrote:
> This is currently not the case anywhere in the filesystem because quota
> updates always dirty at least one other metadata item, but a subsequent
> bug fix will add dquot log item precommits, so we actually need a dirty
> dquot log item prior to xfs_trans_run_precommits.  Also let's not leave
> a logic bomb.

Unlike for the sb updates there doesn't seem to be anything in
xfs_trans_mod_dquot that forces the transaction to be dirty, so I guess
the fixes tag is fine here.

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer
  2024-11-26  1:29   ` [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer Darrick J. Wong
@ 2024-11-26  5:42     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

On Mon, Nov 25, 2024 at 05:29:55PM -0800, Darrick J. Wong wrote:
> +	spin_lock(&qlip->qli_lock);
> +	if (qlip->qli_item.li_buf) {
> +		bp = qlip->qli_item.li_buf;
> +		qlip->qli_item.li_buf = NULL;
> +	}
> +	spin_unlock(&qlip->qli_lock);
> +	if (bp)
> +		xfs_buf_rele(bp);

> +	spin_lock(&qlip->qli_lock);
> +	if (!qlip->qli_dirty) {
> +		bp = lip->li_buf;
> +		lip->li_buf = NULL;
> +	}
> +	spin_unlock(&qlip->qli_lock);
> +	if (bp)
> +		xfs_buf_rele(bp);

Should this move into a common helper and always use either the
buf or dirty check instead of mixing them?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers
  2024-11-26  1:30   ` [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers Darrick J. Wong
@ 2024-11-26  5:42     ` Christoph Hellwig
  2024-12-17  2:06     ` Lai, Yi
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2024-11-26  5:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 16/21] xfs: don't lose solo superblock counter update transactions
  2024-11-26  5:14     ` Christoph Hellwig
@ 2024-11-26 18:23       ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26 18:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: stable, linux-xfs

On Mon, Nov 25, 2024 at 09:14:07PM -0800, Christoph Hellwig wrote:
> On Mon, Nov 25, 2024 at 05:28:53PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Superblock counter updates are tracked via per-transaction counters in
> > the xfs_trans object.  These changes are then turned into dirty log
> > items in xfs_trans_apply_sb_deltas just prior to committing the log items
> > to the CIL.
> > 
> > However, updating the per-transaction counter deltas does not cause
> > XFS_TRANS_DIRTY to be set on the transaction.  In other words, a pure sb
> > counter update will be silently discarded if there are no other dirty
> > log items attached to the transaction.
> > 
> > This is currently not the case anywhere in the filesystem because sb
> > counter updates always dirty at least one other metadata item, but let's
> > not leave a logic bomb.
> 
> xfs_trans_mod_sb is the only place that sets XFS_TRANS_SB_DIRTY, and
> always forces XFS_TRANS_DIRTY.  So this seems kinda intentional, even
> if this new version is much cleaner.  So the change looks fine:

<nod> I don't think this one was absolutely necessary, it just seemed
like a cleanup to put anything that set TRANS_DIRTY before the
TRANS_DIRTY check.

> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> But I don't think the Fixes tag is really warranted.

Dropped.

--D


* Re: [PATCH 17/21] xfs: don't lose solo dquot update transactions
  2024-11-26  5:18     ` Christoph Hellwig
@ 2024-11-26 18:23       ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2024-11-26 18:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: stable, linux-xfs

On Mon, Nov 25, 2024 at 09:18:14PM -0800, Christoph Hellwig wrote:
> On Mon, Nov 25, 2024 at 05:29:08PM -0800, Darrick J. Wong wrote:
> > This is currently not the case anywhere in the filesystem because quota
> > updates always dirty at least one other metadata item, but a subsequent
> > bug fix will add dquot log item precommits, so we actually need a dirty
> > dquot log item prior to xfs_trans_run_precommits.  Also let's not leave
> > a logic bomb.
> 
> Unlike for the sb updates there doesn't seem to be anything in
> xfs_trans_mod_dquot that forces the transaction to be dirty, so I guess
> the fixes tag is fine here.

Correct.

> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D


* Re: [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers
  2024-11-26  1:30   ` [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers Darrick J. Wong
  2024-11-26  5:42     ` Christoph Hellwig
@ 2024-12-17  2:06     ` Lai, Yi
  1 sibling, 0 replies; 26+ messages in thread
From: Lai, Yi @ 2024-12-17  2:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: stable, linux-xfs, yi1.lai, syzkaller-bugs

On Mon, Nov 25, 2024 at 05:30:11PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that we've converted the dquot logging machinery to attach the dquot
> buffer to the li_buf pointer so that the AIL dqflush doesn't have to
> allocate or read buffers in a reclaim path, do the same for the
> quotacheck code so that the reclaim shrinker dqflush call doesn't have
> to do that either.
> 
> Cc: <stable@vger.kernel.org> # v6.12
> Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  fs/xfs/xfs_dquot.c |    9 +++------
>  fs/xfs/xfs_dquot.h |    2 --
>  fs/xfs/xfs_qm.c    |   18 +++++++++++++-----
>  3 files changed, 16 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index c495f7ad80018f..c47f95c96fe0cf 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -1275,11 +1275,10 @@ xfs_qm_dqflush_check(
>   * Requires dquot flush lock, will clear the dirty flag, delete the quota log
>   * item from the AIL, and shut down the system if something goes wrong.
>   */
> -int
> +static int
>  xfs_dquot_read_buf(
>  	struct xfs_trans	*tp,
>  	struct xfs_dquot	*dqp,
> -	xfs_buf_flags_t		xbf_flags,
>  	struct xfs_buf		**bpp)
>  {
>  	struct xfs_mount	*mp = dqp->q_mount;
> @@ -1287,10 +1286,8 @@ xfs_dquot_read_buf(
>  	int			error;
>  
>  	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno,
> -				   mp->m_quotainfo->qi_dqchunklen, xbf_flags,
> +				   mp->m_quotainfo->qi_dqchunklen, 0,
>  				   &bp, &xfs_dquot_buf_ops);
> -	if (error == -EAGAIN)
> -		return error;
>  	if (xfs_metadata_is_sick(error))
>  		xfs_dquot_mark_sick(dqp);
>  	if (error)
> @@ -1324,7 +1321,7 @@ xfs_dquot_attach_buf(
>  		struct xfs_buf	*bp = NULL;
>  
>  		spin_unlock(&qlip->qli_lock);
> -		error = xfs_dquot_read_buf(tp, dqp, 0, &bp);
> +		error = xfs_dquot_read_buf(tp, dqp, &bp);
>  		if (error)
>  			return error;
>  
> diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> index 362ca34f7c248b..1c5c911615bf7f 100644
> --- a/fs/xfs/xfs_dquot.h
> +++ b/fs/xfs/xfs_dquot.h
> @@ -214,8 +214,6 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
>  #define XFS_DQ_IS_DIRTY(dqp)	((dqp)->q_flags & XFS_DQFLAG_DIRTY)
>  
>  void		xfs_qm_dqdestroy(struct xfs_dquot *dqp);
> -int		xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp,
> -				xfs_buf_flags_t flags, struct xfs_buf **bpp);
>  int		xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp);
>  void		xfs_qm_dqunpin_wait(struct xfs_dquot *dqp);
>  void		xfs_qm_adjust_dqtimers(struct xfs_dquot *d);
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index a79c4a1bf27fab..e073ad51af1a3d 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -148,13 +148,13 @@ xfs_qm_dqpurge(
>  		 * We don't care about getting disk errors here. We need
>  		 * to purge this dquot anyway, so we go ahead regardless.
>  		 */
> -		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
> +		error = xfs_dquot_use_attached_buf(dqp, &bp);
>  		if (error == -EAGAIN) {
>  			xfs_dqfunlock(dqp);
>  			dqp->q_flags &= ~XFS_DQFLAG_FREEING;
>  			goto out_unlock;
>  		}
> -		if (error)
> +		if (!bp)
>  			goto out_funlock;
>  
>  		/*
> @@ -506,8 +506,8 @@ xfs_qm_dquot_isolate(
>  		/* we have to drop the LRU lock to flush the dquot */
>  		spin_unlock(lru_lock);
>  
> -		error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
> -		if (error) {
> +		error = xfs_dquot_use_attached_buf(dqp, &bp);
> +		if (!bp || error == -EAGAIN) {
>  			xfs_dqfunlock(dqp);
>  			goto out_unlock_dirty;
>  		}
> @@ -1330,6 +1330,10 @@ xfs_qm_quotacheck_dqadjust(
>  		return error;
>  	}
>  
> +	error = xfs_dquot_attach_buf(NULL, dqp);
> +	if (error)
> +		return error;
> +
>  	trace_xfs_dqadjust(dqp);
>  
>  	/*
> @@ -1512,9 +1516,13 @@ xfs_qm_flush_one(
>  		goto out_unlock;
>  	}
>  
> -	error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp);
> +	error = xfs_dquot_use_attached_buf(dqp, &bp);
>  	if (error)
>  		goto out_unlock;
> +	if (!bp) {
> +		error = -EFSCORRUPTED;
> +		goto out_unlock;
> +	}
>  
>  	error = xfs_qm_dqflush(dqp, bp);
>  	if (!error)
>
Hi Darrick J. Wong,

Greetings!

I used Syzkaller and found that there is a possible deadlock in xfs_dquot_detach_buf in Linux v6.13-rc3.

After bisection and the first bad commit is:
"
ca378189fdfa xfs: convert quotacheck to attach dquot buffers
"

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/241216_192201_xfs_dquot_detach_buf/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/241216_192201_xfs_dquot_detach_buf/bzImage_78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/241216_192201_xfs_dquot_detach_buf/78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8_dmesg.log

"
[   52.971391] ======================================================
[   52.971706] WARNING: possible circular locking dependency detected
[   52.972026] 6.13.0-rc3-78d4f34e2115+ #1 Not tainted
[   52.972282] ------------------------------------------------------
[   52.972596] repro/673 is trying to acquire lock:
[   52.972842] ffff88802366b510 (&lp->qli_lock){+.+.}-{3:3}, at: xfs_dquot_detach_buf+0x2d/0x190
[   52.973324]
[   52.973324] but task is already holding lock:
[   52.973617] ffff888015681b30 (&l->lock){+.+.}-{3:3}, at: __list_lru_walk_one+0x409/0x810
[   52.974039]
[   52.974039] which lock already depends on the new lock.
[   52.974039]
[   52.974442]
[   52.974442] the existing dependency chain (in reverse order) is:
[   52.974815]
[   52.974815] -> #3 (&l->lock){+.+.}-{3:3}:
[   52.975100]        lock_acquire+0x80/0xb0
[   52.975319]        _raw_spin_lock+0x38/0x50
[   52.975550]        list_lru_add+0x198/0x5d0
[   52.975770]        list_lru_add_obj+0x207/0x360
[   52.976008]        xfs_buf_rele+0xcb6/0x1610
[   52.976243]        xfs_trans_brelse+0x385/0x410
[   52.976484]        xfs_imap_lookup+0x38d/0x5a0
[   52.976719]        xfs_imap+0x668/0xc80
[   52.976923]        xfs_iget+0x875/0x2dd0
[   52.977129]        xfs_mountfs+0x116b/0x2060
[   52.977360]        xfs_fs_fill_super+0x12bc/0x1f10
[   52.977612]        get_tree_bdev_flags+0x3d8/0x6c0
[   52.977869]        get_tree_bdev+0x29/0x40
[   52.978086]        xfs_fs_get_tree+0x26/0x30
[   52.978310]        vfs_get_tree+0x9e/0x390
[   52.978526]        path_mount+0x707/0x2000
[   52.978742]        __x64_sys_mount+0x2bf/0x350
[   52.978974]        x64_sys_call+0x1e1d/0x2140
[   52.979210]        do_syscall_64+0x6d/0x140
[   52.979431]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   52.979723]
[   52.979723] -> #2 (&bch->bc_lock){+.+.}-{3:3}:
[   52.980029]        lock_acquire+0x80/0xb0
[   52.980240]        _raw_spin_lock+0x38/0x50
[   52.980456]        _atomic_dec_and_lock+0xb8/0x100
[   52.980712]        xfs_buf_rele+0x112/0x1610
[   52.980937]        xfs_trans_brelse+0x385/0x410
[   52.981175]        xfs_imap_lookup+0x38d/0x5a0
[   52.981410]        xfs_imap+0x668/0xc80
[   52.981612]        xfs_iget+0x875/0x2dd0
[   52.981816]        xfs_mountfs+0x116b/0x2060
[   52.982042]        xfs_fs_fill_super+0x12bc/0x1f10
[   52.982289]        get_tree_bdev_flags+0x3d8/0x6c0
[   52.982540]        get_tree_bdev+0x29/0x40
[   52.982756]        xfs_fs_get_tree+0x26/0x30
[   52.982979]        vfs_get_tree+0x9e/0x390
[   52.983194]        path_mount+0x707/0x2000
[   52.983406]        __x64_sys_mount+0x2bf/0x350
[   52.983637]        x64_sys_call+0x1e1d/0x2140
[   52.983865]        do_syscall_64+0x6d/0x140
[   52.984081]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   52.984366]
[   52.984366] -> #1 (&bp->b_lock){+.+.}-{3:3}:
[   52.984665]        lock_acquire+0x80/0xb0
[   52.984876]        _raw_spin_lock+0x38/0x50
[   52.985092]        xfs_buf_rele+0x106/0x1610
[   52.985319]        xfs_trans_brelse+0x385/0x410
[   52.985556]        xfs_dquot_attach_buf+0x312/0x490
[   52.985806]        xfs_qm_quotacheck_dqadjust+0x122/0x580
[   52.986083]        xfs_qm_dqusage_adjust+0x5e0/0x7c0
[   52.986340]        xfs_iwalk_ag_recs+0x423/0x780
[   52.986579]        xfs_iwalk_run_callbacks+0x1e2/0x540
[   52.986842]        xfs_iwalk_ag+0x6e6/0x920
[   52.987061]        xfs_iwalk_ag_work+0x160/0x1e0
[   52.987301]        xfs_pwork_work+0x8b/0x180
[   52.987528]        process_one_work+0x92e/0x1b60
[   52.987770]        worker_thread+0x68d/0xe90
[   52.987992]        kthread+0x35a/0x470
[   52.988186]        ret_from_fork+0x56/0x90
[   52.988401]        ret_from_fork_asm+0x1a/0x30
[   52.988627]
[   52.988627] -> #0 (&lp->qli_lock){+.+.}-{3:3}:
[   52.988924]        __lock_acquire+0x2ff8/0x5d60
[   52.989156]        lock_acquire.part.0+0x142/0x390
[   52.989402]        lock_acquire+0x80/0xb0
[   52.989609]        _raw_spin_lock+0x38/0x50
[   52.989820]        xfs_dquot_detach_buf+0x2d/0x190
[   52.990061]        xfs_qm_dquot_isolate+0x1c6/0x12f0
[   52.990312]        __list_lru_walk_one+0x31a/0x810
[   52.990553]        list_lru_walk_one+0x4c/0x60
[   52.990781]        xfs_qm_shrink_scan+0x1d0/0x380
[   52.991020]        do_shrink_slab+0x410/0x10a0
[   52.991253]        shrink_slab+0x349/0x1370
[   52.991469]        drop_slab+0xf5/0x1f0
[   52.991667]        drop_caches_sysctl_handler+0x179/0x1a0
[   52.991943]        proc_sys_call_handler+0x418/0x610
[   52.992197]        proc_sys_write+0x2c/0x40
[   52.992409]        vfs_write+0xc59/0x1140
[   52.992613]        ksys_write+0x14f/0x280
[   52.992817]        __x64_sys_write+0x7b/0xc0
[   52.993034]        x64_sys_call+0x16b3/0x2140
[   52.993259]        do_syscall_64+0x6d/0x140
[   52.993470]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   52.993748]
[   52.993748] other info that might help us debug this:
[   52.993748]
[   52.994133] Chain exists of:
[   52.994133]   &lp->qli_lock --> &bch->bc_lock --> &l->lock
[   52.994133]
[   52.994613]  Possible unsafe locking scenario:
[   52.994613]
[   52.994901]        CPU0                    CPU1
[   52.995125]        ----                    ----
[   52.995348]   lock(&l->lock);
[   52.995505]                                lock(&bch->bc_lock);
[   52.995797]                                lock(&l->lock);
[   52.996067]   lock(&lp->qli_lock);
[   52.996242]
[   52.996242]  *** DEADLOCK ***
[   52.996242]
[   52.996530] 3 locks held by repro/673:
[   52.996719]  #0: ffff888012734408 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x14f/0x280
[   52.997127]  #1: ffff888015681b30 (&l->lock){+.+.}-{3:3}, at: __list_lru_walk_one+0x409/0x810
[   52.997559]  #2: ffff88802366b5f8 (&dqp->q_qlock){+.+.}-{4:4}, at: xfs_qm_dquot_isolate+0x8f/0x12f0
[   52.998011]
[   52.998011] stack backtrace:
[   52.998230] CPU: 1 UID: 0 PID: 673 Comm: repro Not tainted 6.13.0-rc3-78d4f34e2115+ #1
[   52.998617] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qem4
[   52.999165] Call Trace:
[   52.999296]  <TASK>
[   52.999411]  dump_stack_lvl+0xea/0x150
[   52.999609]  dump_stack+0x19/0x20
[   52.999786]  print_circular_bug+0x47f/0x750
[   53.000002]  check_noncircular+0x2f4/0x3e0
[   53.000213]  ? __pfx_check_noncircular+0x10/0x10
[   53.000453]  ? __pfx_lockdep_lock+0x10/0x10
[   53.000668]  __lock_acquire+0x2ff8/0x5d60
[   53.000881]  ? __pfx___lock_acquire+0x10/0x10
[   53.001105]  ? __kasan_check_read+0x15/0x20
[   53.001324]  ? mark_lock.part.0+0xf3/0x17b0
[   53.001539]  ? __this_cpu_preempt_check+0x21/0x30
[   53.001779]  ? lock_acquire.part.0+0x152/0x390
[   53.002008]  lock_acquire.part.0+0x142/0x390
[   53.002228]  ? xfs_dquot_detach_buf+0x2d/0x190
[   53.002456]  ? __pfx_lock_acquire.part.0+0x10/0x10
[   53.002702]  ? debug_smp_processor_id+0x20/0x30
[   53.002933]  ? rcu_is_watching+0x19/0xc0
[   53.003141]  ? trace_lock_acquire+0x13d/0x1b0
[   53.003366]  lock_acquire+0x80/0xb0
[   53.003549]  ? xfs_dquot_detach_buf+0x2d/0x190
[   53.003776]  _raw_spin_lock+0x38/0x50
[   53.003965]  ? xfs_dquot_detach_buf+0x2d/0x190
[   53.004190]  xfs_dquot_detach_buf+0x2d/0x190
[   53.004408]  xfs_qm_dquot_isolate+0x1c6/0x12f0
[   53.004638]  ? __pfx_xfs_qm_dquot_isolate+0x10/0x10
[   53.004886]  ? lock_acquire+0x80/0xb0
[   53.005080]  __list_lru_walk_one+0x31a/0x810
[   53.005306]  ? __pfx_xfs_qm_dquot_isolate+0x10/0x10
[   53.005555]  ? __pfx_xfs_qm_dquot_isolate+0x10/0x10
[   53.005803]  list_lru_walk_one+0x4c/0x60
[   53.006006]  xfs_qm_shrink_scan+0x1d0/0x380
[   53.006221]  ? __pfx_xfs_qm_shrink_scan+0x10/0x10
[   53.006465]  do_shrink_slab+0x410/0x10a0
[   53.006673]  shrink_slab+0x349/0x1370
[   53.006864]  ? __this_cpu_preempt_check+0x21/0x30
[   53.007103]  ? lock_release+0x441/0x870
[   53.007303]  ? __pfx_lock_release+0x10/0x10
[   53.007517]  ? shrink_slab+0x161/0x1370
[   53.007717]  ? __pfx_shrink_slab+0x10/0x10
[   53.007931]  ? mem_cgroup_iter+0x3a6/0x670
[   53.008147]  drop_slab+0xf5/0x1f0
[   53.008324]  drop_caches_sysctl_handler+0x179/0x1a0
[   53.008575]  proc_sys_call_handler+0x418/0x610
[   53.008803]  ? __pfx_proc_sys_call_handler+0x10/0x10
[   53.009054]  ? rcu_is_watching+0x19/0xc0
[   53.009263]  ? __this_cpu_preempt_check+0x21/0x30
[   53.009504]  proc_sys_write+0x2c/0x40
[   53.009694]  vfs_write+0xc59/0x1140
[   53.009876]  ? __pfx_proc_sys_write+0x10/0x10
[   53.010101]  ? __pfx_vfs_write+0x10/0x10
[   53.010307]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[   53.010583]  ksys_write+0x14f/0x280
[   53.010765]  ? __pfx_ksys_write+0x10/0x10
[   53.010971]  ? __audit_syscall_entry+0x39c/0x500
[   53.011211]  __x64_sys_write+0x7b/0xc0
[   53.011404]  ? syscall_trace_enter+0x14f/0x280
[   53.011633]  x64_sys_call+0x16b3/0x2140
[   53.011831]  do_syscall_64+0x6d/0x140
[   53.012020]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   53.012275] RIP: 0033:0x7f56d423ee5d
[   53.012463] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 8
[   53.013348] RSP: 002b:00007ffcc7ab57f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   53.013723] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f56d423ee5d
[   53.014070] RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000004
[   53.014416] RBP: 00007ffcc7ab5810 R08: 00007ffcc7ab5810 R09: 00007ffcc7ab5810
[   53.014763] R10: 002c646975756f6e R11: 0000000000000202 R12: 00007ffcc7ab5968
[   53.015110] R13: 0000000000402d04 R14: 000000000041ae08 R15: 00007f56d45aa000
[   53.015463]  </TASK>
"

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the vm (virtual machine) successfully, you can transfer the
reproducer binary to the vm as shown below, and reproduce the problem in the vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Build the bzImage for the target kernel:
Copy the target kconfig to kernel_src/.config, then run:
make olddefconfig
make -jx bzImage           // x should be at most the number of CPUs your machine has

Point start3.sh above at this bzImage file to boot the target kernel in the VM.
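The build steps above can be sketched as a small wrapper script (a minimal sketch; the `build_bzimage` name, the KSRC/CONFIG defaults, and the DRY_RUN switch are illustrative assumptions, not part of the original report):

```shell
#!/bin/sh
# Sketch of the bzImage build flow described above.
# DRY_RUN=1 only prints each command instead of running it.
run() { [ "${DRY_RUN:-0}" = 1 ] && echo "$@" || "$@"; }

build_bzimage() {
    ksrc=${1:-./kernel_src}          # kernel source tree (assumed path)
    config=${2:-./target.config}     # target kconfig file (assumed path)
    jobs=$(nproc 2>/dev/null || echo 4)  # -jx: at most the CPU count
    run cp "$config" "$ksrc/.config"
    run make -C "$ksrc" olddefconfig
    run make -C "$ksrc" -j"$jobs" bzImage
}
```

Invoke it as `build_bzimage /path/to/kernel_src /path/to/target.config`, then feed the resulting bzImage to start3.sh.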


Tips:
If you already have qemu-system-x86_64, ignore the info below.
To install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install



end of thread, other threads:[~2024-12-17  2:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20241126011838.GI9438@frogsfrogsfrogs>
2024-11-26  1:20 ` [PATCHSET] xfs: bug fixes for 6.13 Darrick J. Wong
2024-11-26  1:24   ` [PATCH 01/21] xfs: fix off-by-one error in fsmap's end_daddr usage Darrick J. Wong
2024-11-26  1:25   ` [PATCH 04/21] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
2024-11-26  1:26   ` [PATCH 05/21] xfs: don't drop errno values when we fail to ficlone the entire range Darrick J. Wong
2024-11-26  1:26   ` [PATCH 06/21] xfs: separate healthy clearing mask during repair Darrick J. Wong
2024-11-26  1:26   ` [PATCH 07/21] xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink Darrick J. Wong
2024-11-26  1:27   ` [PATCH 09/21] xfs: fix null bno_hint handling in xfs_rtallocate_rtg Darrick J. Wong
2024-11-26  1:27   ` [PATCH 11/21] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
2024-11-26  5:01     ` Christoph Hellwig
2024-11-26  1:27   ` [PATCH 12/21] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong
2024-11-26  5:01     ` Christoph Hellwig
2024-11-26  1:28   ` [PATCH 13/21] xfs: unlock inodes when erroring out of xfs_trans_alloc_dir Darrick J. Wong
2024-11-26  5:03     ` Christoph Hellwig
2024-11-26  1:28   ` [PATCH 14/21] xfs: only run precommits once per transaction object Darrick J. Wong
2024-11-26  5:09     ` Christoph Hellwig
2024-11-26  1:28   ` [PATCH 16/21] xfs: don't lose solo superblock counter update transactions Darrick J. Wong
2024-11-26  5:14     ` Christoph Hellwig
2024-11-26 18:23       ` Darrick J. Wong
2024-11-26  1:29   ` [PATCH 17/21] xfs: don't lose solo dquot " Darrick J. Wong
2024-11-26  5:18     ` Christoph Hellwig
2024-11-26 18:23       ` Darrick J. Wong
2024-11-26  1:29   ` [PATCH 20/21] xfs: attach dquot buffer to dquot log item buffer Darrick J. Wong
2024-11-26  5:42     ` Christoph Hellwig
2024-11-26  1:30   ` [PATCH 21/21] xfs: convert quotacheck to attach dquot buffers Darrick J. Wong
2024-11-26  5:42     ` Christoph Hellwig
2024-12-17  2:06     ` Lai, Yi
