[PATCH 6.1 00/29] patches for 6.1.y from 6.8

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 6.1 00/29] patches for 6.1.y from 6.8
@ 2025-03-13 20:25 Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code Leah Rumancik
                   ` (28 more replies)
  0 siblings, 29 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Leah Rumancik

Hello,

(This series has already been ack'd on the xfs-stable mailing list.)

Here is the 6.1.y series corresponding to the 6.6.y series for 6.8
(https://lore.kernel.org/all/20240325220724.42216-1-catherine.hoang@oracle.com/#r).

Descrepancies between the patch series are as follows...

The following were added as dependencies (9 patches):

0b11553ec54a6d88907e60d0595dbcef98539747
xfs: pass refcount intent directly through the log intent code
(v6.3-rc1~142^2~5)

72ba455599ad13d08c29dafa22a32360e07b1961
xfs: pass xfs_extent_free_item directly through the log intent code
(v6.3-rc1~142^2~9)

578c714b215d474c52949e65a914dae67924f0fe
xfs: fix confusing xfs_extent_item variable names
(v6.3-rc1~142^2~8)

ddccb81b26ec021ae1f3366aa996cc4c68dd75ce
xfs: pass the xfs_bmbt_irec directly through the log intent code
(v6.3-rc1~142^2~11)

b2ccab3199aa7cea9154d80ea2585312c5f6eba0
xfs: pass per-ag references to xfs_free_extent
(v6.4-rc1~80^2~22^2~3)

7dfee17b13e5024c5c0ab1911859ded4182de3e5
xfs: validate block number being freed before adding to xefi
(v6.4-rc6~19^2~1)

fix of 7dfee17b13e:
2bed0d82c2f78b91a0a9a5a73da57ee883a0c070
xfs: fix bounds check in xfs_defer_agfl_block()
(v6.5-rc1~44^2~10)

b742d7b4f0e03df25c2a772adcded35044b625ca
xfs: use deferred frees for btree block freeing
(v6.5-rc1~44^2~16)

3c919b0910906cc69d76dea214776f0eac73358b
xfs: reserve less log space when recovering log intent items
(v6.6-rc3~13^2~5^2)


And the following were skipped for 6.1.y (4 patches):

fb6e584e74710a1b7caee9dac59b494a37e07a62 (scrub)
xfs: make xchk_iget safer in the presence of corrupt inode btrees

c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 (scrub)
xfs: fix an off-by-one error in xreap_agextent_binval

b9358db0a811ff698b0a743bcfb80dfc44b88ebd (scrub)
xfs: add missing nrext64 inode flag check to scrub

84712492e6dab803bf595fb8494d11098b74a652 (already in 6.1.y)
xfs: short circuit xfs_growfs_data_private() if delta is zero


The auto group was run 1x on each of these configs:

xfs/4k
xfs/1k
xfs/logdev
xfs/realtime
xfs/quota
xfs/v4
xfs/dax
xfs/adv
xfs/dirblock_8k

and no regressions were seen.

Let me know if you see any issues. Thanks,
Leah


Andrey Albershteyn (1):
  xfs: reset XFS_ATTR_INCOMPLETE filter on node removal

Christoph Hellwig (1):
  xfs: consider minlen sized extents in xfs_rtallocate_extent_block

Darrick J. Wong (19):
  xfs: pass refcount intent directly through the log intent code
  xfs: pass xfs_extent_free_item directly through the log intent code
  xfs: fix confusing xfs_extent_item variable names
  xfs: pass the xfs_bmbt_irec directly through the log intent code
  xfs: pass per-ag references to xfs_free_extent
  xfs: reserve less log space when recovering log intent items
  xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h
  xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t
  xfs: don't leak recovered attri intent items
  xfs: use xfs_defer_pending objects to recover intent items
  xfs: pass the xfs_defer_pending object to iop_recover
  xfs: transfer recovered intent item ownership in ->iop_recover
  xfs: make rextslog computation consistent with mkfs
  xfs: fix 32-bit truncation in xfs_compute_rextslog
  xfs: don't allow overly small or large realtime volumes
  xfs: remove unused fields from struct xbtree_ifakeroot
  xfs: recompute growfsrtfree transaction reservation while growing rt
    volume
  xfs: force all buffers to be written during btree bulk load
  xfs: remove conditional building of rt geometry validator functions

Dave Chinner (4):
  xfs: validate block number being freed before adding to xefi
  xfs: fix bounds check in xfs_defer_agfl_block()
  xfs: use deferred frees for btree block freeing
  xfs: initialise di_crc in xfs_log_dinode

Jiachen Zhang (1):
  xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real

Long Li (2):
  xfs: add lock protection when remove perag from radix tree
  xfs: fix perag leak when growfs fails

Zhang Tianci (1):
  xfs: update dir3 leaf block metadata after swap

 fs/xfs/libxfs/xfs_ag.c             |  45 ++++++++---
 fs/xfs/libxfs/xfs_ag.h             |   3 +
 fs/xfs/libxfs/xfs_alloc.c          |  70 ++++++++++-------
 fs/xfs/libxfs/xfs_alloc.h          |  20 +++--
 fs/xfs/libxfs/xfs_attr.c           |   6 +-
 fs/xfs/libxfs/xfs_bmap.c           | 121 ++++++++++++++--------------
 fs/xfs/libxfs/xfs_bmap.h           |   5 +-
 fs/xfs/libxfs/xfs_bmap_btree.c     |   8 +-
 fs/xfs/libxfs/xfs_btree_staging.c  |   4 +-
 fs/xfs/libxfs/xfs_btree_staging.h  |   6 --
 fs/xfs/libxfs/xfs_da_btree.c       |   7 ++
 fs/xfs/libxfs/xfs_defer.c          | 103 +++++++++++++++++-------
 fs/xfs/libxfs/xfs_defer.h          |   5 ++
 fs/xfs/libxfs/xfs_format.h         |   2 +-
 fs/xfs/libxfs/xfs_ialloc.c         |  24 ++++--
 fs/xfs/libxfs/xfs_ialloc_btree.c   |   6 +-
 fs/xfs/libxfs/xfs_log_recover.h    |  27 +++++++
 fs/xfs/libxfs/xfs_refcount.c       | 116 +++++++++++++--------------
 fs/xfs/libxfs/xfs_refcount.h       |   4 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |   9 +--
 fs/xfs/libxfs/xfs_rtbitmap.c       |   2 +
 fs/xfs/libxfs/xfs_rtbitmap.h       |  83 ++++++++++++++++++++
 fs/xfs/libxfs/xfs_sb.c             |  20 ++++-
 fs/xfs/libxfs/xfs_sb.h             |   2 +
 fs/xfs/libxfs/xfs_types.h          |  13 +++
 fs/xfs/scrub/repair.c              |   3 +-
 fs/xfs/scrub/rtbitmap.c            |   3 +-
 fs/xfs/xfs_attr_item.c             |  30 +++----
 fs/xfs/xfs_bmap_item.c             |  99 ++++++++++-------------
 fs/xfs/xfs_buf.c                   |  44 ++++++++++-
 fs/xfs/xfs_buf.h                   |   1 +
 fs/xfs/xfs_extfree_item.c          | 122 ++++++++++++++++-------------
 fs/xfs/xfs_fsmap.c                 |   2 +-
 fs/xfs/xfs_fsops.c                 |   5 +-
 fs/xfs/xfs_inode_item.c            |   3 +
 fs/xfs/xfs_log.c                   |   1 +
 fs/xfs/xfs_log_priv.h              |   1 +
 fs/xfs/xfs_log_recover.c           | 118 +++++++++++++++-------------
 fs/xfs/xfs_refcount_item.c         |  81 +++++++++----------
 fs/xfs/xfs_reflink.c               |   7 +-
 fs/xfs/xfs_rmap_item.c             |  20 ++---
 fs/xfs/xfs_rtalloc.c               |  14 +++-
 fs/xfs/xfs_rtalloc.h               |  73 -----------------
 fs/xfs/xfs_trace.h                 |  15 +---
 fs/xfs/xfs_trans.h                 |   4 +-
 45 files changed, 782 insertions(+), 575 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h

-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item " Leah Rumancik
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 0b11553ec54a6d88907e60d0595dbcef98539747 ]

Pass the incore refcount intent through the CUI logging code instead of
repeatedly boxing and unboxing parameters.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c | 96 ++++++++++++++++--------------------
 fs/xfs/libxfs/xfs_refcount.h |  4 +-
 fs/xfs/xfs_refcount_item.c   | 62 ++++++++++-------------
 fs/xfs/xfs_trace.h           | 15 ++----
 4 files changed, 74 insertions(+), 103 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 6f7ed9288fe4..bcf46aa0d08b 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1211,60 +1211,56 @@ xfs_refcount_adjust_extents(
 
 /* Adjust the reference count of a range of AG blocks. */
 STATIC int
 xfs_refcount_adjust(
 	struct xfs_btree_cur	*cur,
-	xfs_agblock_t		agbno,
-	xfs_extlen_t		aglen,
-	xfs_agblock_t		*new_agbno,
-	xfs_extlen_t		*new_aglen,
+	xfs_agblock_t		*agbno,
+	xfs_extlen_t		*aglen,
 	enum xfs_refc_adjust_op	adj)
 {
 	bool			shape_changed;
 	int			shape_changes = 0;
 	int			error;
 
-	*new_agbno = agbno;
-	*new_aglen = aglen;
 	if (adj == XFS_REFCOUNT_ADJUST_INCREASE)
-		trace_xfs_refcount_increase(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-				agbno, aglen);
+		trace_xfs_refcount_increase(cur->bc_mp,
+				cur->bc_ag.pag->pag_agno, *agbno, *aglen);
 	else
-		trace_xfs_refcount_decrease(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-				agbno, aglen);
+		trace_xfs_refcount_decrease(cur->bc_mp,
+				cur->bc_ag.pag->pag_agno, *agbno, *aglen);
 
 	/*
 	 * Ensure that no rcextents cross the boundary of the adjustment range.
 	 */
 	error = xfs_refcount_split_extent(cur, XFS_REFC_DOMAIN_SHARED,
-			agbno, &shape_changed);
+			*agbno, &shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
 		shape_changes++;
 
 	error = xfs_refcount_split_extent(cur, XFS_REFC_DOMAIN_SHARED,
-			agbno + aglen, &shape_changed);
+			*agbno + *aglen, &shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
 		shape_changes++;
 
 	/*
 	 * Try to merge with the left or right extents of the range.
 	 */
 	error = xfs_refcount_merge_extents(cur, XFS_REFC_DOMAIN_SHARED,
-			new_agbno, new_aglen, adj, &shape_changed);
+			agbno, aglen, adj, &shape_changed);
 	if (error)
 		goto out_error;
 	if (shape_changed)
 		shape_changes++;
 	if (shape_changes)
 		cur->bc_ag.refc.shape_changes++;
 
 	/* Now that we've taken care of the ends, adjust the middle extents */
-	error = xfs_refcount_adjust_extents(cur, new_agbno, new_aglen, adj);
+	error = xfs_refcount_adjust_extents(cur, agbno, aglen, adj);
 	if (error)
 		goto out_error;
 
 	return 0;
 
@@ -1296,62 +1292,56 @@ xfs_refcount_finish_one_cleanup(
  * Checks to make sure we're not going to run off the end of the AG.
  */
 static inline int
 xfs_refcount_continue_op(
 	struct xfs_btree_cur		*cur,
-	xfs_fsblock_t			startblock,
-	xfs_agblock_t			new_agbno,
-	xfs_extlen_t			new_len,
-	xfs_fsblock_t			*new_fsbno)
+	struct xfs_refcount_intent	*ri,
+	xfs_agblock_t			new_agbno)
 {
 	struct xfs_mount		*mp = cur->bc_mp;
 	struct xfs_perag		*pag = cur->bc_ag.pag;
 
-	if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, new_len)))
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno,
+					ri->ri_blockcount)))
 		return -EFSCORRUPTED;
 
-	*new_fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+	ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
 
-	ASSERT(xfs_verify_fsbext(mp, *new_fsbno, new_len));
-	ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, *new_fsbno));
+	ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount));
+	ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
 
 	return 0;
 }
 
 /*
  * Process one of the deferred refcount operations.  We pass back the
  * btree cursor to maintain our lock on the btree between calls.
  * This saves time and eliminates a buffer deadlock between the
  * superblock and the AGF because we'll always grab them in the same
  * order.
  */
 int
 xfs_refcount_finish_one(
 	struct xfs_trans		*tp,
-	enum xfs_refcount_intent_type	type,
-	xfs_fsblock_t			startblock,
-	xfs_extlen_t			blockcount,
-	xfs_fsblock_t			*new_fsb,
-	xfs_extlen_t			*new_len,
+	struct xfs_refcount_intent	*ri,
 	struct xfs_btree_cur		**pcur)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_btree_cur		*rcur;
 	struct xfs_buf			*agbp = NULL;
 	int				error = 0;
 	xfs_agblock_t			bno;
-	xfs_agblock_t			new_agbno;
 	unsigned long			nr_ops = 0;
 	int				shape_changes = 0;
 	struct xfs_perag		*pag;
 
-	pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, startblock));
-	bno = XFS_FSB_TO_AGBNO(mp, startblock);
+	pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
+	bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock);
 
-	trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, startblock),
-			type, XFS_FSB_TO_AGBNO(mp, startblock),
-			blockcount);
+	trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock),
+			ri->ri_type, XFS_FSB_TO_AGBNO(mp, ri->ri_startblock),
+			ri->ri_blockcount);
 
 	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) {
 		error = -EIO;
 		goto out_drop;
 	}
@@ -1378,46 +1368,46 @@ xfs_refcount_finish_one(
 		rcur->bc_ag.refc.nr_ops = nr_ops;
 		rcur->bc_ag.refc.shape_changes = shape_changes;
 	}
 	*pcur = rcur;
 
-	switch (type) {
+	switch (ri->ri_type) {
 	case XFS_REFCOUNT_INCREASE:
-		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
-				new_len, XFS_REFCOUNT_ADJUST_INCREASE);
+		error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount,
+				XFS_REFCOUNT_ADJUST_INCREASE);
 		if (error)
 			goto out_drop;
-		if (*new_len > 0)
-			error = xfs_refcount_continue_op(rcur, startblock,
-					new_agbno, *new_len, new_fsb);
+		if (ri->ri_blockcount > 0)
+			error = xfs_refcount_continue_op(rcur, ri, bno);
 		break;
 	case XFS_REFCOUNT_DECREASE:
-		error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno,
-				new_len, XFS_REFCOUNT_ADJUST_DECREASE);
+		error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount,
+				XFS_REFCOUNT_ADJUST_DECREASE);
 		if (error)
 			goto out_drop;
-		if (*new_len > 0)
-			error = xfs_refcount_continue_op(rcur, startblock,
-					new_agbno, *new_len, new_fsb);
+		if (ri->ri_blockcount > 0)
+			error = xfs_refcount_continue_op(rcur, ri, bno);
 		break;
 	case XFS_REFCOUNT_ALLOC_COW:
-		*new_fsb = startblock + blockcount;
-		*new_len = 0;
-		error = __xfs_refcount_cow_alloc(rcur, bno, blockcount);
+		error = __xfs_refcount_cow_alloc(rcur, bno, ri->ri_blockcount);
+		if (error)
+			goto out_drop;
+		ri->ri_blockcount = 0;
 		break;
 	case XFS_REFCOUNT_FREE_COW:
-		*new_fsb = startblock + blockcount;
-		*new_len = 0;
-		error = __xfs_refcount_cow_free(rcur, bno, blockcount);
+		error = __xfs_refcount_cow_free(rcur, bno, ri->ri_blockcount);
+		if (error)
+			goto out_drop;
+		ri->ri_blockcount = 0;
 		break;
 	default:
 		ASSERT(0);
 		error = -EFSCORRUPTED;
 	}
-	if (!error && *new_len > 0)
-		trace_xfs_refcount_finish_one_leftover(mp, pag->pag_agno, type,
-				bno, blockcount, new_agbno, *new_len);
+	if (!error && ri->ri_blockcount > 0)
+		trace_xfs_refcount_finish_one_leftover(mp, pag->pag_agno,
+				ri->ri_type, bno, ri->ri_blockcount);
 out_drop:
 	xfs_perag_put(pag);
 	return error;
 }
 
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 452f30556f5a..c633477ce3ce 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -73,13 +73,11 @@ void xfs_refcount_decrease_extent(struct xfs_trans *tp,
 		struct xfs_bmbt_irec *irec);
 
 extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp,
 		struct xfs_btree_cur *rcur, int error);
 extern int xfs_refcount_finish_one(struct xfs_trans *tp,
-		enum xfs_refcount_intent_type type, xfs_fsblock_t startblock,
-		xfs_extlen_t blockcount, xfs_fsblock_t *new_fsb,
-		xfs_extlen_t *new_len, struct xfs_btree_cur **pcur);
+		struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur);
 
 extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur,
 		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
 		xfs_extlen_t *flen, bool find_end_of_shared);
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 858e3e9eb4a8..ff4d5087ba00 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -250,21 +250,16 @@ xfs_trans_get_cud(
  */
 static int
 xfs_trans_log_finish_refcount_update(
 	struct xfs_trans		*tp,
 	struct xfs_cud_log_item		*cudp,
-	enum xfs_refcount_intent_type	type,
-	xfs_fsblock_t			startblock,
-	xfs_extlen_t			blockcount,
-	xfs_fsblock_t			*new_fsb,
-	xfs_extlen_t			*new_len,
+	struct xfs_refcount_intent	*ri,
 	struct xfs_btree_cur		**pcur)
 {
 	int				error;
 
-	error = xfs_refcount_finish_one(tp, type, startblock,
-			blockcount, new_fsb, new_len, pcur);
+	error = xfs_refcount_finish_one(tp, ri, pcur);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
@@ -376,29 +371,24 @@ xfs_refcount_update_finish_item(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*done,
 	struct list_head		*item,
 	struct xfs_btree_cur		**state)
 {
-	struct xfs_refcount_intent	*refc;
-	xfs_fsblock_t			new_fsb;
-	xfs_extlen_t			new_aglen;
+	struct xfs_refcount_intent	*ri;
 	int				error;
 
-	refc = container_of(item, struct xfs_refcount_intent, ri_list);
-	error = xfs_trans_log_finish_refcount_update(tp, CUD_ITEM(done),
-			refc->ri_type, refc->ri_startblock, refc->ri_blockcount,
-			&new_fsb, &new_aglen, state);
+	ri = container_of(item, struct xfs_refcount_intent, ri_list);
+	error = xfs_trans_log_finish_refcount_update(tp, CUD_ITEM(done), ri,
+			state);
 
 	/* Did we run out of reservation?  Requeue what we didn't finish. */
-	if (!error && new_aglen > 0) {
-		ASSERT(refc->ri_type == XFS_REFCOUNT_INCREASE ||
-		       refc->ri_type == XFS_REFCOUNT_DECREASE);
-		refc->ri_startblock = new_fsb;
-		refc->ri_blockcount = new_aglen;
+	if (!error && ri->ri_blockcount > 0) {
+		ASSERT(ri->ri_type == XFS_REFCOUNT_INCREASE ||
+		       ri->ri_type == XFS_REFCOUNT_DECREASE);
 		return -EAGAIN;
 	}
-	kmem_cache_free(xfs_refcount_intent_cache, refc);
+	kmem_cache_free(xfs_refcount_intent_cache, ri);
 	return error;
 }
 
 /* Abort all pending CUIs. */
 STATIC void
@@ -461,22 +451,17 @@ xfs_cui_validate_phys(
 STATIC int
 xfs_cui_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
-	struct xfs_bmbt_irec		irec;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
-	struct xfs_phys_extent		*refc;
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	xfs_fsblock_t			new_fsb;
-	xfs_extlen_t			new_len;
 	unsigned int			refc_type;
 	bool				requeue_only = false;
-	enum xfs_refcount_intent_type	type;
 	int				i;
 	int				error = 0;
 
 	/*
 	 * First check the validity of the extents described by the
@@ -511,45 +496,50 @@ xfs_cui_item_recover(
 		return error;
 
 	cudp = xfs_trans_get_cud(tp, cuip);
 
 	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
+		struct xfs_refcount_intent	fake = { };
+		struct xfs_phys_extent		*refc;
+
 		refc = &cuip->cui_format.cui_extents[i];
 		refc_type = refc->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
 		switch (refc_type) {
 		case XFS_REFCOUNT_INCREASE:
 		case XFS_REFCOUNT_DECREASE:
 		case XFS_REFCOUNT_ALLOC_COW:
 		case XFS_REFCOUNT_FREE_COW:
-			type = refc_type;
+			fake.ri_type = refc_type;
 			break;
 		default:
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					&cuip->cui_format,
 					sizeof(cuip->cui_format));
 			error = -EFSCORRUPTED;
 			goto abort_error;
 		}
-		if (requeue_only) {
-			new_fsb = refc->pe_startblock;
-			new_len = refc->pe_len;
-		} else
+
+		fake.ri_startblock = refc->pe_startblock;
+		fake.ri_blockcount = refc->pe_len;
+		if (!requeue_only)
 			error = xfs_trans_log_finish_refcount_update(tp, cudp,
-				type, refc->pe_startblock, refc->pe_len,
-				&new_fsb, &new_len, &rcur);
+					&fake, &rcur);
 		if (error == -EFSCORRUPTED)
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					&cuip->cui_format,
 					sizeof(cuip->cui_format));
 		if (error)
 			goto abort_error;
 
 		/* Requeue what we didn't finish. */
-		if (new_len > 0) {
-			irec.br_startblock = new_fsb;
-			irec.br_blockcount = new_len;
-			switch (type) {
+		if (fake.ri_blockcount > 0) {
+			struct xfs_bmbt_irec	irec = {
+				.br_startblock	= fake.ri_startblock,
+				.br_blockcount	= fake.ri_blockcount,
+			};
+
+			switch (fake.ri_type) {
 			case XFS_REFCOUNT_INCREASE:
 				xfs_refcount_increase_extent(tp, &irec);
 				break;
 			case XFS_REFCOUNT_DECREASE:
 				xfs_refcount_decrease_extent(tp, &irec);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 0cd62031e53f..20e2ec8b73aa 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3206,39 +3206,32 @@ DEFINE_AG_ERROR_EVENT(xfs_refcount_find_shared_error);
 DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_defer);
 DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_deferred);
 
 TRACE_EVENT(xfs_refcount_finish_one_leftover,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
-		 int type, xfs_agblock_t agbno, xfs_extlen_t len,
-		 xfs_agblock_t new_agbno, xfs_extlen_t new_len),
-	TP_ARGS(mp, agno, type, agbno, len, new_agbno, new_len),
+		 int type, xfs_agblock_t agbno, xfs_extlen_t len),
+	TP_ARGS(mp, agno, type, agbno, len),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
 		__field(int, type)
 		__field(xfs_agblock_t, agbno)
 		__field(xfs_extlen_t, len)
-		__field(xfs_agblock_t, new_agbno)
-		__field(xfs_extlen_t, new_len)
 	),
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
 		__entry->type = type;
 		__entry->agbno = agbno;
 		__entry->len = len;
-		__entry->new_agbno = new_agbno;
-		__entry->new_len = new_len;
 	),
-	TP_printk("dev %d:%d type %d agno 0x%x agbno 0x%x fsbcount 0x%x new_agbno 0x%x new_fsbcount 0x%x",
+	TP_printk("dev %d:%d type %d agno 0x%x agbno 0x%x fsbcount 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->type,
 		  __entry->agno,
 		  __entry->agbno,
-		  __entry->len,
-		  __entry->new_agbno,
-		  __entry->new_len)
+		  __entry->len)
 );
 
 /* simple inode-based error/%ip tracepoint class */
 DECLARE_EVENT_CLASS(xfs_inode_error_class,
 	TP_PROTO(struct xfs_inode *ip, int error, unsigned long caller_ip),
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code
  2025-03-13 20:25 ` [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: 0b11553ec54a6d88907e60d0595dbcef98539747

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Darrick J. Wong<djwong@kernel.org>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  0b11553ec54a6 ! 1:  17e9e595a5fc0 xfs: pass refcount intent directly through the log intent code
    @@ Metadata
      ## Commit message ##
         xfs: pass refcount intent directly through the log intent code
     
    +    [ Upstream commit 0b11553ec54a6d88907e60d0595dbcef98539747 ]
    +
         Pass the incore refcount intent through the CUI logging code instead of
         repeatedly boxing and unboxing parameters.
     
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_refcount.c ##
     @@ fs/xfs/libxfs/xfs_refcount.c: xfs_refcount_adjust_extents(
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item directly through the log intent code
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names Leah Rumancik
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 72ba455599ad13d08c29dafa22a32360e07b1961 ]

Pass the incore xfs_extent_free_item through the EFI logging code
instead of repeatedly boxing and unboxing parameters.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_extfree_item.c | 55 +++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index d5130d1fcfae..618d2f9ff535 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -343,42 +343,49 @@ xfs_trans_get_efd(
  */
 static int
 xfs_trans_free_extent(
 	struct xfs_trans		*tp,
 	struct xfs_efd_log_item		*efdp,
-	xfs_fsblock_t			start_block,
-	xfs_extlen_t			ext_len,
-	const struct xfs_owner_info	*oinfo,
-	bool				skip_discard)
+	struct xfs_extent_free_item	*free)
 {
+	struct xfs_owner_info		oinfo = { };
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_extent		*extp;
 	uint				next_extent;
-	xfs_agnumber_t			agno = XFS_FSB_TO_AGNO(mp, start_block);
+	xfs_agnumber_t			agno = XFS_FSB_TO_AGNO(mp,
+							free->xefi_startblock);
 	xfs_agblock_t			agbno = XFS_FSB_TO_AGBNO(mp,
-								start_block);
+							free->xefi_startblock);
 	int				error;
 
-	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, ext_len);
+	oinfo.oi_owner = free->xefi_owner;
+	if (free->xefi_flags & XFS_EFI_ATTR_FORK)
+		oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
+	if (free->xefi_flags & XFS_EFI_BMBT_BLOCK)
+		oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
+
+	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno,
+			free->xefi_blockcount);
 
-	error = __xfs_free_extent(tp, start_block, ext_len,
-				  oinfo, XFS_AG_RESV_NONE, skip_discard);
+	error = __xfs_free_extent(tp, free->xefi_startblock,
+			free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE,
+			free->xefi_flags & XFS_EFI_SKIP_DISCARD);
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
 	 * 1.) releases the EFI and frees the EFD
 	 * 2.) shuts down the filesystem
 	 */
 	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
 
 	next_extent = efdp->efd_next_extent;
 	ASSERT(next_extent < efdp->efd_format.efd_nextents);
 	extp = &(efdp->efd_format.efd_extents[next_extent]);
-	extp->ext_start = start_block;
-	extp->ext_len = ext_len;
+	extp->ext_start = free->xefi_startblock;
+	extp->ext_len = free->xefi_blockcount;
 	efdp->efd_next_extent++;
 
 	return error;
 }
 
@@ -461,24 +468,16 @@ xfs_extent_free_finish_item(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*done,
 	struct list_head		*item,
 	struct xfs_btree_cur		**state)
 {
-	struct xfs_owner_info		oinfo = { };
 	struct xfs_extent_free_item	*free;
 	int				error;
 
 	free = container_of(item, struct xfs_extent_free_item, xefi_list);
-	oinfo.oi_owner = free->xefi_owner;
-	if (free->xefi_flags & XFS_EFI_ATTR_FORK)
-		oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
-	if (free->xefi_flags & XFS_EFI_BMBT_BLOCK)
-		oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
-	error = xfs_trans_free_extent(tp, EFD_ITEM(done),
-			free->xefi_startblock,
-			free->xefi_blockcount,
-			&oinfo, free->xefi_flags & XFS_EFI_SKIP_DISCARD);
+
+	error = xfs_trans_free_extent(tp, EFD_ITEM(done), free);
 	kmem_cache_free(xfs_extfree_item_cache, free);
 	return error;
 }
 
 /* Abort all pending EFIs. */
@@ -597,39 +596,45 @@ xfs_efi_item_recover(
 {
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
-	struct xfs_extent		*extp;
 	int				i;
 	int				error = 0;
 
 	/*
 	 * First check the validity of the extents described by the
 	 * EFI.  If any are bad, then assume that all are bad and
 	 * just toss the EFI.
 	 */
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		if (!xfs_efi_validate_ext(mp,
 					&efip->efi_format.efi_extents[i])) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					&efip->efi_format,
 					sizeof(efip->efi_format));
 			return -EFSCORRUPTED;
 		}
 	}
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
 	if (error)
 		return error;
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
+		struct xfs_extent_free_item	fake = {
+			.xefi_owner		= XFS_RMAP_OWN_UNKNOWN,
+		};
+		struct xfs_extent		*extp;
+
 		extp = &efip->efi_format.efi_extents[i];
-		error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
-					      extp->ext_len,
-					      &XFS_RMAP_OINFO_ANY_OWNER, false);
+
+		fake.xefi_startblock = extp->ext_start;
+		fake.xefi_blockcount = extp->ext_len;
+
+		error = xfs_trans_free_extent(tp, efdp, &fake);
 		if (error == -EFSCORRUPTED)
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					extp, sizeof(*extp));
 		if (error)
 			goto abort_error;
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item directly through the log intent code
  2025-03-13 20:25 ` [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item " Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: 72ba455599ad13d08c29dafa22a32360e07b1961

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Darrick J. Wong<djwong@kernel.org>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  72ba455599ad1 ! 1:  a4d4bccfa3b1e xfs: pass xfs_extent_free_item directly through the log intent code
    @@ Metadata
      ## Commit message ##
         xfs: pass xfs_extent_free_item directly through the log intent code
     
    +    [ Upstream commit 72ba455599ad13d08c29dafa22a32360e07b1961 ]
    +
         Pass the incore xfs_extent_free_item through the EFI logging code
         instead of repeatedly boxing and unboxing parameters.
     
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/xfs_extfree_item.c ##
     @@ fs/xfs/xfs_extfree_item.c: static int
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item " Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code Leah Rumancik
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 578c714b215d474c52949e65a914dae67924f0fe ]

Change the name of all pointers to xfs_extent_item structures to "xefi"
to make the name consistent and because the current selections ("new"
and "free") mean other things in C.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c | 32 +++++++++---------
 fs/xfs/xfs_extfree_item.c | 70 +++++++++++++++++++--------------------
 2 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 74d039bdc9f7..cc5c954cda88 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2491,78 +2491,78 @@ xfs_defer_agfl_block(
 	xfs_agnumber_t			agno,
 	xfs_fsblock_t			agbno,
 	struct xfs_owner_info		*oinfo)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
-	struct xfs_extent_free_item	*new;		/* new element */
+	struct xfs_extent_free_item	*xefi;
 
 	ASSERT(xfs_extfree_item_cache != NULL);
 	ASSERT(oinfo != NULL);
 
-	new = kmem_cache_zalloc(xfs_extfree_item_cache,
+	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
-	new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
-	new->xefi_blockcount = 1;
-	new->xefi_owner = oinfo->oi_owner;
+	xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
+	xefi->xefi_blockcount = 1;
+	xefi->xefi_owner = oinfo->oi_owner;
 
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list);
+	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
 }
 
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
 void
 __xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo,
 	bool				skip_discard)
 {
-	struct xfs_extent_free_item	*new;		/* new element */
+	struct xfs_extent_free_item	*xefi;
 #ifdef DEBUG
 	struct xfs_mount		*mp = tp->t_mountp;
 	xfs_agnumber_t			agno;
 	xfs_agblock_t			agbno;
 
 	ASSERT(bno != NULLFSBLOCK);
 	ASSERT(len > 0);
 	ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
 	ASSERT(!isnullstartblock(bno));
 	agno = XFS_FSB_TO_AGNO(mp, bno);
 	agbno = XFS_FSB_TO_AGBNO(mp, bno);
 	ASSERT(agno < mp->m_sb.sb_agcount);
 	ASSERT(agbno < mp->m_sb.sb_agblocks);
 	ASSERT(len < mp->m_sb.sb_agblocks);
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_extfree_item_cache != NULL);
 
-	new = kmem_cache_zalloc(xfs_extfree_item_cache,
+	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
-	new->xefi_startblock = bno;
-	new->xefi_blockcount = (xfs_extlen_t)len;
+	xefi->xefi_startblock = bno;
+	xefi->xefi_blockcount = (xfs_extlen_t)len;
 	if (skip_discard)
-		new->xefi_flags |= XFS_EFI_SKIP_DISCARD;
+		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);
 
 		if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK)
-			new->xefi_flags |= XFS_EFI_ATTR_FORK;
+			xefi->xefi_flags |= XFS_EFI_ATTR_FORK;
 		if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK)
-			new->xefi_flags |= XFS_EFI_BMBT_BLOCK;
-		new->xefi_owner = oinfo->oi_owner;
+			xefi->xefi_flags |= XFS_EFI_BMBT_BLOCK;
+		xefi->xefi_owner = oinfo->oi_owner;
 	} else {
-		new->xefi_owner = XFS_RMAP_OWN_NULL;
+		xefi->xefi_owner = XFS_RMAP_OWN_NULL;
 	}
 	trace_xfs_bmap_free_defer(tp->t_mountp,
 			XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0,
 			XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len);
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list);
+	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
 }
 
 #ifdef DEBUG
 /*
  * Check if an AGF has a free extent record whose length is equal to
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 618d2f9ff535..011b50469301 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -343,49 +343,49 @@ xfs_trans_get_efd(
  */
 static int
 xfs_trans_free_extent(
 	struct xfs_trans		*tp,
 	struct xfs_efd_log_item		*efdp,
-	struct xfs_extent_free_item	*free)
+	struct xfs_extent_free_item	*xefi)
 {
 	struct xfs_owner_info		oinfo = { };
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_extent		*extp;
 	uint				next_extent;
 	xfs_agnumber_t			agno = XFS_FSB_TO_AGNO(mp,
-							free->xefi_startblock);
+							xefi->xefi_startblock);
 	xfs_agblock_t			agbno = XFS_FSB_TO_AGBNO(mp,
-							free->xefi_startblock);
+							xefi->xefi_startblock);
 	int				error;
 
-	oinfo.oi_owner = free->xefi_owner;
-	if (free->xefi_flags & XFS_EFI_ATTR_FORK)
+	oinfo.oi_owner = xefi->xefi_owner;
+	if (xefi->xefi_flags & XFS_EFI_ATTR_FORK)
 		oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
-	if (free->xefi_flags & XFS_EFI_BMBT_BLOCK)
+	if (xefi->xefi_flags & XFS_EFI_BMBT_BLOCK)
 		oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
 
 	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno,
-			free->xefi_blockcount);
+			xefi->xefi_blockcount);
 
-	error = __xfs_free_extent(tp, free->xefi_startblock,
-			free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE,
-			free->xefi_flags & XFS_EFI_SKIP_DISCARD);
+	error = __xfs_free_extent(tp, xefi->xefi_startblock,
+			xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE,
+			xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
 	 * 1.) releases the EFI and frees the EFD
 	 * 2.) shuts down the filesystem
 	 */
 	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
 
 	next_extent = efdp->efd_next_extent;
 	ASSERT(next_extent < efdp->efd_format.efd_nextents);
 	extp = &(efdp->efd_format.efd_extents[next_extent]);
-	extp->ext_start = free->xefi_startblock;
-	extp->ext_len = free->xefi_blockcount;
+	extp->ext_start = xefi->xefi_startblock;
+	extp->ext_len = xefi->xefi_blockcount;
 	efdp->efd_next_extent++;
 
 	return error;
 }
 
@@ -409,162 +409,162 @@ xfs_extent_free_diff_items(
 /* Log a free extent to the intent item. */
 STATIC void
 xfs_extent_free_log_item(
 	struct xfs_trans		*tp,
 	struct xfs_efi_log_item		*efip,
-	struct xfs_extent_free_item	*free)
+	struct xfs_extent_free_item	*xefi)
 {
 	uint				next_extent;
 	struct xfs_extent		*extp;
 
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	set_bit(XFS_LI_DIRTY, &efip->efi_item.li_flags);
 
 	/*
 	 * atomic_inc_return gives us the value after the increment;
 	 * we want to use it as an array index so we need to subtract 1 from
 	 * it.
 	 */
 	next_extent = atomic_inc_return(&efip->efi_next_extent) - 1;
 	ASSERT(next_extent < efip->efi_format.efi_nextents);
 	extp = &efip->efi_format.efi_extents[next_extent];
-	extp->ext_start = free->xefi_startblock;
-	extp->ext_len = free->xefi_blockcount;
+	extp->ext_start = xefi->xefi_startblock;
+	extp->ext_len = xefi->xefi_blockcount;
 }
 
 static struct xfs_log_item *
 xfs_extent_free_create_intent(
 	struct xfs_trans		*tp,
 	struct list_head		*items,
 	unsigned int			count,
 	bool				sort)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_efi_log_item		*efip = xfs_efi_init(mp, count);
-	struct xfs_extent_free_item	*free;
+	struct xfs_extent_free_item	*xefi;
 
 	ASSERT(count > 0);
 
 	xfs_trans_add_item(tp, &efip->efi_item);
 	if (sort)
 		list_sort(mp, items, xfs_extent_free_diff_items);
-	list_for_each_entry(free, items, xefi_list)
-		xfs_extent_free_log_item(tp, efip, free);
+	list_for_each_entry(xefi, items, xefi_list)
+		xfs_extent_free_log_item(tp, efip, xefi);
 	return &efip->efi_item;
 }
 
 /* Get an EFD so we can process all the free extents. */
 static struct xfs_log_item *
 xfs_extent_free_create_done(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*intent,
 	unsigned int			count)
 {
 	return &xfs_trans_get_efd(tp, EFI_ITEM(intent), count)->efd_item;
 }
 
 /* Process a free extent. */
 STATIC int
 xfs_extent_free_finish_item(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*done,
 	struct list_head		*item,
 	struct xfs_btree_cur		**state)
 {
-	struct xfs_extent_free_item	*free;
+	struct xfs_extent_free_item	*xefi;
 	int				error;
 
-	free = container_of(item, struct xfs_extent_free_item, xefi_list);
+	xefi = container_of(item, struct xfs_extent_free_item, xefi_list);
 
-	error = xfs_trans_free_extent(tp, EFD_ITEM(done), free);
-	kmem_cache_free(xfs_extfree_item_cache, free);
+	error = xfs_trans_free_extent(tp, EFD_ITEM(done), xefi);
+	kmem_cache_free(xfs_extfree_item_cache, xefi);
 	return error;
 }
 
 /* Abort all pending EFIs. */
 STATIC void
 xfs_extent_free_abort_intent(
 	struct xfs_log_item		*intent)
 {
 	xfs_efi_release(EFI_ITEM(intent));
 }
 
 /* Cancel a free extent. */
 STATIC void
 xfs_extent_free_cancel_item(
 	struct list_head		*item)
 {
-	struct xfs_extent_free_item	*free;
+	struct xfs_extent_free_item	*xefi;
 
-	free = container_of(item, struct xfs_extent_free_item, xefi_list);
-	kmem_cache_free(xfs_extfree_item_cache, free);
+	xefi = container_of(item, struct xfs_extent_free_item, xefi_list);
+	kmem_cache_free(xfs_extfree_item_cache, xefi);
 }
 
 const struct xfs_defer_op_type xfs_extent_free_defer_type = {
 	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
 	.create_intent	= xfs_extent_free_create_intent,
 	.abort_intent	= xfs_extent_free_abort_intent,
 	.create_done	= xfs_extent_free_create_done,
 	.finish_item	= xfs_extent_free_finish_item,
 	.cancel_item	= xfs_extent_free_cancel_item,
 };
 
 /*
  * AGFL blocks are accounted differently in the reserve pools and are not
  * inserted into the busy extent list.
  */
 STATIC int
 xfs_agfl_free_finish_item(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*done,
 	struct list_head		*item,
 	struct xfs_btree_cur		**state)
 {
 	struct xfs_owner_info		oinfo = { };
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_efd_log_item		*efdp = EFD_ITEM(done);
-	struct xfs_extent_free_item	*free;
+	struct xfs_extent_free_item	*xefi;
 	struct xfs_extent		*extp;
 	struct xfs_buf			*agbp;
 	int				error;
 	xfs_agnumber_t			agno;
 	xfs_agblock_t			agbno;
 	uint				next_extent;
 	struct xfs_perag		*pag;
 
-	free = container_of(item, struct xfs_extent_free_item, xefi_list);
-	ASSERT(free->xefi_blockcount == 1);
-	agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock);
-	agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock);
-	oinfo.oi_owner = free->xefi_owner;
+	xefi = container_of(item, struct xfs_extent_free_item, xefi_list);
+	ASSERT(xefi->xefi_blockcount == 1);
+	agno = XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock);
+	agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock);
+	oinfo.oi_owner = xefi->xefi_owner;
 
-	trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, free->xefi_blockcount);
+	trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, xefi->xefi_blockcount);
 
 	pag = xfs_perag_get(mp, agno);
 	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
 	if (!error)
 		error = xfs_free_agfl_block(tp, agno, agbno, agbp, &oinfo);
 	xfs_perag_put(pag);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
 	 * 1.) releases the EFI and frees the EFD
 	 * 2.) shuts down the filesystem
 	 */
 	tp->t_flags |= XFS_TRANS_DIRTY;
 	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
 
 	next_extent = efdp->efd_next_extent;
 	ASSERT(next_extent < efdp->efd_format.efd_nextents);
 	extp = &(efdp->efd_format.efd_extents[next_extent]);
-	extp->ext_start = free->xefi_startblock;
-	extp->ext_len = free->xefi_blockcount;
+	extp->ext_start = xefi->xefi_startblock;
+	extp->ext_len = xefi->xefi_blockcount;
 	efdp->efd_next_extent++;
 
-	kmem_cache_free(xfs_extfree_item_cache, free);
+	kmem_cache_free(xfs_extfree_item_cache, xefi);
 	return error;
 }
 
 /* sub-type with special handling for AGFL deferred frees */
 const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names
  2025-03-13 20:25 ` [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: 578c714b215d474c52949e65a914dae67924f0fe

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Darrick J. Wong<djwong@kernel.org>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  578c714b215d4 ! 1:  48382d577161d xfs: fix confusing xfs_extent_item variable names
    @@ Metadata
      ## Commit message ##
         xfs: fix confusing xfs_extent_item variable names
     
    +    [ Upstream commit 578c714b215d474c52949e65a914dae67924f0fe ]
    +
         Change the name of all pointers to xfs_extent_item structures to "xefi"
         to make the name consistent and because the current selections ("new"
         and "free") mean other things in C.
     
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_alloc.c ##
     @@ fs/xfs/libxfs/xfs_alloc.c: xfs_defer_agfl_block(
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (2 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent Leah Rumancik
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit ddccb81b26ec021ae1f3366aa996cc4c68dd75ce ]

Instead of repeatedly boxing and unboxing the incore extent mapping
structure as it passes through the BUI code, pass the pointer directly
through.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++--------
 fs/xfs/libxfs/xfs_bmap.h |  5 +--
 fs/xfs/xfs_bmap_item.c   | 81 +++++++++++++++-------------------------
 3 files changed, 46 insertions(+), 72 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 27d3121e6da9..d523ac51c662 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6117,43 +6117,41 @@ xfs_bmap_unmap_extent(
  * btree cursor to maintain our lock on the bmapbt between calls.
  */
 int
 xfs_bmap_finish_one(
 	struct xfs_trans		*tp,
-	struct xfs_inode		*ip,
-	enum xfs_bmap_intent_type	type,
-	int				whichfork,
-	xfs_fileoff_t			startoff,
-	xfs_fsblock_t			startblock,
-	xfs_filblks_t			*blockcount,
-	xfs_exntst_t			state)
+	struct xfs_bmap_intent		*bi)
 {
+	struct xfs_bmbt_irec		*bmap = &bi->bi_bmap;
 	int				error = 0;
 
 	ASSERT(tp->t_firstblock == NULLFSBLOCK);
 
 	trace_xfs_bmap_deferred(tp->t_mountp,
-			XFS_FSB_TO_AGNO(tp->t_mountp, startblock), type,
-			XFS_FSB_TO_AGBNO(tp->t_mountp, startblock),
-			ip->i_ino, whichfork, startoff, *blockcount, state);
+			XFS_FSB_TO_AGNO(tp->t_mountp, bmap->br_startblock),
+			bi->bi_type,
+			XFS_FSB_TO_AGBNO(tp->t_mountp, bmap->br_startblock),
+			bi->bi_owner->i_ino, bi->bi_whichfork,
+			bmap->br_startoff, bmap->br_blockcount,
+			bmap->br_state);
 
-	if (WARN_ON_ONCE(whichfork != XFS_DATA_FORK))
+	if (WARN_ON_ONCE(bi->bi_whichfork != XFS_DATA_FORK))
 		return -EFSCORRUPTED;
 
 	if (XFS_TEST_ERROR(false, tp->t_mountp,
 			XFS_ERRTAG_BMAP_FINISH_ONE))
 		return -EIO;
 
-	switch (type) {
+	switch (bi->bi_type) {
 	case XFS_BMAP_MAP:
-		error = xfs_bmapi_remap(tp, ip, startoff, *blockcount,
-				startblock, 0);
-		*blockcount = 0;
+		error = xfs_bmapi_remap(tp, bi->bi_owner, bmap->br_startoff,
+				bmap->br_blockcount, bmap->br_startblock, 0);
+		bmap->br_blockcount = 0;
 		break;
 	case XFS_BMAP_UNMAP:
-		error = __xfs_bunmapi(tp, ip, startoff, blockcount,
-				XFS_BMAPI_REMAP, 1);
+		error = __xfs_bunmapi(tp, bi->bi_owner, bmap->br_startoff,
+				&bmap->br_blockcount, XFS_BMAPI_REMAP, 1);
 		break;
 	default:
 		ASSERT(0);
 		error = -EFSCORRUPTED;
 	}
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 08c16e4edc0f..524912f276f8 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -234,14 +234,11 @@ struct xfs_bmap_intent {
 	int					bi_whichfork;
 	struct xfs_inode			*bi_owner;
 	struct xfs_bmbt_irec			bi_bmap;
 };
 
-int	xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_inode *ip,
-		enum xfs_bmap_intent_type type, int whichfork,
-		xfs_fileoff_t startoff, xfs_fsblock_t startblock,
-		xfs_filblks_t *blockcount, xfs_exntst_t state);
+int	xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_bmap_intent *bi);
 void	xfs_bmap_map_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap);
 void	xfs_bmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap);
 
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 41323da523d1..13aa5359c02f 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -244,22 +244,15 @@ xfs_trans_get_bud(
  */
 static int
 xfs_trans_log_finish_bmap_update(
 	struct xfs_trans		*tp,
 	struct xfs_bud_log_item		*budp,
-	enum xfs_bmap_intent_type	type,
-	struct xfs_inode		*ip,
-	int				whichfork,
-	xfs_fileoff_t			startoff,
-	xfs_fsblock_t			startblock,
-	xfs_filblks_t			*blockcount,
-	xfs_exntst_t			state)
+	struct xfs_bmap_intent		*bi)
 {
 	int				error;
 
-	error = xfs_bmap_finish_one(tp, ip, type, whichfork, startoff,
-			startblock, blockcount, state);
+	error = xfs_bmap_finish_one(tp, bi);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
@@ -376,29 +369,21 @@ xfs_bmap_update_finish_item(
 	struct xfs_trans		*tp,
 	struct xfs_log_item		*done,
 	struct list_head		*item,
 	struct xfs_btree_cur		**state)
 {
-	struct xfs_bmap_intent		*bmap;
-	xfs_filblks_t			count;
+	struct xfs_bmap_intent		*bi;
 	int				error;
 
-	bmap = container_of(item, struct xfs_bmap_intent, bi_list);
-	count = bmap->bi_bmap.br_blockcount;
-	error = xfs_trans_log_finish_bmap_update(tp, BUD_ITEM(done),
-			bmap->bi_type,
-			bmap->bi_owner, bmap->bi_whichfork,
-			bmap->bi_bmap.br_startoff,
-			bmap->bi_bmap.br_startblock,
-			&count,
-			bmap->bi_bmap.br_state);
-	if (!error && count > 0) {
-		ASSERT(bmap->bi_type == XFS_BMAP_UNMAP);
-		bmap->bi_bmap.br_blockcount = count;
+	bi = container_of(item, struct xfs_bmap_intent, bi_list);
+
+	error = xfs_trans_log_finish_bmap_update(tp, BUD_ITEM(done), bi);
+	if (!error && bi->bi_bmap.br_blockcount > 0) {
+		ASSERT(bi->bi_type == XFS_BMAP_UNMAP);
 		return -EAGAIN;
 	}
-	kmem_cache_free(xfs_bmap_intent_cache, bmap);
+	kmem_cache_free(xfs_bmap_intent_cache, bi);
 	return error;
 }
 
 /* Abort all pending BUIs. */
 STATIC void
@@ -469,79 +454,73 @@ xfs_bui_validate(
 STATIC int
 xfs_bui_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
-	struct xfs_bmbt_irec		irec;
+	struct xfs_bmap_intent		fake = { };
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	struct xfs_map_extent		*bmap;
+	struct xfs_map_extent		*map;
 	struct xfs_bud_log_item		*budp;
-	xfs_filblks_t			count;
-	xfs_exntst_t			state;
-	unsigned int			bui_type;
-	int				whichfork;
 	int				iext_delta;
 	int				error = 0;
 
 	if (!xfs_bui_validate(mp, buip)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				&buip->bui_format, sizeof(buip->bui_format));
 		return -EFSCORRUPTED;
 	}
 
-	bmap = &buip->bui_format.bui_extents[0];
-	state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
-			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
-	whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
+	map = &buip->bui_format.bui_extents[0];
+	fake.bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
 			XFS_ATTR_FORK : XFS_DATA_FORK;
-	bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
+	fake.bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
 
-	error = xlog_recover_iget(mp, bmap->me_owner, &ip);
+	error = xlog_recover_iget(mp, map->me_owner, &ip);
 	if (error)
 		return error;
 
 	/* Allocate transaction and do the work. */
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		goto err_rele;
 
 	budp = xfs_trans_get_bud(tp, buip);
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	if (bui_type == XFS_BMAP_MAP)
+	if (fake.bi_type == XFS_BMAP_MAP)
 		iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT;
 	else
 		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
 
-	error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta);
+	error = xfs_iext_count_may_overflow(ip, fake.bi_whichfork, iext_delta);
 	if (error == -EFBIG)
 		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
 	if (error)
 		goto err_cancel;
 
-	count = bmap->me_len;
-	error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip,
-			whichfork, bmap->me_startoff, bmap->me_startblock,
-			&count, state);
+	fake.bi_owner = ip;
+	fake.bi_bmap.br_startblock = map->me_startblock;
+	fake.bi_bmap.br_startoff = map->me_startoff;
+	fake.bi_bmap.br_blockcount = map->me_len;
+	fake.bi_bmap.br_state = (map->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
+			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+
+	error = xfs_trans_log_finish_bmap_update(tp, budp, &fake);
 	if (error == -EFSCORRUPTED)
-		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bmap,
-				sizeof(*bmap));
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, map,
+				sizeof(*map));
 	if (error)
 		goto err_cancel;
 
-	if (count > 0) {
-		ASSERT(bui_type == XFS_BMAP_UNMAP);
-		irec.br_startblock = bmap->me_startblock;
-		irec.br_blockcount = count;
-		irec.br_startoff = bmap->me_startoff;
-		irec.br_state = state;
-		xfs_bmap_unmap_extent(tp, ip, &irec);
+	if (fake.bi_bmap.br_blockcount > 0) {
+		ASSERT(fake.bi_type == XFS_BMAP_UNMAP);
+		xfs_bmap_unmap_extent(tp, ip, &fake.bi_bmap);
 	}
 
 	/*
 	 * Commit transaction, which frees the transaction and saves the inode
 	 * for later replay activities.
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code
  2025-03-13 20:25 ` [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: ddccb81b26ec021ae1f3366aa996cc4c68dd75ce

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Darrick J. Wong<djwong@kernel.org>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  ddccb81b26ec0 ! 1:  f2c38698292e3 xfs: pass the xfs_bmbt_irec directly through the log intent code
    @@ Metadata
      ## Commit message ##
         xfs: pass the xfs_bmbt_irec directly through the log intent code
     
    +    [ Upstream commit ddccb81b26ec021ae1f3366aa996cc4c68dd75ce ]
    +
         Instead of repeatedly boxing and unboxing the incore extent mapping
         structure as it passes through the BUI code, pass the pointer directly
         through.
     
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_bmap.c ##
     @@ fs/xfs/libxfs/xfs_bmap.c: xfs_bmap_unmap_extent(
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (3 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi Leah Rumancik
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit b2ccab3199aa7cea9154d80ea2585312c5f6eba0 ]

Pass a reference to the per-AG structure to xfs_free_extent.  Most
callers already have one, so we can eliminate unnecessary lookups.  The
one exception to this is the EFI code, which the next patch will fix.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_ag.c             |  6 ++----
 fs/xfs/libxfs/xfs_alloc.c          | 15 +++++----------
 fs/xfs/libxfs/xfs_alloc.h          |  8 +++++---
 fs/xfs/libxfs/xfs_ialloc_btree.c   |  7 +++++--
 fs/xfs/libxfs/xfs_refcount_btree.c |  5 +++--
 fs/xfs/scrub/repair.c              |  3 ++-
 fs/xfs/xfs_extfree_item.c          |  8 ++++++--
 7 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index bf47efe08a58..f29767e6eb7f 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -979,14 +979,12 @@ xfs_ag_extend_space(
 	error = xfs_rmap_free(tp, bp, pag, be32_to_cpu(agf->agf_length) - len,
 				len, &XFS_RMAP_OINFO_SKIP_UPDATE);
 	if (error)
 		return error;
 
-	error = xfs_free_extent(tp, XFS_AGB_TO_FSB(pag->pag_mount, pag->pag_agno,
-					be32_to_cpu(agf->agf_length) - len),
-				len, &XFS_RMAP_OINFO_SKIP_UPDATE,
-				XFS_AG_RESV_NONE);
+	error = xfs_free_extent(tp, pag, be32_to_cpu(agf->agf_length) - len,
+			len, &XFS_RMAP_OINFO_SKIP_UPDATE, XFS_AG_RESV_NONE);
 	if (error)
 		return error;
 
 	/* Update perag geometry */
 	pag->block_count = be32_to_cpu(agf->agf_length);
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index cc5c954cda88..da6d158d666f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3445,63 +3445,58 @@ xfs_free_extent_fix_freelist(
  * after fixing up the freelist.
  */
 int
 __xfs_free_extent(
 	struct xfs_trans		*tp,
-	xfs_fsblock_t			bno,
+	struct xfs_perag		*pag,
+	xfs_agblock_t			agbno,
 	xfs_extlen_t			len,
 	const struct xfs_owner_info	*oinfo,
 	enum xfs_ag_resv_type		type,
 	bool				skip_discard)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_buf			*agbp;
-	xfs_agnumber_t			agno = XFS_FSB_TO_AGNO(mp, bno);
-	xfs_agblock_t			agbno = XFS_FSB_TO_AGBNO(mp, bno);
 	struct xfs_agf			*agf;
 	int				error;
 	unsigned int			busy_flags = 0;
-	struct xfs_perag		*pag;
 
 	ASSERT(len != 0);
 	ASSERT(type != XFS_AG_RESV_AGFL);
 
 	if (XFS_TEST_ERROR(false, mp,
 			XFS_ERRTAG_FREE_EXTENT))
 		return -EIO;
 
-	pag = xfs_perag_get(mp, agno);
 	error = xfs_free_extent_fix_freelist(tp, pag, &agbp);
 	if (error)
-		goto err;
+		return error;
 	agf = agbp->b_addr;
 
 	if (XFS_IS_CORRUPT(mp, agbno >= mp->m_sb.sb_agblocks)) {
 		error = -EFSCORRUPTED;
 		goto err_release;
 	}
 
 	/* validate the extent size is legal now we have the agf locked */
 	if (XFS_IS_CORRUPT(mp, agbno + len > be32_to_cpu(agf->agf_length))) {
 		error = -EFSCORRUPTED;
 		goto err_release;
 	}
 
-	error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, type);
+	error = xfs_free_ag_extent(tp, agbp, pag->pag_agno, agbno, len, oinfo,
+			type);
 	if (error)
 		goto err_release;
 
 	if (skip_discard)
 		busy_flags |= XFS_EXTENT_BUSY_SKIP_DISCARD;
 	xfs_extent_busy_insert(tp, pag, agbno, len, busy_flags);
-	xfs_perag_put(pag);
 	return 0;
 
 err_release:
 	xfs_trans_brelse(tp, agbp);
-err:
-	xfs_perag_put(pag);
 	return error;
 }
 
 struct xfs_alloc_query_range_info {
 	xfs_alloc_query_range_fn	fn;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 2c3f762dfb58..5074aed6dfad 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -128,25 +128,27 @@ xfs_alloc_vextent(
  * Free an extent.
  */
 int				/* error */
 __xfs_free_extent(
 	struct xfs_trans	*tp,	/* transaction pointer */
-	xfs_fsblock_t		bno,	/* starting block number of extent */
+	struct xfs_perag	*pag,
+	xfs_agblock_t		agbno,
 	xfs_extlen_t		len,	/* length of extent */
 	const struct xfs_owner_info	*oinfo,	/* extent owner */
 	enum xfs_ag_resv_type	type,	/* block reservation type */
 	bool			skip_discard);
 
 static inline int
 xfs_free_extent(
 	struct xfs_trans	*tp,
-	xfs_fsblock_t		bno,
+	struct xfs_perag	*pag,
+	xfs_agblock_t		agbno,
 	xfs_extlen_t		len,
 	const struct xfs_owner_info	*oinfo,
 	enum xfs_ag_resv_type	type)
 {
-	return __xfs_free_extent(tp, bno, len, oinfo, type, false);
+	return __xfs_free_extent(tp, pag, agbno, len, oinfo, type, false);
 }
 
 int				/* error */
 xfs_alloc_lookup_le(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 8c83e265770c..2dbe553d87fb 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -154,13 +154,16 @@ STATIC int
 __xfs_inobt_free_block(
 	struct xfs_btree_cur	*cur,
 	struct xfs_buf		*bp,
 	enum xfs_ag_resv_type	resv)
 {
+	xfs_fsblock_t		fsbno;
+
 	xfs_inobt_mod_blockcount(cur, -1);
-	return xfs_free_extent(cur->bc_tp,
-			XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)), 1,
+	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
+	return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
 			&XFS_RMAP_OINFO_INOBT, resv);
 }
 
 STATIC int
 xfs_inobt_free_block(
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index e1f789866683..3d8e62da2ccc 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -110,12 +110,13 @@ xfs_refcountbt_free_block(
 
 	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
 	be32_add_cpu(&agf->agf_refcount_blocks, -1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
-	error = xfs_free_extent(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_REFC,
-			XFS_AG_RESV_METADATA);
+	error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
+			&XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA);
 	if (error)
 		return error;
 
 	return error;
 }
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index c18bd039fce9..e0ed0ebfdaea 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -580,11 +580,12 @@ xrep_reap_block(
 		error = xfs_rmap_free(sc->tp, agf_bp, sc->sa.pag, agbno,
 					1, oinfo);
 	else if (resv == XFS_AG_RESV_AGFL)
 		error = xrep_put_freelist(sc, agbno);
 	else
-		error = xfs_free_extent(sc->tp, fsbno, 1, oinfo, resv);
+		error = xfs_free_extent(sc->tp, sc->sa.pag, agbno, 1, oinfo,
+				resv);
 	if (agf_bp != sc->sa.agf_bp)
 		xfs_trans_brelse(sc->tp, agf_bp);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 011b50469301..c1aae07467c9 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -348,29 +348,33 @@ xfs_trans_free_extent(
 	struct xfs_extent_free_item	*xefi)
 {
 	struct xfs_owner_info		oinfo = { };
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_extent		*extp;
+	struct xfs_perag		*pag;
 	uint				next_extent;
 	xfs_agnumber_t			agno = XFS_FSB_TO_AGNO(mp,
 							xefi->xefi_startblock);
 	xfs_agblock_t			agbno = XFS_FSB_TO_AGBNO(mp,
 							xefi->xefi_startblock);
 	int				error;
 
 	oinfo.oi_owner = xefi->xefi_owner;
 	if (xefi->xefi_flags & XFS_EFI_ATTR_FORK)
 		oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK;
 	if (xefi->xefi_flags & XFS_EFI_BMBT_BLOCK)
 		oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
 
 	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno,
 			xefi->xefi_blockcount);
 
-	error = __xfs_free_extent(tp, xefi->xefi_startblock,
-			xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE,
+	pag = xfs_perag_get(mp, agno);
+	error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount,
+			&oinfo, XFS_AG_RESV_NONE,
 			xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
+	xfs_perag_put(pag);
+
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
 	 *
 	 * 1.) releases the EFI and frees the EFD
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent
  2025-03-13 20:25 ` [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: b2ccab3199aa7cea9154d80ea2585312c5f6eba0

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Darrick J. Wong<djwong@kernel.org>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  b2ccab3199aa7 ! 1:  ab2da2d8a4063 xfs: pass per-ag references to xfs_free_extent
    @@ Metadata
      ## Commit message ##
         xfs: pass per-ag references to xfs_free_extent
     
    +    [ Upstream commit b2ccab3199aa7cea9154d80ea2585312c5f6eba0 ]
    +
         Pass a reference to the per-AG structure to xfs_free_extent.  Most
         callers already have one, so we can eliminate unnecessary lookups.  The
         one exception to this is the EFI code, which the next patch will fix.
     
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
         Reviewed-by: Dave Chinner <dchinner@redhat.com>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_ag.c ##
     @@ fs/xfs/libxfs/xfs_ag.c: xfs_ag_extend_space(
    @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent(
      
     
      ## fs/xfs/libxfs/xfs_alloc.h ##
    -@@ fs/xfs/libxfs/xfs_alloc.h: int xfs_alloc_vextent_first_ag(struct xfs_alloc_arg *args,
    +@@ fs/xfs/libxfs/xfs_alloc.h: xfs_alloc_vextent(
      int				/* error */
      __xfs_free_extent(
      	struct xfs_trans	*tp,	/* transaction pointer */
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (4 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block() Leah Rumancik
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Dave Chinner, Christoph Hellwig, Darrick J. Wong,
	Dave Chinner, Leah Rumancik

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 7dfee17b13e5024c5c0ab1911859ded4182de3e5 ]

Bad things happen in defered extent freeing operations if it is
passed a bad block number in the xefi. This can come from a bogus
agno/agbno pair from deferred agfl freeing, or just a bad fsbno
being passed to __xfs_free_extent_later(). Either way, it's very
difficult to diagnose where a null perag oops in EFI creation
is coming from when the operation that queued the xefi has already
been completed and there's no longer any trace of it around....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_ag.c         |  5 ++++-
 fs/xfs/libxfs/xfs_alloc.c      | 16 +++++++++++++---
 fs/xfs/libxfs/xfs_alloc.h      |  6 +++---
 fs/xfs/libxfs/xfs_bmap.c       | 10 ++++++++--
 fs/xfs/libxfs/xfs_bmap_btree.c |  7 +++++--
 fs/xfs/libxfs/xfs_ialloc.c     | 24 ++++++++++++++++--------
 fs/xfs/libxfs/xfs_refcount.c   | 13 ++++++++++---
 fs/xfs/xfs_reflink.c           |  4 +++-
 8 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index f29767e6eb7f..ee0d56854b83 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -904,11 +904,14 @@ xfs_ag_shrink_space(
 		be32_add_cpu(&agi->agi_length, delta);
 		be32_add_cpu(&agf->agf_length, delta);
 		if (err2 != -ENOSPC)
 			goto resv_err;
 
-		__xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, true);
+		err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
+				true);
+		if (err2)
+			goto resv_err;
 
 		/*
 		 * Roll the transaction before trying to re-init the per-ag
 		 * reservation. The new transaction is clean so it will cancel
 		 * without any side effects.
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index da6d158d666f..ec03040237db 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2483,39 +2483,43 @@ xfs_agfl_reset(
  * because for one they are always freed one at a time. Further, an immediate
  * AGFL block free can cause a btree join and require another block free before
  * the real allocation can proceed. Deferring the free disconnects freeing up
  * the AGFL slot from freeing the block.
  */
-STATIC void
+static int
 xfs_defer_agfl_block(
 	struct xfs_trans		*tp,
 	xfs_agnumber_t			agno,
 	xfs_fsblock_t			agbno,
 	struct xfs_owner_info		*oinfo)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_extent_free_item	*xefi;
 
 	ASSERT(xfs_extfree_item_cache != NULL);
 	ASSERT(oinfo != NULL);
 
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
 	xefi->xefi_blockcount = 1;
 	xefi->xefi_owner = oinfo->oi_owner;
 
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
+		return -EFSCORRUPTED;
+
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
+	return 0;
 }
 
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
-void
+int
 __xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo,
@@ -2538,31 +2542,35 @@ __xfs_free_extent_later(
 	ASSERT(len < mp->m_sb.sb_agblocks);
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_extfree_item_cache != NULL);
 
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
+		return -EFSCORRUPTED;
+
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = bno;
 	xefi->xefi_blockcount = (xfs_extlen_t)len;
 	if (skip_discard)
 		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);
 
 		if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK)
 			xefi->xefi_flags |= XFS_EFI_ATTR_FORK;
 		if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK)
 			xefi->xefi_flags |= XFS_EFI_BMBT_BLOCK;
 		xefi->xefi_owner = oinfo->oi_owner;
 	} else {
 		xefi->xefi_owner = XFS_RMAP_OWN_NULL;
 	}
 	trace_xfs_bmap_free_defer(tp->t_mountp,
 			XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0,
 			XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len);
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
+	return 0;
 }
 
 #ifdef DEBUG
 /*
  * Check if an AGF has a free extent record whose length is equal to
@@ -2718,11 +2726,13 @@ xfs_alloc_fix_freelist(
 		error = xfs_alloc_get_freelist(pag, tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
 
 		/* defer agfl frees */
-		xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+		error = xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+		if (error)
+			goto out_agbp_relse;
 	}
 
 	targs.tp = tp;
 	targs.mp = mp;
 	targs.agbp = agbp;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 5074aed6dfad..cfc9dcdc1753 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -211,38 +211,38 @@ xfs_buf_to_agfl_bno(
 	if (xfs_has_crc(bp->b_mount))
 		return bp->b_addr + sizeof(struct xfs_agfl);
 	return bp->b_addr;
 }
 
-void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
+int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
 		bool skip_discard);
 
 /*
  * List of extents to be free "later".
  * The list is kept sorted on xbf_startblock.
  */
 struct xfs_extent_free_item {
 	struct list_head	xefi_list;
 	uint64_t		xefi_owner;
 	xfs_fsblock_t		xefi_startblock;/* starting fs block number */
 	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
 	unsigned int		xefi_flags;
 };
 
 #define XFS_EFI_SKIP_DISCARD	(1U << 0) /* don't issue discard */
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
 
-static inline void
+static inline int
 xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo)
 {
-	__xfs_free_extent_later(tp, bno, len, oinfo, false);
+	return __xfs_free_extent_later(tp, bno, len, oinfo, false);
 }
 
 
 extern struct kmem_cache	*xfs_extfree_item_cache;
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d523ac51c662..29781a124323 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -570,12 +570,16 @@ xfs_bmap_btree_to_extents(
 	if (error)
 		return error;
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
+
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
-	xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+	error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+	if (error)
+		return error;
+
 	ip->i_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
 	if (cur->bc_levels[0].bp == cbp)
 		cur->bc_levels[0].bp = NULL;
@@ -5200,14 +5204,16 @@ xfs_bmap_del_extent_real(
 	 */
 	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else {
-			__xfs_free_extent_later(tp, del->br_startblock,
+			error = __xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					(bflags & XFS_BMAPI_NODISCARD) ||
 					del->br_state == XFS_EXT_UNWRITTEN);
+			if (error)
+				goto done;
 		}
 	}
 
 	/*
 	 * Adjust inode # blocks in the file.
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 18de4fbfef4e..bac2a6496a8e 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -283,15 +283,18 @@ xfs_bmbt_free_block(
 	struct xfs_mount	*mp = cur->bc_mp;
 	struct xfs_inode	*ip = cur->bc_ino.ip;
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
 	struct xfs_owner_info	oinfo;
+	int			error;
 
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
-	xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
-	ip->i_nblocks--;
+	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
+	if (error)
+		return error;
 
+	ip->i_nblocks--;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	return 0;
 }
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 120dbec16f5c..448ea76d50f8 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1825,81 +1825,87 @@ xfs_dialloc(
 /*
  * Free the blocks of an inode chunk. We must consider that the inode chunk
  * might be sparse and only free the regions that are allocated as part of the
  * chunk.
  */
-STATIC void
+static int
 xfs_difree_inode_chunk(
 	struct xfs_trans		*tp,
 	xfs_agnumber_t			agno,
 	struct xfs_inobt_rec_incore	*rec)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	xfs_agblock_t			sagbno = XFS_AGINO_TO_AGBNO(mp,
 							rec->ir_startino);
 	int				startidx, endidx;
 	int				nextbit;
 	xfs_agblock_t			agbno;
 	int				contigblk;
 	DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno),
-				  M_IGEO(mp)->ialloc_blks,
-				  &XFS_RMAP_OINFO_INODES);
-		return;
+		return xfs_free_extent_later(tp,
+				XFS_AGB_TO_FSB(mp, agno, sagbno),
+				M_IGEO(mp)->ialloc_blks,
+				&XFS_RMAP_OINFO_INODES);
 	}
 
 	/* holemask is only 16-bits (fits in an unsigned long) */
 	ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0]));
 	holemask[0] = rec->ir_holemask;
 
 	/*
 	 * Find contiguous ranges of zeroes (i.e., allocated regions) in the
 	 * holemask and convert the start/end index of each range to an extent.
 	 * We start with the start and end index both pointing at the first 0 in
 	 * the mask.
 	 */
 	startidx = endidx = find_first_zero_bit(holemask,
 						XFS_INOBT_HOLEMASK_BITS);
 	nextbit = startidx + 1;
 	while (startidx < XFS_INOBT_HOLEMASK_BITS) {
+		int error;
+
 		nextbit = find_next_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS,
 					     nextbit);
 		/*
 		 * If the next zero bit is contiguous, update the end index of
 		 * the current range and continue.
 		 */
 		if (nextbit != XFS_INOBT_HOLEMASK_BITS &&
 		    nextbit == endidx + 1) {
 			endidx = nextbit;
 			goto next;
 		}
 
 		/*
 		 * nextbit is not contiguous with the current end index. Convert
 		 * the current start/end to an extent and add it to the free
 		 * list.
 		 */
 		agbno = sagbno + (startidx * XFS_INODES_PER_HOLEMASK_BIT) /
 				  mp->m_sb.sb_inopblock;
 		contigblk = ((endidx - startidx + 1) *
 			     XFS_INODES_PER_HOLEMASK_BIT) /
 			    mp->m_sb.sb_inopblock;
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-		xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno),
-				  contigblk, &XFS_RMAP_OINFO_INODES);
+		error = xfs_free_extent_later(tp,
+				XFS_AGB_TO_FSB(mp, agno, agbno),
+				contigblk, &XFS_RMAP_OINFO_INODES);
+		if (error)
+			return error;
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
 
 next:
 		nextbit++;
 	}
+	return 0;
 }
 
 STATIC int
 xfs_difree_inobt(
 	struct xfs_mount		*mp,
@@ -1996,11 +2002,13 @@ xfs_difree_inobt(
 			xfs_warn(mp, "%s: xfs_btree_delete returned error %d.",
 				__func__, error);
 			goto error0;
 		}
 
-		xfs_difree_inode_chunk(tp, pag->pag_agno, &rec);
+		error = xfs_difree_inode_chunk(tp, pag->pag_agno, &rec);
+		if (error)
+			goto error0;
 	} else {
 		xic->deleted = false;
 
 		error = xfs_inobt_update(cur, &rec);
 		if (error) {
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index bcf46aa0d08b..fec1ad95988c 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1127,12 +1127,14 @@ xfs_refcount_adjust_extents(
 				}
 			} else {
 				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 						cur->bc_ag.pag->pag_agno,
 						tmp.rc_startblock);
-				xfs_free_extent_later(cur->bc_tp, fsbno,
+				error = xfs_free_extent_later(cur->bc_tp, fsbno,
 						  tmp.rc_blockcount, NULL);
+				if (error)
+					goto out_error;
 			}
 
 			(*agbno) += tmp.rc_blockcount;
 			(*aglen) -= tmp.rc_blockcount;
 
@@ -1186,12 +1188,14 @@ xfs_refcount_adjust_extents(
 			goto advloop;
 		} else {
 			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 					cur->bc_ag.pag->pag_agno,
 					ext.rc_startblock);
-			xfs_free_extent_later(cur->bc_tp, fsbno,
+			error = xfs_free_extent_later(cur->bc_tp, fsbno,
 					ext.rc_blockcount, NULL);
+			if (error)
+				goto out_error;
 		}
 
 skip:
 		error = xfs_btree_increment(cur, 0, &found_rec);
 		if (error)
@@ -1956,11 +1960,14 @@ xfs_refcount_recover_cow_leftovers(
 				rr->rr_rrec.rc_startblock);
 		xfs_refcount_free_cow_extent(tp, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
-		xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL);
+		error = xfs_free_extent_later(tp, fsb,
+				rr->rr_rrec.rc_blockcount, NULL);
+		if (error)
+			goto out_trans;
 
 		error = xfs_trans_commit(tp);
 		if (error)
 			goto out_free;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index cbdc23217a42..ee187e90d943 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -616,12 +616,14 @@ xfs_reflink_cancel_cow_blocks(
 
 			/* Free the CoW orphan record. */
 			xfs_refcount_free_cow_extent(*tpp, del.br_startblock,
 					del.br_blockcount);
 
-			xfs_free_extent_later(*tpp, del.br_startblock,
+			error = xfs_free_extent_later(*tpp, del.br_startblock,
 					  del.br_blockcount, NULL);
+			if (error)
+				break;
 
 			/* Roll the transaction */
 			error = xfs_defer_finish(tpp);
 			if (error)
 				break;
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi
  2025-03-13 20:25 ` [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable, leah.rumancik; +Cc: Sasha Levin

[ Sasha's backport helper bot ]

Hi,

Summary of potential issues:
ℹ️ This is part 06/29 of a series
⚠️ Found follow-up fixes in mainline

The upstream commit SHA1 provided is correct: 7dfee17b13e5024c5c0ab1911859ded4182de3e5

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Dave Chinner<dchinner@redhat.com>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Found fixes commits:
2bed0d82c2f7 xfs: fix bounds check in xfs_defer_agfl_block()

Note: The patch differs from the upstream commit:
---
1:  7dfee17b13e50 ! 1:  0de6be42d2941 xfs: validate block number being freed before adding to xefi
    @@ Metadata
      ## Commit message ##
         xfs: validate block number being freed before adding to xefi
     
    +    [ Upstream commit 7dfee17b13e5024c5c0ab1911859ded4182de3e5 ]
    +
         Bad things happen in defered extent freeing operations if it is
         passed a bad block number in the xefi. This can come from a bogus
         agno/agbno pair from deferred agfl freeing, or just a bad fsbno
    @@ Commit message
         Reviewed-by: Christoph Hellwig <hch@lst.de>
         Reviewed-by: Darrick J. Wong <djwong@kernel.org>
         Signed-off-by: Dave Chinner <david@fromorbit.com>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_ag.c ##
     @@ fs/xfs/libxfs/xfs_ag.c: xfs_ag_shrink_space(
    @@ fs/xfs/libxfs/xfs_alloc.c: xfs_defer_agfl_block(
     +
      	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
      
    - 	xfs_extent_free_get_group(mp, xefi);
      	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
     +	return 0;
      }
    @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent_later(
      			       GFP_KERNEL | __GFP_NOFAIL);
      	xefi->xefi_startblock = bno;
     @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent_later(
    - 
    - 	xfs_extent_free_get_group(mp, xefi);
    + 			XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0,
    + 			XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len);
      	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
     +	return 0;
      }
    @@ fs/xfs/libxfs/xfs_alloc.h: xfs_buf_to_agfl_bno(
      		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
      		bool skip_discard);
      
    -@@ fs/xfs/libxfs/xfs_alloc.h: void xfs_extent_free_get_group(struct xfs_mount *mp,
    +@@ fs/xfs/libxfs/xfs_alloc.h: struct xfs_extent_free_item {
      #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
      #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
      
---

NOTE: These results are for this patch alone. Full series testing will be
performed when all parts are received.

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block()
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (5 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing Leah Rumancik
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Dave Chinner, Christoph Hellwig, Darrick J. Wong,
	Leah Rumancik

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 ]

Need to happen before we allocate and then leak the xefi. Found by
coverity via an xfsprogs libxfs scan.

[djwong: This also fixes the type of the @agbno argument.]

Fixes: 7dfee17b13e5 ("xfs: validate block number being freed before adding to xefi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ec03040237db..af447605051b 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2487,28 +2487,29 @@ xfs_agfl_reset(
  */
 static int
 xfs_defer_agfl_block(
 	struct xfs_trans		*tp,
 	xfs_agnumber_t			agno,
-	xfs_fsblock_t			agbno,
+	xfs_agblock_t			agbno,
 	struct xfs_owner_info		*oinfo)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_extent_free_item	*xefi;
+	xfs_fsblock_t			fsbno = XFS_AGB_TO_FSB(mp, agno, agbno);
 
 	ASSERT(xfs_extfree_item_cache != NULL);
 	ASSERT(oinfo != NULL);
 
+	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, fsbno)))
+		return -EFSCORRUPTED;
+
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
-	xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
+	xefi->xefi_startblock = fsbno;
 	xefi->xefi_blockcount = 1;
 	xefi->xefi_owner = oinfo->oi_owner;
 
-	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
-		return -EFSCORRUPTED;
-
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
 	return 0;
 }
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block()
  2025-03-13 20:25 ` [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block() Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable; +Cc: Leah Rumancik, Sasha Levin

[ Sasha's backport helper bot ]

Hi,

✅ All tests passed successfully. No issues detected.
No action required from the submitter.

The upstream commit SHA1 provided is correct: 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Dave Chinner<dchinner@redhat.com>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
1:  2bed0d82c2f78 ! 1:  aec5f983044b3 xfs: fix bounds check in xfs_defer_agfl_block()
    @@ Metadata
      ## Commit message ##
         xfs: fix bounds check in xfs_defer_agfl_block()
     
    +    [ Upstream commit 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 ]
    +
         Need to happen before we allocate and then leak the xefi. Found by
         coverity via an xfsprogs libxfs scan.
     
    @@ Commit message
         Reviewed-by: Christoph Hellwig <hch@lst.de>
         Reviewed-by: Darrick J. Wong <djwong@kernel.org>
         Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    +    Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
    +    Acked-by: "Darrick J. Wong" <djwong@kernel.org>
     
      ## fs/xfs/libxfs/xfs_alloc.c ##
     @@ fs/xfs/libxfs/xfs_alloc.c: static int
    @@ fs/xfs/libxfs/xfs_alloc.c: static int
     +	xefi->xefi_startblock = fsbno;
      	xefi->xefi_blockcount = 1;
      	xefi->xefi_owner = oinfo->oi_owner;
    - 	xefi->xefi_agresv = XFS_AG_RESV_AGFL;
      
     -	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
     -		return -EFSCORRUPTED;
     -
      	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
      
    - 	xfs_extent_free_get_group(mp, xefi);
    + 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
---

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Success    |  Success   |

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (6 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block() Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-14 23:10   ` Sasha Levin
  2025-03-13 20:25 ` [PATCH 6.1 09/29] xfs: reserve less log space when recovering log intent items Leah Rumancik
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Dave Chinner, Darrick J. Wong, Chandan Babu R,
	Leah Rumancik

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit b742d7b4f0e03df25c2a772adcded35044b625ca ]

[ 6.1: resolved conflict in xfs_extfree_item.c ]

Btrees that aren't freespace management trees use the normal extent
allocation and freeing routines for their blocks. Hence when a btree
block is freed, a direct call to xfs_free_extent() is made and the
extent is immediately freed. This puts the entire free space
management btrees under this path, so we are stacking btrees on
btrees in the call stack. The inobt, finobt and refcount btrees
all do this.

However, the bmap btree does not do this - it calls
xfs_free_extent_later() to defer the extent free operation via an
XEFI and hence it gets processed in deferred operation processing
during the commit of the primary transaction (i.e. via intent
chaining).

We need to change xfs_free_extent() to behave in a non-blocking
manner so that we can avoid deadlocks with busy extents near ENOSPC
in transactions that free multiple extents. Inserting or removing a
record from a btree can cause a multi-level tree merge operation and
that will free multiple blocks from the btree in a single
transaction. i.e. we can call xfs_free_extent() multiple times, and
hence the btree manipulation transaction is vulnerable to this busy
extent deadlock vector.

To fix this, convert all the remaining callers of xfs_free_extent()
to use xfs_free_extent_later() to queue XEFIs and hence defer
processing of the extent frees to a context that can be safely
restarted if a deadlock condition is detected.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_ag.c             | 2 +-
 fs/xfs/libxfs/xfs_alloc.c          | 4 ++++
 fs/xfs/libxfs/xfs_alloc.h          | 8 +++++---
 fs/xfs/libxfs/xfs_bmap.c           | 8 +++++---
 fs/xfs/libxfs/xfs_bmap_btree.c     | 3 ++-
 fs/xfs/libxfs/xfs_ialloc.c         | 8 ++++----
 fs/xfs/libxfs/xfs_ialloc_btree.c   | 3 +--
 fs/xfs/libxfs/xfs_refcount.c       | 9 ++++++---
 fs/xfs/libxfs/xfs_refcount_btree.c | 8 +-------
 fs/xfs/xfs_extfree_item.c          | 3 ++-
 fs/xfs/xfs_reflink.c               | 3 ++-
 11 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index ee0d56854b83..e03bfeacbed4 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -905,11 +905,11 @@ xfs_ag_shrink_space(
 		be32_add_cpu(&agf->agf_length, delta);
 		if (err2 != -ENOSPC)
 			goto resv_err;
 
 		err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
-				true);
+				XFS_AG_RESV_NONE, true);
 		if (err2)
 			goto resv_err;
 
 		/*
 		 * Roll the transaction before trying to re-init the per-ag
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index af447605051b..e44f3f5c6d27 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2505,55 +2505,59 @@ xfs_defer_agfl_block(
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = fsbno;
 	xefi->xefi_blockcount = 1;
 	xefi->xefi_owner = oinfo->oi_owner;
+	xefi->xefi_agresv = XFS_AG_RESV_AGFL;
 
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
 	return 0;
 }
 
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
 int
 __xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type		type,
 	bool				skip_discard)
 {
 	struct xfs_extent_free_item	*xefi;
 #ifdef DEBUG
 	struct xfs_mount		*mp = tp->t_mountp;
 	xfs_agnumber_t			agno;
 	xfs_agblock_t			agbno;
 
 	ASSERT(bno != NULLFSBLOCK);
 	ASSERT(len > 0);
 	ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
 	ASSERT(!isnullstartblock(bno));
 	agno = XFS_FSB_TO_AGNO(mp, bno);
 	agbno = XFS_FSB_TO_AGBNO(mp, bno);
 	ASSERT(agno < mp->m_sb.sb_agcount);
 	ASSERT(agbno < mp->m_sb.sb_agblocks);
 	ASSERT(len < mp->m_sb.sb_agblocks);
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_extfree_item_cache != NULL);
+	ASSERT(type != XFS_AG_RESV_AGFL);
 
 	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
 		return -EFSCORRUPTED;
 
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = bno;
 	xefi->xefi_blockcount = (xfs_extlen_t)len;
+	xefi->xefi_agresv = type;
 	if (skip_discard)
 		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);
 
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index cfc9dcdc1753..bbedb18651de 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -213,36 +213,38 @@ xfs_buf_to_agfl_bno(
 	return bp->b_addr;
 }
 
 int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
-		bool skip_discard);
+		enum xfs_ag_resv_type type, bool skip_discard);
 
 /*
  * List of extents to be free "later".
  * The list is kept sorted on xbf_startblock.
  */
 struct xfs_extent_free_item {
 	struct list_head	xefi_list;
 	uint64_t		xefi_owner;
 	xfs_fsblock_t		xefi_startblock;/* starting fs block number */
 	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
 	unsigned int		xefi_flags;
+	enum xfs_ag_resv_type	xefi_agresv;
 };
 
 #define XFS_EFI_SKIP_DISCARD	(1U << 0) /* don't issue discard */
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
 
 static inline int
 xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
-	const struct xfs_owner_info	*oinfo)
+	const struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type		type)
 {
-	return __xfs_free_extent_later(tp, bno, len, oinfo, false);
+	return __xfs_free_extent_later(tp, bno, len, oinfo, type, false);
 }
 
 
 extern struct kmem_cache	*xfs_extfree_item_cache;
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 29781a124323..aac071e4d000 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -572,11 +572,12 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
 
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
-	error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+	error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo,
+			XFS_AG_RESV_NONE);
 	if (error)
 		return error;
 
 	ip->i_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
@@ -5206,12 +5207,13 @@ xfs_bmap_del_extent_real(
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else {
 			error = __xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
-					(bflags & XFS_BMAPI_NODISCARD) ||
-					del->br_state == XFS_EXT_UNWRITTEN);
+					XFS_AG_RESV_NONE,
+					((bflags & XFS_BMAPI_NODISCARD) ||
+					del->br_state == XFS_EXT_UNWRITTEN));
 			if (error)
 				goto done;
 		}
 	}
 
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index bac2a6496a8e..57f401f2492d 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -286,11 +286,12 @@ xfs_bmbt_free_block(
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
 	struct xfs_owner_info	oinfo;
 	int			error;
 
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
-	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
+	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo,
+			XFS_AG_RESV_NONE);
 	if (error)
 		return error;
 
 	ip->i_nblocks--;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 448ea76d50f8..d1472cbd48ff 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1844,12 +1844,12 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
 		return xfs_free_extent_later(tp,
 				XFS_AGB_TO_FSB(mp, agno, sagbno),
-				M_IGEO(mp)->ialloc_blks,
-				&XFS_RMAP_OINFO_INODES);
+				M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES,
+				XFS_AG_RESV_NONE);
 	}
 
 	/* holemask is only 16-bits (fits in an unsigned long) */
 	ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0]));
 	holemask[0] = rec->ir_holemask;
@@ -1890,12 +1890,12 @@ xfs_difree_inode_chunk(
 			    mp->m_sb.sb_inopblock;
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
 		error = xfs_free_extent_later(tp,
-				XFS_AGB_TO_FSB(mp, agno, agbno),
-				contigblk, &XFS_RMAP_OINFO_INODES);
+				XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
+				&XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE);
 		if (error)
 			return error;
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 2dbe553d87fb..7125447cde1a 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -158,12 +158,11 @@ __xfs_inobt_free_block(
 {
 	xfs_fsblock_t		fsbno;
 
 	xfs_inobt_mod_blockcount(cur, -1);
 	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
-	return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
-			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
+	return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
 			&XFS_RMAP_OINFO_INOBT, resv);
 }
 
 STATIC int
 xfs_inobt_free_block(
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index fec1ad95988c..4ec7a81dd3ef 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1128,11 +1128,12 @@ xfs_refcount_adjust_extents(
 			} else {
 				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 						cur->bc_ag.pag->pag_agno,
 						tmp.rc_startblock);
 				error = xfs_free_extent_later(cur->bc_tp, fsbno,
-						  tmp.rc_blockcount, NULL);
+						  tmp.rc_blockcount, NULL,
+						  XFS_AG_RESV_NONE);
 				if (error)
 					goto out_error;
 			}
 
 			(*agbno) += tmp.rc_blockcount;
@@ -1189,11 +1190,12 @@ xfs_refcount_adjust_extents(
 		} else {
 			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 					cur->bc_ag.pag->pag_agno,
 					ext.rc_startblock);
 			error = xfs_free_extent_later(cur->bc_tp, fsbno,
-					ext.rc_blockcount, NULL);
+					ext.rc_blockcount, NULL,
+					XFS_AG_RESV_NONE);
 			if (error)
 				goto out_error;
 		}
 
 skip:
@@ -1961,11 +1963,12 @@ xfs_refcount_recover_cow_leftovers(
 		xfs_refcount_free_cow_extent(tp, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
 		error = xfs_free_extent_later(tp, fsb,
-				rr->rr_rrec.rc_blockcount, NULL);
+				rr->rr_rrec.rc_blockcount, NULL,
+				XFS_AG_RESV_NONE);
 		if (error)
 			goto out_trans;
 
 		error = xfs_trans_commit(tp);
 		if (error)
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 3d8e62da2ccc..fbd53b6951a9 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -104,23 +104,17 @@ xfs_refcountbt_free_block(
 {
 	struct xfs_mount	*mp = cur->bc_mp;
 	struct xfs_buf		*agbp = cur->bc_ag.agbp;
 	struct xfs_agf		*agf = agbp->b_addr;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
-	int			error;
 
 	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
 	be32_add_cpu(&agf->agf_refcount_blocks, -1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
-	error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
-			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
+	return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
 			&XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA);
-	if (error)
-		return error;
-
-	return error;
 }
 
 STATIC int
 xfs_refcountbt_get_minrecs(
 	struct xfs_btree_cur	*cur,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index c1aae07467c9..b670863d8467 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -367,11 +367,11 @@ xfs_trans_free_extent(
 	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno,
 			xefi->xefi_blockcount);
 
 	pag = xfs_perag_get(mp, agno);
 	error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount,
-			&oinfo, XFS_AG_RESV_NONE,
+			&oinfo, xefi->xefi_agresv,
 			xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
 	xfs_perag_put(pag);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
@@ -626,10 +626,11 @@ xfs_efi_item_recover(
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		struct xfs_extent_free_item	fake = {
 			.xefi_owner		= XFS_RMAP_OWN_UNKNOWN,
+			.xefi_agresv		= XFS_AG_RESV_NONE,
 		};
 		struct xfs_extent		*extp;
 
 		extp = &efip->efi_format.efi_extents[i];
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ee187e90d943..1bac6a8af970 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -617,11 +617,12 @@ xfs_reflink_cancel_cow_blocks(
 			/* Free the CoW orphan record. */
 			xfs_refcount_free_cow_extent(*tpp, del.br_startblock,
 					del.br_blockcount);
 
 			error = xfs_free_extent_later(*tpp, del.br_startblock,
-					  del.br_blockcount, NULL);
+					del.br_blockcount, NULL,
+					XFS_AG_RESV_NONE);
 			if (error)
 				break;
 
 			/* Roll the transaction */
 			error = xfs_defer_finish(tpp);
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing
  2025-03-13 20:25 ` [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing Leah Rumancik
@ 2025-03-14 23:10   ` Sasha Levin
  0 siblings, 0 replies; 45+ messages in thread
From: Sasha Levin @ 2025-03-14 23:10 UTC (permalink / raw)
  To: stable, leah.rumancik; +Cc: Sasha Levin

[ Sasha's backport helper bot ]

Hi,

Summary of potential issues:
ℹ️ This is part 08/29 of a series
❌ Build failures detected

The upstream commit SHA1 provided is correct: b742d7b4f0e03df25c2a772adcded35044b625ca

WARNING: Author mismatch between patch and upstream commit:
Backport author: Leah Rumancik<leah.rumancik@gmail.com>
Commit author: Dave Chinner<dchinner@redhat.com>

Status in newer kernel trees:
6.13.y | Present (exact SHA1)
6.12.y | Present (exact SHA1)
6.6.y | Present (exact SHA1)

Note: The patch differs from the upstream commit:
---
Failed to apply patch cleanly.
---

NOTE: These results are for this patch alone. Full series testing will be
performed when all parts are received.

Results of testing on various branches:

| Branch                    | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y        |  Failed     |  N/A       |

Build Errors:
Patch failed to apply on stable/linux-6.1.y. Reject:

diff a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c	(rejected hunks)
@@ -1128,11 +1128,12 @@ xfs_refcount_adjust_extents(
 			} else {
 				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 						cur->bc_ag.pag->pag_agno,
 						tmp.rc_startblock);
 				error = xfs_free_extent_later(cur->bc_tp, fsbno,
-						  tmp.rc_blockcount, NULL);
+						  tmp.rc_blockcount, NULL,
+						  XFS_AG_RESV_NONE);
 				if (error)
 					goto out_error;
 			}
 
 			(*agbno) += tmp.rc_blockcount;
@@ -1189,11 +1190,12 @@ xfs_refcount_adjust_extents(
 		} else {
 			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
 					cur->bc_ag.pag->pag_agno,
 					ext.rc_startblock);
 			error = xfs_free_extent_later(cur->bc_tp, fsbno,
-					ext.rc_blockcount, NULL);
+					ext.rc_blockcount, NULL,
+					XFS_AG_RESV_NONE);
 			if (error)
 				goto out_error;
 		}
 
 skip:
@@ -1961,11 +1963,12 @@ xfs_refcount_recover_cow_leftovers(
 		xfs_refcount_free_cow_extent(tp, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
 		error = xfs_free_extent_later(tp, fsb,
-				rr->rr_rrec.rc_blockcount, NULL);
+				rr->rr_rrec.rc_blockcount, NULL,
+				XFS_AG_RESV_NONE);
 		if (error)
 			goto out_trans;
 
 		error = xfs_trans_commit(tp);
 		if (error)
diff a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c	(rejected hunks)
@@ -1844,12 +1844,12 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
 		return xfs_free_extent_later(tp,
 				XFS_AGB_TO_FSB(mp, agno, sagbno),
-				M_IGEO(mp)->ialloc_blks,
-				&XFS_RMAP_OINFO_INODES);
+				M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES,
+				XFS_AG_RESV_NONE);
 	}
 
 	/* holemask is only 16-bits (fits in an unsigned long) */
 	ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0]));
 	holemask[0] = rec->ir_holemask;
@@ -1890,12 +1890,12 @@ xfs_difree_inode_chunk(
 			    mp->m_sb.sb_inopblock;
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
 		error = xfs_free_extent_later(tp,
-				XFS_AGB_TO_FSB(mp, agno, agbno),
-				contigblk, &XFS_RMAP_OINFO_INODES);
+				XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
+				&XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE);
 		if (error)
 			return error;
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c	(rejected hunks)
@@ -2505,55 +2505,59 @@ xfs_defer_agfl_block(
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = fsbno;
 	xefi->xefi_blockcount = 1;
 	xefi->xefi_owner = oinfo->oi_owner;
+	xefi->xefi_agresv = XFS_AG_RESV_AGFL;
 
 	trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
 
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list);
 	return 0;
 }
 
 /*
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
 int
 __xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
 	const struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type		type,
 	bool				skip_discard)
 {
 	struct xfs_extent_free_item	*xefi;
 #ifdef DEBUG
 	struct xfs_mount		*mp = tp->t_mountp;
 	xfs_agnumber_t			agno;
 	xfs_agblock_t			agbno;
 
 	ASSERT(bno != NULLFSBLOCK);
 	ASSERT(len > 0);
 	ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
 	ASSERT(!isnullstartblock(bno));
 	agno = XFS_FSB_TO_AGNO(mp, bno);
 	agbno = XFS_FSB_TO_AGBNO(mp, bno);
 	ASSERT(agno < mp->m_sb.sb_agcount);
 	ASSERT(agbno < mp->m_sb.sb_agblocks);
 	ASSERT(len < mp->m_sb.sb_agblocks);
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_extfree_item_cache != NULL);
+	ASSERT(type != XFS_AG_RESV_AGFL);
 
 	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
 		return -EFSCORRUPTED;
 
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
 	xefi->xefi_startblock = bno;
 	xefi->xefi_blockcount = (xfs_extlen_t)len;
+	xefi->xefi_agresv = type;
 	if (skip_discard)
 		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);
 
diff a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c	(rejected hunks)
@@ -104,23 +104,17 @@ xfs_refcountbt_free_block(
 {
 	struct xfs_mount	*mp = cur->bc_mp;
 	struct xfs_buf		*agbp = cur->bc_ag.agbp;
 	struct xfs_agf		*agf = agbp->b_addr;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
-	int			error;
 
 	trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1);
 	be32_add_cpu(&agf->agf_refcount_blocks, -1);
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS);
-	error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
-			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
+	return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
 			&XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA);
-	if (error)
-		return error;
-
-	return error;
 }
 
 STATIC int
 xfs_refcountbt_get_minrecs(
 	struct xfs_btree_cur	*cur,
diff a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c	(rejected hunks)
@@ -158,12 +158,11 @@ __xfs_inobt_free_block(
 {
 	xfs_fsblock_t		fsbno;
 
 	xfs_inobt_mod_blockcount(cur, -1);
 	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
-	return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag,
-			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1,
+	return xfs_free_extent_later(cur->bc_tp, fsbno, 1,
 			&XFS_RMAP_OINFO_INOBT, resv);
 }
 
 STATIC int
 xfs_inobt_free_block(
diff a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c	(rejected hunks)
@@ -905,11 +905,11 @@ xfs_ag_shrink_space(
 		be32_add_cpu(&agf->agf_length, delta);
 		if (err2 != -ENOSPC)
 			goto resv_err;
 
 		err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
-				true);
+				XFS_AG_RESV_NONE, true);
 		if (err2)
 			goto resv_err;
 
 		/*
 		 * Roll the transaction before trying to re-init the per-ag
diff a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c	(rejected hunks)
@@ -286,11 +286,12 @@ xfs_bmbt_free_block(
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp));
 	struct xfs_owner_info	oinfo;
 	int			error;
 
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork);
-	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo);
+	error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo,
+			XFS_AG_RESV_NONE);
 	if (error)
 		return error;
 
 	ip->i_nblocks--;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h	(rejected hunks)
@@ -213,36 +213,38 @@ xfs_buf_to_agfl_bno(
 	return bp->b_addr;
 }
 
 int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
-		bool skip_discard);
+		enum xfs_ag_resv_type type, bool skip_discard);
 
 /*
  * List of extents to be free "later".
  * The list is kept sorted on xbf_startblock.
  */
 struct xfs_extent_free_item {
 	struct list_head	xefi_list;
 	uint64_t		xefi_owner;
 	xfs_fsblock_t		xefi_startblock;/* starting fs block number */
 	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
 	unsigned int		xefi_flags;
+	enum xfs_ag_resv_type	xefi_agresv;
 };
 
 #define XFS_EFI_SKIP_DISCARD	(1U << 0) /* don't issue discard */
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
 
 static inline int
 xfs_free_extent_later(
 	struct xfs_trans		*tp,
 	xfs_fsblock_t			bno,
 	xfs_filblks_t			len,
-	const struct xfs_owner_info	*oinfo)
+	const struct xfs_owner_info	*oinfo,
+	enum xfs_ag_resv_type		type)
 {
-	return __xfs_free_extent_later(tp, bno, len, oinfo, false);
+	return __xfs_free_extent_later(tp, bno, len, oinfo, type, false);
 }
 
 
 extern struct kmem_cache	*xfs_extfree_item_cache;
 
diff a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c	(rejected hunks)
@@ -572,11 +572,12 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
 
 	xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork);
-	error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo);
+	error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo,
+			XFS_AG_RESV_NONE);
 	if (error)
 		return error;
 
 	ip->i_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
@@ -5206,12 +5207,13 @@ xfs_bmap_del_extent_real(
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else {
 			error = __xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
-					(bflags & XFS_BMAPI_NODISCARD) ||
-					del->br_state == XFS_EXT_UNWRITTEN);
+					XFS_AG_RESV_NONE,
+					((bflags & XFS_BMAPI_NODISCARD) ||
+					del->br_state == XFS_EXT_UNWRITTEN));
 			if (error)
 				goto done;
 		}
 	}
 
diff a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c	(rejected hunks)
@@ -367,11 +367,11 @@ xfs_trans_free_extent(
 	trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno,
 			xefi->xefi_blockcount);
 
 	pag = xfs_perag_get(mp, agno);
 	error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount,
-			&oinfo, XFS_AG_RESV_NONE,
+			&oinfo, xefi->xefi_agresv,
 			xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
 	xfs_perag_put(pag);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
@@ -626,10 +626,11 @@ xfs_efi_item_recover(
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		struct xfs_extent_free_item	fake = {
 			.xefi_owner		= XFS_RMAP_OWN_UNKNOWN,
+			.xefi_agresv		= XFS_AG_RESV_NONE,
 		};
 		struct xfs_extent		*extp;
 
 		extp = &efip->efi_format.efi_extents[i];
 
diff a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c	(rejected hunks)
@@ -617,11 +617,12 @@ xfs_reflink_cancel_cow_blocks(
 			/* Free the CoW orphan record. */
 			xfs_refcount_free_cow_extent(*tpp, del.br_startblock,
 					del.br_blockcount);
 
 			error = xfs_free_extent_later(*tpp, del.br_startblock,
-					  del.br_blockcount, NULL);
+					del.br_blockcount, NULL,
+					XFS_AG_RESV_NONE);
 			if (error)
 				break;
 
 			/* Roll the transaction */
 			error = xfs_defer_finish(tpp);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 09/29] xfs: reserve less log space when recovering log intent items
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (7 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 10/29] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Leah Rumancik
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Darrick J. Wong, Wengang Wang, Srikanth C S,
	Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 3c919b0910906cc69d76dea214776f0eac73358b ]

Wengang Wang reports that a customer's system was running a number of
truncate operations on a filesystem with a very small log.  Contention
on the reserve heads lead to other threads stalling on smaller updates
(e.g.  mtime updates) long enough to result in the node being rebooted
on account of the lack of responsivenes.  The node failed to recover
because log recovery of an EFI became stuck waiting for a grant of
reserve space.  From Wengang's report:

"For the file deletion, log bytes are reserved basing on
xfs_mount->tr_itruncate which is:

    tr_logres = 175488,
    tr_logcount = 2,
    tr_logflags = XFS_TRANS_PERM_LOG_RES,

"You see it's a permanent log reservation with two log operations (two
transactions in rolling mode).  After calculation (xlog_calc_unit_res()
adds space for various log headers), the final log space needed per
transaction changes from  175488 to 180208 bytes.  So the total log
space needed is 360416 bytes (180208 * 2).  [That quantity] of log space
(360416 bytes) needs to be reserved for both run time inode removing
(xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())."

In other words, runtime pre-reserves 360K of space in anticipation of
running a chain of two transactions in which each transaction gets a
180K reservation.

Now that we've allocated the transaction, we delete the bmap mapping,
log an EFI to free the space, and roll the transaction as part of
finishing the deferops chain.  Rolling creates a new xfs_trans which
shares its ticket with the old transaction.  Next, xfs_trans_roll calls
__xfs_trans_commit with regrant == true, which calls xlog_cil_commit
with the same regrant parameter.

xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and
subtracts t_curr_res from the reservation and write heads.

If the filesystem is fresh and the first transaction only used (say)
20K, then t_curr_res will be 160K, and we give that much reservation
back to the reservation head.  Or if the file is really fragmented and
the first transaction actually uses 170K, then t_curr_res will be 10K,
and that's what we give back to the reservation.

Having done that, we're now headed into the second transaction with an
EFI and 180K of reservation.  Other threads apparently consumed all the
reservation for smaller transactions, such as timestamp updates.

Now let's say the first transaction gets written to disk and we crash
without ever completing the second transaction.  Now we remount the fs,
log recovery finds the unfinished EFI, and calls xfs_efi_recover to
finish the EFI.  However, xfs_efi_recover starts a new tr_itruncate
tranasction, which asks for 360K log reservation.  This is a lot more
than the 180K that we had reserved at the time of the crash.  If the
first EFI to be recovered is also pinning the tail of the log, we will
be unable to free any space in the log, and recovery livelocks.

Wengang confirmed this:

"Now we have the second transaction which has 180208 log bytes reserved
too. The second transaction is supposed to process intents including
extent freeing.  With my hacking patch, I blocked the extent freeing 5
hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved.

"With my test case, other transactions (update timestamps) then happen.
As my hacking patch pins the journal tail, those timestamp-updating
transactions finally use up (almost) all the left available log space
(in memory in on disk).  And finally the on disk (and in memory)
available log space goes down near to 180208 bytes.  Those 180208 bytes
are reserved by [the] second (extent-free) transaction [in the chain]."

Wengang and I noticed that EFI recovery starts a transaction, completes
one step of the chain, and commits the transaction without completing
any other steps of the chain.  Those subsequent steps are completed by
xlog_finish_defer_ops, which allocates yet another transaction to
finish the rest of the chain.  That transaction gets the same tr_logres
as the head transaction, but with tr_logcount = 1 to force regranting
with every roll to avoid livelocks.

In other words, we already figured this out in commit 929b92f64048d
("xfs: xfs_defer_capture should absorb remaining transaction
reservation"), but should have applied that logic to each intent item's
recovery function.  For Wengang's case, the xfs_trans_alloc call in the
EFI recovery function should only be asking for a single transaction's
worth of log reservation -- 180K, not 360K.

Quoting Wengang again:

"With log recovery, during EFI recovery, we use tr_itruncate again to
reserve two transactions that needs 360416 log bytes.  Reserving 360416
bytes fails [stalls] because we now only have about 180208 available.

"Actually during the EFI recover, we only need one transaction to free
the extents just like the 2nd transaction at RUNTIME.  So it only needs
to reserve 180208 rather than 360416 bytes.  We have (a bit) more than
180208 available log bytes on disk, so [if we decrease the reservation
to 180K] the reservation goes and the recovery [finishes].  That is to
say: we can fix the log recover part to fix the issue. We can introduce
a new xfs_trans_res xfs_mount->tr_ext_free

{
  tr_logres = 175488,
  tr_logcount = 0,
  tr_logflags = 0,
}

"and use tr_ext_free instead of tr_itruncate in EFI recover."

However, I don't think it quite makes sense to create an entirely new
transaction reservation type to handle single-stepping during log
recovery.  Instead, we should copy the transaction reservation
information in the xfs_mount, change tr_logcount to 1, and pass that
into xfs_trans_alloc.  We know this won't risk changing the min log size
computation since we always ask for a fraction of the reservation for
all known transaction types.

This looks like it's been lurking in the codebase since commit
3d3c8b5222b92, which changed the xfs_trans_reserve call in
xlog_recover_process_efi to use the tr_logcount in tr_itruncate.
That changed the EFI recovery transaction from making a
non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log
space to a XFS_TRANS_PERM_LOG_RES request for two transactions worth.

Fixes: 3d3c8b5222b92 ("xfs: refactor xfs_trans_reserve() interface")
Complements: 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation")
Suggested-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Srikanth C S <srikanth.c.s@oracle.com>
[djwong: apply the same transformation to all log intent recovery]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_recover.h | 22 ++++++++++++++++++++++
 fs/xfs/xfs_attr_item.c          |  7 ++++---
 fs/xfs/xfs_bmap_item.c          |  4 +++-
 fs/xfs/xfs_extfree_item.c       |  4 +++-
 fs/xfs/xfs_refcount_item.c      |  6 ++++--
 fs/xfs/xfs_rmap_item.c          |  6 ++++--
 6 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 2420865f3007..a5100a11faf9 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -129,6 +129,28 @@ void xlog_free_buf_cancel_table(struct xlog *log);
 void xlog_check_buf_cancel_table(struct xlog *log);
 #else
 #define xlog_check_buf_cancel_table(log) do { } while (0)
 #endif

+/*
+ * Transform a regular reservation into one suitable for recovery of a log
+ * intent item.
+ *
+ * Intent recovery only runs a single step of the transaction chain and defers
+ * the rest to a separate transaction.  Therefore, we reduce logcount to 1 here
+ * to avoid livelocks if the log grant space is nearly exhausted due to the
+ * recovered intent pinning the tail.  Keep the same logflags to avoid tripping
+ * asserts elsewhere.  Struct copies abound below.
+ */
+static inline struct xfs_trans_res
+xlog_recover_resv(const struct xfs_trans_res *r)
+{
+	struct xfs_trans_res ret = {
+		.tr_logres	= r->tr_logres,
+		.tr_logcount	= 1,
+		.tr_logflags	= r->tr_logflags,
+	};
+
+	return ret;
+}
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2788a6f2edcd..36fe2abb16e6 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -545,11 +545,11 @@ xfs_attri_item_recover(
 	struct xfs_attr_intent		*attr;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_inode		*ip;
 	struct xfs_da_args		*args;
 	struct xfs_trans		*tp;
-	struct xfs_trans_res		tres;
+	struct xfs_trans_res		resv;
 	struct xfs_attri_log_format	*attrp;
 	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
 	int				error;
 	int				total;
 	int				local;
@@ -616,12 +616,13 @@ xfs_attri_item_recover(
 		ASSERT(0);
 		error = -EFSCORRUPTED;
 		goto out;
 	}

-	xfs_init_attr_trans(args, &tres, &total);
-	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
+	xfs_init_attr_trans(args, &resv, &total);
+	resv = xlog_recover_resv(&resv);
+	error = xfs_trans_alloc(mp, &resv, total, 0, XFS_TRANS_RESERVE, &tp);
 	if (error)
 		goto out;

 	args->trans = tp;
 	done_item = xfs_trans_get_attrd(tp, attrip);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 13aa5359c02f..1058603db3ac 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -455,36 +455,38 @@ STATIC int
 xfs_bui_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
 	struct xfs_bmap_intent		fake = { };
+	struct xfs_trans_res		resv;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_map_extent		*map;
 	struct xfs_bud_log_item		*budp;
 	int				iext_delta;
 	int				error = 0;

 	if (!xfs_bui_validate(mp, buip)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				&buip->bui_format, sizeof(buip->bui_format));
 		return -EFSCORRUPTED;
 	}

 	map = &buip->bui_format.bui_extents[0];
 	fake.bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
 			XFS_ATTR_FORK : XFS_DATA_FORK;
 	fake.bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;

 	error = xlog_recover_iget(mp, map->me_owner, &ip);
 	if (error)
 		return error;

 	/* Allocate transaction and do the work. */
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
+	error = xfs_trans_alloc(mp, &resv,
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		goto err_rele;

 	budp = xfs_trans_get_bud(tp, buip);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index b670863d8467..3ed25c352269 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -596,33 +596,35 @@ xfs_efi_validate_ext(
 STATIC int
 xfs_efi_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
+	struct xfs_trans_res		resv;
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
 	int				i;
 	int				error = 0;

 	/*
 	 * First check the validity of the extents described by the
 	 * EFI.  If any are bad, then assume that all are bad and
 	 * just toss the EFI.
 	 */
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		if (!xfs_efi_validate_ext(mp,
 					&efip->efi_format.efi_extents[i])) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					&efip->efi_format,
 					sizeof(efip->efi_format));
 			return -EFSCORRUPTED;
 		}
 	}

-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
+	error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp);
 	if (error)
 		return error;
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);

 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index ff4d5087ba00..dfd7b824e32b 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -451,10 +451,11 @@ xfs_cui_validate_phys(
 STATIC int
 xfs_cui_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
+	struct xfs_trans_res		resv;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
@@ -488,12 +489,13 @@ xfs_cui_item_recover(
 	 * refcount update.  However, we're in log recovery here, so we
 	 * use the passed in defer_ops and to finish up any work that
 	 * doesn't fit.  We need to reserve enough blocks to handle a
 	 * full btree split on either end of the refcount range.
 	 */
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
-			mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp);
+	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
+	error = xfs_trans_alloc(mp, &resv, mp->m_refc_maxlevels * 2, 0,
+			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;

 	cudp = xfs_trans_get_cud(tp, cuip);

diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 534504ede1a3..2043cea261c0 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -490,10 +490,11 @@ xfs_rui_validate_map(
 STATIC int
 xfs_rui_item_recover(
 	struct xfs_log_item		*lip,
 	struct list_head		*capture_list)
 {
+	struct xfs_trans_res		resv;
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
 	struct xfs_map_extent		*rmap;
 	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
@@ -517,12 +518,13 @@ xfs_rui_item_recover(
 					sizeof(ruip->rui_format));
 			return -EFSCORRUPTED;
 		}
 	}

-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
-			mp->m_rmap_maxlevels, 0, XFS_TRANS_RESERVE, &tp);
+	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
+	error = xfs_trans_alloc(mp, &resv, mp->m_rmap_maxlevels, 0,
+			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
 	rudp = xfs_trans_get_rud(tp, ruip);

 	for (i = 0; i < ruip->rui_format.rui_nextents; i++) {
-- 
2.49.0.rc1.451.g8f38331e32-goog

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 10/29] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (8 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 09/29] xfs: reserve less log space when recovering log intent items Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 11/29] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Leah Rumancik
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Catherine Hoang,
	Greg Kroah-Hartman, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 13928113fc5b5e79c91796290a99ed991ac0efe2 ]

[6.1: resolved conflicts with fscounters.c and rtsummary.c ]

Move all the declarations for functionality in xfs_rtbitmap.c into a
separate xfs_rtbitmap.h header file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c     |  2 +-
 fs/xfs/libxfs/xfs_rtbitmap.c |  1 +
 fs/xfs/libxfs/xfs_rtbitmap.h | 82 ++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/rtbitmap.c      |  2 +-
 fs/xfs/xfs_fsmap.c           |  2 +-
 fs/xfs/xfs_rtalloc.c         |  1 +
 fs/xfs/xfs_rtalloc.h         | 73 --------------------------------
 7 files changed, 87 insertions(+), 76 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index aac071e4d000..c99fd7fe242e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -19,11 +19,11 @@
 #include "xfs_trans.h"
 #include "xfs_alloc.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
 #include "xfs_bmap_btree.h"
-#include "xfs_rtalloc.h"
+#include "xfs_rtbitmap.h"
 #include "xfs_errortag.h"
 #include "xfs_error.h"
 #include "xfs_quota.h"
 #include "xfs_trans_space.h"
 #include "xfs_buf_item.h"
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 655108a4cd05..9eb1b5aa7e35 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -14,10 +14,11 @@
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
 #include "xfs_trans.h"
 #include "xfs_rtalloc.h"
 #include "xfs_error.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * Realtime allocator bitmap functions shared with userspace.
  */
 
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
new file mode 100644
index 000000000000..546dea34bb37
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#ifndef __XFS_RTBITMAP_H__
+#define	__XFS_RTBITMAP_H__
+
+/*
+ * XXX: Most of the realtime allocation functions deal in units of realtime
+ * extents, not realtime blocks.  This looks funny when paired with the type
+ * name and screams for a larger cleanup.
+ */
+struct xfs_rtalloc_rec {
+	xfs_rtblock_t		ar_startext;
+	xfs_rtblock_t		ar_extcount;
+};
+
+typedef int (*xfs_rtalloc_query_range_fn)(
+	struct xfs_mount		*mp,
+	struct xfs_trans		*tp,
+	const struct xfs_rtalloc_rec	*rec,
+	void				*priv);
+
+#ifdef CONFIG_XFS_RT
+int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp,
+		  xfs_rtblock_t block, int issum, struct xfs_buf **bpp);
+int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp,
+		      xfs_rtblock_t start, xfs_extlen_t len, int val,
+		      xfs_rtblock_t *new, int *stat);
+int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp,
+		    xfs_rtblock_t start, xfs_rtblock_t limit,
+		    xfs_rtblock_t *rtblock);
+int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp,
+		    xfs_rtblock_t start, xfs_rtblock_t limit,
+		    xfs_rtblock_t *rtblock);
+int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp,
+		       xfs_rtblock_t start, xfs_extlen_t len, int val);
+int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp,
+			     int log, xfs_rtblock_t bbno, int delta,
+			     struct xfs_buf **rbpp, xfs_fsblock_t *rsb,
+			     xfs_suminfo_t *sum);
+int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log,
+			 xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp,
+			 xfs_fsblock_t *rsb);
+int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp,
+		     xfs_rtblock_t start, xfs_extlen_t len,
+		     struct xfs_buf **rbpp, xfs_fsblock_t *rsb);
+int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp,
+		const struct xfs_rtalloc_rec *low_rec,
+		const struct xfs_rtalloc_rec *high_rec,
+		xfs_rtalloc_query_range_fn fn, void *priv);
+int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp,
+			  xfs_rtalloc_query_range_fn fn,
+			  void *priv);
+bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
+int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp,
+			       xfs_rtblock_t start, xfs_extlen_t len,
+			       bool *is_free);
+/*
+ * Free an extent in the realtime subvolume.  Length is expressed in
+ * realtime extents, as is the block number.
+ */
+int					/* error */
+xfs_rtfree_extent(
+	struct xfs_trans	*tp,	/* transaction pointer */
+	xfs_rtblock_t		bno,	/* starting block number to free */
+	xfs_extlen_t		len);	/* length of extent freed */
+
+/* Same as above, but in units of rt blocks. */
+int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
+		xfs_filblks_t rtlen);
+#else /* CONFIG_XFS_RT */
+# define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
+# define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
+# define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
+# define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
+# define xfs_rtbuf_get(m,t,b,i,p)			(-ENOSYS)
+# define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
+#endif /* CONFIG_XFS_RT */
+
+#endif /* __XFS_RTBITMAP_H__ */
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
index 0a3bde64c675..faf6e0890d44 100644
--- a/fs/xfs/scrub/rtbitmap.c
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -9,11 +9,11 @@
 #include "xfs_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_log_format.h"
 #include "xfs_trans.h"
-#include "xfs_rtalloc.h"
+#include "xfs_rtbitmap.h"
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 062e5dc5db9f..a5b9754c62d1 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -21,11 +21,11 @@
 #include <linux/fsmap.h>
 #include "xfs_fsmap.h"
 #include "xfs_refcount.h"
 #include "xfs_refcount_btree.h"
 #include "xfs_alloc_btree.h"
-#include "xfs_rtalloc.h"
+#include "xfs_rtbitmap.h"
 #include "xfs_ag.h"
 
 /* Convert an xfs_fsmap to an fsmap. */
 static void
 xfs_fsmap_from_internal(
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 0bfbbc1dd0da..7ce122da43fe 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -17,10 +17,11 @@
 #include "xfs_trans.h"
 #include "xfs_trans_space.h"
 #include "xfs_icache.h"
 #include "xfs_rtalloc.h"
 #include "xfs_sb.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * Read and return the summary information for a given extent size,
  * bitmap block combination.
  * Keeps track of a current summary block, so we don't keep reading
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 65c284e9d33e..11859c259a1c 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -9,60 +9,31 @@
 /* kernel only definitions and functions */
 
 struct xfs_mount;
 struct xfs_trans;
 
-/*
- * XXX: Most of the realtime allocation functions deal in units of realtime
- * extents, not realtime blocks.  This looks funny when paired with the type
- * name and screams for a larger cleanup.
- */
-struct xfs_rtalloc_rec {
-	xfs_rtblock_t		ar_startext;
-	xfs_rtblock_t		ar_extcount;
-};
-
-typedef int (*xfs_rtalloc_query_range_fn)(
-	struct xfs_mount		*mp,
-	struct xfs_trans		*tp,
-	const struct xfs_rtalloc_rec	*rec,
-	void				*priv);
-
 #ifdef CONFIG_XFS_RT
 /*
  * Function prototypes for exported functions.
  */
 
 /*
  * Allocate an extent in the realtime subvolume, with the usual allocation
  * parameters.  The length units are all in realtime extents, as is the
  * result block number.
  */
 int					/* error */
 xfs_rtallocate_extent(
 	struct xfs_trans	*tp,	/* transaction pointer */
 	xfs_rtblock_t		bno,	/* starting block number to allocate */
 	xfs_extlen_t		minlen,	/* minimum length to allocate */
 	xfs_extlen_t		maxlen,	/* maximum length to allocate */
 	xfs_extlen_t		*len,	/* out: actual length allocated */
 	int			wasdel,	/* was a delayed allocation extent */
 	xfs_extlen_t		prod,	/* extent product factor */
 	xfs_rtblock_t		*rtblock); /* out: start block allocated */
 
-/*
- * Free an extent in the realtime subvolume.  Length is expressed in
- * realtime extents, as is the block number.
- */
-int					/* error */
-xfs_rtfree_extent(
-	struct xfs_trans	*tp,	/* transaction pointer */
-	xfs_rtblock_t		bno,	/* starting block number to free */
-	xfs_extlen_t		len);	/* length of extent freed */
-
-/* Same as above, but in units of rt blocks. */
-int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
-		xfs_filblks_t rtlen);
 
 /*
  * Initialize realtime fields in the mount structure.
  */
 int					/* error */
@@ -100,59 +71,15 @@ xfs_rtpick_extent(
 int
 xfs_growfs_rt(
 	struct xfs_mount	*mp,	/* file system mount structure */
 	xfs_growfs_rt_t		*in);	/* user supplied growfs struct */
 
-/*
- * From xfs_rtbitmap.c
- */
-int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp,
-		  xfs_rtblock_t block, int issum, struct xfs_buf **bpp);
-int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp,
-		      xfs_rtblock_t start, xfs_extlen_t len, int val,
-		      xfs_rtblock_t *new, int *stat);
-int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp,
-		    xfs_rtblock_t start, xfs_rtblock_t limit,
-		    xfs_rtblock_t *rtblock);
-int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp,
-		    xfs_rtblock_t start, xfs_rtblock_t limit,
-		    xfs_rtblock_t *rtblock);
-int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp,
-		       xfs_rtblock_t start, xfs_extlen_t len, int val);
-int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp,
-			     int log, xfs_rtblock_t bbno, int delta,
-			     struct xfs_buf **rbpp, xfs_fsblock_t *rsb,
-			     xfs_suminfo_t *sum);
-int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log,
-			 xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp,
-			 xfs_fsblock_t *rsb);
-int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp,
-		     xfs_rtblock_t start, xfs_extlen_t len,
-		     struct xfs_buf **rbpp, xfs_fsblock_t *rsb);
-int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp,
-		const struct xfs_rtalloc_rec *low_rec,
-		const struct xfs_rtalloc_rec *high_rec,
-		xfs_rtalloc_query_range_fn fn, void *priv);
-int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp,
-			  xfs_rtalloc_query_range_fn fn,
-			  void *priv);
-bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
-int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp,
-			       xfs_rtblock_t start, xfs_extlen_t len,
-			       bool *is_free);
 int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp);
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)	(-ENOSYS)
-# define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
-# define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
 # define xfs_rtpick_extent(m,t,l,rb)			(-ENOSYS)
 # define xfs_growfs_rt(mp,in)				(-ENOSYS)
-# define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
-# define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
-# define xfs_rtbuf_get(m,t,b,i,p)			(-ENOSYS)
-# define xfs_verify_rtbno(m, r)				(false)
-# define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
 # define xfs_rtalloc_reinit_frextents(m)		(0)
 static inline int		/* error */
 xfs_rtmount_init(
 	xfs_mount_t	*mp)	/* file system mount structure */
 {
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 11/29] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (9 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 10/29] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 12/29] xfs: consider minlen sized extents in xfs_rtallocate_extent_block Leah Rumancik
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Catherine Hoang,
	Greg Kroah-Hartman, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit f29c3e745dc253bf9d9d06ddc36af1a534ba1dd0 ]

[ 6.1: excluded changes to trace.h as xchk_rtsum_record_free
does not exist yet ]

XFS uses xfs_rtblock_t for many different uses, which makes it much more
difficult to perform a unit analysis on the codebase.  One of these
(ab)uses is when we need to store the length of a free space extent as
stored in the realtime bitmap.  Because there can be up to 2^64 realtime
extents in a filesystem, we need a new type that is larger than
xfs_rtxlen_t for callers that are querying the bitmap directly.  This
means scrub and growfs.

Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx
lengths.  'b' stands for 'bitmap' or 'big'; reader's choice.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h   | 2 +-
 fs/xfs/libxfs/xfs_rtbitmap.h | 2 +-
 fs/xfs/libxfs/xfs_types.h    | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 371dc07233e0..20acb8573d7a 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -96,11 +96,11 @@ struct xfs_ifork;
 typedef struct xfs_sb {
 	uint32_t	sb_magicnum;	/* magic number == XFS_SB_MAGIC */
 	uint32_t	sb_blocksize;	/* logical block size, bytes */
 	xfs_rfsblock_t	sb_dblocks;	/* number of data blocks */
 	xfs_rfsblock_t	sb_rblocks;	/* number of realtime blocks */
-	xfs_rtblock_t	sb_rextents;	/* number of realtime extents */
+	xfs_rtbxlen_t	sb_rextents;	/* number of realtime extents */
 	uuid_t		sb_uuid;	/* user-visible file system unique id */
 	xfs_fsblock_t	sb_logstart;	/* starting block of log if internal */
 	xfs_ino_t	sb_rootino;	/* root inode number */
 	xfs_ino_t	sb_rbmino;	/* bitmap inode for realtime extents */
 	xfs_ino_t	sb_rsumino;	/* summary inode for rt bitmap */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 546dea34bb37..c3ef22e67aa3 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -11,11 +11,11 @@
  * extents, not realtime blocks.  This looks funny when paired with the type
  * name and screams for a larger cleanup.
  */
 struct xfs_rtalloc_rec {
 	xfs_rtblock_t		ar_startext;
-	xfs_rtblock_t		ar_extcount;
+	xfs_rtbxlen_t		ar_extcount;
 };
 
 typedef int (*xfs_rtalloc_query_range_fn)(
 	struct xfs_mount		*mp,
 	struct xfs_trans		*tp,
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 5ebdda7e1078..7023e4a79f87 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -29,10 +29,11 @@ typedef uint32_t	xfs_dahash_t;	/* dir/attr hash value */
 typedef uint64_t	xfs_fsblock_t;	/* blockno in filesystem (agno|agbno) */
 typedef uint64_t	xfs_rfsblock_t;	/* blockno in filesystem (raw) */
 typedef uint64_t	xfs_rtblock_t;	/* extent (block) in realtime area */
 typedef uint64_t	xfs_fileoff_t;	/* block number in a file */
 typedef uint64_t	xfs_filblks_t;	/* number of blocks in a file */
+typedef uint64_t	xfs_rtbxlen_t;	/* rtbitmap extent length in rtextents */
 
 typedef int64_t		xfs_srtblock_t;	/* signed version of xfs_rtblock_t */
 
 /*
  * New verifiers will return the instruction address of the failing check.
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 12/29] xfs: consider minlen sized extents in xfs_rtallocate_extent_block
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (10 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 11/29] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 13/29] xfs: don't leak recovered attri intent items Leah Rumancik
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Christoph Hellwig, Darrick J. Wong, Chandan Babu R,
	Leah Rumancik

From: Christoph Hellwig <hch@lst.de>

[ Upstream commit 944df75958807d56f2db9fdc769eb15dd9f0366a ]

minlen is the lower bound on the extent length that the caller can
accept, and maxlen is at this point the maximal available length.
This means a minlen extent is perfectly fine to use, so do it.  This
matches the equivalent logic in xfs_rtallocate_extent_exact that also
accepts a minlen sized extent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 7ce122da43fe..2f2280f4e7fa 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -316,11 +316,11 @@ xfs_rtallocate_extent_block(
 			break;
 	}
 	/*
 	 * Searched the whole thing & didn't find a maxlen free extent.
 	 */
-	if (minlen < maxlen && besti != -1) {
+	if (minlen <= maxlen && besti != -1) {
 		xfs_extlen_t	p;	/* amount to trim length by */
 
 		/*
 		 * If size should be a multiple of prod, make that so.
 		 */
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 13/29] xfs: don't leak recovered attri intent items
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (11 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 12/29] xfs: consider minlen sized extents in xfs_rtallocate_extent_block Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover " Leah Rumancik
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 ]

If recovery finds an xattr log intent item calling for the removal of an
attribute and the file doesn't even have an attr fork, we know that the
removal is trivially complete.  However, we can't just exit the recovery
function without doing something about the recovered log intent item --
it's still on the AIL, and not logging an attrd item means it stays
there forever.

This has likely not been seen in practice because few people use LARP
and the runtime code won't log the attri for a no-attrfork removexattr
operation.  But let's fix this anyway.

Also we shouldn't really be testing the attr fork presence until we've
taken the ILOCK, though this doesn't matter much in recovery, which is
single threaded.

Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 36fe2abb16e6..11e88a76a33c 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -327,10 +327,17 @@ xfs_xattri_finish_update(
 	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) {
 		error = -EIO;
 		goto out;
 	}
 
+	/* If an attr removal is trivially complete, we're done. */
+	if (attr->xattri_op_flags == XFS_ATTRI_OP_FLAGS_REMOVE &&
+	    !xfs_inode_hasattr(args->dp)) {
+		error = 0;
+		goto out;
+	}
+
 	error = xfs_attr_set_iter(attr);
 	if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
 		error = -EAGAIN;
 out:
 	/*
@@ -606,12 +613,10 @@ xfs_attri_item_recover(
 			attr->xattri_dela_state = xfs_attr_init_replace_state(args);
 		else
 			attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
-		if (!xfs_inode_hasattr(args->dp))
-			goto out;
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	default:
 		ASSERT(0);
 		error = -EFSCORRUPTED;
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (12 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 13/29] xfs: don't leak recovered attri intent items Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-21  8:39   ` Fedor Pchelkin
  2025-03-13 20:25 ` [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover Leah Rumancik
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Catherine Hoang,
	Greg Kroah-Hartman, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 ]

[ 6.1: resovled conflict in xfs_defer.c ]

One thing I never quite got around to doing is porting the log intent
item recovery code to reconstruct the deferred pending work state.  As a
result, each intent item open codes xfs_defer_finish_one in its recovery
method, because that's what the EFI code did before xfs_defer.c even
existed.

This is a gross thing to have left unfixed -- if an EFI cannot proceed
due to busy extents, we end up creating separate new EFIs for each
unfinished work item, which is a change in behavior from what runtime
would have done.

Worse yet, Long Li pointed out that there's a UAF in the recovery code.
The ->commit_pass2 function adds the intent item to the AIL and drops
the refcount.  The one remaining refcount is now owned by the recovery
mechanism (aka the log intent items in the AIL) with the intent of
giving the refcount to the intent done item in the ->iop_recover
function.

However, if something fails later in recovery, xlog_recover_finish will
walk the recovered intent items in the AIL and release them.  If the CIL
hasn't been pushed before that point (which is possible since we don't
force the log until later) then the intent done release will try to free
its associated intent, which has already been freed.

This patch starts to address this mess by having the ->commit_pass2
functions recreate the xfs_defer_pending state.  The next few patches
will fix the recovery functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.c       | 103 +++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_defer.h       |   5 ++
 fs/xfs/libxfs/xfs_log_recover.h |   3 +
 fs/xfs/xfs_attr_item.c          |  10 +--
 fs/xfs/xfs_bmap_item.c          |   9 +--
 fs/xfs/xfs_extfree_item.c       |   9 +--
 fs/xfs/xfs_log.c                |   1 +
 fs/xfs/xfs_log_priv.h           |   1 +
 fs/xfs/xfs_log_recover.c        | 113 ++++++++++++++++----------------
 fs/xfs/xfs_refcount_item.c      |   9 +--
 fs/xfs/xfs_rmap_item.c          |   9 +--
 11 files changed, 157 insertions(+), 115 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 92470ed3fcbd..64005ea1e8af 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -243,37 +243,66 @@ xfs_defer_create_intents(
 		ret |= ret2;
 	}
 	return ret;
 }
 
-STATIC void
+static inline void
 xfs_defer_pending_abort(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+
+	trace_xfs_defer_pending_abort(mp, dfp);
+
+	if (dfp->dfp_intent && !dfp->dfp_done) {
+		ops->abort_intent(dfp->dfp_intent);
+		dfp->dfp_intent = NULL;
+	}
+}
+
+static inline void
+xfs_defer_pending_cancel_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+	struct list_head		*pwi;
+	struct list_head		*n;
+
+	trace_xfs_defer_cancel_list(mp, dfp);
+
+	list_del(&dfp->dfp_list);
+	list_for_each_safe(pwi, n, &dfp->dfp_work) {
+		list_del(pwi);
+		dfp->dfp_count--;
+		ops->cancel_item(pwi);
+	}
+	ASSERT(dfp->dfp_count == 0);
+	kmem_cache_free(xfs_defer_pending_cache, dfp);
+}
+
+STATIC void
+xfs_defer_pending_abort_list(
 	struct xfs_mount		*mp,
 	struct list_head		*dop_list)
 {
 	struct xfs_defer_pending	*dfp;
-	const struct xfs_defer_op_type	*ops;
 
 	/* Abort intent items that don't have a done item. */
-	list_for_each_entry(dfp, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_pending_abort(mp, dfp);
-		if (dfp->dfp_intent && !dfp->dfp_done) {
-			ops->abort_intent(dfp->dfp_intent);
-			dfp->dfp_intent = NULL;
-		}
-	}
+	list_for_each_entry(dfp, dop_list, dfp_list)
+		xfs_defer_pending_abort(mp, dfp);
 }
 
 /* Abort all the intents that were committed. */
 STATIC void
 xfs_defer_trans_abort(
 	struct xfs_trans		*tp,
 	struct list_head		*dop_pending)
 {
 	trace_xfs_defer_trans_abort(tp, _RET_IP_);
-	xfs_defer_pending_abort(tp->t_mountp, dop_pending);
+	xfs_defer_pending_abort_list(tp->t_mountp, dop_pending);
 }
 
 /*
  * Capture resources that the caller said not to release ("held") when the
  * transaction commits.  Caller is responsible for zero-initializing @dres.
@@ -387,30 +416,17 @@ xfs_defer_cancel_list(
 	struct xfs_mount		*mp,
 	struct list_head		*dop_list)
 {
 	struct xfs_defer_pending	*dfp;
 	struct xfs_defer_pending	*pli;
-	struct list_head		*pwi;
-	struct list_head		*n;
-	const struct xfs_defer_op_type	*ops;
 
 	/*
 	 * Free the pending items.  Caller should already have arranged
 	 * for the intent items to be released.
 	 */
-	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_cancel_list(mp, dfp);
-		list_del(&dfp->dfp_list);
-		list_for_each_safe(pwi, n, &dfp->dfp_work) {
-			list_del(pwi);
-			dfp->dfp_count--;
-			ops->cancel_item(pwi);
-		}
-		ASSERT(dfp->dfp_count == 0);
-		kmem_cache_free(xfs_defer_pending_cache, dfp);
-	}
+	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list)
+		xfs_defer_pending_cancel_work(mp, dfp);
 }
 
 /*
  * Prevent a log intent item from pinning the tail of the log by logging a
  * done item to release the intent item; and then log a new intent item.
@@ -661,10 +677,43 @@ xfs_defer_add(
 
 	list_add_tail(li, &dfp->dfp_work);
 	dfp->dfp_count++;
 }
 
+/*
+ * Create a pending deferred work item to replay the recovered intent item
+ * and add it to the list.
+ */
+void
+xfs_defer_start_recovery(
+	struct xfs_log_item		*lip,
+	enum xfs_defer_ops_type		dfp_type,
+	struct list_head		*r_dfops)
+{
+	struct xfs_defer_pending	*dfp;
+
+	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	dfp->dfp_type = dfp_type;
+	dfp->dfp_intent = lip;
+	INIT_LIST_HEAD(&dfp->dfp_work);
+	list_add_tail(&dfp->dfp_list, r_dfops);
+}
+
+/*
+ * Cancel a deferred work item created to recover a log intent item.  @dfp
+ * will be freed after this function returns.
+ */
+void
+xfs_defer_cancel_recovery(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	xfs_defer_pending_abort(mp, dfp);
+	xfs_defer_pending_cancel_work(mp, dfp);
+}
+
 /*
  * Move deferred ops from one transaction to another and reset the source to
  * initial state. This is primarily used to carry state forward across
  * transaction rolls with pending dfops.
  */
@@ -765,11 +814,11 @@ xfs_defer_ops_capture_abort(
 	struct xfs_mount		*mp,
 	struct xfs_defer_capture	*dfc)
 {
 	unsigned short			i;
 
-	xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
+	xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops);
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
 
 	for (i = 0; i < dfc->dfc_held.dr_bufs; i++)
 		xfs_buf_relse(dfc->dfc_held.dr_bp[i]);
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 8788ad5f6a73..5dce938ba3d5 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -123,9 +123,14 @@ void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
 		struct xfs_defer_resources *dres);
 void xfs_defer_ops_capture_abort(struct xfs_mount *mp,
 		struct xfs_defer_capture *d);
 void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
 
+void xfs_defer_start_recovery(struct xfs_log_item *lip,
+		enum xfs_defer_ops_type dfp_type, struct list_head *r_dfops);
+void xfs_defer_cancel_recovery(struct xfs_mount *mp,
+		struct xfs_defer_pending *dfp);
+
 int __init xfs_defer_init_item_caches(void);
 void xfs_defer_destroy_item_caches(void);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index a5100a11faf9..271a4ce7375c 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -151,6 +151,9 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 	};
 
 	return ret;
 }
 
+void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
+		xfs_lsn_t lsn, unsigned int dfp_type);
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 11e88a76a33c..a32716b8cbbd 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -770,18 +770,12 @@ xlog_recover_attri_commit_pass2(
 			attri_formatp->alfi_value_len);
 
 	attrip = xfs_attri_init(mp, nv);
 	memcpy(&attrip->attri_format, attri_formatp, len);
 
-	/*
-	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
-	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
-	 * directly and drop the ATTRI reference. Note that
-	 * xfs_trans_ail_update() drops the AIL lock.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
-	xfs_attri_release(attrip);
+	xlog_recover_intent_item(log, &attrip->attri_item, lsn,
+			XFS_DEFER_OPS_TYPE_ATTR);
 	xfs_attri_log_nameval_put(nv);
 	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 1058603db3ac..8d08252e1953 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -644,16 +644,13 @@ xlog_recover_bui_commit_pass2(
 	}
 
 	buip = xfs_bui_init(mp);
 	xfs_bui_copy_format(&buip->bui_format, bui_formatp);
 	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &buip->bui_item, lsn);
-	xfs_bui_release(buip);
+
+	xlog_recover_intent_item(log, &buip->bui_item, lsn,
+			XFS_DEFER_OPS_TYPE_BMAP);
 	return 0;
 }
 
 const struct xlog_recover_item_ops xlog_bui_item_ops = {
 	.item_type		= XFS_LI_BUI,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3ed25c352269..fd9fe51bcc31 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -734,16 +734,13 @@ xlog_recover_efi_commit_pass2(
 	if (error) {
 		xfs_efi_item_free(efip);
 		return error;
 	}
 	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &efip->efi_item, lsn);
-	xfs_efi_release(efip);
+
+	xlog_recover_intent_item(log, &efip->efi_item, lsn,
+			XFS_DEFER_OPS_TYPE_FREE);
 	return 0;
 }
 
 const struct xlog_recover_item_ops xlog_efi_item_ops = {
 	.item_type		= XFS_LI_EFI,
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index ce6b303484cf..d39ee05ac1f2 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1538,10 +1538,11 @@ xlog_alloc_log(
 	log->l_logBBstart  = blk_offset;
 	log->l_logBBsize   = num_bblks;
 	log->l_covered_state = XLOG_STATE_COVER_IDLE;
 	set_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate);
 	INIT_DELAYED_WORK(&log->l_work, xfs_log_worker);
+	INIT_LIST_HEAD(&log->r_dfops);
 
 	log->l_prev_block  = -1;
 	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
 	xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0);
 	xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 1bd2963e8fbd..8677ba92d317 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -401,10 +401,11 @@ struct xlog {
 	struct workqueue_struct	*l_ioend_workqueue; /* for I/O completions */
 	struct delayed_work	l_work;		/* background flush work */
 	long			l_opstate;	/* operational state */
 	uint			l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */
 	struct list_head	*l_buf_cancel_table;
+	struct list_head	r_dfops;	/* recovered log intent items */
 	int			l_iclog_hsize;  /* size of iclog header */
 	int			l_iclog_heads;  /* # of iclog header sectors */
 	uint			l_sectBBsize;   /* sector size in BBs (2^n) */
 	int			l_iclog_size;	/* size of log in bytes */
 	int			l_iclog_bufs;	/* number of iclog buffers */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e009bb23d8a2..65041ed7833d 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1721,34 +1721,28 @@ xlog_clear_stale_blocks(
  * Release the recovered intent item in the AIL that matches the given intent
  * type and intent id.
  */
 void
 xlog_recover_release_intent(
-	struct xlog		*log,
-	unsigned short		intent_type,
-	uint64_t		intent_id)
+	struct xlog			*log,
+	unsigned short			intent_type,
+	uint64_t			intent_id)
 {
-	struct xfs_ail_cursor	cur;
-	struct xfs_log_item	*lip;
-	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_defer_pending	*dfp, *n;
+
+	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
+		struct xfs_log_item	*lip = dfp->dfp_intent;
 
-	spin_lock(&ailp->ail_lock);
-	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip != NULL;
-	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
 		if (lip->li_type != intent_type)
 			continue;
 		if (!lip->li_ops->iop_match(lip, intent_id))
 			continue;
 
-		spin_unlock(&ailp->ail_lock);
-		lip->li_ops->iop_release(lip);
-		spin_lock(&ailp->ail_lock);
-		break;
-	}
+		ASSERT(xlog_item_is_intent(lip));
 
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
+		xfs_defer_cancel_recovery(log->l_mp, dfp);
+	}
 }
 
 int
 xlog_recover_iget(
 	struct xfs_mount	*mp,
@@ -1937,10 +1931,33 @@ xlog_buf_readahead(
 {
 	if (!xlog_is_buffer_cancelled(log, blkno, len))
 		xfs_buf_readahead(log->l_mp->m_ddev_targp, blkno, len, ops);
 }
 
+/*
+ * Create a deferred work structure for resuming and tracking the progress of a
+ * log intent item that was found during recovery.
+ */
+void
+xlog_recover_intent_item(
+	struct xlog			*log,
+	struct xfs_log_item		*lip,
+	xfs_lsn_t			lsn,
+	unsigned int			dfp_type)
+{
+	ASSERT(xlog_item_is_intent(lip));
+
+	xfs_defer_start_recovery(lip, dfp_type, &log->r_dfops);
+
+	/*
+	 * Insert the intent into the AIL directly and drop one reference so
+	 * that finishing or canceling the work will drop the other.
+	 */
+	xfs_trans_ail_insert(log->l_ailp, lip, lsn);
+	lip->li_ops->iop_unpin(lip, 0);
+}
+
 STATIC int
 xlog_recover_items_pass2(
 	struct xlog                     *log,
 	struct xlog_recover             *trans,
 	struct list_head                *buffer_list,
@@ -2534,104 +2551,88 @@ xlog_abort_defer_ops(
  * have started recovery on all the pending intents when we find an non-intent
  * item in the AIL.
  */
 STATIC int
 xlog_recover_process_intents(
-	struct xlog		*log)
+	struct xlog			*log)
 {
 	LIST_HEAD(capture_list);
-	struct xfs_ail_cursor	cur;
-	struct xfs_log_item	*lip;
-	struct xfs_ail		*ailp;
-	int			error = 0;
+	struct xfs_defer_pending	*dfp, *n;
+	int				error = 0;
 #if defined(DEBUG) || defined(XFS_WARN)
-	xfs_lsn_t		last_lsn;
-#endif
+	xfs_lsn_t			last_lsn;
 
-	ailp = log->l_ailp;
-	spin_lock(&ailp->ail_lock);
-#if defined(DEBUG) || defined(XFS_WARN)
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
 #endif
-	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	     lip != NULL;
-	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
-		const struct xfs_item_ops	*ops;
 
-		if (!xlog_item_is_intent(lip))
-			break;
+	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
+		struct xfs_log_item	*lip = dfp->dfp_intent;
+		const struct xfs_item_ops *ops = lip->li_ops;
+
+		ASSERT(xlog_item_is_intent(lip));
 
 		/*
 		 * We should never see a redo item with a LSN higher than
 		 * the last transaction we found in the log at the start
 		 * of recovery.
 		 */
 		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
 
 		/*
 		 * NOTE: If your intent processing routine can create more
 		 * deferred ops, you /must/ attach them to the capture list in
 		 * the recover routine or else those subsequent intents will be
 		 * replayed in the wrong order!
 		 *
 		 * The recovery function can free the log item, so we must not
 		 * access lip after it returns.
 		 */
-		spin_unlock(&ailp->ail_lock);
-		ops = lip->li_ops;
 		error = ops->iop_recover(lip, &capture_list);
-		spin_lock(&ailp->ail_lock);
 		if (error) {
 			trace_xlog_intent_recovery_failed(log->l_mp, error,
 					ops->iop_recover);
 			break;
 		}
-	}
 
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
+		/*
+		 * XXX: @lip could have been freed, so detach the log item from
+		 * the pending item before freeing the pending item.  This does
+		 * not fix the existing UAF bug that occurs if ->iop_recover
+		 * fails after creating the intent done item.
+		 */
+		dfp->dfp_intent = NULL;
+		xfs_defer_cancel_recovery(log->l_mp, dfp);
+	}
 	if (error)
 		goto err;
 
 	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
 	if (error)
 		goto err;
 
 	return 0;
 err:
 	xlog_abort_defer_ops(log->l_mp, &capture_list);
 	return error;
 }
 
 /*
  * A cancel occurs when the mount has failed and we're bailing out.  Release all
  * pending log intent items that we haven't started recovery on so they don't
  * pin the AIL.
  */
 STATIC void
 xlog_recover_cancel_intents(
-	struct xlog		*log)
+	struct xlog			*log)
 {
-	struct xfs_log_item	*lip;
-	struct xfs_ail_cursor	cur;
-	struct xfs_ail		*ailp;
-
-	ailp = log->l_ailp;
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (!xlog_item_is_intent(lip))
-			break;
+	struct xfs_defer_pending	*dfp, *n;
 
-		spin_unlock(&ailp->ail_lock);
-		lip->li_ops->iop_release(lip);
-		spin_lock(&ailp->ail_lock);
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
-	}
+	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
+		ASSERT(xlog_item_is_intent(dfp->dfp_intent));
 
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
+		xfs_defer_cancel_recovery(log->l_mp, dfp);
+	}
 }
 
 /*
  * This routine performs a transaction to null out a bad inode pointer
  * in an agi unlinked inode hash bucket.
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index dfd7b824e32b..1e047107d2f2 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -666,16 +666,13 @@ xlog_recover_cui_commit_pass2(
 	}
 
 	cuip = xfs_cui_init(mp, cui_formatp->cui_nextents);
 	xfs_cui_copy_format(&cuip->cui_format, cui_formatp);
 	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
-	xfs_cui_release(cuip);
+
+	xlog_recover_intent_item(log, &cuip->cui_item, lsn,
+			XFS_DEFER_OPS_TYPE_REFCOUNT);
 	return 0;
 }
 
 const struct xlog_recover_item_ops xlog_cui_item_ops = {
 	.item_type		= XFS_LI_CUI,
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 2043cea261c0..12ae8ab6a69d 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -680,16 +680,13 @@ xlog_recover_rui_commit_pass2(
 	}
 
 	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
 	xfs_rui_copy_format(&ruip->rui_format, rui_formatp);
 	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &ruip->rui_item, lsn);
-	xfs_rui_release(ruip);
+
+	xlog_recover_intent_item(log, &ruip->rui_item, lsn,
+			XFS_DEFER_OPS_TYPE_RMAP);
 	return 0;
 }
 
 const struct xlog_recover_item_ops xlog_rui_item_ops = {
 	.item_type		= XFS_LI_RUI,
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-13 20:25 ` [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover " Leah Rumancik
@ 2025-03-21  8:39   ` Fedor Pchelkin
  2025-03-21 17:42     ` Leah Rumancik
  0 siblings, 1 reply; 45+ messages in thread
From: Fedor Pchelkin @ 2025-03-21  8:39 UTC (permalink / raw)
  To: Leah Rumancik
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

Hi,

Leah Rumancik wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> [ Upstream commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 ]

Commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 leads to a kernel crash
during the xfs/235 test execution on 6.1.

Bisecting this on the mainline shows it is resolved by

e5f1a5146ec3 ("xfs: use xfs_defer_finish_one to finish recovered work items")

which was a part of the same patchset [1] with intent recovery changes but
is not included into the current 6.1 backport-series.

[1]: https://lore.kernel.org/linux-xfs/170120318847.13206.17051442307252477333.stgit@frogsfrogsfrogs/



$ cat local.config
export TEST_DEV=/dev/loop0
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/loop1
export SCRATCH_MNT=/mnt/scratch
export SCRATCH_LOGDEV=/dev/loop2
export SCRATCH_RTDEV=/dev/loop3
export MKFS_OPTIONS="-m rmapbt=1"
$ ./check xfs/235
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 pserver 6.1.131+ #5 SMP PREEMPT_DYNAMIC Fri Mar 21 00:23:33 2025
MKFS_OPTIONS  -- -f -m rmapbt=1 /dev/loop1
MOUNT_OPTIONS -- -o context=unconfined_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch

xfs/235 6s ...


Kernel log taken with "xfs: use xfs_defer_pending objects to recover intent items"
as HEAD commit in stable 6.1-queue

[   24.491106] loop0: detected capacity change from 0 to 2097152
[   24.497086] loop1: detected capacity change from 0 to 2097152
[   24.502562] loop2: detected capacity change from 0 to 2097152
[   24.507256] loop3: detected capacity change from 0 to 2097152
[   28.801611] XFS (loop0): Mounting V5 Filesystem
[   28.857208] XFS (loop0): Ending clean mount
[   32.056570] XFS (loop1): Mounting V5 Filesystem
[   32.062298] XFS (loop1): Ending clean mount
[   32.068974] XFS (loop1): Unmounting Filesystem
[   32.129230] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
[   32.242946] XFS (loop0): Unmounting Filesystem
[   32.312170] XFS (loop0): Mounting V5 Filesystem
[   32.317663] XFS (loop0): Ending clean mount
[   32.362570] run fstests xfs/235 at 2025-03-21 08:06:16
[   33.010312] XFS (loop1): Mounting V5 Filesystem
[   33.016581] XFS (loop1): Ending clean mount
[   33.047697] XFS (loop1): Unmounting Filesystem
[   33.124483] XFS (loop1): Mounting V5 Filesystem
[   33.130343] XFS (loop1): Ending clean mount
[   33.200269] cp (1897) used greatest stack depth: 22704 bytes left
[   33.229054] XFS (loop1): Unmounting Filesystem
[   33.348374] XFS (loop1): Mounting V5 Filesystem
[   33.352033] XFS (loop1): Ending clean mount
[   33.363401] XFS (loop1): Metadata CRC error detected at xfs_rmapbt_read_verify+0x28/0x240, xfs_rmapbt block 0x28 
[   33.364028] XFS (loop1): Unmount and run xfs_repair
[   33.364186] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   33.364387] 00000000: 52 4d 42 33 00 00 00 0b ff ff ff ff ff ff ff ff  RMB3............
[   33.364642] 00000010: 00 00 00 00 00 00 00 28 00 00 00 01 00 00 00 02  .......(........
[   33.364881] 00000020: 85 c5 70 96 4c 65 44 11 bf 35 27 35 35 35 ec 19  ..p.LeD..5'555..
[   33.365173] 00000030: 00 00 00 00 16 08 c6 49 00 00 00 00 00 00 00 01  .......I........
[   33.365484] 00000040: ff ff ff ff ff ff ff fd 00 00 00 00 00 00 00 00  ................
[   33.365854] 00000050: 00 00 00 01 00 00 00 02 ff ff ff ff ff ff ff fb  ................
[   33.366204] 00000060: 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 02  ................
[   33.366435] 00000070: ff ff ff ff ff ff ff fa 00 00 00 00 00 00 00 00  ................
[   33.366694] XFS (loop1): metadata I/O error in "xfs_btree_read_buf_block.constprop.0+0x1ae/0x360" at daddr 0x28 len 8 error 74
[   33.374164] XFS (loop1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0xcc3/0x1c10 (fs/xfs/libxfs/xfs_defer.c:596).  Shutting down filesys.
[   33.374863] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
[   33.389190] XFS (loop1): Unmounting Filesystem
[   33.422139] XFS (loop1): Mounting V5 Filesystem
[   33.426543] XFS (loop1): Starting recovery (logdev: internal)
[   33.428526] XFS (loop1): Metadata CRC error detected at xfs_rmapbt_read_verify+0x28/0x240, xfs_rmapbt block 0x28 
[   33.428923] XFS (loop1): Unmount and run xfs_repair
[   33.429147] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   33.429378] 00000000: 52 4d 42 33 00 00 00 0b ff ff ff ff ff ff ff ff  RMB3............
[   33.429788] 00000010: 00 00 00 00 00 00 00 28 00 00 00 01 00 00 00 02  .......(........
[   33.430087] 00000020: 85 c5 70 96 4c 65 44 11 bf 35 27 35 35 35 ec 19  ..p.LeD..5'555..
[   33.430442] 00000030: 00 00 00 00 16 08 c6 49 00 00 00 00 00 00 00 01  .......I........
[   33.430733] 00000040: ff ff ff ff ff ff ff fd 00 00 00 00 00 00 00 00  ................
[   33.431047] 00000050: 00 00 00 01 00 00 00 02 ff ff ff ff ff ff ff fb  ................
[   33.431350] 00000060: 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 02  ................
[   33.431616] 00000070: ff ff ff ff ff ff ff fa 00 00 00 00 00 00 00 00  ................
[   33.431858] XFS (loop1): metadata I/O error in "xfs_btree_read_buf_block.constprop.0+0x1ae/0x360" at daddr 0x28 len 8 error 74
[   33.432279] 00000000: 87 00 00 00 00 00 00 00 98 00 00 00 00 00 00 00  ................
[   33.432553] 00000010: 00 00 00 00 00 00 00 00 70 00 00 00 01 00 00 20  ........p...... 
[   33.432883] XFS (loop1): Internal error xfs_rui_item_recover at line 573 of file fs/xfs/xfs_rmap_item.c.  Caller xlog_recover_process_intents+0x221/0xbc0
[   33.433397] CPU: 0 PID: 2107 Comm: mount Not tainted 6.1.131+ #5
[   33.433607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[   33.433893] Call Trace:
[   33.433985]  <TASK>
[   33.434065]  dump_stack_lvl+0x91/0xbe
[   33.434204]  xfs_corruption_error+0x139/0x160
[   33.434540]  xfs_rui_item_recover+0x8b2/0xb70
[   33.435985]  xlog_recover_process_intents+0x221/0xbc0
[   33.437896]  xlog_recover_finish+0x8b/0x9f0
[   33.439417]  xfs_log_mount_finish+0x386/0x650
[   33.439565]  xfs_mountfs+0x125c/0x1d90
[   33.440407]  xfs_fs_fill_super+0x1327/0x1ea0
[   33.440547]  get_tree_bdev+0x42c/0x740
[   33.440850]  vfs_get_tree+0x8e/0x2f0
[   33.440969]  path_mount+0x137f/0x1ec0
[   33.441561]  __x64_sys_mount+0x286/0x310
[   33.442210]  do_syscall_64+0x39/0x90
[   33.442340]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   33.444714]  </TASK>
[   33.444947] XFS (loop1): Corruption detected. Unmount and run xfs_repair
[   33.445212] XFS (loop1): Internal error xfs_trans_cancel at line 1096 of file fs/xfs/xfs_trans.c.  Caller xfs_rui_item_recover+0x8de/0xb70
[   33.445667] CPU: 0 PID: 2107 Comm: mount Not tainted 6.1.131+ #5
[   33.445905] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[   33.446225] Call Trace:
[   33.446318]  <TASK>
[   33.446406]  dump_stack_lvl+0x91/0xbe
[   33.446552]  xfs_error_report+0xb7/0xc0
[   33.446875]  xfs_trans_cancel+0x512/0x6d0
[   33.447025]  xfs_rui_item_recover+0x8de/0xb70
[   33.448066]  xlog_recover_process_intents+0x221/0xbc0
[   33.449553]  xlog_recover_finish+0x8b/0x9f0
[   33.450811]  xfs_log_mount_finish+0x386/0x650
[   33.450971]  xfs_mountfs+0x125c/0x1d90
[   33.451866]  xfs_fs_fill_super+0x1327/0x1ea0
[   33.452028]  get_tree_bdev+0x42c/0x740
[   33.452307]  vfs_get_tree+0x8e/0x2f0
[   33.452434]  path_mount+0x137f/0x1ec0
[   33.453011]  __x64_sys_mount+0x286/0x310
[   33.453658]  do_syscall_64+0x39/0x90
[   33.453787]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   33.456137]  </TASK>
[   33.456250] XFS (loop1): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x52b/0x6d0 (fs/xfs/xfs_trans.c:1097).  Shutting down filesystem.
[   33.456734] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
[   33.457005] ==================================================================


After there goes KASAN [decoded].

BUG: KASAN: use-after-free in xlog_item_is_intent (fs/xfs/xfs_trans.h:103) 
BUG: KASAN: use-after-free in xlog_recover_cancel_intents (fs/xfs/xfs_log_recover.c:2630) 
Read of size 8 at addr ffff88802c268390 by task mount/2107

Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:107) 
print_report (mm/kasan/report.c:317 mm/kasan/report.c:427) 
kasan_report (mm/kasan/report.c:194 mm/kasan/report.c:533) 
xlog_item_is_intent (fs/xfs/xfs_trans.h:103 inline)
xlog_recover_cancel_intents (fs/xfs/xfs_log_recover.c:2630) 
xlog_recover_finish (fs/xfs/xfs_log_recover.c:3468) 
xfs_log_mount_finish (fs/xfs/xfs_log.c:796) 
xfs_mountfs (fs/xfs/xfs_mount.c:934) 
xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) 
get_tree_bdev (fs/super.c:1367) 
vfs_get_tree (fs/super.c:1574) 
path_mount (fs/namespace.c:3386) 
__x64_sys_mount (fs/namespace.c:3584) 
do_syscall_64 (arch/x86/entry/common.c:81) 
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) 
 </TASK>

Allocated by task 2107:
kasan_save_stack (mm/kasan/common.c:46) 
kasan_set_track (mm/kasan/common.c:52) 
__kasan_slab_alloc (mm/kasan/common.c:328) 
kmem_cache_alloc (mm/slub.c:3422) 
xfs_rui_init (./include/linux/slab.h:689 fs/xfs/xfs_rmap_item.c:146) 
xlog_recover_rui_commit_pass2 (fs/xfs/xfs_rmap_item.c:683) 
xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1976) 
xlog_recover_commit_trans (fs/xfs/xfs_log_recover.c:2043) 
xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2274) 
xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2420) 
xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2467) 
xlog_recover_process (fs/xfs/xfs_log_recover.c:2903) 
xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3199) 
xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3279) 
xlog_do_recover (fs/xfs/xfs_log_recover.c:3306) 
xlog_recover (fs/xfs/xfs_log_recover.c:3439) 
xfs_log_mount (fs/xfs/xfs_log.c:717) 
xfs_mountfs (fs/xfs/xfs_mount.c:822) 
xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) 
get_tree_bdev (fs/super.c:1367) 
vfs_get_tree (fs/super.c:1574) 
path_mount (fs/namespace.c:3057) 
__x64_sys_mount (fs/namespace.c:3584) 
do_syscall_64 (arch/x86/entry/common.c:81) 
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) 

Freed by task 2107:
kasan_save_stack (mm/kasan/common.c:46) 
kasan_set_track (mm/kasan/common.c:52) 
kasan_save_free_info (mm/kasan/generic.c:518) 
____kasan_slab_free (mm/kasan/common.c:200) 
kmem_cache_free (mm/slub.c:3683) 
xfs_rui_item_free (fs/xfs/xfs_rmap_item.c:43) 
xfs_rui_release (fs/xfs/xfs_rmap_item.c:61) 
xfs_rud_item_release (fs/xfs/xfs_rmap_item.c:207) 
xfs_trans_free_items (fs/xfs/xfs_trans.c:711) 
xfs_trans_cancel (fs/xfs/xfs_trans.c:1117) 
xfs_rui_item_recover (fs/xfs/xfs_rmap_item.c:586) 
xlog_recover_process_intents (fs/xfs/xfs_log_recover.c:2590) 
xlog_recover_finish (fs/xfs/xfs_log_recover.c:3459) 
xfs_log_mount_finish (fs/xfs/xfs_log.c:796) 
xfs_mountfs (fs/xfs/xfs_mount.c:934) 
xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) 
get_tree_bdev (fs/super.c:1367) 
vfs_get_tree (fs/super.c:1574) 
path_mount (fs/namespace.c:3386) 
__x64_sys_mount (fs/namespace.c:3584) 
do_syscall_64 (arch/x86/entry/common.c:81) 
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) 

The buggy address belongs to the object at ffff88802c268330
 which belongs to the cache xfs_rui_item of size 688
The buggy address is located 96 bytes inside of
 688-byte region [ffff88802c268330, ffff88802c2685e0)



I'd say the following 6.1 patches from the current backport-series

[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover
[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover

transform the crash a little bit to a NULL pointer dereference in the same
location but do not suppress the problem. It is reproducible above the
relevant top of queue/6.1 of linux-stable-rc.git


Thanks,
Fedor

> 
> [ 6.1: resovled conflict in xfs_defer.c ]
> 
> One thing I never quite got around to doing is porting the log intent
> item recovery code to reconstruct the deferred pending work state.  As a
> result, each intent item open codes xfs_defer_finish_one in its recovery
> method, because that's what the EFI code did before xfs_defer.c even
> existed.
> 
> This is a gross thing to have left unfixed -- if an EFI cannot proceed
> due to busy extents, we end up creating separate new EFIs for each
> unfinished work item, which is a change in behavior from what runtime
> would have done.
> 
> Worse yet, Long Li pointed out that there's a UAF in the recovery code.
> The ->commit_pass2 function adds the intent item to the AIL and drops
> the refcount.  The one remaining refcount is now owned by the recovery
> mechanism (aka the log intent items in the AIL) with the intent of
> giving the refcount to the intent done item in the ->iop_recover
> function.
> 
> However, if something fails later in recovery, xlog_recover_finish will
> walk the recovered intent items in the AIL and release them.  If the CIL
> hasn't been pushed before that point (which is possible since we don't
> force the log until later) then the intent done release will try to free
> its associated intent, which has already been freed.
> 
> This patch starts to address this mess by having the ->commit_pass2
> functions recreate the xfs_defer_pending state.  The next few patches
> will fix the recovery functions.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
> Acked-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> Acked-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_defer.c       | 103 +++++++++++++++++++++--------
>  fs/xfs/libxfs/xfs_defer.h       |   5 ++
>  fs/xfs/libxfs/xfs_log_recover.h |   3 +
>  fs/xfs/xfs_attr_item.c          |  10 +--
>  fs/xfs/xfs_bmap_item.c          |   9 +--
>  fs/xfs/xfs_extfree_item.c       |   9 +--
>  fs/xfs/xfs_log.c                |   1 +
>  fs/xfs/xfs_log_priv.h           |   1 +
>  fs/xfs/xfs_log_recover.c        | 113 ++++++++++++++++----------------
>  fs/xfs/xfs_refcount_item.c      |   9 +--
>  fs/xfs/xfs_rmap_item.c          |   9 +--
>  11 files changed, 157 insertions(+), 115 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 92470ed3fcbd..64005ea1e8af 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -243,37 +243,66 @@ xfs_defer_create_intents(
>  		ret |= ret2;
>  	}
>  	return ret;
>  }
>  
> -STATIC void
> +static inline void
>  xfs_defer_pending_abort(
> +	struct xfs_mount		*mp,
> +	struct xfs_defer_pending	*dfp)
> +{
> +	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
> +
> +	trace_xfs_defer_pending_abort(mp, dfp);
> +
> +	if (dfp->dfp_intent && !dfp->dfp_done) {
> +		ops->abort_intent(dfp->dfp_intent);
> +		dfp->dfp_intent = NULL;
> +	}
> +}
> +
> +static inline void
> +xfs_defer_pending_cancel_work(
> +	struct xfs_mount		*mp,
> +	struct xfs_defer_pending	*dfp)
> +{
> +	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
> +	struct list_head		*pwi;
> +	struct list_head		*n;
> +
> +	trace_xfs_defer_cancel_list(mp, dfp);
> +
> +	list_del(&dfp->dfp_list);
> +	list_for_each_safe(pwi, n, &dfp->dfp_work) {
> +		list_del(pwi);
> +		dfp->dfp_count--;
> +		ops->cancel_item(pwi);
> +	}
> +	ASSERT(dfp->dfp_count == 0);
> +	kmem_cache_free(xfs_defer_pending_cache, dfp);
> +}
> +
> +STATIC void
> +xfs_defer_pending_abort_list(
>  	struct xfs_mount		*mp,
>  	struct list_head		*dop_list)
>  {
>  	struct xfs_defer_pending	*dfp;
> -	const struct xfs_defer_op_type	*ops;
>  
>  	/* Abort intent items that don't have a done item. */
> -	list_for_each_entry(dfp, dop_list, dfp_list) {
> -		ops = defer_op_types[dfp->dfp_type];
> -		trace_xfs_defer_pending_abort(mp, dfp);
> -		if (dfp->dfp_intent && !dfp->dfp_done) {
> -			ops->abort_intent(dfp->dfp_intent);
> -			dfp->dfp_intent = NULL;
> -		}
> -	}
> +	list_for_each_entry(dfp, dop_list, dfp_list)
> +		xfs_defer_pending_abort(mp, dfp);
>  }
>  
>  /* Abort all the intents that were committed. */
>  STATIC void
>  xfs_defer_trans_abort(
>  	struct xfs_trans		*tp,
>  	struct list_head		*dop_pending)
>  {
>  	trace_xfs_defer_trans_abort(tp, _RET_IP_);
> -	xfs_defer_pending_abort(tp->t_mountp, dop_pending);
> +	xfs_defer_pending_abort_list(tp->t_mountp, dop_pending);
>  }
>  
>  /*
>   * Capture resources that the caller said not to release ("held") when the
>   * transaction commits.  Caller is responsible for zero-initializing @dres.
> @@ -387,30 +416,17 @@ xfs_defer_cancel_list(
>  	struct xfs_mount		*mp,
>  	struct list_head		*dop_list)
>  {
>  	struct xfs_defer_pending	*dfp;
>  	struct xfs_defer_pending	*pli;
> -	struct list_head		*pwi;
> -	struct list_head		*n;
> -	const struct xfs_defer_op_type	*ops;
>  
>  	/*
>  	 * Free the pending items.  Caller should already have arranged
>  	 * for the intent items to be released.
>  	 */
> -	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) {
> -		ops = defer_op_types[dfp->dfp_type];
> -		trace_xfs_defer_cancel_list(mp, dfp);
> -		list_del(&dfp->dfp_list);
> -		list_for_each_safe(pwi, n, &dfp->dfp_work) {
> -			list_del(pwi);
> -			dfp->dfp_count--;
> -			ops->cancel_item(pwi);
> -		}
> -		ASSERT(dfp->dfp_count == 0);
> -		kmem_cache_free(xfs_defer_pending_cache, dfp);
> -	}
> +	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list)
> +		xfs_defer_pending_cancel_work(mp, dfp);
>  }
>  
>  /*
>   * Prevent a log intent item from pinning the tail of the log by logging a
>   * done item to release the intent item; and then log a new intent item.
> @@ -661,10 +677,43 @@ xfs_defer_add(
>  
>  	list_add_tail(li, &dfp->dfp_work);
>  	dfp->dfp_count++;
>  }
>  
> +/*
> + * Create a pending deferred work item to replay the recovered intent item
> + * and add it to the list.
> + */
> +void
> +xfs_defer_start_recovery(
> +	struct xfs_log_item		*lip,
> +	enum xfs_defer_ops_type		dfp_type,
> +	struct list_head		*r_dfops)
> +{
> +	struct xfs_defer_pending	*dfp;
> +
> +	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> +			GFP_NOFS | __GFP_NOFAIL);
> +	dfp->dfp_type = dfp_type;
> +	dfp->dfp_intent = lip;
> +	INIT_LIST_HEAD(&dfp->dfp_work);
> +	list_add_tail(&dfp->dfp_list, r_dfops);
> +}
> +
> +/*
> + * Cancel a deferred work item created to recover a log intent item.  @dfp
> + * will be freed after this function returns.
> + */
> +void
> +xfs_defer_cancel_recovery(
> +	struct xfs_mount		*mp,
> +	struct xfs_defer_pending	*dfp)
> +{
> +	xfs_defer_pending_abort(mp, dfp);
> +	xfs_defer_pending_cancel_work(mp, dfp);
> +}
> +
>  /*
>   * Move deferred ops from one transaction to another and reset the source to
>   * initial state. This is primarily used to carry state forward across
>   * transaction rolls with pending dfops.
>   */
> @@ -765,11 +814,11 @@ xfs_defer_ops_capture_abort(
>  	struct xfs_mount		*mp,
>  	struct xfs_defer_capture	*dfc)
>  {
>  	unsigned short			i;
>  
> -	xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
> +	xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops);
>  	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
>  
>  	for (i = 0; i < dfc->dfc_held.dr_bufs; i++)
>  		xfs_buf_relse(dfc->dfc_held.dr_bp[i]);
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 8788ad5f6a73..5dce938ba3d5 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -123,9 +123,14 @@ void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>  		struct xfs_defer_resources *dres);
>  void xfs_defer_ops_capture_abort(struct xfs_mount *mp,
>  		struct xfs_defer_capture *d);
>  void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
>  
> +void xfs_defer_start_recovery(struct xfs_log_item *lip,
> +		enum xfs_defer_ops_type dfp_type, struct list_head *r_dfops);
> +void xfs_defer_cancel_recovery(struct xfs_mount *mp,
> +		struct xfs_defer_pending *dfp);
> +
>  int __init xfs_defer_init_item_caches(void);
>  void xfs_defer_destroy_item_caches(void);
>  
>  #endif /* __XFS_DEFER_H__ */
> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> index a5100a11faf9..271a4ce7375c 100644
> --- a/fs/xfs/libxfs/xfs_log_recover.h
> +++ b/fs/xfs/libxfs/xfs_log_recover.h
> @@ -151,6 +151,9 @@ xlog_recover_resv(const struct xfs_trans_res *r)
>  	};
>  
>  	return ret;
>  }
>  
> +void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
> +		xfs_lsn_t lsn, unsigned int dfp_type);
> +
>  #endif	/* __XFS_LOG_RECOVER_H__ */
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 11e88a76a33c..a32716b8cbbd 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -770,18 +770,12 @@ xlog_recover_attri_commit_pass2(
>  			attri_formatp->alfi_value_len);
>  
>  	attrip = xfs_attri_init(mp, nv);
>  	memcpy(&attrip->attri_format, attri_formatp, len);
>  
> -	/*
> -	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> -	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> -	 * directly and drop the ATTRI reference. Note that
> -	 * xfs_trans_ail_update() drops the AIL lock.
> -	 */
> -	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> -	xfs_attri_release(attrip);
> +	xlog_recover_intent_item(log, &attrip->attri_item, lsn,
> +			XFS_DEFER_OPS_TYPE_ATTR);
>  	xfs_attri_log_nameval_put(nv);
>  	return 0;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 1058603db3ac..8d08252e1953 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -644,16 +644,13 @@ xlog_recover_bui_commit_pass2(
>  	}
>  
>  	buip = xfs_bui_init(mp);
>  	xfs_bui_copy_format(&buip->bui_format, bui_formatp);
>  	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
> -	/*
> -	 * Insert the intent into the AIL directly and drop one reference so
> -	 * that finishing or canceling the work will drop the other.
> -	 */
> -	xfs_trans_ail_insert(log->l_ailp, &buip->bui_item, lsn);
> -	xfs_bui_release(buip);
> +
> +	xlog_recover_intent_item(log, &buip->bui_item, lsn,
> +			XFS_DEFER_OPS_TYPE_BMAP);
>  	return 0;
>  }
>  
>  const struct xlog_recover_item_ops xlog_bui_item_ops = {
>  	.item_type		= XFS_LI_BUI,
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 3ed25c352269..fd9fe51bcc31 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -734,16 +734,13 @@ xlog_recover_efi_commit_pass2(
>  	if (error) {
>  		xfs_efi_item_free(efip);
>  		return error;
>  	}
>  	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
> -	/*
> -	 * Insert the intent into the AIL directly and drop one reference so
> -	 * that finishing or canceling the work will drop the other.
> -	 */
> -	xfs_trans_ail_insert(log->l_ailp, &efip->efi_item, lsn);
> -	xfs_efi_release(efip);
> +
> +	xlog_recover_intent_item(log, &efip->efi_item, lsn,
> +			XFS_DEFER_OPS_TYPE_FREE);
>  	return 0;
>  }
>  
>  const struct xlog_recover_item_ops xlog_efi_item_ops = {
>  	.item_type		= XFS_LI_EFI,
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index ce6b303484cf..d39ee05ac1f2 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1538,10 +1538,11 @@ xlog_alloc_log(
>  	log->l_logBBstart  = blk_offset;
>  	log->l_logBBsize   = num_bblks;
>  	log->l_covered_state = XLOG_STATE_COVER_IDLE;
>  	set_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate);
>  	INIT_DELAYED_WORK(&log->l_work, xfs_log_worker);
> +	INIT_LIST_HEAD(&log->r_dfops);
>  
>  	log->l_prev_block  = -1;
>  	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
>  	xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0);
>  	xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0);
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 1bd2963e8fbd..8677ba92d317 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -401,10 +401,11 @@ struct xlog {
>  	struct workqueue_struct	*l_ioend_workqueue; /* for I/O completions */
>  	struct delayed_work	l_work;		/* background flush work */
>  	long			l_opstate;	/* operational state */
>  	uint			l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */
>  	struct list_head	*l_buf_cancel_table;
> +	struct list_head	r_dfops;	/* recovered log intent items */
>  	int			l_iclog_hsize;  /* size of iclog header */
>  	int			l_iclog_heads;  /* # of iclog header sectors */
>  	uint			l_sectBBsize;   /* sector size in BBs (2^n) */
>  	int			l_iclog_size;	/* size of log in bytes */
>  	int			l_iclog_bufs;	/* number of iclog buffers */
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e009bb23d8a2..65041ed7833d 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1721,34 +1721,28 @@ xlog_clear_stale_blocks(
>   * Release the recovered intent item in the AIL that matches the given intent
>   * type and intent id.
>   */
>  void
>  xlog_recover_release_intent(
> -	struct xlog		*log,
> -	unsigned short		intent_type,
> -	uint64_t		intent_id)
> +	struct xlog			*log,
> +	unsigned short			intent_type,
> +	uint64_t			intent_id)
>  {
> -	struct xfs_ail_cursor	cur;
> -	struct xfs_log_item	*lip;
> -	struct xfs_ail		*ailp = log->l_ailp;
> +	struct xfs_defer_pending	*dfp, *n;
> +
> +	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
> +		struct xfs_log_item	*lip = dfp->dfp_intent;
>  
> -	spin_lock(&ailp->ail_lock);
> -	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip != NULL;
> -	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
>  		if (lip->li_type != intent_type)
>  			continue;
>  		if (!lip->li_ops->iop_match(lip, intent_id))
>  			continue;
>  
> -		spin_unlock(&ailp->ail_lock);
> -		lip->li_ops->iop_release(lip);
> -		spin_lock(&ailp->ail_lock);
> -		break;
> -	}
> +		ASSERT(xlog_item_is_intent(lip));
>  
> -	xfs_trans_ail_cursor_done(&cur);
> -	spin_unlock(&ailp->ail_lock);
> +		xfs_defer_cancel_recovery(log->l_mp, dfp);
> +	}
>  }
>  
>  int
>  xlog_recover_iget(
>  	struct xfs_mount	*mp,
> @@ -1937,10 +1931,33 @@ xlog_buf_readahead(
>  {
>  	if (!xlog_is_buffer_cancelled(log, blkno, len))
>  		xfs_buf_readahead(log->l_mp->m_ddev_targp, blkno, len, ops);
>  }
>  
> +/*
> + * Create a deferred work structure for resuming and tracking the progress of a
> + * log intent item that was found during recovery.
> + */
> +void
> +xlog_recover_intent_item(
> +	struct xlog			*log,
> +	struct xfs_log_item		*lip,
> +	xfs_lsn_t			lsn,
> +	unsigned int			dfp_type)
> +{
> +	ASSERT(xlog_item_is_intent(lip));
> +
> +	xfs_defer_start_recovery(lip, dfp_type, &log->r_dfops);
> +
> +	/*
> +	 * Insert the intent into the AIL directly and drop one reference so
> +	 * that finishing or canceling the work will drop the other.
> +	 */
> +	xfs_trans_ail_insert(log->l_ailp, lip, lsn);
> +	lip->li_ops->iop_unpin(lip, 0);
> +}
> +
>  STATIC int
>  xlog_recover_items_pass2(
>  	struct xlog                     *log,
>  	struct xlog_recover             *trans,
>  	struct list_head                *buffer_list,
> @@ -2534,104 +2551,88 @@ xlog_abort_defer_ops(
>   * have started recovery on all the pending intents when we find an non-intent
>   * item in the AIL.
>   */
>  STATIC int
>  xlog_recover_process_intents(
> -	struct xlog		*log)
> +	struct xlog			*log)
>  {
>  	LIST_HEAD(capture_list);
> -	struct xfs_ail_cursor	cur;
> -	struct xfs_log_item	*lip;
> -	struct xfs_ail		*ailp;
> -	int			error = 0;
> +	struct xfs_defer_pending	*dfp, *n;
> +	int				error = 0;
>  #if defined(DEBUG) || defined(XFS_WARN)
> -	xfs_lsn_t		last_lsn;
> -#endif
> +	xfs_lsn_t			last_lsn;
>  
> -	ailp = log->l_ailp;
> -	spin_lock(&ailp->ail_lock);
> -#if defined(DEBUG) || defined(XFS_WARN)
>  	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
>  #endif
> -	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> -	     lip != NULL;
> -	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
> -		const struct xfs_item_ops	*ops;
>  
> -		if (!xlog_item_is_intent(lip))
> -			break;
> +	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
> +		struct xfs_log_item	*lip = dfp->dfp_intent;
> +		const struct xfs_item_ops *ops = lip->li_ops;
> +
> +		ASSERT(xlog_item_is_intent(lip));
>  
>  		/*
>  		 * We should never see a redo item with a LSN higher than
>  		 * the last transaction we found in the log at the start
>  		 * of recovery.
>  		 */
>  		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
>  
>  		/*
>  		 * NOTE: If your intent processing routine can create more
>  		 * deferred ops, you /must/ attach them to the capture list in
>  		 * the recover routine or else those subsequent intents will be
>  		 * replayed in the wrong order!
>  		 *
>  		 * The recovery function can free the log item, so we must not
>  		 * access lip after it returns.
>  		 */
> -		spin_unlock(&ailp->ail_lock);
> -		ops = lip->li_ops;
>  		error = ops->iop_recover(lip, &capture_list);
> -		spin_lock(&ailp->ail_lock);
>  		if (error) {
>  			trace_xlog_intent_recovery_failed(log->l_mp, error,
>  					ops->iop_recover);
>  			break;
>  		}
> -	}
>  
> -	xfs_trans_ail_cursor_done(&cur);
> -	spin_unlock(&ailp->ail_lock);
> +		/*
> +		 * XXX: @lip could have been freed, so detach the log item from
> +		 * the pending item before freeing the pending item.  This does
> +		 * not fix the existing UAF bug that occurs if ->iop_recover
> +		 * fails after creating the intent done item.
> +		 */
> +		dfp->dfp_intent = NULL;
> +		xfs_defer_cancel_recovery(log->l_mp, dfp);
> +	}
>  	if (error)
>  		goto err;
>  
>  	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
>  	if (error)
>  		goto err;
>  
>  	return 0;
>  err:
>  	xlog_abort_defer_ops(log->l_mp, &capture_list);
>  	return error;
>  }
>  
>  /*
>   * A cancel occurs when the mount has failed and we're bailing out.  Release all
>   * pending log intent items that we haven't started recovery on so they don't
>   * pin the AIL.
>   */
>  STATIC void
>  xlog_recover_cancel_intents(
> -	struct xlog		*log)
> +	struct xlog			*log)
>  {
> -	struct xfs_log_item	*lip;
> -	struct xfs_ail_cursor	cur;
> -	struct xfs_ail		*ailp;
> -
> -	ailp = log->l_ailp;
> -	spin_lock(&ailp->ail_lock);
> -	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> -	while (lip != NULL) {
> -		if (!xlog_item_is_intent(lip))
> -			break;
> +	struct xfs_defer_pending	*dfp, *n;
>  
> -		spin_unlock(&ailp->ail_lock);
> -		lip->li_ops->iop_release(lip);
> -		spin_lock(&ailp->ail_lock);
> -		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> -	}
> +	list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
> +		ASSERT(xlog_item_is_intent(dfp->dfp_intent));
>  
> -	xfs_trans_ail_cursor_done(&cur);
> -	spin_unlock(&ailp->ail_lock);
> +		xfs_defer_cancel_recovery(log->l_mp, dfp);
> +	}
>  }
>  
>  /*
>   * This routine performs a transaction to null out a bad inode pointer
>   * in an agi unlinked inode hash bucket.
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index dfd7b824e32b..1e047107d2f2 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -666,16 +666,13 @@ xlog_recover_cui_commit_pass2(
>  	}
>  
>  	cuip = xfs_cui_init(mp, cui_formatp->cui_nextents);
>  	xfs_cui_copy_format(&cuip->cui_format, cui_formatp);
>  	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
> -	/*
> -	 * Insert the intent into the AIL directly and drop one reference so
> -	 * that finishing or canceling the work will drop the other.
> -	 */
> -	xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
> -	xfs_cui_release(cuip);
> +
> +	xlog_recover_intent_item(log, &cuip->cui_item, lsn,
> +			XFS_DEFER_OPS_TYPE_REFCOUNT);
>  	return 0;
>  }
>  
>  const struct xlog_recover_item_ops xlog_cui_item_ops = {
>  	.item_type		= XFS_LI_CUI,
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 2043cea261c0..12ae8ab6a69d 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -680,16 +680,13 @@ xlog_recover_rui_commit_pass2(
>  	}
>  
>  	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
>  	xfs_rui_copy_format(&ruip->rui_format, rui_formatp);
>  	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
> -	/*
> -	 * Insert the intent into the AIL directly and drop one reference so
> -	 * that finishing or canceling the work will drop the other.
> -	 */
> -	xfs_trans_ail_insert(log->l_ailp, &ruip->rui_item, lsn);
> -	xfs_rui_release(ruip);
> +
> +	xlog_recover_intent_item(log, &ruip->rui_item, lsn,
> +			XFS_DEFER_OPS_TYPE_RMAP);
>  	return 0;
>  }
>  
>  const struct xlog_recover_item_ops xlog_rui_item_ops = {
>  	.item_type		= XFS_LI_RUI,
> -- 
> 2.49.0.rc1.451.g8f38331e32-goog

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-21  8:39   ` Fedor Pchelkin
@ 2025-03-21 17:42     ` Leah Rumancik
  2025-03-22 14:27       ` Fedor Pchelkin
  0 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-21 17:42 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

Hey Fedor,

Thanks a bunch for the report! I don't see xfs/235 running on my
setup. I will look into why and see if I can repro.

Few questions,

Were you able to confirm that e5f1a5146ec3 fixes the issue on 6.1.y?
If so, we can just port this patch, otherwise we might want to drop
the xfs set while we investigate further.

Also, the backport set you mentioned was based on a set from 6.6.y. I
don't see the suggested fix (e5f1a5146ec3) there either. If it's not
too much hassle, could you see if we have the same problem for 6.6.y
as well?

Best,
Leah

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-21 17:42     ` Leah Rumancik
@ 2025-03-22 14:27       ` Fedor Pchelkin
  2025-03-24  0:29         ` Leah Rumancik
  0 siblings, 1 reply; 45+ messages in thread
From: Fedor Pchelkin @ 2025-03-22 14:27 UTC (permalink / raw)
  To: Leah Rumancik
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

On Fri, 21. Mar 10:42, Leah Rumancik wrote:
> Hey Fedor,
> 
> Thanks a bunch for the report! I don't see xfs/235 running on my
> setup. I will look into why and see if I can repro.
> 
> Few questions,
> 
> Were you able to confirm that e5f1a5146ec3 fixes the issue on 6.1.y?

Oh, it probably wasn't obvious from my initial report but the problem
concerns only 6.1.132-rc1 at the moment. It's in testing phase and hasn't
been finally released yet. So it hasn't reached 6.1.y.

Unfortunately, it's hard to port the proposed fix directly on top of
6.1.132-rc1 since it depends on a number of non-trivial changes done in
mainline. The conflicts are huge and require a solid amount of expertise
in the code and I'm not ready to do this to be honest.

Well, as I perceive, the reason to port the following patches

[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items
[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover
[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover

to stable (6.1, in particular) is to fix a UAF when intent recovery fails.
This is stated in upstream patchset [1] where the patches come from.

[1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257.stg-ugh@frogsfrogsfrogs/

Taking as a fact that 6.1.y is vulnerable to that bug (though I failed
to find an exact blamed commit), I think that the whole series of 8
patches should be ported, otherwise it looks not complete. But only 4/8
patches have been taken to 6.1-queue and 6.6 so far.

> If so, we can just port this patch, otherwise we might want to drop
> the xfs set while we investigate further.

I'd suggest to drop the xfs set from 6.1-queue for now until the problem
is addressed in some way.

> 
> Also, the backport set you mentioned was based on a set from 6.6.y. I
> don't see the suggested fix (e5f1a5146ec3) there either. If it's not
> too much hassle, could you see if we have the same problem for 6.6.y
> as well?

Yes, the crash occurs there, too. And for 6.6 case it actually is for a
released kernel (since v6.6.24).

The remaining four patches of the original upstream series [1] - one of
which is e5f1a5146ec3 - can be applied there without many problems,
fortunately.

I'll send them to you in a separate thread.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-22 14:27       ` Fedor Pchelkin
@ 2025-03-24  0:29         ` Leah Rumancik
  2025-03-24  8:53           ` Fedor Pchelkin
  0 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-24  0:29 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

Okay so a summary from my understanding, correct me if I'm wrong:

03f7767c9f612 introduced the issue in both 6.1 and 6.6.

On mainline, this is resolved by e5f1a5146ec3. This commit is painful
to apply to 6.1 but does apply to 6.6 along with the rest of the
patchset it was a part of (which is the set you just sent out for
6.6).

With the stable branches we try to balance the risk of introducing new
bugs via huge fixes with the benefit of the fix itself. Especially if
the patches don't apply cleanly, it might not be worth the risk and
effort to do the porting. Hmm, since it seems like we might not even
end up taking 03f7767c9f6120 to stable, I'd propose we just drop
03f7767c9f6120 for now. If the rest of the subsequent patches in the
original set apply cleanly, I don't think we need to drop them all. We
can then try to fix the UAF with a more targeted approach in a later
patch instead of via direct cherry-picks.

 What do you think?

- leah

>
> >
> > Also, the backport set you mentioned was based on a set from 6.6.y. I
> > don't see the suggested fix (e5f1a5146ec3) there either. If it's not
> > too much hassle, could you see if we have the same problem for 6.6.y
> > as well?
>
> Yes, the crash occurs there, too. And for 6.6 case it actually is for a
> released kernel (since v6.6.24).
>
> The remaining four patches of the original upstream series [1] - one of
> which is e5f1a5146ec3 - can be applied there without many problems,
> fortunately.
>
> I'll send them to you in a separate thread.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-24  0:29         ` Leah Rumancik
@ 2025-03-24  8:53           ` Fedor Pchelkin
  2025-03-24 21:10             ` Leah Rumancik
  0 siblings, 1 reply; 45+ messages in thread
From: Fedor Pchelkin @ 2025-03-24  8:53 UTC (permalink / raw)
  To: Leah Rumancik
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

On Sun, 23. Mar 17:29, Leah Rumancik wrote:
> Okay so a summary from my understanding, correct me if I'm wrong:
> 
> 03f7767c9f612 introduced the issue in both 6.1 and 6.6.
> 
> On mainline, this is resolved by e5f1a5146ec3. This commit is painful
> to apply to 6.1 but does apply to 6.6 along with the rest of the
> patchset it was a part of (which is the set you just sent out for
> 6.6).

Yeah, that's all correct.

> 
> With the stable branches we try to balance the risk of introducing new
> bugs via huge fixes with the benefit of the fix itself. Especially if
> the patches don't apply cleanly, it might not be worth the risk and
> effort to do the porting. Hmm, since it seems like we might not even
> end up taking 03f7767c9f6120 to stable, I'd propose we just drop
> 03f7767c9f6120 for now. If the rest of the subsequent patches in the
> original set apply cleanly, I don't think we need to drop them all. We
> can then try to fix the UAF with a more targeted approach in a later
> patch instead of via direct cherry-picks.
> 
>  What do you think?

03f7767c9f6120 is '[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'

Two subsequent patches depend on it logically so should also be dropped:

'[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover'
'[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'


On the other side, '[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items'
which is at the start of the original patchset [1] looks OK to be taken.
It's rather aside from the subsequent rework patches and fixes a pinpoint
bug.

[1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257.stg-ugh@frogsfrogsfrogs/


So I've tried the current xfs backport series with three dropped commits:

[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover
[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover
[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover

(everything before and after that still applies cleanly and touches
other things)

and no regressions seen on my side.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-24  8:53           ` Fedor Pchelkin
@ 2025-03-24 21:10             ` Leah Rumancik
  2025-03-25 11:50               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 45+ messages in thread
From: Leah Rumancik @ 2025-03-24 21:10 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: stable, xfs-stable, Darrick J. Wong, Christoph Hellwig,
	Catherine Hoang, Greg Kroah-Hartman, lvc-project

This sounds good to me.

Greg, can we drop the following patches?

'[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'
'[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover'
'[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in
->iop_recover'

Thanks,
leah

On Mon, Mar 24, 2025 at 1:53 AM Fedor Pchelkin <pchelkin@ispras.ru> wrote:
>
> On Sun, 23. Mar 17:29, Leah Rumancik wrote:
> > Okay so a summary from my understanding, correct me if I'm wrong:
> >
> > 03f7767c9f612 introduced the issue in both 6.1 and 6.6.
> >
> > On mainline, this is resolved by e5f1a5146ec3. This commit is painful
> > to apply to 6.1 but does apply to 6.6 along with the rest of the
> > patchset it was a part of (which is the set you just sent out for
> > 6.6).
>
> Yeah, that's all correct.
>
> >
> > With the stable branches we try to balance the risk of introducing new
> > bugs via huge fixes with the benefit of the fix itself. Especially if
> > the patches don't apply cleanly, it might not be worth the risk and
> > effort to do the porting. Hmm, since it seems like we might not even
> > end up taking 03f7767c9f6120 to stable, I'd propose we just drop
> > 03f7767c9f6120 for now. If the rest of the subsequent patches in the
> > original set apply cleanly, I don't think we need to drop them all. We
> > can then try to fix the UAF with a more targeted approach in a later
> > patch instead of via direct cherry-picks.
> >
> >  What do you think?
>
> 03f7767c9f6120 is '[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'
>
> Two subsequent patches depend on it logically so should also be dropped:
>
> '[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover'
> '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'
>
>
> On the other side, '[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items'
> which is at the start of the original patchset [1] looks OK to be taken.
> It's rather aside from the subsequent rework patches and fixes a pinpoint
> bug.
>
> [1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257.stg-ugh@frogsfrogsfrogs/
>
>
> So I've tried the current xfs backport series with three dropped commits:
>
> [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover
> [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover
> [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
>
> (everything before and after that still applies cleanly and touches
> other things)
>
> and no regressions seen on my side.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items
  2025-03-24 21:10             ` Leah Rumancik
@ 2025-03-25 11:50               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 45+ messages in thread
From: Greg Kroah-Hartman @ 2025-03-25 11:50 UTC (permalink / raw)
  To: Leah Rumancik
  Cc: Fedor Pchelkin, stable, xfs-stable, Darrick J. Wong,
	Christoph Hellwig, Catherine Hoang, lvc-project

On Mon, Mar 24, 2025 at 02:10:03PM -0700, Leah Rumancik wrote:
> This sounds good to me.
> 
> Greg, can we drop the following patches?
> 
> '[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'
> '[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover'
> '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in
> ->iop_recover'

Now dropped, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (13 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover " Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover Leah Rumancik
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit a050acdfa8003a44eae4558fddafc7afb1aef458 ]

Now that log intent item recovery recreates the xfs_defer_pending state,
we should pass that into the ->iop_recover routines so that the intent
item can finish the recreation work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c     | 3 ++-
 fs/xfs/xfs_bmap_item.c     | 3 ++-
 fs/xfs/xfs_extfree_item.c  | 3 ++-
 fs/xfs/xfs_log_recover.c   | 2 +-
 fs/xfs/xfs_refcount_item.c | 3 ++-
 fs/xfs/xfs_rmap_item.c     | 3 ++-
 fs/xfs/xfs_trans.h         | 4 +++-
 7 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index a32716b8cbbd..6119a7a480a0 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -543,13 +543,14 @@ xfs_attri_validate(
  * Process an attr intent item that was recovered from the log.  We need to
  * delete the attr that it describes.
  */
 STATIC int
 xfs_attri_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
 	struct xfs_attr_intent		*attr;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_inode		*ip;
 	struct xfs_da_args		*args;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 8d08252e1953..30c05eb862d6 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -451,15 +451,16 @@ xfs_bui_validate(
  * Process a bmap update intent item that was recovered from the log.
  * We need to update some inode's bmbt.
  */
 STATIC int
 xfs_bui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_bmap_intent		fake = { };
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_map_extent		*map;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index fd9fe51bcc31..6cfd9339f872 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -593,14 +593,15 @@ xfs_efi_validate_ext(
  * Process an extent free intent item that was recovered from
  * the log.  We need to free the extents that it describes.
  */
 STATIC int
 xfs_efi_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
 	int				i;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 65041ed7833d..303bf9728c03 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2584,11 +2584,11 @@ xlog_recover_process_intents(
 		 * replayed in the wrong order!
 		 *
 		 * The recovery function can free the log item, so we must not
 		 * access lip after it returns.
 		 */
-		error = ops->iop_recover(lip, &capture_list);
+		error = ops->iop_recover(dfp, &capture_list);
 		if (error) {
 			trace_xlog_intent_recovery_failed(log->l_mp, error,
 					ops->iop_recover);
 			break;
 		}
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 1e047107d2f2..e158dd9f86b0 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -448,14 +448,15 @@ xfs_cui_validate_phys(
  * Process a refcount update intent item that was recovered from the log.
  * We need to update the refcountbt.
  */
 STATIC int
 xfs_cui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 12ae8ab6a69d..b3b4de68b41b 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -487,14 +487,15 @@ xfs_rui_validate_map(
  * Process an rmap update intent item that was recovered from the log.
  * We need to update the rmapbt.
  */
 STATIC int
 xfs_rui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
 	struct xfs_map_extent		*rmap;
 	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 55819785941c..0f2b62f3ca19 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -64,23 +64,25 @@ struct xfs_log_item {
 	{ (1u << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1u << XFS_LI_FAILED),	"FAILED" }, \
 	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
 	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
 
+struct xfs_defer_pending;
+
 struct xfs_item_ops {
 	unsigned flags;
 	void (*iop_size)(struct xfs_log_item *, int *, int *);
 	void (*iop_format)(struct xfs_log_item *, struct xfs_log_vec *);
 	void (*iop_pin)(struct xfs_log_item *);
 	void (*iop_unpin)(struct xfs_log_item *, int remove);
 	uint64_t (*iop_sort)(struct xfs_log_item *lip);
 	int (*iop_precommit)(struct xfs_trans *tp, struct xfs_log_item *lip);
 	void (*iop_committing)(struct xfs_log_item *lip, xfs_csn_t seq);
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
 	uint (*iop_push)(struct xfs_log_item *, struct list_head *);
 	void (*iop_release)(struct xfs_log_item *);
-	int (*iop_recover)(struct xfs_log_item *lip,
+	int (*iop_recover)(struct xfs_defer_pending *dfp,
 			   struct list_head *capture_list);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
 			struct xfs_trans *tp);
 	struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done);
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (14 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 17/29] xfs: make rextslog computation consistent with mkfs Leah Rumancik
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 ]

Now that we pass the xfs_defer_pending object into the intent item
recovery functions, we know exactly when ownership of the sole refcount
passes from the recovery context to the intent done item.  At that
point, we need to null out dfp_intent so that the recovery mechanism
won't release it.  This should fix the UAF problem reported by Long Li.

Note that we still want to recreate the full deferred work state.  That
will be addressed in the next patches.

Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_recover.h |  2 ++
 fs/xfs/xfs_attr_item.c          |  1 +
 fs/xfs/xfs_bmap_item.c          |  2 ++
 fs/xfs/xfs_extfree_item.c       |  2 ++
 fs/xfs/xfs_log_recover.c        | 19 ++++++++++++-------
 fs/xfs/xfs_refcount_item.c      |  1 +
 fs/xfs/xfs_rmap_item.c          |  2 ++
 7 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 271a4ce7375c..13583df9f239 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -153,7 +153,9 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 	return ret;
 }
 
 void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
 		xfs_lsn_t lsn, unsigned int dfp_type);
+void xlog_recover_transfer_intent(struct xfs_trans *tp,
+		struct xfs_defer_pending *dfp);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 6119a7a480a0..82775e9537df 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -630,10 +630,11 @@ xfs_attri_item_recover(
 	if (error)
 		goto out;
 
 	args->trans = tp;
 	done_item = xfs_trans_get_attrd(tp, attrip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
 	error = xfs_xattri_finish_update(attr, done_item);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 30c05eb862d6..317cf49e6cd6 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -489,10 +489,12 @@ xfs_bui_item_recover(
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		goto err_rele;
 
 	budp = xfs_trans_get_bud(tp, buip);
+	xlog_recover_transfer_intent(tp, dfp);
+
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
 	if (fake.bi_type == XFS_BMAP_MAP)
 		iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6cfd9339f872..9c726f082285 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -624,11 +624,13 @@ xfs_efi_item_recover(
 
 	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
 	error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp);
 	if (error)
 		return error;
+
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		struct xfs_extent_free_item	fake = {
 			.xefi_owner		= XFS_RMAP_OWN_UNKNOWN,
 			.xefi_agresv		= XFS_AG_RESV_NONE,
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 303bf9728c03..e7992a651740 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2591,17 +2591,10 @@ xlog_recover_process_intents(
 			trace_xlog_intent_recovery_failed(log->l_mp, error,
 					ops->iop_recover);
 			break;
 		}
 
-		/*
-		 * XXX: @lip could have been freed, so detach the log item from
-		 * the pending item before freeing the pending item.  This does
-		 * not fix the existing UAF bug that occurs if ->iop_recover
-		 * fails after creating the intent done item.
-		 */
-		dfp->dfp_intent = NULL;
 		xfs_defer_cancel_recovery(log->l_mp, dfp);
 	}
 	if (error)
 		goto err;
 
@@ -2631,10 +2624,22 @@ xlog_recover_cancel_intents(
 
 		xfs_defer_cancel_recovery(log->l_mp, dfp);
 	}
 }
 
+/*
+ * Transfer ownership of the recovered log intent item to the recovery
+ * transaction.
+ */
+void
+xlog_recover_transfer_intent(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	dfp->dfp_intent = NULL;
+}
+
 /*
  * This routine performs a transaction to null out a bad inode pointer
  * in an agi unlinked inode hash bucket.
  */
 STATIC void
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index e158dd9f86b0..2e8bdd598296 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -497,10 +497,11 @@ xfs_cui_item_recover(
 			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
 
 	cudp = xfs_trans_get_cud(tp, cuip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
 		struct xfs_refcount_intent	fake = { };
 		struct xfs_phys_extent		*refc;
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index b3b4de68b41b..f29fadf68ed2 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -524,11 +524,13 @@ xfs_rui_item_recover(
 	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
 	error = xfs_trans_alloc(mp, &resv, mp->m_rmap_maxlevels, 0,
 			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
+
 	rudp = xfs_trans_get_rud(tp, ruip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < ruip->rui_format.rui_nextents; i++) {
 		rmap = &ruip->rui_format.rui_extents[i];
 		state = (rmap->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ?
 				XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 17/29] xfs: make rextslog computation consistent with mkfs
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (15 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 18/29] xfs: fix 32-bit truncation in xfs_compute_rextslog Leah Rumancik
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit a6a38f309afc4a7ede01242b603f36c433997780 ]

There's a weird discrepancy in xfsprogs dating back to the creation of
the Linux port -- if there are zero rt extents, mkfs will set
sb_rextents and sb_rextslog both to zero:

	sbp->sb_rextslog =
		(uint8_t)(rtextents ?
			libxfs_highbit32((unsigned int)rtextents) : 0);

However, that's not the check that xfs_repair uses for nonzero rtblocks:

	if (sb->sb_rextslog !=
			libxfs_highbit32((unsigned int)sb->sb_rextents))

The difference here is that xfs_highbit32 returns -1 if its argument is
zero.  Unfortunately, this means that in the weird corner case of a
realtime volume shorter than 1 rt extent, xfs_repair will immediately
flag a freshly formatted filesystem as corrupt.  Because mkfs has been
writing ondisk artifacts like this for decades, we have to accept that
as "correct".  TBH, zero rextslog for zero rtextents makes more sense to
me anyway.

Regrettably, the superblock verifier checks created in commit copied
xfs_repair even though mkfs has been writing out such filesystems for
ages.  Fix the superblock verifier to accept what mkfs spits out; the
userspace version of this patch will have to fix xfs_repair as well.

Note that the new helper leaves the zeroday bug where the upper 32 bits
of sb_rextents is ripped off and fed to highbit32.  This leads to a
seriously undersized rt summary file, which immediately breaks mkfs:

$ hugedisk.sh foo /dev/sdc $(( 0x100000080 * 4096))B
$ /sbin/mkfs.xfs -f /dev/sda -m rmapbt=0,reflink=0 -r rtdev=/dev/mapper/foo
meta-data=/dev/sda               isize=512    agcount=4, agsize=1298176 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=5192704, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/mapper/foo        extsz=4096   blocks=4294967424, rtextents=4294967424
Discarding blocks...Done.
mkfs.xfs: Error initializing the realtime space [117 - Structure needs cleaning]

The next patch will drop support for rt volumes with fewer than 1 or
more than 2^32-1 rt extents, since they've clearly been broken forever.

Fixes: f8e566c0f5e1f ("xfs: validate the realtime geometry in xfs_validate_sb_common")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c | 13 +++++++++++++
 fs/xfs/libxfs/xfs_rtbitmap.h |  4 ++++
 fs/xfs/libxfs/xfs_sb.c       |  3 ++-
 fs/xfs/xfs_rtalloc.c         |  4 ++--
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 9eb1b5aa7e35..37b425ea3fed 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1128,5 +1128,18 @@ xfs_rtalloc_extent_is_free(
 		return error;
 
 	*is_free = matches;
 	return 0;
 }
+
+/*
+ * Compute the maximum level number of the realtime summary file, as defined by
+ * mkfs.  The use of highbit32 on a 64-bit quantity is a historic artifact that
+ * prohibits correct use of rt volumes with more than 2^32 extents.
+ */
+uint8_t
+xfs_compute_rextslog(
+	xfs_rtbxlen_t		rtextents)
+{
+	return rtextents ? xfs_highbit32(rtextents) : 0;
+}
+
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index c3ef22e67aa3..6becdc7a48ed 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -68,15 +68,19 @@ xfs_rtfree_extent(
 	xfs_extlen_t		len);	/* length of extent freed */
 
 /* Same as above, but in units of rt blocks. */
 int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
 		xfs_filblks_t rtlen);
+
+uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
 # define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
 # define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
 # define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
 # define xfs_rtbuf_get(m,t,b,i,p)			(-ENOSYS)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
+# define xfs_compute_rextslog(rtx)			(0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index d214233ef532..6b87b04d0c6c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -23,10 +23,11 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_refcount_btree.h"
 #include "xfs_da_format.h"
 #include "xfs_health.h"
 #include "xfs_ag.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
  */
 
@@ -500,11 +501,11 @@ xfs_validate_sb_common(
 		rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize);
 		rbmblocks = howmany_64(sbp->sb_rextents,
 				       NBBY * sbp->sb_blocksize);
 
 		if (sbp->sb_rextents != rexts ||
-		    sbp->sb_rextslog != xfs_highbit32(sbp->sb_rextents) ||
+		    sbp->sb_rextslog != xfs_compute_rextslog(rexts) ||
 		    sbp->sb_rbmblocks != rbmblocks) {
 			xfs_notice(mp,
 				"realtime geometry sanity check failed");
 			return -EFSCORRUPTED;
 		}
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 2f2280f4e7fa..eca800e2b879 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -997,11 +997,11 @@ xfs_growfs_rt(
 	 * Calculate new parameters.  These are the final values to be reached.
 	 */
 	nrextents = nrblocks;
 	do_div(nrextents, in->extsize);
 	nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize);
-	nrextslog = xfs_highbit32(nrextents);
+	nrextslog = xfs_compute_rextslog(nrextents);
 	nrsumlevels = nrextslog + 1;
 	nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks;
 	nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
 	nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
 	/*
@@ -1059,11 +1059,11 @@ xfs_growfs_rt(
 				nsbp->sb_rextsize;
 		nsbp->sb_rblocks = min(nrblocks, nrblocks_step);
 		nsbp->sb_rextents = nsbp->sb_rblocks;
 		do_div(nsbp->sb_rextents, nsbp->sb_rextsize);
 		ASSERT(nsbp->sb_rextents != 0);
-		nsbp->sb_rextslog = xfs_highbit32(nsbp->sb_rextents);
+		nsbp->sb_rextslog = xfs_compute_rextslog(nsbp->sb_rextents);
 		nrsumlevels = nmp->m_rsumlevels = nsbp->sb_rextslog + 1;
 		nrsumsize =
 			(uint)sizeof(xfs_suminfo_t) * nrsumlevels *
 			nsbp->sb_rbmblocks;
 		nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 18/29] xfs: fix 32-bit truncation in xfs_compute_rextslog
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (16 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 17/29] xfs: make rextslog computation consistent with mkfs Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 19/29] xfs: don't allow overly small or large realtime volumes Leah Rumancik
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit cf8f0e6c1429be7652869059ea44696b72d5b726 ]

It's quite reasonable that some customer somewhere will want to
configure a realtime volume with more than 2^32 extents.  If they try to
do this, the highbit32() call will truncate the upper bits of the
xfs_rtbxlen_t and produce the wrong value for rextslog.  This in turn
causes the rsumlevels to be wrong, which results in a realtime summary
file that is the wrong length.  Fix that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 37b425ea3fed..8db1243beacc 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1131,15 +1131,17 @@ xfs_rtalloc_extent_is_free(
 	return 0;
 }
 
 /*
  * Compute the maximum level number of the realtime summary file, as defined by
- * mkfs.  The use of highbit32 on a 64-bit quantity is a historic artifact that
- * prohibits correct use of rt volumes with more than 2^32 extents.
+ * mkfs.  The historic use of highbit32 on a 64-bit quantity prohibited correct
+ * use of rt volumes with more than 2^32 extents.
  */
 uint8_t
 xfs_compute_rextslog(
 	xfs_rtbxlen_t		rtextents)
 {
-	return rtextents ? xfs_highbit32(rtextents) : 0;
+	if (!rtextents)
+		return 0;
+	return xfs_highbit64(rtextents);
 }
 
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 19/29] xfs: don't allow overly small or large realtime volumes
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (17 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 18/29] xfs: fix 32-bit truncation in xfs_compute_rextslog Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 20/29] xfs: remove unused fields from struct xbtree_ifakeroot Leah Rumancik
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit e14293803f4e84eb23a417b462b56251033b5a66 ]

Don't allow realtime volumes that are less than one rt extent long.
This has been broken across 4 LTS kernels with nobody noticing, so let's
just disable it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.h | 13 +++++++++++++
 fs/xfs/libxfs/xfs_sb.c       |  3 ++-
 fs/xfs/xfs_rtalloc.c         |  2 ++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 6becdc7a48ed..4e49aadf0955 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -71,16 +71,29 @@ xfs_rtfree_extent(
 int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
 		xfs_filblks_t rtlen);
 
 uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
 
+/* Do we support an rt volume having this number of rtextents? */
+static inline bool
+xfs_validate_rtextents(
+	xfs_rtbxlen_t		rtextents)
+{
+	/* No runt rt volumes */
+	if (rtextents == 0)
+		return false;
+
+	return true;
+}
+
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
 # define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
 # define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
 # define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
 # define xfs_rtbuf_get(m,t,b,i,p)			(-ENOSYS)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
 # define xfs_compute_rextslog(rtx)			(0)
+# define xfs_validate_rtextents(rtx)			(false)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 6b87b04d0c6c..04247d1c7523 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -500,11 +500,12 @@ xfs_validate_sb_common(
 
 		rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize);
 		rbmblocks = howmany_64(sbp->sb_rextents,
 				       NBBY * sbp->sb_blocksize);
 
-		if (sbp->sb_rextents != rexts ||
+		if (!xfs_validate_rtextents(rexts) ||
+		    sbp->sb_rextents != rexts ||
 		    sbp->sb_rextslog != xfs_compute_rextslog(rexts) ||
 		    sbp->sb_rbmblocks != rbmblocks) {
 			xfs_notice(mp,
 				"realtime geometry sanity check failed");
 			return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index eca800e2b879..2dcd5cca4ec2 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -996,10 +996,12 @@ xfs_growfs_rt(
 	/*
 	 * Calculate new parameters.  These are the final values to be reached.
 	 */
 	nrextents = nrblocks;
 	do_div(nrextents, in->extsize);
+	if (!xfs_validate_rtextents(nrextents))
+		return -EINVAL;
 	nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize);
 	nrextslog = xfs_compute_rextslog(nrextents);
 	nrsumlevels = nrextslog + 1;
 	nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks;
 	nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 20/29] xfs: remove unused fields from struct xbtree_ifakeroot
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (18 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 19/29] xfs: don't allow overly small or large realtime volumes Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 21/29] xfs: recompute growfsrtfree transaction reservation while growing rt volume Leah Rumancik
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 4c8ecd1cfdd01fb727121035014d9f654a30bdf2 ]

Remove these unused fields since nobody uses them.  They should have
been removed years ago in a different cleanup series from Christoph
Hellwig.

Fixes: daf83964a3681 ("xfs: move the per-fork nextents fields into struct xfs_ifork")
Fixes: f7e67b20ecbbc ("xfs: move the fork format fields into struct xfs_ifork")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree_staging.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree_staging.h b/fs/xfs/libxfs/xfs_btree_staging.h
index f0d2976050ae..5f638f711246 100644
--- a/fs/xfs/libxfs/xfs_btree_staging.h
+++ b/fs/xfs/libxfs/xfs_btree_staging.h
@@ -35,16 +35,10 @@ struct xbtree_ifakeroot {
 	/* Height of the new btree. */
 	unsigned int		if_levels;
 
 	/* Number of bytes available for this fork in the inode. */
 	unsigned int		if_fork_size;
-
-	/* Fork format. */
-	unsigned int		if_format;
-
-	/* Number of records. */
-	unsigned int		if_extents;
 };
 
 /* Cursor interactions with fake roots for inode-rooted btrees. */
 void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
 		struct xbtree_ifakeroot *ifake,
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 21/29] xfs: recompute growfsrtfree transaction reservation while growing rt volume
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (19 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 20/29] xfs: remove unused fields from struct xbtree_ifakeroot Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 22/29] xfs: force all buffers to be written during btree bulk load Leah Rumancik
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 578bd4ce7100ae34f98c6b0147fe75cfa0dadbac ]

While playing with growfs to create a 20TB realtime section on a
filesystem that didn't previously have an rt section, I noticed that
growfs would occasionally shut down the log due to a transaction
reservation overflow.

xfs_calc_growrtfree_reservation uses the current size of the realtime
summary file (m_rsumsize) to compute the transaction reservation for a
growrtfree transaction.  The reservations are computed at mount time,
which means that m_rsumsize is zero when growfs starts "freeing" the new
realtime extents into the rt volume.  As a result, the transaction is
undersized and fails.

Fix this by recomputing the transaction reservations every time we
change m_rsumsize.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 2dcd5cca4ec2..7c5134899634 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1068,10 +1068,13 @@ xfs_growfs_rt(
 		nrsumsize =
 			(uint)sizeof(xfs_suminfo_t) * nrsumlevels *
 			nsbp->sb_rbmblocks;
 		nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
 		nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
+		/* recompute growfsrt reservation from new rsumsize */
+		xfs_trans_resv_calc(nmp, &nmp->m_resv);
+
 		/*
 		 * Start a transaction, get the log reservation.
 		 */
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0,
 				&tp);
@@ -1151,10 +1154,12 @@ xfs_growfs_rt(
 		/*
 		 * Update mp values into the real mp structure.
 		 */
 		mp->m_rsumlevels = nrsumlevels;
 		mp->m_rsumsize = nrsumsize;
+		/* recompute growfsrt reservation from new rsumsize */
+		xfs_trans_resv_calc(mp, &mp->m_resv);
 
 		error = xfs_trans_commit(tp);
 		if (error)
 			break;
 
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 22/29] xfs: force all buffers to be written during btree bulk load
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (20 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 21/29] xfs: recompute growfsrtfree transaction reservation while growing rt volume Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 23/29] xfs: initialise di_crc in xfs_log_dinode Leah Rumancik
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable; +Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12 ]

While stress-testing online repair of btrees, I noticed periodic
assertion failures from the buffer cache about buffers with incorrect
DELWRI_Q state.  Looking further, I observed this race between the AIL
trying to write out a btree block and repair zapping a btree block after
the fact:

AIL:    Repair0:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

delwri_submit   # oops

Worse yet, I discovered that running the same repair over and over in a
tight loop can result in a second race that cause data integrity
problems with the repair:

AIL:    Repair0:        Repair1:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

                        find free space X
                        get buffer
                        rewrite buffer
                        delwri_queue:
                        set DELWRI_Q
                        already on a list, do not add
                        commit

                        BAD: committed tree root before all blocks written

delwri_submit   # too late now

I traced this to my own misunderstanding of how the delwri lists work,
particularly with regards to the AIL's buffer list.  If a buffer is
logged and committed, the buffer can end up on that AIL buffer list.  If
btree repairs are run twice in rapid succession, it's possible that the
first repair will invalidate the buffer and free it before the next time
the AIL wakes up.  Marking the buffer stale clears DELWRI_Q from the
buffer state without removing the buffer from its delwri list.  The
buffer doesn't know which list it's on, so it cannot know which lock to
take to protect the list for a removal.

If the second repair allocates the same block, it will then recycle the
buffer to start writing the new btree block.  Meanwhile, if the AIL
wakes up and walks the buffer list, it will ignore the buffer because it
can't lock it, and go back to sleep.

When the second repair calls delwri_queue to put the buffer on the
list of buffers to write before committing the new btree, it will set
DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
buffer list, it won't add it to the bulkload buffer's list.

This is incorrect, because the bulkload caller relies on delwri_submit
to ensure that all the buffers have been sent to disk /before/
committing the new btree root pointer.  This ordering requirement is
required for data consistency.

Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally
drop it, so the next thread to walk through the btree will trip over a
debug assertion on that flag.

To fix this, create a new function that waits for the buffer to be
removed from any other delwri lists before adding the buffer to the
caller's delwri list.  By waiting for the buffer to clear both the
delwri list and any potential delwri wait list, we can be sure that
repair will initiate writes of all buffers and report all write errors
back to userspace instead of committing the new structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree_staging.c |  4 +--
 fs/xfs/xfs_buf.c                  | 44 ++++++++++++++++++++++++++++---
 fs/xfs/xfs_buf.h                  |  1 +
 3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
index dd75e208b543..29e3f8ccb185 100644
--- a/fs/xfs/libxfs/xfs_btree_staging.c
+++ b/fs/xfs/libxfs/xfs_btree_staging.c
@@ -340,13 +340,11 @@ xfs_btree_bload_drop_buf(
 	struct xfs_buf		**bpp)
 {
 	if (*bpp == NULL)
 		return;
 
-	if (!xfs_buf_delwri_queue(*bpp, buffers_list))
-		ASSERT(0);
-
+	xfs_buf_delwri_queue_here(*bpp, buffers_list);
 	xfs_buf_relse(*bpp);
 	*bpp = NULL;
 }
 
 /*
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 54c774af6e1c..257945cdf63b 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2038,28 +2038,36 @@ xfs_alloc_buftarg(
 error_free:
 	kmem_free(btp);
 	return NULL;
 }
 
+static inline void
+xfs_buf_list_del(
+	struct xfs_buf		*bp)
+{
+	list_del_init(&bp->b_list);
+	wake_up_var(&bp->b_list);
+}
+
 /*
  * Cancel a delayed write list.
  *
  * Remove each buffer from the list, clear the delwri queue flag and drop the
  * associated buffer reference.
  */
 void
 xfs_buf_delwri_cancel(
 	struct list_head	*list)
 {
 	struct xfs_buf		*bp;
 
 	while (!list_empty(list)) {
 		bp = list_first_entry(list, struct xfs_buf, b_list);
 
 		xfs_buf_lock(bp);
 		bp->b_flags &= ~_XBF_DELWRI_Q;
-		list_del_init(&bp->b_list);
+		xfs_buf_list_del(bp);
 		xfs_buf_relse(bp);
 	}
 }
 
 /*
@@ -2108,10 +2116,38 @@ xfs_buf_delwri_queue(
 	}
 
 	return true;
 }
 
+/*
+ * Queue a buffer to this delwri list as part of a data integrity operation.
+ * If the buffer is on any other delwri list, we'll wait for that to clear
+ * so that the caller can submit the buffer for IO and wait for the result.
+ * Callers must ensure the buffer is not already on the list.
+ */
+void
+xfs_buf_delwri_queue_here(
+	struct xfs_buf		*bp,
+	struct list_head	*buffer_list)
+{
+	/*
+	 * We need this buffer to end up on the /caller's/ delwri list, not any
+	 * old list.  This can happen if the buffer is marked stale (which
+	 * clears DELWRI_Q) after the AIL queues the buffer to its list but
+	 * before the AIL has a chance to submit the list.
+	 */
+	while (!list_empty(&bp->b_list)) {
+		xfs_buf_unlock(bp);
+		wait_var_event(&bp->b_list, list_empty(&bp->b_list));
+		xfs_buf_lock(bp);
+	}
+
+	ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+
+	xfs_buf_delwri_queue(bp, buffer_list);
+}
+
 /*
  * Compare function is more complex than it needs to be because
  * the return value is only 32 bits and we are doing comparisons
  * on 64 bit values
  */
@@ -2170,31 +2206,31 @@ xfs_buf_delwri_submit_buffers(
 		 * marked it stale in the meantime.  In that case only the
 		 * _XBF_DELWRI_Q flag got cleared, and we have to drop the
 		 * reference and remove it from the list here.
 		 */
 		if (!(bp->b_flags & _XBF_DELWRI_Q)) {
-			list_del_init(&bp->b_list);
+			xfs_buf_list_del(bp);
 			xfs_buf_relse(bp);
 			continue;
 		}
 
 		trace_xfs_buf_delwri_split(bp, _RET_IP_);
 
 		/*
 		 * If we have a wait list, each buffer (and associated delwri
 		 * queue reference) transfers to it and is submitted
 		 * synchronously. Otherwise, drop the buffer from the delwri
 		 * queue and submit async.
 		 */
 		bp->b_flags &= ~_XBF_DELWRI_Q;
 		bp->b_flags |= XBF_WRITE;
 		if (wait_list) {
 			bp->b_flags &= ~XBF_ASYNC;
 			list_move_tail(&bp->b_list, wait_list);
 		} else {
 			bp->b_flags |= XBF_ASYNC;
-			list_del_init(&bp->b_list);
+			xfs_buf_list_del(bp);
 		}
 		__xfs_buf_submit(bp, false);
 	}
 	blk_finish_plug(&plug);
 
@@ -2244,11 +2280,11 @@ xfs_buf_delwri_submit(
 
 	/* Wait for IO to complete. */
 	while (!list_empty(&wait_list)) {
 		bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
 
-		list_del_init(&bp->b_list);
+		xfs_buf_list_del(bp);
 
 		/*
 		 * Wait on the locked buffer, check for errors and unlock and
 		 * release the delwri queue reference.
 		 */
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..6cf0332ba62c 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -303,10 +303,11 @@ extern void *xfs_buf_offset(struct xfs_buf *, size_t);
 extern void xfs_buf_stale(struct xfs_buf *bp);
 
 /* Delayed Write Buffer Routines */
 extern void xfs_buf_delwri_cancel(struct list_head *);
 extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
+void xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *bl);
 extern int xfs_buf_delwri_submit(struct list_head *);
 extern int xfs_buf_delwri_submit_nowait(struct list_head *);
 extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
 
 static inline xfs_daddr_t xfs_buf_daddr(struct xfs_buf *bp)
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 23/29] xfs: initialise di_crc in xfs_log_dinode
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (21 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 22/29] xfs: force all buffers to be written during btree bulk load Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 24/29] xfs: add lock protection when remove perag from radix tree Leah Rumancik
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Dave Chinner, Alexander Potapenko, Darrick J. Wong,
	Chandan Babu R, Leah Rumancik

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 0573676fdde7ce3829ee6a42a8e5a56355234712 ]

Alexander Potapenko report that KMSAN was issuing these warnings:

kmalloc-ed xlog buffer of size 512 : ffff88802fc26200
kmalloc-ed xlog buffer of size 368 : ffff88802fc24a00
kmalloc-ed xlog buffer of size 648 : ffff88802b631000
kmalloc-ed xlog buffer of size 648 : ffff88802b632800
kmalloc-ed xlog buffer of size 648 : ffff88802b631c00
xlog_write_iovec: copying 12 bytes from ffff888017ddbbd8 to ffff88802c300400
xlog_write_iovec: copying 28 bytes from ffff888017ddbbe4 to ffff88802c30040c
xlog_write_iovec: copying 68 bytes from ffff88802fc26274 to ffff88802c300428
xlog_write_iovec: copying 188 bytes from ffff88802fc262bc to ffff88802c30046c
=====================================================
BUG: KMSAN: uninit-value in xlog_write_iovec fs/xfs/xfs_log.c:2227
BUG: KMSAN: uninit-value in xlog_write_full fs/xfs/xfs_log.c:2263
BUG: KMSAN: uninit-value in xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532
 xlog_write_iovec fs/xfs/xfs_log.c:2227
 xlog_write_full fs/xfs/xfs_log.c:2263
 xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532
 xlog_cil_write_chain fs/xfs/xfs_log_cil.c:918
 xlog_cil_push_work+0x30f2/0x44e0 fs/xfs/xfs_log_cil.c:1263
 process_one_work kernel/workqueue.c:2630
 process_scheduled_works+0x1188/0x1e30 kernel/workqueue.c:2703
 worker_thread+0xee5/0x14f0 kernel/workqueue.c:2784
 kthread+0x391/0x500 kernel/kthread.c:388
 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242

Uninit was created at:
 slab_post_alloc_hook+0x101/0xac0 mm/slab.h:768
 slab_alloc_node mm/slub.c:3482
 __kmem_cache_alloc_node+0x612/0xae0 mm/slub.c:3521
 __do_kmalloc_node mm/slab_common.c:1006
 __kmalloc+0x11a/0x410 mm/slab_common.c:1020
 kmalloc ./include/linux/slab.h:604
 xlog_kvmalloc fs/xfs/xfs_log_priv.h:704
 xlog_cil_alloc_shadow_bufs fs/xfs/xfs_log_cil.c:343
 xlog_cil_commit+0x487/0x4dc0 fs/xfs/xfs_log_cil.c:1574
 __xfs_trans_commit+0x8df/0x1930 fs/xfs/xfs_trans.c:1017
 xfs_trans_commit+0x30/0x40 fs/xfs/xfs_trans.c:1061
 xfs_create+0x15af/0x2150 fs/xfs/xfs_inode.c:1076
 xfs_generic_create+0x4cd/0x1550 fs/xfs/xfs_iops.c:199
 xfs_vn_create+0x4a/0x60 fs/xfs/xfs_iops.c:275
 lookup_open fs/namei.c:3477
 open_last_lookups fs/namei.c:3546
 path_openat+0x29ac/0x6180 fs/namei.c:3776
 do_filp_open+0x24d/0x680 fs/namei.c:3809
 do_sys_openat2+0x1bc/0x330 fs/open.c:1440
 do_sys_open fs/open.c:1455
 __do_sys_openat fs/open.c:1471
 __se_sys_openat fs/open.c:1466
 __x64_sys_openat+0x253/0x330 fs/open.c:1466
 do_syscall_x64 arch/x86/entry/common.c:51
 do_syscall_64+0x4f/0x140 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b arch/x86/entry/entry_64.S:120

Bytes 112-115 of 188 are uninitialized
Memory access of size 188 starts at ffff88802fc262bc

This is caused by the struct xfs_log_dinode not having the di_crc
field initialised. Log recovery never uses this field (it is only
present these days for on-disk format compatibility reasons) and so
it's value is never checked so nothing in XFS has caught this.

Further, none of the uninitialised memory access warning tools have
caught this (despite catching other uninit memory accesses in the
struct xfs_log_dinode back in 2017!) until recently. Alexander
annotated the XFS code to get the dump of the actual bytes that were
detected as uninitialised, and from that report it took me about 30s
to realise what the issue was.

The issue was introduced back in 2016 and every inode that is logged
fails to initialise this field. This is no actual bad behaviour
caused by this issue - I find it hard to even classify it as a
bug...

Reported-and-tested-by: Alexander Potapenko <glider@google.com>
Fixes: f8d55aa0523a ("xfs: introduce inode log format object")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_inode_item.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 91c847a84e10..2ec23c9af760 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -554,10 +554,13 @@ xfs_inode_to_log_dinode(
 		to->di_ino = ip->i_ino;
 		to->di_lsn = lsn;
 		memset(to->di_pad2, 0, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid);
 		to->di_v3_pad = 0;
+
+		/* dummy value for initialisation */
+		to->di_crc = 0;
 	} else {
 		to->di_version = 2;
 		to->di_flushiter = ip->i_flushiter;
 		memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
 	}
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 24/29] xfs: add lock protection when remove perag from radix tree
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (22 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 23/29] xfs: initialise di_crc in xfs_log_dinode Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 25/29] xfs: fix perag leak when growfs fails Leah Rumancik
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Long Li, Christoph Hellwig, Darrick J. Wong,
	Chandan Babu R, Catherine Hoang, Greg Kroah-Hartman,
	Leah Rumancik

From: Long Li <leo.lilong@huawei.com>

[ Upstream commit 07afd3173d0c6d24a47441839a835955ec6cf0d4 ]

[ 6.1: resolved conflict in xfs_ag.c ]

Take mp->m_perag_lock for deletions from the perag radix tree in
xfs_initialize_perag to prevent racing with tagging operations.
Lookups are fine - they are RCU protected so already deal with the
tree changing shape underneath the lookup - but tagging operations
require the tree to be stable while the tags are propagated back up
to the root.

Right now there's nothing stopping radix tree tagging from operating
while a growfs operation is progress and adding/removing new entries
into the radix tree.

Hence we can have traversals that require a stable tree occurring at
the same time we are removing unused entries from the radix tree which
causes the shape of the tree to change.

Likely this hasn't caused a problem in the past because we are only
doing append addition and removal so the active AG part of the tree
is not changing shape, but that doesn't mean it is safe. Just making
the radix tree modifications serialise against each other is obviously
correct.

Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_ag.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index e03bfeacbed4..e7b011c42b7a 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -343,17 +343,21 @@ xfs_initialize_perag(
 
 	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
 	return 0;
 
 out_remove_pag:
+	spin_lock(&mp->m_perag_lock);
 	radix_tree_delete(&mp->m_perag_tree, index);
+	spin_unlock(&mp->m_perag_lock);
 out_free_pag:
 	kmem_free(pag);
 out_unwind_new_pags:
 	/* unwind any prior newly initialized pags */
 	for (index = first_initialised; index < agcount; index++) {
+		spin_lock(&mp->m_perag_lock);
 		pag = radix_tree_delete(&mp->m_perag_tree, index);
+		spin_unlock(&mp->m_perag_lock);
 		if (!pag)
 			break;
 		xfs_buf_hash_destroy(pag);
 		kmem_free(pag);
 	}
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 25/29] xfs: fix perag leak when growfs fails
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (23 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 24/29] xfs: add lock protection when remove perag from radix tree Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 26/29] xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real Leah Rumancik
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Long Li, Darrick J. Wong, Chandan Babu R,
	Catherine Hoang, Greg Kroah-Hartman, Leah Rumancik

From: Long Li <leo.lilong@huawei.com>

[ Upstream commit 7823921887750b39d02e6b44faafdd1cc617c651 ]

[ 6.1: resolved conflicts in xfs_ag.c and xfs_ag.h ]

During growfs, if new ag in memory has been initialized, however
sb_agcount has not been updated, if an error occurs at this time it
will cause perag leaks as follows, these new AGs will not been freed
during umount , because of these new AGs are not visible(that is
included in mp->m_sb.sb_agcount).

unreferenced object 0xffff88810be40200 (size 512):
  comm "xfs_growfs", pid 857, jiffies 4294909093
  hex dump (first 32 bytes):
    00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00  ................
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 381741e2):
    [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
    [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
    [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
    [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
    [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
    [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
    [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
    [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
unreferenced object 0xffff88810be40800 (size 512):
  comm "xfs_growfs", pid 857, jiffies 4294909093
  hex dump (first 32 bytes):
    20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00   .......W.......
    10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff  ................
  backtrace (crc bde50e2d):
    [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
    [<ffffffff81814489>] kvmalloc_node+0x99/0x160
    [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
    [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
    [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
    [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
    [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
    [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
    [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
    [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a

Factor out xfs_free_unused_perag_range() from xfs_initialize_perag(),
used for freeing unused perag within a specified range in error handling,
included in the error path of the growfs failure.

Fixes: 1c1c6ebcf528 ("xfs: Replace per-ag array with a radix tree")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_ag.c | 34 +++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_ag.h |  3 +++
 fs/xfs/xfs_fsops.c     |  5 ++++-
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index e7b011c42b7a..9743fa5b5388 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -257,10 +257,34 @@ xfs_agino_range(
 	xfs_agino_t		*last)
 {
 	return __xfs_agino_range(mp, xfs_ag_block_count(mp, agno), first, last);
 }
 
+/*
+ * Free perag within the specified AG range, it is only used to free unused
+ * perags under the error handling path.
+ */
+void
+xfs_free_unused_perag_range(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agstart,
+	xfs_agnumber_t		agend)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		index;
+
+	for (index = agstart; index < agend; index++) {
+		spin_lock(&mp->m_perag_lock);
+		pag = radix_tree_delete(&mp->m_perag_tree, index);
+		spin_unlock(&mp->m_perag_lock);
+		if (!pag)
+			break;
+		xfs_buf_hash_destroy(pag);
+		kmem_free(pag);
+	}
+}
+
 int
 xfs_initialize_perag(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agcount,
 	xfs_rfsblock_t		dblocks,
@@ -350,19 +374,11 @@ xfs_initialize_perag(
 	spin_unlock(&mp->m_perag_lock);
 out_free_pag:
 	kmem_free(pag);
 out_unwind_new_pags:
 	/* unwind any prior newly initialized pags */
-	for (index = first_initialised; index < agcount; index++) {
-		spin_lock(&mp->m_perag_lock);
-		pag = radix_tree_delete(&mp->m_perag_tree, index);
-		spin_unlock(&mp->m_perag_lock);
-		if (!pag)
-			break;
-		xfs_buf_hash_destroy(pag);
-		kmem_free(pag);
-	}
+	xfs_free_unused_perag_range(mp, first_initialised, agcount);
 	return error;
 }
 
 static int
 xfs_get_aghdr_buf(
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 191b22b9a35b..eb84af1c8628 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -104,10 +104,13 @@ struct xfs_perag {
 	struct delayed_work	pag_blockgc_work;
 
 #endif /* __KERNEL__ */
 };
 
+
+void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart,
+			xfs_agnumber_t agend);
 int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
 			xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi);
 int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
 void xfs_free_perag(struct xfs_mount *mp);
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 77b14f788214..96e9d64fbe62 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -151,11 +151,11 @@ xfs_growfs_data_private(
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
 			(delta > 0 ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
 			XFS_TRANS_RESERVE, &tp);
 	if (error)
-		return error;
+		goto out_free_unused_perag;
 
 	last_pag = xfs_perag_get(mp, oagcount - 1);
 	if (delta > 0) {
 		error = xfs_resizefs_init_new_ags(tp, &id, oagcount, nagcount,
 				delta, last_pag, &lastag_extended);
@@ -225,10 +225,13 @@ xfs_growfs_data_private(
 	}
 	return error;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+out_free_unused_perag:
+	if (nagcount > oagcount)
+		xfs_free_unused_perag_range(mp, oagcount, nagcount);
 	return error;
 }
 
 static int
 xfs_growfs_log_private(
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 26/29] xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (24 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 25/29] xfs: fix perag leak when growfs fails Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 27/29] xfs: update dir3 leaf block metadata after swap Leah Rumancik
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Jiachen Zhang, Christoph Hellwig, Darrick J. Wong,
	Chandan Babu R, Leah Rumancik

From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>

[ Upstream commit e6af9c98cbf0164a619d95572136bfb54d482dd6 ]

In the case of returning -ENOSPC, ensure logflagsp is initialized by 0.
Otherwise the caller __xfs_bunmapi will set uninitialized illegal
tmp_logflags value into xfs log, which might cause unpredictable error
in the log recovery procedure.

Also, remove the flags variable and set the *logflagsp directly, so that
the code should be more robust in the long run.

Fixes: 1b24b633aafe ("xfs: move some more code into xfs_bmap_del_extent_real")
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 73 +++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c99fd7fe242e..2a3b78032cb0 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4997,242 +4997,231 @@ xfs_bmap_del_extent_real(
 {
 	xfs_fsblock_t		del_endblock=0;	/* first block past del */
 	xfs_fileoff_t		del_endoff;	/* first offset past del */
 	int			do_fx;	/* free extent at end of routine */
 	int			error;	/* error return value */
-	int			flags = 0;/* inode logging flags */
 	struct xfs_bmbt_irec	got;	/* current extent entry */
 	xfs_fileoff_t		got_endoff;	/* first offset past got */
 	int			i;	/* temp state */
 	struct xfs_ifork	*ifp;	/* inode fork pointer */
 	xfs_mount_t		*mp;	/* mount structure */
 	xfs_filblks_t		nblks;	/* quota/sb block count */
 	xfs_bmbt_irec_t		new;	/* new record to be inserted */
 	/* REFERENCED */
 	uint			qfield;	/* quota field to update */
 	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
 	struct xfs_bmbt_irec	old;
 
+	*logflagsp = 0;
+
 	mp = ip->i_mount;
 	XFS_STATS_INC(mp, xs_del_exlist);
 
 	ifp = xfs_ifork_ptr(ip, whichfork);
 	ASSERT(del->br_blockcount > 0);
 	xfs_iext_get_extent(ifp, icur, &got);
 	ASSERT(got.br_startoff <= del->br_startoff);
 	del_endoff = del->br_startoff + del->br_blockcount;
 	got_endoff = got.br_startoff + got.br_blockcount;
 	ASSERT(got_endoff >= del_endoff);
 	ASSERT(!isnullstartblock(got.br_startblock));
 	qfield = 0;
-	error = 0;
 
 	/*
 	 * If it's the case where the directory code is running with no block
 	 * reservation, and the deleted block is in the middle of its extent,
 	 * and the resulting insert of an extent would cause transformation to
 	 * btree format, then reject it.  The calling code will then swap blocks
 	 * around instead.  We have to do this now, rather than waiting for the
 	 * conversion to btree format, since the transaction will be dirty then.
 	 */
 	if (tp->t_blk_res == 0 &&
 	    ifp->if_format == XFS_DINODE_FMT_EXTENTS &&
 	    ifp->if_nextents >= XFS_IFORK_MAXEXT(ip, whichfork) &&
 	    del->br_startoff > got.br_startoff && del_endoff < got_endoff)
 		return -ENOSPC;
 
-	flags = XFS_ILOG_CORE;
+	*logflagsp = XFS_ILOG_CORE;
 	if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) {
 		if (!(bflags & XFS_BMAPI_REMAP)) {
 			error = xfs_rtfree_blocks(tp, del->br_startblock,
 					del->br_blockcount);
 			if (error)
-				goto done;
+				return error;
 		}
 
 		do_fx = 0;
 		qfield = XFS_TRANS_DQ_RTBCOUNT;
 	} else {
 		do_fx = 1;
 		qfield = XFS_TRANS_DQ_BCOUNT;
 	}
 	nblks = del->br_blockcount;
 
 	del_endblock = del->br_startblock + del->br_blockcount;
 	if (cur) {
 		error = xfs_bmbt_lookup_eq(cur, &got, &i);
 		if (error)
-			goto done;
-		if (XFS_IS_CORRUPT(mp, i != 1)) {
-			error = -EFSCORRUPTED;
-			goto done;
-		}
+			return error;
+		if (XFS_IS_CORRUPT(mp, i != 1))
+			return -EFSCORRUPTED;
 	}
 
 	if (got.br_startoff == del->br_startoff)
 		state |= BMAP_LEFT_FILLING;
 	if (got_endoff == del_endoff)
 		state |= BMAP_RIGHT_FILLING;
 
 	switch (state & (BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) {
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
 		/*
 		 * Matches the whole extent.  Delete the entry.
 		 */
 		xfs_iext_remove(ip, icur, state);
 		xfs_iext_prev(ifp, icur);
 		ifp->if_nextents--;
 
-		flags |= XFS_ILOG_CORE;
+		*logflagsp |= XFS_ILOG_CORE;
 		if (!cur) {
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 			break;
 		}
 		if ((error = xfs_btree_delete(cur, &i)))
-			goto done;
-		if (XFS_IS_CORRUPT(mp, i != 1)) {
-			error = -EFSCORRUPTED;
-			goto done;
-		}
+			return error;
+		if (XFS_IS_CORRUPT(mp, i != 1))
+			return -EFSCORRUPTED;
 		break;
 	case BMAP_LEFT_FILLING:
 		/*
 		 * Deleting the first part of the extent.
 		 */
 		got.br_startoff = del_endoff;
 		got.br_startblock = del_endblock;
 		got.br_blockcount -= del->br_blockcount;
 		xfs_iext_update_extent(ip, state, icur, &got);
 		if (!cur) {
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 			break;
 		}
 		error = xfs_bmbt_update(cur, &got);
 		if (error)
-			goto done;
+			return error;
 		break;
 	case BMAP_RIGHT_FILLING:
 		/*
 		 * Deleting the last part of the extent.
 		 */
 		got.br_blockcount -= del->br_blockcount;
 		xfs_iext_update_extent(ip, state, icur, &got);
 		if (!cur) {
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 			break;
 		}
 		error = xfs_bmbt_update(cur, &got);
 		if (error)
-			goto done;
+			return error;
 		break;
 	case 0:
 		/*
 		 * Deleting the middle of the extent.
 		 */
 
 		old = got;
 
 		got.br_blockcount = del->br_startoff - got.br_startoff;
 		xfs_iext_update_extent(ip, state, icur, &got);
 
 		new.br_startoff = del_endoff;
 		new.br_blockcount = got_endoff - del_endoff;
 		new.br_state = got.br_state;
 		new.br_startblock = del_endblock;
 
-		flags |= XFS_ILOG_CORE;
+		*logflagsp |= XFS_ILOG_CORE;
 		if (cur) {
 			error = xfs_bmbt_update(cur, &got);
 			if (error)
-				goto done;
+				return error;
 			error = xfs_btree_increment(cur, 0, &i);
 			if (error)
-				goto done;
+				return error;
 			cur->bc_rec.b = new;
 			error = xfs_btree_insert(cur, &i);
 			if (error && error != -ENOSPC)
-				goto done;
+				return error;
 			/*
 			 * If get no-space back from btree insert, it tried a
 			 * split, and we have a zero block reservation.  Fix up
 			 * our state and return the error.
 			 */
 			if (error == -ENOSPC) {
 				/*
 				 * Reset the cursor, don't trust it after any
 				 * insert operation.
 				 */
 				error = xfs_bmbt_lookup_eq(cur, &got, &i);
 				if (error)
-					goto done;
-				if (XFS_IS_CORRUPT(mp, i != 1)) {
-					error = -EFSCORRUPTED;
-					goto done;
-				}
+					return error;
+				if (XFS_IS_CORRUPT(mp, i != 1))
+					return -EFSCORRUPTED;
 				/*
 				 * Update the btree record back
 				 * to the original value.
 				 */
 				error = xfs_bmbt_update(cur, &old);
 				if (error)
-					goto done;
+					return error;
 				/*
 				 * Reset the extent record back
 				 * to the original value.
 				 */
 				xfs_iext_update_extent(ip, state, icur, &old);
-				flags = 0;
-				error = -ENOSPC;
-				goto done;
-			}
-			if (XFS_IS_CORRUPT(mp, i != 1)) {
-				error = -EFSCORRUPTED;
-				goto done;
+				*logflagsp = 0;
+				return -ENOSPC;
 			}
+			if (XFS_IS_CORRUPT(mp, i != 1))
+				return -EFSCORRUPTED;
 		} else
-			flags |= xfs_ilog_fext(whichfork);
+			*logflagsp |= xfs_ilog_fext(whichfork);
 
 		ifp->if_nextents++;
 		xfs_iext_next(ifp, icur);
 		xfs_iext_insert(ip, icur, &new, state);
 		break;
 	}
 
 	/* remove reverse mapping */
 	xfs_rmap_unmap_extent(tp, ip, whichfork, del);
 
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
 	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else {
 			error = __xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					XFS_AG_RESV_NONE,
 					((bflags & XFS_BMAPI_NODISCARD) ||
 					del->br_state == XFS_EXT_UNWRITTEN));
 			if (error)
-				goto done;
+				return error;
 		}
 	}
 
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
 	if (nblks)
 		ip->i_nblocks -= nblks;
 	/*
 	 * Adjust quota data.
 	 */
 	if (qfield && !(bflags & XFS_BMAPI_REMAP))
 		xfs_trans_mod_dquot_byino(tp, ip, qfield, (long)-nblks);
 
-done:
-	*logflagsp = flags;
-	return error;
+	return 0;
 }
 
 /*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 27/29] xfs: update dir3 leaf block metadata after swap
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (25 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 26/29] xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 28/29] xfs: reset XFS_ATTR_INCOMPLETE filter on node removal Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 29/29] xfs: remove conditional building of rt geometry validator functions Leah Rumancik
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Zhang Tianci, Dave Chinner, Darrick J. Wong,
	Chandan Babu R, Leah Rumancik

From: Zhang Tianci <zhangtianci.1997@bytedance.com>

[ Upstream commit 5759aa4f956034b289b0ae2c99daddfc775442e1 ]

xfs_da3_swap_lastblock() copy the last block content to the dead block,
but do not update the metadata in it. We need update some metadata
for some kinds of type block, such as dir3 leafn block records its
blkno, we shall update it to the dead block blkno. Otherwise,
before write the xfs_buf to disk, the verify_write() will fail in
blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.

We will get this warning:

  XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
  XFS (dm-0): Unmount and run xfs_repair
  XFS (dm-0): First 128 bytes of corrupted metadata buffer:
  00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00  ........=.......
  000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00  ................
  000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf  .D...dDA..`.PC..
  00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00  .........s......
  00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00  .).8.....).@....
  000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00  .).H.....I......
  00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe  .I....E%.I....H.
  0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97  .I....Lk.I. ..M.
  XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c.  Return address = 00000000c0ff63c1
  XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
  XFS (dm-0): Please umount the filesystem and rectify the problem(s)

>From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
its blkno is 0x1a0.

Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_btree.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index e576560b46e9..282c7cf032f4 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2314,14 +2314,21 @@ xfs_da3_swap_lastblock(
 	error = xfs_da3_node_read(tp, dp, last_blkno, &last_buf, w);
 	if (error)
 		return error;
 	/*
 	 * Copy the last block into the dead buffer and log it.
+	 * On CRC-enabled file systems, also update the stamped in blkno.
 	 */
 	memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
+	if (xfs_has_crc(mp)) {
+		struct xfs_da3_blkinfo *da3 = dead_buf->b_addr;
+
+		da3->blkno = cpu_to_be64(xfs_buf_daddr(dead_buf));
+	}
 	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
 	dead_info = dead_buf->b_addr;
+
 	/*
 	 * Get values from the moved block.
 	 */
 	if (dead_info->magic == cpu_to_be16(XFS_DIR2_LEAFN_MAGIC) ||
 	    dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 28/29] xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (26 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 27/29] xfs: update dir3 leaf block metadata after swap Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  2025-03-13 20:25 ` [PATCH 6.1 29/29] xfs: remove conditional building of rt geometry validator functions Leah Rumancik
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Andrey Albershteyn, Christoph Hellwig, Chandan Babu R,
	Leah Rumancik, Darrick J. Wong

From: Andrey Albershteyn <aalbersh@redhat.com>

[ Upstream commit 82ef1a5356572219f41f9123ca047259a77bd67b ]

In XFS_DAS_NODE_REMOVE_ATTR case, xfs_attr_mode_remove_attr() sets
filter to XFS_ATTR_INCOMPLETE. The filter is then reset in
xfs_attr_complete_op() if XFS_DA_OP_REPLACE operation is performed.

The filter is not reset though if XFS just removes the attribute
(args->value == NULL) with xfs_attr_defer_remove(). attr code goes
to XFS_DAS_DONE state.

Fix this by always resetting XFS_ATTR_INCOMPLETE filter. The replace
operation already resets this filter in anyway and others are
completed at this step hence don't need it.

Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e28d93d232de..32d350e97e0f 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -419,14 +419,14 @@ xfs_attr_complete_op(
 {
 	struct xfs_da_args	*args = attr->xattri_da_args;
 	bool			do_replace = args->op_flags & XFS_DA_OP_REPLACE;
 
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
-	if (do_replace) {
-		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+	args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+	if (do_replace)
 		return replace_state;
-	}
+
 	return XFS_DAS_DONE;
 }
 
 static int
 xfs_attr_leaf_addname(
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6.1 29/29] xfs: remove conditional building of rt geometry validator functions
  2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
                   ` (27 preceding siblings ...)
  2025-03-13 20:25 ` [PATCH 6.1 28/29] xfs: reset XFS_ATTR_INCOMPLETE filter on node removal Leah Rumancik
@ 2025-03-13 20:25 ` Leah Rumancik
  28 siblings, 0 replies; 45+ messages in thread
From: Leah Rumancik @ 2025-03-13 20:25 UTC (permalink / raw)
  To: stable
  Cc: xfs-stable, Darrick J. Wong, Christoph Hellwig, Chandan Babu R,
	Catherine Hoang, Greg Kroah-Hartman, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 881f78f472556ed05588172d5b5676b48dc48240 ]

[ 6.1: used 6.6 backport to minimize conflicts ]

[backport: resolve merge conflicts due to refactoring rtbitmap/summary
macros and accessors]

I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64
variant of the djwong-wtf git branch.  Unfortunately, it took me a good
hour to figure out that RT wasn't built because this is what got printed
to dmesg:

XFS (sda2): realtime geometry sanity check failed
XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0

Whereas I would have expected:

XFS (sda2): Not built with CONFIG_XFS_RT
XFS (sda2): RT mount failed

The root cause of these problems is the conditional compilation of the
new functions xfs_validate_rtextents and xfs_compute_rextslog that I
introduced in the two commits listed below.  The !RT versions of these
functions return false and 0, respectively, which causes primary
superblock validation to fail, which explains the first message.

Move the two functions to other parts of libxfs that are not
conditionally defined by CONFIG_XFS_RT and remove the broken stubs so
that validation works again.

Fixes: e14293803f4e ("xfs: don't allow overly small or large realtime volumes")
Fixes: a6a38f309afc ("xfs: make rextslog computation consistent with mkfs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtbitmap.c | 14 --------------
 fs/xfs/libxfs/xfs_rtbitmap.h | 16 ----------------
 fs/xfs/libxfs/xfs_sb.c       | 14 ++++++++++++++
 fs/xfs/libxfs/xfs_sb.h       |  2 ++
 fs/xfs/libxfs/xfs_types.h    | 12 ++++++++++++
 fs/xfs/scrub/rtbitmap.c      |  1 +
 6 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 8db1243beacc..760172a65aff 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1129,19 +1129,5 @@ xfs_rtalloc_extent_is_free(
 
 	*is_free = matches;
 	return 0;
 }
 
-/*
- * Compute the maximum level number of the realtime summary file, as defined by
- * mkfs.  The historic use of highbit32 on a 64-bit quantity prohibited correct
- * use of rt volumes with more than 2^32 extents.
- */
-uint8_t
-xfs_compute_rextslog(
-	xfs_rtbxlen_t		rtextents)
-{
-	if (!rtextents)
-		return 0;
-	return xfs_highbit64(rtextents);
-}
-
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 4e49aadf0955..b89712983347 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -69,31 +69,15 @@ xfs_rtfree_extent(
 
 /* Same as above, but in units of rt blocks. */
 int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
 		xfs_filblks_t rtlen);
 
-uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
-
-/* Do we support an rt volume having this number of rtextents? */
-static inline bool
-xfs_validate_rtextents(
-	xfs_rtbxlen_t		rtextents)
-{
-	/* No runt rt volumes */
-	if (rtextents == 0)
-		return false;
-
-	return true;
-}
-
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
 # define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
 # define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
 # define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
 # define xfs_rtbuf_get(m,t,b,i,p)			(-ENOSYS)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
-# define xfs_compute_rextslog(rtx)			(0)
-# define xfs_validate_rtextents(rtx)			(false)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 04247d1c7523..90ed55cd3d10 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1365,5 +1365,19 @@ xfs_validate_stripe_geometry(
 				   swidth, sunit);
 		return false;
 	}
 	return true;
 }
+
+/*
+ * Compute the maximum level number of the realtime summary file, as defined by
+ * mkfs.  The historic use of highbit32 on a 64-bit quantity prohibited correct
+ * use of rt volumes with more than 2^32 extents.
+ */
+uint8_t
+xfs_compute_rextslog(
+	xfs_rtbxlen_t		rtextents)
+{
+	if (!rtextents)
+		return 0;
+	return xfs_highbit64(rtextents);
+}
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h
index 19134b23c10b..2e8e8d63d4eb 100644
--- a/fs/xfs/libxfs/xfs_sb.h
+++ b/fs/xfs/libxfs/xfs_sb.h
@@ -36,6 +36,8 @@ extern int	xfs_sb_get_secondary(struct xfs_mount *mp,
 				struct xfs_buf **bpp);
 
 extern bool	xfs_validate_stripe_geometry(struct xfs_mount *mp,
 		__s64 sunit, __s64 swidth, int sectorsize, bool silent);
 
+uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+
 #endif	/* __XFS_SB_H__ */
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 7023e4a79f87..42fed04f038d 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -226,6 +226,18 @@ void xfs_icount_range(struct xfs_mount *mp, unsigned long long *min,
 		unsigned long long *max);
 bool xfs_verify_fileoff(struct xfs_mount *mp, xfs_fileoff_t off);
 bool xfs_verify_fileext(struct xfs_mount *mp, xfs_fileoff_t off,
 		xfs_fileoff_t len);
 
+/* Do we support an rt volume having this number of rtextents? */
+static inline bool
+xfs_validate_rtextents(
+	xfs_rtbxlen_t		rtextents)
+{
+	/* No runt rt volumes */
+	if (rtextents == 0)
+		return false;
+
+	return true;
+}
+
 #endif	/* __XFS_TYPES_H__ */
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
index faf6e0890d44..fad7c353ada6 100644
--- a/fs/xfs/scrub/rtbitmap.c
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -12,10 +12,11 @@
 #include "xfs_log_format.h"
 #include "xfs_trans.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
+#include "xfs_sb.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 
 /* Set us up with the realtime metadata locked. */
 int
-- 
2.49.0.rc1.451.g8f38331e32-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2025-03-25 11:52 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-13 20:25 [PATCH 6.1 00/29] patches for 6.1.y from 6.8 Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 01/29] xfs: pass refcount intent directly through the log intent code Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 02/29] xfs: pass xfs_extent_free_item " Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 03/29] xfs: fix confusing xfs_extent_item variable names Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 04/29] xfs: pass the xfs_bmbt_irec directly through the log intent code Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 05/29] xfs: pass per-ag references to xfs_free_extent Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 06/29] xfs: validate block number being freed before adding to xefi Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 07/29] xfs: fix bounds check in xfs_defer_agfl_block() Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 08/29] xfs: use deferred frees for btree block freeing Leah Rumancik
2025-03-14 23:10   ` Sasha Levin
2025-03-13 20:25 ` [PATCH 6.1 09/29] xfs: reserve less log space when recovering log intent items Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 10/29] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 11/29] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 12/29] xfs: consider minlen sized extents in xfs_rtallocate_extent_block Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 13/29] xfs: don't leak recovered attri intent items Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover " Leah Rumancik
2025-03-21  8:39   ` Fedor Pchelkin
2025-03-21 17:42     ` Leah Rumancik
2025-03-22 14:27       ` Fedor Pchelkin
2025-03-24  0:29         ` Leah Rumancik
2025-03-24  8:53           ` Fedor Pchelkin
2025-03-24 21:10             ` Leah Rumancik
2025-03-25 11:50               ` Greg Kroah-Hartman
2025-03-13 20:25 ` [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 17/29] xfs: make rextslog computation consistent with mkfs Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 18/29] xfs: fix 32-bit truncation in xfs_compute_rextslog Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 19/29] xfs: don't allow overly small or large realtime volumes Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 20/29] xfs: remove unused fields from struct xbtree_ifakeroot Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 21/29] xfs: recompute growfsrtfree transaction reservation while growing rt volume Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 22/29] xfs: force all buffers to be written during btree bulk load Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 23/29] xfs: initialise di_crc in xfs_log_dinode Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 24/29] xfs: add lock protection when remove perag from radix tree Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 25/29] xfs: fix perag leak when growfs fails Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 26/29] xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 27/29] xfs: update dir3 leaf block metadata after swap Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 28/29] xfs: reset XFS_ATTR_INCOMPLETE filter on node removal Leah Rumancik
2025-03-13 20:25 ` [PATCH 6.1 29/29] xfs: remove conditional building of rt geometry validator functions Leah Rumancik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox