public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: fix deadlock between busy flushing and t_busy
@ 2025-11-14 15:21 Haoqin Huang
  2025-11-15 15:50 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Haoqin Huang @ 2025-11-14 15:21 UTC (permalink / raw)
  To: chandan.babu, djwong, linux-xfs; +Cc: Haoqin Huang, Rongwei Wang

From: Haoqin Huang <haoqinhuang@tencent.com>

In case of insufficient disk space, the newly released blocks can be
allocated from free list. And in this scenario, file system will
search ag->pagb_tree (busy tree), and trim busy node if hits.
Immediately afterwards, xfs_extent_busy_flush() will be called to
flush logbuf to clear busy tree.

But a deadlock could be triggered by xfs_extent_busy_flush() if
current tp->t_busy and flush AG meet:

The current trans which t_busy is non-empty, and:
  1. The block B1, B2 all belong to AG A, and have inserted into
     current tp->t_busy;
  2. and AG A's busy tree (pagb_tree) only has the blocks coincidentally.
  2. xfs_extent_busy_flush() is flushing AG A.

In a short word, The trans flushing AG A, and also waiting AG A
to clear busy tree, but the only blocks of busy tree also in
this trans's t_busy. A deadlock appeared.

The detailed process of this deadlock:

xfs_reflink_end_cow()
xfs_trans_commit()
xfs_defer_finish_noroll()
  xfs_defer_finish_one()
    xfs_refcount_update_finish_item()    <== step1. cow alloc (tp1)
      __xfs_refcount_cow_alloc()
        xfs_refcountbt_free_block()
          xfs_extent_busy_insert()       <-- step2. x1 x2 insert tp1->t_busy
    <No trans roll>
    xfs_refcount_update_finish_item()    <== step3: cow free (tp1)
      __xfs_refcount_cow_free()
        xfs_refcount_adjust_cow()
          xfs_refcount_split_extent()
            xfs_refcount_insert()
              xfs_alloc_ag_vextent_near()
                xfs_extent_busy_flush()  <-- step4: flush but current tp1->t_busy
                                             only has x1 x2 which incurs ag3's
                                             busy_gen can't increase.

Actually, commit 8ebbf262d468 ("xfs: don't block in busy flushing when freeing extents")
had fix a similar deadlock. But current flush routine doesn't
handle -EAGAIN roll back rightly.

The solution of the deadlock is fix xfs_extent_busy_flush() to
return -EAGAIN before deadlock, and make sure xfs_refcount_split_extent()
save the original extent and roll it back when it's callee
return -EAGAIN.

Signed-off-by: Haoqin Huang <haoqinhuang@tencent.com>
Signed-off-by: Rongwei Wang <zigiwang@tencent.com>
---
 fs/xfs/libxfs/xfs_refcount.c | 15 ++++++++++++++-
 fs/xfs/xfs_bmap_item.c       |  3 +++
 fs/xfs/xfs_extent_busy.c     |  7 +++++++
 fs/xfs/xfs_refcount_item.c   |  2 ++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 2484dc9f6d7e..0ecda9df40ec 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -429,6 +429,7 @@ xfs_refcount_split_extent(
 	struct xfs_refcount_irec	rcext, tmp;
 	int				found_rec;
 	int				error;
+	struct xfs_refcount_irec	saved_rcext;
 
 	*shape_changed = false;
 	error = xfs_refcount_lookup_le(cur, domain, agbno, &found_rec);
@@ -453,6 +454,9 @@ xfs_refcount_split_extent(
 	*shape_changed = true;
 	trace_xfs_refcount_split_extent(cur, &rcext, agbno);
 
+	/* Save the original extent for potential rollback */
+	saved_rcext = rcext;
+
 	/* Establish the right extent. */
 	tmp = rcext;
 	tmp.rc_startblock = agbno;
@@ -465,8 +469,17 @@ xfs_refcount_split_extent(
 	tmp = rcext;
 	tmp.rc_blockcount = agbno - rcext.rc_startblock;
 	error = xfs_refcount_insert(cur, &tmp, &found_rec);
-	if (error)
+	if (error) {
+		/*
+		 * If failed to insert the left extent, we need to restore the
+		 * right extent to its original state to maintain consistency.
+		 */
+		int ret = xfs_refcount_update(cur, &saved_rcext);
+
+		if (ret)
+			error = ret;
 		goto out_error;
+	}
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
 		xfs_btree_mark_sick(cur);
 		error = -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 80f0c4bcc483..1b5241bfc304 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -403,6 +403,9 @@ xfs_bmap_update_finish_item(
 		ASSERT(bi->bi_type == XFS_BMAP_UNMAP);
 		return -EAGAIN;
 	}
+	/* trigger a trans roll. */
+	if (error == -EAGAIN)
+		return error;
 
 	xfs_bmap_update_cancel_item(item);
 	return error;
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index da3161572735..c4dabee0c40d 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -631,6 +631,13 @@ xfs_extent_busy_flush(
 
 		if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
 			return -EAGAIN;
+
+		/*
+		 * To avoid deadlocks if alloc_flags without any FLAG set
+		 * and t_busy is not empty.
+		 */
+		if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
+			return -EAGAIN;
 	}
 
 	/* Wait for committed busy extents to resolve. */
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 3728234699a2..1932ce86090b 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -416,6 +416,8 @@ xfs_refcount_update_finish_item(
 		       ri->ri_type == XFS_REFCOUNT_DECREASE);
 		return -EAGAIN;
 	}
+	if (error == -EAGAIN)
+		return error;
 
 	xfs_refcount_update_cancel_item(item);
 	return error;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-24 21:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-14 15:21 [PATCH] xfs: fix deadlock between busy flushing and t_busy Haoqin Huang
2025-11-15 15:50 ` kernel test robot
2025-11-15 15:50 ` kernel test robot
2025-11-16 13:03 ` Dave Chinner
2025-11-22  9:41   ` haoqin huang
2025-11-24  4:13     ` Jinliang Zheng
2025-11-24  7:07       ` haoqin huang
     [not found]   ` <CAEjiKSmvzr1M-2=5SPA0hQo1-442t-_zLB3zBj1jqWoZ0tMCUQ@mail.gmail.com>
2025-11-24 21:18     ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox