[PATCH] xfs: fix deadlock between busy flushing and t

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] xfs: fix deadlock between busy flushing and t_busy
@ 2025-11-14 15:21 Haoqin Huang
  2025-11-15 15:50 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Haoqin Huang @ 2025-11-14 15:21 UTC (permalink / raw)
  To: chandan.babu, djwong, linux-xfs; +Cc: Haoqin Huang, Rongwei Wang

From: Haoqin Huang <haoqinhuang@tencent.com>

In case of insufficient disk space, the newly released blocks can be
allocated from free list. And in this scenario, file system will
search ag->pagb_tree (busy tree), and trim busy node if hits.
Immediately afterwards, xfs_extent_busy_flush() will be called to
flush logbuf to clear busy tree.

But a deadlock could be triggered by xfs_extent_busy_flush() if
current tp->t_busy and flush AG meet:

The current trans which t_busy is non-empty, and:
  1. The block B1, B2 all belong to AG A, and have inserted into
     current tp->t_busy;
  2. and AG A's busy tree (pagb_tree) only has the blocks coincidentally.
  2. xfs_extent_busy_flush() is flushing AG A.

In a short word, The trans flushing AG A, and also waiting AG A
to clear busy tree, but the only blocks of busy tree also in
this trans's t_busy. A deadlock appeared.

The detailed process of this deadlock:

xfs_reflink_end_cow()
xfs_trans_commit()
xfs_defer_finish_noroll()
  xfs_defer_finish_one()
    xfs_refcount_update_finish_item()    <== step1. cow alloc (tp1)
      __xfs_refcount_cow_alloc()
        xfs_refcountbt_free_block()
          xfs_extent_busy_insert()       <-- step2. x1 x2 insert tp1->t_busy
    <No trans roll>
    xfs_refcount_update_finish_item()    <== step3: cow free (tp1)
      __xfs_refcount_cow_free()
        xfs_refcount_adjust_cow()
          xfs_refcount_split_extent()
            xfs_refcount_insert()
              xfs_alloc_ag_vextent_near()
                xfs_extent_busy_flush()  <-- step4: flush but current tp1->t_busy
                                             only has x1 x2 which incurs ag3's
                                             busy_gen can't increase.

Actually, commit 8ebbf262d468 ("xfs: don't block in busy flushing when freeing extents")
had fix a similar deadlock. But current flush routine doesn't
handle -EAGAIN roll back rightly.

The solution of the deadlock is fix xfs_extent_busy_flush() to
return -EAGAIN before deadlock, and make sure xfs_refcount_split_extent()
save the original extent and roll it back when it's callee
return -EAGAIN.

Signed-off-by: Haoqin Huang <haoqinhuang@tencent.com>
Signed-off-by: Rongwei Wang <zigiwang@tencent.com>
---
 fs/xfs/libxfs/xfs_refcount.c | 15 ++++++++++++++-
 fs/xfs/xfs_bmap_item.c       |  3 +++
 fs/xfs/xfs_extent_busy.c     |  7 +++++++
 fs/xfs/xfs_refcount_item.c   |  2 ++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 2484dc9f6d7e..0ecda9df40ec 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -429,6 +429,7 @@ xfs_refcount_split_extent(
 	struct xfs_refcount_irec	rcext, tmp;
 	int				found_rec;
 	int				error;
+	struct xfs_refcount_irec	saved_rcext;
 
 	*shape_changed = false;
 	error = xfs_refcount_lookup_le(cur, domain, agbno, &found_rec);
@@ -453,6 +454,9 @@ xfs_refcount_split_extent(
 	*shape_changed = true;
 	trace_xfs_refcount_split_extent(cur, &rcext, agbno);
 
+	/* Save the original extent for potential rollback */
+	saved_rcext = rcext;
+
 	/* Establish the right extent. */
 	tmp = rcext;
 	tmp.rc_startblock = agbno;
@@ -465,8 +469,17 @@ xfs_refcount_split_extent(
 	tmp = rcext;
 	tmp.rc_blockcount = agbno - rcext.rc_startblock;
 	error = xfs_refcount_insert(cur, &tmp, &found_rec);
-	if (error)
+	if (error) {
+		/*
+		 * If failed to insert the left extent, we need to restore the
+		 * right extent to its original state to maintain consistency.
+		 */
+		int ret = xfs_refcount_update(cur, &saved_rcext);
+
+		if (ret)
+			error = ret;
 		goto out_error;
+	}
 	if (XFS_IS_CORRUPT(cur->bc_mp, found_rec != 1)) {
 		xfs_btree_mark_sick(cur);
 		error = -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 80f0c4bcc483..1b5241bfc304 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -403,6 +403,9 @@ xfs_bmap_update_finish_item(
 		ASSERT(bi->bi_type == XFS_BMAP_UNMAP);
 		return -EAGAIN;
 	}
+	/* trigger a trans roll. */
+	if (error == -EAGAIN)
+		return error;
 
 	xfs_bmap_update_cancel_item(item);
 	return error;
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index da3161572735..c4dabee0c40d 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -631,6 +631,13 @@ xfs_extent_busy_flush(
 
 		if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
 			return -EAGAIN;
+
+		/*
+		 * To avoid deadlocks if alloc_flags without any FLAG set
+		 * and t_busy is not empty.
+		 */
+		if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
+			return -EAGAIN;
 	}
 
 	/* Wait for committed busy extents to resolve. */
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 3728234699a2..1932ce86090b 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -416,6 +416,8 @@ xfs_refcount_update_finish_item(
 		       ri->ri_type == XFS_REFCOUNT_DECREASE);
 		return -EAGAIN;
 	}
+	if (error == -EAGAIN)
+		return error;
 
 	xfs_refcount_update_cancel_item(item);
 	return error;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-14 15:21 [PATCH] xfs: fix deadlock between busy flushing and t_busy Haoqin Huang
@ 2025-11-15 15:50 ` kernel test robot
  2025-11-15 15:50 ` kernel test robot
  2025-11-16 13:03 ` Dave Chinner
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-11-15 15:50 UTC (permalink / raw)
  To: Haoqin Huang, chandan.babu, djwong, linux-xfs
  Cc: llvm, oe-kbuild-all, Haoqin Huang, Rongwei Wang

Hi Haoqin,

kernel test robot noticed the following build errors:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on linus/master v6.18-rc5 next-20251114]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Haoqin-Huang/xfs-fix-deadlock-between-busy-flushing-and-t_busy/20251115-002142
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20251114152147.66688-1-haoqinhuang7%40gmail.com
patch subject: [PATCH] xfs: fix deadlock between busy flushing and t_busy
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251115/202511152311.UvZyd6Gi-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251115/202511152311.UvZyd6Gi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511152311.UvZyd6Gi-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
     639 |                 if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
         |                                                           ^
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
>> fs/xfs/xfs_extent_busy.c:639:45: error: use of undeclared identifier 'pag'
   7 errors generated.


vim +/pag +639 fs/xfs/xfs_extent_busy.c

   594	
   595	/*
   596	 * Flush out all busy extents for this group.
   597	 *
   598	 * If the current transaction is holding busy extents, the caller may not want
   599	 * to wait for committed busy extents to resolve. If we are being told just to
   600	 * try a flush or progress has been made since we last skipped a busy extent,
   601	 * return immediately to allow the caller to try again.
   602	 *
   603	 * If we are freeing extents, we might actually be holding the only free extents
   604	 * in the transaction busy list and the log force won't resolve that situation.
   605	 * In this case, we must return -EAGAIN to avoid a deadlock by informing the
   606	 * caller it needs to commit the busy extents it holds before retrying the
   607	 * extent free operation.
   608	 */
   609	int
   610	xfs_extent_busy_flush(
   611		struct xfs_trans	*tp,
   612		struct xfs_group	*xg,
   613		unsigned		busy_gen,
   614		uint32_t		alloc_flags)
   615	{
   616		struct xfs_extent_busy_tree *eb = xg->xg_busy_extents;
   617		DEFINE_WAIT		(wait);
   618		int			error;
   619	
   620		error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
   621		if (error)
   622			return error;
   623	
   624		/* Avoid deadlocks on uncommitted busy extents. */
   625		if (!list_empty(&tp->t_busy)) {
   626			if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH)
   627				return 0;
   628	
   629			if (busy_gen != READ_ONCE(eb->eb_gen))
   630				return 0;
   631	
   632			if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
   633				return -EAGAIN;
   634	
   635			/*
   636			 * To avoid deadlocks if alloc_flags without any FLAG set
   637			 * and t_busy is not empty.
   638			 */
 > 639			if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
   640				return -EAGAIN;
   641		}
   642	
   643		/* Wait for committed busy extents to resolve. */
   644		do {
   645			prepare_to_wait(&eb->eb_wait, &wait, TASK_KILLABLE);
   646			if  (busy_gen != READ_ONCE(eb->eb_gen))
   647				break;
   648			schedule();
   649		} while (1);
   650	
   651		finish_wait(&eb->eb_wait, &wait);
   652		return 0;
   653	}
   654	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-14 15:21 [PATCH] xfs: fix deadlock between busy flushing and t_busy Haoqin Huang
  2025-11-15 15:50 ` kernel test robot
@ 2025-11-15 15:50 ` kernel test robot
  2025-11-16 13:03 ` Dave Chinner
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-11-15 15:50 UTC (permalink / raw)
  To: Haoqin Huang, chandan.babu, djwong, linux-xfs
  Cc: oe-kbuild-all, Haoqin Huang, Rongwei Wang

Hi Haoqin,

kernel test robot noticed the following build errors:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on linus/master v6.18-rc5 next-20251114]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Haoqin-Huang/xfs-fix-deadlock-between-busy-flushing-and-t_busy/20251115-002142
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20251114152147.66688-1-haoqinhuang7%40gmail.com
patch subject: [PATCH] xfs: fix deadlock between busy flushing and t_busy
config: m68k-mac_defconfig (https://download.01.org/0day-ci/archive/20251115/202511152332.yaXc6tmQ-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251115/202511152332.yaXc6tmQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511152332.yaXc6tmQ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
   fs/xfs/xfs_extent_busy.c: In function 'xfs_extent_busy_flush':
>> fs/xfs/xfs_extent_busy.c:639:59: error: 'pag' undeclared (first use in this function); did you mean 'page'?
     639 |                 if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
         |                                                           ^~~
   include/linux/compiler_types.h:577:23: note: in definition of macro '__compiletime_assert'
     577 |                 if (!(condition))                                       \
         |                       ^~~~~~~~~
   include/linux/compiler_types.h:597:9: note: in expansion of macro '_compiletime_assert'
     597 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |                            ^~~~~~~~~~~~~
   include/asm-generic/rwonce.h:49:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      49 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/xfs_extent_busy.c:639:49: note: in expansion of macro 'READ_ONCE'
     639 |                 if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
         |                                                 ^~~~~~~~~
   fs/xfs/xfs_extent_busy.c:639:59: note: each undeclared identifier is reported only once for each function it appears in
     639 |                 if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
         |                                                           ^~~
   include/linux/compiler_types.h:577:23: note: in definition of macro '__compiletime_assert'
     577 |                 if (!(condition))                                       \
         |                       ^~~~~~~~~
   include/linux/compiler_types.h:597:9: note: in expansion of macro '_compiletime_assert'
     597 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |                            ^~~~~~~~~~~~~
   include/asm-generic/rwonce.h:49:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      49 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/xfs/xfs_extent_busy.c:639:49: note: in expansion of macro 'READ_ONCE'
     639 |                 if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
         |                                                 ^~~~~~~~~


vim +639 fs/xfs/xfs_extent_busy.c

   594	
   595	/*
   596	 * Flush out all busy extents for this group.
   597	 *
   598	 * If the current transaction is holding busy extents, the caller may not want
   599	 * to wait for committed busy extents to resolve. If we are being told just to
   600	 * try a flush or progress has been made since we last skipped a busy extent,
   601	 * return immediately to allow the caller to try again.
   602	 *
   603	 * If we are freeing extents, we might actually be holding the only free extents
   604	 * in the transaction busy list and the log force won't resolve that situation.
   605	 * In this case, we must return -EAGAIN to avoid a deadlock by informing the
   606	 * caller it needs to commit the busy extents it holds before retrying the
   607	 * extent free operation.
   608	 */
   609	int
   610	xfs_extent_busy_flush(
   611		struct xfs_trans	*tp,
   612		struct xfs_group	*xg,
   613		unsigned		busy_gen,
   614		uint32_t		alloc_flags)
   615	{
   616		struct xfs_extent_busy_tree *eb = xg->xg_busy_extents;
   617		DEFINE_WAIT		(wait);
   618		int			error;
   619	
   620		error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
   621		if (error)
   622			return error;
   623	
   624		/* Avoid deadlocks on uncommitted busy extents. */
   625		if (!list_empty(&tp->t_busy)) {
   626			if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH)
   627				return 0;
   628	
   629			if (busy_gen != READ_ONCE(eb->eb_gen))
   630				return 0;
   631	
   632			if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
   633				return -EAGAIN;
   634	
   635			/*
   636			 * To avoid deadlocks if alloc_flags without any FLAG set
   637			 * and t_busy is not empty.
   638			 */
 > 639			if (!alloc_flags && busy_gen == READ_ONCE(pag->pagb_gen))
   640				return -EAGAIN;
   641		}
   642	
   643		/* Wait for committed busy extents to resolve. */
   644		do {
   645			prepare_to_wait(&eb->eb_wait, &wait, TASK_KILLABLE);
   646			if  (busy_gen != READ_ONCE(eb->eb_gen))
   647				break;
   648			schedule();
   649		} while (1);
   650	
   651		finish_wait(&eb->eb_wait, &wait);
   652		return 0;
   653	}
   654	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-14 15:21 [PATCH] xfs: fix deadlock between busy flushing and t_busy Haoqin Huang
  2025-11-15 15:50 ` kernel test robot
  2025-11-15 15:50 ` kernel test robot
@ 2025-11-16 13:03 ` Dave Chinner
  2025-11-22  9:41   ` haoqin huang
       [not found]   ` <CAEjiKSmvzr1M-2=5SPA0hQo1-442t-_zLB3zBj1jqWoZ0tMCUQ@mail.gmail.com>
  2 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2025-11-16 13:03 UTC (permalink / raw)
  To: Haoqin Huang; +Cc: chandan.babu, djwong, linux-xfs, Haoqin Huang, Rongwei Wang

On Fri, Nov 14, 2025 at 11:21:47PM +0800, Haoqin Huang wrote:
> From: Haoqin Huang <haoqinhuang@tencent.com>
> 
> In case of insufficient disk space, the newly released blocks can be
> allocated from free list. And in this scenario, file system will
> search ag->pagb_tree (busy tree), and trim busy node if hits.
> Immediately afterwards, xfs_extent_busy_flush() will be called to
> flush logbuf to clear busy tree.
> 
> But a deadlock could be triggered by xfs_extent_busy_flush() if
> current tp->t_busy and flush AG meet:
> 
> The current trans which t_busy is non-empty, and:
>   1. The block B1, B2 all belong to AG A, and have inserted into
>      current tp->t_busy;
>   2. and AG A's busy tree (pagb_tree) only has the blocks coincidentally.
>   2. xfs_extent_busy_flush() is flushing AG A.
> 
> In a short word, The trans flushing AG A, and also waiting AG A
> to clear busy tree, but the only blocks of busy tree also in
> this trans's t_busy. A deadlock appeared.
> 
> The detailed process of this deadlock:
> 
> xfs_reflink_end_cow()
> xfs_trans_commit()
> xfs_defer_finish_noroll()
>   xfs_defer_finish_one()
>     xfs_refcount_update_finish_item()    <== step1. cow alloc (tp1)
>       __xfs_refcount_cow_alloc()
>         xfs_refcountbt_free_block()
>           xfs_extent_busy_insert()       <-- step2. x1 x2 insert tp1->t_busy

We haven't done that for quite some time. See commit b742d7b4f0e0
("xfs: use deferred frees for btree block freeing").

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-16 13:03 ` Dave Chinner
@ 2025-11-22  9:41   ` haoqin huang
  2025-11-24  4:13     ` Jinliang Zheng
       [not found]   ` <CAEjiKSmvzr1M-2=5SPA0hQo1-442t-_zLB3zBj1jqWoZ0tMCUQ@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: haoqin huang @ 2025-11-22  9:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: chandan.babu, djwong, linux-xfs, Haoqin Huang, Rongwei Wang

Hi Dave,

Thanks for your reviews, and sorry for response lately.

I’m very agree that deferred frees largely resolved the deadlock issue.

Maybe I should split two parts of this patch to show my idea:
Part 1. fix fallback of xfs_refcount_split_extent()
It seems better to  fix a logic bug in xfs_refcount_split_extent().
When splitting an extent, we update the existing record before
inserting the new one. If the insertion fails, we currently return
without restoring the original record, leaving the btree in an
inconsistent state.

This part does not seem to be necessarily related to the
aforementioned deadlock.
Part 2. Robustify the rollback path to prevent deadlocks
The change to xfs_extent_busy_flush() is just added as a secondary
hardening measure for edge cases.
I’m not sure, but theoretically, the alloc_flag to be zero, then
entering a cycle with a high probability of deadlock.

I can post v2 if you agree, and any comments are welcome.

Thanks.

On Sun, Nov 16, 2025 at 9:03 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Nov 14, 2025 at 11:21:47PM +0800, Haoqin Huang wrote:
> > From: Haoqin Huang <haoqinhuang@tencent.com>
> >
> > In case of insufficient disk space, the newly released blocks can be
> > allocated from free list. And in this scenario, file system will
> > search ag->pagb_tree (busy tree), and trim busy node if hits.
> > Immediately afterwards, xfs_extent_busy_flush() will be called to
> > flush logbuf to clear busy tree.
> >
> > But a deadlock could be triggered by xfs_extent_busy_flush() if
> > current tp->t_busy and flush AG meet:
> >
> > The current trans which t_busy is non-empty, and:
> >   1. The block B1, B2 all belong to AG A, and have inserted into
> >      current tp->t_busy;
> >   2. and AG A's busy tree (pagb_tree) only has the blocks coincidentally.
> >   2. xfs_extent_busy_flush() is flushing AG A.
> >
> > In a short word, The trans flushing AG A, and also waiting AG A
> > to clear busy tree, but the only blocks of busy tree also in
> > this trans's t_busy. A deadlock appeared.
> >
> > The detailed process of this deadlock:
> >
> > xfs_reflink_end_cow()
> > xfs_trans_commit()
> > xfs_defer_finish_noroll()
> >   xfs_defer_finish_one()
> >     xfs_refcount_update_finish_item()    <== step1. cow alloc (tp1)
> >       __xfs_refcount_cow_alloc()
> >         xfs_refcountbt_free_block()
> >           xfs_extent_busy_insert()       <-- step2. x1 x2 insert tp1->t_busy
>
> We haven't done that for quite some time. See commit b742d7b4f0e0
> ("xfs: use deferred frees for btree block freeing").
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-22  9:41   ` haoqin huang
@ 2025-11-24  4:13     ` Jinliang Zheng
  2025-11-24  7:07       ` haoqin huang
  0 siblings, 1 reply; 8+ messages in thread
From: Jinliang Zheng @ 2025-11-24  4:13 UTC (permalink / raw)
  To: haoqinhuang7
  Cc: chandan.babu, david, djwong, haoqinhuang, linux-xfs, zigiwang

On Sat, 22 Nov 2025 17:41:49 +0800, haoqinhuang7@gmail.com wrote:
> Hi Dave,
> 
> Thanks for your reviews, and sorry for response lately.
> 
> I’m very agree that deferred frees largely resolved the deadlock issue.
> 
> Maybe I should split two parts of this patch to show my idea:
> Part 1. fix fallback of xfs_refcount_split_extent()
> It seems better to  fix a logic bug in xfs_refcount_split_extent().
> When splitting an extent, we update the existing record before
> inserting the new one. If the insertion fails, we currently return
> without restoring the original record, leaving the btree in an
> inconsistent state.

Is the error handling part in xfs_trans_commit() sufficient to handle
the situation you described? (I haven't carefully verified it.)

Jinliang Zheng. :)

> 
> This part does not seem to be necessarily related to the
> aforementioned deadlock.
> Part 2. Robustify the rollback path to prevent deadlocks
> The change to xfs_extent_busy_flush() is just added as a secondary
> hardening measure for edge cases.
> I’m not sure, but theoretically, the alloc_flag to be zero, then
> entering a cycle with a high probability of deadlock.
> 
> I can post v2 if you agree, and any comments are welcome.
> 
> Thanks.
> 
> On Sun, Nov 16, 2025 at 9:03 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Nov 14, 2025 at 11:21:47PM +0800, Haoqin Huang wrote:
> > > From: Haoqin Huang <haoqinhuang@tencent.com>
> > >
> > > In case of insufficient disk space, the newly released blocks can be
> > > allocated from free list. And in this scenario, file system will
> > > search ag->pagb_tree (busy tree), and trim busy node if hits.
> > > Immediately afterwards, xfs_extent_busy_flush() will be called to
> > > flush logbuf to clear busy tree.
> > >
> > > But a deadlock could be triggered by xfs_extent_busy_flush() if
> > > current tp->t_busy and flush AG meet:
> > >
> > > The current trans which t_busy is non-empty, and:
> > >   1. The block B1, B2 all belong to AG A, and have inserted into
> > >      current tp->t_busy;
> > >   2. and AG A's busy tree (pagb_tree) only has the blocks coincidentally.
> > >   2. xfs_extent_busy_flush() is flushing AG A.
> > >
> > > In a short word, The trans flushing AG A, and also waiting AG A
> > > to clear busy tree, but the only blocks of busy tree also in
> > > this trans's t_busy. A deadlock appeared.
> > >
> > > The detailed process of this deadlock:
> > >
> > > xfs_reflink_end_cow()
> > > xfs_trans_commit()
> > > xfs_defer_finish_noroll()
> > >   xfs_defer_finish_one()
> > >     xfs_refcount_update_finish_item()    <== step1. cow alloc (tp1)
> > >       __xfs_refcount_cow_alloc()
> > >         xfs_refcountbt_free_block()
> > >           xfs_extent_busy_insert()       <-- step2. x1 x2 insert tp1->t_busy
> >
> > We haven't done that for quite some time. See commit b742d7b4f0e0
> > ("xfs: use deferred frees for btree block freeing").
> >
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
  2025-11-24  4:13     ` Jinliang Zheng
@ 2025-11-24  7:07       ` haoqin huang
  0 siblings, 0 replies; 8+ messages in thread
From: haoqin huang @ 2025-11-24  7:07 UTC (permalink / raw)
  To: Jinliang Zheng
  Cc: chandan.babu, david, djwong, haoqinhuang, linux-xfs, zigiwang,
	rongwei.wrw

On Mon, Nov 24, 2025 at 12:13 PM Jinliang Zheng <alexjlzheng@gmail.com> wrote:
>
> On Sat, 22 Nov 2025 17:41:49 +0800, haoqinhuang7@gmail.com wrote:
> > Hi Dave,
> >
> > Thanks for your reviews, and sorry for response lately.
> >
> > I’m very agree that deferred frees largely resolved the deadlock issue.
> >
> > Maybe I should split two parts of this patch to show my idea:
> > Part 1. fix fallback of xfs_refcount_split_extent()
> > It seems better to  fix a logic bug in xfs_refcount_split_extent().
> > When splitting an extent, we update the existing record before
> > inserting the new one. If the insertion fails, we currently return
> > without restoring the original record, leaving the btree in an
> > inconsistent state.
>
> Is the error handling part in xfs_trans_commit() sufficient to handle
> the situation you described? (I haven't carefully verified it.)
>
> Jinliang Zheng. :)

Hi Jinliang,

Thanks for your suggestion, I couldn't find how xfs_trans_commit()
handles this issue. Maybe you can describe it in detail. What's more,
I see some assertions will be triggered if we don't restore this right
extent, likes:
xfs_refcount_split_extent() updates the existing right extent before
inserting the new one for the left extent. If insertion fails, the
right extent is left in an inconsistent state.

When xfs_defer_finish_noroll() attempts to retry the deferred
operation, because the tree was modified but not restored, the
cursor's path becomes invalid regarding the tree structure. This
triggers the ASSERT(0) in xfs_btree_increment():
if (lev == cur->bc_nlevels) {
if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
goto out0;
ASSERT(0);
……
}
IMO, it seems better to restore the original record in
xfs_refcount_split_extent() to ensure the btree remains consistent if
the insertion fails.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] xfs: fix deadlock between busy flushing and t_busy
       [not found]   ` <CAEjiKSmvzr1M-2=5SPA0hQo1-442t-_zLB3zBj1jqWoZ0tMCUQ@mail.gmail.com>
@ 2025-11-24 21:18     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2025-11-24 21:18 UTC (permalink / raw)
  To: haoqin huang; +Cc: linux-xfs

[ Please reply on list, added linux-xfs back to cc list. ]

On Sat, Nov 22, 2025 at 05:28:25PM +0800, haoqin huang wrote:
> Hi Dave,
> 
> Thanks for your reviews, and sorry for response lately.
> 
> I’m very agree that deferred frees largely resolved the deadlock issue.
> 
> Maybe I should split two parts of this patch to show my idea:
> Part 1. fix fallback of xfs_refcount_split_extent()
> It seems better to  fix a logic bug in xfs_refcount_split_extent().
> When splitting an extent, we update the existing record before
> inserting the new one. If the insertion fails, we currently return
> without restoring the original record, leaving the btree in an
> inconsistent state.

Yes, because we can't actually recover from such a failure. We have
dirtied the transaction, so when it gets cancelled on an error
return, it will shut down the filesystem. Hence there is no point in
trying to unwind previous modifications in the transaction on
error...

> This part does not seem to be necessarily related to the
> aforementioned deadlock.

It isn't, but it is also unnecessary.

> Part 2. Robustify the rollback path to prevent deadlocks
> The change to xfs_extent_busy_flush() is just added as a secondary
> hardening measure for edge cases.
> I’m not sure, but theoretically, the alloc_flag to be zero, then
> entering a cycle with a high probability of deadlock.

We definitely allow alloc_flags to be zero - this is not a bug or
a behaviour that needs to be worked around. Those callers aren't
likely to also hold overlapping busy extents in the transaction
themselves, so there shouldn't be any deadlocks in those callers.

Really, it's the nature of the btree modifications (multiple record
updates (and hence block free/alloc) per transaction) that leads to
the potential deadlock. Only the BMBT and REFCNTBT trees have this
issue because they require allocation from the free space btrees
rather than the AGFL pool. However, multi-record updates of these
btrees are no longer deadlock prone because of the deferred freeing
of extents.

Hence, in all other contexts, it is generally safe to call
xfs_extent_busy_flush(0) and block waiting on busy extent processing
to complete. There is no reason to force callers like this to
implement an -EAGAIN loop around xfs_extent_busy_flush() in an
attempt to prevent deadlocks. i.e. if there is a deadlock, they'll
now just have to busy-loop trying to flush instead of sleeping
waiting for flush completion...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-24 21:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-14 15:21 [PATCH] xfs: fix deadlock between busy flushing and t_busy Haoqin Huang
2025-11-15 15:50 ` kernel test robot
2025-11-15 15:50 ` kernel test robot
2025-11-16 13:03 ` Dave Chinner
2025-11-22  9:41   ` haoqin huang
2025-11-24  4:13     ` Jinliang Zheng
2025-11-24  7:07       ` haoqin huang
     [not found]   ` <CAEjiKSmvzr1M-2=5SPA0hQo1-442t-_zLB3zBj1jqWoZ0tMCUQ@mail.gmail.com>
2025-11-24 21:18     ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox