public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs: avoid inodegc worker flush deadlock
@ 2026-03-28  7:12 ZhengYuan Huang
  2026-03-30  1:41 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: ZhengYuan Huang @ 2026-03-28  7:12 UTC (permalink / raw)
  To: cem, dchinner, djwong
  Cc: linux-xfs, linux-kernel, baijiaju1990, r33s3n6, zzzccc427,
	ZhengYuan Huang

[BUG]
WARNING: possible recursive locking detected
--------------------------------------------
kworker/0:1/10 is trying to acquire lock:
ffff88801621fd48 ((wq_completion)xfs-inodegc/ublkb1){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x99/0x1c0 kernel/workqueue.c:3936

but task is already holding lock:
ffff88801621fd48 ((wq_completion)xfs-inodegc/ublkb1){+.+.}-{0:0}, at: process_one_work+0x1188/0x1980 kernel/workqueue.c:3238

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((wq_completion)xfs-inodegc/ublkb1);
  lock((wq_completion)xfs-inodegc/ublkb1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by kworker/0:1/10:
 #0: ffff88801621fd48 ((wq_completion)xfs-inodegc/ublkb1){+.+.}-{0:0}, at: process_one_work+0x1188/0x1980 kernel/workqueue.c:3238
 #1: ffff888009dafce8 ((work_completion)(&(&gc->work)->work)){+.+.}-{0:0}, at: process_one_work+0x865/0x1980 kernel/workqueue.c:3239

stack backtrace:
Workqueue: xfs-inodegc/ublkb1 xfs_inodegc_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120
 dump_stack+0x15/0x20 lib/dump_stack.c:129
 print_deadlock_bug+0x23f/0x320 kernel/locking/lockdep.c:3041
 check_deadlock kernel/locking/lockdep.c:3093 [inline]
 validate_chain kernel/locking/lockdep.c:3895 [inline]
 __lock_acquire+0x1317/0x21e0 kernel/locking/lockdep.c:5237
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x169/0x2f0 kernel/locking/lockdep.c:5825
 touch_wq_lockdep_map+0xab/0x1c0 kernel/workqueue.c:3936
 __flush_workqueue+0x117/0x1010 kernel/workqueue.c:3978
 xfs_inodegc_wait_all fs/xfs/xfs_icache.c:495 [inline]
 xfs_inodegc_flush+0x9a/0x390 fs/xfs/xfs_icache.c:2020
 xfs_blockgc_flush_all+0x106/0x250 fs/xfs/xfs_icache.c:1614
 xfs_trans_alloc+0x5e4/0xc10 fs/xfs/xfs_trans.c:268
 xfs_inactive_ifree+0x329/0x3c0 fs/xfs/xfs_inode.c:1224
 xfs_inactive+0x590/0xb60 fs/xfs/xfs_inode.c:1485
 xfs_inodegc_inactivate fs/xfs/xfs_icache.c:1942 [inline]
 xfs_inodegc_worker+0x241/0x650 fs/xfs/xfs_icache.c:1988
 process_one_work+0x8e0/0x1980 kernel/workqueue.c:3263
 process_scheduled_works kernel/workqueue.c:3346 [inline]
 worker_thread+0x683/0xf80 kernel/workqueue.c:3427
 kthread+0x3f0/0x850 kernel/kthread.c:463
 ret_from_fork+0x50f/0x610 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

[CAUSE]

If xfs_trans_alloc() hits -ENOSPC while xfs_inodegc_worker() is
inactivating an unlinked inode, the retry path runs
xfs_blockgc_flush_all(), which recurses into xfs_inodegc_flush().
xfs_inodegc_wait_all() then calls flush_workqueue() on m_inodegc_wq
from inside an inodegc worker. The flush waits for all in-flight work
items, including the one currently executing, so the worker deadlocks
waiting on itself.

[FIX]

Detect when xfs_inodegc_wait_all() is running from an inodegc worker
and flush every other per-cpu inodegc work item directly instead of
flushing the whole workqueue. This preserves the intent of waiting
for background inodegc reclaim while avoiding recursion on the current
worker. Also collect inodegc errors from all possible CPUs because
running workers clear their cpumask bit before processing inodes.

Fixes: d4d12c02bf5f ("xfs: collect errors from inodegc for unlinked inode recovery")
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
---
 fs/xfs/xfs_icache.c | 50 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index e44040206851..cdb707332b4b 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -484,16 +484,64 @@ xfs_inodegc_queue_all(
 	return ret;
 }
 
+/*
+ * flush_workqueue() waits for all in-flight work items, including the current
+ * one.  If xfs_trans_alloc() hits ENOSPC while an inodegc worker is freeing an
+ * unlinked inode, xfs_blockgc_flush_all() recurses into xfs_inodegc_flush().
+ * Waiting for the current worker there deadlocks because the flush cannot
+ * complete until this work function returns.
+ */
+static struct xfs_inodegc *
+xfs_inodegc_current(struct xfs_mount *mp)
+{
+	struct work_struct	*work = current_work();
+	int			cpu;
+
+	if (!work)
+		return NULL;
+
+	for_each_possible_cpu(cpu) {
+		struct xfs_inodegc	*gc = per_cpu_ptr(mp->m_inodegc, cpu);
+
+		if (work == &gc->work.work)
+			return gc;
+	}
+
+	return NULL;
+}
+
 /* Wait for all queued work and collect errors */
 static int
 xfs_inodegc_wait_all(
 	struct xfs_mount	*mp)
 {
+	struct xfs_inodegc	*current_gc = xfs_inodegc_current(mp);
 	int			cpu;
 	int			error = 0;
 
+	if (current_gc) {
+		/*
+		 * current_gc is already in flight, so waiting for the whole
+		 * workqueue would recurse on ourselves.  Flush every other
+		 * per-cpu work item instead so that ENOSPC retries still wait
+		 * for the rest of the inodegc work to finish.
+		 */
+		for_each_possible_cpu(cpu) {
+			struct xfs_inodegc	*gc;
+
+			gc = per_cpu_ptr(mp->m_inodegc, cpu);
+			if (gc == current_gc)
+				continue;
+			flush_delayed_work(&gc->work);
+			if (gc->error && !error)
+				error = gc->error;
+			gc->error = 0;
+		}
+		return error;
+	}
+
 	flush_workqueue(mp->m_inodegc_wq);
-	for_each_cpu(cpu, &mp->m_inodegc_cpumask) {
+	for_each_possible_cpu(cpu) {
 		struct xfs_inodegc	*gc;
 
 		gc = per_cpu_ptr(mp->m_inodegc, cpu);
-- 
2.43.0


Thread overview: 5+ messages
2026-03-28  7:12 [PATCH] xfs: avoid inodegc worker flush deadlock ZhengYuan Huang
2026-03-30  1:41 ` Dave Chinner
2026-03-30  2:40   ` ZhengYuan Huang
2026-03-30 20:45     ` Dave Chinner
2026-03-31  3:24       ` ZhengYuan Huang
