* [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc
@ 2026-03-12 7:22 Morduan Zang
2026-03-12 14:26 ` Darrick J. Wong
2026-03-12 20:25 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Morduan Zang @ 2026-03-12 7:22 UTC (permalink / raw)
To: cem
Cc: zhanjun, hch, dchinner, stable, linux-xfs, linux-kernel,
Morduan Zang, syzbot+d78ace33ad4ee69329d5
__xfs_trans_alloc() allocates the transaction structure before
xfs_trans_set_context() establishes the nofs context. If memory reclaim
enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can
trigger a warning from the reclaim path.
Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim
recursion before the nofs context is set.
Link: https://syzkaller.appspot.com/bug?extid=d78ace33ad4ee69329d5
Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc")
Reported-by: syzbot+d78ace33ad4ee69329d5@syzkaller.appspotmail.com
Signed-off-by: Zhan Jun <zhanjun@uniontech.com>
Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
---
fs/xfs/xfs_trans.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index bcc470f56e46..0d347cff7317 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -217,7 +217,7 @@ __xfs_trans_alloc(
ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) || xfs_has_lazysbcount(mp));
- tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
+ tp = kmem_cache_zalloc(xfs_trans_cache, GFP_NOFS | __GFP_NOFAIL);
if (!(flags & XFS_TRANS_NO_WRITECOUNT))
sb_start_intwrite(mp->m_super);
xfs_trans_set_context(tp);
--
2.50.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc 2026-03-12 7:22 [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc Morduan Zang @ 2026-03-12 14:26 ` Darrick J. Wong 2026-03-12 20:28 ` Dave Chinner 2026-03-12 20:25 ` Dave Chinner 1 sibling, 1 reply; 6+ messages in thread From: Darrick J. Wong @ 2026-03-12 14:26 UTC (permalink / raw) To: Morduan Zang Cc: cem, zhanjun, hch, dchinner, stable, linux-xfs, linux-kernel, syzbot+d78ace33ad4ee69329d5 On Thu, Mar 12, 2026 at 03:22:14PM +0800, Morduan Zang wrote: > __xfs_trans_alloc() allocates the transaction structure before > xfs_trans_set_context() establishes the nofs context. If memory reclaim > enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can > trigger a warning from the reclaim path. > > Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim > recursion before the nofs context is set. Why doesn't filesystem reclaim itself set PF_MEMALLOC_NOFS for us? xfs_vn_sync_lazytime+0xaf/0x150 fs/xfs/xfs_iops.c:1238 sync_lazytime+0x12d/0x2d0 fs/fs-writeback.c:1721 iput+0x230/0xe80 fs/inode.c:1997 __dentry_kill+0x1a2/0x5e0 fs/dcache.c:670 shrink_kill+0xa9/0x2c0 fs/dcache.c:1147 shrink_dentry_list+0x2e0/0x5e0 fs/dcache.c:1174 prune_dcache_sb+0x119/0x180 fs/dcache.c:1256 super_cache_scan+0x369/0x4b0 fs/super.c:223 do_shrink_slab+0x6df/0x1170 mm/shrinker.c:437` --D > Link: https://syzkaller.appspot.com/bug?extid=d78ace33ad4ee69329d5 > Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") > Reported-by: syzbot+d78ace33ad4ee69329d5@syzkaller.appspotmail.com > > Signed-off-by: Zhan Jun <zhanjun@uniontech.com> > Signed-off-by: Morduan Zang <zhangdandan@uniontech.com> > --- > fs/xfs/xfs_trans.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > index bcc470f56e46..0d347cff7317 100644 > --- a/fs/xfs/xfs_trans.c > +++ b/fs/xfs/xfs_trans.c > @@ -217,7 +217,7 @@ __xfs_trans_alloc( > > ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) || xfs_has_lazysbcount(mp)); > > - tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL); > + tp = kmem_cache_zalloc(xfs_trans_cache, GFP_NOFS | __GFP_NOFAIL); > if (!(flags & XFS_TRANS_NO_WRITECOUNT)) > sb_start_intwrite(mp->m_super); > xfs_trans_set_context(tp); > -- > 2.50.1 > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc 2026-03-12 14:26 ` Darrick J. Wong @ 2026-03-12 20:28 ` Dave Chinner 0 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2026-03-12 20:28 UTC (permalink / raw) To: Darrick J. Wong Cc: Morduan Zang, cem, zhanjun, hch, dchinner, stable, linux-xfs, linux-kernel, syzbot+d78ace33ad4ee69329d5 On Thu, Mar 12, 2026 at 07:26:01AM -0700, Darrick J. Wong wrote: > On Thu, Mar 12, 2026 at 03:22:14PM +0800, Morduan Zang wrote: > > __xfs_trans_alloc() allocates the transaction structure before > > xfs_trans_set_context() establishes the nofs context. If memory reclaim > > enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can > > trigger a warning from the reclaim path. > > > > Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim > > recursion before the nofs context is set. > > Why doesn't filesystem reclaim itself set PF_MEMALLOC_NOFS for us? Because kswapd based memory reclaim runs in GFP_KERNEL memory allocation context. i.e. we're not in a recursion path here potentially holding other filesystem resources that we could deadlock on. i.e. GFP_NOFS allocations are used to stop -direct reclaim- from recursing back into filesystem reclaim - it does not (and should not) stop kswapd from being able to reclaim filesystem objects. -Dave. -- Dave Chinner dgc@kernel.org ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc 2026-03-12 7:22 [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc Morduan Zang 2026-03-12 14:26 ` Darrick J. Wong @ 2026-03-12 20:25 ` Dave Chinner 2026-03-16 9:18 ` Christoph Hellwig 1 sibling, 1 reply; 6+ messages in thread From: Dave Chinner @ 2026-03-12 20:25 UTC (permalink / raw) To: Morduan Zang Cc: cem, zhanjun, hch, dchinner, stable, linux-xfs, linux-kernel, syzbot+d78ace33ad4ee69329d5 On Thu, Mar 12, 2026 at 03:22:14PM +0800, Morduan Zang wrote: > __xfs_trans_alloc() allocates the transaction structure before > xfs_trans_set_context() establishes the nofs context. If memory reclaim > enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can > trigger a warning from the reclaim path. PLease include the warning and stack trace in the commit message. > Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim > recursion before the nofs context is set. > > Link: https://syzkaller.appspot.com/bug?extid=d78ace33ad4ee69329d5 That's a PF_MEMALLOC + __GFP_NOFAIL warning. Has nothing to do with GFP_NOFS. > Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") Nope. That commit didn't even change the allocation context.... Indeed, the stack trace trivially demonstrates the cause - the sync_lazytime() changes (in 6.19i, IIRC) have put a new XFS transaction in the iput() path that memory reclaim runs. We managed to remove all the xfs transactions in this path with the introduction of the background inodegc infrastructure because lockdep, memory allocation and other stuff really don't like us running "must succeed" transactions in the memory reclaim path. Hence putting a new transaction directly in that path is a regression, and so I suspect the sync_lazytime() call directly from iput() running a transaction needs to be rethought... -Dave. -- Dave Chinner dgc@kernel.org ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc 2026-03-12 20:25 ` Dave Chinner @ 2026-03-16 9:18 ` Christoph Hellwig 2026-03-18 21:01 ` Dave Chinner 0 siblings, 1 reply; 6+ messages in thread From: Christoph Hellwig @ 2026-03-16 9:18 UTC (permalink / raw) To: Dave Chinner Cc: Morduan Zang, cem, zhanjun, hch, dchinner, stable, linux-xfs, linux-kernel, syzbot+d78ace33ad4ee69329d5, Theodore Ts'o, Al Viro, Christian Brauner, Jan Kara, linux-fsdevel On Fri, Mar 13, 2026 at 07:25:05AM +1100, Dave Chinner wrote: > On Thu, Mar 12, 2026 at 03:22:14PM +0800, Morduan Zang wrote: > > __xfs_trans_alloc() allocates the transaction structure before > > xfs_trans_set_context() establishes the nofs context. If memory reclaim > > enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can > > trigger a warning from the reclaim path. > > PLease include the warning and stack trace in the commit message. > > > Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim > > recursion before the nofs context is set. > > > > Link: https://syzkaller.appspot.com/bug?extid=d78ace33ad4ee69329d5 > > That's a PF_MEMALLOC + __GFP_NOFAIL warning. Has nothing to do > with GFP_NOFS. Yes. > Indeed, the stack trace trivially demonstrates the cause - the > sync_lazytime() changes (in 6.19i, IIRC) have put a new XFS > transaction in the iput() path that memory reclaim runs. The lazytime changes (in 7.0-rc). And I think they do indeed cause this because we fail to clear I_DIRTY_TIME for some cases. > We managed to remove all the xfs transactions in this path with the > introduction of the background inodegc infrastructure because > lockdep, memory allocation and other stuff really don't like us > running "must succeed" transactions in the memory reclaim path. > > Hence putting a new transaction directly in that path is a > regression, and so I suspect the sync_lazytime() call directly from > iput() running a transaction needs to be rethought... Not a new transaction, but one we didn't hit before. That being said, doing this separate syncing of the dirty time vs just batching it with the write_inode_now in iput_final looks really odd to me. This goes back to Ted's original commit 0ae45f63d4ef8 adding laztime more than 10 years ago, which unfortunately does not explain the rationale. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc 2026-03-16 9:18 ` Christoph Hellwig @ 2026-03-18 21:01 ` Dave Chinner 0 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2026-03-18 21:01 UTC (permalink / raw) To: Christoph Hellwig Cc: Morduan Zang, cem, zhanjun, dchinner, stable, linux-xfs, linux-kernel, syzbot+d78ace33ad4ee69329d5, Theodore Ts'o, Al Viro, Christian Brauner, Jan Kara, linux-fsdevel On Mon, Mar 16, 2026 at 10:18:27AM +0100, Christoph Hellwig wrote: > On Fri, Mar 13, 2026 at 07:25:05AM +1100, Dave Chinner wrote: > > On Thu, Mar 12, 2026 at 03:22:14PM +0800, Morduan Zang wrote: > > > __xfs_trans_alloc() allocates the transaction structure before > > > xfs_trans_set_context() establishes the nofs context. If memory reclaim > > > enters XFS through xfs_vn_sync_lazytime(), this GFP_KERNEL allocation can > > > trigger a warning from the reclaim path. > > > > PLease include the warning and stack trace in the commit message. > > > > > Use GFP_NOFS for the transaction allocation to avoid filesystem reclaim > > > recursion before the nofs context is set. > > > > > > Link: https://syzkaller.appspot.com/bug?extid=d78ace33ad4ee69329d5 > > > > That's a PF_MEMALLOC + __GFP_NOFAIL warning. Has nothing to do > > with GFP_NOFS. > > Yes. > > > Indeed, the stack trace trivially demonstrates the cause - the > > sync_lazytime() changes (in 6.19i, IIRC) have put a new XFS > > transaction in the iput() path that memory reclaim runs. > > The lazytime changes (in 7.0-rc). And I think they do indeed cause > this because we fail to clear I_DIRTY_TIME for some cases. > > > We managed to remove all the xfs transactions in this path with the > > introduction of the background inodegc infrastructure because > > lockdep, memory allocation and other stuff really don't like us > > running "must succeed" transactions in the memory reclaim path. > > > > Hence putting a new transaction directly in that path is a > > regression, and so I suspect the sync_lazytime() call directly from > > iput() running a transaction needs to be rethought... > > Not a new transaction, but one we didn't hit before. Sure, but that doesn't change the fact that we should never have put this timestamp update transaction in the direct iput_final() path. > That being said, > doing this separate syncing of the dirty time vs just batching it with > the write_inode_now in iput_final looks really odd to me. This goes back > to Ted's original commit 0ae45f63d4ef8 adding laztime more than 10 years > ago, which unfortunately does not explain the rationale. Moving it to pair with write_inode_now() by itself doesn't help us avoid the transaction in iput_final() context. It does, however, give us a state flag we can check (I_WILL_FREE) to change the behaviour of xfs_vn_sync_lazytime() when called from this path. i.e. we can mark the XFS inode as needing async inodegc processing and then skip the update transaction. When VFS inode eviction calls us to destroy the inode, we can schedule the inode for GC instead of marking it for immediate reclaim. We then can safely run the timestamp update transaction from inodegc context. This also allows us to skip the timestamp update transaction when running GC on unlinked inodes... -Dave. -- Dave Chinner dgc@kernel.org ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-18 21:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-12 7:22 [PATCH] xfs: use GFP_NOFS in __xfs_trans_alloc Morduan Zang 2026-03-12 14:26 ` Darrick J. Wong 2026-03-12 20:28 ` Dave Chinner 2026-03-12 20:25 ` Dave Chinner 2026-03-16 9:18 ` Christoph Hellwig 2026-03-18 21:01 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox