The Linux Kernel Mailing List
* [PATCH v11 0/5] ext4: deferred iput framework for EA inodes
@ 2026-06-29 11:08 Yun Zhou
  2026-06-29 11:08 ` [PATCH v11 1/5] fs: add iput_if_not_last() helper Yun Zhou
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Yun Zhou @ 2026-06-29 11:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel, xiaowu.417

This series introduces a deferred-iput framework for EA inodes to
eliminate a class of lock ordering issues in ext4 xattr code.

The problem: iput() on EA inodes while holding xattr_sem or a jbd2
handle can trigger eviction, which may acquire those same locks or
s_writepages_rwsem, creating circular dependencies.  The immediate
deadlock (during mount-time orphan cleanup) is fixed by two separate
patches already reviewed and posted:

  ext4: skip extra isize expansion during mount to prevent deadlock
  ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode

This series provides the structural fix that makes the code safe
regardless of calling context:

Patch 1 adds a VFS helper iput_if_not_last() which drops an inode
reference only if it is not the last one, using atomic_add_unless().
This provides a proper VFS abstraction for filesystems that need to
conditionally defer final iput.  Annotated with __must_check.
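To illustrate the "drop unless last" idea, here is a minimal userspace
sketch using C11 atomics.  It is not the code from patch 1: struct
model_inode, model_add_unless() and model_iput_if_not_last() are
hypothetical names, and the return convention (true means the reference
was dropped) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode; only the refcount is modelled. */
struct model_inode {
	atomic_int i_count;
};

/*
 * Userspace analogue of the kernel's atomic_add_unless(v, a, u):
 * add 'a' to '*v' unless '*v' equals 'u'.  Returns true if the add
 * happened.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/*
 * Drop one reference unless it is the last one.  Returns true if the
 * reference was dropped; false means the caller still holds the final
 * reference and must arrange the final put itself (assumed convention).
 */
static bool model_iput_if_not_last(struct model_inode *inode)
{
	assert(atomic_load(&inode->i_count) > 0);
	return model_add_unless(&inode->i_count, -1, 1);
}
```

The compare-and-swap loop means the count never reaches zero through
this path, so eviction cannot be triggered from the caller's context.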

Patch 2 introduces ext4_put_ea_inode(), which uses iput_if_not_last() as
a fast path (a single atomic operation in the common case).  If the
caller holds the last reference, the inode is linked onto a per-sb llist
(via i_ea_iput_node, embedded in ext4_inode_info in a union with
xattr_sem, which is unused for EA inodes) and a delayed worker (1 jiffy)
performs the final iput() in a clean context.  No per-iput allocation is
needed.  The patch also moves init_rwsem(xattr_sem) from init_once to
ext4_alloc_inode to handle slab reuse after the union field has been
overwritten.
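The fast path plus deferred final put can be sketched in userspace as
follows.  This is only a model under stated assumptions: all model_*
names are hypothetical, the lock-free list imitates the kernel llist
with C11 atomics, the "worker" is an ordinary function called by hand
rather than a delayed work item, and setting 'evicted' stands in for
whatever the real final iput() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	struct model_node *next;
};

/* Hypothetical EA inode: refcount plus an embedded list node. */
struct model_ea_inode {
	atomic_int count;
	struct model_node iput_node;	/* plays the role of i_ea_iput_node */
	bool evicted;
};

/* Hypothetical per-superblock state: head of the pending list. */
struct model_sb {
	_Atomic(struct model_node *) to_free;
};

/* Lock-free push, in the spirit of llist_add(). */
static void model_push(struct model_sb *sb, struct model_node *n)
{
	struct model_node *head = atomic_load(&sb->to_free);

	assert(n != NULL);
	do {
		n->next = head;
	} while (!atomic_compare_exchange_weak(&sb->to_free, &head, n));
}

/* Fast path: drop a non-final reference; otherwise queue for the worker. */
static void model_put_ea_inode(struct model_sb *sb, struct model_ea_inode *ei)
{
	int old = atomic_load(&ei->count);

	while (old != 1) {
		if (atomic_compare_exchange_weak(&ei->count, &old, old - 1))
			return;	/* not the last reference: done */
	}
	/* Last reference: defer the final put; no allocation is needed. */
	model_push(sb, &ei->iput_node);
}

/* Worker body: detach the whole list, then do the final put on each entry. */
static int model_ea_inode_work(struct model_sb *sb)
{
	struct model_node *n =
		atomic_exchange(&sb->to_free, (struct model_node *)NULL);
	int freed = 0;

	while (n) {
		struct model_ea_inode *ei = (struct model_ea_inode *)
			((char *)n - offsetof(struct model_ea_inode, iput_node));

		n = n->next;	/* read the link before the final put */
		atomic_fetch_sub(&ei->count, 1);
		ei->evicted = true;	/* stands in for eviction in iput() */
		freed++;
	}
	return freed;
}
```

Because the list node lives inside the inode, queuing cannot fail, and
the worker runs without xattr_sem or a journal handle held.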

Patch 3 converts all EA inode iput() calls in xattr code to use
ext4_put_ea_inode() uniformly -- no exceptions to reason about.

Patch 4 removes the now-redundant ea_inode_array mechanism (parameter
threading, struct, expand/free functions), replaced entirely by direct
ext4_put_ea_inode() calls.  This is a net code reduction.

Patch 5 prevents a potential deadlock on corrupted filesystems where
multiple xattr entries reference the same EA inode whose nlink has
dropped to zero.  It tracks such EA inode numbers in a small on-stack
array (with dynamic growth for the pathological case) and skips
duplicates before iget, eliminating the deadlock window entirely.
Legitimate EA inode dedup (ref_count > 1) is unaffected since nlink
remains > 0 after dec_ref and such entries are not added to the skip
array.
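The skip-array idea can be sketched in userspace like this.  It is a
simplified model, not the code from patch 5: the ino_skip and demo_*
names and the inline capacity of 4 are hypothetical, malloc() with
abort() replaces the __GFP_NOFAIL allocation, and a single ref_count
reaching zero stands in for nlink dropping to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKIP_INLINE 4	/* hypothetical inline (on-stack) capacity */

struct ino_skip {
	unsigned long inline_inos[SKIP_INLINE];
	unsigned long *inos;	/* points at inline_inos until it grows */
	size_t count;
	size_t cap;
};

static void skip_init(struct ino_skip *s)
{
	s->inos = s->inline_inos;
	s->count = 0;
	s->cap = SKIP_INLINE;
}

static bool skip_contains(const struct ino_skip *s, unsigned long ino)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->inos[i] == ino)
			return true;
	return false;
}

static void skip_free(struct ino_skip *s)
{
	if (s->inos != s->inline_inos)
		free(s->inos);
}

static void skip_add(struct ino_skip *s, unsigned long ino)
{
	if (s->count == s->cap) {
		/* Pathological case: double the array on the heap. */
		size_t ncap = s->cap * 2;
		unsigned long *n = malloc(ncap * sizeof(*n));

		if (!n)
			abort();	/* stand-in for a no-fail allocation */
		memcpy(n, s->inos, s->count * sizeof(*n));
		skip_free(s);
		s->inos = n;
		s->cap = ncap;
	}
	assert(s->count < s->cap);
	s->inos[s->count++] = ino;
}

/* Hypothetical EA inode state for the demo. */
struct demo_ea {
	unsigned long ino;
	int ref_count;
};

/* Walk xattr entries; returns how many entries were actually processed. */
static int demo_walk(struct demo_ea *tab, size_t ntab,
		     const unsigned long *entries, size_t n)
{
	struct ino_skip skip;
	int processed = 0;

	skip_init(&skip);
	for (size_t i = 0; i < n; i++) {
		struct demo_ea *ea = NULL;

		if (skip_contains(&skip, entries[i]))
			continue;	/* duplicate of an already-dead EA inode */
		for (size_t j = 0; j < ntab; j++)
			if (tab[j].ino == entries[i])
				ea = &tab[j];
		if (!ea)
			continue;
		processed++;
		if (--ea->ref_count == 0)
			skip_add(&skip, ea->ino);	/* track only dead inodes */
	}
	skip_free(&skip);
	return processed;
}
```

In this model an inode referenced twice with ref_count 2 is processed
twice, while a second reference to an inode that already dropped to
zero is skipped before any lookup.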

Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5

v11:
 - Patch 1: add __must_check annotation to iput_if_not_last().
 - Patch 2: remove ext4_drain_ea_inode_work() wrapper, use direct
   flush_delayed_work() at drain points.  Re-arm is not possible
   because check_igot_inode() in __ext4_iget() already rejects EA
   inodes with extended attributes, so evicting an EA inode never
   enters ext4_xattr_delete_inode().  Drop the ext4_evict_inode()
   guard (was patch 5 in v10) -- it is unnecessary given the above.
   Remove ext4_xattr_inode_array_free_deferred() intermediate function
   -- mechanism is introduced without converting any call site.
 - Patch 2: add comment on ext4_put_ea_inode() documenting why the
   inode cannot be double-queued to s_ea_inode_to_free (reviewer
   request).
 - Patch 2: simplify ext4_ea_inode_work() by removing 'next' variable.
 - Patch 5: replace per-call llist (i_ea_iput_node reuse) with a simple
   on-stack ino array + __GFP_NOFAIL dynamic growth.  This eliminates
   all concurrent access concerns on i_ea_iput_node and avoids the
   need for EXT4_STATE_EA_DEC_REF or ihold tricks.  Only EA inodes
   whose nlink drops to 0 are tracked, so legitimate dedup with
   ref_count > 1 is correctly processed multiple times.

v10:
 - New patch 5: prevent deadlock from duplicate EA inode references
   on corrupted filesystems.  Track processed EA inodes on a per-call
   llist to skip duplicates before iget, and defer ext4_put_ea_inode()
   until after the loop to avoid queuing an inode for eviction while
   the same loop may still iget it.
 - Patch 2: move ext4_init_ea_inode_work() before ext4_multi_mount_protect()
   so that failed_mount3a drain does not hit an uninitialized delayed_work
   when MMP check fails.

v9:
 - Add iput_if_not_last() as proper VFS helper (per reviewer: don't
   let filesystems manipulate inode refcount without VFS abstraction).
 - Use iput_if_not_last() + llist_node embedded in ext4_inode_info
   (union with xattr_sem) to avoid per-iput allocation entirely.
 - Convert ALL EA inode iput() calls uniformly -- no exceptions.
 - Remove entire ea_inode_array mechanism.
 - Add WARN_ON_ONCE in ext4_put_ea_inode() to catch misuse on non-EA
   inodes (protects the xattr_sem union safety).
 - Move INIT_DELAYED_WORK before journal loading (fast commit replay
   may trigger evictions).
 - Drain before ext4_quotas_off() for correct quota accounting.
 - Add flush in failed_mount_wq and failed_mount3a error paths for
   journal replay case.
 - Move init_rwsem(xattr_sem) from init_once to ext4_alloc_inode to
   handle slab object reuse after union overwrite.
 - Encapsulate worker init into ext4_init_ea_inode_work(), making
   ext4_ea_inode_work() static to xattr.c.

Yun Zhou (5):
  fs: add iput_if_not_last() helper
  ext4: introduce ext4_put_ea_inode() for safe deferred iput
  ext4: convert all EA inode iput() calls to ext4_put_ea_inode()
  ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode()
  ext4: prevent deadlock from duplicate EA inode references on corrupted
    fs

 fs/ext4/ext4.h     |  13 ++-
 fs/ext4/inode.c    |   6 +-
 fs/ext4/super.c    |  18 +++-
 fs/ext4/xattr.c    | 209 +++++++++++++++++++++++++++------------------
 fs/ext4/xattr.h    |   9 +-
 include/linux/fs.h |  13 +++
 6 files changed, 172 insertions(+), 96 deletions(-)

-- 
2.43.0



Thread overview: 18+ messages
2026-06-29 11:08 [PATCH v11 0/5] ext4: deferred iput framework for EA inodes Yun Zhou
2026-06-29 11:08 ` [PATCH v11 1/5] fs: add iput_if_not_last() helper Yun Zhou
2026-06-30  8:53   ` Christian Brauner
2026-06-30  9:05   ` Mateusz Guzik
2026-06-30 11:34     ` Jan Kara
2026-07-02 13:35       ` Zhou, Yun
2026-07-02 14:29         ` Jan Kara
2026-07-02 14:55           ` Zhou, Yun
2026-07-02 22:41             ` Theodore Tso
2026-06-29 11:08 ` [PATCH v11 2/5] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
2026-06-29 11:34   ` Jan Kara
2026-06-29 11:37   ` Jan Kara
2026-06-29 11:08 ` [PATCH v11 3/5] ext4: convert all EA inode iput() calls to ext4_put_ea_inode() Yun Zhou
2026-06-29 11:38   ` Jan Kara
2026-06-29 11:08 ` [PATCH v11 4/5] ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode() Yun Zhou
2026-06-29 11:42   ` Jan Kara
2026-06-29 11:08 ` [PATCH v11 5/5] ext4: prevent deadlock from duplicate EA inode references on corrupted fs Yun Zhou
2026-06-29 11:48   ` Jan Kara
