From: Yun Zhou <yun.zhou@windriver.com>
To: <tytso@mit.edu>, <adilger.kernel@dilger.ca>,
<libaokun@linux.alibaba.com>, <jack@suse.cz>,
<ojaswin@linux.ibm.com>, <ritesh.list@gmail.com>,
<yi.zhang@huawei.com>, <viro@zeniv.linux.org.uk>,
<brauner@kernel.org>
Cc: <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<yun.zhou@windriver.com>, <linux-fsdevel@vger.kernel.org>
Subject: [PATCH v12 0/4] ext4: deferred iput framework for EA inodes
Date: Tue, 30 Jun 2026 18:08:25 +0800
Message-ID: <20260630100829.1257618-1-yun.zhou@windriver.com>
This series introduces a deferred-iput framework for EA inodes to
eliminate a class of lock ordering issues in ext4 xattr code.
The problem: iput() on EA inodes while holding xattr_sem or a jbd2
handle can trigger eviction, which may acquire those same locks or
s_writepages_rwsem, creating circular dependencies. The immediate
deadlock (during mount-time orphan cleanup) is fixed by two separate
patches already reviewed and posted:
ext4: skip extra isize expansion during mount to prevent deadlock
ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode
This series provides the structural fix that makes the code safe
regardless of calling context:
Patch 1 adds a VFS helper, iput_if_not_last(), which drops an inode
reference only if it is not the last one, using atomic_add_unless().
It is annotated with __must_check so that callers handle the failure case.
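
For readers who want the semantics spelled out, here is a minimal
userspace sketch of the "drop unless last" idea. It is a model under
assumptions, not the code from the patch: struct model_inode,
model_add_unless() and model_iput_if_not_last() are hypothetical names,
C11 atomics stand in for the kernel's atomic_t, and the
warn_unused_result attribute stands in for __must_check.

```c
/* Userspace model only; names prefixed model_ are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_inode {
	atomic_int i_count;	/* stand-in for the inode reference count */
};

/* Add 'a' to *v unless *v == u; returns true if the add happened. */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/*
 * Drop one reference unless it is the last one. A false return means
 * the caller still owns its reference and must release it another way.
 */
__attribute__((warn_unused_result))
static bool model_iput_if_not_last(struct model_inode *inode)
{
	assert(atomic_load(&inode->i_count) > 0);
	return model_add_unless(&inode->i_count, -1, 1);
}
```

The point of the must-check style annotation in this model is that a
false return leaves the count untouched, so ignoring it would leak the
caller's reference.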
Patch 2 introduces ext4_put_ea_inode(), which uses iput_if_not_last() as
a fast path (a single atomic operation in the common case). If the
reference is the last one, the inode is linked onto a per-sb llist
(via i_ea_iput_node embedded in ext4_inode_info, in a union with
xattr_sem, which is unused for EA inodes) and a delayed worker (1 jiffy)
performs the final iput() in a clean context. No per-iput allocation is
needed. The patch also moves init_rwsem(xattr_sem) from init_once to
ext4_alloc_inode to handle slab reuse after the union field has been
overwritten.
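
The flow described for patch 2 can be sketched in userspace roughly as
follows. This is an illustrative model under assumptions, not the patch
itself: every model_ name is hypothetical, a C11 compare-and-swap push
stands in for the kernel llist, a boolean stands in for scheduling the
delayed work, and a plain integer stands in for xattr_sem.

```c
/* Userspace model only; names prefixed model_ are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node { struct model_node *next; };

struct model_sb {
	_Atomic(struct model_node *) to_free;	/* models s_ea_inode_to_free */
	bool work_scheduled;			/* models the delayed work */
};

struct model_inode {
	atomic_int i_count;
	struct model_sb *i_sb;
	bool is_ea_inode;
	bool evicted;
	union {
		long xattr_sem_model;		/* models xattr_sem */
		struct model_node defer_node;	/* models i_ea_iput_node */
	};
};

/* Per-allocation init: a recycled object may hold stale list-node bytes. */
static void model_alloc_inode(struct model_inode *inode, struct model_sb *sb,
			      bool is_ea)
{
	atomic_init(&inode->i_count, 1);
	inode->i_sb = sb;
	inode->is_ea_inode = is_ea;
	inode->evicted = false;
	inode->xattr_sem_model = 0;	/* models init_rwsem(xattr_sem) */
}

/* Drop a reference unless it is the last one. */
static bool model_put_if_not_last(struct model_inode *inode)
{
	int old = atomic_load(&inode->i_count);

	while (old != 1) {
		if (atomic_compare_exchange_weak(&inode->i_count, &old, old - 1))
			return true;
	}
	return false;
}

/* Fast path: one atomic. Slow path: queue the inode and kick the worker. */
static void model_put_ea_inode(struct model_inode *inode)
{
	struct model_sb *sb = inode->i_sb;
	struct model_node *first;

	assert(inode->is_ea_inode);	/* the union is only safe for EA inodes */
	if (model_put_if_not_last(inode))
		return;

	first = atomic_load(&sb->to_free);
	do {
		inode->defer_node.next = first;
	} while (!atomic_compare_exchange_weak(&sb->to_free, &first,
					       &inode->defer_node));
	sb->work_scheduled = true;
}

/* Worker: detach the whole list, then do each final put in a clean context. */
static int model_ea_inode_work(struct model_sb *sb)
{
	struct model_node *n = atomic_exchange(&sb->to_free, NULL);
	int done = 0;

	sb->work_scheduled = false;
	while (n) {
		struct model_inode *inode = (struct model_inode *)
			((char *)n - offsetof(struct model_inode, defer_node));

		n = n->next;	/* read the link before the inode may go away */
		atomic_fetch_sub(&inode->i_count, 1);
		inode->evicted = true;	/* models eviction on the final put */
		done++;
	}
	return done;
}
```

In this model the queued inode keeps its last reference until the worker
runs, which is why no allocation is needed on the put path: the list
node lives inside the object being released. The per-allocation
re-initialisation mirrors the reason given above for moving
init_rwsem(xattr_sem) out of init_once.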
Patch 3 converts all EA inode iput() calls in xattr code to use
ext4_put_ea_inode() uniformly -- no exceptions to reason about.
Patch 4 removes the now-redundant ea_inode_array mechanism (parameter
threading, struct, expand/free functions), replaced entirely by direct
ext4_put_ea_inode() calls. This is a net code reduction.
Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5
v12:
- Drop patch 5 (dedup array for corrupted fs duplicate entries).
- Simplify ext4_put_ea_inode() to take only an inode argument (sb is
derived from inode->i_sb).
v11:
- Patch 1: add __must_check annotation to iput_if_not_last().
- Patch 2: remove ext4_drain_ea_inode_work() wrapper, use direct
flush_delayed_work() at drain points. Re-arm is not possible
because check_igot_inode() in __ext4_iget() already rejects EA
inodes with extended attributes, so evicting an EA inode never
enters ext4_xattr_delete_inode(). Drop the ext4_evict_inode()
guard (was patch 5 in v10) -- it is unnecessary given the above.
Remove ext4_xattr_inode_array_free_deferred() intermediate function
-- mechanism is introduced without converting any call site.
- Patch 2: add comment on ext4_put_ea_inode() documenting why the
inode cannot be double-queued to s_ea_inode_to_free (reviewer
request).
- Patch 2: simplify ext4_ea_inode_work() by removing 'next' variable.
- Patch 5: replace per-call llist (i_ea_iput_node reuse) with a simple
on-stack ino array + __GFP_NOFAIL dynamic growth. This eliminates
all concurrent access concerns on i_ea_iput_node and avoids the
need for EXT4_STATE_EA_DEC_REF or ihold tricks. Only EA inodes
whose nlink drops to 0 are tracked, so legitimate dedup with
ref_count > 1 is correctly processed multiple times.
v10:
- New patch 5: prevent deadlock from duplicate EA inode references
on corrupted filesystems. Track processed EA inodes on a per-call
llist to skip duplicates before iget, and defer ext4_put_ea_inode()
until after the loop to avoid queuing an inode for eviction while
the same loop may still iget it.
- Patch 2: move ext4_init_ea_inode_work() before ext4_multi_mount_protect()
so that failed_mount3a drain does not hit an uninitialized delayed_work
when MMP check fails.
v9:
- Add iput_if_not_last() as proper VFS helper (per reviewer: don't
let filesystems manipulate inode refcount without VFS abstraction).
- Use iput_if_not_last() + llist_node embedded in ext4_inode_info
(union with xattr_sem) to avoid per-iput allocation entirely.
- Convert ALL EA inode iput() calls uniformly -- no exceptions.
- Remove entire ea_inode_array mechanism.
- Add WARN_ON_ONCE in ext4_put_ea_inode() to catch misuse on non-EA
inodes (protects the xattr_sem union safety).
- Move INIT_DELAYED_WORK before journal loading (fast commit replay
may trigger evictions).
- Drain before ext4_quotas_off() for correct quota accounting.
- Add flush in failed_mount_wq and failed_mount3a error paths for
journal replay case.
- Move init_rwsem(xattr_sem) from init_once to ext4_alloc_inode to
handle slab object reuse after union overwrite.
- Encapsulate worker init into ext4_init_ea_inode_work(), making
ext4_ea_inode_work() static to xattr.c.
Yun Zhou (4):
  fs: add iput_if_not_last() helper
  ext4: introduce ext4_put_ea_inode() for safe deferred iput
  ext4: convert all EA inode iput() calls to ext4_put_ea_inode()
  ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode()

 fs/ext4/ext4.h     |  13 +++-
 fs/ext4/inode.c    |   6 +-
 fs/ext4/super.c    |  18 +++++-
 fs/ext4/xattr.c    | 154 +++++++++++++++++++++------------------------
 fs/ext4/xattr.h    |   9 +--
 include/linux/fs.h |  13 ++++
 6 files changed, 117 insertions(+), 96 deletions(-)
--
2.43.0