Linux EXT4 FS development
* [PATCH v12 0/4] ext4: deferred iput framework for EA inodes
@ 2026-06-30 10:08 Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 1/4] fs: add iput_if_not_last() helper Yun Zhou
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Yun Zhou @ 2026-06-30 10:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel

This series introduces a deferred-iput framework for EA inodes to
eliminate a class of lock ordering issues in ext4 xattr code.

The problem: iput() on EA inodes while holding xattr_sem or a jbd2
handle can trigger eviction, which may acquire those same locks or
s_writepages_rwsem, creating circular dependencies.  The immediate
deadlock (during mount-time orphan cleanup) is fixed by two separate
patches already reviewed and posted:

  ext4: skip extra isize expansion during mount to prevent deadlock
  ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode

This series provides the structural fix that makes the code safe
regardless of calling context:

Patch 1 adds a VFS helper iput_if_not_last() which drops an inode
reference only if it is not the last one, using atomic_add_unless().
It is annotated with __must_check so that callers handle the failure case.

Patch 2 introduces ext4_put_ea_inode(), which uses iput_if_not_last() as
a fast path (a single atomic operation and no allocation in the common
case).  If this is the last reference, the inode is linked onto a per-sb
llist (via i_ea_iput_node embedded in ext4_inode_info, in a union with
xattr_sem, which is unused for EA inodes) and a delayed worker (1 jiffy)
performs the final iput() in a clean context.  No per-iput allocation is
needed.  It also moves init_rwsem(xattr_sem) from init_once to
ext4_alloc_inode to handle slab reuse after the union field has been
overwritten.

Patch 3 converts all EA inode iput() calls in xattr code to use
ext4_put_ea_inode() uniformly -- no exceptions to reason about.

Patch 4 removes the now-redundant ea_inode_array mechanism (parameter
threading, struct, expand/free functions), replaced entirely by direct
ext4_put_ea_inode() calls.  This is a net code reduction.

Link: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5

v12:
 - Drop patch 5 (dedup array for corrupted fs duplicate entries).
 - Simplify ext4_put_ea_inode() to take only an inode argument (sb is
   derived from inode->i_sb).

v11:
 - Patch 1: add __must_check annotation to iput_if_not_last().
 - Patch 2: remove ext4_drain_ea_inode_work() wrapper, use direct
   flush_delayed_work() at drain points.  Re-arm is not possible
   because check_igot_inode() in __ext4_iget() already rejects EA
   inodes with extended attributes, so evicting an EA inode never
   enters ext4_xattr_delete_inode().  Drop the ext4_evict_inode()
   guard (was patch 5 in v10) -- it is unnecessary given the above.
   Remove ext4_xattr_inode_array_free_deferred() intermediate function
   -- mechanism is introduced without converting any call site.
 - Patch 2: add comment on ext4_put_ea_inode() documenting why the
   inode cannot be double-queued to s_ea_inode_to_free (reviewer
   request).
 - Patch 2: simplify ext4_ea_inode_work() by removing 'next' variable.
 - Patch 5: replace per-call llist (i_ea_iput_node reuse) with a simple
   on-stack ino array + __GFP_NOFAIL dynamic growth.  This eliminates
   all concurrent access concerns on i_ea_iput_node and avoids the
   need for EXT4_STATE_EA_DEC_REF or ihold tricks.  Only EA inodes
   whose nlink drops to 0 are tracked, so legitimate dedup with
   ref_count > 1 is correctly processed multiple times.

v10:
 - New patch 5: prevent deadlock from duplicate EA inode references
   on corrupted filesystems.  Track processed EA inodes on a per-call
   llist to skip duplicates before iget, and defer ext4_put_ea_inode()
   until after the loop to avoid queuing an inode for eviction while
   the same loop may still iget it.
 - Patch 2: move ext4_init_ea_inode_work() before ext4_multi_mount_protect()
   so that failed_mount3a drain does not hit an uninitialized delayed_work
   when MMP check fails.

v9:
 - Add iput_if_not_last() as proper VFS helper (per reviewer: don't
   let filesystems manipulate inode refcount without VFS abstraction).
 - Use iput_if_not_last() + llist_node embedded in ext4_inode_info
   (union with xattr_sem) to avoid per-iput allocation entirely.
 - Convert ALL EA inode iput() calls uniformly -- no exceptions.
 - Remove entire ea_inode_array mechanism.
 - Add WARN_ON_ONCE in ext4_put_ea_inode() to catch misuse on non-EA
   inodes (protects the xattr_sem union safety).
 - Move INIT_DELAYED_WORK before journal loading (fast commit replay
   may trigger evictions).
 - Drain before ext4_quotas_off() for correct quota accounting.
 - Add flush in failed_mount_wq and failed_mount3a error paths for
   journal replay case.
 - Move init_rwsem(xattr_sem) from init_once to ext4_alloc_inode to
   handle slab object reuse after union overwrite.
 - Encapsulate worker init into ext4_init_ea_inode_work(), making
   ext4_ea_inode_work() static to xattr.c.

Yun Zhou (4):
  fs: add iput_if_not_last() helper
  ext4: introduce ext4_put_ea_inode() for safe deferred iput
  ext4: convert all EA inode iput() calls to ext4_put_ea_inode()
  ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode()

 fs/ext4/ext4.h     |  13 +++-
 fs/ext4/inode.c    |   6 +-
 fs/ext4/super.c    |  18 +++++-
 fs/ext4/xattr.c    | 154 +++++++++++++++++++++------------------------
 fs/ext4/xattr.h    |   9 +--
 include/linux/fs.h |  13 ++++
 6 files changed, 117 insertions(+), 96 deletions(-)

-- 
2.43.0



* [PATCH v12 1/4] fs: add iput_if_not_last() helper
  2026-06-30 10:08 [PATCH v12 0/4] ext4: deferred iput framework for EA inodes Yun Zhou
@ 2026-06-30 10:08 ` Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Yun Zhou @ 2026-06-30 10:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel

Add a helper that drops an inode reference only if the caller does not
hold the last one.  Returns true if the reference was dropped, false
otherwise.

This is useful for filesystems that need to release inode references
in contexts where triggering final iput (and thus eviction) would be
unsafe due to lock ordering constraints.  The caller can check the
return value and defer the final iput to a safe context.

Unlike iput_not_last() which BUG_ON's if called with the last ref,
this variant is designed to be called speculatively.
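
The "drop unless last" semantics described above can be modelled in user
space.  The sketch below is not kernel code: it uses C11 atomics in place
of atomic_t, and struct model_inode, model_add_unless() and
model_iput_if_not_last() are hypothetical names invented for illustration.
It assumes the documented behaviour of atomic_add_unless(), namely that the
addition is skipped when the counter already equals the given value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode; only the refcount is modelled. */
struct model_inode {
	atomic_int i_count;
};

/*
 * Add @delta to @v unless @v currently equals @unless.
 * Returns true if the addition happened, false if it was skipped.
 */
static bool model_add_unless(atomic_int *v, int delta, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + delta))
			return true;
	}
	return false;
}

/* Drop one reference only if the caller does not hold the last one. */
static bool model_iput_if_not_last(struct model_inode *inode)
{
	/* The caller must hold a reference, so the count is at least 1. */
	assert(atomic_load(&inode->i_count) >= 1);
	return model_add_unless(&inode->i_count, -1, 1);
}
```

Starting from a count of 2, the first call drops the count to 1 and
returns true.  A second call leaves the count at 1 and returns false,
which is the case the caller has to defer to a safe context.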

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 include/linux/fs.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index d10897b3a1e3..04f0de78fa7a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2413,6 +2413,19 @@ static inline void super_set_sysfs_name_generic(struct super_block *sb, const ch
 extern void ihold(struct inode * inode);
 extern void iput(struct inode *);
 void iput_not_last(struct inode *);
+
+/**
+ * iput_if_not_last - drop an inode reference only if it is not the last one
+ * @inode: inode to put
+ *
+ * Returns true if the reference was dropped, false if this was the last
+ * reference and the caller must arrange for final iput() in a safe context.
+ */
+static inline bool __must_check iput_if_not_last(struct inode *inode)
+{
+	return atomic_add_unless(&inode->i_count, -1, 1);
+}
+
 int inode_update_time(struct inode *inode, enum fs_update_time type,
 		unsigned int flags);
 int generic_update_time(struct inode *inode, enum fs_update_time type,
-- 
2.43.0



* [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
  2026-06-30 10:08 [PATCH v12 0/4] ext4: deferred iput framework for EA inodes Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 1/4] fs: add iput_if_not_last() helper Yun Zhou
@ 2026-06-30 10:08 ` Yun Zhou
  2026-06-30 10:15   ` Zhou, Yun
  2026-06-30 11:49   ` Jan Kara
  2026-06-30 10:08 ` [PATCH v12 3/4] ext4: convert all EA inode iput() calls to ext4_put_ea_inode() Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 4/4] ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode() Yun Zhou
  3 siblings, 2 replies; 7+ messages in thread
From: Yun Zhou @ 2026-06-30 10:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel

Calling iput() on EA inodes while holding xattr_sem or a jbd2 handle
can trigger write_inode_now() -> ext4_writepages() -> s_writepages_rwsem,
creating a lock ordering issue during mount (!SB_ACTIVE).

Add ext4_put_ea_inode() which uses iput_if_not_last() as a fast path.
If this is not the last reference, it is dropped immediately.  If this
is the last reference, the inode is linked onto a per-sb lock-free llist
via i_ea_iput_node (embedded in ext4_inode_info, sharing space with the
unused xattr_sem of EA inodes via a union) and a delayed worker
(1 jiffy) performs the final iput() in a clean context.  This avoids
per-iput memory allocation.
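
The fast path and the deferred path described above can be illustrated
with a user-space model.  This is only a sketch under stated assumptions:
all model_* names are hypothetical, C11 atomics stand in for the kernel's
atomic_t and llist primitives, the final iput() is reduced to a plain
decrement, and the delayed work is replaced by calling model_drain()
by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	struct model_node *next;
};

struct model_head {
	_Atomic(struct model_node *) first;
};

/* Hypothetical inode: a refcount plus the embedded list linkage. */
struct model_inode {
	atomic_int i_count;
	struct model_node iput_node;	/* stand-in for i_ea_iput_node */
};

/* Lock-free push onto the head of the list. */
static void model_llist_add(struct model_node *n, struct model_head *h)
{
	struct model_node *first = atomic_load(&h->first);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
}

/* Atomically detach the whole list and return its first node. */
static struct model_node *model_llist_del_all(struct model_head *h)
{
	return atomic_exchange(&h->first, (struct model_node *)NULL);
}

/*
 * Returns true if the reference was dropped immediately (fast path),
 * false if this was the last reference and the inode was queued.
 */
static bool model_put(struct model_inode *inode, struct model_head *h)
{
	int old = atomic_load(&inode->i_count);

	while (old != 1) {
		if (atomic_compare_exchange_weak(&inode->i_count, &old, old - 1))
			return true;
	}
	/* Last reference: keep it and hand the inode to the worker. */
	model_llist_add(&inode->iput_node, h);
	return false;
}

/* Worker body: drop the final reference of every queued inode. */
static int model_drain(struct model_head *h)
{
	struct model_node *node = model_llist_del_all(h);
	int processed = 0;

	while (node) {
		struct model_inode *inode = (struct model_inode *)
			((char *)node - offsetof(struct model_inode, iput_node));

		/* Read the link before the final put may release the inode. */
		node = node->next;
		assert(atomic_load(&inode->i_count) >= 1);
		atomic_fetch_sub(&inode->i_count, 1);
		processed++;
	}
	return processed;
}
```

Because the linkage lives inside the inode itself, queuing needs no
allocation and cannot fail, which is what lets callers use the helper
while holding locks or a journal handle.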

Flush points are placed before quota shutdown (ext4_put_super and
failed_mount9) and before freeing structures that eviction depends on
(failed_mount_wq and failed_mount3a).  Initialization is placed before
journal loading since fast commit replay may trigger evictions that call
ext4_put_ea_inode().

Also move init_rwsem(xattr_sem) from init_once to ext4_alloc_inode to
handle slab object reuse after the union field has been overwritten.
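
The slab reuse hazard can be shown with a small user-space model.  This is
a sketch, not the kernel implementation: struct model_sem, struct
model_node and every model_* function are hypothetical stand-ins, and the
"magic" field exists only so the example can detect a clobbered semaphore.
The assumption being modelled is that a slab constructor runs once per
object, while the per-allocation hook runs every time the object is
handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_SEM_MAGIC 0x5e3a

/* Hypothetical stand-ins; the real rw_semaphore and llist_node differ. */
struct model_sem {
	int count;
	int magic;
};

struct model_node {
	void *next;
};

struct model_info {
	union {
		struct model_sem xattr_sem;	/* used by regular inodes */
		struct model_node iput_node;	/* used by EA inodes */
	};
};

static void model_init_sem(struct model_sem *sem)
{
	sem->count = 0;
	sem->magic = MODEL_SEM_MAGIC;
}

static bool model_sem_valid(const struct model_sem *sem)
{
	return sem->count == 0 && sem->magic == MODEL_SEM_MAGIC;
}

/* Constructor: runs once when the slab object is first created. */
static void model_init_once(struct model_info *ei, bool init_sem_here)
{
	if (init_sem_here)
		model_init_sem(&ei->xattr_sem);
}

/* Per-allocation hook: runs each time the object is handed out. */
static void model_alloc(struct model_info *ei, bool init_sem_here)
{
	if (init_sem_here)
		model_init_sem(&ei->xattr_sem);
}

/* Life as an EA inode: the list linkage overwrites the shared bytes. */
static void model_use_as_ea_inode(struct model_info *ei)
{
	assert(ei != NULL);
	memset(&ei->iput_node, 0xff, sizeof(ei->iput_node));
}
```

With initialization only in the constructor, an object that was once an
EA inode comes back from the allocator with a corrupted semaphore.
Initializing in the per-allocation hook restores it on every reuse.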

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  | 13 ++++++++++-
 fs/ext4/super.c | 18 ++++++++++++++-
 fs/ext4/xattr.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/xattr.h |  2 ++
 4 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b37c136ea3ab..b9b0ada7774b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1070,8 +1070,14 @@ struct ext4_inode_info {
 	 * between readers of EAs and writers of regular file data, so
 	 * instead we synchronize on xattr_sem when reading or changing
 	 * EAs.
+	 *
+	 * EA inodes (EXT4_EA_INODE_FL) do not use xattr_sem; they reuse
+	 * the space for deferred iput linkage.
 	 */
-	struct rw_semaphore xattr_sem;
+	union {
+		struct rw_semaphore xattr_sem;
+		struct llist_node i_ea_iput_node;
+	};
 
 	/*
 	 * Inodes with EXT4_STATE_ORPHAN_FILE use i_orphan_idx. Otherwise
@@ -1770,6 +1776,11 @@ struct ext4_sb_info {
 	struct ext4_es_stats s_es_stats;
 	struct mb_cache *s_ea_block_cache;
 	struct mb_cache *s_ea_inode_cache;
+
+	/* Deferred iput for EA inodes to avoid lock ordering issues */
+	struct llist_head s_ea_inode_to_free;
+	struct delayed_work s_ea_inode_work;
+
 	spinlock_t s_es_lock ____cacheline_aligned_in_smp;
 
 	/* Journal triggers for checksum computation */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 245f67d10ded..3efa5a817bef 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1303,6 +1303,8 @@ static void ext4_put_super(struct super_block *sb)
 			 &sb->s_uuid);
 
 	ext4_unregister_li_request(sb);
+	/* Drain deferred EA inode iputs while quota is still active. */
+	flush_delayed_work(&sbi->s_ea_inode_work);
 	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
 
 	destroy_workqueue(sbi->rsv_conversion_wq);
@@ -1423,6 +1425,13 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
 #endif
 	ei->jinode = NULL;
+	/*
+	 * Reinitialize xattr_sem every allocation because EA inodes
+	 * share this space with i_ea_iput_node (via union) which may
+	 * have overwritten the semaphore when the slab object was
+	 * previously used as an EA inode.
+	 */
+	init_rwsem(&ei->xattr_sem);
 	INIT_LIST_HEAD(&ei->i_rsv_conversion_list);
 	spin_lock_init(&ei->i_completed_io_lock);
 	ei->i_sync_tid = 0;
@@ -1488,7 +1497,6 @@ static void init_once(void *foo)
 	struct ext4_inode_info *ei = foo;
 
 	INIT_LIST_HEAD(&ei->i_orphan);
-	init_rwsem(&ei->xattr_sem);
 	init_rwsem(&ei->i_data_sem);
 	inode_init_once(&ei->vfs_inode);
 	ext4_fc_init_inode(&ei->vfs_inode);
@@ -5497,6 +5505,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 			  ext4_has_feature_orphan_present(sb) ||
 			  ext4_has_feature_journal_needs_recovery(sb));
 
+	ext4_init_ea_inode_work(sbi);
+
 	if (ext4_has_feature_mmp(sb) && !sb_rdonly(sb)) {
 		err = ext4_multi_mount_protect(sb, le64_to_cpu(es->s_mmp_block));
 		if (err)
@@ -5747,6 +5757,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	return 0;
 
 failed_mount9:
+	/* Drain deferred EA inode iputs before quota shutdown */
+	flush_delayed_work(&sbi->s_ea_inode_work);
 	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
 failed_mount8: __maybe_unused
 	ext4_release_orphan_info(sb);
@@ -5767,6 +5779,8 @@ failed_mount8: __maybe_unused
 	if (EXT4_SB(sb)->rsv_conversion_wq)
 		destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq);
 failed_mount_wq:
+	/* Drain deferred EA inode iputs before freeing structures */
+	flush_delayed_work(&sbi->s_ea_inode_work);
 	ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
 	sbi->s_ea_inode_cache = NULL;
 
@@ -5777,6 +5791,8 @@ failed_mount8: __maybe_unused
 		ext4_journal_destroy(sbi, sbi->s_journal);
 	}
 failed_mount3a:
+	/* Drain deferred EA inode iputs from journal replay */
+	flush_delayed_work(&sbi->s_ea_inode_work);
 	ext4_es_unregister_shrinker(sbi);
 failed_mount3:
 	/* flush s_sb_upd_work before sbi destroy */
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 982a1f831e22..d5bccc64b032 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -3025,6 +3025,66 @@ void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *ea_inode_array)
 	kfree(ea_inode_array);
 }
 
+
+/*
+ * Worker function for deferred EA inode iput.  Processes all inodes queued
+ * on s_ea_inode_to_free in a context free of xattr_sem/jbd2 handle locks.
+ */
+static void ext4_ea_inode_work(struct work_struct *work)
+{
+	struct ext4_sb_info *sbi = container_of(to_delayed_work(work),
+						struct ext4_sb_info,
+						s_ea_inode_work);
+	struct llist_node *node = llist_del_all(&sbi->s_ea_inode_to_free);
+
+	while (node) {
+		struct ext4_inode_info *ei = container_of(node,
+					struct ext4_inode_info, i_ea_iput_node);
+		node = node->next;
+		iput(&ei->vfs_inode);
+	}
+}
+
+/*
+ * Release a VFS reference on an EA inode.  Must be used instead of iput()
+ * in any context where xattr_sem or a jbd2 handle is held.
+ *
+ * If this is not the last reference, drops it immediately via
+ * iput_if_not_last() with no further action needed.
+ *
+ * If this is the last reference, the inode is linked onto a per-sb
+ * llist via i_ea_iput_node (embedded in ext4_inode_info, sharing space
+ * with the unused xattr_sem) and a delayed worker performs the final
+ * iput() in a clean context.
+ *
+ * Note: while an inode is on s_ea_inode_to_free, the unconsumed i_count
+ * reference (still 1) keeps it in the inode cache, so any concurrent
+ * iget() bumps i_count to >= 2 and iput_if_not_last() will succeed.
+ * Nobody will add the inode a second time until ext4_ea_inode_work()
+ * drops that reference via iput().
+ */
+void ext4_put_ea_inode(struct inode *inode)
+{
+	if (!inode)
+		return;
+	WARN_ON_ONCE(!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL));
+	if (iput_if_not_last(inode))
+		return;
+	llist_add(&EXT4_I(inode)->i_ea_iput_node,
+		  &EXT4_SB(inode->i_sb)->s_ea_inode_to_free);
+	/*
+	 * Use a short delay to allow multiple EA inodes to accumulate,
+	 * reducing workqueue wakeups when several are released together.
+	 */
+	schedule_delayed_work(&EXT4_SB(inode->i_sb)->s_ea_inode_work, 1);
+}
+
+void ext4_init_ea_inode_work(struct ext4_sb_info *sbi)
+{
+	init_llist_head(&sbi->s_ea_inode_to_free);
+	INIT_DELAYED_WORK(&sbi->s_ea_inode_work, ext4_ea_inode_work);
+}
+
 /*
  * ext4_xattr_block_cache_insert()
  *
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 1fedf44d4fb6..2ff4b6eccd40 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -190,6 +190,8 @@ extern int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 				   struct ext4_xattr_inode_array **array,
 				   int extra_credits);
 extern void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *array);
+extern void ext4_init_ea_inode_work(struct ext4_sb_info *sbi);
+extern void ext4_put_ea_inode(struct inode *inode);
 
 extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
 			    struct ext4_inode *raw_inode, handle_t *handle);
-- 
2.43.0



* [PATCH v12 3/4] ext4: convert all EA inode iput() calls to ext4_put_ea_inode()
  2026-06-30 10:08 [PATCH v12 0/4] ext4: deferred iput framework for EA inodes Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 1/4] fs: add iput_if_not_last() helper Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
@ 2026-06-30 10:08 ` Yun Zhou
  2026-06-30 10:08 ` [PATCH v12 4/4] ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode() Yun Zhou
  3 siblings, 0 replies; 7+ messages in thread
From: Yun Zhou @ 2026-06-30 10:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel

Convert all iput() calls on EA inodes in xattr code paths to use
ext4_put_ea_inode().  This establishes a uniform rule: every EA inode
reference release in ext4 xattr code goes through ext4_put_ea_inode(),
eliminating the need to analyze each call site individually for lock
safety.

Converted sites:

- ext4_xattr_inode_get() read path
- ext4_xattr_inode_inc_ref_all() main loop and cleanup path
- ext4_xattr_inode_dec_ref_all() error paths
- ext4_xattr_inode_create() error path
- ext4_xattr_inode_cache_find() mismatch path
- ext4_xattr_inode_lookup_create() out_err
- ext4_xattr_set_entry() old_ea_inode
- ext4_xattr_block_set() new block path, cleanup, and tmp_inode
- ext4_xattr_ibody_set() error and success paths
- ext4_xattr_delete_inode() quota loop

For most of these, iput_if_not_last() will succeed because the EA inode
has other references, so the overhead is a single atomic operation.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index d5bccc64b032..6a1f2bdb6ff8 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -567,7 +567,7 @@ ext4_xattr_inode_get(struct inode *inode, struct ext4_xattr_entry *entry,
 					ea_inode->i_ino, true /* reusable */);
 	}
 out:
-	iput(ea_inode);
+	ext4_put_ea_inode(ea_inode);
 	return err;
 }
 
@@ -1104,10 +1104,10 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent,
 		err = ext4_xattr_inode_inc_ref(handle, ea_inode);
 		if (err) {
 			ext4_warning_inode(ea_inode, "inc ref error %d", err);
-			iput(ea_inode);
+			ext4_put_ea_inode(ea_inode);
 			goto cleanup;
 		}
-		iput(ea_inode);
+		ext4_put_ea_inode(ea_inode);
 	}
 	return 0;
 
@@ -1133,7 +1133,7 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent,
 		if (err)
 			ext4_warning_inode(ea_inode, "cleanup dec ref error %d",
 					   err);
-		iput(ea_inode);
+		ext4_put_ea_inode(ea_inode);
 	}
 	return saved_err;
 }
@@ -1201,7 +1201,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		if (err) {
 			ext4_warning_inode(ea_inode,
 					   "Expand inode array err=%d", err);
-			iput(ea_inode);
+			ext4_put_ea_inode(ea_inode);
 			continue;
 		}
 
@@ -1505,7 +1505,7 @@ static struct inode *ext4_xattr_inode_create(handle_t *handle,
 			if (ext4_xattr_inode_dec_ref(handle, ea_inode))
 				ext4_warning_inode(ea_inode,
 					"cleanup dec ref error %d", err);
-			iput(ea_inode);
+			ext4_put_ea_inode(ea_inode);
 			return ERR_PTR(err);
 		}
 
@@ -1564,7 +1564,7 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
 			kvfree(ea_data);
 			return ea_inode;
 		}
-		iput(ea_inode);
+		ext4_put_ea_inode(ea_inode);
 	next_entry:
 		ce = mb_cache_entry_find_next(ea_inode_cache, ce);
 	}
@@ -1615,7 +1615,7 @@ static struct inode *ext4_xattr_inode_lookup_create(handle_t *handle,
 				      ea_inode->i_ino, true /* reusable */);
 	return ea_inode;
 out_err:
-	iput(ea_inode);
+	ext4_put_ea_inode(ea_inode);
 	ext4_xattr_inode_free_quota(inode, NULL, value_len);
 	return ERR_PTR(err);
 }
@@ -1848,7 +1848,7 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 
 	ret = 0;
 out:
-	iput(old_ea_inode);
+	ext4_put_ea_inode(old_ea_inode);
 	return ret;
 }
 
@@ -2010,7 +2010,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 				old_ea_inode_quota = le32_to_cpu(
 						s->here->e_value_size);
 			}
-			iput(tmp_inode);
+			ext4_put_ea_inode(tmp_inode);
 
 			s->here->e_value_inum = 0;
 			s->here->e_value_size = 0;
@@ -2150,7 +2150,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 					ext4_warning_inode(ea_inode,
 							   "dec ref error=%d",
 							   error);
-				iput(ea_inode);
+				ext4_put_ea_inode(ea_inode);
 				ea_inode = NULL;
 			}
 
@@ -2203,7 +2203,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			ext4_xattr_inode_free_quota(inode, ea_inode,
 						    i_size_read(ea_inode));
 		}
-		iput(ea_inode);
+		ext4_put_ea_inode(ea_inode);
 	}
 	if (ce)
 		mb_cache_entry_put(ea_block_cache, ce);
@@ -2285,7 +2285,7 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 
 			ext4_xattr_inode_free_quota(inode, ea_inode,
 						    i_size_read(ea_inode));
-			iput(ea_inode);
+			ext4_put_ea_inode(ea_inode);
 		}
 		return error;
 	}
@@ -2297,7 +2297,7 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 		header->h_magic = cpu_to_le32(0);
 		ext4_clear_inode_state(inode, EXT4_STATE_XATTR);
 	}
-	iput(ea_inode);
+	ext4_put_ea_inode(ea_inode);
 	return 0;
 }
 
@@ -2986,7 +2986,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 					continue;
 				ext4_xattr_inode_free_quota(inode, ea_inode,
 					      le32_to_cpu(entry->e_value_size));
-				iput(ea_inode);
+				ext4_put_ea_inode(ea_inode);
 			}
 
 		}
-- 
2.43.0



* [PATCH v12 4/4] ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode()
  2026-06-30 10:08 [PATCH v12 0/4] ext4: deferred iput framework for EA inodes Yun Zhou
                   ` (2 preceding siblings ...)
  2026-06-30 10:08 ` [PATCH v12 3/4] ext4: convert all EA inode iput() calls to ext4_put_ea_inode() Yun Zhou
@ 2026-06-30 10:08 ` Yun Zhou
  3 siblings, 0 replies; 7+ messages in thread
From: Yun Zhou @ 2026-06-30 10:08 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner
  Cc: linux-ext4, linux-kernel, yun.zhou, linux-fsdevel

Now that ext4_put_ea_inode() handles deferred iput safely for all cases
(using iput_if_not_last + embedded llist_node), the ea_inode_array
mechanism for batching deferred iputs is redundant.

Remove:
- ext4_expand_inode_array() and ext4_xattr_inode_array_free()
- struct ext4_xattr_inode_array and EIA_INCR/EIA_MASK defines
- ea_inode_array parameter from ext4_xattr_inode_dec_ref_all(),
  ext4_xattr_release_block(), and ext4_xattr_delete_inode()
- ea_inode_array variable from ext4_evict_inode()

Instead, ext4_xattr_inode_dec_ref_all() now calls ext4_put_ea_inode()
directly after processing each EA inode.  This simplifies the code
by eliminating multi-layer parameter threading and removes the need
for callers to manage array lifetime.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |  6 +---
 fs/ext4/xattr.c | 80 ++++---------------------------------------------
 fs/ext4/xattr.h |  7 -----
 3 files changed, 6 insertions(+), 87 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d131371ad3d..6f1b84e46a2e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -176,7 +176,6 @@ void ext4_evict_inode(struct inode *inode)
 	 * (xattr block freeing), bitmap, group descriptor (inode freeing)
 	 */
 	int extra_credits = 6;
-	struct ext4_xattr_inode_array *ea_inode_array = NULL;
 	bool freeze_protected = false;
 
 	trace_ext4_evict_inode(inode);
@@ -282,8 +281,7 @@ void ext4_evict_inode(struct inode *inode)
 	}
 
 	/* Remove xattr references. */
-	err = ext4_xattr_delete_inode(handle, inode, &ea_inode_array,
-				      extra_credits);
+	err = ext4_xattr_delete_inode(handle, inode, extra_credits);
 	if (err) {
 		ext4_warning(inode->i_sb, "xattr delete (err %d)", err);
 stop_handle:
@@ -291,7 +289,6 @@ void ext4_evict_inode(struct inode *inode)
 		ext4_orphan_del(NULL, inode);
 		if (freeze_protected)
 			sb_end_intwrite(inode->i_sb);
-		ext4_xattr_inode_array_free(ea_inode_array);
 		goto no_delete;
 	}
 
@@ -321,7 +318,6 @@ void ext4_evict_inode(struct inode *inode)
 	ext4_journal_stop(handle);
 	if (freeze_protected)
 		sb_end_intwrite(inode->i_sb);
-	ext4_xattr_inode_array_free(ea_inode_array);
 	return;
 no_delete:
 	/*
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 6a1f2bdb6ff8..4ae6ce111566 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -114,10 +114,6 @@ const struct xattr_handler * const ext4_xattr_handlers[] = {
 #define EA_INODE_CACHE(inode)	(((struct ext4_sb_info *) \
 				inode->i_sb->s_fs_info)->s_ea_inode_cache)
 
-static int
-ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array,
-			struct inode *inode);
-
 #ifdef CONFIG_LOCKDEP
 void ext4_xattr_inode_set_class(struct inode *ea_inode)
 {
@@ -1160,7 +1156,6 @@ static void
 ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 			     struct buffer_head *bh,
 			     struct ext4_xattr_entry *first, bool block_csum,
-			     struct ext4_xattr_inode_array **ea_inode_array,
 			     int extra_credits, bool skip_quota)
 {
 	struct inode *ea_inode;
@@ -1197,14 +1192,6 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		if (err)
 			continue;
 
-		err = ext4_expand_inode_array(ea_inode_array, ea_inode);
-		if (err) {
-			ext4_warning_inode(ea_inode,
-					   "Expand inode array err=%d", err);
-			ext4_put_ea_inode(ea_inode);
-			continue;
-		}
-
 		err = ext4_journal_ensure_credits_fn(handle, credits, credits,
 			ext4_free_metadata_revoke_credits(parent->i_sb, 1),
 			ext4_xattr_restart_fn(handle, parent, bh, block_csum,
@@ -1212,6 +1199,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		if (err < 0) {
 			ext4_warning_inode(ea_inode, "Ensure credits err=%d",
 					   err);
+			ext4_put_ea_inode(ea_inode);
 			continue;
 		}
 		if (err > 0) {
@@ -1221,6 +1209,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 				ext4_warning_inode(ea_inode,
 						"Re-get write access err=%d",
 						err);
+				ext4_put_ea_inode(ea_inode);
 				continue;
 			}
 		}
@@ -1229,6 +1218,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		if (err) {
 			ext4_warning_inode(ea_inode, "ea_inode dec ref err=%d",
 					   err);
+			ext4_put_ea_inode(ea_inode);
 			continue;
 		}
 
@@ -1245,6 +1235,7 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		entry->e_value_inum = 0;
 		entry->e_value_size = 0;
 
+		ext4_put_ea_inode(ea_inode);
 		dirty = true;
 	}
 
@@ -1271,7 +1262,6 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 static void
 ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 			 struct buffer_head *bh,
-			 struct ext4_xattr_inode_array **ea_inode_array,
 			 int extra_credits)
 {
 	struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
@@ -1313,7 +1303,6 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 			ext4_xattr_inode_dec_ref_all(handle, inode, bh,
 						     BFIRST(bh),
 						     true /* block_csum */,
-						     ea_inode_array,
 						     extra_credits,
 						     true /* skip_quota */);
 		ext4_free_blocks(handle, inode, bh, 0, 1,
@@ -2182,12 +2171,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 
 	/* Drop the previous xattr block. */
 	if (bs->bh && bs->bh != new_bh) {
-		struct ext4_xattr_inode_array *ea_inode_array = NULL;
-
 		ext4_xattr_release_block(handle, inode, bs->bh,
-					 &ea_inode_array,
 					 0 /* extra_credits */);
-		ext4_xattr_inode_array_free(ea_inode_array);
 	}
 	error = 0;
 
@@ -2863,46 +2848,6 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
 	return error;
 }
 
-#define EIA_INCR 16 /* must be 2^n */
-#define EIA_MASK (EIA_INCR - 1)
-
-/* Add the large xattr @inode into @ea_inode_array for deferred iput().
- * If @ea_inode_array is new or full it will be grown and the old
- * contents copied over.
- */
-static int
-ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array,
-			struct inode *inode)
-{
-	if (*ea_inode_array == NULL) {
-		/*
-		 * Start with 15 inodes, so it fits into a power-of-two size.
-		 */
-		(*ea_inode_array) = kmalloc_flex(**ea_inode_array, inodes,
-						 EIA_MASK, GFP_NOFS);
-		if (*ea_inode_array == NULL)
-			return -ENOMEM;
-		(*ea_inode_array)->count = 0;
-	} else if (((*ea_inode_array)->count & EIA_MASK) == EIA_MASK) {
-		/* expand the array once all 15 + n * 16 slots are full */
-		struct ext4_xattr_inode_array *new_array = NULL;
-
-		new_array = kmalloc_flex(**ea_inode_array, inodes,
-					 (*ea_inode_array)->count + EIA_INCR,
-					 GFP_NOFS);
-		if (new_array == NULL)
-			return -ENOMEM;
-		memcpy(new_array, *ea_inode_array,
-		       struct_size(*ea_inode_array, inodes,
-				   (*ea_inode_array)->count));
-		kfree(*ea_inode_array);
-		*ea_inode_array = new_array;
-	}
-	(*ea_inode_array)->count++;
-	(*ea_inode_array)->inodes[(*ea_inode_array)->count - 1] = inode;
-	return 0;
-}
-
 /*
  * ext4_xattr_delete_inode()
  *
@@ -2913,7 +2858,6 @@ ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array,
  * references on xattr block and xattr inodes.
  */
 int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
-			    struct ext4_xattr_inode_array **ea_inode_array,
 			    int extra_credits)
 {
 	struct buffer_head *bh = NULL;
@@ -2952,7 +2896,6 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 			ext4_xattr_inode_dec_ref_all(handle, inode, iloc.bh,
 						     IFIRST(header),
 						     false /* block_csum */,
-						     ea_inode_array,
 						     extra_credits,
 						     false /* skip_quota */);
 	}
@@ -2991,7 +2934,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 
 		}
 
-		ext4_xattr_release_block(handle, inode, bh, ea_inode_array,
+		ext4_xattr_release_block(handle, inode, bh,
 					 extra_credits);
 		/*
 		 * Update i_file_acl value in the same transaction that releases
@@ -3013,19 +2956,6 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 	return error;
 }
 
-void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *ea_inode_array)
-{
-	int idx;
-
-	if (ea_inode_array == NULL)
-		return;
-
-	for (idx = 0; idx < ea_inode_array->count; ++idx)
-		iput(ea_inode_array->inodes[idx]);
-	kfree(ea_inode_array);
-}
-
-
 /*
  * Worker function for deferred EA inode iput.  Processes all inodes queued
  * on s_ea_inode_to_free in a context free of xattr_sem/jbd2 handle locks.
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 2ff4b6eccd40..821dc6a50e51 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -131,11 +131,6 @@ struct ext4_xattr_ibody_find {
 	struct ext4_iloc iloc;
 };
 
-struct ext4_xattr_inode_array {
-	unsigned int count;
-	struct inode *inodes[] __counted_by(count);
-};
-
 extern const struct xattr_handler ext4_xattr_user_handler;
 extern const struct xattr_handler ext4_xattr_trusted_handler;
 extern const struct xattr_handler ext4_xattr_security_handler;
@@ -187,9 +182,7 @@ extern int __ext4_xattr_set_credits(struct super_block *sb, struct inode *inode,
 				bool is_create);
 
 extern int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
-				   struct ext4_xattr_inode_array **array,
 				   int extra_credits);
-extern void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *array);
 extern void ext4_init_ea_inode_work(struct ext4_sb_info *sbi);
 extern void ext4_put_ea_inode(struct inode *inode);
 
-- 
2.43.0


* Re: [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
  2026-06-30 10:08 ` [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
@ 2026-06-30 10:15   ` Zhou, Yun
  2026-06-30 11:49   ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Zhou, Yun @ 2026-06-30 10:15 UTC (permalink / raw)
  To: jack
  Cc: linux-ext4, linux-kernel, linux-fsdevel, tytso, adilger.kernel,
	libaokun, ojaswin, ritesh.list, yi.zhang, viro, brauner

Hi Honza,

On 6/30/26 18:08, Yun Zhou wrote:
> +/*
> + * Release a VFS reference on an EA inode.  Must be used instead of iput()
> + * in any context where xattr_sem or a jbd2 handle is held.
> + *
> + * If this is not the last reference, drops it immediately via
> + * iput_if_not_last() with no further action needed.
> + *
> + * If this is the last reference, the inode is linked onto a per-sb
> + * llist via i_ea_iput_node (embedded in ext4_inode_info, sharing space
> + * with the unused xattr_sem) and a delayed worker performs the final
> + * iput() in a clean context.
> + *
> + * Note: while an inode is on s_ea_inode_to_free, the unconsumed i_count
> + * reference (still 1) keeps it in the inode cache, so any concurrent
> + * iget() bumps i_count to >= 2 and iput_if_not_last() will succeed.
> + * Nobody will add the inode a second time until ext4_ea_inode_work()
> + * drops that reference via iput().
> + */
> +void ext4_put_ea_inode(struct inode *inode)
> +{
> +	if (!inode)
> +		return;
> +	WARN_ON_ONCE(!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL));
> +	if (iput_if_not_last(inode))
> +		return;
> +	llist_add(&EXT4_I(inode)->i_ea_iput_node,
> +		  &EXT4_SB(inode->i_sb)->s_ea_inode_to_free);
> +	/*
> +	 * Use a short delay to allow multiple EA inodes to accumulate,
> +	 * reducing workqueue wakeups when several are released together.
> +	 */
> +	schedule_delayed_work(&EXT4_SB(inode->i_sb)->s_ea_inode_work, 1);
> +}
> +

Could you please review this patch again? ext4_put_ea_inode() now takes
a single parameter, the inode.
>   
>   extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>   			    struct ext4_inode *raw_inode, handle_t *handle);

Thanks,
Yun
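
For readers following the fast path described in the comment above, here is
a minimal userspace sketch of the "drop a reference unless it is the last
one" idea, written with C11 atomics. It is not the kernel's
iput_if_not_last() (the cover letter says that helper uses
atomic_add_unless()); struct obj and put_if_not_last() are hypothetical
names used only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; only the reference count matters. */
struct obj {
	atomic_int refcount;
};

/*
 * Decrement the count unless it is currently 1.
 * Returns true if a reference was dropped. Returns false if the caller
 * holds the last reference, which it must hand off for deferred release.
 */
bool put_if_not_last(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 1) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old - 1))
			return true;
	}
	return false;
}
```

A caller that gets false back still owns the reference, which is why the
count stays at 1 while the object waits on the deferred list.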

* Re: [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
  2026-06-30 10:08 ` [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
  2026-06-30 10:15   ` Zhou, Yun
@ 2026-06-30 11:49   ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2026-06-30 11:49 UTC (permalink / raw)
  To: Yun Zhou
  Cc: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, viro, brauner, linux-ext4, linux-kernel, linux-fsdevel

On Tue 30-06-26 18:08:27, Yun Zhou wrote:
> Calling iput() on EA inodes while holding xattr_sem or a jbd2 handle
> can trigger write_inode_now() -> ext4_writepages() -> s_writepages_rwsem,
> creating a lock ordering issue during mount (!SB_ACTIVE).
> 
> Add ext4_put_ea_inode() which uses iput_if_not_last() as a fast path.
> If this is not the last reference, it is dropped immediately.  If this
> is the last reference, the inode is linked onto a per-sb lock-free llist
> via i_ea_iput_node (embedded in ext4_inode_info, sharing space with the
> unused xattr_sem of EA inodes via a union) and a delayed worker
> (1 jiffy) performs the final iput() in a clean context.  This avoids
> per-iput memory allocation.
> 
> Flush points are placed before quota shutdown (ext4_put_super and
> failed_mount9) and before freeing structures that eviction depends on
> (failed_mount_wq and failed_mount3a).  Initialization is placed before
> journal loading since fast commit replay may trigger evictions that call
> ext4_put_ea_inode().
> 
> Also move init_rwsem(xattr_sem) from init_once to ext4_alloc_inode to
> handle slab object reuse after the union field has been overwritten.
> 
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
> Suggested-by: Jan Kara <jack@suse.cz>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h  | 13 ++++++++++-
>  fs/ext4/super.c | 18 ++++++++++++++-
>  fs/ext4/xattr.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/xattr.h |  2 ++
>  4 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b37c136ea3ab..b9b0ada7774b 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1070,8 +1070,14 @@ struct ext4_inode_info {
>  	 * between readers of EAs and writers of regular file data, so
>  	 * instead we synchronize on xattr_sem when reading or changing
>  	 * EAs.
> +	 *
> +	 * EA inodes (EXT4_EA_INODE_FL) do not use xattr_sem; they reuse
> +	 * the space for deferred iput linkage.
>  	 */
> -	struct rw_semaphore xattr_sem;
> +	union {
> +		struct rw_semaphore xattr_sem;
> +		struct llist_node i_ea_iput_node;
> +	};
>  
>  	/*
>  	 * Inodes with EXT4_STATE_ORPHAN_FILE use i_orphan_idx. Otherwise
> @@ -1770,6 +1776,11 @@ struct ext4_sb_info {
>  	struct ext4_es_stats s_es_stats;
>  	struct mb_cache *s_ea_block_cache;
>  	struct mb_cache *s_ea_inode_cache;
> +
> +	/* Deferred iput for EA inodes to avoid lock ordering issues */
> +	struct llist_head s_ea_inode_to_free;
> +	struct delayed_work s_ea_inode_work;
> +
>  	spinlock_t s_es_lock ____cacheline_aligned_in_smp;
>  
>  	/* Journal triggers for checksum computation */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 245f67d10ded..3efa5a817bef 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1303,6 +1303,8 @@ static void ext4_put_super(struct super_block *sb)
>  			 &sb->s_uuid);
>  
>  	ext4_unregister_li_request(sb);
> +	/* Drain deferred EA inode iputs while quota is still active. */
> +	flush_delayed_work(&sbi->s_ea_inode_work);
>  	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
>  
>  	destroy_workqueue(sbi->rsv_conversion_wq);
> @@ -1423,6 +1425,13 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
>  	memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
>  #endif
>  	ei->jinode = NULL;
> +	/*
> +	 * Reinitialize xattr_sem every allocation because EA inodes
> +	 * share this space with i_ea_iput_node (via union) which may
> +	 * have overwritten the semaphore when the slab object was
> +	 * previously used as an EA inode.
> +	 */
> +	init_rwsem(&ei->xattr_sem);
>  	INIT_LIST_HEAD(&ei->i_rsv_conversion_list);
>  	spin_lock_init(&ei->i_completed_io_lock);
>  	ei->i_sync_tid = 0;
> @@ -1488,7 +1497,6 @@ static void init_once(void *foo)
>  	struct ext4_inode_info *ei = foo;
>  
>  	INIT_LIST_HEAD(&ei->i_orphan);
> -	init_rwsem(&ei->xattr_sem);
>  	init_rwsem(&ei->i_data_sem);
>  	inode_init_once(&ei->vfs_inode);
>  	ext4_fc_init_inode(&ei->vfs_inode);
> @@ -5497,6 +5505,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  			  ext4_has_feature_orphan_present(sb) ||
>  			  ext4_has_feature_journal_needs_recovery(sb));
>  
> +	ext4_init_ea_inode_work(sbi);
> +
>  	if (ext4_has_feature_mmp(sb) && !sb_rdonly(sb)) {
>  		err = ext4_multi_mount_protect(sb, le64_to_cpu(es->s_mmp_block));
>  		if (err)
> @@ -5747,6 +5757,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	return 0;
>  
>  failed_mount9:
> +	/* Drain deferred EA inode iputs before quota shutdown */
> +	flush_delayed_work(&sbi->s_ea_inode_work);
>  	ext4_quotas_off(sb, EXT4_MAXQUOTAS);
>  failed_mount8: __maybe_unused
>  	ext4_release_orphan_info(sb);
> @@ -5767,6 +5779,8 @@ failed_mount8: __maybe_unused
>  	if (EXT4_SB(sb)->rsv_conversion_wq)
>  		destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq);
>  failed_mount_wq:
> +	/* Drain deferred EA inode iputs before freeing structures */
> +	flush_delayed_work(&sbi->s_ea_inode_work);
>  	ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
>  	sbi->s_ea_inode_cache = NULL;
>  
> @@ -5777,6 +5791,8 @@ failed_mount8: __maybe_unused
>  		ext4_journal_destroy(sbi, sbi->s_journal);
>  	}
>  failed_mount3a:
> +	/* Drain deferred EA inode iputs from journal replay */
> +	flush_delayed_work(&sbi->s_ea_inode_work);
>  	ext4_es_unregister_shrinker(sbi);
>  failed_mount3:
>  	/* flush s_sb_upd_work before sbi destroy */
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 982a1f831e22..d5bccc64b032 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -3025,6 +3025,66 @@ void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *ea_inode_array)
>  	kfree(ea_inode_array);
>  }
>  
> +
> +/*
> + * Worker function for deferred EA inode iput.  Processes all inodes queued
> + * on s_ea_inode_to_free in a context free of xattr_sem/jbd2 handle locks.
> + */
> +static void ext4_ea_inode_work(struct work_struct *work)
> +{
> +	struct ext4_sb_info *sbi = container_of(to_delayed_work(work),
> +						struct ext4_sb_info,
> +						s_ea_inode_work);
> +	struct llist_node *node = llist_del_all(&sbi->s_ea_inode_to_free);
> +
> +	while (node) {
> +		struct ext4_inode_info *ei = container_of(node,
> +					struct ext4_inode_info, i_ea_iput_node);
> +		node = node->next;
> +		iput(&ei->vfs_inode);
> +	}
> +}
> +
> +/*
> + * Release a VFS reference on an EA inode.  Must be used instead of iput()
> + * in any context where xattr_sem or a jbd2 handle is held.
> + *
> + * If this is not the last reference, drops it immediately via
> + * iput_if_not_last() with no further action needed.
> + *
> + * If this is the last reference, the inode is linked onto a per-sb
> + * llist via i_ea_iput_node (embedded in ext4_inode_info, sharing space
> + * with the unused xattr_sem) and a delayed worker performs the final
> + * iput() in a clean context.
> + *
> + * Note: while an inode is on s_ea_inode_to_free, the unconsumed i_count
> + * reference (still 1) keeps it in the inode cache, so any concurrent
> + * iget() bumps i_count to >= 2 and iput_if_not_last() will succeed.
> + * Nobody will add the inode a second time until ext4_ea_inode_work()
> + * drops that reference via iput().
> + */
> +void ext4_put_ea_inode(struct inode *inode)
> +{
> +	if (!inode)
> +		return;
> +	WARN_ON_ONCE(!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL));
> +	if (iput_if_not_last(inode))
> +		return;
> +	llist_add(&EXT4_I(inode)->i_ea_iput_node,
> +		  &EXT4_SB(inode->i_sb)->s_ea_inode_to_free);
> +	/*
> +	 * Use a short delay to allow multiple EA inodes to accumulate,
> +	 * reducing workqueue wakeups when several are released together.
> +	 */
> +	schedule_delayed_work(&EXT4_SB(inode->i_sb)->s_ea_inode_work, 1);
> +}
> +
> +void ext4_init_ea_inode_work(struct ext4_sb_info *sbi)
> +{
> +	init_llist_head(&sbi->s_ea_inode_to_free);
> +	INIT_DELAYED_WORK(&sbi->s_ea_inode_work, ext4_ea_inode_work);
> +}
> +
>  /*
>   * ext4_xattr_block_cache_insert()
>   *
> diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
> index 1fedf44d4fb6..2ff4b6eccd40 100644
> --- a/fs/ext4/xattr.h
> +++ b/fs/ext4/xattr.h
> @@ -190,6 +190,8 @@ extern int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
>  				   struct ext4_xattr_inode_array **array,
>  				   int extra_credits);
>  extern void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *array);
> +extern void ext4_init_ea_inode_work(struct ext4_sb_info *sbi);
> +extern void ext4_put_ea_inode(struct inode *inode);
>  
>  extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>  			    struct ext4_inode *raw_inode, handle_t *handle);
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
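
The queue-and-drain pattern used by ext4_put_ea_inode() and
ext4_ea_inode_work() in the quoted patch can be sketched in userspace
roughly as follows. This is an illustration under stated assumptions, not
the kernel llist implementation: struct node, struct item, struct pending
and the pending_*() functions are hypothetical, and "release" is reduced
to setting a flag.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Hypothetical object with embedded linkage, like i_ea_iput_node. */
struct item {
	int id;
	int released;		/* set when the deferred release runs */
	struct node link;
};

struct pending {
	_Atomic(struct node *) first;
};

/* Push one node onto the list; safe against concurrent pushers. */
void pending_add(struct pending *p, struct node *n)
{
	struct node *head = atomic_load(&p->first);

	do {
		n->next = head;
	} while (!atomic_compare_exchange_weak(&p->first, &head, n));
}

/* Detach the whole list with a single atomic exchange. */
struct node *pending_del_all(struct pending *p)
{
	return atomic_exchange(&p->first, (struct node *)NULL);
}

/* Drain every queued item and return how many were released. */
int pending_drain(struct pending *p)
{
	struct node *n = pending_del_all(p);
	int count = 0;

	while (n) {
		struct item *it = (struct item *)
			((char *)n - offsetof(struct item, link));

		/* Read next first: a real release may free the item. */
		n = n->next;
		it->released = 1;
		count++;
	}
	return count;
}
```

As in the patch's worker loop, the next pointer is read before the item is
released, because the final release may free the memory that holds the
linkage.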

Thread overview: 7+ messages
2026-06-30 10:08 [PATCH v12 0/4] ext4: deferred iput framework for EA inodes Yun Zhou
2026-06-30 10:08 ` [PATCH v12 1/4] fs: add iput_if_not_last() helper Yun Zhou
2026-06-30 10:08 ` [PATCH v12 2/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput Yun Zhou
2026-06-30 10:15   ` Zhou, Yun
2026-06-30 11:49   ` Jan Kara
2026-06-30 10:08 ` [PATCH v12 3/4] ext4: convert all EA inode iput() calls to ext4_put_ea_inode() Yun Zhou
2026-06-30 10:08 ` [PATCH v12 4/4] ext4: remove ea_inode_array mechanism in favor of ext4_put_ea_inode() Yun Zhou
