Linux EXT4 FS development
* [PATCH v4] ext4: fix ABBA deadlock in ext4_xattr_inode_cache_find()
@ 2026-06-25  6:03 Aditya Srivastava
From: Aditya Srivastava @ 2026-06-25  6:03 UTC
  To: tytso, jack
  Cc: adilger.kernel, libaokun, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, Aditya Prakash Srivastava, Colin Ian King

From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>

Syzbot/stress-ng reported an ABBA deadlock in ext4 when exercising
concurrent xattr workloads (using the ea_inode mount/format option).

The deadlock occurs between the running transaction and the eviction
thread:
- Task 1 (stress-ng): Holds a reference to a shared mbcache_entry (ce)
  and calls ext4_xattr_inode_cache_find() -> ext4_iget() to retrieve
  the corresponding EA inode. Since the EA inode is currently being
  evicted, ext4_iget() blocks in __wait_on_freeing_inode() waiting for
  eviction to complete.
- Task 2 (eviction thread): Currently evicting the same EA inode in
  ext4_evict_ea_inode(). It calls mb_cache_entry_wait_unused(oe) which
  blocks waiting for Task 1 to release the reference to the mbcache_entry.

To break this deadlock, add a new ext4_iget() flag, EXT4_IGET_NOWAIT.
When it is set, look up the inode without blocking, using the VFS
find_inode_nowait() API.

If the inode is currently being evicted (marked with I_FREEING or
I_WILL_FREE) or created (I_CREATING), skip it and return -ESTALE rather
than waiting for eviction/creation to complete; this breaks the ABBA
cycle. If the returned inode is I_NEW, wait for its initialization to
finish via wait_on_new_inode().

If initialization fails and the inode has been unhashed by the time
wait_on_new_inode() returns (e.g., due to an I/O read error in another
thread), drop the reference and return -ESTALE so that the xattr cache
entry is skipped. Finally, the standard validation checks (is_bad_inode,
EXT4_EA_INODE_FL, file_acl, and xattr flags) still run in
check_igot_inode(), so an inode found by the non-blocking lookup is
validated the same way as one returned by iget_locked().

In ext4_xattr_inode_cache_find(), invoke ext4_iget() with the new
EXT4_IGET_NOWAIT flag to perform the non-blocking cache search.

Suggested-by: Jan Kara <jack@suse.cz>
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219283
Fixes: 0a46ef234756 ("ext4: do not create EA inode under buffer lock")
Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
---
Changes in v4:
  - Check whether the inode has been unhashed by the time
    wait_on_new_inode() returns, to handle transient initialization
    failures (such as I/O read errors) gracefully. Dropping the reference
    and returning -ESTALE avoids spurious filesystem corruption errors
    (__ext4_error), as found by the Sashiko AI bot.

Changes in v3:
  - Implement a new ext4_iget() configuration flag named EXT4_IGET_NOWAIT to
    fully contain the non-blocking lookup and VFS-level validations within
    inode.c, as requested by Jan Kara.
  - Skip inodes currently being created (I_CREATING), following Jan Kara's
    direct feedback.
  - Remove all open-coded match helpers and VFS state-checks from xattr.c.

Changes in v2:
  - Read inode state locklessly using inode_state_read_once() to resolve
    a lockdep assertion on cache hit.
  - Manually restore essential inode/ea_inode validations on the retrieved
    inode (is_bad_inode, EXT4_EA_INODE_FL, file_acl, and xattr checks) to
    match VFS safety guarantees and prevent using corrupted/failed inodes.
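
For readers unfamiliar with the find_inode_nowait() match contract, the
skip-instead-of-wait lookup described in the commit message can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: every demo_* name and DEMO_* state bit below is hypothetical
and stands in for kernel types, there is no locking, and a missing entry
returns -ENOENT here, whereas the patch falls back to iget_locked().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for I_FREEING, I_WILL_FREE and I_CREATING. */
enum {
	DEMO_FREEING   = 1 << 0,
	DEMO_WILL_FREE = 1 << 1,
	DEMO_CREATING  = 1 << 2,
};

/* Hypothetical, heavily reduced model of an in-memory inode. */
struct demo_inode {
	unsigned long ino;
	unsigned int state;
	int refcount;
};

/*
 * Tri-state match callback, mirroring the convention the patch relies on:
 *   0  -> not the inode we want, keep scanning
 *   1  -> match; a reference has been taken
 *  -1  -> match, but the inode is busy; stop the scan with no result
 */
static int demo_match(struct demo_inode *inode, unsigned long ino, void *data)
{
	bool *is_freeing = data;

	if (inode->ino != ino)
		return 0;
	if (inode->state & (DEMO_FREEING | DEMO_WILL_FREE | DEMO_CREATING)) {
		if (is_freeing)
			*is_freeing = true;
		return -1;
	}
	inode->refcount++;
	return 1;
}

/* Scan a flat table (standing in for the inode hash) without ever waiting. */
static struct demo_inode *
demo_find_nowait(struct demo_inode *tbl, size_t n, unsigned long ino,
		 int (*match)(struct demo_inode *, unsigned long, void *),
		 void *data)
{
	for (size_t i = 0; i < n; i++) {
		int mval = match(&tbl[i], ino, data);

		if (mval == 0)
			continue;
		return mval == 1 ? &tbl[i] : NULL;
	}
	return NULL;
}

/*
 * Returns 0 and sets *out on success, -ESTALE if the inode is being
 * freed or created, and -ENOENT if it is absent (a simplification).
 */
static int demo_lookup_nowait(struct demo_inode *tbl, size_t n,
			      unsigned long ino, struct demo_inode **out)
{
	bool is_freeing = false;
	struct demo_inode *inode;

	assert(out != NULL);
	*out = NULL;
	inode = demo_find_nowait(tbl, n, ino, demo_match, &is_freeing);
	if (is_freeing)
		return -ESTALE;	/* caller skips this cache entry */
	if (!inode)
		return -ENOENT;
	*out = inode;
	return 0;
}
```

In this sketch a caller that receives -ESTALE moves on to the next
candidate, much as ext4_xattr_inode_cache_find() jumps to next_entry
when ext4_iget() returns an error, so it never sleeps while still
holding the cache entry reference.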

 fs/ext4/ext4.h  |  3 ++-
 fs/ext4/inode.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/xattr.c |  2 +-
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b37c136ea3ab..c76dd0bdd3d8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3144,7 +3144,8 @@ typedef enum {
 	EXT4_IGET_SPECIAL =	0x0001, /* OK to iget a system inode */
 	EXT4_IGET_HANDLE = 	0x0002,	/* Inode # is from a handle */
 	EXT4_IGET_BAD =		0x0004, /* Allow to iget a bad inode */
-	EXT4_IGET_EA_INODE =	0x0008	/* Inode should contain an EA value */
+	EXT4_IGET_EA_INODE =	0x0008,	/* Inode should contain an EA value */
+	EXT4_IGET_NOWAIT =	0x0010	/* Non-blocking lookup (skip if freeing) */
 } ext4_iget_flags;
 
 extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ce99807c5f5b..75ed467f5abf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5270,6 +5270,24 @@ void ext4_set_inode_mapping_order(struct inode *inode)
 	mapping_set_folio_order_range(inode->i_mapping, min_order, max_order);
 }
 
+static int ext4_iget_match(struct inode *inode, u64 ino, void *data)
+{
+	bool *is_freeing = data;
+
+	if (inode->i_ino != ino)
+		return 0;
+	spin_lock(&inode->i_lock);
+	if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE | I_CREATING)) {
+		if (is_freeing)
+			*is_freeing = true;
+		spin_unlock(&inode->i_lock);
+		return -1;
+	}
+	__iget(inode);
+	spin_unlock(&inode->i_lock);
+	return 1;
+}
+
 struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 			  ext4_iget_flags flags, const char *function,
 			  unsigned int line)
@@ -5298,9 +5316,31 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 		return ERR_PTR(-EFSCORRUPTED);
 	}
 
-	inode = iget_locked(sb, ino);
-	if (!inode)
-		return ERR_PTR(-ENOMEM);
+	if (flags & EXT4_IGET_NOWAIT) {
+		bool is_freeing = false;
+
+		inode = find_inode_nowait(sb, ino, ext4_iget_match, &is_freeing);
+		if (is_freeing)
+			return ERR_PTR(-ESTALE);
+		if (!inode) {
+			inode = iget_locked(sb, ino);
+			if (!inode)
+				return ERR_PTR(-ENOMEM);
+		} else {
+			if (inode_state_read_once(inode) & I_NEW) {
+				wait_on_new_inode(inode);
+				if (unlikely(inode_unhashed(inode))) {
+					iput(inode);
+					return ERR_PTR(-ESTALE);
+				}
+			}
+		}
+	} else {
+		inode = iget_locked(sb, ino);
+		if (!inode)
+			return ERR_PTR(-ENOMEM);
+	}
+
 	if (!(inode_state_read_once(inode) & I_NEW)) {
 		ret = check_igot_inode(inode, flags, function, line);
 		if (ret) {
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 982a1f831e22..21b5670d8503 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1550,7 +1550,7 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
 
 	while (ce) {
 		ea_inode = ext4_iget(inode->i_sb, ce->e_value,
-				     EXT4_IGET_EA_INODE);
+				     EXT4_IGET_EA_INODE | EXT4_IGET_NOWAIT);
 		if (IS_ERR(ea_inode))
 			goto next_entry;
 		ext4_xattr_inode_set_class(ea_inode);
-- 
2.47.3

