public inbox for linux-ext4@vger.kernel.org
* [RFC PATCH v2 0/2] ext4: fast commit: fix lockdep issues
@ 2025-12-22 15:19 Li Chen
  2025-12-22 15:19 ` [RFC PATCH v2 1/2] ext4: fast_commit: assert i_data_sem only before sleep Li Chen
  2025-12-22 15:19 ` [RFC PATCH v2 2/2] ext4: fast commit: fix s_fc_lock vs i_data_sem inversion Li Chen
  0 siblings, 2 replies; 3+ messages in thread
From: Li Chen @ 2025-12-22 15:19 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel

Hi,

This series fixes two lockdep issues in the ext4 fast commit paths.

1) ext4_fc_track_inode() can return without sleeping when
   EXT4_STATE_FC_COMMITTING is already clear. The lockdep assertion for
   i_data_sem should only fire when we actually go to sleep.

2) lockdep reports a possible deadlock due to lock order inversion
   between s_fc_lock and i_data_sem. The fast commit writer held s_fc_lock
   while writing the fast commit log. Writing the journal inode mapping
   can call ext4_map_blocks() and take i_data_sem, while metadata update
   paths can hold i_data_sem and call ext4_fc_track_inode() which takes
   s_fc_lock.

The fix drops s_fc_lock before the log writing step and uses
EXT4_STATE_FC_COMMITTING to keep inode and create dentry state stable
until cleanup.
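
The ordering problem in 2) and the effect of dropping s_fc_lock early can be
sketched with a toy userspace model. This is not kernel code and not how
lockdep is implemented: every toy_* name and the two lock ids are hypothetical.
The model only records "lock B was taken while lock A was held" and reports an
inversion when both A->B and B->A have been seen.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock ids for this toy model (not kernel symbols). */
enum { FC_LOCK, DATA_SEM, NLOCKS };

static int held[NLOCKS];          /* 1 while the lock is held */
static int edge[NLOCKS][NLOCKS];  /* edge[a][b]: b was taken while a was held */

void toy_acquire(int id)
{
	for (int a = 0; a < NLOCKS; a++)
		if (held[a])
			edge[a][id] = 1;
	held[id] = 1;
}

void toy_release(int id)
{
	held[id] = 0;
}

/* Metadata update path: holds i_data_sem, then tracks the inode. */
void toy_track_path(void)
{
	toy_acquire(DATA_SEM);
	toy_acquire(FC_LOCK);
	toy_release(FC_LOCK);
	toy_release(DATA_SEM);
}

/* Commit path; drop_early selects the ordering proposed by this series. */
void toy_commit_path(int drop_early)
{
	toy_acquire(FC_LOCK);
	/* ... inodes would be marked as committing here ... */
	if (drop_early)
		toy_release(FC_LOCK);
	toy_acquire(DATA_SEM);    /* log write maps journal blocks */
	toy_release(DATA_SEM);
	if (!drop_early)
		toy_release(FC_LOCK);
}

/* Returns 1 if both FC_LOCK->DATA_SEM and DATA_SEM->FC_LOCK were recorded. */
int toy_has_inversion(int drop_early)
{
	memset(held, 0, sizeof(held));
	memset(edge, 0, sizeof(edge));
	toy_track_path();
	toy_commit_path(drop_early);
	return edge[FC_LOCK][DATA_SEM] && edge[DATA_SEM][FC_LOCK];
}
```

Calling toy_has_inversion(0) models the old commit path and
toy_has_inversion(1) models the path with s_fc_lock dropped before the log
write. The model says nothing about whether state stays stable after the
lock is dropped; that part relies on EXT4_STATE_FC_COMMITTING as described
above.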

Testing:
- QEMU VM, ext4 -O fast_commit on virtio-pmem + dax. Verified that both
  lockdep reports reproduce on an older kernel and are gone with this series.

RFC v1 -> RFC v2:
 - patch 1: move comments to correct place
 - patch 2: add it to patchset.
 - add missing RFC prefix

RFC v1: https://lore.kernel.org/linux-ext4/20251222032655.87056-1-me@linux.beauty/T/#u

Li Chen (2):
  ext4: fast_commit: assert i_data_sem only before sleep
  ext4: fast commit: fix s_fc_lock vs i_data_sem inversion

 fs/ext4/fast_commit.c | 96 +++++++++++++++++++++++++++++++------------
 1 file changed, 69 insertions(+), 27 deletions(-)

-- 
2.51.0



* [RFC PATCH v2 1/2] ext4: fast_commit: assert i_data_sem only before sleep
  2025-12-22 15:19 [RFC PATCH v2 0/2] ext4: fast commit: fix lockdep issues Li Chen
@ 2025-12-22 15:19 ` Li Chen
  2025-12-22 15:19 ` [RFC PATCH v2 2/2] ext4: fast commit: fix s_fc_lock vs i_data_sem inversion Li Chen
  1 sibling, 0 replies; 3+ messages in thread
From: Li Chen @ 2025-12-22 15:19 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel; +Cc: Li Chen

ext4_fc_track_inode() can return without sleeping when
EXT4_STATE_FC_COMMITTING is already clear. The lockdep assertion for
ei->i_data_sem was done unconditionally before the wait loop, which can
WARN in call paths that hold i_data_sem even though we never block. Move
lockdep_assert_not_held(&ei->i_data_sem) into the actual sleep path,
right before schedule().
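
As a rough illustration of why the placement matters, here is a toy userspace
model (not kernel code; all toy_* names are hypothetical, and a counter stands
in for the lockdep WARN). It compares asserting before the wait loop with
asserting only on the path that would sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockdep warning counter. */
static int toy_warnings;

static void toy_assert_not_held(bool holds_data_sem)
{
	if (holds_data_sem)
		toy_warnings++;
}

/*
 * Models one call to the tracking function.
 *   committing     - the inode is currently being committed
 *   holds_data_sem - the caller holds the (modelled) i_data_sem
 *   assert_early   - true: assert before the loop (old placement)
 *                    false: assert only right before sleeping
 * Returns the number of warnings raised by this call.
 */
int toy_track_inode(bool committing, bool holds_data_sem, bool assert_early)
{
	toy_warnings = 0;

	if (assert_early)
		toy_assert_not_held(holds_data_sem);

	while (committing) {
		if (!assert_early)
			toy_assert_not_held(holds_data_sem);
		/* Real code would schedule() here; the toy just stops waiting. */
		committing = false;
	}
	return toy_warnings;
}
```

With committing false and the semaphore held, the early placement raises a
warning even though nothing blocks, while the late placement stays quiet. When
the inode really is committing, the late placement still warns.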

Signed-off-by: Li Chen <me@linux.beauty>
---
 fs/ext4/fast_commit.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index fa66b08de999..3bcdd4619de1 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -566,13 +566,6 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 	if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
 		return;
 
-	/*
-	 * If we come here, we may sleep while waiting for the inode to
-	 * commit. We shouldn't be holding i_data_sem when we go to sleep since
-	 * the commit path needs to grab the lock while committing the inode.
-	 */
-	lockdep_assert_not_held(&ei->i_data_sem);
-
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
 #if (BITS_PER_LONG < 64)
 		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
@@ -586,8 +579,16 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 				   EXT4_STATE_FC_COMMITTING);
 #endif
 		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
-		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
+		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+			/*
+			 * We might sleep while waiting for the inode to commit.
+			 * We shouldn't be holding i_data_sem when we go to sleep
+			 * since the commit path may grab it while committing this
+			 * inode.
+			 */
+			lockdep_assert_not_held(&ei->i_data_sem);
 			schedule();
+		}
 		finish_wait(wq, &wait.wq_entry);
 	}
 
-- 
2.51.0



* [RFC PATCH v2 2/2] ext4: fast commit: fix s_fc_lock vs i_data_sem inversion
  2025-12-22 15:19 [RFC PATCH v2 0/2] ext4: fast commit: fix lockdep issues Li Chen
  2025-12-22 15:19 ` [RFC PATCH v2 1/2] ext4: fast_commit: assert i_data_sem only before sleep Li Chen
@ 2025-12-22 15:19 ` Li Chen
  1 sibling, 0 replies; 3+ messages in thread
From: Li Chen @ 2025-12-22 15:19 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, linux-ext4, linux-kernel; +Cc: Li Chen

lockdep reports a possible deadlock due to lock order inversion:

     CPU0                    CPU1
     ----                    ----
lock(&sbi->s_fc_lock);
                             lock(&ei->i_data_sem);
                             lock(&sbi->s_fc_lock);
rlock(&ei->i_data_sem);

ext4_fc_perform_commit() held s_fc_lock while writing fast commit blocks.
This can write the journal inode, whose mapping can call ext4_map_blocks()
and take i_data_sem. At the same time, metadata update paths can hold
i_data_sem and call ext4_fc_track_inode(), which takes s_fc_lock.

Drop s_fc_lock before the log writing step. Keep inode and dentry state
stable by using EXT4_STATE_FC_COMMITTING for synchronization: ext4_fc_del()
waits for COMMITTING, and inodes referenced only from create dentry updates
are also marked COMMITTING and woken up on cleanup.
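
The handling of inodes that are referenced only from create dentry updates can
be sketched in a single-threaded toy model. This is not the kernel data
structure: struct toy_inode, struct toy_dentry_update and the toy_* functions
are hypothetical, a NULL inode pointer models an empty fcd_dilist, and the
wakeup is only counted rather than delivered.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_CREAT, TOY_UNLINK };   /* hypothetical op tags */

struct toy_inode {
	int committing;                  /* models EXT4_STATE_FC_COMMITTING */
};

struct toy_dentry_update {
	enum toy_op op;
	struct toy_inode *inode;         /* NULL models an empty fcd_dilist */
};

/* Commit start: mark inodes reachable through create updates. */
void toy_mark_creates(struct toy_dentry_update *q, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (q[i].op != TOY_CREAT || !q[i].inode)
			continue;
		q[i].inode->committing = 1;
	}
}

/* Cleanup: clear the flag; returns how many inodes would get a wakeup. */
size_t toy_cleanup_creates(struct toy_dentry_update *q, size_t n)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].op != TOY_CREAT || !q[i].inode)
			continue;
		q[i].inode->committing = 0;
		woken++;   /* real code issues a barrier and wake_up_bit() */
	}
	return woken;
}

/* Small driver: returns 0 when the flag follows the expected lifecycle. */
int toy_demo(void)
{
	struct toy_inode a = { 0 }, b = { 0 };
	struct toy_dentry_update q[] = {
		{ TOY_CREAT, &a }, { TOY_UNLINK, &b }, { TOY_CREAT, NULL },
	};
	size_t n = sizeof(q) / sizeof(q[0]);

	toy_mark_creates(q, n);
	if (!a.committing || b.committing)
		return -1;
	if (toy_cleanup_creates(q, n) != 1)
		return -2;
	return a.committing ? -3 : 0;
}
```

The point of the sketch is the symmetry: whatever is marked at commit start
has to be cleared and woken at cleanup, otherwise a waiter in ext4_fc_del() or
ext4_fc_track_inode() could block indefinitely.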

Signed-off-by: Li Chen <me@linux.beauty>
---
 fs/ext4/fast_commit.c | 79 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 3bcdd4619de1..722952bea515 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -244,23 +244,26 @@ void ext4_fc_del(struct inode *inode)
 		return;
 	}
 
-	/*
-	 * Since ext4_fc_del is called from ext4_evict_inode while having a
-	 * handle open, there is no need for us to wait here even if a fast
-	 * commit is going on. That is because, if this inode is being
-	 * committed, ext4_mark_inode_dirty would have waited for inode commit
-	 * operation to finish before we come here. So, by the time we come
-	 * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
-	 * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
-	 * here.
-	 *
-	 * We may come here without any handles open in the "no_delete" case of
-	 * ext4_evict_inode as well. However, if that happens, we first mark the
-	 * file system as fast commit ineligible anyway. So, even in that case,
-	 * it is okay to remove the inode from the fc list.
-	 */
-	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
-		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
+	/* Don't race with fast commit processing of this inode. */
+	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+#if (BITS_PER_LONG < 64)
+		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_state_flags,
+				   EXT4_STATE_FC_COMMITTING);
+#else
+		DEFINE_WAIT_BIT(wait, &ei->i_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
+		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+			mutex_unlock(&sbi->s_fc_lock);
+			schedule();
+			mutex_lock(&sbi->s_fc_lock);
+		}
+		finish_wait(wq, &wait.wq_entry);
+	}
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
 #if (BITS_PER_LONG < 64)
 		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
@@ -1107,6 +1110,27 @@ static int ext4_fc_perform_commit(journal_t *journal)
 		ext4_set_inode_state(&iter->vfs_inode,
 				     EXT4_STATE_FC_COMMITTING);
 	}
+	/*
+	 * Also mark inodes referenced by create dentry updates. These inodes are
+	 * tracked via i_fc_dilist and might not be on s_fc_q[MAIN].
+	 */
+	{
+		struct ext4_fc_dentry_update *fc_dentry;
+		struct ext4_inode_info *ei;
+
+		list_for_each_entry(fc_dentry, &sbi->s_fc_dentry_q[FC_Q_MAIN],
+				    fcd_list) {
+			if (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT)
+				continue;
+			if (list_empty(&fc_dentry->fcd_dilist))
+				continue;
+			ei = list_first_entry(&fc_dentry->fcd_dilist,
+					      struct ext4_inode_info,
+					      i_fc_dilist);
+			ext4_set_inode_state(&ei->vfs_inode,
+					     EXT4_STATE_FC_COMMITTING);
+		}
+	}
 	mutex_unlock(&sbi->s_fc_lock);
 	jbd2_journal_unlock_updates(journal);
 
@@ -1135,7 +1159,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	}
 
 	/* Step 6.2: Now write all the dentry updates. */
-	mutex_lock(&sbi->s_fc_lock);
 	ret = ext4_fc_commit_dentry_updates(journal, &crc);
 	if (ret)
 		goto out;
@@ -1157,7 +1180,6 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	ret = ext4_fc_write_tail(sb, crc);
 
 out:
-	mutex_unlock(&sbi->s_fc_lock);
 	blk_finish_plug(&plug);
 	return ret;
 }
@@ -1339,6 +1361,25 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 					     struct ext4_fc_dentry_update,
 					     fcd_list);
 		list_del_init(&fc_dentry->fcd_list);
+		if (fc_dentry->fcd_op == EXT4_FC_TAG_CREAT &&
+		    !list_empty(&fc_dentry->fcd_dilist)) {
+			ei = list_first_entry(&fc_dentry->fcd_dilist,
+					      struct ext4_inode_info,
+					      i_fc_dilist);
+			ext4_clear_inode_state(&ei->vfs_inode,
+					       EXT4_STATE_FC_COMMITTING);
+			/*
+			 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+			 * visible before we send the wakeup. Pairs with implicit
+			 * barrier in prepare_to_wait() in ext4_fc_track_inode().
+			 */
+			smp_mb();
+#if (BITS_PER_LONG < 64)
+			wake_up_bit(&ei->i_state_flags, EXT4_STATE_FC_COMMITTING);
+#else
+			wake_up_bit(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
+		}
 		list_del_init(&fc_dentry->fcd_dilist);
 
 		release_dentry_name_snapshot(&fc_dentry->fcd_name);
-- 
2.51.0


