From: "Luis Henriques (SUSE)" <luis.henriques@linux.dev>
To: Theodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger@dilger.ca>,
	Jan Kara <jack@suse.cz>,
	Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Luis Henriques (SUSE)" <luis.henriques@linux.dev>
Subject: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit
Date: Thu, 23 May 2024 12:16:18 +0100
Message-ID: <20240523111618.17012-1-luis.henriques@linux.dev>

When a full journal commit is ongoing, any fast commit has to be enqueued
into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
is done only once, i.e. if an inode is already queued in a previous fast
commit entry it won't be enqueued again.  However, if a full commit starts
_after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
be done into FC_Q_STAGING, and ext4_fc_track_template() does not currently
do this.
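
The enqueue-once behaviour described above can be illustrated with a small
user-space model.  This is only a sketch: the model_* names and MODEL_*
constants are hypothetical stand-ins, not the ext4 types, and the real code
uses list heads under s_fc_lock rather than a single enum field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for "not queued", FC_Q_MAIN and FC_Q_STAGING. */
enum model_queue { MODEL_NONE, MODEL_MAIN, MODEL_STAGING };

struct model_inode {
	enum model_queue queue;	/* queue the inode currently sits on */
};

/* Models the "enqueue only once" rule of the tracking path. */
static void model_track(struct model_inode *inode, bool full_commit_ongoing)
{
	if (inode->queue != MODEL_NONE)
		return;		/* already queued: nothing else happens */
	inode->queue = full_commit_ongoing ? MODEL_STAGING : MODEL_MAIN;
}

/* Replays the problematic ordering and returns the final queue. */
static enum model_queue model_demo(void)
{
	struct model_inode inode = { .queue = MODEL_NONE };

	model_track(&inode, false);	/* first update: lands on MAIN */
	assert(inode.queue == MODEL_MAIN);

	model_track(&inode, true);	/* update after a full commit started */
	return inode.queue;		/* still MAIN, never moved to STAGING */
}
```

In this model the second update is silently ignored, which is the ordering
the patch is meant to handle.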

This patch fixes the issue by flagging an inode that is already enqueued in
either queue.  Later, during the fast commit clean-up callback, if the
inode has a tid that is bigger than the one being handled, that inode is
re-enqueued into STAGING and then spliced back into MAIN.
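
As a rough illustration of the flag-and-requeue idea, here is a user-space
sketch.  All sk_* names and SK_* constants are hypothetical; the field
fc_next plays the role of i_fc_next, and the plain integer comparison
mirrors the patch as posted, so tid wraparound is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sk_tid_t;	/* hypothetical stand-in for tid_t */

enum sk_queue { SK_NONE, SK_MAIN, SK_STAGING };

struct sk_inode {
	enum sk_queue queue;	/* queue the inode currently sits on */
	sk_tid_t fc_next;	/* tid seen while already queued, 0 if none */
};

/* Tracking: enqueue once, otherwise remember the tid of the update. */
static void sk_track(struct sk_inode *inode, bool full_commit_ongoing,
		     sk_tid_t tid)
{
	if (inode->queue == SK_NONE)
		inode->queue = full_commit_ongoing ? SK_STAGING : SK_MAIN;
	else
		inode->fc_next = tid;
}

/* Clean-up for commit 'tid': drop from MAIN, requeue if a newer tid is flagged. */
static void sk_cleanup(struct sk_inode *inode, sk_tid_t tid)
{
	if (inode->queue != SK_MAIN)
		return;
	inode->queue = SK_NONE;
	if (inode->fc_next == tid)
		inode->fc_next = 0;
	else if (inode->fc_next > tid)	/* no wraparound handling here */
		inode->queue = SK_STAGING;
}

/* End of clean-up: everything on STAGING is spliced back into MAIN. */
static void sk_splice(struct sk_inode *inode)
{
	if (inode->queue == SK_STAGING)
		inode->queue = SK_MAIN;
}

static enum sk_queue sk_demo(void)
{
	struct sk_inode inode = { .queue = SK_NONE, .fc_next = 0 };

	sk_track(&inode, false, 10);	/* queued on MAIN under tid 10 */
	sk_track(&inode, true, 11);	/* already queued: flagged with 11 */
	sk_cleanup(&inode, 10);		/* 11 > 10: moved to STAGING */
	assert(inode.queue == SK_STAGING);
	sk_splice(&inode);
	return inode.queue;		/* back on MAIN for the next commit */
}
```

With the flag in place, the inode modified during the full commit stays
queued for the next fast commit instead of being dropped by the clean-up.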

This bug was found using fstest generic/047.  This test creates several
32k-byte files, syncing each of them after its creation, and then shuts
down the filesystem.  Some data may be lost in this operation; for example,
a file may have its size truncated to zero.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
---
Hi!

(Now Cc'ing Harshad, as I should have done in the initial RFC.)

This v2 is a completely different solution, hinted at by Jan Kara.  I hope
my understanding of his suggestion is correct.  Also, I've dropped the
second patch as it didn't make sense, as Jan also pointed out.

Finally, I haven't yet done a review of Harshad's patchset [1] (hope to
get to it soon), but a quick test shows the issue is still present there.
The good news is that this patch can be trivially applied on top of it.

[1] https://lore.kernel.org/all/20240520055153.136091-1-harshadshirwadkar@gmail.com

Cheers,
--
Luis

 fs/ext4/ext4.h        | 11 ++++++++++-
 fs/ext4/fast_commit.c | 11 +++++++++++
 fs/ext4/super.c       |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 983dad8c07ec..4c308c18c3da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1062,9 +1062,18 @@ struct ext4_inode_info {
 	/* Fast commit wait queue for this inode */
 	wait_queue_head_t i_fc_wait;
 
-	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
+	/*
+	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len,
+	 * i_fc_next
+	 */
 	struct mutex i_fc_lock;
 
+	/*
+	 * Used to flag an inode as part of the next fast commit; will be
+	 * reset during fast commit clean-up
+	 */
+	tid_t i_fc_next;
+
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
 	 * in memory.  During truncate, i_size is set to the new size by
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 87c009e0c59a..bfdf249f0783 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -402,6 +402,8 @@ static int ext4_fc_track_template(
 				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
 				&sbi->s_fc_q[FC_Q_STAGING] :
 				&sbi->s_fc_q[FC_Q_MAIN]);
+	else
+		ei->i_fc_next = tid;
 	spin_unlock(&sbi->s_fc_lock);
 
 	return ret;
@@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
 				 i_fc_list) {
 		list_del_init(&iter->i_fc_list);
+		if (iter->i_fc_next == tid)
+			iter->i_fc_next = 0;
+		else if (iter->i_fc_next > tid)
+			/*
+			 * re-enqueue inode into STAGING, which will later be
+			 * spliced back into MAIN
+			 */
+			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
+				      &sbi->s_fc_q[FC_Q_STAGING]);
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_COMMITTING);
 		if (iter->i_sync_tid <= tid)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 893ab80dafba..56f416656d96 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1437,6 +1437,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
 	ext4_fc_init_inode(&ei->vfs_inode);
 	mutex_init(&ei->i_fc_lock);
+	ei->i_fc_next = 0;
 	return &ei->vfs_inode;
 }
 

