From: Luis Henriques <luis.henriques@linux.dev>
To: Jan Kara <jack@suse.cz>
Cc: "Luis Henriques (SUSE)" <luis.henriques@linux.dev>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger@dilger.ca>,
Harshad Shirwadkar <harshadshirwadkar@gmail.com>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit
Date: Tue, 16 Jul 2024 15:13:05 +0100
Message-ID: <87bk2xtoge.fsf@linux.dev>
In-Reply-To: <20240716102416.jublpma3qiltlrbr@quack3> (Jan Kara's message of "Tue, 16 Jul 2024 12:24:16 +0200")
On Tue, Jul 16 2024, Jan Kara wrote:
> On Thu 11-07-24 09:35:20, Luis Henriques (SUSE) wrote:
>> When a full journal commit is on-going, any fast commit has to be enqueued
>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing
>> is done only once, i.e. if an inode is already queued in a previous fast
>> commit entry it won't be enqueued again. However, if a full commit starts
>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>> be done into FC_Q_STAGING. And this is not being done in function
>> ext4_fc_track_template().
>>
>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>> greater than the one being handled. The STAGING queue will then be spliced
>> back into MAIN.
>>
>> This bug was found using fstest generic/047. This test creates several
>> 32k-byte files, sync'ing each of them after its creation, and then shuts
>> down the filesystem. Some data may be lost in this operation; for example,
>> a file may have its size truncated to zero.
>>
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>
> ...
>
>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>> index 3926a05eceee..facbc8dbbaa2 100644
>> --- a/fs/ext4/fast_commit.c
>> +++ b/fs/ext4/fast_commit.c
>> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>  				       EXT4_STATE_FC_COMMITTING);
>>  		if (tid_geq(tid, iter->i_sync_tid))
>>  			ext4_fc_reset_inode(&iter->vfs_inode);
>> +		} else if (tid) {
>> +			/*
>> +			 * If the tid is valid (i.e. non-zero) re-enqueue the
>> +			 * inode into STAGING, which will then be spliced back
>> +			 * into MAIN
>> +			 */
>> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
>> +				      &sbi->s_fc_q[FC_Q_STAGING]);
>> +		}
>
> I don't think this is going to work (even if we fix the tid 0 being special
> assumption). With this there would be a race like:
>
> Task 1                                  Task 2
> modify inode I
> ext4_fc_commit()
>   jbd2_fc_begin_commit()
>   commits changes
>   jbd2_fc_end_commit()
>     __jbd2_fc_end_commit(journal, 0, false)
>       jbd2_journal_unlock_updates(journal)
>                                         jbd2_journal_start()
>                                         modify inode I
>                                         ...
>                                         ext4_mark_iloc_dirty()
>                                           ext4_fc_track_inode()
>                                             ext4_fc_track_template()
>                                               - doesn't add inode anywhere
>                                                 because i_fc_list is not empty
>       ext4_fc_cleanup(journal, 0, 0)
>         removes inode I from i_fc_list => next fastcommit will not properly
>         flush it.
>
> To avoid this race I think we could move the
> journal->j_fc_cleanup_callback() call to happen before we call
> jbd2_journal_unlock_updates(). Then we are sure that inode cannot be
> modified (journal is locked) until we are done processing the fastcommit
> lists when doing fastcommit. Hence your patch could then be changed like:
>
> +		} else if (full) {
> +			/*
> +			 * We are called after a full commit, inode has been
> +			 * modified while the commit was running. Re-enqueue
> +			 * the inode into STAGING, which will then be spliced
> +			 * back into MAIN. This cannot happen during
> +			 * fastcommit because the journal is locked all the
> +			 * time in that case (and tid doesn't increase so
> +			 * tid check above isn't reliable).
> +			 */
> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> +				      &sbi->s_fc_q[FC_Q_STAGING]);
> +		}
>
> Later, Harshad's patches change the code to use EXT4_STATE_FC_COMMITTING
> for protecting inodes during fastcommit and that will also deal with these
> races without having to keep the whole journal locked.

OK, this looks like it should fix all the issues I was trying to fix
(g/047, g/472, and a few others Ted pointed out).  I'll go run a few more
tests on this to try to catch any possible regression.

Once again, thanks a lot for your help, Jan.

Cheers,
-- 
Luís
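
Editorial sketch (not part of the original message): the re-enqueue logic
suggested in the quoted review can be modeled in user space roughly as
follows. This is a simplified illustration, not the kernel code. The names
toy_inode and toy_fc_cleanup and the small list helpers are hypothetical
stand-ins for ext4_inode_info, ext4_fc_cleanup() and the kernel's list_head
API; only the branch structure (drop the inode if the commit covers it,
otherwise re-enqueue into STAGING after a full commit, then splice STAGING
back into MAIN) follows the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

enum { FC_Q_MAIN, FC_Q_STAGING, FC_Q_MAX };

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Move every entry of 'from' to the tail of 'to' and leave 'from' empty. */
static void list_splice_tail_init(struct list_head *from, struct list_head *to)
{
	if (list_empty(from))
		return;
	from->next->prev = to->prev;
	to->prev->next = from->next;
	from->prev->next = to;
	to->prev = from->prev;
	list_init(from);
}

/* Hypothetical stand-in for the fast commit fields of an ext4 inode. */
struct toy_inode {
	struct list_head fc_list;	/* first member, so a cast recovers the inode */
	unsigned int sync_tid;		/* tid of the last transaction touching it */
};

/* Wrap-around safe "x >= y" on transaction ids, in the style of jbd2. */
static int tid_geq(unsigned int x, unsigned int y)
{
	return (int)(x - y) >= 0;
}

/* Toy model of the cleanup callback with the suggested 'else if (full)'. */
static void toy_fc_cleanup(struct list_head q[FC_Q_MAX], bool full,
			   unsigned int tid)
{
	while (!list_empty(&q[FC_Q_MAIN])) {
		struct toy_inode *ino = (struct toy_inode *)q[FC_Q_MAIN].next;

		list_del_init(&ino->fc_list);
		if (tid_geq(tid, ino->sync_tid)) {
			/* This commit covers the inode: it stays off the queues. */
		} else if (full) {
			/* Modified while the full commit ran: keep tracking it. */
			list_add_tail(&ino->fc_list, &q[FC_Q_STAGING]);
		}
		/* Fast commit case: assumed unreachable while the journal is locked. */
	}
	list_splice_tail_init(&q[FC_Q_STAGING], &q[FC_Q_MAIN]);
	assert(list_empty(&q[FC_Q_STAGING]));
}
```

For example, with two inodes on MAIN whose sync_tid values are 5 and 7,
calling toy_fc_cleanup(q, true, 6) drops the first one and leaves the second
on MAIN for the next fast commit, which is the behaviour the v4 patch was
trying to achieve.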