Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Luis Henriques <luis.henriques@linux.dev>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Wang Jianjian <wangjianjian0@foxmail.com>,
	"wangjianjian (C)" <wangjianjian3@huawei.com>,
	 Theodore Ts'o <tytso@mit.edu>,  Jan Kara <jack@suse.cz>,
	 Harshad Shirwadkar <harshadshirwadkar@gmail.com>,
	linux-ext4@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit
Date: Fri, 12 Jul 2024 10:15:04 +0100	[thread overview]
Message-ID: <87ed7znf8n.fsf@linux.dev> (raw)
In-Reply-To: <A90C7898-B704-4B2A-BFE6-4A32050763F0@dilger.ca> (Andreas Dilger's message of "Thu, 11 Jul 2024 13:28:23 -0600")

On Thu, Jul 11 2024, Andreas Dilger wrote:

> On Jul 11, 2024, at 10:16 AM, Wang Jianjian <wangjianjian0@foxmail.com> wrote:
>> On 2024/7/11 23:16, Luis Henriques wrote:
>>> On Thu, Jul 11 2024, wangjianjian (C) wrote:
>>> 
>>>> On 2024/7/11 16:35, Luis Henriques (SUSE) wrote:
>>>>> When a full journal commit is on-going, any fast commit has to be enqueued
>>>>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
>>>>> is done only once, i.e. if an inode is already queued in a previous fast
>>>>> commit entry it won't be enqueued again.  However, if a full commit starts
>>>>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>>>>> be done into FC_Q_STAGING.  And this is not being done in function
>>>>> ext4_fc_track_template().
>>>>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>>>>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>>>>> greater than the one being handled.  The STAGING queue will then be spliced
>>>>> back into MAIN.
>>>>> This bug was found using fstest generic/047.  This test creates several 32k
>>>>> bytes files, sync'ing each of them after it's creation, and then shutting
>>>>> down the filesystem.  Some data may be loss in this operation; for example a
>>>>> file may have it's size truncated to zero.
>>>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
>>>>> ---
>>>>> Hi!
>>>>> v4 of this patch enqueues the inode into STAGING *only* if the current tid
>>>>> is non-zero.  It will be zero when doing an fc commit, and this would mean
>>>>> to always re-enqueue the inode.  This fixes the regressions caught by Ted
>>>>> in v3 with fstests generic/472 generic/496 generic/643.
>>>>> Also, since 2nd patch of v3 has already been merged, I've rebased this patch
>>>>> to be applied on top of it.
>>>>>   fs/ext4/fast_commit.c | 10 ++++++++++
>>>>>   1 file changed, 10 insertions(+)
>>>>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>>>>> index 3926a05eceee..facbc8dbbaa2 100644
>>>>> --- a/fs/ext4/fast_commit.c
>>>>> +++ b/fs/ext4/fast_commit.c
>>>>> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>>>>   				       EXT4_STATE_FC_COMMITTING);
>>>>>   		if (tid_geq(tid, iter->i_sync_tid))
>>>>>   			ext4_fc_reset_inode(&iter->vfs_inode);
>>>>> +		} else if (tid) {
>>>>> +			/*
>>>>> +			 * If the tid is valid (i.e. non-zero) re-enqueue the
>>>> one quick question about tid, if one disk is using long time and its tid   get
>>>> wrapped to 0, is it a valid seq? I don't find code handling this situation.
>>> Hmm... OK.  So, to answer to your question, the 'tid' is expected to wrap.
>>> That's why we use:
>>> 
>>> 	if (tid_geq(tid, iter->i_sync_tid))
>> Yes, I know this.
>>> 
>>> instead of:
>>> 
>>> 	if (tid >= iter->i_sync_tid)
>>> 
>>> (The second patch in v3 actually fixed a few places where the tid_*()
>>> helpers weren't being used.)
>>> 
>>> But your question shows me that my patch is wrong as '0' may actually be a
>>> valid 'tid' value.
>> 
>> Actually my question is,  there are some place use '0' to check if a transaction is valid, e.g.
>> 
>> In ext4_wait_for_tail_page_commit()
>> 
>> 5218         while (1) {
>> 5219                 struct folio *folio = filemap_lock_folio(inode->i_mapping,
>> 5220                                       inode->i_size >> PAGE_SHIFT);
>> 5221                 if (IS_ERR(folio))
>> 5222                         return;
>> 5223                 ret = __ext4_journalled_invalidate_folio(folio, offset,
>> 5224 folio_size(folio) - offset);
>> 5225                 folio_unlock(folio);
>> 5226                 folio_put(folio);
>> 5227                 if (ret != -EBUSY)
>> 5228                         return;
>> 5229                 commit_tid = 0;
>> 5230                 read_lock(&journal->j_state_lock);
>> 5231                 if (journal->j_committing_transaction)
>> 5232                         commit_tid = journal->j_committing_transaction->t_tid;
>> 5233                 read_unlock(&journal->j_state_lock);
>> 5234                 if (commit_tid)
>> 5235                         jbd2_log_wait_commit(journal, commit_tid);
>> 5236         }
>> 5237  We only wait commit if tid is not zero.
>> 
>> And in __jbd2_log_wait_for_space()
>> 
>> 79                 if (space_left < nblocks) {
>>  80                         int chkpt = journal->j_checkpoint_transactions != NULL;
>>  81                         tid_t tid = 0;
>>  82
>>  83                         if (journal->j_committing_transaction)
>>  84                                 tid = journal->j_committing_transaction->t_tid;
>>  85 spin_unlock(&journal->j_list_lock);
>>  86 write_unlock(&journal->j_state_lock);
>>  87                         if (chkpt) {
>>  88 jbd2_log_do_checkpoint(journal);
>>  89                         } else if (jbd2_cleanup_journal_tail(journal) == 0) {
>>  90                                 /* We were able to recover space; yay! */
>>  91                                 ;
>>  92                         } else if (tid) {
>>  93                                 /*
>>  94                                  * jbd2_journal_commit_transaction() may want
>>  95                                  * to take the checkpoint_mutex if JBD2_FLUSHED
>>  96                                  * is set.  So we need to temporarily drop it.
>>  97                                  */
>>  98 mutex_unlock(&journal->j_checkpoint_mutex);
>>  99                                 jbd2_log_wait_commit(journal, tid);
>> 100 write_lock(&journal->j_state_lock);
>> 101                                 continue;
>> We also only wait commit if tid is not zero.
>> 
>> Does it mean all these have bugs if '0' is a valid 'tid' ?
>> 
>> But on the other hand, if we don't consider sync and fsync, and default commit interval is 5s,
>> 
>> time of tid wrap to 0 is nearly 680 years. However, we can run sync/fsync to make tid to increase
>> 
>> more quickly in real world ?
>
> The simple solution is that "0" is not a valid sequence.  It looks like
> this is a bug in jbd2_get_transaction() where it is incrementing the TID:
>
>         transaction->t_tid = journal->j_transaction_sequence++;
>
> it should add a check to handle the wrap-around:
>
>         if (unlikely(transaction->t_tid == 0))
>                 transaction->t_tid = journal->j_transaction_sequence++;

Sound good to me.  I can prepare a patch with this change if no one else
sees other issues.  As far as I can see, this shouldn't be a problem even
when replaying journals that still have a '0' tid.

Thanks, Andreas.  And thanks Wang, for spotting this.

Cheers,
-- 
Luís

next prev parent reply	other threads:[~2024-07-12  9:15 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-11  8:35 [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit Luis Henriques (SUSE)
2024-07-11 13:32 ` wangjianjian (C)
2024-07-11 15:16   ` Luis Henriques
2024-07-11 16:16     ` Wang Jianjian
2024-07-11 19:28       ` Andreas Dilger
2024-07-12  0:51         ` wangjianjian (C)
2024-07-12  9:15         ` Luis Henriques [this message]
2024-07-12  9:53           ` [RFC PATCH] jbd2: make '0' an invalid transaction sequence Luis Henriques
2024-07-12 10:04             ` wangjianjian (C)
2024-07-12 10:28               ` wangjianjian (C)
2024-07-16  9:52             ` Jan Kara
2024-07-16 13:11               ` Luis Henriques
2024-07-16 10:24 ` [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit Jan Kara
2024-07-16 14:13   ` Luis Henriques

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ed7znf8n.fsf@linux.dev \
    --to=luis.henriques@linux.dev \
    --cc=adilger@dilger.ca \
    --cc=harshadshirwadkar@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=wangjianjian0@foxmail.com \
    --cc=wangjianjian3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.