From: Jan Kara <jack@suse.cz>
To: "zhangyi (F)" <yi.zhang@huawei.com>
Cc: Jan Kara <jack@suse.cz>,
linux-ext4@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, miaoxie@huawei.com
Subject: Re: [PATCH] jbd2: set freed flag while revoking a buffer which belongs to older transaction
Date: Fri, 11 Jan 2019 11:30:29 +0100
Message-ID: <20190111103029.GA4098@quack2.suse.cz>
In-Reply-To: <5b2cb7b3-1eff-21d2-cf12-ee844f54eda0@huawei.com>

On Fri 11-01-19 14:11:31, zhangyi (F) wrote:
> On 2019/1/10 19:20, Jan Kara Wrote:
> > On Thu 10-01-19 14:12:02, zhangyi (F) wrote:
> >> We have hit a data corruption problem on ext4 while truncating an
> >> extent index block. Imagine that we are revoking a buffer which has
> >> been journaled by the committing transaction: the buffer's jbddirty
> >> flag will not be cleared in jbd2_journal_forget(), so the commit code
> >> will set the buffer dirty flag again after refiling the buffer.
> >>
> >> fsx                               kjournald2
> >>                                   jbd2_journal_commit_transaction
> >> jbd2_journal_revoke                commit phase 1~5...
> >>  jbd2_journal_forget
> >>    belongs to older transaction    commit phase 6
> >>    jbddirty not clear               __jbd2_journal_refile_buffer
> >>                                      __jbd2_journal_unfile_buffer
> >>                                       test_clear_buffer_jbddirty
> >>                                        mark_buffer_dirty
> >>
> >> Finally, if the freed extent index block is allocated again as a data
> >> block by some other file, it may corrupt the file data when cached
> >> pages are written out later, such as at umount time.
> >>
> >> This patch marks the buffer as freed when it already belongs to the
> >> committing transaction in jbd2_journal_forget(), so that the commit
> >> code knows it should clear dirty bits when it is done with the buffer.
> >>
> >> This problem can be reproduced by xfstests generic/455 easily with
> >> seeds (3246 3247 3248 3249).
> >>
> >> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> >> Cc: stable@vger.kernel.org
> >
> > Thanks a lot for the analysis and the patch! I fully agree with your
> > analysis; however, I think just setting the buffer as freed isn't
> > completely correct. The problem is the following: The metadata buffer X
> > has been modified by the committing transaction - let's call it A. It
> > has been freed in the currently running transaction B. Now
> > jbd2_journal_forget() clears b_next_transaction and if you set the
> > buffer freed flag, X will not be added to the checkpoint list. So when
> > transaction A finishes commit, it can get checkpointed (without writing
> > out X) before transaction B commits. So if a crash occurs before B
> > commits, we'd lose the modification of X from transaction A and thus
> > cause filesystem corruption.
> >
> Thanks for your explanation! There are still two points I don't quite
> understand.
>
> I checked all three cases of doing a checkpoint. IIUC, both
> jbd2_journal_destroy() and jbd2_journal_flush() wait for the currently
> running transaction B to complete before doing the checkpoint, unlike
> __jbd2_log_wait_for_space(). So I guess this is the case you mentioned,
> where transaction A could be checkpointed before B commits, am I right?

Yes, __jbd2_log_wait_for_space() can checkpoint already committed
transactions (i.e., A in our case) without waiting for the running
transaction (B in our case).

> For another case, jbd2_update_log_tail() will be invoked after transaction B
> completes, so the problem above also can't happen here, right?

I'm not sure which "another case" you are speaking about here...

> > What rather needs to happen is the same thing that is done in
> > journal_unmap_buffer() in this case: We set buffer freed flag and we also
> > set b_next_transaction to the currently running transaction (B). This will
> > prevent A from being checkpointed before B commits and thus avoids the
> > problem above.
> >
> Sorry, I don't get this point. I find that the only difference between
> setting b_next_transaction or not is whether buffer X is re-added to the
> BJ_Reserved list. How does that avoid the problem above?

Currently, X will be removed from transaction B by jbd2_journal_revoke().
So once A commits, it will not be in the running transaction and thus
checkpoint of A can complete before B is committed.

If we set X->b_next_transaction to B, X will be part of transaction B. The
handling of a buffer_freed() buffer in the commit code thus will not clear
the jbddirty bit and X will get inserted in A as a buffer for checkpointing.
And thus checkpoint of A will not be able to complete before B commits,
fixing the problem I have described.
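
The three outcomes discussed in this thread (no fix, freed flag only, freed
flag plus b_next_transaction) can be compared in a small userspace toy model.
This is only a sketch and not kernel code: struct toy_txn, struct toy_buf,
toy_commit_buffer(), toy_can_checkpoint() and toy_run() are all hypothetical
names, and the logic is a simplified reading of the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's transaction_t / struct journal_head. */
struct toy_txn { int tid; };

struct toy_buf {
    struct toy_txn *transaction;      /* transaction that owns the buffer now */
    struct toy_txn *next_transaction; /* transaction taking over after commit */
    struct toy_txn *cp_transaction;   /* transaction whose checkpoint list holds it */
    bool freed;                       /* models buffer_freed() */
    bool jbddirty;                    /* models buffer_jbddirty() */
    bool dirty;                       /* models buffer_dirty() */
};

/* Simplified end-of-commit handling of one buffer of the committing txn. */
static void toy_commit_buffer(struct toy_buf *b, struct toy_txn *committing)
{
    assert(b->transaction == committing);

    /* Freed and not claimed by a later transaction: nothing left to write. */
    if (b->freed && b->next_transaction == NULL) {
        b->freed = false;
        b->jbddirty = false;
    }

    /* Still journal-dirty: keep it for checkpointing of this transaction. */
    if (b->jbddirty)
        b->cp_transaction = committing;

    /* Refile: hand over to the next transaction, or release the buffer. */
    if (b->next_transaction != NULL) {
        b->transaction = b->next_transaction;
        b->next_transaction = NULL;
    } else {
        b->transaction = NULL;
        if (b->jbddirty) {
            b->jbddirty = false;
            b->dirty = true; /* now visible to ordinary writeback */
        }
    }
}

/* Can checkpointing of t finish without waiting for another commit? */
static bool toy_can_checkpoint(const struct toy_buf *b, const struct toy_txn *t)
{
    if (b->cp_transaction != t)
        return true;                /* nothing of this buffer to write for t */
    return b->transaction == NULL;  /* writable only if no live txn owns it */
}

enum toy_fix { TOY_NO_FIX, TOY_FREED_ONLY, TOY_FREED_AND_NEXT };

/* X was journaled by A and forgotten in B; commit A and return X's state. */
static struct toy_buf toy_run(enum toy_fix fix, struct toy_txn *a,
                              struct toy_txn *b)
{
    struct toy_buf x = { .transaction = a, .jbddirty = true };

    if (fix != TOY_NO_FIX)
        x.freed = true;
    if (fix == TOY_FREED_AND_NEXT)
        x.next_transaction = b;
    toy_commit_buffer(&x, a);
    return x;
}
```

In this model TOY_NO_FIX leaves X with the dirty bit set (the stale block may
be written back later), TOY_FREED_ONLY lets A be checkpointed while X is on no
checkpoint list, and TOY_FREED_AND_NEXT keeps X on A's list while B owns it,
so toy_can_checkpoint() stays false for A until B is done.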
> BTW, I am thinking of a similar case. If we modify buffer X instead of
> revoking it in transaction B, we also need to prevent transaction A from
> being checkpointed before B commits, because the current buffer X contains
> the modified data (modified by B). So we should prevent writing it before
> B commits, otherwise it will corrupt metadata. How do we handle this
> situation now?

Buffers that are part of the running transaction never have the buffer_dirty
bit set (look at how jbd2_journal_file_buffer() clears this bit). Thus
background writeback will not write these buffers. Also, the checkpointing
code checks whether the buffer is part of the running / committing
transaction and handles these buffers specially, exactly because they cannot
be written out directly.
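
As a rough illustration of the two protections just described, here is a
userspace toy model. It is a sketch under stated assumptions, not the jbd2
implementation: struct toy_buffer, toy_file_buffer(),
toy_writeback_would_write() and toy_checkpoint_one() are hypothetical names,
and owner_tid == 0 is an invented convention for "not in any transaction".

```c
#include <stdbool.h>

/* Toy stand-in for a buffer head plus its journal state (hypothetical). */
struct toy_buffer {
    bool dirty;    /* models buffer_dirty(): seen by background writeback */
    bool jbddirty; /* models buffer_jbddirty(): dirty, but journal-owned */
    int owner_tid; /* 0 = not part of a running / committing transaction */
};

/* Filing into a transaction moves the dirty state under journal control. */
static void toy_file_buffer(struct toy_buffer *b, int tid)
{
    if (b->dirty) {
        b->dirty = false;
        b->jbddirty = true;
    }
    b->owner_tid = tid;
}

/* Background writeback only looks at the ordinary dirty bit. */
static bool toy_writeback_would_write(const struct toy_buffer *b)
{
    return b->dirty;
}

enum toy_cp_action { TOY_CP_WRITE, TOY_CP_SKIP_CLEAN, TOY_CP_WAIT_FOR_COMMIT };

/* Checkpointing must not write a buffer that a live transaction still owns. */
static enum toy_cp_action toy_checkpoint_one(const struct toy_buffer *b)
{
    if (b->owner_tid != 0)
        return TOY_CP_WAIT_FOR_COMMIT;
    return b->dirty ? TOY_CP_WRITE : TOY_CP_SKIP_CLEAN;
}
```

With X dirty and then filed into B via toy_file_buffer(&x, 2), writeback skips
it and toy_checkpoint_one() asks for B's commit first instead of writing X.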
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR