From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Zhihao Cheng <chengzhihao1@huawei.com>,
linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca,
jack@suse.cz, yi.zhang@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH v3 4/6] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
Date: Thu, 15 Jun 2023 11:56:13 +0800 [thread overview]
Message-ID: <4e82b97c-137f-52d0-da11-555184fe01f5@huaweicloud.com> (raw)
In-Reply-To: <20230614203740.GE51259@mit.edu>
On 2023/6/15 4:37, Theodore Ts'o wrote:
> On Wed, Jun 14, 2023 at 09:25:28PM +0800, Zhang Yi wrote:
>>
>> Sorry about the regression, I found that this issue is not introduced
>> by the first patch in this patch series ("jbd2: recheck chechpointing
>> non-dirty buffer"), is d9eafe0afafa ("jbd2: factor out journal
>> initialization from journal_get_superblock()") [1] on your dev branch.
>>
>> The problem is the journal super block had been failed to write out
>> due to IO fault, it's uptodate bit was cleared by
>> end_buffer_write_syn() and didn't reset yet in jbd2_write_superblock().
>> And it raced by jbd2_journal_revoke()->jbd2_journal_set_features()->
>> jbd2_journal_check_used_features()->journal_get_superblock()->bh_read(),
>> unfortunately, the read IO is also fail, so the error handling in
>> journal_fail_superblock() clear the journal->j_sb_buffer, finally lead
>> to above NULL pointer dereference issue.
>
> Thanks for looking into this. What I believe you are saying is that
> the root cause is that earlier patch, but it is still something about
> the patch "jbd2: recheck chechpointing non-dirty buffer" which is
> changing the timing enough that we're hitting this buffer (because
> without the "recheck checkpointing" patch, I'm not seeing the NULL
> pointer dereference.
I have send out a separate patch names "jbd2: skip reading super block
if it has been verified" to fix above NULL pointer dereference issue, I
have been runing ext3 generic/475 about 12hours and have not reproduced
the problem again (I will also do more tests later). Please take a look
at it.
>
> As far as the e2fsck bug that was causing it to hang in the ext4/adv
> test scenario, the patch was a simple one, and I have also checked in
> a test case which was a reliable reproducer of the problem. (See
> attached for the patches and more detail.)
>
> It is really interesting that "recheck checkpointing" patch is making
> enough of a difference that it is unmasking these bugs. If you could
> take a look at these changes and perhaps think about how this patch
> series could be changing the nature of the corruption (specifically,
> how symlink inodes referenced from inline directories could be
> corupted with "rechecking checkpointing", thus unmasking the
> e2fsprogs, I'd really appreciate it.
>
Sure, we will take a look at it for details.
Thanks,
Yi.
next prev parent reply other threads:[~2023-06-15 3:56 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-06 13:59 [PATCH v3 0/6] jbd2: fix several checkpoint inconsistent issues Zhang Yi
2023-06-06 13:59 ` [PATCH v3 1/6] jbd2: recheck chechpointing non-dirty buffer Zhang Yi
2023-06-06 13:59 ` [PATCH v3 2/6] jbd2: remove t_checkpoint_io_list Zhang Yi
2023-06-06 13:59 ` [PATCH v3 3/6] jbd2: remove journal_clean_one_cp_list() Zhang Yi
2023-06-07 8:30 ` Jan Kara
2023-06-06 13:59 ` [PATCH v3 4/6] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint Zhang Yi
2023-06-13 4:31 ` Theodore Ts'o
2023-06-13 8:13 ` Zhihao Cheng
2023-06-13 17:27 ` Theodore Ts'o
2023-06-14 5:42 ` Theodore Ts'o
2023-06-14 13:25 ` Zhang Yi
2023-06-14 20:37 ` Theodore Ts'o
2023-06-15 3:56 ` Zhang Yi [this message]
2023-06-26 7:36 ` Zhang Yi
2023-06-06 13:59 ` [PATCH v3 5/6] jbd2: fix a race when checking checkpoint buffer busy Zhang Yi
2023-06-06 13:59 ` [PATCH v3 6/6] jbd2: remove __journal_try_to_free_buffer() Zhang Yi
2023-07-12 18:29 ` [PATCH v3 0/6] jbd2: fix several checkpoint inconsistent issues Theodore Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4e82b97c-137f-52d0-da11-555184fe01f5@huaweicloud.com \
--to=yi.zhang@huaweicloud.com \
--cc=adilger.kernel@dilger.ca \
--cc=chengzhihao1@huawei.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).