Linux EXT4 FS development
From: Zhihao Cheng <chengzhihao1@huawei.com>
To: Jan Kara <jack@suse.cz>, Zhang Yi <yi.zhang@huawei.com>
Cc: <tytso@mit.edu>, <jack@suse.com>, <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH v2] ext4: Fix potential data lost in recovering journal raced with synchronizing fs bdev
Date: Tue, 12 Sep 2023 11:52:37 +0800
Message-ID: <e74a7e44-9055-31a7-0124-812a6ae42a41@huawei.com>
In-Reply-To: <20230911161825.4ny4ynxyxabwqbee@quack3>

On 2023/9/12 0:18, Jan Kara wrote:
> Hello!
> 
> On Sat 09-09-23 11:41:11, Zhang Yi wrote:
>> On 2023/9/8 20:43, Zhihao Cheng wrote:
>>> JBD2 ensures that journal data has reached the fs device by calling
>>> sync_blockdev(). However, another process could consume the EIO information
>>> from the bdev's mapping first, so journal recovery appears successful even
>>> though EIO occurred while the data was written back to the fs device.
>>>
>>> We found this problem in our product, which uses iSCSI + multipath as the
>>> block device for ext4. An unstable network may trigger kpartx to rescan
>>> partitions in the device mapper layer. The detailed sequence is as follows:
>>>
>>>    mount          kpartx          irq
>>> jbd2_journal_recover
>>>   do_one_pass
>>>    memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
>>>    mark_buffer_dirty // mark bh dirty
>>>           vfs_read
>>> 	  generic_file_read_iter // dio
>>> 	   filemap_write_and_wait_range
>>> 	    __filemap_fdatawrite_range
>>> 	     do_writepages
>>> 	      block_write_full_folio
>>> 	       submit_bh_wbc
>>> 	            >>  EIO occurs in disk  <<
>>> 	                     end_buffer_async_write
>>> 			      mark_buffer_write_io_error
>>> 			       mapping_set_error
>>> 			        set_bit(AS_EIO, &mapping->flags) // set!
>>> 	    filemap_check_errors
>>> 	     test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
>>>   err2 = sync_blockdev
>>>    filemap_write_and_wait
>>>     filemap_check_errors
>>>      test_and_clear_bit(AS_EIO, &mapping->flags) // false
>>>   err2 = 0
>>>
>>> The filesystem is mounted successfully even though the data from the journal
>>> failed to be written to disk, and ext4 could become corrupted.
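
The interleaving in the trace above can be modeled in user space. The sketch
below is only an illustration under stated assumptions: struct mapping_model
and the model_* functions are hypothetical stand-ins for the AS_EIO bit,
mapping_set_error() and filemap_check_errors(), not the kernel code itself.
It shows why a test-and-clear flag lets the first checker hide the error from
the second one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AS_EIO bit in mapping->flags. */
struct mapping_model {
	bool as_eio;
};

/* Writeback completion records the failure (role of mapping_set_error). */
static void model_set_error(struct mapping_model *m)
{
	m->as_eio = true;
}

/*
 * Test-and-clear check (role of filemap_check_errors): the first caller
 * that observes the flag also clears it.
 */
static int model_check_errors(struct mapping_model *m)
{
	if (m->as_eio) {
		m->as_eio = false;
		return -EIO;
	}
	return 0;
}

/*
 * Replays the order of events from the trace. Stores what the unrelated
 * reader saw in *reader_saw and returns what journal recovery sees.
 */
static int model_race(int *reader_saw)
{
	struct mapping_model m = { .as_eio = false };

	model_set_error(&m);                  /* write of a replayed block fails */
	*reader_saw = model_check_errors(&m); /* direct-I/O reader checks first */
	return model_check_errors(&m);        /* recovery checks second */
}
```

Calling model_race() from a small main() reports -EIO to the reader and 0 to
recovery, which matches the "err2 = 0" outcome in the trace.
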
>>>
>>> Fix it by comparing 'sbi->s_bdev_wb_err' before and after loading the
>>> journal.
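
For contrast with a test-and-clear flag, here is a minimal user-space sketch
of the "sample before, compare after" idea that the fix relies on. It is a
deliberately simplified model, not the kernel's errseq_t implementation:
struct wb_err_model and the wb_* helpers are hypothetical names, and the real
errseq_t packs the error and counter differently. The point is only that each
observer keeps its own cursor, so one observer checking does not hide the
error from another.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical simplified model: last error plus a sequence counter. */
struct wb_err_model {
	int err;          /* last recorded error, 0 if none */
	unsigned int seq; /* bumped each time an error is recorded */
};

/* Take a cursor; the caller stores it (role of sbi->s_bdev_wb_err). */
static unsigned int wb_sample(const struct wb_err_model *w)
{
	return w->seq;
}

/* Record a writeback error. */
static void wb_set(struct wb_err_model *w, int err)
{
	w->err = err;
	w->seq++;
}

/*
 * Report an error if one was recorded since the caller's cursor, then
 * advance only that caller's cursor. The shared state is left untouched,
 * so other observers still see the error.
 */
static int wb_check_and_advance(const struct wb_err_model *w,
				unsigned int *since)
{
	if (*since == w->seq)
		return 0;
	*since = w->seq;
	return w->err;
}
```

With two cursors sampled before wb_set(&w, -EIO), both observers get -EIO
from their first wb_check_and_advance() call, and each gets 0 on a repeated
call with the same cursor.
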
>>>
>>> A reproducer is available at [Link].
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>> ---
>>>   v1->v2: Checks wb_err from block device only in ext4.
>>>   fs/ext4/super.c | 22 +++++++++++++++-------
>>>   1 file changed, 15 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index 38217422f938..4dcaad2403be 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -4907,6 +4907,14 @@ static int ext4_load_and_init_journal(struct super_block *sb,
>>>   	if (err)
>>>   		return err;
>>>   
>>> +	err = errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
>>> +				       &sbi->s_bdev_wb_err);
>>> +	if (err) {
>>> +		ext4_msg(sb, KERN_ERR, "Background error %d when loading journal",
>>> +			 err);
>>> +		goto out;
>>> +	}
>>> +
>>
>> This solution cannot solve the problem, because the journal tail is
>> still updated in journal_reset() even if we detect the writeback error
>> and refuse to mount the ext4 filesystem here. So I suppose we have to
>> check the I/O error by jbd2 module itself like v1 does.
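
The ordering problem described above can be illustrated with a toy model. This
is a sketch of my reading of the objection, not jbd2 code: struct
journal_model, model_replay() and the two mount_* functions are hypothetical,
and "has_unreplayed" merely stands in for the journal tail state that
journal_reset() updates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state that matters for the argument. */
struct journal_model {
	bool has_unreplayed; /* journal still holds transactions to replay */
	bool fs_up_to_date;  /* replayed blocks actually reached the fs device */
};

/* One replay attempt; write_fails simulates EIO during writeback. */
static int model_replay(struct journal_model *j, bool write_fails)
{
	if (!j->has_unreplayed)
		return 0;        /* nothing to replay */
	if (write_fails)
		return -EIO;     /* replayed data never reached the device */
	j->fs_up_to_date = true;
	return 0;
}

/* Error is acted on only after the tail was already updated. */
static int mount_check_after_reset(struct journal_model *j, bool write_fails)
{
	int err = model_replay(j, write_fails);

	j->has_unreplayed = false; /* tail updated regardless of the error */
	return err;                /* mount refused, but too late */
}

/* Error is acted on before the tail is updated. */
static int mount_check_before_reset(struct journal_model *j, bool write_fails)
{
	int err = model_replay(j, write_fails);

	if (err)
		return err;        /* journal left intact for the next mount */
	j->has_unreplayed = false;
	return 0;
}
```

In this model, a failed first mount followed by a clean second mount leaves
fs_up_to_date false with mount_check_after_reset(), because the second mount
has nothing left to replay, while mount_check_before_reset() replays again
and ends with fs_up_to_date true.
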
> 
> Hum, that's a good point because next time we will try to mount the fs we
> will not try to replay the journal anymore. So let's return to v1 and I'm
> sorry for misguiding you Zhihao.
> 
> But when we are doing background IO error detection in jbd2 during journal
> replay, I'm wondering whether we shouldn't be doing something similar in
> checkpointing code - like when we are about to remove a transaction from
> the journal. And as I'm checking we already do that using
> JBD2_CHECKPOINT_IO_ERROR bit handling - maybe we could replace that with a
> more standard errseq mechanism that is available these days as a cleanup?
> 
> And the ext4 handling in ext4_check_bdev_write_error() is useful only in
> nojournal mode as otherwise jbd2 is taking care of all writeback errors
> including the background ones. So maybe we can guard the
> ext4_check_bdev_write_error() by a !ext4_handle_valid(handle) check to make
> that obvious (and comment about that).
> 
> What do you think?
> 
> 								Honza
> 

OK, I will try it. Thanks for the suggestions.


Thread overview: 4+ messages
2023-09-08 12:43 [PATCH v2] ext4: Fix potential data lost in recovering journal raced with synchronizing fs bdev Zhihao Cheng
2023-09-09  3:41 ` Zhang Yi
2023-09-11 16:18   ` Jan Kara
2023-09-12  3:52     ` Zhihao Cheng [this message]