Re: [PATCH] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: tytso@mit.edu, jack@suse.com, linux-ext4@vger.kernel.org,
	yi.zhang@huawei.com
Subject: Re: [PATCH] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
Date: Fri, 8 Sep 2023 13:14:55 +0200	[thread overview]
Message-ID: <20230908111455.koi76sueeved5jpm@quack3> (raw)
In-Reply-To: <20230908092808.2929317-1-chengzhihao1@huawei.com>

On Fri 08-09-23 17:28:08, Zhihao Cheng wrote:
> JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
> however, other process could intercept the EIO information from bdev's
> mapping, which leads journal recovering successful even EIO occurs during
> data written back to fs device.
> 
> We found this problem in our product, iscsi + multipath is chosen for block
> device of ext4. Unstable network may trigger kpartx to rescan partitions in
> device mapper layer. Detailed process is shown as following:
> 
>   mount          kpartx          irq
> jbd2_journal_recover
>  do_one_pass
>   memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
>   mark_buffer_dirty // mark bh dirty
>          vfs_read
> 	  generic_file_read_iter // dio
> 	   filemap_write_and_wait_range
> 	    __filemap_fdatawrite_range
> 	     do_writepages
> 	      block_write_full_folio
> 	       submit_bh_wbc
> 	            >>  EIO occurs in disk  <<
> 	                     end_buffer_async_write
> 			      mark_buffer_write_io_error
> 			       mapping_set_error
> 			        set_bit(AS_EIO, &mapping->flags) // set!
> 	    filemap_check_errors
> 	     test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
>  err2 = sync_blockdev
>   filemap_write_and_wait
>    filemap_check_errors
>     test_and_clear_bit(AS_EIO, &mapping->flags) // false
>  err2 = 0
> 
> Filesystem is mounted successfully even data from journal is failed written
> into disk, and ext4/ocfs2 could become corrupted.
> 
> Fix it by comparing the wb_err state in fs block device before recovering
> and after recovering.
> 
> Fetch a reproducer in [Link].
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for the patch! It makes sense but it is somewhat inconsistent with
how we deal with other checks for metadata IO errors in ext4. We do those
checks in ext4 through ext4_check_bdev_write_error(). So I wonder if in
this case we shouldn't move the errseq_check_and_advance() in
__ext4_fill_super() earlier (before journal setup) and then use it in
ext4_load_and_init_journal() to detect errors during background metadata
writeback. What do you think?

								Honza

> ---
>  fs/jbd2/recovery.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index c269a7d29a46..0fecaa6a3ac6 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -289,6 +289,8 @@ int jbd2_journal_recover(journal_t *journal)
>  	journal_superblock_t *	sb;
>  
>  	struct recovery_info	info;
> +	errseq_t		wb_err;
> +	struct address_space	*mapping;
>  
>  	memset(&info, 0, sizeof(info));
>  	sb = journal->j_superblock;
> @@ -306,6 +308,8 @@ int jbd2_journal_recover(journal_t *journal)
>  		return 0;
>  	}
>  
> +	mapping = journal->j_fs_dev->bd_inode->i_mapping;
> +	errseq_check_and_advance(&mapping->wb_err, &wb_err);
>  	err = do_one_pass(journal, &info, PASS_SCAN);
>  	if (!err)
>  		err = do_one_pass(journal, &info, PASS_REVOKE);
> @@ -327,6 +331,9 @@ int jbd2_journal_recover(journal_t *journal)
>  
>  	jbd2_journal_clear_revoke(journal);
>  	err2 = sync_blockdev(journal->j_fs_dev);
> +	if (!err)
> +		err = err2;
> +	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
>  	if (!err)
>  		err = err2;
>  	/* Make sure all replayed data is on permanent storage */
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2023-09-08 11:15 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-08  9:28 [PATCH] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev Zhihao Cheng
2023-09-08 11:14 ` Jan Kara [this message]
2023-09-08 12:02   ` Zhihao Cheng
2023-09-08 12:26     ` Jan Kara
2023-09-11 15:58 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230908111455.koi76sueeved5jpm@quack3 \
    --to=jack@suse.cz \
    --cc=chengzhihao1@huawei.com \
    --cc=jack@suse.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox