* [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
@ 2023-09-19 1:25 Zhihao Cheng
2023-09-19 10:14 ` Jan Kara
2023-10-06 18:06 ` Theodore Ts'o
0 siblings, 2 replies; 3+ messages in thread
From: Zhihao Cheng @ 2023-09-19 1:25 UTC (permalink / raw)
To: tytso, jack; +Cc: linux-ext4, linux-kernel, chengzhihao1, yi.zhang
JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
however, other process could intercept the EIO information from bdev's
mapping, which leads journal recovering successful even EIO occurs during
data written back to fs device.
We found this problem in our product, iscsi + multipath is chosen for block
device of ext4. Unstable network may trigger kpartx to rescan partitions in
device mapper layer. Detailed process is shown as following:
mount kpartx irq
jbd2_journal_recover
do_one_pass
memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
mark_buffer_dirty // mark bh dirty
vfs_read
generic_file_read_iter // dio
filemap_write_and_wait_range
__filemap_fdatawrite_range
do_writepages
block_write_full_folio
submit_bh_wbc
>> EIO occurs in disk <<
end_buffer_async_write
mark_buffer_write_io_error
mapping_set_error
set_bit(AS_EIO, &mapping->flags) // set!
filemap_check_errors
test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
err2 = sync_blockdev
filemap_write_and_wait
filemap_check_errors
test_and_clear_bit(AS_EIO, &mapping->flags) // false
err2 = 0
Filesystem is mounted successfully even data from journal is failed written
into disk, and ext4/ocfs2 could become corrupted.
Fix it by comparing the wb_err state in fs block device before recovering
and after recovering.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v3: Initialize wb_err. Untialized wb_err could be same with
mapping->wb_err(eg. EIO without ERRSEQ_SEEN). When EIO
occurs again, mapping->wb_err won't be changed, and wb_err
is still same with mapping->wb_err.
fs/jbd2/recovery.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index c269a7d29a46..5b771a3d8d9a 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -289,6 +289,8 @@ int jbd2_journal_recover(journal_t *journal)
journal_superblock_t * sb;
struct recovery_info info;
+ errseq_t wb_err;
+ struct address_space *mapping;
memset(&info, 0, sizeof(info));
sb = journal->j_superblock;
@@ -306,6 +308,9 @@ int jbd2_journal_recover(journal_t *journal)
return 0;
}
+ wb_err = 0;
+ mapping = journal->j_fs_dev->bd_inode->i_mapping;
+ errseq_check_and_advance(&mapping->wb_err, &wb_err);
err = do_one_pass(journal, &info, PASS_SCAN);
if (!err)
err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -327,6 +332,9 @@ int jbd2_journal_recover(journal_t *journal)
jbd2_journal_clear_revoke(journal);
err2 = sync_blockdev(journal->j_fs_dev);
+ if (!err)
+ err = err2;
+ err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
if (!err)
err = err2;
/* Make sure all replayed data is on permanent storage */
--
2.39.2
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
2023-09-19 1:25 [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev Zhihao Cheng
@ 2023-09-19 10:14 ` Jan Kara
2023-10-06 18:06 ` Theodore Ts'o
1 sibling, 0 replies; 3+ messages in thread
From: Jan Kara @ 2023-09-19 10:14 UTC (permalink / raw)
To: Zhihao Cheng; +Cc: tytso, jack, linux-ext4, linux-kernel, yi.zhang
On Tue 19-09-23 09:25:25, Zhihao Cheng wrote:
> JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
> however, other process could intercept the EIO information from bdev's
> mapping, which leads journal recovering successful even EIO occurs during
> data written back to fs device.
>
> We found this problem in our product, iscsi + multipath is chosen for block
> device of ext4. Unstable network may trigger kpartx to rescan partitions in
> device mapper layer. Detailed process is shown as following:
>
> mount kpartx irq
> jbd2_journal_recover
> do_one_pass
> memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
> mark_buffer_dirty // mark bh dirty
> vfs_read
> generic_file_read_iter // dio
> filemap_write_and_wait_range
> __filemap_fdatawrite_range
> do_writepages
> block_write_full_folio
> submit_bh_wbc
> >> EIO occurs in disk <<
> end_buffer_async_write
> mark_buffer_write_io_error
> mapping_set_error
> set_bit(AS_EIO, &mapping->flags) // set!
> filemap_check_errors
> test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
> err2 = sync_blockdev
> filemap_write_and_wait
> filemap_check_errors
> test_and_clear_bit(AS_EIO, &mapping->flags) // false
> err2 = 0
>
> Filesystem is mounted successfully even data from journal is failed written
> into disk, and ext4/ocfs2 could become corrupted.
>
> Fix it by comparing the wb_err state in fs block device before recovering
> and after recovering.
>
> Fetch a reproducer in [Link].
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
> Cc: stable@vger.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> v1->v3: Initialize wb_err. Untialized wb_err could be same with
> mapping->wb_err(eg. EIO without ERRSEQ_SEEN). When EIO
> occurs again, mapping->wb_err won't be changed, and wb_err
> is still same with mapping->wb_err.
Yeah, good catch. The patch still looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> fs/jbd2/recovery.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index c269a7d29a46..5b771a3d8d9a 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -289,6 +289,8 @@ int jbd2_journal_recover(journal_t *journal)
> journal_superblock_t * sb;
>
> struct recovery_info info;
> + errseq_t wb_err;
> + struct address_space *mapping;
>
> memset(&info, 0, sizeof(info));
> sb = journal->j_superblock;
> @@ -306,6 +308,9 @@ int jbd2_journal_recover(journal_t *journal)
> return 0;
> }
>
> + wb_err = 0;
> + mapping = journal->j_fs_dev->bd_inode->i_mapping;
> + errseq_check_and_advance(&mapping->wb_err, &wb_err);
> err = do_one_pass(journal, &info, PASS_SCAN);
> if (!err)
> err = do_one_pass(journal, &info, PASS_REVOKE);
> @@ -327,6 +332,9 @@ int jbd2_journal_recover(journal_t *journal)
>
> jbd2_journal_clear_revoke(journal);
> err2 = sync_blockdev(journal->j_fs_dev);
> + if (!err)
> + err = err2;
> + err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
> if (!err)
> err = err2;
> /* Make sure all replayed data is on permanent storage */
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
2023-09-19 1:25 [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev Zhihao Cheng
2023-09-19 10:14 ` Jan Kara
@ 2023-10-06 18:06 ` Theodore Ts'o
1 sibling, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2023-10-06 18:06 UTC (permalink / raw)
To: jack, Zhihao Cheng; +Cc: Theodore Ts'o, linux-ext4, linux-kernel, yi.zhang
On Tue, 19 Sep 2023 09:25:25 +0800, Zhihao Cheng wrote:
> JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
> however, other process could intercept the EIO information from bdev's
> mapping, which leads journal recovering successful even EIO occurs during
> data written back to fs device.
>
> We found this problem in our product, iscsi + multipath is chosen for block
> device of ext4. Unstable network may trigger kpartx to rescan partitions in
> device mapper layer. Detailed process is shown as following:
>
> [...]
Applied, thanks!
[1/1] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev
commit: 61187fce8600e8ef90e601be84f9d0f3222c1206
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-10-06 18:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-19 1:25 [PATCH v3] jbd2: Fix potential data lost in recovering journal raced with synchronizing fs bdev Zhihao Cheng
2023-09-19 10:14 ` Jan Kara
2023-10-06 18:06 ` Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).