* [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer
@ 2024-08-23 1:33 zhangshida
2024-08-23 1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
2024-08-23 1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
0 siblings, 2 replies; 8+ messages in thread
From: zhangshida @ 2024-08-23 1:33 UTC (permalink / raw)
To: tytso, adilger.kernel, jack
Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd
From: Shida Zhang <zhangshida@kylinos.cn>
Hi all,
On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
---------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
---------
The same condition may also be applied to the lattest kernel version.
This patch set fixes it by:
1.Replace the __block_write_begin with the hoisted
ext4_block_write_begin().(patch 1)
2.Trace the user data dirtying in ext4_block_write_begin().(patch 2)
Changes since v3:
- Ditch the patch 3 in v3, because some other code paths can set the
buffer dirty:
ext4_write_begin
ext4_block_write_begin
create_empty_buffers
set_buffer_dirty
[3] Version 3:
https://lore.kernel.org/linux-ext4/20240810082814.3709867-1-zhangshida@kylinos.cn/
Changes since v2:
- Adjust the applied order of patch 1 and patch 2 in v1.
- Reword the commit message.
- Remove some superfluous logic in patch 2 and patch 3.
[2] Version 2:
https://lore.kernel.org/linux-ext4/20240809064606.3490994-2-zhangshida@kylinos.cn/
Changes since v1:
- v1 use a hack into jbd2 to fix the bug while v2 choose to journal
the dirty data in *_block_write_begin.
[1] Version 1:
https://lore.kernel.org/linux-ext4/20240720062356.2522658-1-zhangshida@kylinos.cn/
Shida Zhang (2):
ext4: hoist ext4_block_write_begin and replace the __block_write_begin
ext4: fix a potential assertion failure due to improperly dirtied
buffer
fs/ext4/ext4.h | 3 +++
fs/ext4/inline.c | 11 ++++++-----
fs/ext4/inode.c | 51 +++++++++++++++++++++++++-----------------------
3 files changed, 36 insertions(+), 29 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-23 1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida @ 2024-08-23 1:33 ` zhangshida 2024-08-28 11:04 ` Jan Kara 2024-08-29 0:03 ` Eric Biggers 2024-08-23 1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida 1 sibling, 2 replies; 8+ messages in thread From: zhangshida @ 2024-08-23 1:33 UTC (permalink / raw) To: tytso, adilger.kernel, jack Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara From: Shida Zhang <zhangshida@kylinos.cn> Using __block_write_begin() make it inconvenient to journal the user data dirty process. We can't tell the block layer maintainer, ‘Hey, we want to trace the dirty user data in ext4, can we add some special code for ext4 in __block_write_begin?’:P So use ext4_block_write_begin() instead. The two functions are basically doing the same thing except for the fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as to allow ext4_block_write_begin() to function like __block_write_begin when the config is disabled. And hoist the ext4_block_write_begin so that it can be used in other files. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- fs/ext4/ext4.h | 2 ++ fs/ext4/inline.c | 10 +++++----- fs/ext4/inode.c | 23 ++++++----------------- 3 files changed, 13 insertions(+), 22 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 08acd152261e..5f8257b68190 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) return buffer_uptodate(bh); } +extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, + get_block_t *get_block); #endif /* __KERNEL__ */ #define EFSBADCRC EBADMSG /* Bad CRC detected */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index e7a09a99837b..0a1a8431e281 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, goto out; if (ext4_should_dioread_nolock(inode)) { - ret = __block_write_begin(&folio->page, from, to, - ext4_get_block_unwritten); + ret = ext4_block_write_begin(folio, from, to, + ext4_get_block_unwritten); } else - ret = __block_write_begin(&folio->page, from, to, ext4_get_block); + ret = ext4_block_write_begin(folio, from, to, ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, @@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, goto out; } - ret = __block_write_begin(&folio->page, 0, inline_size, - ext4_da_get_block_prep); + ret = ext4_block_write_begin(folio, 0, inline_size, + ext4_da_get_block_prep); if (ret) { up_read(&EXT4_I(inode)->xattr_sem); folio_unlock(folio); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 941c1c0d5c6e..6b15805ca88b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, return ret; } -#ifdef CONFIG_FS_ENCRYPTION -static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, - get_block_t *get_block) +int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, + get_block_t *get_block) { unsigned from = pos & (PAGE_SIZE - 1); unsigned to = from + len; @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, } if (unlikely(err)) { folio_zero_new_buffers(folio, from, to); - } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { + } +#ifdef CONFIG_FS_ENCRYPTION + else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { for (i = 0; i < nr_wait; i++) { int err2; @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, } } } +#endif return err; } -#endif /* * To preserve ordering, it is essential that the hole instantiation and @@ -1216,19 +1217,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, /* In case writeback began while the folio was unlocked */ folio_wait_stable(folio); -#ifdef CONFIG_FS_ENCRYPTION if (ext4_should_dioread_nolock(inode)) ret = ext4_block_write_begin(folio, pos, len, ext4_get_block_unwritten); else ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); -#else - if (ext4_should_dioread_nolock(inode)) - ret = __block_write_begin(&folio->page, pos, len, - ext4_get_block_unwritten); - else - ret = __block_write_begin(&folio->page, pos, len, ext4_get_block); -#endif if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, folio_buffers(folio), from, to, @@ -2961,11 +2954,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, if (IS_ERR(folio)) return PTR_ERR(folio); -#ifdef CONFIG_FS_ENCRYPTION ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); -#else - ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep); -#endif if (ret < 0) { folio_unlock(folio); folio_put(folio); -- 2.33.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-23 1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida @ 2024-08-28 11:04 ` Jan Kara 2024-08-29 0:03 ` Eric Biggers 1 sibling, 0 replies; 8+ messages in thread From: Jan Kara @ 2024-08-28 11:04 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida, Jan Kara On Fri 23-08-24 09:33:28, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > Using __block_write_begin() make it inconvenient to journal the > user data dirty process. We can't tell the block layer maintainer, > ‘Hey, we want to trace the dirty user data in ext4, can we add some > special code for ext4 in __block_write_begin?’:P > > So use ext4_block_write_begin() instead. > > The two functions are basically doing the same thing except for the > fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as > to allow ext4_block_write_begin() to function like __block_write_begin > when the config is disabled. > And hoist the ext4_block_write_begin so that it can be used in other > files. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Looks good to me! Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/ext4.h | 2 ++ > fs/ext4/inline.c | 10 +++++----- > fs/ext4/inode.c | 23 ++++++----------------- > 3 files changed, 13 insertions(+), 22 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 08acd152261e..5f8257b68190 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) > return buffer_uptodate(bh); > } > > +extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > + get_block_t *get_block); > #endif /* __KERNEL__ */ > > #define EFSBADCRC EBADMSG /* Bad CRC detected */ > diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c > index e7a09a99837b..0a1a8431e281 100644 > --- a/fs/ext4/inline.c > +++ b/fs/ext4/inline.c > @@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, > goto out; > > if (ext4_should_dioread_nolock(inode)) { > - ret = __block_write_begin(&folio->page, from, to, > - ext4_get_block_unwritten); > + ret = ext4_block_write_begin(folio, from, to, > + ext4_get_block_unwritten); > } else > - ret = __block_write_begin(&folio->page, from, to, ext4_get_block); > + ret = ext4_block_write_begin(folio, from, to, ext4_get_block); > > if (!ret && ext4_should_journal_data(inode)) { > ret = ext4_walk_page_buffers(handle, inode, > @@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, > goto out; > } > > - ret = __block_write_begin(&folio->page, 0, inline_size, > - ext4_da_get_block_prep); > + ret = ext4_block_write_begin(folio, 0, inline_size, > + ext4_da_get_block_prep); > if (ret) { > up_read(&EXT4_I(inode)->xattr_sem); > folio_unlock(folio); > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 941c1c0d5c6e..6b15805ca88b 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, > return ret; > } > > -#ifdef CONFIG_FS_ENCRYPTION > -static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > - get_block_t *get_block) > +int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > + get_block_t *get_block) > { > unsigned from = pos & (PAGE_SIZE - 1); > unsigned to = from + len; > @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > } > if (unlikely(err)) { > folio_zero_new_buffers(folio, from, to); > - } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > + } > +#ifdef CONFIG_FS_ENCRYPTION > + else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > for (i = 0; i < nr_wait; i++) { > int err2; > > @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > } > } > } > +#endif > > return err; > } > -#endif > > /* > * To preserve ordering, it is essential that the hole instantiation and > @@ -1216,19 +1217,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > /* In case writeback began while the folio was unlocked */ > folio_wait_stable(folio); > > -#ifdef CONFIG_FS_ENCRYPTION > if (ext4_should_dioread_nolock(inode)) > ret = ext4_block_write_begin(folio, pos, len, > ext4_get_block_unwritten); > else > ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); > -#else > - if (ext4_should_dioread_nolock(inode)) > - ret = __block_write_begin(&folio->page, pos, len, > - ext4_get_block_unwritten); > - else > - ret = __block_write_begin(&folio->page, pos, len, ext4_get_block); > -#endif > if (!ret && ext4_should_journal_data(inode)) { > ret = ext4_walk_page_buffers(handle, inode, > folio_buffers(folio), from, to, > @@ -2961,11 +2954,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, > if (IS_ERR(folio)) > return PTR_ERR(folio); > > -#ifdef CONFIG_FS_ENCRYPTION > ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); > -#else > - ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep); > -#endif > if (ret < 0) { > folio_unlock(folio); > folio_put(folio); > -- > 2.33.0 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-23 1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida 2024-08-28 11:04 ` Jan Kara @ 2024-08-29 0:03 ` Eric Biggers 2024-08-29 1:28 ` Stephen Zhang 1 sibling, 1 reply; 8+ messages in thread From: Eric Biggers @ 2024-08-29 0:03 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida, Jan Kara On Fri, Aug 23, 2024 at 09:33:28AM +0800, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > Using __block_write_begin() make it inconvenient to journal the > user data dirty process. We can't tell the block layer maintainer, > ‘Hey, we want to trace the dirty user data in ext4, can we add some > special code for ext4 in __block_write_begin?’:P > > So use ext4_block_write_begin() instead. > > The two functions are basically doing the same thing except for the > fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as > to allow ext4_block_write_begin() to function like __block_write_begin > when the config is disabled. > And hoist the ext4_block_write_begin so that it can be used in other > files. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > --- > fs/ext4/ext4.h | 2 ++ > fs/ext4/inline.c | 10 +++++----- > fs/ext4/inode.c | 23 ++++++----------------- > 3 files changed, 13 insertions(+), 22 deletions(-) Thanks for cleaning this up. There are still some comments in fs/ext4/inode.c that reference __block_write_begin. Can you update them too? One more thing below. > @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > } > if (unlikely(err)) { > folio_zero_new_buffers(folio, from, to); > - } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > + } > +#ifdef CONFIG_FS_ENCRYPTION > + else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > for (i = 0; i < nr_wait; i++) { > int err2; > > @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > } > } > } > +#endif This #ifdef isn't necessary since fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION. - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-29 0:03 ` Eric Biggers @ 2024-08-29 1:28 ` Stephen Zhang 0 siblings, 0 replies; 8+ messages in thread From: Stephen Zhang @ 2024-08-29 1:28 UTC (permalink / raw) To: Eric Biggers Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida, Jan Kara Eric Biggers <ebiggers@kernel.org> 于2024年8月29日周四 08:03写道: > > On Fri, Aug 23, 2024 at 09:33:28AM +0800, zhangshida wrote: > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > Using __block_write_begin() make it inconvenient to journal the > > user data dirty process. We can't tell the block layer maintainer, > > ‘Hey, we want to trace the dirty user data in ext4, can we add some > > special code for ext4 in __block_write_begin?’:P > > > > So use ext4_block_write_begin() instead. > > > > The two functions are basically doing the same thing except for the > > fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as > > to allow ext4_block_write_begin() to function like __block_write_begin > > when the config is disabled. > > And hoist the ext4_block_write_begin so that it can be used in other > > files. > > > > Suggested-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > --- > > fs/ext4/ext4.h | 2 ++ > > fs/ext4/inline.c | 10 +++++----- > > fs/ext4/inode.c | 23 ++++++----------------- > > 3 files changed, 13 insertions(+), 22 deletions(-) > > Thanks for cleaning this up. > > There are still some comments in fs/ext4/inode.c that reference > __block_write_begin. Can you update them too? > > One more thing below. > > > @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > } > > if (unlikely(err)) { > > folio_zero_new_buffers(folio, from, to); > > - } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > > + } > > +#ifdef CONFIG_FS_ENCRYPTION > > + else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > > for (i = 0; i < nr_wait; i++) { > > int err2; > > > > @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > } > > } > > } > > +#endif > > This #ifdef isn't necessary since fscrypt_inode_uses_fs_layer_crypto() returns > false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION. > Okay. Will make another version. Thanks, Stephen > - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-23 1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida 2024-08-23 1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida @ 2024-08-23 1:33 ` zhangshida 2024-08-28 11:45 ` Jan Kara 1 sibling, 1 reply; 8+ messages in thread From: zhangshida @ 2024-08-23 1:33 UTC (permalink / raw) To: tytso, adilger.kernel, jack Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Baolin Liu, Jan Kara From: Shida Zhang <zhangshida@kylinos.cn> On an old kernel version(4.19, ext3, data=journal, pagesize=64k), an assertion failure will occasionally be triggered by the line below: ----------- jbd2_journal_commit_transaction { ... J_ASSERT_BH(bh, !buffer_dirty(bh)); /* * The buffer on BJ_Forget list and not jbddirty means ... } ----------- The same condition may also be applied to the lattest kernel version. When blocksize < pagesize and we truncate a file, there can be buffers in the mapping tail page beyond i_size. These buffers will be filed to transaction's BJ_Forget list by ext4_journalled_invalidatepage() during truncation. When the transaction doing truncate starts committing, we can grow the file again. This calls __block_write_begin() which allocates new blocks under these buffers in the tail page we go through the branch: if (buffer_new(bh)) { clean_bdev_bh_alias(bh); if (folio_test_uptodate(folio)) { clear_buffer_new(bh); set_buffer_uptodate(bh); mark_buffer_dirty(bh); continue; } ... } Hence buffers on BJ_Forget list of the committing transaction get marked dirty and this triggers the jbd2 assertion. Teach ext4_block_write_begin() to properly handle files with data journalling by avoiding dirtying them directly. Instead of folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which takes care of handling journalling. We also don't need to mark new uptodate buffers as dirty in ext4_block_write_begin(). That will be either done either by block_commit_write() in case of success or by folio_zero_new_buffers() in case of failure. Reported-by: Baolin Liu <liubaolin@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- fs/ext4/ext4.h | 3 ++- fs/ext4/inline.c | 7 ++++--- fs/ext4/inode.c | 30 ++++++++++++++++++++++-------- 3 files changed, 28 insertions(+), 12 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 5f8257b68190..b653bd423b11 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3851,7 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) return buffer_uptodate(bh); } -extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, +extern int ext4_block_write_begin(handle_t *handle, struct folio *folio, + loff_t pos, unsigned len, get_block_t *get_block); #endif /* __KERNEL__ */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 0a1a8431e281..8d5599d5af27 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -601,10 +601,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, goto out; if (ext4_should_dioread_nolock(inode)) { - ret = ext4_block_write_begin(folio, from, to, + ret = ext4_block_write_begin(handle, folio, from, to, ext4_get_block_unwritten); } else - ret = ext4_block_write_begin(folio, from, to, ext4_get_block); + ret = ext4_block_write_begin(handle, folio, from, to, + ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, @@ -856,7 +857,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, goto out; } - ret = ext4_block_write_begin(folio, 0, inline_size, + ret = ext4_block_write_begin(NULL, folio, 0, inline_size, ext4_da_get_block_prep); if (ret) { up_read(&EXT4_I(inode)->xattr_sem); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6b15805ca88b..4c34827da56e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -49,6 +49,11 @@ #include <trace/events/ext4.h> +static void ext4_journalled_zero_new_buffers(handle_t *handle, + struct inode *inode, + struct folio *folio, + unsigned from, unsigned to); + static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw, struct ext4_inode_info *ei) { @@ -1041,7 +1046,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, return ret; } -int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, +int ext4_block_write_begin(handle_t *handle, struct folio *folio, + loff_t pos, unsigned len, get_block_t *get_block) { unsigned from = pos & (PAGE_SIZE - 1); @@ -1055,6 +1061,7 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, struct buffer_head *bh, *head, *wait[2]; int nr_wait = 0; int i; + bool should_journal_data = ext4_should_journal_data(inode); BUG_ON(!folio_test_locked(folio)); BUG_ON(from > PAGE_SIZE); @@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, err = get_block(inode, block, bh, 1); if (err) break; + if (should_journal_data) + do_journal_get_write_access(handle, inode, bh); if (buffer_new(bh)) { if (folio_test_uptodate(folio)) { - clear_buffer_new(bh); set_buffer_uptodate(bh); - mark_buffer_dirty(bh); continue; } if (block_end > to || block_start < from) @@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, err = -EIO; } if (unlikely(err)) { - folio_zero_new_buffers(folio, from, to); + if (should_journal_data) + ext4_journalled_zero_new_buffers(handle, inode, folio, + from, to); + else + folio_zero_new_buffers(folio, from, to); } #ifdef CONFIG_FS_ENCRYPTION else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { @@ -1218,10 +1229,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, folio_wait_stable(folio); if (ext4_should_dioread_nolock(inode)) - ret = ext4_block_write_begin(folio, pos, len, + ret = ext4_block_write_begin(handle, folio, pos, len, ext4_get_block_unwritten); else - ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); + ret = ext4_block_write_begin(handle, folio, pos, len, + ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, folio_buffers(folio), from, to, @@ -2954,7 +2966,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, if (IS_ERR(folio)) return PTR_ERR(folio); - ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); + ret = ext4_block_write_begin(NULL, folio, pos, len, + ext4_da_get_block_prep); if (ret < 0) { folio_unlock(folio); folio_put(folio); @@ -6208,7 +6221,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) if (folio_pos(folio) + len > size) len = size - folio_pos(folio); - err = __block_write_begin(&folio->page, 0, len, ext4_get_block); + err = ext4_block_write_begin(handle, folio, 0, len, + ext4_get_block); if (!err) { ret = VM_FAULT_SIGBUS; if (ext4_journal_folio_buffers(handle, folio, len)) -- 2.33.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-23 1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida @ 2024-08-28 11:45 ` Jan Kara 2024-08-29 1:50 ` Stephen Zhang 0 siblings, 1 reply; 8+ messages in thread From: Jan Kara @ 2024-08-28 11:45 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida, Baolin Liu, Jan Kara On Fri 23-08-24 09:33:29, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > On an old kernel version(4.19, ext3, data=journal, pagesize=64k), > an assertion failure will occasionally be triggered by the line below: > ----------- > jbd2_journal_commit_transaction > { > ... > J_ASSERT_BH(bh, !buffer_dirty(bh)); > /* > * The buffer on BJ_Forget list and not jbddirty means > ... > } > ----------- > > The same condition may also be applied to the lattest kernel version. > > When blocksize < pagesize and we truncate a file, there can be buffers in > the mapping tail page beyond i_size. These buffers will be filed to > transaction's BJ_Forget list by ext4_journalled_invalidatepage() during > truncation. When the transaction doing truncate starts committing, we can > grow the file again. This calls __block_write_begin() which allocates new > blocks under these buffers in the tail page we go through the branch: ^^ and we... > if (buffer_new(bh)) { > clean_bdev_bh_alias(bh); > if (folio_test_uptodate(folio)) { > clear_buffer_new(bh); > set_buffer_uptodate(bh); > mark_buffer_dirty(bh); > continue; > } > ... > } > > Hence buffers on BJ_Forget list of the committing transaction get marked > dirty and this triggers the jbd2 assertion. > > Teach ext4_block_write_begin() to properly handle files with data > journalling by avoiding dirtying them directly. Instead of > folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which > takes care of handling journalling. We also don't need to mark new uptodate > buffers as dirty in ext4_block_write_begin(). That will be either done > either by block_commit_write() in case of success or by > folio_zero_new_buffers() in case of failure. > > Reported-by: Baolin Liu <liubaolin@kylinos.cn> > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Looks mostly good. Just three small comments: > @@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > err = get_block(inode, block, bh, 1); > if (err) > break; > + if (should_journal_data) > + do_journal_get_write_access(handle, inode, bh); I'd move this inside the buffer_new() branch and add before it a comment: /* * We may be zeroing partial buffers or all new * buffers in case of failure. Prepare JBD2 for * that. */ > if (buffer_new(bh)) { > if (folio_test_uptodate(folio)) { > - clear_buffer_new(bh); > set_buffer_uptodate(bh); > - mark_buffer_dirty(bh); Here I'd add comment: /* * Unlike __block_write_begin() we leave * dirtying of new uptodate buffers to * ->write_end() time or * folio_zero_new_buffers(). */ > @@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > err = -EIO; > } > if (unlikely(err)) { > - folio_zero_new_buffers(folio, from, to); > + if (should_journal_data) > + ext4_journalled_zero_new_buffers(handle, inode, folio, > + from, to); I've realized there's a small bug in ext4_journalled_zero_new_buffers() that it calls write_end_fn() only if it zeroed a buffer. But for new uptodate buffers we also need to call write_end_fn() to persist the uptodate content (similarly as folio_zero_new_buffers() does it). So we need another preparatory patch moving write_end_fn() in ext4_journalled_zero_new_buffers() to be called also for uptodate pages. > + else > + folio_zero_new_buffers(folio, from, to); > } > #ifdef CONFIG_FS_ENCRYPTION > else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-28 11:45 ` Jan Kara @ 2024-08-29 1:50 ` Stephen Zhang 0 siblings, 0 replies; 8+ messages in thread From: Stephen Zhang @ 2024-08-29 1:50 UTC (permalink / raw) To: Jan Kara Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida, Baolin Liu Jan Kara <jack@suse.cz> 于2024年8月28日周三 19:45写道: > > On Fri 23-08-24 09:33:29, zhangshida wrote: > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > On an old kernel version(4.19, ext3, data=journal, pagesize=64k), > > an assertion failure will occasionally be triggered by the line below: > > ----------- > > jbd2_journal_commit_transaction > > { > > ... > > J_ASSERT_BH(bh, !buffer_dirty(bh)); > > /* > > * The buffer on BJ_Forget list and not jbddirty means > > ... > > } > > ----------- > > > > The same condition may also be applied to the lattest kernel version. > > > > When blocksize < pagesize and we truncate a file, there can be buffers in > > the mapping tail page beyond i_size. These buffers will be filed to > > transaction's BJ_Forget list by ext4_journalled_invalidatepage() during > > truncation. When the transaction doing truncate starts committing, we can > > grow the file again. This calls __block_write_begin() which allocates new > > blocks under these buffers in the tail page we go through the branch: > ^^ and we... > > > > if (buffer_new(bh)) { > > clean_bdev_bh_alias(bh); > > if (folio_test_uptodate(folio)) { > > clear_buffer_new(bh); > > set_buffer_uptodate(bh); > > mark_buffer_dirty(bh); > > continue; > > } > > ... > > } > > > > Hence buffers on BJ_Forget list of the committing transaction get marked > > dirty and this triggers the jbd2 assertion. > > > > Teach ext4_block_write_begin() to properly handle files with data > > journalling by avoiding dirtying them directly. Instead of > > folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which > > takes care of handling journalling. We also don't need to mark new uptodate > > buffers as dirty in ext4_block_write_begin(). That will be either done > > either by block_commit_write() in case of success or by > > folio_zero_new_buffers() in case of failure. > > > > Reported-by: Baolin Liu <liubaolin@kylinos.cn> > > Suggested-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > Looks mostly good. Just three small comments: > > > @@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > err = get_block(inode, block, bh, 1); > > if (err) > > break; > > > > + if (should_journal_data) > > + do_journal_get_write_access(handle, inode, bh); > > I'd move this inside the buffer_new() branch and add before it a comment: > /* > * We may be zeroing partial buffers or all new > * buffers in case of failure. Prepare JBD2 for > * that. > */ > > > if (buffer_new(bh)) { > > if (folio_test_uptodate(folio)) { > > - clear_buffer_new(bh); > > set_buffer_uptodate(bh); > > - mark_buffer_dirty(bh); > > Here I'd add comment: > /* > * Unlike __block_write_begin() we leave > * dirtying of new uptodate buffers to > * ->write_end() time or > * folio_zero_new_buffers(). > */ > > > @@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > err = -EIO; > > } > > if (unlikely(err)) { > > - folio_zero_new_buffers(folio, from, to); > > + if (should_journal_data) > > + ext4_journalled_zero_new_buffers(handle, inode, folio, > > + from, to); > > I've realized there's a small bug in ext4_journalled_zero_new_buffers() > that it calls write_end_fn() only if it zeroed a buffer. But for new > uptodate buffers we also need to call write_end_fn() to persist the > uptodate content (similarly as folio_zero_new_buffers() does it). So we > need another preparatory patch moving write_end_fn() in > ext4_journalled_zero_new_buffers() to be called also for uptodate pages. > Will do. And also thanks for the detailed explanation. -Stephen > > + else > > + folio_zero_new_buffers(folio, from, to); > > } > > #ifdef CONFIG_FS_ENCRYPTION > > else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-08-29 1:50 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-23 1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida 2024-08-23 1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida 2024-08-28 11:04 ` Jan Kara 2024-08-29 0:03 ` Eric Biggers 2024-08-29 1:28 ` Stephen Zhang 2024-08-23 1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida 2024-08-28 11:45 ` Jan Kara 2024-08-29 1:50 ` Stephen Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox