* [PATCH v5 0/3] Fix an error caused by improperly dirtied buffer
@ 2024-08-29 8:54 zhangshida
2024-08-29 8:54 ` [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: zhangshida @ 2024-08-29 8:54 UTC (permalink / raw)
To: tytso, adilger.kernel, jack, ebiggers
Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd
From: Shida Zhang <zhangshida@kylinos.cn>
Hi all,
On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
---------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
---------
The same condition may also be applied to the lattest kernel version.
This patch set fixes it by:
1.Fix a small bug for ext4_journalled_zero_new_buffers first.(patch 1)
2.Replace the __block_write_begin with the hoisted
ext4_block_write_begin().(patch 2)
3.Trace the user data dirtying in ext4_block_write_begin().(patch 3)
Changes since v4:
- At first, we fix a bug in ext4_journalled_zero_new_buffers, as suggested
by Jan.
- In patch 2, clean up the related comment. And remove the #ifdef in
ext4_block_write_begin(), as suggested by Eric.
- Add some comments in patch 3.
[4] Version 4:
https://lore.kernel.org/linux-ext4/20240823013329.1996741-1-zhangshida@kylinos.cn/
Changes since v3:
- Ditch the patch 3 in v3, because some other code paths can set the
buffer dirty:
ext4_write_begin
ext4_block_write_begin
create_empty_buffers
set_buffer_dirty
[3] Version 3:
https://lore.kernel.org/linux-ext4/20240810082814.3709867-1-zhangshida@kylinos.cn/
Changes since v2:
- Adjust the applied order of patch 1 and patch 2 in v1.
- Reword the commit message.
- Remove some superfluous logic in patch 2 and patch 3.
[2] Version 2:
https://lore.kernel.org/linux-ext4/20240809064606.3490994-2-zhangshida@kylinos.cn/
Changes since v1:
- v1 use a hack into jbd2 to fix the bug while v2 choose to journal
the dirty data in *_block_write_begin.
[1] Version 1:
https://lore.kernel.org/linux-ext4/20240720062356.2522658-1-zhangshida@kylinos.cn/
Shida Zhang (3):
ext4: persist the new uptodate buffers in
ext4_journalled_zero_new_buffers
ext4: hoist ext4_block_write_begin and replace the __block_write_begin
ext4: fix a potential assertion failure due to improperly dirtied
buffer
fs/ext4/ext4.h | 3 +++
fs/ext4/inline.c | 11 ++++----
fs/ext4/inode.c | 65 ++++++++++++++++++++++++++++--------------------
3 files changed, 47 insertions(+), 32 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers 2024-08-29 8:54 [PATCH v5 0/3] Fix an error caused by improperly dirtied buffer zhangshida @ 2024-08-29 8:54 ` zhangshida 2024-08-29 9:11 ` Jan Kara 2024-08-29 8:54 ` [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida 2024-08-29 8:54 ` [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida 2 siblings, 1 reply; 9+ messages in thread From: zhangshida @ 2024-08-29 8:54 UTC (permalink / raw) To: tytso, adilger.kernel, jack, ebiggers Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara From: Shida Zhang <zhangshida@kylinos.cn> For new uptodate buffers we also need to call write_end_fn() to persist the uptodate content, similarly as folio_zero_new_buffers() does it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- fs/ext4/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 941c1c0d5c6e..a0a55cb8db53 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1389,9 +1389,9 @@ static void ext4_journalled_zero_new_buffers(handle_t *handle, size = min(to, block_end) - start; folio_zero_range(folio, start, size); - write_end_fn(handle, inode, bh); } clear_buffer_new(bh); + write_end_fn(handle, inode, bh); } } block_start = block_end; -- 2.33.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers 2024-08-29 8:54 ` [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida @ 2024-08-29 9:11 ` Jan Kara 0 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2024-08-29 9:11 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel, zhangshida, Jan Kara On Thu 29-08-24 16:54:05, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > For new uptodate buffers we also need to call write_end_fn() to persist the > uptodate content, similarly as folio_zero_new_buffers() does it. > > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 941c1c0d5c6e..a0a55cb8db53 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1389,9 +1389,9 @@ static void ext4_journalled_zero_new_buffers(handle_t *handle, > size = min(to, block_end) - start; > > folio_zero_range(folio, start, size); > - write_end_fn(handle, inode, bh); > } > clear_buffer_new(bh); > + write_end_fn(handle, inode, bh); > } > } > block_start = block_end; > -- > 2.33.0 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-29 8:54 [PATCH v5 0/3] Fix an error caused by improperly dirtied buffer zhangshida 2024-08-29 8:54 ` [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida @ 2024-08-29 8:54 ` zhangshida 2024-08-29 9:12 ` Jan Kara 2024-08-29 8:54 ` [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida 2 siblings, 1 reply; 9+ messages in thread From: zhangshida @ 2024-08-29 8:54 UTC (permalink / raw) To: tytso, adilger.kernel, jack, ebiggers Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara From: Shida Zhang <zhangshida@kylinos.cn> Using __block_write_begin() make it inconvenient to journal the user data dirty process. We can't tell the block layer maintainer, ‘Hey, we want to trace the dirty user data in ext4, can we add some special code for ext4 in __block_write_begin?’:P So use ext4_block_write_begin() instead. The two functions are basically doing the same thing except for the fscrypt related code. Remove the unnecessary #ifdef since fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION. And hoist the ext4_block_write_begin so that it can be used in other files. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- fs/ext4/ext4.h | 2 ++ fs/ext4/inline.c | 10 +++++----- fs/ext4/inode.c | 24 +++++------------------- 3 files changed, 12 insertions(+), 24 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 08acd152261e..5f8257b68190 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) return buffer_uptodate(bh); } +extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, + get_block_t *get_block); #endif /* __KERNEL__ */ #define EFSBADCRC EBADMSG /* Bad CRC detected */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index e7a09a99837b..0a1a8431e281 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, goto out; if (ext4_should_dioread_nolock(inode)) { - ret = __block_write_begin(&folio->page, from, to, - ext4_get_block_unwritten); + ret = ext4_block_write_begin(folio, from, to, + ext4_get_block_unwritten); } else - ret = __block_write_begin(&folio->page, from, to, ext4_get_block); + ret = ext4_block_write_begin(folio, from, to, ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, @@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, goto out; } - ret = __block_write_begin(&folio->page, 0, inline_size, - ext4_da_get_block_prep); + ret = ext4_block_write_begin(folio, 0, inline_size, + ext4_da_get_block_prep); if (ret) { up_read(&EXT4_I(inode)->xattr_sem); folio_unlock(folio); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a0a55cb8db53..4964c67e029e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1024,10 +1024,10 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, if (!buffer_mapped(bh) || buffer_freed(bh)) return 0; /* - * __block_write_begin() could have dirtied some buffers. Clean + * ext4_block_write_begin() could have dirtied some buffers. Clean * the dirty bit as jbd2_journal_get_write_access() could complain * otherwise about fs integrity issues. Setting of the dirty bit - * by __block_write_begin() isn't a real problem here as we clear + * by ext4_block_write_begin() isn't a real problem here as we clear * the bit before releasing a page lock and thus writeback cannot * ever write the buffer. */ @@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, return ret; } -#ifdef CONFIG_FS_ENCRYPTION -static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, - get_block_t *get_block) +int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, + get_block_t *get_block) { unsigned from = pos & (PAGE_SIZE - 1); unsigned to = from + len; @@ -1134,7 +1133,6 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, return err; } -#endif /* * To preserve ordering, it is essential that the hole instantiation and @@ -1216,19 +1214,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, /* In case writeback began while the folio was unlocked */ folio_wait_stable(folio); -#ifdef CONFIG_FS_ENCRYPTION if (ext4_should_dioread_nolock(inode)) ret = ext4_block_write_begin(folio, pos, len, ext4_get_block_unwritten); else ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); -#else - if (ext4_should_dioread_nolock(inode)) - ret = __block_write_begin(&folio->page, pos, len, - ext4_get_block_unwritten); - else - ret = __block_write_begin(&folio->page, pos, len, ext4_get_block); -#endif if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, folio_buffers(folio), from, to, @@ -1241,7 +1231,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, folio_unlock(folio); /* - * __block_write_begin may have instantiated a few blocks + * ext4_block_write_begin may have instantiated a few blocks * outside i_size. Trim these off again. Don't need * i_size_read because we hold i_rwsem. * @@ -2961,11 +2951,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, if (IS_ERR(folio)) return PTR_ERR(folio); -#ifdef CONFIG_FS_ENCRYPTION ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); -#else - ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep); -#endif if (ret < 0) { folio_unlock(folio); folio_put(folio); -- 2.33.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-29 8:54 ` [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida @ 2024-08-29 9:12 ` Jan Kara 2024-08-29 9:26 ` Jan Kara 0 siblings, 1 reply; 9+ messages in thread From: Jan Kara @ 2024-08-29 9:12 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel, zhangshida, Jan Kara On Thu 29-08-24 16:54:06, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > Using __block_write_begin() make it inconvenient to journal the > user data dirty process. We can't tell the block layer maintainer, > ‘Hey, we want to trace the dirty user data in ext4, can we add some > special code for ext4 in __block_write_begin?’:P > > So use ext4_block_write_begin() instead. > > The two functions are basically doing the same thing except for the > fscrypt related code. Remove the unnecessary #ifdef since > fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at > compile time) when !CONFIG_FS_ENCRYPTION. > > And hoist the ext4_block_write_begin so that it can be used in other > files. > > Suggested-by: Jan Kara <jack@suse.cz> > Suggested-by: Eric Biggers <ebiggers@kernel.org> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> I think I've given my Reviewed-by on this already in previous version - you can keep those tags unless the patch significantly changes. Anyway: feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/ext4/ext4.h | 2 ++ > fs/ext4/inline.c | 10 +++++----- > fs/ext4/inode.c | 24 +++++------------------- > 3 files changed, 12 insertions(+), 24 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 08acd152261e..5f8257b68190 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) > return buffer_uptodate(bh); > } > > +extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > + get_block_t *get_block); > #endif /* __KERNEL__ */ > > #define EFSBADCRC EBADMSG /* Bad CRC detected */ > diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c > index e7a09a99837b..0a1a8431e281 100644 > --- a/fs/ext4/inline.c > +++ b/fs/ext4/inline.c > @@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, > goto out; > > if (ext4_should_dioread_nolock(inode)) { > - ret = __block_write_begin(&folio->page, from, to, > - ext4_get_block_unwritten); > + ret = ext4_block_write_begin(folio, from, to, > + ext4_get_block_unwritten); > } else > - ret = __block_write_begin(&folio->page, from, to, ext4_get_block); > + ret = ext4_block_write_begin(folio, from, to, ext4_get_block); > > if (!ret && ext4_should_journal_data(inode)) { > ret = ext4_walk_page_buffers(handle, inode, > @@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, > goto out; > } > > - ret = __block_write_begin(&folio->page, 0, inline_size, > - ext4_da_get_block_prep); > + ret = ext4_block_write_begin(folio, 0, inline_size, > + ext4_da_get_block_prep); > if (ret) { > up_read(&EXT4_I(inode)->xattr_sem); > folio_unlock(folio); > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index a0a55cb8db53..4964c67e029e 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1024,10 +1024,10 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, > if (!buffer_mapped(bh) || buffer_freed(bh)) > return 0; > /* > - * __block_write_begin() could have dirtied some buffers. Clean > + * ext4_block_write_begin() could have dirtied some buffers. Clean > * the dirty bit as jbd2_journal_get_write_access() could complain > * otherwise about fs integrity issues. Setting of the dirty bit > - * by __block_write_begin() isn't a real problem here as we clear > + * by ext4_block_write_begin() isn't a real problem here as we clear > * the bit before releasing a page lock and thus writeback cannot > * ever write the buffer. > */ > @@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, > return ret; > } > > -#ifdef CONFIG_FS_ENCRYPTION > -static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > - get_block_t *get_block) > +int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > + get_block_t *get_block) > { > unsigned from = pos & (PAGE_SIZE - 1); > unsigned to = from + len; > @@ -1134,7 +1133,6 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > return err; > } > -#endif > > /* > * To preserve ordering, it is essential that the hole instantiation and > @@ -1216,19 +1214,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > /* In case writeback began while the folio was unlocked */ > folio_wait_stable(folio); > > -#ifdef CONFIG_FS_ENCRYPTION > if (ext4_should_dioread_nolock(inode)) > ret = ext4_block_write_begin(folio, pos, len, > ext4_get_block_unwritten); > else > ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); > -#else > - if (ext4_should_dioread_nolock(inode)) > - ret = __block_write_begin(&folio->page, pos, len, > - ext4_get_block_unwritten); > - else > - ret = __block_write_begin(&folio->page, pos, len, ext4_get_block); > -#endif > if (!ret && ext4_should_journal_data(inode)) { > ret = ext4_walk_page_buffers(handle, inode, > folio_buffers(folio), from, to, > @@ -1241,7 +1231,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > > folio_unlock(folio); > /* > - * __block_write_begin may have instantiated a few blocks > + * ext4_block_write_begin may have instantiated a few blocks > * outside i_size. Trim these off again. Don't need > * i_size_read because we hold i_rwsem. > * > @@ -2961,11 +2951,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, > if (IS_ERR(folio)) > return PTR_ERR(folio); > > -#ifdef CONFIG_FS_ENCRYPTION > ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); > -#else > - ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep); > -#endif > if (ret < 0) { > folio_unlock(folio); > folio_put(folio); > -- > 2.33.0 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin 2024-08-29 9:12 ` Jan Kara @ 2024-08-29 9:26 ` Jan Kara 0 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2024-08-29 9:26 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel, zhangshida, Jan Kara On Thu 29-08-24 11:12:50, Jan Kara wrote: > On Thu 29-08-24 16:54:06, zhangshida wrote: > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > Using __block_write_begin() make it inconvenient to journal the > > user data dirty process. We can't tell the block layer maintainer, > > ‘Hey, we want to trace the dirty user data in ext4, can we add some > > special code for ext4 in __block_write_begin?’:P > > > > So use ext4_block_write_begin() instead. > > > > The two functions are basically doing the same thing except for the > > fscrypt related code. Remove the unnecessary #ifdef since > > fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at > > compile time) when !CONFIG_FS_ENCRYPTION. > > > > And hoist the ext4_block_write_begin so that it can be used in other > > files. > > > > Suggested-by: Jan Kara <jack@suse.cz> > > Suggested-by: Eric Biggers <ebiggers@kernel.org> > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > I think I've given my Reviewed-by on this already in previous version - you > can keep those tags unless the patch significantly changes. Anyway: feel > free to add: > > Reviewed-by: Jan Kara <jack@suse.cz> I've realized the patch slightly changed so that's likely why you've dropped the Reviewed-by so I retract my comment :) I've also realized one thing: > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -1024,10 +1024,10 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, > > if (!buffer_mapped(bh) || buffer_freed(bh)) > > return 0; > > /* > > - * __block_write_begin() could have dirtied some buffers. Clean > > + * ext4_block_write_begin() could have dirtied some buffers. Clean > > * the dirty bit as jbd2_journal_get_write_access() could complain > > * otherwise about fs integrity issues. Setting of the dirty bit > > - * by __block_write_begin() isn't a real problem here as we clear > > + * by ext4_block_write_begin() isn't a real problem here as we clear > > * the bit before releasing a page lock and thus writeback cannot > > * ever write the buffer. > > */ This comment and the special buffer_dirty() handling in this function can be removed after patch 3 of this series. Nice additional cleanup :). Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-29 8:54 [PATCH v5 0/3] Fix an error caused by improperly dirtied buffer zhangshida 2024-08-29 8:54 ` [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida 2024-08-29 8:54 ` [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida @ 2024-08-29 8:54 ` zhangshida 2024-08-29 9:30 ` Jan Kara 2 siblings, 1 reply; 9+ messages in thread From: zhangshida @ 2024-08-29 8:54 UTC (permalink / raw) To: tytso, adilger.kernel, jack, ebiggers Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Baolin Liu, Jan Kara From: Shida Zhang <zhangshida@kylinos.cn> On an old kernel version(4.19, ext3, data=journal, pagesize=64k), an assertion failure will occasionally be triggered by the line below: ----------- jbd2_journal_commit_transaction { ... J_ASSERT_BH(bh, !buffer_dirty(bh)); /* * The buffer on BJ_Forget list and not jbddirty means ... } ----------- The same condition may also be applied to the lattest kernel version. When blocksize < pagesize and we truncate a file, there can be buffers in the mapping tail page beyond i_size. These buffers will be filed to transaction's BJ_Forget list by ext4_journalled_invalidatepage() during truncation. When the transaction doing truncate starts committing, we can grow the file again. This calls __block_write_begin() which allocates new blocks under these buffers in the tail page we go through the branch: if (buffer_new(bh)) { clean_bdev_bh_alias(bh); if (folio_test_uptodate(folio)) { clear_buffer_new(bh); set_buffer_uptodate(bh); mark_buffer_dirty(bh); continue; } ... } Hence buffers on BJ_Forget list of the committing transaction get marked dirty and this triggers the jbd2 assertion. Teach ext4_block_write_begin() to properly handle files with data journalling by avoiding dirtying them directly. Instead of folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which takes care of handling journalling. We also don't need to mark new uptodate buffers as dirty in ext4_block_write_begin(). That will be either done either by block_commit_write() in case of success or by folio_zero_new_buffers() in case of failure. Reported-by: Baolin Liu <liubaolin@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> --- fs/ext4/ext4.h | 3 ++- fs/ext4/inline.c | 7 ++++--- fs/ext4/inode.c | 41 +++++++++++++++++++++++++++++++++-------- 3 files changed, 39 insertions(+), 12 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 5f8257b68190..b653bd423b11 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3851,7 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh) return buffer_uptodate(bh); } -extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, +extern int ext4_block_write_begin(handle_t *handle, struct folio *folio, + loff_t pos, unsigned len, get_block_t *get_block); #endif /* __KERNEL__ */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 0a1a8431e281..8d5599d5af27 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -601,10 +601,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, goto out; if (ext4_should_dioread_nolock(inode)) { - ret = ext4_block_write_begin(folio, from, to, + ret = ext4_block_write_begin(handle, folio, from, to, ext4_get_block_unwritten); } else - ret = ext4_block_write_begin(folio, from, to, ext4_get_block); + ret = ext4_block_write_begin(handle, folio, from, to, + ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, @@ -856,7 +857,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping, goto out; } - ret = ext4_block_write_begin(folio, 0, inline_size, + ret = ext4_block_write_begin(NULL, folio, 0, inline_size, ext4_da_get_block_prep); if (ret) { up_read(&EXT4_I(inode)->xattr_sem); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 4964c67e029e..bc26200b2852 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -49,6 +49,11 @@ #include <trace/events/ext4.h> +static void ext4_journalled_zero_new_buffers(handle_t *handle, + struct inode *inode, + struct folio *folio, + unsigned from, unsigned to); + static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw, struct ext4_inode_info *ei) { @@ -1041,7 +1046,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode, return ret; } -int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, +int ext4_block_write_begin(handle_t *handle, struct folio *folio, + loff_t pos, unsigned len, get_block_t *get_block) { unsigned from = pos & (PAGE_SIZE - 1); @@ -1055,6 +1061,7 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, struct buffer_head *bh, *head, *wait[2]; int nr_wait = 0; int i; + bool should_journal_data = ext4_should_journal_data(inode); BUG_ON(!folio_test_locked(folio)); BUG_ON(from > PAGE_SIZE); @@ -1083,11 +1090,22 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, err = get_block(inode, block, bh, 1); if (err) break; + /* + * We may be zeroing partial buffers or all new + * buffers in case of failure. Prepare JBD2 for + * that. + */ + if (should_journal_data) + do_journal_get_write_access(handle, inode, bh); if (buffer_new(bh)) { if (folio_test_uptodate(folio)) { - clear_buffer_new(bh); + /* + * Unlike __block_write_begin() we leave + * dirtying of new uptodate buffers to + * ->write_end() time or + * folio_zero_new_buffers(). + */ set_buffer_uptodate(bh); - mark_buffer_dirty(bh); continue; } if (block_end > to || block_start < from) @@ -1117,7 +1135,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, err = -EIO; } if (unlikely(err)) { - folio_zero_new_buffers(folio, from, to); + if (should_journal_data) + ext4_journalled_zero_new_buffers(handle, inode, folio, + from, to); + else + folio_zero_new_buffers(folio, from, to); } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) { for (i = 0; i < nr_wait; i++) { int err2; @@ -1215,10 +1237,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, folio_wait_stable(folio); if (ext4_should_dioread_nolock(inode)) - ret = ext4_block_write_begin(folio, pos, len, + ret = ext4_block_write_begin(handle, folio, pos, len, ext4_get_block_unwritten); else - ret = ext4_block_write_begin(folio, pos, len, ext4_get_block); + ret = ext4_block_write_begin(handle, folio, pos, len, + ext4_get_block); if (!ret && ext4_should_journal_data(inode)) { ret = ext4_walk_page_buffers(handle, inode, folio_buffers(folio), from, to, @@ -2951,7 +2974,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, if (IS_ERR(folio)) return PTR_ERR(folio); - ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep); + ret = ext4_block_write_begin(NULL, folio, pos, len, + ext4_da_get_block_prep); if (ret < 0) { folio_unlock(folio); folio_put(folio); @@ -6205,7 +6229,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) if (folio_pos(folio) + len > size) len = size - folio_pos(folio); - err = __block_write_begin(&folio->page, 0, len, ext4_get_block); + err = ext4_block_write_begin(handle, folio, 0, len, + ext4_get_block); if (!err) { ret = VM_FAULT_SIGBUS; if (ext4_journal_folio_buffers(handle, folio, len)) -- 2.33.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-29 8:54 ` [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida @ 2024-08-29 9:30 ` Jan Kara 2024-08-30 2:03 ` Stephen Zhang 0 siblings, 1 reply; 9+ messages in thread From: Jan Kara @ 2024-08-29 9:30 UTC (permalink / raw) To: zhangshida Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel, zhangshida, Baolin Liu, Jan Kara On Thu 29-08-24 16:54:07, zhangshida wrote: > From: Shida Zhang <zhangshida@kylinos.cn> > > On an old kernel version(4.19, ext3, data=journal, pagesize=64k), > an assertion failure will occasionally be triggered by the line below: > ----------- > jbd2_journal_commit_transaction > { > ... > J_ASSERT_BH(bh, !buffer_dirty(bh)); > /* > * The buffer on BJ_Forget list and not jbddirty means > ... > } > ----------- > > The same condition may also be applied to the lattest kernel version. > > When blocksize < pagesize and we truncate a file, there can be buffers in > the mapping tail page beyond i_size. These buffers will be filed to > transaction's BJ_Forget list by ext4_journalled_invalidatepage() during > truncation. When the transaction doing truncate starts committing, we can > grow the file again. This calls __block_write_begin() which allocates new > blocks under these buffers in the tail page we go through the branch: > > if (buffer_new(bh)) { > clean_bdev_bh_alias(bh); > if (folio_test_uptodate(folio)) { > clear_buffer_new(bh); > set_buffer_uptodate(bh); > mark_buffer_dirty(bh); > continue; > } > ... > } > > Hence buffers on BJ_Forget list of the committing transaction get marked > dirty and this triggers the jbd2 assertion. > > Teach ext4_block_write_begin() to properly handle files with data > journalling by avoiding dirtying them directly. Instead of > folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which > takes care of handling journalling. We also don't need to mark new uptodate > buffers as dirty in ext4_block_write_begin(). That will be either done > either by block_commit_write() in case of success or by > folio_zero_new_buffers() in case of failure. > > Reported-by: Baolin Liu <liubaolin@kylinos.cn> > Suggested-by: Jan Kara <jack@suse.cz> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> One small comment below but regardless whether you decide to address it or not, feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> > @@ -1083,11 +1090,22 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > err = get_block(inode, block, bh, 1); > if (err) > break; > + /* > + * We may be zeroing partial buffers or all new > + * buffers in case of failure. Prepare JBD2 for > + * that. > + */ > + if (should_journal_data) > + do_journal_get_write_access(handle, inode, bh); Thanks for adding comments! I also mentioned this hunk can be moved inside the if (buffer_new(bh)) check below to make it more obvious that this is indeed about handling of newly allocated buffers. But this is just a nit and the comment explains is well enough so I don't insist. > if (buffer_new(bh)) { > if (folio_test_uptodate(folio)) { > - clear_buffer_new(bh); > + /* > + * Unlike __block_write_begin() we leave > + * dirtying of new uptodate buffers to > + * ->write_end() time or > + * folio_zero_new_buffers(). > + */ > set_buffer_uptodate(bh); > - mark_buffer_dirty(bh); > continue; > } > if (block_end > to || block_start < from) Thanks! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer 2024-08-29 9:30 ` Jan Kara @ 2024-08-30 2:03 ` Stephen Zhang 0 siblings, 0 replies; 9+ messages in thread From: Stephen Zhang @ 2024-08-30 2:03 UTC (permalink / raw) To: Jan Kara Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel, zhangshida, Baolin Liu Jan Kara <jack@suse.cz> 于2024年8月29日周四 17:30写道: > > On Thu 29-08-24 16:54:07, zhangshida wrote: > > From: Shida Zhang <zhangshida@kylinos.cn> > > > > On an old kernel version(4.19, ext3, data=journal, pagesize=64k), > > an assertion failure will occasionally be triggered by the line below: > > ----------- > > jbd2_journal_commit_transaction > > { > > ... > > J_ASSERT_BH(bh, !buffer_dirty(bh)); > > /* > > * The buffer on BJ_Forget list and not jbddirty means > > ... > > } > > ----------- > > > > The same condition may also be applied to the lattest kernel version. > > > > When blocksize < pagesize and we truncate a file, there can be buffers in > > the mapping tail page beyond i_size. These buffers will be filed to > > transaction's BJ_Forget list by ext4_journalled_invalidatepage() during > > truncation. When the transaction doing truncate starts committing, we can > > grow the file again. This calls __block_write_begin() which allocates new > > blocks under these buffers in the tail page we go through the branch: > > > > if (buffer_new(bh)) { > > clean_bdev_bh_alias(bh); > > if (folio_test_uptodate(folio)) { > > clear_buffer_new(bh); > > set_buffer_uptodate(bh); > > mark_buffer_dirty(bh); > > continue; > > } > > ... > > } > > > > Hence buffers on BJ_Forget list of the committing transaction get marked > > dirty and this triggers the jbd2 assertion. > > > > Teach ext4_block_write_begin() to properly handle files with data > > journalling by avoiding dirtying them directly. Instead of > > folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which > > takes care of handling journalling. We also don't need to mark new uptodate > > buffers as dirty in ext4_block_write_begin(). That will be either done > > either by block_commit_write() in case of success or by > > folio_zero_new_buffers() in case of failure. > > > > Reported-by: Baolin Liu <liubaolin@kylinos.cn> > > Suggested-by: Jan Kara <jack@suse.cz> > > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> > > One small comment below but regardless whether you decide to address it or > not, feel free to add: > > Reviewed-by: Jan Kara <jack@suse.cz> > > > @@ -1083,11 +1090,22 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len, > > err = get_block(inode, block, bh, 1); > > if (err) > > break; > > + /* > > + * We may be zeroing partial buffers or all new > > + * buffers in case of failure. Prepare JBD2 for > > + * that. > > + */ > > + if (should_journal_data) > > + do_journal_get_write_access(handle, inode, bh); > > Thanks for adding comments! I also mentioned this hunk can be moved inside > the if (buffer_new(bh)) check below to make it more obvious that this is > indeed about handling of newly allocated buffers. But this is just a nit > and the comment explains is well enough so I don't insist. > Feel free to tell me if you have other issues/nits/ideas. Because even with your detailed explanation, I may take it in a wrong way. :p And Thanks for your patience. -Stephen > > if (buffer_new(bh)) { > > if (folio_test_uptodate(folio)) { > > - clear_buffer_new(bh); > > + /* > > + * Unlike __block_write_begin() we leave > > + * dirtying of new uptodate buffers to > > + * ->write_end() time or > > + * folio_zero_new_buffers(). > > + */ > > set_buffer_uptodate(bh); > > - mark_buffer_dirty(bh); > > continue; > > } > > if (block_end > to || block_start < from) > > Thanks! > > Honza > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-08-30 2:03 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-29 8:54 [PATCH v5 0/3] Fix an error caused by improperly dirtied buffer zhangshida 2024-08-29 8:54 ` [PATCH 1/3] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida 2024-08-29 9:11 ` Jan Kara 2024-08-29 8:54 ` [PATCH 2/3] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida 2024-08-29 9:12 ` Jan Kara 2024-08-29 9:26 ` Jan Kara 2024-08-29 8:54 ` [PATCH 3/3] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida 2024-08-29 9:30 ` Jan Kara 2024-08-30 2:03 ` Stephen Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox