[PATCH v4 0/2] Fix an error caused by improperly dirtied buffer

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer
@ 2024-08-23  1:33 zhangshida
  2024-08-23  1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
  2024-08-23  1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
  0 siblings, 2 replies; 8+ messages in thread
From: zhangshida @ 2024-08-23  1:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd

From: Shida Zhang <zhangshida@kylinos.cn>

Hi all,

On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
---------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
---------

The same condition may also be applied to the lattest kernel version.

This patch set fixes it by: 
1.Replace the __block_write_begin with the hoisted
  ext4_block_write_begin().(patch 1)
2.Trace the user data dirtying in ext4_block_write_begin().(patch 2)


Changes since v3: 
- Ditch the patch 3 in v3, because some other code paths can set the 
  buffer dirty:
	ext4_write_begin
		ext4_block_write_begin
			create_empty_buffers
				set_buffer_dirty 

[3] Version 3:
https://lore.kernel.org/linux-ext4/20240810082814.3709867-1-zhangshida@kylinos.cn/
Changes since v2: 
- Adjust the applied order of patch 1 and patch 2 in v1. 
- Reword the commit message.
- Remove some superfluous logic in patch 2 and patch 3.

[2] Version 2:
https://lore.kernel.org/linux-ext4/20240809064606.3490994-2-zhangshida@kylinos.cn/
Changes since v1: 
- v1 use a hack into jbd2 to fix the bug while v2 choose to journal
  the dirty data in *_block_write_begin.

[1] Version 1:
https://lore.kernel.org/linux-ext4/20240720062356.2522658-1-zhangshida@kylinos.cn/




Shida Zhang (2):
  ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  ext4: fix a potential assertion failure due to improperly dirtied
    buffer

 fs/ext4/ext4.h   |  3 +++
 fs/ext4/inline.c | 11 ++++++-----
 fs/ext4/inode.c  | 51 +++++++++++++++++++++++++-----------------------
 3 files changed, 36 insertions(+), 29 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  2024-08-23  1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida
@ 2024-08-23  1:33 ` zhangshida
  2024-08-28 11:04   ` Jan Kara
  2024-08-29  0:03   ` Eric Biggers
  2024-08-23  1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
  1 sibling, 2 replies; 8+ messages in thread
From: zhangshida @ 2024-08-23  1:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

Using __block_write_begin() make it inconvenient to journal the
user data dirty process. We can't tell the block layer maintainer,
‘Hey, we want to trace the dirty user data in ext4, can we add some
special code for ext4 in __block_write_begin?’:P

So use ext4_block_write_begin() instead.

The two functions are basically doing the same thing except for the
fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as
to allow ext4_block_write_begin() to function like __block_write_begin
when the config is disabled.
And hoist the ext4_block_write_begin so that it can be used in other
files.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
 fs/ext4/ext4.h   |  2 ++
 fs/ext4/inline.c | 10 +++++-----
 fs/ext4/inode.c  | 23 ++++++-----------------
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 08acd152261e..5f8257b68190 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 	return buffer_uptodate(bh);
 }
 
+extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+				  get_block_t *get_block);
 #endif	/* __KERNEL__ */
 
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index e7a09a99837b..0a1a8431e281 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
 		goto out;
 
 	if (ext4_should_dioread_nolock(inode)) {
-		ret = __block_write_begin(&folio->page, from, to,
-					  ext4_get_block_unwritten);
+		ret = ext4_block_write_begin(folio, from, to,
+					     ext4_get_block_unwritten);
 	} else
-		ret = __block_write_begin(&folio->page, from, to, ext4_get_block);
+		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
 
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
@@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 			goto out;
 	}
 
-	ret = __block_write_begin(&folio->page, 0, inline_size,
-				  ext4_da_get_block_prep);
+	ret = ext4_block_write_begin(folio, 0, inline_size,
+				     ext4_da_get_block_prep);
 	if (ret) {
 		up_read(&EXT4_I(inode)->xattr_sem);
 		folio_unlock(folio);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5c6e..6b15805ca88b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 	return ret;
 }
 
-#ifdef CONFIG_FS_ENCRYPTION
-static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
-				  get_block_t *get_block)
+int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+			   get_block_t *get_block)
 {
 	unsigned from = pos & (PAGE_SIZE - 1);
 	unsigned to = from + len;
@@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 	}
 	if (unlikely(err)) {
 		folio_zero_new_buffers(folio, from, to);
-	} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
+	}
+#ifdef CONFIG_FS_ENCRYPTION
+	else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
 		for (i = 0; i < nr_wait; i++) {
 			int err2;
 
@@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 			}
 		}
 	}
+#endif
 
 	return err;
 }
-#endif
 
 /*
  * To preserve ordering, it is essential that the hole instantiation and
@@ -1216,19 +1217,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 	/* In case writeback began while the folio was unlocked */
 	folio_wait_stable(folio);
 
-#ifdef CONFIG_FS_ENCRYPTION
 	if (ext4_should_dioread_nolock(inode))
 		ret = ext4_block_write_begin(folio, pos, len,
 					     ext4_get_block_unwritten);
 	else
 		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
-#else
-	if (ext4_should_dioread_nolock(inode))
-		ret = __block_write_begin(&folio->page, pos, len,
-					  ext4_get_block_unwritten);
-	else
-		ret = __block_write_begin(&folio->page, pos, len, ext4_get_block);
-#endif
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
 					     folio_buffers(folio), from, to,
@@ -2961,11 +2954,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
-#ifdef CONFIG_FS_ENCRYPTION
 	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
-#else
-	ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep);
-#endif
 	if (ret < 0) {
 		folio_unlock(folio);
 		folio_put(folio);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  2024-08-23  1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
@ 2024-08-28 11:04   ` Jan Kara
  2024-08-29  0:03   ` Eric Biggers
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Kara @ 2024-08-28 11:04 UTC (permalink / raw)
  To: zhangshida
  Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida,
	Jan Kara

On Fri 23-08-24 09:33:28, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
> 
> Using __block_write_begin() make it inconvenient to journal the
> user data dirty process. We can't tell the block layer maintainer,
> ‘Hey, we want to trace the dirty user data in ext4, can we add some
> special code for ext4 in __block_write_begin?’:P
> 
> So use ext4_block_write_begin() instead.
> 
> The two functions are basically doing the same thing except for the
> fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as
> to allow ext4_block_write_begin() to function like __block_write_begin
> when the config is disabled.
> And hoist the ext4_block_write_begin so that it can be used in other
> files.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>

Looks good to me! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h   |  2 ++
>  fs/ext4/inline.c | 10 +++++-----
>  fs/ext4/inode.c  | 23 ++++++-----------------
>  3 files changed, 13 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 08acd152261e..5f8257b68190 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
>  	return buffer_uptodate(bh);
>  }
>  
> +extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> +				  get_block_t *get_block);
>  #endif	/* __KERNEL__ */
>  
>  #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index e7a09a99837b..0a1a8431e281 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
>  		goto out;
>  
>  	if (ext4_should_dioread_nolock(inode)) {
> -		ret = __block_write_begin(&folio->page, from, to,
> -					  ext4_get_block_unwritten);
> +		ret = ext4_block_write_begin(folio, from, to,
> +					     ext4_get_block_unwritten);
>  	} else
> -		ret = __block_write_begin(&folio->page, from, to, ext4_get_block);
> +		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
>  
>  	if (!ret && ext4_should_journal_data(inode)) {
>  		ret = ext4_walk_page_buffers(handle, inode,
> @@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
>  			goto out;
>  	}
>  
> -	ret = __block_write_begin(&folio->page, 0, inline_size,
> -				  ext4_da_get_block_prep);
> +	ret = ext4_block_write_begin(folio, 0, inline_size,
> +				     ext4_da_get_block_prep);
>  	if (ret) {
>  		up_read(&EXT4_I(inode)->xattr_sem);
>  		folio_unlock(folio);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 941c1c0d5c6e..6b15805ca88b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>  	return ret;
>  }
>  
> -#ifdef CONFIG_FS_ENCRYPTION
> -static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> -				  get_block_t *get_block)
> +int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> +			   get_block_t *get_block)
>  {
>  	unsigned from = pos & (PAGE_SIZE - 1);
>  	unsigned to = from + len;
> @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  	}
>  	if (unlikely(err)) {
>  		folio_zero_new_buffers(folio, from, to);
> -	} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> +	}
> +#ifdef CONFIG_FS_ENCRYPTION
> +	else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>  		for (i = 0; i < nr_wait; i++) {
>  			int err2;
>  
> @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			}
>  		}
>  	}
> +#endif
>  
>  	return err;
>  }
> -#endif
>  
>  /*
>   * To preserve ordering, it is essential that the hole instantiation and
> @@ -1216,19 +1217,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>  	/* In case writeback began while the folio was unlocked */
>  	folio_wait_stable(folio);
>  
> -#ifdef CONFIG_FS_ENCRYPTION
>  	if (ext4_should_dioread_nolock(inode))
>  		ret = ext4_block_write_begin(folio, pos, len,
>  					     ext4_get_block_unwritten);
>  	else
>  		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
> -#else
> -	if (ext4_should_dioread_nolock(inode))
> -		ret = __block_write_begin(&folio->page, pos, len,
> -					  ext4_get_block_unwritten);
> -	else
> -		ret = __block_write_begin(&folio->page, pos, len, ext4_get_block);
> -#endif
>  	if (!ret && ext4_should_journal_data(inode)) {
>  		ret = ext4_walk_page_buffers(handle, inode,
>  					     folio_buffers(folio), from, to,
> @@ -2961,11 +2954,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  	if (IS_ERR(folio))
>  		return PTR_ERR(folio);
>  
> -#ifdef CONFIG_FS_ENCRYPTION
>  	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
> -#else
> -	ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep);
> -#endif
>  	if (ret < 0) {
>  		folio_unlock(folio);
>  		folio_put(folio);
> -- 
> 2.33.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  2024-08-23  1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
  2024-08-28 11:04   ` Jan Kara
@ 2024-08-29  0:03   ` Eric Biggers
  2024-08-29  1:28     ` Stephen Zhang
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-08-29  0:03 UTC (permalink / raw)
  To: zhangshida
  Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida,
	Jan Kara

On Fri, Aug 23, 2024 at 09:33:28AM +0800, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
> 
> Using __block_write_begin() make it inconvenient to journal the
> user data dirty process. We can't tell the block layer maintainer,
> ‘Hey, we want to trace the dirty user data in ext4, can we add some
> special code for ext4 in __block_write_begin?’:P
> 
> So use ext4_block_write_begin() instead.
> 
> The two functions are basically doing the same thing except for the
> fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as
> to allow ext4_block_write_begin() to function like __block_write_begin
> when the config is disabled.
> And hoist the ext4_block_write_begin so that it can be used in other
> files.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
> ---
>  fs/ext4/ext4.h   |  2 ++
>  fs/ext4/inline.c | 10 +++++-----
>  fs/ext4/inode.c  | 23 ++++++-----------------
>  3 files changed, 13 insertions(+), 22 deletions(-)

Thanks for cleaning this up.

There are still some comments in fs/ext4/inode.c that reference
__block_write_begin.  Can you update them too?

One more thing below.

> @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  	}
>  	if (unlikely(err)) {
>  		folio_zero_new_buffers(folio, from, to);
> -	} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> +	}
> +#ifdef CONFIG_FS_ENCRYPTION
> +	else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>  		for (i = 0; i < nr_wait; i++) {
>  			int err2;
>  
> @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			}
>  		}
>  	}
> +#endif

This #ifdef isn't necessary since fscrypt_inode_uses_fs_layer_crypto() returns
false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION.

- Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  2024-08-29  0:03   ` Eric Biggers
@ 2024-08-29  1:28     ` Stephen Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Stephen Zhang @ 2024-08-29  1:28 UTC (permalink / raw)
  To: Eric Biggers
  Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida,
	Jan Kara

Eric Biggers <ebiggers@kernel.org> 于2024年8月29日周四 08:03写道：
>
> On Fri, Aug 23, 2024 at 09:33:28AM +0800, zhangshida wrote:
> > From: Shida Zhang <zhangshida@kylinos.cn>
> >
> > Using __block_write_begin() make it inconvenient to journal the
> > user data dirty process. We can't tell the block layer maintainer,
> > ‘Hey, we want to trace the dirty user data in ext4, can we add some
> > special code for ext4 in __block_write_begin?’:P
> >
> > So use ext4_block_write_begin() instead.
> >
> > The two functions are basically doing the same thing except for the
> > fscrypt related code. Narrow the scope of CONFIG_FS_ENCRYPTION so as
> > to allow ext4_block_write_begin() to function like __block_write_begin
> > when the config is disabled.
> > And hoist the ext4_block_write_begin so that it can be used in other
> > files.
> >
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
> > ---
> >  fs/ext4/ext4.h   |  2 ++
> >  fs/ext4/inline.c | 10 +++++-----
> >  fs/ext4/inode.c  | 23 ++++++-----------------
> >  3 files changed, 13 insertions(+), 22 deletions(-)
>
> Thanks for cleaning this up.
>
> There are still some comments in fs/ext4/inode.c that reference
> __block_write_begin.  Can you update them too?
>
> One more thing below.
>
> > @@ -1119,7 +1118,9 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> >       }
> >       if (unlikely(err)) {
> >               folio_zero_new_buffers(folio, from, to);
> > -     } else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> > +     }
> > +#ifdef CONFIG_FS_ENCRYPTION
> > +     else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
> >               for (i = 0; i < nr_wait; i++) {
> >                       int err2;
> >
> > @@ -1131,10 +1132,10 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> >                       }
> >               }
> >       }
> > +#endif
>
> This #ifdef isn't necessary since fscrypt_inode_uses_fs_layer_crypto() returns
> false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION.
>

Okay. Will make another version.

Thanks,
Stephen

> - Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer
  2024-08-23  1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida
  2024-08-23  1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
@ 2024-08-23  1:33 ` zhangshida
  2024-08-28 11:45   ` Jan Kara
  1 sibling, 1 reply; 8+ messages in thread
From: zhangshida @ 2024-08-23  1:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Baolin Liu,
	Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
-----------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
-----------

The same condition may also be applied to the lattest kernel version.

When blocksize < pagesize and we truncate a file, there can be buffers in
the mapping tail page beyond i_size. These buffers will be filed to
transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
truncation. When the transaction doing truncate starts committing, we can
grow the file again. This calls __block_write_begin() which allocates new
blocks under these buffers in the tail page we go through the branch:

                        if (buffer_new(bh)) {
                                clean_bdev_bh_alias(bh);
                                if (folio_test_uptodate(folio)) {
                                        clear_buffer_new(bh);
                                        set_buffer_uptodate(bh);
                                        mark_buffer_dirty(bh);
                                        continue;
                                }
                                ...
                        }

Hence buffers on BJ_Forget list of the committing transaction get marked
dirty and this triggers the jbd2 assertion.

Teach ext4_block_write_begin() to properly handle files with data
journalling by avoiding dirtying them directly. Instead of
folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
takes care of handling journalling. We also don't need to mark new uptodate
buffers as dirty in ext4_block_write_begin(). That will be either done
either by block_commit_write() in case of success or by
folio_zero_new_buffers() in case of failure.

Reported-by: Baolin Liu <liubaolin@kylinos.cn>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
 fs/ext4/ext4.h   |  3 ++-
 fs/ext4/inline.c |  7 ++++---
 fs/ext4/inode.c  | 30 ++++++++++++++++++++++--------
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5f8257b68190..b653bd423b11 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3851,7 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 	return buffer_uptodate(bh);
 }
 
-extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
+				  loff_t pos, unsigned len,
 				  get_block_t *get_block);
 #endif	/* __KERNEL__ */
 
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 0a1a8431e281..8d5599d5af27 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -601,10 +601,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
 		goto out;
 
 	if (ext4_should_dioread_nolock(inode)) {
-		ret = ext4_block_write_begin(folio, from, to,
+		ret = ext4_block_write_begin(handle, folio, from, to,
 					     ext4_get_block_unwritten);
 	} else
-		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
+		ret = ext4_block_write_begin(handle, folio, from, to,
+					     ext4_get_block);
 
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
@@ -856,7 +857,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 			goto out;
 	}
 
-	ret = ext4_block_write_begin(folio, 0, inline_size,
+	ret = ext4_block_write_begin(NULL, folio, 0, inline_size,
 				     ext4_da_get_block_prep);
 	if (ret) {
 		up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6b15805ca88b..4c34827da56e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -49,6 +49,11 @@
 
 #include <trace/events/ext4.h>
 
+static void ext4_journalled_zero_new_buffers(handle_t *handle,
+					    struct inode *inode,
+					    struct folio *folio,
+					    unsigned from, unsigned to);
+
 static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
 			      struct ext4_inode_info *ei)
 {
@@ -1041,7 +1046,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 	return ret;
 }
 
-int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+int ext4_block_write_begin(handle_t *handle, struct folio *folio,
+			   loff_t pos, unsigned len,
 			   get_block_t *get_block)
 {
 	unsigned from = pos & (PAGE_SIZE - 1);
@@ -1055,6 +1061,7 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 	struct buffer_head *bh, *head, *wait[2];
 	int nr_wait = 0;
 	int i;
+	bool should_journal_data = ext4_should_journal_data(inode);
 
 	BUG_ON(!folio_test_locked(folio));
 	BUG_ON(from > PAGE_SIZE);
@@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 			err = get_block(inode, block, bh, 1);
 			if (err)
 				break;
+			if (should_journal_data)
+				do_journal_get_write_access(handle, inode, bh);
 			if (buffer_new(bh)) {
 				if (folio_test_uptodate(folio)) {
-					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
-					mark_buffer_dirty(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from)
@@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 			err = -EIO;
 	}
 	if (unlikely(err)) {
-		folio_zero_new_buffers(folio, from, to);
+		if (should_journal_data)
+			ext4_journalled_zero_new_buffers(handle, inode, folio,
+							 from, to);
+		else
+			folio_zero_new_buffers(folio, from, to);
 	}
 #ifdef CONFIG_FS_ENCRYPTION
 	else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
@@ -1218,10 +1229,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 	folio_wait_stable(folio);
 
 	if (ext4_should_dioread_nolock(inode))
-		ret = ext4_block_write_begin(folio, pos, len,
+		ret = ext4_block_write_begin(handle, folio, pos, len,
 					     ext4_get_block_unwritten);
 	else
-		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
+		ret = ext4_block_write_begin(handle, folio, pos, len,
+					     ext4_get_block);
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
 					     folio_buffers(folio), from, to,
@@ -2954,7 +2966,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
-	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
+	ret = ext4_block_write_begin(NULL, folio, pos, len,
+				     ext4_da_get_block_prep);
 	if (ret < 0) {
 		folio_unlock(folio);
 		folio_put(folio);
@@ -6208,7 +6221,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		if (folio_pos(folio) + len > size)
 			len = size - folio_pos(folio);
 
-		err = __block_write_begin(&folio->page, 0, len, ext4_get_block);
+		err = ext4_block_write_begin(handle, folio, 0, len,
+					     ext4_get_block);
 		if (!err) {
 			ret = VM_FAULT_SIGBUS;
 			if (ext4_journal_folio_buffers(handle, folio, len))
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer
  2024-08-23  1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
@ 2024-08-28 11:45   ` Jan Kara
  2024-08-29  1:50     ` Stephen Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2024-08-28 11:45 UTC (permalink / raw)
  To: zhangshida
  Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida,
	Baolin Liu, Jan Kara

On Fri 23-08-24 09:33:29, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
> 
> On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
> an assertion failure will occasionally be triggered by the line below:
> -----------
> jbd2_journal_commit_transaction
> {
> ...
> J_ASSERT_BH(bh, !buffer_dirty(bh));
> /*
> * The buffer on BJ_Forget list and not jbddirty means
> ...
> }
> -----------
> 
> The same condition may also be applied to the lattest kernel version.
> 
> When blocksize < pagesize and we truncate a file, there can be buffers in
> the mapping tail page beyond i_size. These buffers will be filed to
> transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
> truncation. When the transaction doing truncate starts committing, we can
> grow the file again. This calls __block_write_begin() which allocates new
> blocks under these buffers in the tail page we go through the branch:
					     ^^ and we...

 
>                         if (buffer_new(bh)) {
>                                 clean_bdev_bh_alias(bh);
>                                 if (folio_test_uptodate(folio)) {
>                                         clear_buffer_new(bh);
>                                         set_buffer_uptodate(bh);
>                                         mark_buffer_dirty(bh);
>                                         continue;
>                                 }
>                                 ...
>                         }
> 
> Hence buffers on BJ_Forget list of the committing transaction get marked
> dirty and this triggers the jbd2 assertion.
> 
> Teach ext4_block_write_begin() to properly handle files with data
> journalling by avoiding dirtying them directly. Instead of
> folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
> takes care of handling journalling. We also don't need to mark new uptodate
> buffers as dirty in ext4_block_write_begin(). That will be either done
> either by block_commit_write() in case of success or by
> folio_zero_new_buffers() in case of failure.
> 
> Reported-by: Baolin Liu <liubaolin@kylinos.cn>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>

Looks mostly good. Just three small comments:

> @@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			err = get_block(inode, block, bh, 1);
>  			if (err)
>  				break;


> +			if (should_journal_data)
> +				do_journal_get_write_access(handle, inode, bh);

I'd move this inside the buffer_new() branch and add before it a comment:
			/*
			 * We may be zeroing partial buffers or all new
			 * buffers in case of failure. Prepare JBD2 for
			 * that.
			 */

>  			if (buffer_new(bh)) {
>  				if (folio_test_uptodate(folio)) {
> -					clear_buffer_new(bh);
>  					set_buffer_uptodate(bh);
> -					mark_buffer_dirty(bh);

Here I'd add comment:
				/*
				 * Unlike __block_write_begin() we leave
				 * dirtying of new uptodate buffers to
				 * ->write_end() time or
				 * folio_zero_new_buffers().
				 */

> @@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			err = -EIO;
>  	}
>  	if (unlikely(err)) {
> -		folio_zero_new_buffers(folio, from, to);
> +		if (should_journal_data)
> +			ext4_journalled_zero_new_buffers(handle, inode, folio,
> +							 from, to);

I've realized there's a small bug in ext4_journalled_zero_new_buffers()
that it calls write_end_fn() only if it zeroed a buffer. But for new
uptodate buffers we also need to call write_end_fn() to persist the
uptodate content (similarly as folio_zero_new_buffers() does it). So we
need another preparatory patch moving write_end_fn() in
ext4_journalled_zero_new_buffers() to be called also for uptodate pages.

> +		else
> +			folio_zero_new_buffers(folio, from, to);
>  	}
>  #ifdef CONFIG_FS_ENCRYPTION
>  	else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer
  2024-08-28 11:45   ` Jan Kara
@ 2024-08-29  1:50     ` Stephen Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Stephen Zhang @ 2024-08-29  1:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: tytso, adilger.kernel, jack, linux-ext4, linux-kernel, zhangshida,
	Baolin Liu

Jan Kara <jack@suse.cz> 于2024年8月28日周三 19:45写道：
>
> On Fri 23-08-24 09:33:29, zhangshida wrote:
> > From: Shida Zhang <zhangshida@kylinos.cn>
> >
> > On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
> > an assertion failure will occasionally be triggered by the line below:
> > -----------
> > jbd2_journal_commit_transaction
> > {
> > ...
> > J_ASSERT_BH(bh, !buffer_dirty(bh));
> > /*
> > * The buffer on BJ_Forget list and not jbddirty means
> > ...
> > }
> > -----------
> >
> > The same condition may also be applied to the lattest kernel version.
> >
> > When blocksize < pagesize and we truncate a file, there can be buffers in
> > the mapping tail page beyond i_size. These buffers will be filed to
> > transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
> > truncation. When the transaction doing truncate starts committing, we can
> > grow the file again. This calls __block_write_begin() which allocates new
> > blocks under these buffers in the tail page we go through the branch:
>                                              ^^ and we...
>
>
> >                         if (buffer_new(bh)) {
> >                                 clean_bdev_bh_alias(bh);
> >                                 if (folio_test_uptodate(folio)) {
> >                                         clear_buffer_new(bh);
> >                                         set_buffer_uptodate(bh);
> >                                         mark_buffer_dirty(bh);
> >                                         continue;
> >                                 }
> >                                 ...
> >                         }
> >
> > Hence buffers on BJ_Forget list of the committing transaction get marked
> > dirty and this triggers the jbd2 assertion.
> >
> > Teach ext4_block_write_begin() to properly handle files with data
> > journalling by avoiding dirtying them directly. Instead of
> > folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
> > takes care of handling journalling. We also don't need to mark new uptodate
> > buffers as dirty in ext4_block_write_begin(). That will be either done
> > either by block_commit_write() in case of success or by
> > folio_zero_new_buffers() in case of failure.
> >
> > Reported-by: Baolin Liu <liubaolin@kylinos.cn>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
>
> Looks mostly good. Just three small comments:
>
> > @@ -1083,11 +1090,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> >                       err = get_block(inode, block, bh, 1);
> >                       if (err)
> >                               break;
>
>
> > +                     if (should_journal_data)
> > +                             do_journal_get_write_access(handle, inode, bh);
>
> I'd move this inside the buffer_new() branch and add before it a comment:
>                         /*
>                          * We may be zeroing partial buffers or all new
>                          * buffers in case of failure. Prepare JBD2 for
>                          * that.
>                          */
>
> >                       if (buffer_new(bh)) {
> >                               if (folio_test_uptodate(folio)) {
> > -                                     clear_buffer_new(bh);
> >                                       set_buffer_uptodate(bh);
> > -                                     mark_buffer_dirty(bh);
>
> Here I'd add comment:
>                                 /*
>                                  * Unlike __block_write_begin() we leave
>                                  * dirtying of new uptodate buffers to
>                                  * ->write_end() time or
>                                  * folio_zero_new_buffers().
>                                  */
>
> > @@ -1117,7 +1124,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> >                       err = -EIO;
> >       }
> >       if (unlikely(err)) {
> > -             folio_zero_new_buffers(folio, from, to);
> > +             if (should_journal_data)
> > +                     ext4_journalled_zero_new_buffers(handle, inode, folio,
> > +                                                      from, to);
>
> I've realized there's a small bug in ext4_journalled_zero_new_buffers()
> that it calls write_end_fn() only if it zeroed a buffer. But for new
> uptodate buffers we also need to call write_end_fn() to persist the
> uptodate content (similarly as folio_zero_new_buffers() does it). So we
> need another preparatory patch moving write_end_fn() in
> ext4_journalled_zero_new_buffers() to be called also for uptodate pages.
>

Will do. And also thanks for the detailed explanation.

-Stephen

> > +             else
> > +                     folio_zero_new_buffers(folio, from, to);
> >       }
> >  #ifdef CONFIG_FS_ENCRYPTION
> >       else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-08-29  1:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-23  1:33 [PATCH v4 0/2] Fix an error caused by improperly dirtied buffer zhangshida
2024-08-23  1:33 ` [PATCH 1/2] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
2024-08-28 11:04   ` Jan Kara
2024-08-29  0:03   ` Eric Biggers
2024-08-29  1:28     ` Stephen Zhang
2024-08-23  1:33 ` [PATCH 2/2] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
2024-08-28 11:45   ` Jan Kara
2024-08-29  1:50     ` Stephen Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox