public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer
@ 2024-08-30  5:37 zhangshida
  2024-08-30  5:37 ` [PATCH 1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: zhangshida @ 2024-08-30  5:37 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, ebiggers
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd

From: Shida Zhang <zhangshida@kylinos.cn>

Hi all,

On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
---------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
---------

The same condition may also be applied to the lattest kernel version.

This patch set fixes it by: 
1.Fix a small bug for ext4_journalled_zero_new_buffers first.(patch 1)
2.Replace the __block_write_begin with the hoisted
  ext4_block_write_begin().(patch 2)
3.Trace the user data dirtying in ext4_block_write_begin().(patch 3)
4.Clean up some extra things.(patch 4)

Changes since v5: 
- Moved a hunk inside the if (buffer_new(bh)) check in patch 3.
- Add a cleanup in patch 4. 

[5] Version 5:
https://lore.kernel.org/linux-ext4/20240829085407.3331490-1-zhangshida@kylinos.cn/
Changes since v4: 
- At first, we fix a bug in ext4_journalled_zero_new_buffers, as suggested
  by Jan.
- In patch 2, clean up the related comment. And remove the #ifdef in  
  ext4_block_write_begin(), as suggested by Eric. 
- Add some comments in patch 3.

[4] Version 4:
https://lore.kernel.org/linux-ext4/20240823013329.1996741-1-zhangshida@kylinos.cn/
Changes since v3: 
- Ditch the patch 3 in v3, because some other code paths can set the 
  buffer dirty:
        ext4_write_begin
                ext4_block_write_begin
                        create_empty_buffers
                                set_buffer_dirty 

[3] Version 3:
https://lore.kernel.org/linux-ext4/20240810082814.3709867-1-zhangshida@kylinos.cn/
Changes since v2: 
- Adjust the applied order of patch 1 and patch 2 in v1. 
- Reword the commit message.
- Remove some superfluous logic in patch 2 and patch 3.

[2] Version 2:
https://lore.kernel.org/linux-ext4/20240809064606.3490994-2-zhangshida@kylinos.cn/
Changes since v1:
- v1 use a hack into jbd2 to fix the bug while v2 choose to journal
  the dirty data in *_block_write_begin.

[1] Version 1:
https://lore.kernel.org/linux-ext4/20240720062356.2522658-1-zhangshida@kylinos.cn/


Shida Zhang (4):
  ext4: persist the new uptodate buffers in
    ext4_journalled_zero_new_buffers
  ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  ext4: fix a potential assertion failure due to improperly dirtied
    buffer
  ext4: remove the special buffer dirty handling in
    do_journal_get_write_access

 fs/ext4/ext4.h   |  3 ++
 fs/ext4/inline.c | 11 ++++---
 fs/ext4/inode.c  | 80 +++++++++++++++++++++++-------------------------
 3 files changed, 47 insertions(+), 47 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers
  2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
@ 2024-08-30  5:37 ` zhangshida
  2024-08-30  5:37 ` [PATCH 2/4] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: zhangshida @ 2024-08-30  5:37 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, ebiggers
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

For new uptodate buffers we also need to call write_end_fn() to persist the
uptodate content, similarly as folio_zero_new_buffers() does it.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5c6e..a0a55cb8db53 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1389,9 +1389,9 @@ static void ext4_journalled_zero_new_buffers(handle_t *handle,
 					size = min(to, block_end) - start;
 
 					folio_zero_range(folio, start, size);
-					write_end_fn(handle, inode, bh);
 				}
 				clear_buffer_new(bh);
+				write_end_fn(handle, inode, bh);
 			}
 		}
 		block_start = block_end;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
  2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
  2024-08-30  5:37 ` [PATCH 1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida
@ 2024-08-30  5:37 ` zhangshida
  2024-08-30  5:37 ` [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: zhangshida @ 2024-08-30  5:37 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, ebiggers
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

Using __block_write_begin() make it inconvenient to journal the
user data dirty process. We can't tell the block layer maintainer,
‘Hey, we want to trace the dirty user data in ext4, can we add some
special code for ext4 in __block_write_begin?’:P

So use ext4_block_write_begin() instead.

The two functions are basically doing the same thing except for the
fscrypt related code. Remove the unnecessary #ifdef since
fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at
compile time) when !CONFIG_FS_ENCRYPTION.

And hoist the ext4_block_write_begin so that it can be used in other
files.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h   |  2 ++
 fs/ext4/inline.c | 10 +++++-----
 fs/ext4/inode.c  | 24 +++++-------------------
 3 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 08acd152261e..5f8257b68190 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3851,6 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 	return buffer_uptodate(bh);
 }
 
+extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+				  get_block_t *get_block);
 #endif	/* __KERNEL__ */
 
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index e7a09a99837b..0a1a8431e281 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -601,10 +601,10 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
 		goto out;
 
 	if (ext4_should_dioread_nolock(inode)) {
-		ret = __block_write_begin(&folio->page, from, to,
-					  ext4_get_block_unwritten);
+		ret = ext4_block_write_begin(folio, from, to,
+					     ext4_get_block_unwritten);
 	} else
-		ret = __block_write_begin(&folio->page, from, to, ext4_get_block);
+		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
 
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
@@ -856,8 +856,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 			goto out;
 	}
 
-	ret = __block_write_begin(&folio->page, 0, inline_size,
-				  ext4_da_get_block_prep);
+	ret = ext4_block_write_begin(folio, 0, inline_size,
+				     ext4_da_get_block_prep);
 	if (ret) {
 		up_read(&EXT4_I(inode)->xattr_sem);
 		folio_unlock(folio);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a0a55cb8db53..4964c67e029e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1024,10 +1024,10 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 	if (!buffer_mapped(bh) || buffer_freed(bh))
 		return 0;
 	/*
-	 * __block_write_begin() could have dirtied some buffers. Clean
+	 * ext4_block_write_begin() could have dirtied some buffers. Clean
 	 * the dirty bit as jbd2_journal_get_write_access() could complain
 	 * otherwise about fs integrity issues. Setting of the dirty bit
-	 * by __block_write_begin() isn't a real problem here as we clear
+	 * by ext4_block_write_begin() isn't a real problem here as we clear
 	 * the bit before releasing a page lock and thus writeback cannot
 	 * ever write the buffer.
 	 */
@@ -1041,9 +1041,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 	return ret;
 }
 
-#ifdef CONFIG_FS_ENCRYPTION
-static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
-				  get_block_t *get_block)
+int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+			   get_block_t *get_block)
 {
 	unsigned from = pos & (PAGE_SIZE - 1);
 	unsigned to = from + len;
@@ -1134,7 +1133,6 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 
 	return err;
 }
-#endif
 
 /*
  * To preserve ordering, it is essential that the hole instantiation and
@@ -1216,19 +1214,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 	/* In case writeback began while the folio was unlocked */
 	folio_wait_stable(folio);
 
-#ifdef CONFIG_FS_ENCRYPTION
 	if (ext4_should_dioread_nolock(inode))
 		ret = ext4_block_write_begin(folio, pos, len,
 					     ext4_get_block_unwritten);
 	else
 		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
-#else
-	if (ext4_should_dioread_nolock(inode))
-		ret = __block_write_begin(&folio->page, pos, len,
-					  ext4_get_block_unwritten);
-	else
-		ret = __block_write_begin(&folio->page, pos, len, ext4_get_block);
-#endif
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
 					     folio_buffers(folio), from, to,
@@ -1241,7 +1231,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 
 		folio_unlock(folio);
 		/*
-		 * __block_write_begin may have instantiated a few blocks
+		 * ext4_block_write_begin may have instantiated a few blocks
 		 * outside i_size.  Trim these off again. Don't need
 		 * i_size_read because we hold i_rwsem.
 		 *
@@ -2961,11 +2951,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
-#ifdef CONFIG_FS_ENCRYPTION
 	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
-#else
-	ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep);
-#endif
 	if (ret < 0) {
 		folio_unlock(folio);
 		folio_put(folio);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer
  2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
  2024-08-30  5:37 ` [PATCH 1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida
  2024-08-30  5:37 ` [PATCH 2/4] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
@ 2024-08-30  5:37 ` zhangshida
  2024-09-02 13:34   ` Jan Kara
  2024-08-30  5:37 ` [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access zhangshida
  2024-09-05 14:53 ` [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer Theodore Ts'o
  4 siblings, 1 reply; 8+ messages in thread
From: zhangshida @ 2024-08-30  5:37 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, ebiggers
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Baolin Liu,
	Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
-----------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
-----------

The same condition may also be applied to the lattest kernel version.

When blocksize < pagesize and we truncate a file, there can be buffers in
the mapping tail page beyond i_size. These buffers will be filed to
transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
truncation. When the transaction doing truncate starts committing, we can
grow the file again. This calls __block_write_begin() which allocates new
blocks under these buffers in the tail page we go through the branch:

                        if (buffer_new(bh)) {
                                clean_bdev_bh_alias(bh);
                                if (folio_test_uptodate(folio)) {
                                        clear_buffer_new(bh);
                                        set_buffer_uptodate(bh);
                                        mark_buffer_dirty(bh);
                                        continue;
                                }
                                ...
                        }

Hence buffers on BJ_Forget list of the committing transaction get marked
dirty and this triggers the jbd2 assertion.

Teach ext4_block_write_begin() to properly handle files with data
journalling by avoiding dirtying them directly. Instead of
folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
takes care of handling journalling. We also don't need to mark new uptodate
buffers as dirty in ext4_block_write_begin(). That will be either done
either by block_commit_write() in case of success or by
folio_zero_new_buffers() in case of failure.

Reported-by: Baolin Liu <liubaolin@kylinos.cn>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
 fs/ext4/ext4.h   |  3 ++-
 fs/ext4/inline.c |  7 ++++---
 fs/ext4/inode.c  | 42 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5f8257b68190..b653bd423b11 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3851,7 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
 	return buffer_uptodate(bh);
 }
 
-extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
+				  loff_t pos, unsigned len,
 				  get_block_t *get_block);
 #endif	/* __KERNEL__ */
 
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 0a1a8431e281..8d5599d5af27 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -601,10 +601,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
 		goto out;
 
 	if (ext4_should_dioread_nolock(inode)) {
-		ret = ext4_block_write_begin(folio, from, to,
+		ret = ext4_block_write_begin(handle, folio, from, to,
 					     ext4_get_block_unwritten);
 	} else
-		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
+		ret = ext4_block_write_begin(handle, folio, from, to,
+					     ext4_get_block);
 
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
@@ -856,7 +857,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 			goto out;
 	}
 
-	ret = ext4_block_write_begin(folio, 0, inline_size,
+	ret = ext4_block_write_begin(NULL, folio, 0, inline_size,
 				     ext4_da_get_block_prep);
 	if (ret) {
 		up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4964c67e029e..a28f279fd02f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -49,6 +49,11 @@
 
 #include <trace/events/ext4.h>
 
+static void ext4_journalled_zero_new_buffers(handle_t *handle,
+					    struct inode *inode,
+					    struct folio *folio,
+					    unsigned from, unsigned to);
+
 static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
 			      struct ext4_inode_info *ei)
 {
@@ -1041,7 +1046,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 	return ret;
 }
 
-int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
+int ext4_block_write_begin(handle_t *handle, struct folio *folio,
+			   loff_t pos, unsigned len,
 			   get_block_t *get_block)
 {
 	unsigned from = pos & (PAGE_SIZE - 1);
@@ -1055,6 +1061,7 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 	struct buffer_head *bh, *head, *wait[2];
 	int nr_wait = 0;
 	int i;
+	bool should_journal_data = ext4_should_journal_data(inode);
 
 	BUG_ON(!folio_test_locked(folio));
 	BUG_ON(from > PAGE_SIZE);
@@ -1084,10 +1091,22 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 			if (err)
 				break;
 			if (buffer_new(bh)) {
+				/*
+				 * We may be zeroing partial buffers or all new
+				 * buffers in case of failure. Prepare JBD2 for
+				 * that.
+				 */
+				if (should_journal_data)
+					do_journal_get_write_access(handle,
+								    inode, bh);
 				if (folio_test_uptodate(folio)) {
-					clear_buffer_new(bh);
+					/*
+					 * Unlike __block_write_begin() we leave
+					 * dirtying of new uptodate buffers to
+					 * ->write_end() time or
+					 * folio_zero_new_buffers().
+					 */
 					set_buffer_uptodate(bh);
-					mark_buffer_dirty(bh);
 					continue;
 				}
 				if (block_end > to || block_start < from)
@@ -1117,7 +1136,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
 			err = -EIO;
 	}
 	if (unlikely(err)) {
-		folio_zero_new_buffers(folio, from, to);
+		if (should_journal_data)
+			ext4_journalled_zero_new_buffers(handle, inode, folio,
+							 from, to);
+		else
+			folio_zero_new_buffers(folio, from, to);
 	} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
 		for (i = 0; i < nr_wait; i++) {
 			int err2;
@@ -1215,10 +1238,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 	folio_wait_stable(folio);
 
 	if (ext4_should_dioread_nolock(inode))
-		ret = ext4_block_write_begin(folio, pos, len,
+		ret = ext4_block_write_begin(handle, folio, pos, len,
 					     ext4_get_block_unwritten);
 	else
-		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
+		ret = ext4_block_write_begin(handle, folio, pos, len,
+					     ext4_get_block);
 	if (!ret && ext4_should_journal_data(inode)) {
 		ret = ext4_walk_page_buffers(handle, inode,
 					     folio_buffers(folio), from, to,
@@ -2951,7 +2975,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
 
-	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
+	ret = ext4_block_write_begin(NULL, folio, pos, len,
+				     ext4_da_get_block_prep);
 	if (ret < 0) {
 		folio_unlock(folio);
 		folio_put(folio);
@@ -6205,7 +6230,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		if (folio_pos(folio) + len > size)
 			len = size - folio_pos(folio);
 
-		err = __block_write_begin(&folio->page, 0, len, ext4_get_block);
+		err = ext4_block_write_begin(handle, folio, 0, len,
+					     ext4_get_block);
 		if (!err) {
 			ret = VM_FAULT_SIGBUS;
 			if (ext4_journal_folio_buffers(handle, folio, len))
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access
  2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
                   ` (2 preceding siblings ...)
  2024-08-30  5:37 ` [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
@ 2024-08-30  5:37 ` zhangshida
  2024-09-02 13:35   ` Jan Kara
  2024-09-05 14:53 ` [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer Theodore Ts'o
  4 siblings, 1 reply; 8+ messages in thread
From: zhangshida @ 2024-08-30  5:37 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, ebiggers
  Cc: linux-ext4, linux-kernel, zhangshida, starzhangzsd, Jan Kara

From: Shida Zhang <zhangshida@kylinos.cn>

This kinda revert the commit 56d35a4cd13e("ext4: Fix dirtying of
journalled buffers in data=journal mode") made by Jan 14 years ago,
since the do_get_write_access() itself can deal with the extra
unexpected buf dirting things in a proper way now.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
---
 fs/ext4/inode.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a28f279fd02f..2687bf451a25 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1023,27 +1023,11 @@ static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
 int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 				struct buffer_head *bh)
 {
-	int dirty = buffer_dirty(bh);
-	int ret;
-
 	if (!buffer_mapped(bh) || buffer_freed(bh))
 		return 0;
-	/*
-	 * ext4_block_write_begin() could have dirtied some buffers. Clean
-	 * the dirty bit as jbd2_journal_get_write_access() could complain
-	 * otherwise about fs integrity issues. Setting of the dirty bit
-	 * by ext4_block_write_begin() isn't a real problem here as we clear
-	 * the bit before releasing a page lock and thus writeback cannot
-	 * ever write the buffer.
-	 */
-	if (dirty)
-		clear_buffer_dirty(bh);
 	BUFFER_TRACE(bh, "get write access");
-	ret = ext4_journal_get_write_access(handle, inode->i_sb, bh,
+	return ext4_journal_get_write_access(handle, inode->i_sb, bh,
 					    EXT4_JTR_NONE);
-	if (!ret && dirty)
-		ret = ext4_dirty_journalled_data(handle, bh);
-	return ret;
 }
 
 int ext4_block_write_begin(handle_t *handle, struct folio *folio,
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer
  2024-08-30  5:37 ` [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
@ 2024-09-02 13:34   ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2024-09-02 13:34 UTC (permalink / raw)
  To: zhangshida
  Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel,
	zhangshida, Baolin Liu, Jan Kara

On Fri 30-08-24 13:37:38, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
> 
> On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
> an assertion failure will occasionally be triggered by the line below:
> -----------
> jbd2_journal_commit_transaction
> {
> ...
> J_ASSERT_BH(bh, !buffer_dirty(bh));
> /*
> * The buffer on BJ_Forget list and not jbddirty means
> ...
> }
> -----------
> 
> The same condition may also be applied to the lattest kernel version.
> 
> When blocksize < pagesize and we truncate a file, there can be buffers in
> the mapping tail page beyond i_size. These buffers will be filed to
> transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
> truncation. When the transaction doing truncate starts committing, we can
> grow the file again. This calls __block_write_begin() which allocates new
> blocks under these buffers in the tail page we go through the branch:
> 
>                         if (buffer_new(bh)) {
>                                 clean_bdev_bh_alias(bh);
>                                 if (folio_test_uptodate(folio)) {
>                                         clear_buffer_new(bh);
>                                         set_buffer_uptodate(bh);
>                                         mark_buffer_dirty(bh);
>                                         continue;
>                                 }
>                                 ...
>                         }
> 
> Hence buffers on BJ_Forget list of the committing transaction get marked
> dirty and this triggers the jbd2 assertion.
> 
> Teach ext4_block_write_begin() to properly handle files with data
> journalling by avoiding dirtying them directly. Instead of
> folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
> takes care of handling journalling. We also don't need to mark new uptodate
> buffers as dirty in ext4_block_write_begin(). That will be either done
> either by block_commit_write() in case of success or by
> folio_zero_new_buffers() in case of failure.
> 
> Reported-by: Baolin Liu <liubaolin@kylinos.cn>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h   |  3 ++-
>  fs/ext4/inline.c |  7 ++++---
>  fs/ext4/inode.c  | 42 ++++++++++++++++++++++++++++++++++--------
>  3 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5f8257b68190..b653bd423b11 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3851,7 +3851,8 @@ static inline int ext4_buffer_uptodate(struct buffer_head *bh)
>  	return buffer_uptodate(bh);
>  }
>  
> -extern int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> +extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> +				  loff_t pos, unsigned len,
>  				  get_block_t *get_block);
>  #endif	/* __KERNEL__ */
>  
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 0a1a8431e281..8d5599d5af27 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -601,10 +601,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
>  		goto out;
>  
>  	if (ext4_should_dioread_nolock(inode)) {
> -		ret = ext4_block_write_begin(folio, from, to,
> +		ret = ext4_block_write_begin(handle, folio, from, to,
>  					     ext4_get_block_unwritten);
>  	} else
> -		ret = ext4_block_write_begin(folio, from, to, ext4_get_block);
> +		ret = ext4_block_write_begin(handle, folio, from, to,
> +					     ext4_get_block);
>  
>  	if (!ret && ext4_should_journal_data(inode)) {
>  		ret = ext4_walk_page_buffers(handle, inode,
> @@ -856,7 +857,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
>  			goto out;
>  	}
>  
> -	ret = ext4_block_write_begin(folio, 0, inline_size,
> +	ret = ext4_block_write_begin(NULL, folio, 0, inline_size,
>  				     ext4_da_get_block_prep);
>  	if (ret) {
>  		up_read(&EXT4_I(inode)->xattr_sem);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 4964c67e029e..a28f279fd02f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -49,6 +49,11 @@
>  
>  #include <trace/events/ext4.h>
>  
> +static void ext4_journalled_zero_new_buffers(handle_t *handle,
> +					    struct inode *inode,
> +					    struct folio *folio,
> +					    unsigned from, unsigned to);
> +
>  static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
>  			      struct ext4_inode_info *ei)
>  {
> @@ -1041,7 +1046,8 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>  	return ret;
>  }
>  
> -int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> +int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> +			   loff_t pos, unsigned len,
>  			   get_block_t *get_block)
>  {
>  	unsigned from = pos & (PAGE_SIZE - 1);
> @@ -1055,6 +1061,7 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  	struct buffer_head *bh, *head, *wait[2];
>  	int nr_wait = 0;
>  	int i;
> +	bool should_journal_data = ext4_should_journal_data(inode);
>  
>  	BUG_ON(!folio_test_locked(folio));
>  	BUG_ON(from > PAGE_SIZE);
> @@ -1084,10 +1091,22 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			if (err)
>  				break;
>  			if (buffer_new(bh)) {
> +				/*
> +				 * We may be zeroing partial buffers or all new
> +				 * buffers in case of failure. Prepare JBD2 for
> +				 * that.
> +				 */
> +				if (should_journal_data)
> +					do_journal_get_write_access(handle,
> +								    inode, bh);
>  				if (folio_test_uptodate(folio)) {
> -					clear_buffer_new(bh);
> +					/*
> +					 * Unlike __block_write_begin() we leave
> +					 * dirtying of new uptodate buffers to
> +					 * ->write_end() time or
> +					 * folio_zero_new_buffers().
> +					 */
>  					set_buffer_uptodate(bh);
> -					mark_buffer_dirty(bh);
>  					continue;
>  				}
>  				if (block_end > to || block_start < from)
> @@ -1117,7 +1136,11 @@ int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
>  			err = -EIO;
>  	}
>  	if (unlikely(err)) {
> -		folio_zero_new_buffers(folio, from, to);
> +		if (should_journal_data)
> +			ext4_journalled_zero_new_buffers(handle, inode, folio,
> +							 from, to);
> +		else
> +			folio_zero_new_buffers(folio, from, to);
>  	} else if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
>  		for (i = 0; i < nr_wait; i++) {
>  			int err2;
> @@ -1215,10 +1238,11 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>  	folio_wait_stable(folio);
>  
>  	if (ext4_should_dioread_nolock(inode))
> -		ret = ext4_block_write_begin(folio, pos, len,
> +		ret = ext4_block_write_begin(handle, folio, pos, len,
>  					     ext4_get_block_unwritten);
>  	else
> -		ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
> +		ret = ext4_block_write_begin(handle, folio, pos, len,
> +					     ext4_get_block);
>  	if (!ret && ext4_should_journal_data(inode)) {
>  		ret = ext4_walk_page_buffers(handle, inode,
>  					     folio_buffers(folio), from, to,
> @@ -2951,7 +2975,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  	if (IS_ERR(folio))
>  		return PTR_ERR(folio);
>  
> -	ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
> +	ret = ext4_block_write_begin(NULL, folio, pos, len,
> +				     ext4_da_get_block_prep);
>  	if (ret < 0) {
>  		folio_unlock(folio);
>  		folio_put(folio);
> @@ -6205,7 +6230,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
>  		if (folio_pos(folio) + len > size)
>  			len = size - folio_pos(folio);
>  
> -		err = __block_write_begin(&folio->page, 0, len, ext4_get_block);
> +		err = ext4_block_write_begin(handle, folio, 0, len,
> +					     ext4_get_block);
>  		if (!err) {
>  			ret = VM_FAULT_SIGBUS;
>  			if (ext4_journal_folio_buffers(handle, folio, len))
> -- 
> 2.33.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access
  2024-08-30  5:37 ` [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access zhangshida
@ 2024-09-02 13:35   ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2024-09-02 13:35 UTC (permalink / raw)
  To: zhangshida
  Cc: tytso, adilger.kernel, jack, ebiggers, linux-ext4, linux-kernel,
	zhangshida, Jan Kara

On Fri 30-08-24 13:37:39, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
> 
> This kinda revert the commit 56d35a4cd13e("ext4: Fix dirtying of
> journalled buffers in data=journal mode") made by Jan 14 years ago,
> since the do_get_write_access() itself can deal with the extra
> unexpected buf dirting things in a proper way now.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 18 +-----------------
>  1 file changed, 1 insertion(+), 17 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index a28f279fd02f..2687bf451a25 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1023,27 +1023,11 @@ static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
>  int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>  				struct buffer_head *bh)
>  {
> -	int dirty = buffer_dirty(bh);
> -	int ret;
> -
>  	if (!buffer_mapped(bh) || buffer_freed(bh))
>  		return 0;
> -	/*
> -	 * ext4_block_write_begin() could have dirtied some buffers. Clean
> -	 * the dirty bit as jbd2_journal_get_write_access() could complain
> -	 * otherwise about fs integrity issues. Setting of the dirty bit
> -	 * by ext4_block_write_begin() isn't a real problem here as we clear
> -	 * the bit before releasing a page lock and thus writeback cannot
> -	 * ever write the buffer.
> -	 */
> -	if (dirty)
> -		clear_buffer_dirty(bh);
>  	BUFFER_TRACE(bh, "get write access");
> -	ret = ext4_journal_get_write_access(handle, inode->i_sb, bh,
> +	return ext4_journal_get_write_access(handle, inode->i_sb, bh,
>  					    EXT4_JTR_NONE);
> -	if (!ret && dirty)
> -		ret = ext4_dirty_journalled_data(handle, bh);
> -	return ret;
>  }
>  
>  int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> -- 
> 2.33.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer
  2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
                   ` (3 preceding siblings ...)
  2024-08-30  5:37 ` [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access zhangshida
@ 2024-09-05 14:53 ` Theodore Ts'o
  4 siblings, 0 replies; 8+ messages in thread
From: Theodore Ts'o @ 2024-09-05 14:53 UTC (permalink / raw)
  To: adilger.kernel, jack, ebiggers, zhangshida
  Cc: Theodore Ts'o, linux-ext4, linux-kernel, zhangshida


On Fri, 30 Aug 2024 13:37:35 +0800, zhangshida wrote:
> On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
> an assertion failure will occasionally be triggered by the line below:
> ---------
> jbd2_journal_commit_transaction
> {
> ...
> J_ASSERT_BH(bh, !buffer_dirty(bh));
> /*
> * The buffer on BJ_Forget list and not jbddirty means
> ...
> }
> ---------
> 
> [...]

Applied, thanks!

[1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers
      commit: 3910b513fcdf33030798755c722174ef4a827d2a
[2/4] ext4: hoist ext4_block_write_begin and replace the __block_write_begin
      commit: 6b730a405037501a260d6efd24782d2737e65d07
[3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer
      commit: cb3de5fc876ee9ef2b830c9e6cdafac5c90903ef
[4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access
      commit: 183aa1d3baea18b199505455a0247b13de826e2f

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-09-05 14:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-30  5:37 [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer zhangshida
2024-08-30  5:37 ` [PATCH 1/4] ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers zhangshida
2024-08-30  5:37 ` [PATCH 2/4] ext4: hoist ext4_block_write_begin and replace the __block_write_begin zhangshida
2024-08-30  5:37 ` [PATCH 3/4] ext4: fix a potential assertion failure due to improperly dirtied buffer zhangshida
2024-09-02 13:34   ` Jan Kara
2024-08-30  5:37 ` [PATCH 4/4] ext4: remove the special buffer dirty handling in do_journal_get_write_access zhangshida
2024-09-02 13:35   ` Jan Kara
2024-09-05 14:53 ` [PATCH v6 0/4] Fix an error caused by improperly dirtied buffer Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox