All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: follow regular writeback error handling by clearing block dirty and start/end writeback
@ 2025-07-29  9:31 Qu Wenruo
  2025-07-29  9:31 ` [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed Qu Wenruo
  2025-07-29  9:31 ` [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed Qu Wenruo
  0 siblings, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2025-07-29  9:31 UTC (permalink / raw)
  To: linux-btrfs

Recently I'm working on various error handling fixes during
run_delalloc_range(), but sometimes I also hit generic/475 kernel
warnings where no failure from run_delalloc_range().

It turns out that extent_writepage_io() is not following the common
writeback error handling pattern, and leave failed blocks dirty.

This can lead to writeback of those failed blocks again, and the next
time we will hit various possible problems, but the most common one will
be the DEBUG_WARN() from btrfs_writepage_cow_fixup(), exactly the bug
I'm hunting.

So this series fixes the two error handling paths which still leaves the
blocks dirty (even intentionally, and of course that's caused by
myself), to clear the dirty flag and start/finish writeback for involved
block(s).

Qu Wenruo (2):
  btrfs: clear block dirty if submit_one_sector() failed
  btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed

 fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed
  2025-07-29  9:31 [PATCH 0/2] btrfs: follow regular writeback error handling by clearing block dirty and start/end writeback Qu Wenruo
@ 2025-07-29  9:31 ` Qu Wenruo
  2025-07-29 20:49   ` Leo Martins
  2025-08-05 11:01   ` Filipe Manana
  2025-07-29  9:31 ` [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed Qu Wenruo
  1 sibling, 2 replies; 7+ messages in thread
From: Qu Wenruo @ 2025-07-29  9:31 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
If submit_one_sector() failed, the block will be kept dirty, but with
their corresponding range finished in the ordered extent.

This means if later a writeback happens again, we can hit the following
problems:

- ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
  If the original extent map is a hole, then we can hit this case, as
  the new ordered extent failed, we will drop the new extent map and
  re-read one from the disk.

- DEBUG_WARN() in btrfs_writepage_cow_fixup()
  This is because we no longer have an ordered extent for those dirty
  blocks. The original for them is already finished with error.

[CAUSE]
The function submit_one_sector() is not following the regular error
handling of writeback.
The common practice is to clear the folio dirty, start and finish the
writeback for the block.

This is normally done by extent_clear_unlock_delalloc() with
PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
run_delalloc_range().

So if we keep those failed blocks dirty, they will stay in the page
cache and wait for the next writeback.

And since the original ordered extent is already finished and removed,
depending on the original extent map, we either hit the ASSERT() inside
submit_one_sector(), or hit the DEBUG_WARN() in
btrfs_writepage_cow_fixup().

[FIX]
Follow the regular error handling to clear the dirty flag for the block,
start and finish writeback for that block instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c11c93bcc7f6..f6765ddab4a7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1548,7 +1548,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 
 /*
  * Return 0 if we have submitted or queued the sector for submission.
- * Return <0 for critical errors.
+ * Return <0 for critical errors, and the sector will have its dirty flag cleared.
  *
  * Caller should make sure filepos < i_size and handle filepos >= i_size case.
  */
@@ -1571,8 +1571,19 @@ static int submit_one_sector(struct btrfs_inode *inode,
 	ASSERT(filepos < i_size);
 
 	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
+	if (IS_ERR(em)) {
+		int ret = PTR_ERR(em);
+
+		/*
+		 * When submission failed, we should still clear the folio dirty.
+		 * Or the folio will be written back again but without any
+		 * ordered extent.
+		 */
+		btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
+		btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
+		return ret;
+	}
 
 	extent_offset = filepos - em->start;
 	em_end = btrfs_extent_map_end(em);
@@ -1702,8 +1713,9 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 	 * Here we set writeback and clear for the range. If the full folio
 	 * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
 	 *
-	 * If we hit any error, the corresponding sector will still be dirty
-	 * thus no need to clear PAGECACHE_TAG_DIRTY.
+	 * If we hit any error, the corresponding sector will still have its
+	 * dirty flag cleared and finish writeback just like below, thus
+	 * no need to handle error case either.
 	 */
 	if (!submitted_io && !error) {
 		btrfs_folio_set_writeback(fs_info, folio, start, len);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed
  2025-07-29  9:31 [PATCH 0/2] btrfs: follow regular writeback error handling by clearing block dirty and start/end writeback Qu Wenruo
  2025-07-29  9:31 ` [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed Qu Wenruo
@ 2025-07-29  9:31 ` Qu Wenruo
  2025-08-05 11:11   ` Filipe Manana
  1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2025-07-29  9:31 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
If btrfs_writepage_cow_fixup() failed (return value not 0 nor -EGAIN),
the block will be kept dirty, but with their corresponding range finished
in the ordered extent.

Currently the only error pattern is only possible for experimental
builds, which places extra check to ensure we shouldn't hit an dirty
block without a corresponding ordered extent.

This means if later a writeback happens again, we can hit the following
problems:

- ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
  If the original extent map is a hole, then we can hit this case, as
  the new ordered extent failed, we will drop the new extent map and
  re-read one from the disk.

- DEBUG_WARN() in btrfs_writepage_cow_fixup()
  This is because we no longer have an ordered extent for those dirty
  blocks. The original for them is already finished with error.

[CAUSE]
The function btrfs_writepage_cow_fixup() is not following the regular
error handling of writeback.
The common practice is to clear the folio dirty, start and finish the
writeback for the block.

This is normally done by extent_clear_unlock_delalloc() with
PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
run_delalloc_range().

So if we keep those failed blocks dirty, they will stay in the page
cache and wait for the next writeback.

And since the original ordered extent is already finished and removed,
depending on the original extent map, we either hit the ASSERT() inside
submit_one_sector(), or hit the DEBUG_WARN() in
btrfs_writepage_cow_fixup() again (and very ironic).

[FIX]
Follow the regular error handling to clear the dirty flag for the block
range, start and finish writeback for that block range instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f6765ddab4a7..b2ff2a445b80 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1656,8 +1656,12 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
 		folio_unlock(folio);
 		return 1;
 	}
-	if (ret < 0)
+	if (ret < 0) {
+		btrfs_folio_clear_dirty(fs_info, folio, start, len);
+		btrfs_folio_set_writeback(fs_info, folio, start, len);
+		btrfs_folio_clear_writeback(fs_info, folio, start, len);
 		return ret;
+	}
 
 	for (cur = start; cur < start + len; cur += fs_info->sectorsize)
 		set_bit((cur - folio_start) >> fs_info->sectorsize_bits, &range_bitmap);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed
  2025-07-29  9:31 ` [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed Qu Wenruo
@ 2025-07-29 20:49   ` Leo Martins
  2025-07-29 21:53     ` Qu Wenruo
  2025-08-05 11:01   ` Filipe Manana
  1 sibling, 1 reply; 7+ messages in thread
From: Leo Martins @ 2025-07-29 20:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, 29 Jul 2025 19:01:45 +0930 Qu Wenruo <wqu@suse.com> wrote:

> [BUG]
> If submit_one_sector() failed, the block will be kept dirty, but with
> their corresponding range finished in the ordered extent.
> 
> This means if later a writeback happens again, we can hit the following
> problems:
> 
> - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
>   If the original extent map is a hole, then we can hit this case, as
>   the new ordered extent failed, we will drop the new extent map and
>   re-read one from the disk.
> 
> - DEBUG_WARN() in btrfs_writepage_cow_fixup()
>   This is because we no longer have an ordered extent for those dirty
>   blocks. The original for them is already finished with error.
> 
> [CAUSE]
> The function submit_one_sector() is not following the regular error
> handling of writeback.
> The common practice is to clear the folio dirty, start and finish the
> writeback for the block.
> 
> This is normally done by extent_clear_unlock_delalloc() with
> PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
> run_delalloc_range().
> 
> So if we keep those failed blocks dirty, they will stay in the page
> cache and wait for the next writeback.
> 
> And since the original ordered extent is already finished and removed,
> depending on the original extent map, we either hit the ASSERT() inside
> submit_one_sector(), or hit the DEBUG_WARN() in
> btrfs_writepage_cow_fixup().
> 
> [FIX]
> Follow the regular error handling to clear the dirty flag for the block,
> start and finish writeback for that block instead.

Might be a dumb question, but if the page failed to finish writeback why
are we allowed to clear the dirty flag? Wouldn't the page still have dirty
data?

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c11c93bcc7f6..f6765ddab4a7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1548,7 +1548,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  
>  /*
>   * Return 0 if we have submitted or queued the sector for submission.
> - * Return <0 for critical errors.
> + * Return <0 for critical errors, and the sector will have its dirty flag cleared.
>   *
>   * Caller should make sure filepos < i_size and handle filepos >= i_size case.
>   */
> @@ -1571,8 +1571,19 @@ static int submit_one_sector(struct btrfs_inode *inode,
>  	ASSERT(filepos < i_size);
>  
>  	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
> -	if (IS_ERR(em))
> -		return PTR_ERR(em);
> +	if (IS_ERR(em)) {
> +		int ret = PTR_ERR(em);
> +
> +		/*
> +		 * When submission failed, we should still clear the folio dirty.
> +		 * Or the folio will be written back again but without any
> +		 * ordered extent.
> +		 */
> +		btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
> +		btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
> +		btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
> +		return ret;
> +	}
>  
>  	extent_offset = filepos - em->start;
>  	em_end = btrfs_extent_map_end(em);
> @@ -1702,8 +1713,9 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>  	 * Here we set writeback and clear for the range. If the full folio
>  	 * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
>  	 *
> -	 * If we hit any error, the corresponding sector will still be dirty
> -	 * thus no need to clear PAGECACHE_TAG_DIRTY.
> +	 * If we hit any error, the corresponding sector will still have its
> +	 * dirty flag cleared and finish writeback just like below, thus
> +	 * no need to handle error case either.
>  	 */
>  	if (!submitted_io && !error) {
>  		btrfs_folio_set_writeback(fs_info, folio, start, len);
> -- 
> 2.50.1

Sent using hkml (https://github.com/sjp38/hackermail)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed
  2025-07-29 20:49   ` Leo Martins
@ 2025-07-29 21:53     ` Qu Wenruo
  0 siblings, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2025-07-29 21:53 UTC (permalink / raw)
  To: Leo Martins; +Cc: linux-btrfs



在 2025/7/30 06:19, Leo Martins 写道:
> On Tue, 29 Jul 2025 19:01:45 +0930 Qu Wenruo <wqu@suse.com> wrote:
> 
>> [BUG]
>> If submit_one_sector() failed, the block will be kept dirty, but with
>> their corresponding range finished in the ordered extent.
>>
>> This means if later a writeback happens again, we can hit the following
>> problems:
>>
>> - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
>>    If the original extent map is a hole, then we can hit this case, as
>>    the new ordered extent failed, we will drop the new extent map and
>>    re-read one from the disk.
>>
>> - DEBUG_WARN() in btrfs_writepage_cow_fixup()
>>    This is because we no longer have an ordered extent for those dirty
>>    blocks. The original for them is already finished with error.
>>
>> [CAUSE]
>> The function submit_one_sector() is not following the regular error
>> handling of writeback.
>> The common practice is to clear the folio dirty, start and finish the
>> writeback for the block.
>>
>> This is normally done by extent_clear_unlock_delalloc() with
>> PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
>> run_delalloc_range().
>>
>> So if we keep those failed blocks dirty, they will stay in the page
>> cache and wait for the next writeback.
>>
>> And since the original ordered extent is already finished and removed,
>> depending on the original extent map, we either hit the ASSERT() inside
>> submit_one_sector(), or hit the DEBUG_WARN() in
>> btrfs_writepage_cow_fixup().
>>
>> [FIX]
>> Follow the regular error handling to clear the dirty flag for the block,
>> start and finish writeback for that block instead.
> 
> Might be a dumb question, but if the page failed to finish writeback why
> are we allowed to clear the dirty flag? Wouldn't the page still have dirty
> data?

The error is recorded into the mapping by mapping_set_error().

There is no better solution than clearing dirty and finishing the 
writeback, and that's also shared between all common fses and iomap.

Thanks,
Qu

> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 22 +++++++++++++++++-----
>>   1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index c11c93bcc7f6..f6765ddab4a7 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -1548,7 +1548,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>>   
>>   /*
>>    * Return 0 if we have submitted or queued the sector for submission.
>> - * Return <0 for critical errors.
>> + * Return <0 for critical errors, and the sector will have its dirty flag cleared.
>>    *
>>    * Caller should make sure filepos < i_size and handle filepos >= i_size case.
>>    */
>> @@ -1571,8 +1571,19 @@ static int submit_one_sector(struct btrfs_inode *inode,
>>   	ASSERT(filepos < i_size);
>>   
>>   	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
>> -	if (IS_ERR(em))
>> -		return PTR_ERR(em);
>> +	if (IS_ERR(em)) {
>> +		int ret = PTR_ERR(em);
>> +
>> +		/*
>> +		 * When submission failed, we should still clear the folio dirty.
>> +		 * Or the folio will be written back again but without any
>> +		 * ordered extent.
>> +		 */
>> +		btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
>> +		btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
>> +		btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
>> +		return ret;
>> +	}
>>   
>>   	extent_offset = filepos - em->start;
>>   	em_end = btrfs_extent_map_end(em);
>> @@ -1702,8 +1713,9 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>>   	 * Here we set writeback and clear for the range. If the full folio
>>   	 * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
>>   	 *
>> -	 * If we hit any error, the corresponding sector will still be dirty
>> -	 * thus no need to clear PAGECACHE_TAG_DIRTY.
>> +	 * If we hit any error, the corresponding sector will still have its
>> +	 * dirty flag cleared and finish writeback just like below, thus
>> +	 * no need to handle error case either.
>>   	 */
>>   	if (!submitted_io && !error) {
>>   		btrfs_folio_set_writeback(fs_info, folio, start, len);
>> -- 
>> 2.50.1
> 
> Sent using hkml (https://github.com/sjp38/hackermail)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed
  2025-07-29  9:31 ` [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed Qu Wenruo
  2025-07-29 20:49   ` Leo Martins
@ 2025-08-05 11:01   ` Filipe Manana
  1 sibling, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2025-08-05 11:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jul 29, 2025 at 10:32 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> If submit_one_sector() failed, the block will be kept dirty, but with
> their corresponding range finished in the ordered extent.
>
> This means if later a writeback happens again, we can hit the following
> problems:
>
> - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
>   If the original extent map is a hole, then we can hit this case, as
>   the new ordered extent failed, we will drop the new extent map and
>   re-read one from the disk.
>
> - DEBUG_WARN() in btrfs_writepage_cow_fixup()
>   This is because we no longer have an ordered extent for those dirty
>   blocks. The original for them is already finished with error.
>
> [CAUSE]
> The function submit_one_sector() is not following the regular error
> handling of writeback.
> The common practice is to clear the folio dirty, start and finish the
> writeback for the block.
>
> This is normally done by extent_clear_unlock_delalloc() with
> PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
> run_delalloc_range().
>
> So if we keep those failed blocks dirty, they will stay in the page
> cache and wait for the next writeback.
>
> And since the original ordered extent is already finished and removed,
> depending on the original extent map, we either hit the ASSERT() inside
> submit_one_sector(), or hit the DEBUG_WARN() in
> btrfs_writepage_cow_fixup().
>
> [FIX]
> Follow the regular error handling to clear the dirty flag for the block,
> start and finish writeback for that block instead.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c11c93bcc7f6..f6765ddab4a7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1548,7 +1548,7 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>
>  /*
>   * Return 0 if we have submitted or queued the sector for submission.
> - * Return <0 for critical errors.
> + * Return <0 for critical errors, and the sector will have its dirty flag cleared.
>   *
>   * Caller should make sure filepos < i_size and handle filepos >= i_size case.
>   */
> @@ -1571,8 +1571,19 @@ static int submit_one_sector(struct btrfs_inode *inode,
>         ASSERT(filepos < i_size);
>
>         em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
> -       if (IS_ERR(em))
> -               return PTR_ERR(em);
> +       if (IS_ERR(em)) {
> +               int ret = PTR_ERR(em);
> +
> +               /*
> +                * When submission failed, we should still clear the folio dirty.
> +                * Or the folio will be written back again but without any
> +                * ordered extent.
> +                */
> +               btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
> +               btrfs_folio_set_writeback(fs_info, folio, filepos, sectorsize);
> +               btrfs_folio_clear_writeback(fs_info, folio, filepos, sectorsize);
> +               return ret;

No need for the variable 'ret', it's only used once, we can directly:
return PTR_ERR(em);

> +       }
>
>         extent_offset = filepos - em->start;
>         em_end = btrfs_extent_map_end(em);
> @@ -1702,8 +1713,9 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>          * Here we set writeback and clear for the range. If the full folio
>          * is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
>          *
> -        * If we hit any error, the corresponding sector will still be dirty
> -        * thus no need to clear PAGECACHE_TAG_DIRTY.
> +        * If we hit any error, the corresponding sector will still have its

"will still have" is odd, it gives the idea the dirty flag was not set
before - it was, but we cleared it when we got an error.

I'd suggest:  "If we hit any error, the corresponding sector will have
its dirty flag cleared and writeback finished ..."

Otherwise it looks good, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>


> +        * dirty flag cleared and finish writeback just like below, thus
> +        * no need to handle error case either.
>          */
>         if (!submitted_io && !error) {
>                 btrfs_folio_set_writeback(fs_info, folio, start, len);
> --
> 2.50.1
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed
  2025-07-29  9:31 ` [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed Qu Wenruo
@ 2025-08-05 11:11   ` Filipe Manana
  0 siblings, 0 replies; 7+ messages in thread
From: Filipe Manana @ 2025-08-05 11:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jul 29, 2025 at 10:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> If btrfs_writepage_cow_fixup() failed (return value not 0 nor -EGAIN),

EGAIN > EAGAIN

> the block will be kept dirty, but with their corresponding range finished
> in the ordered extent.
>
> Currently the only error pattern is only possible for experimental
> builds, which places extra check to ensure we shouldn't hit an dirty

an dirty -> a dirty

Otherwise it looks good, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> block without a corresponding ordered extent.
>
> This means if later a writeback happens again, we can hit the following
> problems:
>
> - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector()
>   If the original extent map is a hole, then we can hit this case, as
>   the new ordered extent failed, we will drop the new extent map and
>   re-read one from the disk.
>
> - DEBUG_WARN() in btrfs_writepage_cow_fixup()
>   This is because we no longer have an ordered extent for those dirty
>   blocks. The original for them is already finished with error.
>
> [CAUSE]
> The function btrfs_writepage_cow_fixup() is not following the regular
> error handling of writeback.
> The common practice is to clear the folio dirty, start and finish the
> writeback for the block.
>
> This is normally done by extent_clear_unlock_delalloc() with
> PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during
> run_delalloc_range().
>
> So if we keep those failed blocks dirty, they will stay in the page
> cache and wait for the next writeback.
>
> And since the original ordered extent is already finished and removed,
> depending on the original extent map, we either hit the ASSERT() inside
> submit_one_sector(), or hit the DEBUG_WARN() in
> btrfs_writepage_cow_fixup() again (and very ironic).
>
> [FIX]
> Follow the regular error handling to clear the dirty flag for the block
> range, start and finish writeback for that block range instead.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f6765ddab4a7..b2ff2a445b80 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1656,8 +1656,12 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>                 folio_unlock(folio);
>                 return 1;
>         }
> -       if (ret < 0)
> +       if (ret < 0) {
> +               btrfs_folio_clear_dirty(fs_info, folio, start, len);
> +               btrfs_folio_set_writeback(fs_info, folio, start, len);
> +               btrfs_folio_clear_writeback(fs_info, folio, start, len);
>                 return ret;
> +       }
>
>         for (cur = start; cur < start + len; cur += fs_info->sectorsize)
>                 set_bit((cur - folio_start) >> fs_info->sectorsize_bits, &range_bitmap);
> --
> 2.50.1
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-05 11:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29  9:31 [PATCH 0/2] btrfs: follow regular writeback error handling by clearing block dirty and start/end writeback Qu Wenruo
2025-07-29  9:31 ` [PATCH 1/2] btrfs: clear block dirty if submit_one_sector() failed Qu Wenruo
2025-07-29 20:49   ` Leo Martins
2025-07-29 21:53     ` Qu Wenruo
2025-08-05 11:01   ` Filipe Manana
2025-07-29  9:31 ` [PATCH 2/2] btrfs: clear block dirty if btrfs_writepage_cow_fixup() failed Qu Wenruo
2025-08-05 11:11   ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.