[PATCH] btrfs: don't call end_extent_writepage() in __extent

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
@ 2021-08-03  5:53 Qu Wenruo
  2021-09-08 17:03 ` Boris Burkov
  0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2021-08-03  5:53 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
When running generic/475 with 64K page size and 4K sectorsize (aka
subpage), it can trigger the following BUG_ON() inside
btrfs_csum_one_bio(), the possibility is around 1/20 ~ 1/5:

	bio_for_each_segment(bvec, bio, iter) {
		if (!contig)
			offset = page_offset(bvec.bv_page) + bvec.bv_offset;

		if (!ordered) {
			ordered = btrfs_lookup_ordered_extent(inode, offset);
			BUG_ON(!ordered); /* Logic error */ <<<<
		}

		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,

[CAUSE]
Test case generic/475 uses dm-errors to emulate IO failure.

Here if we have a page cache which has the following delalloc range:

	0		32K		64K
	|/////|		|////|		|
	\- [0, 4K)	\- [32K, 36K)

And then __extent_writepage() can go through the following race:

	T1 (writeback)		|	T2 (endio)
--------------------------------+----------------------------------
__extent_writepage()		|
|- writepage_delalloc()		|
|  |- run_delalloc_range()	|
|  |  Add OE for [0, 4K)	|
|  |- run_delalloc_range()	|
|     Add OE for [32K, 36K)	|
|				|
|- __extent_writepage_io()	|
|  |- submit_extent_page()	|
|  |  |- Assemble the bio for	|
|  |     range [0, 4K)		|
|  |- submit_extent_page()	|
|  |  |- Submit the bio for	|
|  |  |  range [0, 4K)		|
|  |  |				| end_bio_extent_writepage()
|  |  |				| |- error = -EIO;
|  |  |				| |- end_extent_writepage( error=-EIO);
|  |  |				|    |- writepage_endio_finish_ordered()
|  |  |				|    |  Remove OE for range [0, 4K)
|  |  |				|    |- btrfs_page_set_error()
|  |- submit_extent_page()	|
|     |- Assemble the bio for	|
|        range [32K, 36K)	|
|- if (PageError(page))		|
|- end_extent_writepage()	|
   |- endio_finish_ordered()	|
      Remove OE [32K, 36K)	|
				|
Submit bio for [32K, 36K)	|
|- btrfs_csum_one_bio()		|
   |- BUG_ON(!ordered_extent)	|
      OE [32K, 36K) is already  |
      removed.			|

This can only happen for subpage case, as for regular sectorsize, we
never submit current page, thus IO error will never mark the current
page Error.

[FIX]
Just remove the end_extent_writepage() call and the if (PageError())
check.

As mentioned, the end_extent_writepage() never really get executed for
regular sectorsize, and could cause above BUG_ON() for subpage.

This also means, inside __extent_writepage() we should not bother any IO
failure, but only focus on the error hit during bio assembly and
submission.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e665779c046d..a1a6ac787faf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4111,8 +4111,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * Here we used to have a check for PageError() and then set @ret and
 	 * call end_extent_writepage().
 	 *
-	 * But in fact setting @ret here will cause different error paths
-	 * between subpage and regular sectorsize.
+	 * But in fact setting @ret and call end_extent_writepage() here will
+	 * cause different error paths between subpage and regular sectorsize.
 	 *
 	 * For regular page size, we never submit current page, but only add
 	 * current page to current bio.
@@ -4124,7 +4124,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * thus can get PageError() set by submitted bio of the same page,
 	 * while our @ret is still 0.
 	 *
-	 * So here we unify the behavior and don't set @ret.
+	 * The same is also for end_extent_writepage(), which can finish
+	 * ordered extent before submitting the real bio, causing
+	 * BUG_ON() in btrfs_csum_one_bio().
+	 *
+	 * So here we unify the behavior and don't set @ret nor call
+	 * end_extent_writepage().
 	 * Error can still be properly passed to higher layer as page will
 	 * be set error, here we just don't handle the IO failure.
 	 *
@@ -4138,8 +4143,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * Currently the full page based __extent_writepage_io() is not
 	 * capable of that.
 	 */
-	if (PageError(page))
-		end_extent_writepage(page, ret, start, page_end);
+
 	unlock_page(page);
 	ASSERT(ret <= 0);
 	return ret;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
  2021-08-03  5:53 [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed Qu Wenruo
@ 2021-09-08 17:03 ` Boris Burkov
  2021-09-08 22:36   ` Qu Wenruo
  0 siblings, 1 reply; 3+ messages in thread
From: Boris Burkov @ 2021-09-08 17:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
> [BUG]
> When running generic/475 with 64K page size and 4K sectorsize (aka
> subpage), it can trigger the following BUG_ON() inside
> btrfs_csum_one_bio(), the possibility is around 1/20 ~ 1/5:
> 
> 	bio_for_each_segment(bvec, bio, iter) {
> 		if (!contig)
> 			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
> 
> 		if (!ordered) {
> 			ordered = btrfs_lookup_ordered_extent(inode, offset);
> 			BUG_ON(!ordered); /* Logic error */ <<<<
> 		}
> 
> 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
> 
> [CAUSE]
> Test case generic/475 uses dm-errors to emulate IO failure.
> 
> Here if we have a page cache which has the following delalloc range:
> 
> 	0		32K		64K
> 	|/////|		|////|		|
> 	\- [0, 4K)	\- [32K, 36K)
> 
> And then __extent_writepage() can go through the following race:
> 
> 	T1 (writeback)		|	T2 (endio)
> --------------------------------+----------------------------------
> __extent_writepage()		|
> |- writepage_delalloc()		|
> |  |- run_delalloc_range()	|
> |  |  Add OE for [0, 4K)	|
> |  |- run_delalloc_range()	|
> |     Add OE for [32K, 36K)	|
> |				|
> |- __extent_writepage_io()	|
> |  |- submit_extent_page()	|
> |  |  |- Assemble the bio for	|
> |  |     range [0, 4K)		|
> |  |- submit_extent_page()	|
> |  |  |- Submit the bio for	|
> |  |  |  range [0, 4K)		|
> |  |  |				| end_bio_extent_writepage()
> |  |  |				| |- error = -EIO;
> |  |  |				| |- end_extent_writepage( error=-EIO);
> |  |  |				|    |- writepage_endio_finish_ordered()
> |  |  |				|    |  Remove OE for range [0, 4K)
> |  |  |				|    |- btrfs_page_set_error()
> |  |- submit_extent_page()	|
> |     |- Assemble the bio for	|
> |        range [32K, 36K)	|
> |- if (PageError(page))		|
> |- end_extent_writepage()	|
>    |- endio_finish_ordered()	|
>       Remove OE [32K, 36K)	|
> 				|
> Submit bio for [32K, 36K)	|
> |- btrfs_csum_one_bio()		|
>    |- BUG_ON(!ordered_extent)	|
>       OE [32K, 36K) is already  |
>       removed.			|
> 
> This can only happen for subpage case, as for regular sectorsize, we
> never submit current page, thus IO error will never mark the current
> page Error.
> 
> [FIX]
> Just remove the end_extent_writepage() call and the if (PageError())
> check.
> 
> As mentioned, the end_extent_writepage() never really get executed for
> regular sectorsize, and could cause above BUG_ON() for subpage.

I was a little surprised to see this assertion, because it begs the
question: "why was this call added in the first place?"

As best as I can tell, it was introduced by Filipe in
"Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"

That looks like a reasonably niche case that might not be covered by
xfstests, so I was wondering if you had already convinced yourself that
it no longer applies.

I'll try to see if I can reproduce his issue with this patch, or if the
code has changed by enough that it no longer reproduces.

> 
> This also means, inside __extent_writepage() we should not bother any IO
> failure, but only focus on the error hit during bio assembly and
> submission.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e665779c046d..a1a6ac787faf 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4111,8 +4111,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>  	 * Here we used to have a check for PageError() and then set @ret and
>  	 * call end_extent_writepage().
>  	 *
> -	 * But in fact setting @ret here will cause different error paths
> -	 * between subpage and regular sectorsize.
> +	 * But in fact setting @ret and call end_extent_writepage() here will
> +	 * cause different error paths between subpage and regular sectorsize.
>  	 *
>  	 * For regular page size, we never submit current page, but only add
>  	 * current page to current bio.
> @@ -4124,7 +4124,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>  	 * thus can get PageError() set by submitted bio of the same page,
>  	 * while our @ret is still 0.
>  	 *
> -	 * So here we unify the behavior and don't set @ret.
> +	 * The same is also for end_extent_writepage(), which can finish
> +	 * ordered extent before submitting the real bio, causing
> +	 * BUG_ON() in btrfs_csum_one_bio().
> +	 *
> +	 * So here we unify the behavior and don't set @ret nor call
> +	 * end_extent_writepage().
>  	 * Error can still be properly passed to higher layer as page will
>  	 * be set error, here we just don't handle the IO failure.
>  	 *
> @@ -4138,8 +4143,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>  	 * Currently the full page based __extent_writepage_io() is not
>  	 * capable of that.
>  	 */
> -	if (PageError(page))
> -		end_extent_writepage(page, ret, start, page_end);
> +
>  	unlock_page(page);
>  	ASSERT(ret <= 0);
>  	return ret;
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
  2021-09-08 17:03 ` Boris Burkov
@ 2021-09-08 22:36   ` Qu Wenruo
  0 siblings, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-09-08 22:36 UTC (permalink / raw)
  To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs



On 2021/9/9 上午1:03, Boris Burkov wrote:
> On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
>> [BUG]
>> When running generic/475 with 64K page size and 4K sectorsize (aka
>> subpage), it can trigger the following BUG_ON() inside
>> btrfs_csum_one_bio(), the possibility is around 1/20 ~ 1/5:
>>
>> 	bio_for_each_segment(bvec, bio, iter) {
>> 		if (!contig)
>> 			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
>>
>> 		if (!ordered) {
>> 			ordered = btrfs_lookup_ordered_extent(inode, offset);
>> 			BUG_ON(!ordered); /* Logic error */ <<<<
>> 		}
>>
>> 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,
>>
>> [CAUSE]
>> Test case generic/475 uses dm-errors to emulate IO failure.
>>
>> Here if we have a page cache which has the following delalloc range:
>>
>> 	0		32K		64K
>> 	|/////|		|////|		|
>> 	\- [0, 4K)	\- [32K, 36K)
>>
>> And then __extent_writepage() can go through the following race:
>>
>> 	T1 (writeback)		|	T2 (endio)
>> --------------------------------+----------------------------------
>> __extent_writepage()		|
>> |- writepage_delalloc()		|
>> |  |- run_delalloc_range()	|
>> |  |  Add OE for [0, 4K)	|
>> |  |- run_delalloc_range()	|
>> |     Add OE for [32K, 36K)	|
>> |				|
>> |- __extent_writepage_io()	|
>> |  |- submit_extent_page()	|
>> |  |  |- Assemble the bio for	|
>> |  |     range [0, 4K)		|
>> |  |- submit_extent_page()	|
>> |  |  |- Submit the bio for	|
>> |  |  |  range [0, 4K)		|
>> |  |  |				| end_bio_extent_writepage()
>> |  |  |				| |- error = -EIO;
>> |  |  |				| |- end_extent_writepage( error=-EIO);
>> |  |  |				|    |- writepage_endio_finish_ordered()
>> |  |  |				|    |  Remove OE for range [0, 4K)
>> |  |  |				|    |- btrfs_page_set_error()
>> |  |- submit_extent_page()	|
>> |     |- Assemble the bio for	|
>> |        range [32K, 36K)	|
>> |- if (PageError(page))		|
>> |- end_extent_writepage()	|
>>     |- endio_finish_ordered()	|
>>        Remove OE [32K, 36K)	|
>> 				|
>> Submit bio for [32K, 36K)	|
>> |- btrfs_csum_one_bio()		|
>>     |- BUG_ON(!ordered_extent)	|
>>        OE [32K, 36K) is already  |
>>        removed.			|
>>
>> This can only happen for subpage case, as for regular sectorsize, we
>> never submit current page, thus IO error will never mark the current
>> page Error.
>>
>> [FIX]
>> Just remove the end_extent_writepage() call and the if (PageError())
>> check.
>>
>> As mentioned, the end_extent_writepage() never really get executed for
>> regular sectorsize, and could cause above BUG_ON() for subpage.
>
> I was a little surprised to see this assertion, because it begs the
> question: "why was this call added in the first place?"
>
> As best as I can tell, it was introduced by Filipe in
> "Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"
>
> That looks like a reasonably niche case that might not be covered by
> xfstests, so I was wondering if you had already convinced yourself that
> it no longer applies.

Not that niche, since the commit message provides a reproducer.

>
> I'll try to see if I can reproduce his issue with this patch, or if the
> code has changed by enough that it no longer reproduces.

There are a lot of more code change since 2014, one of the core change
is 524272607e88 ("btrfs: Handle delalloc error correctly to avoid
ordered extent hang"), which adds proper error handling in
run_delalloc_range().

Feel free to add if you find more commits enhancing the error handling path.

But for now, from the original reproducer and the existing ENOSPC test
groups, I don't think there is anything extra you need to worry.

Thanks,
Qu

>
>>
>> This also means, inside __extent_writepage() we should not bother any IO
>> failure, but only focus on the error hit during bio assembly and
>> submission.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 14 +++++++++-----
>>   1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index e665779c046d..a1a6ac787faf 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4111,8 +4111,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>>   	 * Here we used to have a check for PageError() and then set @ret and
>>   	 * call end_extent_writepage().
>>   	 *
>> -	 * But in fact setting @ret here will cause different error paths
>> -	 * between subpage and regular sectorsize.
>> +	 * But in fact setting @ret and call end_extent_writepage() here will
>> +	 * cause different error paths between subpage and regular sectorsize.
>>   	 *
>>   	 * For regular page size, we never submit current page, but only add
>>   	 * current page to current bio.
>> @@ -4124,7 +4124,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>>   	 * thus can get PageError() set by submitted bio of the same page,
>>   	 * while our @ret is still 0.
>>   	 *
>> -	 * So here we unify the behavior and don't set @ret.
>> +	 * The same is also for end_extent_writepage(), which can finish
>> +	 * ordered extent before submitting the real bio, causing
>> +	 * BUG_ON() in btrfs_csum_one_bio().
>> +	 *
>> +	 * So here we unify the behavior and don't set @ret nor call
>> +	 * end_extent_writepage().
>>   	 * Error can still be properly passed to higher layer as page will
>>   	 * be set error, here we just don't handle the IO failure.
>>   	 *
>> @@ -4138,8 +4143,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>>   	 * Currently the full page based __extent_writepage_io() is not
>>   	 * capable of that.
>>   	 */
>> -	if (PageError(page))
>> -		end_extent_writepage(page, ret, start, page_end);
>> +
>>   	unlock_page(page);
>>   	ASSERT(ret <= 0);
>>   	return ret;
>> --
>> 2.32.0
>>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-09-08 22:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-03  5:53 [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed Qu Wenruo
2021-09-08 17:03 ` Boris Burkov
2021-09-08 22:36   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).