* [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Qu Wenruo @ 2021-08-03 5:53 UTC
To: linux-btrfs
[BUG]
When running generic/475 with 64K page size and 4K sectorsize (aka
subpage), the following BUG_ON() inside btrfs_csum_one_bio() can be
triggered, with a probability of roughly 1/20 to 1/5:

	bio_for_each_segment(bvec, bio, iter) {
		if (!contig)
			offset = page_offset(bvec.bv_page) + bvec.bv_offset;

		if (!ordered) {
			ordered = btrfs_lookup_ordered_extent(inode, offset);
			BUG_ON(!ordered); /* Logic error */ <<<<
		}

		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info,

[CAUSE]
Test case generic/475 uses dm-error to emulate IO failures.

Suppose a page in the page cache has the following delalloc ranges:

        0               32K             64K
        |/////|         |////|          |
        \- [0, 4K)      \- [32K, 36K)

And then __extent_writepage() can go through the following race:

           T1 (writeback)       |     T2 (endio)
--------------------------------+----------------------------------
__extent_writepage()            |
|- writepage_delalloc()         |
|  |- run_delalloc_range()      |
|  |  Add OE for [0, 4K)        |
|  |- run_delalloc_range()      |
|     Add OE for [32K, 36K)     |
|                               |
|- __extent_writepage_io()      |
|  |- submit_extent_page()      |
|  |  |- Assemble the bio for   |
|  |     range [0, 4K)          |
|  |- submit_extent_page()      |
|  |  |- Submit the bio for     |
|  |  |  range [0, 4K)          |
|  |  |                         |   end_bio_extent_writepage()
|  |  |                         |   |- error = -EIO;
|  |  |                         |   |- end_extent_writepage(error=-EIO);
|  |  |                         |      |- writepage_endio_finish_ordered()
|  |  |                         |      |  Remove OE for range [0, 4K)
|  |  |                         |      |- btrfs_page_set_error()
|  |- submit_extent_page()      |
|     |- Assemble the bio for   |
|        range [32K, 36K)       |
|- if (PageError(page))         |
|- end_extent_writepage()       |
   |- endio_finish_ordered()    |
      Remove OE [32K, 36K)      |
                                |
Submit bio for [32K, 36K)       |
|- btrfs_csum_one_bio()         |
   |- BUG_ON(!ordered_extent)   |
      OE [32K, 36K) is already  |
      removed.                  |

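This can only happen in the subpage case. With regular sectorsize,
__extent_writepage() never submits the bio containing the current page, it
only adds the page to the current bio, so an IO error can never mark the
current page with PageError before __extent_writepage() returns.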
[FIX]
Just remove the end_extent_writepage() call and the if (PageError())
check.

As mentioned, the end_extent_writepage() call never really gets executed
for regular sectorsize, and can cause the above BUG_ON() for subpage.

This also means that __extent_writepage() should not try to handle IO
failures at all, and should only deal with errors hit during bio assembly
and submission.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e665779c046d..a1a6ac787faf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4111,8 +4111,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * Here we used to have a check for PageError() and then set @ret and
 	 * call end_extent_writepage().
 	 *
-	 * But in fact setting @ret here will cause different error paths
-	 * between subpage and regular sectorsize.
+	 * But in fact setting @ret and call end_extent_writepage() here will
+	 * cause different error paths between subpage and regular sectorsize.
 	 *
 	 * For regular page size, we never submit current page, but only add
 	 * current page to current bio.
@@ -4124,7 +4124,12 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * thus can get PageError() set by submitted bio of the same page,
 	 * while our @ret is still 0.
 	 *
-	 * So here we unify the behavior and don't set @ret.
+	 * The same is also for end_extent_writepage(), which can finish
+	 * ordered extent before submitting the real bio, causing
+	 * BUG_ON() in btrfs_csum_one_bio().
+	 *
+	 * So here we unify the behavior and don't set @ret nor call
+	 * end_extent_writepage().
 	 * Error can still be properly passed to higher layer as page will
 	 * be set error, here we just don't handle the IO failure.
 	 *
@@ -4138,8 +4143,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	 * Currently the full page based __extent_writepage_io() is not
 	 * capable of that.
 	 */
-	if (PageError(page))
-		end_extent_writepage(page, ret, start, page_end);
+
 	unlock_page(page);
 	ASSERT(ret <= 0);
 	return ret;
--
2.32.0

* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Boris Burkov @ 2021-09-08 17:03 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
> [...]
> [FIX]
> Just remove the end_extent_writepage() call and the if (PageError())
> check.
>
> As mentioned, the end_extent_writepage() never really get executed for
> regular sectorsize, and could cause above BUG_ON() for subpage.

I was a little surprised to see this assertion, because it begs the
question: "why was this call added in the first place?"

As best as I can tell, it was introduced by Filipe in
"Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"

That looks like a reasonably niche case that might not be covered by
xfstests, so I was wondering if you had already convinced yourself that
it no longer applies.

I'll try to see if I can reproduce his issue with this patch, or if the
code has changed by enough that it no longer reproduces.

> [...]

* Re: [PATCH] btrfs: don't call end_extent_writepage() in __extent_writepage() when IO failed
From: Qu Wenruo @ 2021-09-08 22:36 UTC
To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs

On 2021/9/9 1:03 AM, Boris Burkov wrote:
> On Tue, Aug 03, 2021 at 01:53:48PM +0800, Qu Wenruo wrote:
>> [...]
>> As mentioned, the end_extent_writepage() never really get executed for
>> regular sectorsize, and could cause above BUG_ON() for subpage.
>
> I was a little surprised to see this assertion, because it begs the
> question: "why was this call added in the first place?"
>
> As best as I can tell, it was introduced by Filipe in
> "Btrfs: fix hang on error (such as ENOSPC) when writing extent pages"
>
> That looks like a reasonably niche case that might not be covered by
> xfstests, so I was wondering if you had already convinced yourself that
> it no longer applies.

Not that niche, since the commit message provides a reproducer.

> I'll try to see if I can reproduce his issue with this patch, or if the
> code has changed by enough that it no longer reproduces.

There have been a lot of code changes since 2014. One of the core changes
is 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered
extent hang"), which adds proper error handling in run_delalloc_range().

Feel free to add more if you find other commits that improve the error
handling path.

But for now, based on the original reproducer and the existing ENOSPC test
groups, I don't think there is anything extra you need to worry about.

Thanks,
Qu

>> [...]