* [PATCH 0/3] btrfs: fsstress hang fix for large data folios
@ 2025-04-04 1:47 Qu Wenruo
2025-04-04 1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04 1:47 UTC (permalink / raw)
To: linux-btrfs
With my local large data folios enabled btrfs, fsstress can hit a hang
where lock_delalloc_folios() fails to lock a folio.
It turns our that filemap_get_folios_contig() can return duplicated
folios, which is fatal for lock_delalloc_folios() and
unlock_delalloc_folio().
The series here is to address the annonying filemap_get_folios_contig()
behavior for large folios.
The first patch is a small cleanup to remove unnecessary early exits,
exposed during the next folio_contains() change.
The second patch uses folio_contains() to handle EOF detection for large
folios.
The final one is the fix of filemap_get_folios_contig(), by getting rid
of it.
Qu Wenruo (3):
btrfs: remove unnecessary early exits in delalloc folio lock and
unlock
btrfs: use folio_contains() for EOF detection
btrfs: get rid of filemap_get_folios_contig() calls
fs/btrfs/compression.c | 2 +-
fs/btrfs/extent_io.c | 20 +++++---------------
fs/btrfs/tests/extent-io-tests.c | 3 +--
3 files changed, 7 insertions(+), 18 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock 2025-04-04 1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo @ 2025-04-04 1:47 ` Qu Wenruo 2025-04-04 16:04 ` Filipe Manana 2025-04-04 1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo 2025-04-04 1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo 2 siblings, 1 reply; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 1:47 UTC (permalink / raw) To: linux-btrfs Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we have the following early exist: if (index == locked_folio->index && end_index == index) return; This allows us to exist early if the range are inside the same locked folio. But even without the above early check, the existing code can handle it well, as both __process_folios_contig() and lock_delalloc_folios() will skip any folio page lock/unlock if it's on the locked folio. Just remove those unnecessary early exits. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent_io.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8b29eeac3884..013268f70621 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, const struct folio *locked_folio, u64 start, u64 end) { - unsigned long index = start >> PAGE_SHIFT; - unsigned long end_index = end >> PAGE_SHIFT; - ASSERT(locked_folio); - if (index == locked_folio->index && end_index == index) - return; __process_folios_contig(inode->i_mapping, locked_folio, start, end, PAGE_UNLOCK); @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode, u64 processed_end = start; struct folio_batch fbatch; - if (index == locked_folio->index && index == end_index) - return 0; - folio_batch_init(&fbatch); while (index <= end_index) { unsigned int found_folios, i; -- 2.49.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock 2025-04-04 1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo @ 2025-04-04 16:04 ` Filipe Manana 2025-04-04 21:44 ` Qu Wenruo 0 siblings, 1 reply; 16+ messages in thread From: Filipe Manana @ 2025-04-04 16:04 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: > > Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we function -> functions > have the following early exist: exist -> exit > > if (index == locked_folio->index && end_index == index) > return; > > This allows us to exist early if the range are inside the same locked exist -> exit the range are inside -> the range is inside > folio. > > But even without the above early check, the existing code can handle it > well, as both __process_folios_contig() and lock_delalloc_folios() will > skip any folio page lock/unlock if it's on the locked folio. > > Just remove those unnecessary early exits. It looks good to me from a functional point of view. But without this early exits there's a bit of work done, from function calls to initializing and releasing folio batches, calling filemap_get_folios_contig() which will search the mapping's xarray and always grab one folio and add it to the batch, etc. It's not uncommon to do IO on a range not spanning more than one folio, especially when supporting large folios, which will be more likely. I understand the goal here is to remove some code, but this code is cheaper compared to all that unnecessary overhead. Have you considered that? Thanks. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/extent_io.c | 8 -------- > 1 file changed, 8 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 8b29eeac3884..013268f70621 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, > const struct folio *locked_folio, > u64 start, u64 end) > { > - unsigned long index = start >> PAGE_SHIFT; > - unsigned long end_index = end >> PAGE_SHIFT; > - > ASSERT(locked_folio); > - if (index == locked_folio->index && end_index == index) > - return; > > __process_folios_contig(inode->i_mapping, locked_folio, start, end, > PAGE_UNLOCK); > @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode, > u64 processed_end = start; > struct folio_batch fbatch; > > - if (index == locked_folio->index && index == end_index) > - return 0; > - > folio_batch_init(&fbatch); > while (index <= end_index) { > unsigned int found_folios, i; > -- > 2.49.0 > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock 2025-04-04 16:04 ` Filipe Manana @ 2025-04-04 21:44 ` Qu Wenruo 2025-04-05 17:54 ` Filipe Manana 0 siblings, 1 reply; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 21:44 UTC (permalink / raw) To: Filipe Manana; +Cc: linux-btrfs 在 2025/4/5 02:34, Filipe Manana 写道: > On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: >> >> Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we > > function -> functions > >> have the following early exist: > > exist -> exit > >> >> if (index == locked_folio->index && end_index == index) >> return; >> >> This allows us to exist early if the range are inside the same locked > > exist -> exit > the range are inside -> the range is inside > >> folio. >> >> But even without the above early check, the existing code can handle it >> well, as both __process_folios_contig() and lock_delalloc_folios() will >> skip any folio page lock/unlock if it's on the locked folio. >> >> Just remove those unnecessary early exits. > > It looks good to me from a functional point of view. > > But without this early exits there's a bit of work done, from function > calls to initializing and releasing folio batches, calling > filemap_get_folios_contig() which > will search the mapping's xarray and always grab one folio and add it > to the batch, etc. > > It's not uncommon to do IO on a range not spanning more than one > folio, especially when supporting large folios, which will be more > likely. > I understand the goal here is to remove some code, but this code is > cheaper compared to all that unnecessary overhead. > > Have you considered that? Yes, but the main reason here is the usage of (folio->index == index) check. With large folios, such check is no longer reliable anyway, and initially I thought to just change it to folio_contains(), but it turns out that is not really needed either. So that's the main reason to get rid this of these early exits. Thanks, Qu > > Thanks. > >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/extent_io.c | 8 -------- >> 1 file changed, 8 deletions(-) >> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index 8b29eeac3884..013268f70621 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, >> const struct folio *locked_folio, >> u64 start, u64 end) >> { >> - unsigned long index = start >> PAGE_SHIFT; >> - unsigned long end_index = end >> PAGE_SHIFT; >> - >> ASSERT(locked_folio); >> - if (index == locked_folio->index && end_index == index) >> - return; >> >> __process_folios_contig(inode->i_mapping, locked_folio, start, end, >> PAGE_UNLOCK); >> @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode, >> u64 processed_end = start; >> struct folio_batch fbatch; >> >> - if (index == locked_folio->index && index == end_index) >> - return 0; >> - >> folio_batch_init(&fbatch); >> while (index <= end_index) { >> unsigned int found_folios, i; >> -- >> 2.49.0 >> >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock 2025-04-04 21:44 ` Qu Wenruo @ 2025-04-05 17:54 ` Filipe Manana 0 siblings, 0 replies; 16+ messages in thread From: Filipe Manana @ 2025-04-05 17:54 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Apr 4, 2025 at 10:44 PM Qu Wenruo <wqu@suse.com> wrote: > > > > 在 2025/4/5 02:34, Filipe Manana 写道: > > On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: > >> > >> Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we > > > > function -> functions > > > >> have the following early exist: > > > > exist -> exit > > > >> > >> if (index == locked_folio->index && end_index == index) > >> return; > >> > >> This allows us to exist early if the range are inside the same locked > > > > exist -> exit > > the range are inside -> the range is inside > > > >> folio. > >> > >> But even without the above early check, the existing code can handle it > >> well, as both __process_folios_contig() and lock_delalloc_folios() will > >> skip any folio page lock/unlock if it's on the locked folio. > >> > >> Just remove those unnecessary early exits. > > > > It looks good to me from a functional point of view. > > > > But without this early exits there's a bit of work done, from function > > calls to initializing and releasing folio batches, calling > > filemap_get_folios_contig() which > > will search the mapping's xarray and always grab one folio and add it > > to the batch, etc. > > > > It's not uncommon to do IO on a range not spanning more than one > > folio, especially when supporting large folios, which will be more > > likely. > > I understand the goal here is to remove some code, but this code is > > cheaper compared to all that unnecessary overhead. > > > > Have you considered that? > > Yes, but the main reason here is the usage of (folio->index == index) check. > > With large folios, such check is no longer reliable anyway, and > initially I thought to just change it to folio_contains(), but it turns > out that is not really needed either. > > So that's the main reason to get rid this of these early exits. Ok. Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > > Thanks, > Qu > > > > > Thanks. > > > >> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/extent_io.c | 8 -------- > >> 1 file changed, 8 deletions(-) > >> > >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > >> index 8b29eeac3884..013268f70621 100644 > >> --- a/fs/btrfs/extent_io.c > >> +++ b/fs/btrfs/extent_io.c > >> @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, > >> const struct folio *locked_folio, > >> u64 start, u64 end) > >> { > >> - unsigned long index = start >> PAGE_SHIFT; > >> - unsigned long end_index = end >> PAGE_SHIFT; > >> - > >> ASSERT(locked_folio); > >> - if (index == locked_folio->index && end_index == index) > >> - return; > >> > >> __process_folios_contig(inode->i_mapping, locked_folio, start, end, > >> PAGE_UNLOCK); > >> @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode, > >> u64 processed_end = start; > >> struct folio_batch fbatch; > >> > >> - if (index == locked_folio->index && index == end_index) > >> - return 0; > >> - > >> folio_batch_init(&fbatch); > >> while (index <= end_index) { > >> unsigned int found_folios, i; > >> -- > >> 2.49.0 > >> > >> > ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-04 1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo 2025-04-04 1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo @ 2025-04-04 1:47 ` Qu Wenruo 2025-04-04 16:09 ` Filipe Manana 2025-04-07 18:39 ` David Sterba 2025-04-04 1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo 2 siblings, 2 replies; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 1:47 UTC (permalink / raw) To: linux-btrfs Currently we use the following pattern to detect if the folio contains the end of a file: if (folio->index == end_index) folio_zero_range(); But that only works if the folio is page sized. For the following case, it will not work and leave the range beyond EOF uninitialized: The page size is 4K, and the fs block size is also 4K. 16K 20K 24K | | | | | EOF at 22K And we have a large folio sized 8K at file offset 16K. In that case, the old "folio->index == end_index" will not work, thus we the range [22K, 24K) will not be zeroed out. Fix the following call sites which use the above pattern: - add_ra_bio_pages() - extent_writepage() Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/compression.c | 2 +- fs/btrfs/extent_io.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index cb954f9bc332..7aa63681f92a 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, free_extent_map(em); unlock_extent(tree, cur, page_end, NULL); - if (folio->index == end_index) { + if (folio_contains(folio, end_index)) { size_t zero_offset = offset_in_folio(folio, isize); if (zero_offset) { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 013268f70621..f0d51f6ed951 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, } static noinline void unlock_delalloc_folio(const struct inode *inode, - const struct folio *locked_folio, + struct folio *locked_folio, u64 start, u64 end) { ASSERT(locked_folio); @@ -231,7 +231,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, } static noinline int lock_delalloc_folios(struct inode *inode, - const struct folio *locked_folio, + struct folio *locked_folio, u64 start, u64 end) { struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); @@ -1711,7 +1711,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl return 0; } - if (folio->index == end_index) + if (folio_contains(folio, end_index)) folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset); /* -- 2.49.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-04 1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo @ 2025-04-04 16:09 ` Filipe Manana 2025-04-07 18:39 ` David Sterba 1 sibling, 0 replies; 16+ messages in thread From: Filipe Manana @ 2025-04-04 16:09 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: > > Currently we use the following pattern to detect if the folio contains > the end of a file: > > if (folio->index == end_index) > folio_zero_range(); > > But that only works if the folio is page sized. > > For the following case, it will not work and leave the range beyond EOF > uninitialized: > > The page size is 4K, and the fs block size is also 4K. > > 16K 20K 24K > | | | | > | > EOF at 22K > > And we have a large folio sized 8K at file offset 16K. > > In that case, the old "folio->index == end_index" will not work, thus > we the range [22K, 24K) will not be zeroed out. thus we the range -> thus the range > > Fix the following call sites which use the above pattern: > > - add_ra_bio_pages() > > - extent_writepage() > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Otherwise it looks good, thanks. > --- > fs/btrfs/compression.c | 2 +- > fs/btrfs/extent_io.c | 6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index cb954f9bc332..7aa63681f92a 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, > free_extent_map(em); > unlock_extent(tree, cur, page_end, NULL); > > - if (folio->index == end_index) { > + if (folio_contains(folio, end_index)) { > size_t zero_offset = offset_in_folio(folio, isize); > > if (zero_offset) { > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 013268f70621..f0d51f6ed951 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, > } > > static noinline void unlock_delalloc_folio(const struct inode *inode, > - const struct folio *locked_folio, > + struct folio *locked_folio, > u64 start, u64 end) > { > ASSERT(locked_folio); > @@ -231,7 +231,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode, > } > > static noinline int lock_delalloc_folios(struct inode *inode, > - const struct folio *locked_folio, > + struct folio *locked_folio, > u64 start, u64 end) > { > struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); > @@ -1711,7 +1711,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl > return 0; > } > > - if (folio->index == end_index) > + if (folio_contains(folio, end_index)) > folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset); > > /* > -- > 2.49.0 > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-04 1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo 2025-04-04 16:09 ` Filipe Manana @ 2025-04-07 18:39 ` David Sterba 2025-04-07 21:58 ` Qu Wenruo 1 sibling, 1 reply; 16+ messages in thread From: David Sterba @ 2025-04-07 18:39 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote: > Currently we use the following pattern to detect if the folio contains > the end of a file: > > if (folio->index == end_index) > folio_zero_range(); > > But that only works if the folio is page sized. > > For the following case, it will not work and leave the range beyond EOF > uninitialized: > > The page size is 4K, and the fs block size is also 4K. > > 16K 20K 24K > | | | | > | > EOF at 22K > > And we have a large folio sized 8K at file offset 16K. > > In that case, the old "folio->index == end_index" will not work, thus > we the range [22K, 24K) will not be zeroed out. > > Fix the following call sites which use the above pattern: > > - add_ra_bio_pages() > > - extent_writepage() > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/compression.c | 2 +- > fs/btrfs/extent_io.c | 6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index cb954f9bc332..7aa63681f92a 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, > free_extent_map(em); > unlock_extent(tree, cur, page_end, NULL); > > - if (folio->index == end_index) { > + if (folio_contains(folio, end_index)) { > size_t zero_offset = offset_in_folio(folio, isize); > > if (zero_offset) { > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 013268f70621..f0d51f6ed951 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, > } > > static noinline void unlock_delalloc_folio(const struct inode *inode, > - const struct folio *locked_folio, > + struct folio *locked_folio, I'm not happy to see removing const from the parameters as it's quite tedious to find them. Here it's not necessary as it's still not changing the folio, only required because folio API is not const-clean, folio_contains() in particular. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-07 18:39 ` David Sterba @ 2025-04-07 21:58 ` Qu Wenruo 2025-04-08 23:12 ` David Sterba 0 siblings, 1 reply; 16+ messages in thread From: Qu Wenruo @ 2025-04-07 21:58 UTC (permalink / raw) To: dsterba, Qu Wenruo; +Cc: linux-btrfs 在 2025/4/8 04:09, David Sterba 写道: > On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote: >> Currently we use the following pattern to detect if the folio contains >> the end of a file: >> >> if (folio->index == end_index) >> folio_zero_range(); >> >> But that only works if the folio is page sized. >> >> For the following case, it will not work and leave the range beyond EOF >> uninitialized: >> >> The page size is 4K, and the fs block size is also 4K. >> >> 16K 20K 24K >> | | | | >> | >> EOF at 22K >> >> And we have a large folio sized 8K at file offset 16K. >> >> In that case, the old "folio->index == end_index" will not work, thus >> we the range [22K, 24K) will not be zeroed out. >> >> Fix the following call sites which use the above pattern: >> >> - add_ra_bio_pages() >> >> - extent_writepage() >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/compression.c | 2 +- >> fs/btrfs/extent_io.c | 6 +++--- >> 2 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c >> index cb954f9bc332..7aa63681f92a 100644 >> --- a/fs/btrfs/compression.c >> +++ b/fs/btrfs/compression.c >> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, >> free_extent_map(em); >> unlock_extent(tree, cur, page_end, NULL); >> >> - if (folio->index == end_index) { >> + if (folio_contains(folio, end_index)) { >> size_t zero_offset = offset_in_folio(folio, isize); >> >> if (zero_offset) { >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index 013268f70621..f0d51f6ed951 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, >> } >> >> static noinline void unlock_delalloc_folio(const struct inode *inode, >> - const struct folio *locked_folio, >> + struct folio *locked_folio, > > I'm not happy to see removing const from the parameters as it's quite > tedious to find them. Here it's not necessary as it's still not changing > the folio, only required because folio API is not const-clean, > folio_contains() in particular. > Yes, I'm not happy with that either, and I'm planning to constify the parameters for those helpers in a dedicated series. Thanks, Qu ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-07 21:58 ` Qu Wenruo @ 2025-04-08 23:12 ` David Sterba 2025-04-08 23:16 ` Qu Wenruo 0 siblings, 1 reply; 16+ messages in thread From: David Sterba @ 2025-04-08 23:12 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs On Tue, Apr 08, 2025 at 07:28:58AM +0930, Qu Wenruo wrote: > 在 2025/4/8 04:09, David Sterba 写道: > > On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote: > >> Currently we use the following pattern to detect if the folio contains > >> the end of a file: > >> > >> if (folio->index == end_index) > >> folio_zero_range(); > >> > >> But that only works if the folio is page sized. > >> > >> For the following case, it will not work and leave the range beyond EOF > >> uninitialized: > >> > >> The page size is 4K, and the fs block size is also 4K. > >> > >> 16K 20K 24K > >> | | | | > >> | > >> EOF at 22K > >> > >> And we have a large folio sized 8K at file offset 16K. > >> > >> In that case, the old "folio->index == end_index" will not work, thus > >> we the range [22K, 24K) will not be zeroed out. > >> > >> Fix the following call sites which use the above pattern: > >> > >> - add_ra_bio_pages() > >> > >> - extent_writepage() > >> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/compression.c | 2 +- > >> fs/btrfs/extent_io.c | 6 +++--- > >> 2 files changed, 4 insertions(+), 4 deletions(-) > >> > >> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > >> index cb954f9bc332..7aa63681f92a 100644 > >> --- a/fs/btrfs/compression.c > >> +++ b/fs/btrfs/compression.c > >> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, > >> free_extent_map(em); > >> unlock_extent(tree, cur, page_end, NULL); > >> > >> - if (folio->index == end_index) { > >> + if (folio_contains(folio, end_index)) { > >> size_t zero_offset = offset_in_folio(folio, isize); > >> > >> if (zero_offset) { > >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > >> index 013268f70621..f0d51f6ed951 100644 > >> --- a/fs/btrfs/extent_io.c > >> +++ b/fs/btrfs/extent_io.c > >> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, > >> } > >> > >> static noinline void unlock_delalloc_folio(const struct inode *inode, > >> - const struct folio *locked_folio, > >> + struct folio *locked_folio, > > > > I'm not happy to see removing const from the parameters as it's quite > > tedious to find them. Here it's not necessary as it's still not changing > > the folio, only required because folio API is not const-clean, > > folio_contains() in particular. > > > > Yes, I'm not happy with that either, and I'm planning to constify the > parameters for those helpers in a dedicated series. Thanks, but don't let it distract you from the more important folio changes. I noticed there are missing consts in the page API too, like page_offset() or the folio/page boundary like folio_pos() or folio_pgoff(). This is can wait, my comment was more like a note to self to have a look later. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection 2025-04-08 23:12 ` David Sterba @ 2025-04-08 23:16 ` Qu Wenruo 0 siblings, 0 replies; 16+ messages in thread From: Qu Wenruo @ 2025-04-08 23:16 UTC (permalink / raw) To: dsterba, Qu Wenruo; +Cc: linux-btrfs 在 2025/4/9 08:42, David Sterba 写道: > On Tue, Apr 08, 2025 at 07:28:58AM +0930, Qu Wenruo wrote: >> 在 2025/4/8 04:09, David Sterba 写道: >>> On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote: >>>> Currently we use the following pattern to detect if the folio contains >>>> the end of a file: >>>> >>>> if (folio->index == end_index) >>>> folio_zero_range(); >>>> >>>> But that only works if the folio is page sized. >>>> >>>> For the following case, it will not work and leave the range beyond EOF >>>> uninitialized: >>>> >>>> The page size is 4K, and the fs block size is also 4K. >>>> >>>> 16K 20K 24K >>>> | | | | >>>> | >>>> EOF at 22K >>>> >>>> And we have a large folio sized 8K at file offset 16K. >>>> >>>> In that case, the old "folio->index == end_index" will not work, thus >>>> we the range [22K, 24K) will not be zeroed out. >>>> >>>> Fix the following call sites which use the above pattern: >>>> >>>> - add_ra_bio_pages() >>>> >>>> - extent_writepage() >>>> >>>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>>> --- >>>> fs/btrfs/compression.c | 2 +- >>>> fs/btrfs/extent_io.c | 6 +++--- >>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c >>>> index cb954f9bc332..7aa63681f92a 100644 >>>> --- a/fs/btrfs/compression.c >>>> +++ b/fs/btrfs/compression.c >>>> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, >>>> free_extent_map(em); >>>> unlock_extent(tree, cur, page_end, NULL); >>>> >>>> - if (folio->index == end_index) { >>>> + if (folio_contains(folio, end_index)) { >>>> size_t zero_offset = offset_in_folio(folio, isize); >>>> >>>> if (zero_offset) { >>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >>>> index 013268f70621..f0d51f6ed951 100644 >>>> --- a/fs/btrfs/extent_io.c >>>> +++ b/fs/btrfs/extent_io.c >>>> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping, >>>> } >>>> >>>> static noinline void unlock_delalloc_folio(const struct inode *inode, >>>> - const struct folio *locked_folio, >>>> + struct folio *locked_folio, >>> >>> I'm not happy to see removing const from the parameters as it's quite >>> tedious to find them. Here it's not necessary as it's still not changing >>> the folio, only required because folio API is not const-clean, >>> folio_contains() in particular. >>> >> >> Yes, I'm not happy with that either, and I'm planning to constify the >> parameters for those helpers in a dedicated series. > > Thanks, but don't let it distract you from the more important folio > changes. I noticed there are missing consts in the page API too, like > page_offset() or the folio/page boundary like folio_pos() or > folio_pgoff(). This is can wait, my comment was more like a note to self > to have a look later. No worry, as the large folios work is done (at least on x86_64, no new regression for the whole fstests run). Now I only need to wait for the remaining 4 patches before the final enablement. Sure there are still something left for large folios, e.g. data reloc inode is still using page sized folios, but that is not urgent either. So the constification of the MM interfaces can be a good low hanging fruit here. Thanks, Qu ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls 2025-04-04 1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo 2025-04-04 1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo 2025-04-04 1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo @ 2025-04-04 1:47 ` Qu Wenruo 2025-04-04 16:38 ` Filipe Manana 2025-04-22 22:47 ` Qu Wenruo 2 siblings, 2 replies; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 1:47 UTC (permalink / raw) To: linux-btrfs With large folios, filemap_get_folios_contig() can return duplicated folios, for example: 704K 768K 1M |<-- 64K sized folio --->|<------- 256K sized folio ----->| | | 764K 800K If we call lock_delalloc_folios() with range [762K, 800K) on above layout with locked folio at 704K, we will hit the following sequence: 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) As this is a folio boundary. The returned folio will be folio at 704K. Since the folio is already locked, we will not lock the folio. 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) All the entries are the same folio at 768K. 3. We lock folio 768K for the slot 0 of the fbatch 4. We lock folio 768K for the slot 1 of the fbatch We deadlock, as we have already locked the same folio at step 3. The filemap_get_folios_contig() behavior is definitely not ideal, but on the other hand, we do not really need the filemap_get_folios_contig() call either. The current filemap_get_folios() is already good enough, and we require no strong contiguous requirement either, we only need the returned folios contiguous at file map level (aka, their folio file offsets are contiguous). So get rid of the cursed filemap_get_folios_contig() and use regular filemap_get_folios() instead, this will fix the above deadlock for large folios. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent_io.c | 6 ++---- fs/btrfs/tests/extent-io-tests.c | 3 +-- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index f0d51f6ed951..986bda2eff1c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, while (index <= end_index) { int found_folios; - found_folios = filemap_get_folios_contig(mapping, &index, - end_index, &fbatch); + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); for (i = 0; i < found_folios; i++) { struct folio *folio = fbatch.folios[i]; @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, while (index <= end_index) { unsigned int found_folios, i; - found_folios = filemap_get_folios_contig(mapping, &index, - end_index, &fbatch); + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); if (found_folios == 0) goto out; diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index 74aca7180a5a..e762eca8a99f 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, folio_batch_init(&fbatch); while (index <= end_index) { - ret = filemap_get_folios_contig(inode->i_mapping, &index, - end_index, &fbatch); + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); for (i = 0; i < ret; i++) { struct folio *folio = fbatch.folios[i]; -- 2.49.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls 2025-04-04 1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo @ 2025-04-04 16:38 ` Filipe Manana 2025-04-04 21:51 ` Qu Wenruo 2025-04-22 22:47 ` Qu Wenruo 1 sibling, 1 reply; 16+ messages in thread From: Filipe Manana @ 2025-04-04 16:38 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: > > With large folios, filemap_get_folios_contig() can return duplicated > folios, for example: > > 704K 768K 1M > |<-- 64K sized folio --->|<------- 256K sized folio ----->| > | | > 764K 800K > > If we call lock_delalloc_folios() with range [762K, 800K) on above > layout with locked folio at 704K, we will hit the following sequence: > > 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) > As this is a folio boundary. > > The returned folio will be folio at 704K. > > Since the folio is already locked, we will not lock the folio. > > 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) > All the entries are the same folio at 768K. > > 3. We lock folio 768K for the slot 0 of the fbatch > > 4. We lock folio 768K for the slot 1 of the fbatch > We deadlock, as we have already locked the same folio at step 3. > > The filemap_get_folios_contig() behavior is definitely not ideal, but on > the other hand, we do not really need the filemap_get_folios_contig() > call either. > > The current filemap_get_folios() is already good enough, and we require > no strong contiguous requirement either, we only need the returned folios > contiguous at file map level (aka, their folio file offsets are contiguous). This paragraph is confusing. This says filemap_get_folios() provides contiguous results in the sense that the file offset of each folio is greater than the previous ones (the folios in the batch are ordered by file offsets). But then what is the kind of contiguous results that filemap_get_folios_contig() provides? How different is it? Is it that the batch doesn't get "holes" - i.e. a folio's start always matches the end of the previous one +1? > > So get rid of the cursed filemap_get_folios_contig() and use regular > filemap_get_folios() instead, this will fix the above deadlock for large > folios. I think it's worth adding in this changelog that it is known that filemap_get_folios_contig() has a bug and there's a pending fix for it, adding links to the thread you started and the respective fix: https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/ https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ Anyway, it seems good, so with those small changes: Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/extent_io.c | 6 ++---- > fs/btrfs/tests/extent-io-tests.c | 3 +-- > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index f0d51f6ed951..986bda2eff1c 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, > while (index <= end_index) { > int found_folios; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > for (i = 0; i < found_folios; i++) { > struct folio *folio = fbatch.folios[i]; > > @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, > while (index <= end_index) { > unsigned int found_folios, i; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > if (found_folios == 0) > goto out; > > diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c > index 74aca7180a5a..e762eca8a99f 100644 > --- a/fs/btrfs/tests/extent-io-tests.c > +++ b/fs/btrfs/tests/extent-io-tests.c > @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, > folio_batch_init(&fbatch); > > while (index <= end_index) { > - ret = filemap_get_folios_contig(inode->i_mapping, &index, > - end_index, &fbatch); > + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); > for (i = 0; i < ret; i++) { > struct folio *folio = fbatch.folios[i]; > > -- > 2.49.0 > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls 2025-04-04 16:38 ` Filipe Manana @ 2025-04-04 21:51 ` Qu Wenruo 2025-04-04 22:00 ` Qu Wenruo 0 siblings, 1 reply; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 21:51 UTC (permalink / raw) To: Filipe Manana; +Cc: linux-btrfs 在 2025/4/5 03:08, Filipe Manana 写道: > On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: >> >> With large folios, filemap_get_folios_contig() can return duplicated >> folios, for example: >> >> 704K 768K 1M >> |<-- 64K sized folio --->|<------- 256K sized folio ----->| >> | | >> 764K 800K >> >> If we call lock_delalloc_folios() with range [762K, 800K) on above >> layout with locked folio at 704K, we will hit the following sequence: >> >> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) >> As this is a folio boundary. >> >> The returned folio will be folio at 704K. >> >> Since the folio is already locked, we will not lock the folio. >> >> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) >> All the entries are the same folio at 768K. >> >> 3. We lock folio 768K for the slot 0 of the fbatch >> >> 4. We lock folio 768K for the slot 1 of the fbatch >> We deadlock, as we have already locked the same folio at step 3. >> >> The filemap_get_folios_contig() behavior is definitely not ideal, but on >> the other hand, we do not really need the filemap_get_folios_contig() >> call either. >> >> The current filemap_get_folios() is already good enough, and we require >> no strong contiguous requirement either, we only need the returned folios >> contiguous at file map level (aka, their folio file offsets are contiguous). > > This paragraph is confusing. > This says filemap_get_folios() provides contiguous results in the > sense that the file offset of each folio is greater than the previous > ones (the folios in the batch are ordered by file offsets). > But then what is the kind of contiguous results that > filemap_get_folios_contig() provides? How different is it? Is it that > the batch doesn't get "holes" - i.e. a folio's start always matches > the end of the previous one +1? From my understanding, filemap_get_folios_contig() returns physically contiguous pages/folios. And the hole handling is always the caller's responsibility, no matter if it's the _contig() version or not. Thus for filemap_get_folios_contig(), and the above two large folios cases, as long as the two large folios are not physically contiguous, the call returns the first folio, then the next call it returns the next folio. Not returning both in one go, and this behavior is not that useful to us either. And talking about holes, due to the _contig() behavior itself, if we hit a hole the function will definitely return, but between different calls caller should check the folio's position between calls, to make sure no holes is between two filemap_get_folios_contig() calls. Of course, no caller is really doing that, thus another reason why we do not need the filemap_get_folios_contig() call. Thanks, Qu> >> >> So get rid of the cursed filemap_get_folios_contig() and use regular >> filemap_get_folios() instead, this will fix the above deadlock for large >> folios. > > I think it's worth adding in this changelog that it is known that > filemap_get_folios_contig() has a bug and there's a pending fix for > it, adding links to the thread you started and the respective fix: > > https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/ > https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ > > Anyway, it seems good, so with those small changes: > > Reviewed-by: Filipe Manana <fdmanana@suse.com> > > Thanks. > >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/extent_io.c | 6 ++---- >> fs/btrfs/tests/extent-io-tests.c | 3 +-- >> 2 files changed, 3 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index f0d51f6ed951..986bda2eff1c 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, >> while (index <= end_index) { >> int found_folios; >> >> - found_folios = filemap_get_folios_contig(mapping, &index, >> - end_index, &fbatch); >> + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); >> for (i = 0; i < found_folios; i++) { >> struct folio *folio = fbatch.folios[i]; >> >> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, >> while (index <= end_index) { >> unsigned int found_folios, i; >> >> - found_folios = filemap_get_folios_contig(mapping, &index, >> - end_index, &fbatch); >> + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); >> if (found_folios == 0) >> goto out; >> >> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c >> index 74aca7180a5a..e762eca8a99f 100644 >> --- a/fs/btrfs/tests/extent-io-tests.c >> +++ b/fs/btrfs/tests/extent-io-tests.c >> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, >> folio_batch_init(&fbatch); >> >> while (index <= end_index) { >> - ret = filemap_get_folios_contig(inode->i_mapping, &index, >> - end_index, &fbatch); >> + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); >> for (i = 0; i < ret; i++) { >> struct folio *folio = fbatch.folios[i]; >> >> -- >> 2.49.0 >> >> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls 2025-04-04 21:51 ` Qu Wenruo @ 2025-04-04 22:00 ` Qu Wenruo 0 siblings, 0 replies; 16+ messages in thread From: Qu Wenruo @ 2025-04-04 22:00 UTC (permalink / raw) To: Filipe Manana; +Cc: linux-btrfs 在 2025/4/5 08:21, Qu Wenruo 写道: > > > 在 2025/4/5 03:08, Filipe Manana 写道: >> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: >>> >>> With large folios, filemap_get_folios_contig() can return duplicated >>> folios, for example: >>> >>> 704K 768K 1M >>> |<-- 64K sized folio --->|<------- 256K sized folio ----->| >>> | | >>> 764K 800K >>> >>> If we call lock_delalloc_folios() with range [762K, 800K) on above >>> layout with locked folio at 704K, we will hit the following sequence: >>> >>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) >>> As this is a folio boundary. >>> >>> The returned folio will be folio at 704K. >>> >>> Since the folio is already locked, we will not lock the folio. >>> >>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) >>> All the entries are the same folio at 768K. >>> >>> 3. We lock folio 768K for the slot 0 of the fbatch >>> >>> 4. We lock folio 768K for the slot 1 of the fbatch >>> We deadlock, as we have already locked the same folio at step 3. >>> >>> The filemap_get_folios_contig() behavior is definitely not ideal, but on >>> the other hand, we do not really need the filemap_get_folios_contig() >>> call either. >>> >>> The current filemap_get_folios() is already good enough, and we require >>> no strong contiguous requirement either, we only need the returned >>> folios >>> contiguous at file map level (aka, their folio file offsets are >>> contiguous). >> >> This paragraph is confusing. >> This says filemap_get_folios() provides contiguous results in the >> sense that the file offset of each folio is greater than the previous >> ones (the folios in the batch are ordered by file offsets). >> But then what is the kind of contiguous results that >> filemap_get_folios_contig() provides? How different is it? Is it that >> the batch doesn't get "holes" - i.e. a folio's start always matches >> the end of the previous one +1? > > From my understanding, filemap_get_folios_contig() returns physically > contiguous pages/folios. > > And the hole handling is always the caller's responsibility, no matter > if it's the _contig() version or not. > > Thus for filemap_get_folios_contig(), and the above two large folios > cases, as long as the two large folios are not physically contiguous, > the call returns the first folio, then the next call it returns the next > folio. > > Not returning both in one go, and this behavior is not that useful to us > either. My bad, I should double check the code before commenting on this. It turns out it's not really splitting the result based on physical addresses, but purely returning to the end of a large folio. Which is another thing we do not really want in btrfs' code. So the contiguous part really seems to be the following points: - There should always be a folio at the start index Or it will return 0 immediately - No holes in the returned fbatch Although it can still return the result early, due to large folios. Anyway I'll add these to the next version of the commit message, to explain the behavior in details and why it's safe we go the regular filemap_get_folios() call. Thanks, Qu > > > And talking about holes, due to the _contig() behavior itself, if we hit > a hole the function will definitely return, but between different calls > caller should check the folio's position between calls, to make sure no > holes is between two filemap_get_folios_contig() calls. > > Of course, no caller is really doing that, thus another reason why we do > not need the filemap_get_folios_contig() call. > > Thanks, > Qu> >>> >>> So get rid of the cursed filemap_get_folios_contig() and use regular >>> filemap_get_folios() instead, this will fix the above deadlock for large >>> folios. >> >> I think it's worth adding in this changelog that it is known that >> filemap_get_folios_contig() has a bug and there's a pending fix for >> it, adding links to the thread you started and the respective fix: >> >> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035- >> b829-72cfb5eb6fc6@gmx.com/ >> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ >> >> Anyway, it seems good, so with those small changes: >> >> Reviewed-by: Filipe Manana <fdmanana@suse.com> >> >> Thanks. >> >>> >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>> --- >>> fs/btrfs/extent_io.c | 6 ++---- >>> fs/btrfs/tests/extent-io-tests.c | 3 +-- >>> 2 files changed, 3 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >>> index f0d51f6ed951..986bda2eff1c 100644 >>> --- a/fs/btrfs/extent_io.c >>> +++ b/fs/btrfs/extent_io.c >>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct >>> address_space *mapping, >>> while (index <= end_index) { >>> int found_folios; >>> >>> - found_folios = filemap_get_folios_contig(mapping, >>> &index, >>> - end_index, &fbatch); >>> + found_folios = filemap_get_folios(mapping, &index, >>> end_index, &fbatch); >>> for (i = 0; i < found_folios; i++) { >>> struct folio *folio = fbatch.folios[i]; >>> >>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct >>> inode *inode, >>> while (index <= end_index) { >>> unsigned int found_folios, i; >>> >>> - found_folios = filemap_get_folios_contig(mapping, >>> &index, >>> - end_index, &fbatch); >>> + found_folios = filemap_get_folios(mapping, &index, >>> end_index, &fbatch); >>> if (found_folios == 0) >>> goto out; >>> >>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/ >>> extent-io-tests.c >>> index 74aca7180a5a..e762eca8a99f 100644 >>> --- a/fs/btrfs/tests/extent-io-tests.c >>> +++ b/fs/btrfs/tests/extent-io-tests.c >>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode >>> *inode, u64 start, u64 end, >>> folio_batch_init(&fbatch); >>> >>> while (index <= end_index) { >>> - ret = filemap_get_folios_contig(inode->i_mapping, >>> &index, >>> - end_index, &fbatch); >>> + ret = filemap_get_folios(inode->i_mapping, &index, >>> end_index, &fbatch); >>> for (i = 0; i < ret; i++) { >>> struct folio *folio = fbatch.folios[i]; >>> >>> -- >>> 2.49.0 >>> >>> > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls 2025-04-04 1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo 2025-04-04 16:38 ` Filipe Manana @ 2025-04-22 22:47 ` Qu Wenruo 1 sibling, 0 replies; 16+ messages in thread From: Qu Wenruo @ 2025-04-22 22:47 UTC (permalink / raw) To: linux-btrfs 在 2025/4/4 12:17, Qu Wenruo 写道: > With large folios, filemap_get_folios_contig() can return duplicated > folios, for example: > > 704K 768K 1M > |<-- 64K sized folio --->|<------- 256K sized folio ----->| > | | > 764K 800K > > If we call lock_delalloc_folios() with range [762K, 800K) on above > layout with locked folio at 704K, we will hit the following sequence: > > 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) > As this is a folio boundary. > > The returned folio will be folio at 704K. > > Since the folio is already locked, we will not lock the folio. > > 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) > All the entries are the same folio at 768K. > > 3. We lock folio 768K for the slot 0 of the fbatch > > 4. We lock folio 768K for the slot 1 of the fbatch > We deadlock, as we have already locked the same folio at step 3. > > The filemap_get_folios_contig() behavior is definitely not ideal, but on > the other hand, we do not really need the filemap_get_folios_contig() > call either. > > The current filemap_get_folios() is already good enough, and we require > no strong contiguous requirement either, we only need the returned folios > contiguous at file map level (aka, their folio file offsets are contiguous). > > So get rid of the cursed filemap_get_folios_contig() and use regular > filemap_get_folios() instead, this will fix the above deadlock for large > folios. > > Signed-off-by: Qu Wenruo <wqu@suse.com> This is causing ASSERT() triggered in generic/524. And thankfully the fix in filemap_get_folios_contig() has already landed upstream, we can safely drop this one. I'll remove it from the for-next branch. Thanks, Qu > --- > fs/btrfs/extent_io.c | 6 ++---- > fs/btrfs/tests/extent-io-tests.c | 3 +-- > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index f0d51f6ed951..986bda2eff1c 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, > while (index <= end_index) { > int found_folios; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > for (i = 0; i < found_folios; i++) { > struct folio *folio = fbatch.folios[i]; > > @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, > while (index <= end_index) { > unsigned int found_folios, i; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > if (found_folios == 0) > goto out; > > diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c > index 74aca7180a5a..e762eca8a99f 100644 > --- a/fs/btrfs/tests/extent-io-tests.c > +++ b/fs/btrfs/tests/extent-io-tests.c > @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, > folio_batch_init(&fbatch); > > while (index <= end_index) { > - ret = filemap_get_folios_contig(inode->i_mapping, &index, > - end_index, &fbatch); > + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); > for (i = 0; i < ret; i++) { > struct folio *folio = fbatch.folios[i]; > ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-04-22 22:47 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-04 1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo 2025-04-04 1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo 2025-04-04 16:04 ` Filipe Manana 2025-04-04 21:44 ` Qu Wenruo 2025-04-05 17:54 ` Filipe Manana 2025-04-04 1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo 2025-04-04 16:09 ` Filipe Manana 2025-04-07 18:39 ` David Sterba 2025-04-07 21:58 ` Qu Wenruo 2025-04-08 23:12 ` David Sterba 2025-04-08 23:16 ` Qu Wenruo 2025-04-04 1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo 2025-04-04 16:38 ` Filipe Manana 2025-04-04 21:51 ` Qu Wenruo 2025-04-04 22:00 ` Qu Wenruo 2025-04-22 22:47 ` Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox