[PATCH 0/3] btrfs: fsstress hang fix for large data folios

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] btrfs: fsstress hang fix for large data folios
@ 2025-04-04  1:47 Qu Wenruo
  2025-04-04  1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04  1:47 UTC (permalink / raw)
  To: linux-btrfs

With my local large data folios enabled btrfs, fsstress can hit a hang
where lock_delalloc_folios() fails to lock a folio.

It turns our that filemap_get_folios_contig() can return duplicated
folios, which is fatal for lock_delalloc_folios() and
unlock_delalloc_folio().

The series here is to address the annonying filemap_get_folios_contig()
behavior for large folios.

The first patch is a small cleanup to remove unnecessary early exits,
exposed during the next folio_contains() change.

The second patch uses folio_contains() to handle EOF detection for large
folios.

The final one is the fix of filemap_get_folios_contig(), by getting rid
of it.

Qu Wenruo (3):
  btrfs: remove unnecessary early exits in delalloc folio lock and
    unlock
  btrfs: use folio_contains() for EOF detection
  btrfs: get rid of filemap_get_folios_contig() calls

 fs/btrfs/compression.c           |  2 +-
 fs/btrfs/extent_io.c             | 20 +++++---------------
 fs/btrfs/tests/extent-io-tests.c |  3 +--
 3 files changed, 7 insertions(+), 18 deletions(-)

-- 
2.49.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock
  2025-04-04  1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo
@ 2025-04-04  1:47 ` Qu Wenruo
  2025-04-04 16:04   ` Filipe Manana
  2025-04-04  1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo
  2025-04-04  1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo
  2 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04  1:47 UTC (permalink / raw)
  To: linux-btrfs

Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we
have the following early exist:

	if (index == locked_folio->index && end_index == index)
		return;

This allows us to exist early if the range are inside the same locked
folio.

But even without the above early check, the existing code can handle it
well, as both __process_folios_contig() and lock_delalloc_folios() will
skip any folio page lock/unlock if it's on the locked folio.

Just remove those unnecessary early exits.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8b29eeac3884..013268f70621 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
 					   const struct folio *locked_folio,
 					   u64 start, u64 end)
 {
-	unsigned long index = start >> PAGE_SHIFT;
-	unsigned long end_index = end >> PAGE_SHIFT;
-
 	ASSERT(locked_folio);
-	if (index == locked_folio->index && end_index == index)
-		return;
 
 	__process_folios_contig(inode->i_mapping, locked_folio, start, end,
 				PAGE_UNLOCK);
@@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode,
 	u64 processed_end = start;
 	struct folio_batch fbatch;
 
-	if (index == locked_folio->index && index == end_index)
-		return 0;
-
 	folio_batch_init(&fbatch);
 	while (index <= end_index) {
 		unsigned int found_folios, i;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock
  2025-04-04  1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
@ 2025-04-04 16:04   ` Filipe Manana
  2025-04-04 21:44     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2025-04-04 16:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we

function -> functions

> have the following early exist:

exist -> exit

>
>         if (index == locked_folio->index && end_index == index)
>                 return;
>
> This allows us to exist early if the range are inside the same locked

exist -> exit
the range are inside -> the range is inside

> folio.
>
> But even without the above early check, the existing code can handle it
> well, as both __process_folios_contig() and lock_delalloc_folios() will
> skip any folio page lock/unlock if it's on the locked folio.
>
> Just remove those unnecessary early exits.

It looks good to me from a functional point of view.

But without this early exits there's a bit of work done, from function
calls to initializing and releasing folio batches, calling
filemap_get_folios_contig()  which
will search the mapping's xarray and always grab one folio and add it
to the batch, etc.

It's not uncommon to do IO on a range not spanning more than one
folio, especially when supporting large folios, which will be more
likely.
I understand the goal here is to remove some code, but this code is
cheaper compared to all that unnecessary overhead.

Have you considered that?

Thanks.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8b29eeac3884..013268f70621 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
>                                            const struct folio *locked_folio,
>                                            u64 start, u64 end)
>  {
> -       unsigned long index = start >> PAGE_SHIFT;
> -       unsigned long end_index = end >> PAGE_SHIFT;
> -
>         ASSERT(locked_folio);
> -       if (index == locked_folio->index && end_index == index)
> -               return;
>
>         __process_folios_contig(inode->i_mapping, locked_folio, start, end,
>                                 PAGE_UNLOCK);
> @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>         u64 processed_end = start;
>         struct folio_batch fbatch;
>
> -       if (index == locked_folio->index && index == end_index)
> -               return 0;
> -
>         folio_batch_init(&fbatch);
>         while (index <= end_index) {
>                 unsigned int found_folios, i;
> --
> 2.49.0
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock
  2025-04-04 16:04   ` Filipe Manana
@ 2025-04-04 21:44     ` Qu Wenruo
  2025-04-05 17:54       ` Filipe Manana
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04 21:44 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



在 2025/4/5 02:34, Filipe Manana 写道:
> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we
> 
> function -> functions
> 
>> have the following early exist:
> 
> exist -> exit
> 
>>
>>          if (index == locked_folio->index && end_index == index)
>>                  return;
>>
>> This allows us to exist early if the range are inside the same locked
> 
> exist -> exit
> the range are inside -> the range is inside
> 
>> folio.
>>
>> But even without the above early check, the existing code can handle it
>> well, as both __process_folios_contig() and lock_delalloc_folios() will
>> skip any folio page lock/unlock if it's on the locked folio.
>>
>> Just remove those unnecessary early exits.
> 
> It looks good to me from a functional point of view.
> 
> But without this early exits there's a bit of work done, from function
> calls to initializing and releasing folio batches, calling
> filemap_get_folios_contig()  which
> will search the mapping's xarray and always grab one folio and add it
> to the batch, etc.
> 
> It's not uncommon to do IO on a range not spanning more than one
> folio, especially when supporting large folios, which will be more
> likely.
> I understand the goal here is to remove some code, but this code is
> cheaper compared to all that unnecessary overhead.
> 
> Have you considered that?

Yes, but the main reason here is the usage of (folio->index == index) check.

With  large folios, such check is no longer reliable anyway, and 
initially I thought to just change it to folio_contains(), but it turns 
out that is not really needed either.

So that's the main reason to get rid this of these early exits.

Thanks,
Qu

> 
> Thanks.
> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 8 --------
>>   1 file changed, 8 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 8b29eeac3884..013268f70621 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
>>                                             const struct folio *locked_folio,
>>                                             u64 start, u64 end)
>>   {
>> -       unsigned long index = start >> PAGE_SHIFT;
>> -       unsigned long end_index = end >> PAGE_SHIFT;
>> -
>>          ASSERT(locked_folio);
>> -       if (index == locked_folio->index && end_index == index)
>> -               return;
>>
>>          __process_folios_contig(inode->i_mapping, locked_folio, start, end,
>>                                  PAGE_UNLOCK);
>> @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>>          u64 processed_end = start;
>>          struct folio_batch fbatch;
>>
>> -       if (index == locked_folio->index && index == end_index)
>> -               return 0;
>> -
>>          folio_batch_init(&fbatch);
>>          while (index <= end_index) {
>>                  unsigned int found_folios, i;
>> --
>> 2.49.0
>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock
  2025-04-04 21:44     ` Qu Wenruo
@ 2025-04-05 17:54       ` Filipe Manana
  0 siblings, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2025-04-05 17:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 4, 2025 at 10:44 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> 在 2025/4/5 02:34, Filipe Manana 写道:
> > On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Inside function unlock_delalloc_folio() and lock_delalloc_folios(), we
> >
> > function -> functions
> >
> >> have the following early exist:
> >
> > exist -> exit
> >
> >>
> >>          if (index == locked_folio->index && end_index == index)
> >>                  return;
> >>
> >> This allows us to exist early if the range are inside the same locked
> >
> > exist -> exit
> > the range are inside -> the range is inside
> >
> >> folio.
> >>
> >> But even without the above early check, the existing code can handle it
> >> well, as both __process_folios_contig() and lock_delalloc_folios() will
> >> skip any folio page lock/unlock if it's on the locked folio.
> >>
> >> Just remove those unnecessary early exits.
> >
> > It looks good to me from a functional point of view.
> >
> > But without this early exits there's a bit of work done, from function
> > calls to initializing and releasing folio batches, calling
> > filemap_get_folios_contig()  which
> > will search the mapping's xarray and always grab one folio and add it
> > to the batch, etc.
> >
> > It's not uncommon to do IO on a range not spanning more than one
> > folio, especially when supporting large folios, which will be more
> > likely.
> > I understand the goal here is to remove some code, but this code is
> > cheaper compared to all that unnecessary overhead.
> >
> > Have you considered that?
>
> Yes, but the main reason here is the usage of (folio->index == index) check.
>
> With  large folios, such check is no longer reliable anyway, and
> initially I thought to just change it to folio_contains(), but it turns
> out that is not really needed either.
>
> So that's the main reason to get rid this of these early exits.

Ok.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Thanks,
> Qu
>
> >
> > Thanks.
> >
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/extent_io.c | 8 --------
> >>   1 file changed, 8 deletions(-)
> >>
> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >> index 8b29eeac3884..013268f70621 100644
> >> --- a/fs/btrfs/extent_io.c
> >> +++ b/fs/btrfs/extent_io.c
> >> @@ -224,12 +224,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
> >>                                             const struct folio *locked_folio,
> >>                                             u64 start, u64 end)
> >>   {
> >> -       unsigned long index = start >> PAGE_SHIFT;
> >> -       unsigned long end_index = end >> PAGE_SHIFT;
> >> -
> >>          ASSERT(locked_folio);
> >> -       if (index == locked_folio->index && end_index == index)
> >> -               return;
> >>
> >>          __process_folios_contig(inode->i_mapping, locked_folio, start, end,
> >>                                  PAGE_UNLOCK);
> >> @@ -246,9 +241,6 @@ static noinline int lock_delalloc_folios(struct inode *inode,
> >>          u64 processed_end = start;
> >>          struct folio_batch fbatch;
> >>
> >> -       if (index == locked_folio->index && index == end_index)
> >> -               return 0;
> >> -
> >>          folio_batch_init(&fbatch);
> >>          while (index <= end_index) {
> >>                  unsigned int found_folios, i;
> >> --
> >> 2.49.0
> >>
> >>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-04  1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo
  2025-04-04  1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
@ 2025-04-04  1:47 ` Qu Wenruo
  2025-04-04 16:09   ` Filipe Manana
  2025-04-07 18:39   ` David Sterba
  2025-04-04  1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo
  2 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04  1:47 UTC (permalink / raw)
  To: linux-btrfs

Currently we use the following pattern to detect if the folio contains
the end of a file:

	if (folio->index == end_index)
		folio_zero_range();

But that only works if the folio is page sized.

For the following case, it will not work and leave the range beyond EOF
uninitialized:

  The page size is 4K, and the fs block size is also 4K.

	16K        20K       24K
        |          |     |   |
	                 |
                         EOF at 22K

And we have a large folio sized 8K at file offset 16K.

In that case, the old "folio->index == end_index" will not work, thus
we the range [22K, 24K) will not be zeroed out.

Fix the following call sites which use the above pattern:

- add_ra_bio_pages()

- extent_writepage()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 2 +-
 fs/btrfs/extent_io.c   | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index cb954f9bc332..7aa63681f92a 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 		free_extent_map(em);
 		unlock_extent(tree, cur, page_end, NULL);
 
-		if (folio->index == end_index) {
+		if (folio_contains(folio, end_index)) {
 			size_t zero_offset = offset_in_folio(folio, isize);
 
 			if (zero_offset) {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 013268f70621..f0d51f6ed951 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
 }
 
 static noinline void unlock_delalloc_folio(const struct inode *inode,
-					   const struct folio *locked_folio,
+					   struct folio *locked_folio,
 					   u64 start, u64 end)
 {
 	ASSERT(locked_folio);
@@ -231,7 +231,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
 }
 
 static noinline int lock_delalloc_folios(struct inode *inode,
-					 const struct folio *locked_folio,
+					 struct folio *locked_folio,
 					 u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
@@ -1711,7 +1711,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
 		return 0;
 	}
 
-	if (folio->index == end_index)
+	if (folio_contains(folio, end_index))
 		folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset);
 
 	/*
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-04  1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo
@ 2025-04-04 16:09   ` Filipe Manana
  2025-04-07 18:39   ` David Sterba
  1 sibling, 0 replies; 16+ messages in thread
From: Filipe Manana @ 2025-04-04 16:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Currently we use the following pattern to detect if the folio contains
> the end of a file:
>
>         if (folio->index == end_index)
>                 folio_zero_range();
>
> But that only works if the folio is page sized.
>
> For the following case, it will not work and leave the range beyond EOF
> uninitialized:
>
>   The page size is 4K, and the fs block size is also 4K.
>
>         16K        20K       24K
>         |          |     |   |
>                          |
>                          EOF at 22K
>
> And we have a large folio sized 8K at file offset 16K.
>
> In that case, the old "folio->index == end_index" will not work, thus
> we the range [22K, 24K) will not be zeroed out.

thus we the range -> thus the range

>
> Fix the following call sites which use the above pattern:
>
> - add_ra_bio_pages()
>
> - extent_writepage()
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Otherwise it looks good, thanks.


> ---
>  fs/btrfs/compression.c | 2 +-
>  fs/btrfs/extent_io.c   | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index cb954f9bc332..7aa63681f92a 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>                 free_extent_map(em);
>                 unlock_extent(tree, cur, page_end, NULL);
>
> -               if (folio->index == end_index) {
> +               if (folio_contains(folio, end_index)) {
>                         size_t zero_offset = offset_in_folio(folio, isize);
>
>                         if (zero_offset) {
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 013268f70621..f0d51f6ed951 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
>  }
>
>  static noinline void unlock_delalloc_folio(const struct inode *inode,
> -                                          const struct folio *locked_folio,
> +                                          struct folio *locked_folio,
>                                            u64 start, u64 end)
>  {
>         ASSERT(locked_folio);
> @@ -231,7 +231,7 @@ static noinline void unlock_delalloc_folio(const struct inode *inode,
>  }
>
>  static noinline int lock_delalloc_folios(struct inode *inode,
> -                                        const struct folio *locked_folio,
> +                                        struct folio *locked_folio,
>                                          u64 start, u64 end)
>  {
>         struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
> @@ -1711,7 +1711,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
>                 return 0;
>         }
>
> -       if (folio->index == end_index)
> +       if (folio_contains(folio, end_index))
>                 folio_zero_range(folio, pg_offset, folio_size(folio) - pg_offset);
>
>         /*
> --
> 2.49.0
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-04  1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo
  2025-04-04 16:09   ` Filipe Manana
@ 2025-04-07 18:39   ` David Sterba
  2025-04-07 21:58     ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-04-07 18:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote:
> Currently we use the following pattern to detect if the folio contains
> the end of a file:
> 
> 	if (folio->index == end_index)
> 		folio_zero_range();
> 
> But that only works if the folio is page sized.
> 
> For the following case, it will not work and leave the range beyond EOF
> uninitialized:
> 
>   The page size is 4K, and the fs block size is also 4K.
> 
> 	16K        20K       24K
>         |          |     |   |
> 	                 |
>                          EOF at 22K
> 
> And we have a large folio sized 8K at file offset 16K.
> 
> In that case, the old "folio->index == end_index" will not work, thus
> we the range [22K, 24K) will not be zeroed out.
> 
> Fix the following call sites which use the above pattern:
> 
> - add_ra_bio_pages()
> 
> - extent_writepage()
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/compression.c | 2 +-
>  fs/btrfs/extent_io.c   | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index cb954f9bc332..7aa63681f92a 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  		free_extent_map(em);
>  		unlock_extent(tree, cur, page_end, NULL);
>  
> -		if (folio->index == end_index) {
> +		if (folio_contains(folio, end_index)) {
>  			size_t zero_offset = offset_in_folio(folio, isize);
>  
>  			if (zero_offset) {
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 013268f70621..f0d51f6ed951 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
>  }
>  
>  static noinline void unlock_delalloc_folio(const struct inode *inode,
> -					   const struct folio *locked_folio,
> +					   struct folio *locked_folio,

I'm not happy to see removing const from the parameters as it's quite
tedious to find them. Here it's not necessary as it's still not changing
the folio, only required because folio API is not const-clean,
folio_contains() in particular.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-07 18:39   ` David Sterba
@ 2025-04-07 21:58     ` Qu Wenruo
  2025-04-08 23:12       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-04-07 21:58 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/8 04:09, David Sterba 写道:
> On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote:
>> Currently we use the following pattern to detect if the folio contains
>> the end of a file:
>>
>> 	if (folio->index == end_index)
>> 		folio_zero_range();
>>
>> But that only works if the folio is page sized.
>>
>> For the following case, it will not work and leave the range beyond EOF
>> uninitialized:
>>
>>    The page size is 4K, and the fs block size is also 4K.
>>
>> 	16K        20K       24K
>>          |          |     |   |
>> 	                 |
>>                           EOF at 22K
>>
>> And we have a large folio sized 8K at file offset 16K.
>>
>> In that case, the old "folio->index == end_index" will not work, thus
>> we the range [22K, 24K) will not be zeroed out.
>>
>> Fix the following call sites which use the above pattern:
>>
>> - add_ra_bio_pages()
>>
>> - extent_writepage()
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/compression.c | 2 +-
>>   fs/btrfs/extent_io.c   | 6 +++---
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>> index cb954f9bc332..7aa63681f92a 100644
>> --- a/fs/btrfs/compression.c
>> +++ b/fs/btrfs/compression.c
>> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>>   		free_extent_map(em);
>>   		unlock_extent(tree, cur, page_end, NULL);
>>
>> -		if (folio->index == end_index) {
>> +		if (folio_contains(folio, end_index)) {
>>   			size_t zero_offset = offset_in_folio(folio, isize);
>>
>>   			if (zero_offset) {
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 013268f70621..f0d51f6ed951 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
>>   }
>>
>>   static noinline void unlock_delalloc_folio(const struct inode *inode,
>> -					   const struct folio *locked_folio,
>> +					   struct folio *locked_folio,
>
> I'm not happy to see removing const from the parameters as it's quite
> tedious to find them. Here it's not necessary as it's still not changing
> the folio, only required because folio API is not const-clean,
> folio_contains() in particular.
>

Yes, I'm not happy with that either, and I'm planning to constify the
parameters for those helpers in a dedicated series.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-07 21:58     ` Qu Wenruo
@ 2025-04-08 23:12       ` David Sterba
  2025-04-08 23:16         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-04-08 23:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Tue, Apr 08, 2025 at 07:28:58AM +0930, Qu Wenruo wrote:
> 在 2025/4/8 04:09, David Sterba 写道:
> > On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote:
> >> Currently we use the following pattern to detect if the folio contains
> >> the end of a file:
> >>
> >> 	if (folio->index == end_index)
> >> 		folio_zero_range();
> >>
> >> But that only works if the folio is page sized.
> >>
> >> For the following case, it will not work and leave the range beyond EOF
> >> uninitialized:
> >>
> >>    The page size is 4K, and the fs block size is also 4K.
> >>
> >> 	16K        20K       24K
> >>          |          |     |   |
> >> 	                 |
> >>                           EOF at 22K
> >>
> >> And we have a large folio sized 8K at file offset 16K.
> >>
> >> In that case, the old "folio->index == end_index" will not work, thus
> >> we the range [22K, 24K) will not be zeroed out.
> >>
> >> Fix the following call sites which use the above pattern:
> >>
> >> - add_ra_bio_pages()
> >>
> >> - extent_writepage()
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/compression.c | 2 +-
> >>   fs/btrfs/extent_io.c   | 6 +++---
> >>   2 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> >> index cb954f9bc332..7aa63681f92a 100644
> >> --- a/fs/btrfs/compression.c
> >> +++ b/fs/btrfs/compression.c
> >> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
> >>   		free_extent_map(em);
> >>   		unlock_extent(tree, cur, page_end, NULL);
> >>
> >> -		if (folio->index == end_index) {
> >> +		if (folio_contains(folio, end_index)) {
> >>   			size_t zero_offset = offset_in_folio(folio, isize);
> >>
> >>   			if (zero_offset) {
> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >> index 013268f70621..f0d51f6ed951 100644
> >> --- a/fs/btrfs/extent_io.c
> >> +++ b/fs/btrfs/extent_io.c
> >> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
> >>   }
> >>
> >>   static noinline void unlock_delalloc_folio(const struct inode *inode,
> >> -					   const struct folio *locked_folio,
> >> +					   struct folio *locked_folio,
> >
> > I'm not happy to see removing const from the parameters as it's quite
> > tedious to find them. Here it's not necessary as it's still not changing
> > the folio, only required because folio API is not const-clean,
> > folio_contains() in particular.
> >
> 
> Yes, I'm not happy with that either, and I'm planning to constify the
> parameters for those helpers in a dedicated series.

Thanks, but don't let it distract you from the more important folio
changes. I noticed there are missing consts in the page API too, like
page_offset() or the folio/page boundary like folio_pos() or
folio_pgoff(). This is can wait, my comment was more like a note to self
to have a look later.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] btrfs: use folio_contains() for EOF detection
  2025-04-08 23:12       ` David Sterba
@ 2025-04-08 23:16         ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-08 23:16 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 08:42, David Sterba 写道:
> On Tue, Apr 08, 2025 at 07:28:58AM +0930, Qu Wenruo wrote:
>> 在 2025/4/8 04:09, David Sterba 写道:
>>> On Fri, Apr 04, 2025 at 12:17:41PM +1030, Qu Wenruo wrote:
>>>> Currently we use the following pattern to detect if the folio contains
>>>> the end of a file:
>>>>
>>>> 	if (folio->index == end_index)
>>>> 		folio_zero_range();
>>>>
>>>> But that only works if the folio is page sized.
>>>>
>>>> For the following case, it will not work and leave the range beyond EOF
>>>> uninitialized:
>>>>
>>>>     The page size is 4K, and the fs block size is also 4K.
>>>>
>>>> 	16K        20K       24K
>>>>           |          |     |   |
>>>> 	                 |
>>>>                            EOF at 22K
>>>>
>>>> And we have a large folio sized 8K at file offset 16K.
>>>>
>>>> In that case, the old "folio->index == end_index" will not work, thus
>>>> we the range [22K, 24K) will not be zeroed out.
>>>>
>>>> Fix the following call sites which use the above pattern:
>>>>
>>>> - add_ra_bio_pages()
>>>>
>>>> - extent_writepage()
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>    fs/btrfs/compression.c | 2 +-
>>>>    fs/btrfs/extent_io.c   | 6 +++---
>>>>    2 files changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
>>>> index cb954f9bc332..7aa63681f92a 100644
>>>> --- a/fs/btrfs/compression.c
>>>> +++ b/fs/btrfs/compression.c
>>>> @@ -523,7 +523,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>>>>    		free_extent_map(em);
>>>>    		unlock_extent(tree, cur, page_end, NULL);
>>>>
>>>> -		if (folio->index == end_index) {
>>>> +		if (folio_contains(folio, end_index)) {
>>>>    			size_t zero_offset = offset_in_folio(folio, isize);
>>>>
>>>>    			if (zero_offset) {
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index 013268f70621..f0d51f6ed951 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -221,7 +221,7 @@ static void __process_folios_contig(struct address_space *mapping,
>>>>    }
>>>>
>>>>    static noinline void unlock_delalloc_folio(const struct inode *inode,
>>>> -					   const struct folio *locked_folio,
>>>> +					   struct folio *locked_folio,
>>>
>>> I'm not happy to see removing const from the parameters as it's quite
>>> tedious to find them. Here it's not necessary as it's still not changing
>>> the folio, only required because folio API is not const-clean,
>>> folio_contains() in particular.
>>>
>>
>> Yes, I'm not happy with that either, and I'm planning to constify the
>> parameters for those helpers in a dedicated series.
> 
> Thanks, but don't let it distract you from the more important folio
> changes. I noticed there are missing consts in the page API too, like
> page_offset() or the folio/page boundary like folio_pos() or
> folio_pgoff(). This is can wait, my comment was more like a note to self
> to have a look later.

No worry, as the large folios work is done (at least on x86_64, no new 
regression for the whole fstests run).

Now I only need to wait for the remaining 4 patches before the final 
enablement.

Sure there are still something left for large folios, e.g. data reloc 
inode is still using page sized folios, but that is not urgent either.

So the constification of the MM interfaces can be a good low hanging 
fruit here.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls
  2025-04-04  1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo
  2025-04-04  1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
  2025-04-04  1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo
@ 2025-04-04  1:47 ` Qu Wenruo
  2025-04-04 16:38   ` Filipe Manana
  2025-04-22 22:47   ` Qu Wenruo
  2 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04  1:47 UTC (permalink / raw)
  To: linux-btrfs

With large folios, filemap_get_folios_contig() can return duplicated
folios, for example:

	704K                     768K	                          1M
        |<-- 64K sized folio --->|<------- 256K sized folio ----->|
	                      |          |
			      764K       800K

If we call lock_delalloc_folios() with range [762K, 800K) on above
layout with locked folio at 704K, we will hit the following sequence:

1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
   As this is a folio boundary.

   The returned folio will be folio at 704K.

   Since the folio is already locked, we will not lock the folio.

2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
   All the entries are the same folio at 768K.

3. We lock folio 768K for the slot 0 of the fbatch

4. We lock folio 768K for the slot 1 of the fbatch
   We deadlock, as we have already locked the same folio at step 3.

The filemap_get_folios_contig() behavior is definitely not ideal, but on
the other hand, we do not really need the filemap_get_folios_contig()
call either.

The current filemap_get_folios() is already good enough, and we require
no strong contiguous requirement either, we only need the returned folios
contiguous at file map level (aka, their folio file offsets are contiguous).

So get rid of the cursed filemap_get_folios_contig() and use regular
filemap_get_folios() instead, this will fix the above deadlock for large
folios.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c             | 6 ++----
 fs/btrfs/tests/extent-io-tests.c | 3 +--
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f0d51f6ed951..986bda2eff1c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
 	while (index <= end_index) {
 		int found_folios;

-		found_folios = filemap_get_folios_contig(mapping, &index,
-				end_index, &fbatch);
+		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
 		for (i = 0; i < found_folios; i++) {
 			struct folio *folio = fbatch.folios[i];

@@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
 	while (index <= end_index) {
 		unsigned int found_folios, i;

-		found_folios = filemap_get_folios_contig(mapping, &index,
-				end_index, &fbatch);
+		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
 		if (found_folios == 0)
 			goto out;

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index 74aca7180a5a..e762eca8a99f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
 	folio_batch_init(&fbatch);

 	while (index <= end_index) {
-		ret = filemap_get_folios_contig(inode->i_mapping, &index,
-				end_index, &fbatch);
+		ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
 		for (i = 0; i < ret; i++) {
 			struct folio *folio = fbatch.folios[i];

-- 
2.49.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls
  2025-04-04  1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo
@ 2025-04-04 16:38   ` Filipe Manana
  2025-04-04 21:51     ` Qu Wenruo
  2025-04-22 22:47   ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2025-04-04 16:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>
> With large folios, filemap_get_folios_contig() can return duplicated
> folios, for example:
>
>         704K                     768K                             1M
>         |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>                               |          |
>                               764K       800K
>
> If we call lock_delalloc_folios() with range [762K, 800K) on above
> layout with locked folio at 704K, we will hit the following sequence:
>
> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>    As this is a folio boundary.
>
>    The returned folio will be folio at 704K.
>
>    Since the folio is already locked, we will not lock the folio.
>
> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>    All the entries are the same folio at 768K.
>
> 3. We lock folio 768K for the slot 0 of the fbatch
>
> 4. We lock folio 768K for the slot 1 of the fbatch
>    We deadlock, as we have already locked the same folio at step 3.
>
> The filemap_get_folios_contig() behavior is definitely not ideal, but on
> the other hand, we do not really need the filemap_get_folios_contig()
> call either.
>
> The current filemap_get_folios() is already good enough, and we require
> no strong contiguous requirement either, we only need the returned folios
> contiguous at file map level (aka, their folio file offsets are contiguous).

This paragraph is confusing.
This says filemap_get_folios() provides contiguous results in the
sense that the file offset of each folio is greater than the previous
ones (the folios in the batch are ordered by file offsets).
But then what is the kind of contiguous results that
filemap_get_folios_contig() provides? How different is it? Is it that
the batch doesn't get "holes" - i.e. a folio's start always matches
the end of the previous one +1?

>
> So get rid of the cursed filemap_get_folios_contig() and use regular
> filemap_get_folios() instead, this will fix the above deadlock for large
> folios.

I think it's worth adding in this changelog that it is known that
filemap_get_folios_contig() has a bug and there's a pending fix for
it, adding links to the thread you started and the respective fix:

https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/
https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/

Anyway, it seems good, so with those small changes:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c             | 6 ++----
>  fs/btrfs/tests/extent-io-tests.c | 3 +--
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f0d51f6ed951..986bda2eff1c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
>         while (index <= end_index) {
>                 int found_folios;
>
> -               found_folios = filemap_get_folios_contig(mapping, &index,
> -                               end_index, &fbatch);
> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>                 for (i = 0; i < found_folios; i++) {
>                         struct folio *folio = fbatch.folios[i];
>
> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>         while (index <= end_index) {
>                 unsigned int found_folios, i;
>
> -               found_folios = filemap_get_folios_contig(mapping, &index,
> -                               end_index, &fbatch);
> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>                 if (found_folios == 0)
>                         goto out;
>
> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
> index 74aca7180a5a..e762eca8a99f 100644
> --- a/fs/btrfs/tests/extent-io-tests.c
> +++ b/fs/btrfs/tests/extent-io-tests.c
> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
>         folio_batch_init(&fbatch);
>
>         while (index <= end_index) {
> -               ret = filemap_get_folios_contig(inode->i_mapping, &index,
> -                               end_index, &fbatch);
> +               ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
>                 for (i = 0; i < ret; i++) {
>                         struct folio *folio = fbatch.folios[i];
>
> --
> 2.49.0
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls
  2025-04-04 16:38   ` Filipe Manana
@ 2025-04-04 21:51     ` Qu Wenruo
  2025-04-04 22:00       ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04 21:51 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



在 2025/4/5 03:08, Filipe Manana 写道:
> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> With large folios, filemap_get_folios_contig() can return duplicated
>> folios, for example:
>>
>>          704K                     768K                             1M
>>          |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>>                                |          |
>>                                764K       800K
>>
>> If we call lock_delalloc_folios() with range [762K, 800K) on above
>> layout with locked folio at 704K, we will hit the following sequence:
>>
>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>>     As this is a folio boundary.
>>
>>     The returned folio will be folio at 704K.
>>
>>     Since the folio is already locked, we will not lock the folio.
>>
>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>>     All the entries are the same folio at 768K.
>>
>> 3. We lock folio 768K for the slot 0 of the fbatch
>>
>> 4. We lock folio 768K for the slot 1 of the fbatch
>>     We deadlock, as we have already locked the same folio at step 3.
>>
>> The filemap_get_folios_contig() behavior is definitely not ideal, but on
>> the other hand, we do not really need the filemap_get_folios_contig()
>> call either.
>>
>> The current filemap_get_folios() is already good enough, and we require
>> no strong contiguous requirement either, we only need the returned folios
>> contiguous at file map level (aka, their folio file offsets are contiguous).
> 
> This paragraph is confusing.
> This says filemap_get_folios() provides contiguous results in the
> sense that the file offset of each folio is greater than the previous
> ones (the folios in the batch are ordered by file offsets).
> But then what is the kind of contiguous results that
> filemap_get_folios_contig() provides? How different is it? Is it that
> the batch doesn't get "holes" - i.e. a folio's start always matches
> the end of the previous one +1?

 From my understanding, filemap_get_folios_contig() returns physically 
contiguous pages/folios.

And the hole handling is always the caller's responsibility, no matter 
if it's the _contig() version or not.

Thus for filemap_get_folios_contig(), and the above two large folios 
cases, as long as the two large folios are not physically contiguous, 
the call returns the first folio, then the next call it returns the next 
folio.

Not returning both in one go, and this behavior is not that useful to us 
either.


And talking about holes, due to the _contig() behavior itself, if we hit 
a hole the function will definitely return, but between different calls 
caller should check the folio's position between calls, to make sure no 
holes is between two filemap_get_folios_contig() calls.

Of course, no caller is really doing that, thus another reason why we do 
not need the filemap_get_folios_contig() call.

Thanks,
Qu>
>>
>> So get rid of the cursed filemap_get_folios_contig() and use regular
>> filemap_get_folios() instead, this will fix the above deadlock for large
>> folios.
> 
> I think it's worth adding in this changelog that it is known that
> filemap_get_folios_contig() has a bug and there's a pending fix for
> it, adding links to the thread you started and the respective fix:
> 
> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/
> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/
> 
> Anyway, it seems good, so with those small changes:
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Thanks.
> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c             | 6 ++----
>>   fs/btrfs/tests/extent-io-tests.c | 3 +--
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index f0d51f6ed951..986bda2eff1c 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
>>          while (index <= end_index) {
>>                  int found_folios;
>>
>> -               found_folios = filemap_get_folios_contig(mapping, &index,
>> -                               end_index, &fbatch);
>> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>>                  for (i = 0; i < found_folios; i++) {
>>                          struct folio *folio = fbatch.folios[i];
>>
>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>>          while (index <= end_index) {
>>                  unsigned int found_folios, i;
>>
>> -               found_folios = filemap_get_folios_contig(mapping, &index,
>> -                               end_index, &fbatch);
>> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>>                  if (found_folios == 0)
>>                          goto out;
>>
>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
>> index 74aca7180a5a..e762eca8a99f 100644
>> --- a/fs/btrfs/tests/extent-io-tests.c
>> +++ b/fs/btrfs/tests/extent-io-tests.c
>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
>>          folio_batch_init(&fbatch);
>>
>>          while (index <= end_index) {
>> -               ret = filemap_get_folios_contig(inode->i_mapping, &index,
>> -                               end_index, &fbatch);
>> +               ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
>>                  for (i = 0; i < ret; i++) {
>>                          struct folio *folio = fbatch.folios[i];
>>
>> --
>> 2.49.0
>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls
  2025-04-04 21:51     ` Qu Wenruo
@ 2025-04-04 22:00       ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-04 22:00 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



在 2025/4/5 08:21, Qu Wenruo 写道:
> 
> 
> 在 2025/4/5 03:08, Filipe Manana 写道:
>> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>>>
>>> With large folios, filemap_get_folios_contig() can return duplicated
>>> folios, for example:
>>>
>>>          704K                     768K                             1M
>>>          |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>>>                                |          |
>>>                                764K       800K
>>>
>>> If we call lock_delalloc_folios() with range [762K, 800K) on above
>>> layout with locked folio at 704K, we will hit the following sequence:
>>>
>>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>>>     As this is a folio boundary.
>>>
>>>     The returned folio will be folio at 704K.
>>>
>>>     Since the folio is already locked, we will not lock the folio.
>>>
>>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>>>     All the entries are the same folio at 768K.
>>>
>>> 3. We lock folio 768K for the slot 0 of the fbatch
>>>
>>> 4. We lock folio 768K for the slot 1 of the fbatch
>>>     We deadlock, as we have already locked the same folio at step 3.
>>>
>>> The filemap_get_folios_contig() behavior is definitely not ideal, but on
>>> the other hand, we do not really need the filemap_get_folios_contig()
>>> call either.
>>>
>>> The current filemap_get_folios() is already good enough, and we require
>>> no strong contiguous requirement either, we only need the returned 
>>> folios
>>> contiguous at file map level (aka, their folio file offsets are 
>>> contiguous).
>>
>> This paragraph is confusing.
>> This says filemap_get_folios() provides contiguous results in the
>> sense that the file offset of each folio is greater than the previous
>> ones (the folios in the batch are ordered by file offsets).
>> But then what is the kind of contiguous results that
>> filemap_get_folios_contig() provides? How different is it? Is it that
>> the batch doesn't get "holes" - i.e. a folio's start always matches
>> the end of the previous one +1?
> 
>  From my understanding, filemap_get_folios_contig() returns physically 
> contiguous pages/folios.
> 
> And the hole handling is always the caller's responsibility, no matter 
> if it's the _contig() version or not.
> 
> Thus for filemap_get_folios_contig(), and the above two large folios 
> cases, as long as the two large folios are not physically contiguous, 
> the call returns the first folio, then the next call it returns the next 
> folio.
> 
> Not returning both in one go, and this behavior is not that useful to us 
> either.

My bad, I should double check the code before commenting on this.

It turns out it's not really splitting the result based on physical 
addresses, but purely returning to the end of a large folio.
Which is another thing we do not really want in btrfs' code.

So the contiguous part really seems to be the following points:

- There should always be a folio at the start index
   Or it will return 0 immediately

- No holes in the returned fbatch
   Although it can still return the result early, due to large folios.

Anyway I'll add these to the next version of the commit message, to 
explain the behavior in details and why it's safe we go the regular 
filemap_get_folios() call.

Thanks,
Qu

> 
> 
> And talking about holes, due to the _contig() behavior itself, if we hit 
> a hole the function will definitely return, but between different calls 
> caller should check the folio's position between calls, to make sure no 
> holes is between two filemap_get_folios_contig() calls.
> 
> Of course, no caller is really doing that, thus another reason why we do 
> not need the filemap_get_folios_contig() call.
> 
> Thanks,
> Qu>
>>>
>>> So get rid of the cursed filemap_get_folios_contig() and use regular
>>> filemap_get_folios() instead, this will fix the above deadlock for large
>>> folios.
>>
>> I think it's worth adding in this changelog that it is known that
>> filemap_get_folios_contig() has a bug and there's a pending fix for
>> it, adding links to the thread you started and the respective fix:
>>
>> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035- 
>> b829-72cfb5eb6fc6@gmx.com/
>> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/
>>
>> Anyway, it seems good, so with those small changes:
>>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>
>> Thanks.
>>
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   fs/btrfs/extent_io.c             | 6 ++----
>>>   fs/btrfs/tests/extent-io-tests.c | 3 +--
>>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index f0d51f6ed951..986bda2eff1c 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct 
>>> address_space *mapping,
>>>          while (index <= end_index) {
>>>                  int found_folios;
>>>
>>> -               found_folios = filemap_get_folios_contig(mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               found_folios = filemap_get_folios(mapping, &index, 
>>> end_index, &fbatch);
>>>                  for (i = 0; i < found_folios; i++) {
>>>                          struct folio *folio = fbatch.folios[i];
>>>
>>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct 
>>> inode *inode,
>>>          while (index <= end_index) {
>>>                  unsigned int found_folios, i;
>>>
>>> -               found_folios = filemap_get_folios_contig(mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               found_folios = filemap_get_folios(mapping, &index, 
>>> end_index, &fbatch);
>>>                  if (found_folios == 0)
>>>                          goto out;
>>>
>>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/ 
>>> extent-io-tests.c
>>> index 74aca7180a5a..e762eca8a99f 100644
>>> --- a/fs/btrfs/tests/extent-io-tests.c
>>> +++ b/fs/btrfs/tests/extent-io-tests.c
>>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode 
>>> *inode, u64 start, u64 end,
>>>          folio_batch_init(&fbatch);
>>>
>>>          while (index <= end_index) {
>>> -               ret = filemap_get_folios_contig(inode->i_mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               ret = filemap_get_folios(inode->i_mapping, &index, 
>>> end_index, &fbatch);
>>>                  for (i = 0; i < ret; i++) {
>>>                          struct folio *folio = fbatch.folios[i];
>>>
>>> -- 
>>> 2.49.0
>>>
>>>
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls
  2025-04-04  1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo
  2025-04-04 16:38   ` Filipe Manana
@ 2025-04-22 22:47   ` Qu Wenruo
  1 sibling, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2025-04-22 22:47 UTC (permalink / raw)
  To: linux-btrfs



在 2025/4/4 12:17, Qu Wenruo 写道:
> With large folios, filemap_get_folios_contig() can return duplicated
> folios, for example:
> 
> 	704K                     768K	                          1M
>          |<-- 64K sized folio --->|<------- 256K sized folio ----->|
> 	                      |          |
> 			      764K       800K
> 
> If we call lock_delalloc_folios() with range [762K, 800K) on above
> layout with locked folio at 704K, we will hit the following sequence:
> 
> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>     As this is a folio boundary.
> 
>     The returned folio will be folio at 704K.
> 
>     Since the folio is already locked, we will not lock the folio.
> 
> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>     All the entries are the same folio at 768K.
> 
> 3. We lock folio 768K for the slot 0 of the fbatch
> 
> 4. We lock folio 768K for the slot 1 of the fbatch
>     We deadlock, as we have already locked the same folio at step 3.
> 
> The filemap_get_folios_contig() behavior is definitely not ideal, but on
> the other hand, we do not really need the filemap_get_folios_contig()
> call either.
> 
> The current filemap_get_folios() is already good enough, and we require
> no strong contiguous requirement either, we only need the returned folios
> contiguous at file map level (aka, their folio file offsets are contiguous).
> 
> So get rid of the cursed filemap_get_folios_contig() and use regular
> filemap_get_folios() instead, this will fix the above deadlock for large
> folios.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This is causing ASSERT() triggered in generic/524.

And thankfully the fix in filemap_get_folios_contig() has already landed 
upstream, we can safely drop this one.

I'll remove it from the for-next branch.

Thanks,
Qu

> ---
>   fs/btrfs/extent_io.c             | 6 ++----
>   fs/btrfs/tests/extent-io-tests.c | 3 +--
>   2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f0d51f6ed951..986bda2eff1c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
>   	while (index <= end_index) {
>   		int found_folios;
>   
> -		found_folios = filemap_get_folios_contig(mapping, &index,
> -				end_index, &fbatch);
> +		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>   		for (i = 0; i < found_folios; i++) {
>   			struct folio *folio = fbatch.folios[i];
>   
> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>   	while (index <= end_index) {
>   		unsigned int found_folios, i;
>   
> -		found_folios = filemap_get_folios_contig(mapping, &index,
> -				end_index, &fbatch);
> +		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>   		if (found_folios == 0)
>   			goto out;
>   
> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
> index 74aca7180a5a..e762eca8a99f 100644
> --- a/fs/btrfs/tests/extent-io-tests.c
> +++ b/fs/btrfs/tests/extent-io-tests.c
> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
>   	folio_batch_init(&fbatch);
>   
>   	while (index <= end_index) {
> -		ret = filemap_get_folios_contig(inode->i_mapping, &index,
> -				end_index, &fbatch);
> +		ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
>   		for (i = 0; i < ret; i++) {
>   			struct folio *folio = fbatch.folios[i];
>   


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-04-22 22:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-04  1:47 [PATCH 0/3] btrfs: fsstress hang fix for large data folios Qu Wenruo
2025-04-04  1:47 ` [PATCH 1/3] btrfs: remove unnecessary early exits in delalloc folio lock and unlock Qu Wenruo
2025-04-04 16:04   ` Filipe Manana
2025-04-04 21:44     ` Qu Wenruo
2025-04-05 17:54       ` Filipe Manana
2025-04-04  1:47 ` [PATCH 2/3] btrfs: use folio_contains() for EOF detection Qu Wenruo
2025-04-04 16:09   ` Filipe Manana
2025-04-07 18:39   ` David Sterba
2025-04-07 21:58     ` Qu Wenruo
2025-04-08 23:12       ` David Sterba
2025-04-08 23:16         ` Qu Wenruo
2025-04-04  1:47 ` [PATCH 3/3] btrfs: get rid of filemap_get_folios_contig() calls Qu Wenruo
2025-04-04 16:38   ` Filipe Manana
2025-04-04 21:51     ` Qu Wenruo
2025-04-04 22:00       ` Qu Wenruo
2025-04-22 22:47   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox