public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
@ 2024-02-06 23:30 Qu Wenruo
  2024-02-07  1:02 ` Boris Burkov
  2024-02-07 14:31 ` Filipe Manana
  0 siblings, 2 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-02-06 23:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

[BUG]
With the following file extent layout, defrag would do unnecessary IO
and result more on-disk space usage.

 # mkfs.btrfs -f $dev
 # mount $dev $mnt
 # xfs_io -f -c "pwrite 0 40m" $mnt/foobar
 # sync
 # xfs_io -f -c "pwrite 40m 16k" $mnt/foobar.
 # sync

Above command would lead to the following file extent layout:

        item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 298844160 nr 41943040
                extent data offset 0 nr 41943040 ram 41943040
                extent compression 0 (none)
        item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13631488 nr 16384
                extent data offset 0 nr 16384 ram 16384
                extent compression 0 (none)

Which is mostly fine. We can allow the final 16K to be merged with the
previous 40M, but it's upon the end users' preference.

But if we defrag the file using the default parameters, it would result
worse file layout:

 # btrfs filesystem defrag $mnt/foobar
 # sync

        item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 298844160 nr 41943040
                extent data offset 0 nr 8650752 ram 41943040
                extent compression 0 (none)
        item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
                generation 9 type 1 (regular)
                extent data disk byte 340787200 nr 33292288
                extent data offset 0 nr 33292288 ram 33292288
                extent compression 0 (none)
        item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13631488 nr 16384
                extent data offset 0 nr 16384 ram 16384
                extent compression 0 (none)

Note the original 40M extent is still there, but a new 32M extent is
created for no benefit at all.

[CAUSE]
There is an existing check to make sure we won't defrag a large enough
extent (the threshold is by default 32M).

But the check is using the length to the end of the extent:

	range_len = em->len - (cur - em->start);

	/* Skip too large extent */
	if (range_len >= extent_thresh)
		goto next;

This means, for the first 8MiB of the extent, the range_len is always
smaller than the default threshold, and would not be defragged.
But after the first 8MiB, the remaining part would fit the requirement,
and be defragged.

Such different behavior inside the same extent caused the above problem,
and we should avoid different defrag decision inside the same extent.

[FIX]
Instead of using @range_len, just use @em->len, so that we have a
consistent decision among the same file extent.

Now with this fix, we won't touch the extent, thus not making it any
worse.

Fixes: 0cb5950f3f3b ("btrfs: fix deadlock when reserving space during defrag")
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/defrag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 8fc8118c3225..eb62ff490c48 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			goto add;
 
 		/* Skip too large extent */
-		if (range_len >= extent_thresh)
+		if (em->len >= extent_thresh)
 			goto next;
 
 		/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
  2024-02-06 23:30 [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size Qu Wenruo
@ 2024-02-07  1:02 ` Boris Burkov
  2024-02-07  1:54   ` Qu Wenruo
  2024-02-07 14:31 ` Filipe Manana
  1 sibling, 1 reply; 4+ messages in thread
From: Boris Burkov @ 2024-02-07  1:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana

On Wed, Feb 07, 2024 at 10:00:42AM +1030, Qu Wenruo wrote:
> [BUG]
> With the following file extent layout, defrag would do unnecessary IO
> and result more on-disk space usage.
> 
>  # mkfs.btrfs -f $dev
>  # mount $dev $mnt
>  # xfs_io -f -c "pwrite 0 40m" $mnt/foobar
>  # sync
>  # xfs_io -f -c "pwrite 40m 16k" $mnt/foobar.
>  # sync

Are you planning to make this an xfstest? I think that would be good!

> 
> Above command would lead to the following file extent layout:
> 
>         item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>                 generation 7 type 1 (regular)
>                 extent data disk byte 298844160 nr 41943040
>                 extent data offset 0 nr 41943040 ram 41943040
>                 extent compression 0 (none)
>         item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
>                 generation 8 type 1 (regular)
>                 extent data disk byte 13631488 nr 16384
>                 extent data offset 0 nr 16384 ram 16384
>                 extent compression 0 (none)
> 
> Which is mostly fine. We can allow the final 16K to be merged with the
> previous 40M, but it's upon the end users' preference.
> 
> But if we defrag the file using the default parameters, it would result
> worse file layout:
> 
>  # btrfs filesystem defrag $mnt/foobar
>  # sync
> 
>         item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>                 generation 7 type 1 (regular)
>                 extent data disk byte 298844160 nr 41943040
>                 extent data offset 0 nr 8650752 ram 41943040
>                 extent compression 0 (none)
>         item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
>                 generation 9 type 1 (regular)
>                 extent data disk byte 340787200 nr 33292288
>                 extent data offset 0 nr 33292288 ram 33292288
>                 extent compression 0 (none)
>         item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
>                 generation 8 type 1 (regular)
>                 extent data disk byte 13631488 nr 16384
>                 extent data offset 0 nr 16384 ram 16384
>                 extent compression 0 (none)
> 
> Note the original 40M extent is still there, but a new 32M extent is
> created for no benefit at all.
> 
> [CAUSE]
> There is an existing check to make sure we won't defrag a large enough
> extent (the threshold is by default 32M).
> 
> But the check is using the length to the end of the extent:
> 
> 	range_len = em->len - (cur - em->start);
> 
> 	/* Skip too large extent */
> 	if (range_len >= extent_thresh)
> 		goto next;
> 
> This means, for the first 8MiB of the extent, the range_len is always
> smaller than the default threshold, and would not be defragged.
> But after the first 8MiB, the remaining part would fit the requirement,
> and be defragged.
> 
> Such different behavior inside the same extent caused the above problem,
> and we should avoid different defrag decision inside the same extent.
> 
> [FIX]
> Instead of using @range_len, just use @em->len, so that we have a
> consistent decision among the same file extent.
> 
> Now with this fix, we won't touch the extent, thus not making it any
> worse.
> 
> Fixes: 0cb5950f3f3b ("btrfs: fix deadlock when reserving space during defrag")
> Reported-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>)
> ---
>  fs/btrfs/defrag.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
> index 8fc8118c3225..eb62ff490c48 100644
> --- a/fs/btrfs/defrag.c
> +++ b/fs/btrfs/defrag.c
> @@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>  			goto add;
>  
>  		/* Skip too large extent */
> -		if (range_len >= extent_thresh)
> +		if (em->len >= extent_thresh)
>  			goto next;

The next check is using em->len and checking the max extnt lengths,
so we could theoretically merge the max extent size and thresh checks now,
by taking the min of all the relevant options or something.

>  
>  		/*
> 
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
  2024-02-07  1:02 ` Boris Burkov
@ 2024-02-07  1:54   ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-02-07  1:54 UTC (permalink / raw)
  To: Boris Burkov, Qu Wenruo; +Cc: linux-btrfs, Filipe Manana



On 2024/2/7 11:32, Boris Burkov wrote:
> On Wed, Feb 07, 2024 at 10:00:42AM +1030, Qu Wenruo wrote:
>> [BUG]
>> With the following file extent layout, defrag would do unnecessary IO
>> and result more on-disk space usage.
>>
>>   # mkfs.btrfs -f $dev
>>   # mount $dev $mnt
>>   # xfs_io -f -c "pwrite 0 40m" $mnt/foobar
>>   # sync
>>   # xfs_io -f -c "pwrite 40m 16k" $mnt/foobar.
>>   # sync
>
> Are you planning to make this an xfstest? I think that would be good!

Check this one:

https://lore.kernel.org/linux-btrfs/20240206233024.35399-1-wqu@suse.com/T/#u

It goes with qgroup for accounting the real used space of the workload.

And I believe it can even work with squote.

>
>>
>> Above command would lead to the following file extent layout:
>>
>>          item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>                  generation 7 type 1 (regular)
>>                  extent data disk byte 298844160 nr 41943040
>>                  extent data offset 0 nr 41943040 ram 41943040
>>                  extent compression 0 (none)
>>          item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
>>                  generation 8 type 1 (regular)
>>                  extent data disk byte 13631488 nr 16384
>>                  extent data offset 0 nr 16384 ram 16384
>>                  extent compression 0 (none)
>>
>> Which is mostly fine. We can allow the final 16K to be merged with the
>> previous 40M, but it's upon the end users' preference.
>>
>> But if we defrag the file using the default parameters, it would result
>> worse file layout:
>>
>>   # btrfs filesystem defrag $mnt/foobar
>>   # sync
>>
>>          item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>                  generation 7 type 1 (regular)
>>                  extent data disk byte 298844160 nr 41943040
>>                  extent data offset 0 nr 8650752 ram 41943040
>>                  extent compression 0 (none)
>>          item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
>>                  generation 9 type 1 (regular)
>>                  extent data disk byte 340787200 nr 33292288
>>                  extent data offset 0 nr 33292288 ram 33292288
>>                  extent compression 0 (none)
>>          item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
>>                  generation 8 type 1 (regular)
>>                  extent data disk byte 13631488 nr 16384
>>                  extent data offset 0 nr 16384 ram 16384
>>                  extent compression 0 (none)
>>
>> Note the original 40M extent is still there, but a new 32M extent is
>> created for no benefit at all.
>>
>> [CAUSE]
>> There is an existing check to make sure we won't defrag a large enough
>> extent (the threshold is by default 32M).
>>
>> But the check is using the length to the end of the extent:
>>
>> 	range_len = em->len - (cur - em->start);
>>
>> 	/* Skip too large extent */
>> 	if (range_len >= extent_thresh)
>> 		goto next;
>>
>> This means, for the first 8MiB of the extent, the range_len is always
>> smaller than the default threshold, and would not be defragged.
>> But after the first 8MiB, the remaining part would fit the requirement,
>> and be defragged.
>>
>> Such different behavior inside the same extent caused the above problem,
>> and we should avoid different defrag decision inside the same extent.
>>
>> [FIX]
>> Instead of using @range_len, just use @em->len, so that we have a
>> consistent decision among the same file extent.
>>
>> Now with this fix, we won't touch the extent, thus not making it any
>> worse.
>>
>> Fixes: 0cb5950f3f3b ("btrfs: fix deadlock when reserving space during defrag")
>> Reported-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Boris Burkov <boris@bur.io>)
>> ---
>>   fs/btrfs/defrag.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
>> index 8fc8118c3225..eb62ff490c48 100644
>> --- a/fs/btrfs/defrag.c
>> +++ b/fs/btrfs/defrag.c
>> @@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>   			goto add;
>>
>>   		/* Skip too large extent */
>> -		if (range_len >= extent_thresh)
>> +		if (em->len >= extent_thresh)
>>   			goto next;
>
> The next check is using em->len and checking the max extnt lengths,
> so we could theoretically merge the max extent size and thresh checks now,
> by taking the min of all the relevant options or something.

Indeed, but for backport purposes, the minimal fix would be preferred,
especially there are quite some code moving between the original cause
and later defrag works.

Thanks,
Qu

>
>>
>>   		/*
>>
>> --
>> 2.43.0
>>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
  2024-02-06 23:30 [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size Qu Wenruo
  2024-02-07  1:02 ` Boris Burkov
@ 2024-02-07 14:31 ` Filipe Manana
  1 sibling, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2024-02-07 14:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Filipe Manana

On Tue, Feb 6, 2024 at 11:30 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> With the following file extent layout, defrag would do unnecessary IO
> and result more on-disk space usage.
>
>  # mkfs.btrfs -f $dev
>  # mount $dev $mnt
>  # xfs_io -f -c "pwrite 0 40m" $mnt/foobar
>  # sync
>  # xfs_io -f -c "pwrite 40m 16k" $mnt/foobar.

Unintended . at the end of the filenamme.

>  # sync
>
> Above command would lead to the following file extent layout:
>
>         item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>                 generation 7 type 1 (regular)
>                 extent data disk byte 298844160 nr 41943040
>                 extent data offset 0 nr 41943040 ram 41943040
>                 extent compression 0 (none)
>         item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
>                 generation 8 type 1 (regular)
>                 extent data disk byte 13631488 nr 16384
>                 extent data offset 0 nr 16384 ram 16384
>                 extent compression 0 (none)
>
> Which is mostly fine. We can allow the final 16K to be merged with the
> previous 40M, but it's upon the end users' preference.
>
> But if we defrag the file using the default parameters, it would result
> worse file layout:
>
>  # btrfs filesystem defrag $mnt/foobar
>  # sync
>
>         item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>                 generation 7 type 1 (regular)
>                 extent data disk byte 298844160 nr 41943040
>                 extent data offset 0 nr 8650752 ram 41943040
>                 extent compression 0 (none)
>         item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
>                 generation 9 type 1 (regular)
>                 extent data disk byte 340787200 nr 33292288
>                 extent data offset 0 nr 33292288 ram 33292288
>                 extent compression 0 (none)
>         item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
>                 generation 8 type 1 (regular)
>                 extent data disk byte 13631488 nr 16384
>                 extent data offset 0 nr 16384 ram 16384
>                 extent compression 0 (none)
>
> Note the original 40M extent is still there, but a new 32M extent is
> created for no benefit at all.
>
> [CAUSE]
> There is an existing check to make sure we won't defrag a large enough
> extent (the threshold is by default 32M).
>
> But the check is using the length to the end of the extent:
>
>         range_len = em->len - (cur - em->start);
>
>         /* Skip too large extent */
>         if (range_len >= extent_thresh)
>                 goto next;
>
> This means, for the first 8MiB of the extent, the range_len is always
> smaller than the default threshold, and would not be defragged.
> But after the first 8MiB, the remaining part would fit the requirement,
> and be defragged.
>
> Such different behavior inside the same extent caused the above problem,
> and we should avoid different defrag decision inside the same extent.
>
> [FIX]
> Instead of using @range_len, just use @em->len, so that we have a
> consistent decision among the same file extent.
>
> Now with this fix, we won't touch the extent, thus not making it any
> worse.
>
> Fixes: 0cb5950f3f3b ("btrfs: fix deadlock when reserving space during defrag")
> Reported-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>


> ---
>  fs/btrfs/defrag.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
> index 8fc8118c3225..eb62ff490c48 100644
> --- a/fs/btrfs/defrag.c
> +++ b/fs/btrfs/defrag.c
> @@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>                         goto add;
>
>                 /* Skip too large extent */
> -               if (range_len >= extent_thresh)
> +               if (em->len >= extent_thresh)
>                         goto next;
>
>                 /*
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-02-07 14:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-06 23:30 [PATCH] btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size Qu Wenruo
2024-02-07  1:02 ` Boris Burkov
2024-02-07  1:54   ` Qu Wenruo
2024-02-07 14:31 ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox