* [PATCH] btrfs: defrag: don't try to merge regular extents with preallocated extents
From: Qu Wenruo @ 2022-01-23 4:52 UTC
To: linux-btrfs

[BUG]
With older kernels (before v5.16), btrfs will defrag preallocated extents.
Newer kernels (v5.16 and later) will not defrag preallocated extents, but
they will still defrag the extent just before the preallocated one, even
if it's just a single sector.

This can be exposed by the following small script:

mkfs.btrfs -f $dev > /dev/null

mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" -c sync -c "falloc 4k 16K" $mnt/file
xfs_io -c "fiemap -v" $mnt/file
btrfs fi defrag $mnt/file
sync
xfs_io -c "fiemap -v" $mnt/file

The output looks like this on older kernels:

/mnt/btrfs/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          26624..26631         8   0x0
   1: [8..39]:         26632..26663        32 0x801
/mnt/btrfs/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..39]:         26664..26703        40   0x1

This defrags the single sector along with the preallocated extent and
replaces them with a regular extent at a new location (caused by data
COW).
This wastes most of the data IO just for the preallocated range.

On the other hand, v5.16 is slightly better:

/mnt/btrfs/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          26624..26631         8   0x0
   1: [8..39]:         26632..26663        32 0x801
/mnt/btrfs/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          26664..26671         8   0x0
   1: [8..39]:         26632..26663        32 0x801

The preallocated range is not defragged, but the sector before it still
gets defragged, which is unnecessary.

[CAUSE]
One of the functions reused by both the old and the new code is
defrag_check_next_extent(), which determines whether we should defrag
the current extent by checking the next one.

It only checks if the next extent is a hole or inlined, but it doesn't
check if it's preallocated.

On the other hand, outside of this function, both old and new kernels
reject preallocated extents.

This inconsistency causes the behavior described above.

[FIX]
- Also check if the next extent is preallocated
  If so, don't defrag the current extent

- Add comments on each case where we don't defrag

This will reduce the IO caused by the defrag ioctl and autodefrag.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ioctl.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 91ba2efe9792..dfa81b377e89 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1049,23 +1049,40 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
 	return em;
 }
 
+/*
+ * Return if current extent @em is a good candidate for defrag.
+ *
+ * This is done by checking against the next extent after @em.
+ */
 static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
 				     bool locked)
 {
 	struct extent_map *next;
-	bool ret = true;
+	bool ret = false;
 
 	/* this is the last extent */
 	if (em->start + em->len >= i_size_read(inode))
-		return false;
+		return ret;
 
 	next = defrag_lookup_extent(inode, em->start + em->len, locked);
+	/* No next extent or a hole, no way to merge */
 	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
-		ret = false;
-	else if ((em->block_start + em->block_len == next->block_start) &&
-		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
-		ret = false;
+		goto out;
 
+	/* Next extent is preallocated, no sense to defrag current extent */
+	if (test_bit(EXTENT_FLAG_PREALLOC, &next->flags))
+		goto out;
+
+	/*
+	 * Next extent are not only mergable but also adjacent in their
+	 * logical address, normally an excellent candicate, but if they
+	 * are already large enough, then no need to defrag current extent.
+	 */
+	if ((em->block_start + em->block_len == next->block_start) &&
+	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
+		goto out;
+	ret = true;
+out:
 	free_extent_map(next);
 	return ret;
 }
-- 
2.34.1
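For reference when reading the fiemap output above: flag 0x800 is
FIEMAP_EXTENT_UNWRITTEN, i.e. a preallocated (fallocate'd) range, and 0x1
is FIEMAP_EXTENT_LAST. The following is a minimal user-space sketch, not
part of the patch, that reads the same information through the standard
FIEMAP ioctl; the file path and the extent count of 32 are arbitrary:

/*
 * Minimal FIEMAP query: list a file's extents and mark preallocated ones.
 * Build with: gcc -o fiemap-check fiemap-check.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Room for up to 32 extent records, plenty for the test file above. */
	size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	if (!fm)
		return 1;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush delalloc first */
	fm->fm_extent_count = 32;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];

		printf("extent %u: logical %llu len %llu flags 0x%x%s\n", i,
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_length,
		       fe->fe_flags,
		       (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ?
				" (preallocated)" : "");
	}

	free(fm);
	close(fd);
	return 0;
}

Running it against $mnt/file before and after the defrag should show the
preallocated extent keeping the UNWRITTEN flag on v5.16+, matching the
xfs_io output above.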
* Re: [PATCH] btrfs: defrag: don't try to merge regular extents with preallocated extents
From: Filipe Manana @ 2022-01-24 12:19 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Jan 23, 2022 at 12:52:42PM +0800, Qu Wenruo wrote:
> [BUG]
> With older kernels (before v5.16), btrfs will defrag preallocated extents.
> Newer kernels (v5.16 and later) will not defrag preallocated extents, but
> they will still defrag the extent just before the preallocated one, even
> if it's just a single sector.
>
> This can be exposed by the following small script:
>
> mkfs.btrfs -f $dev > /dev/null
>
> mount $dev $mnt
> xfs_io -f -c "pwrite 0 4k" -c sync -c "falloc 4k 16K" $mnt/file
> xfs_io -c "fiemap -v" $mnt/file
> btrfs fi defrag $mnt/file
> sync
> xfs_io -c "fiemap -v" $mnt/file
>
> The output looks like this on older kernels:
>
> /mnt/btrfs/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..7]:          26624..26631         8   0x0
>    1: [8..39]:         26632..26663        32 0x801
> /mnt/btrfs/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..39]:         26664..26703        40   0x1
>
> This defrags the single sector along with the preallocated extent and
> replaces them with a regular extent at a new location (caused by data
> COW).
> This wastes most of the data IO just for the preallocated range.
>
> On the other hand, v5.16 is slightly better:
>
> /mnt/btrfs/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..7]:          26624..26631         8   0x0
>    1: [8..39]:         26632..26663        32 0x801
> /mnt/btrfs/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..7]:          26664..26671         8   0x0
>    1: [8..39]:         26632..26663        32 0x801
>
> The preallocated range is not defragged, but the sector before it still
> gets defragged, which is unnecessary.
>
> [CAUSE]
> One of the functions reused by both the old and the new code is
> defrag_check_next_extent(), which determines whether we should defrag
> the current extent by checking the next one.
>
> It only checks if the next extent is a hole or inlined, but it doesn't
> check if it's preallocated.
>
> On the other hand, outside of this function, both old and new kernels
> reject preallocated extents.
>
> This inconsistency causes the behavior described above.
>
> [FIX]
> - Also check if the next extent is preallocated
>   If so, don't defrag the current extent
>
> - Add comments on each case where we don't defrag
>
> This will reduce the IO caused by the defrag ioctl and autodefrag.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ioctl.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 91ba2efe9792..dfa81b377e89 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1049,23 +1049,40 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
>  	return em;
>  }
>
> +/*
> + * Return if current extent @em is a good candidate for defrag.
> + *
> + * This is done by checking against the next extent after @em.
> + */
>  static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
>  				     bool locked)
>  {
>  	struct extent_map *next;
> -	bool ret = true;
> +	bool ret = false;
>
>  	/* this is the last extent */
>  	if (em->start + em->len >= i_size_read(inode))
> -		return false;
> +		return ret;
>
>  	next = defrag_lookup_extent(inode, em->start + em->len, locked);
> +	/* No next extent or a hole, no way to merge */
>  	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
> -		ret = false;
> -	else if ((em->block_start + em->block_len == next->block_start) &&
> -		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
> -		ret = false;
> +		goto out;
>
> +	/* Next extent is preallocated, no sense to defrag current extent */
> +	if (test_bit(EXTENT_FLAG_PREALLOC, &next->flags))
> +		goto out;
> +
> +	/*
> +	 * Next extent are not only mergable but also adjacent in their

are not -> is not
mergable -> mergeable
their -> its

> +	 * logical address, normally an excellent candicate, but if they

candicate -> candidate

> +	 * are already large enough, then no need to defrag current extent.
> +	 */

It still sounds a bit odd to me, maybe:

Next extent is mergeable and its logical address is contiguous with this
extent, so normally an excellent candidate, but if this extent or the next
one is already large enough, then we don't need to defrag. We use SZ_128K
because in case of enabled compression, extents can never be larger than
that.

Adding this comment is unrelated to this fix about prealloc extents, but I'm
fine with it.

Other than that it looks fine.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +	if ((em->block_start + em->block_len == next->block_start) &&
> +	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
> +		goto out;
> +	ret = true;
> +out:
>  	free_extent_map(next);
>  	return ret;
>  }
> --
> 2.34.1
>
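To make the suggested rationale concrete, the comment rewording applied to
the check might read roughly as follows. This is purely illustrative, not
text from the thread; the condition itself is unchanged from the patch:

	/*
	 * The next extent is mergeable and its logical address is contiguous
	 * with this extent, so it is normally an excellent candidate. But if
	 * this extent or the next one is already large enough, there is no
	 * need to defrag. We use SZ_128K because with compression enabled an
	 * extent can never be larger than that.
	 */
	if ((em->block_start + em->block_len == next->block_start) &&
	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
		goto out;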
* Re: [PATCH] btrfs: defrag: don't try to merge regular extents with preallocated extents
From: Qu Wenruo @ 2022-01-24 12:36 UTC
To: Filipe Manana; +Cc: linux-btrfs

On 2022/1/24 20:19, Filipe Manana wrote:
> On Sun, Jan 23, 2022 at 12:52:42PM +0800, Qu Wenruo wrote:
>> [BUG]
>> With older kernels (before v5.16), btrfs will defrag preallocated extents.
>> Newer kernels (v5.16 and later) will not defrag preallocated extents, but
>> they will still defrag the extent just before the preallocated one, even
>> if it's just a single sector.
>>
>> This can be exposed by the following small script:
>>
>> mkfs.btrfs -f $dev > /dev/null
>>
>> mount $dev $mnt
>> xfs_io -f -c "pwrite 0 4k" -c sync -c "falloc 4k 16K" $mnt/file
>> xfs_io -c "fiemap -v" $mnt/file
>> btrfs fi defrag $mnt/file
>> sync
>> xfs_io -c "fiemap -v" $mnt/file
>>
>> The output looks like this on older kernels:
>>
>> /mnt/btrfs/file:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>    0: [0..7]:          26624..26631         8   0x0
>>    1: [8..39]:         26632..26663        32 0x801
>> /mnt/btrfs/file:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>    0: [0..39]:         26664..26703        40   0x1
>>
>> This defrags the single sector along with the preallocated extent and
>> replaces them with a regular extent at a new location (caused by data
>> COW).
>> This wastes most of the data IO just for the preallocated range.
>>
>> On the other hand, v5.16 is slightly better:
>>
>> /mnt/btrfs/file:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>    0: [0..7]:          26624..26631         8   0x0
>>    1: [8..39]:         26632..26663        32 0x801
>> /mnt/btrfs/file:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>    0: [0..7]:          26664..26671         8   0x0
>>    1: [8..39]:         26632..26663        32 0x801
>>
>> The preallocated range is not defragged, but the sector before it still
>> gets defragged, which is unnecessary.
>>
>> [CAUSE]
>> One of the functions reused by both the old and the new code is
>> defrag_check_next_extent(), which determines whether we should defrag
>> the current extent by checking the next one.
>>
>> It only checks if the next extent is a hole or inlined, but it doesn't
>> check if it's preallocated.
>>
>> On the other hand, outside of this function, both old and new kernels
>> reject preallocated extents.
>>
>> This inconsistency causes the behavior described above.
>>
>> [FIX]
>> - Also check if the next extent is preallocated
>>   If so, don't defrag the current extent
>>
>> - Add comments on each case where we don't defrag
>>
>> This will reduce the IO caused by the defrag ioctl and autodefrag.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/ioctl.c | 29 +++++++++++++++++++++++------
>>  1 file changed, 23 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 91ba2efe9792..dfa81b377e89 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1049,23 +1049,40 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
>>  	return em;
>>  }
>>
>> +/*
>> + * Return if current extent @em is a good candidate for defrag.
>> + *
>> + * This is done by checking against the next extent after @em.
>> + */
>>  static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
>>  				     bool locked)
>>  {
>>  	struct extent_map *next;
>> -	bool ret = true;
>> +	bool ret = false;
>>
>>  	/* this is the last extent */
>>  	if (em->start + em->len >= i_size_read(inode))
>> -		return false;
>> +		return ret;
>>
>>  	next = defrag_lookup_extent(inode, em->start + em->len, locked);
>> +	/* No next extent or a hole, no way to merge */
>>  	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
>> -		ret = false;
>> -	else if ((em->block_start + em->block_len == next->block_start) &&
>> -		 (em->block_len > SZ_128K && next->block_len > SZ_128K))
>> -		ret = false;
>> +		goto out;
>>
>> +	/* Next extent is preallocated, no sense to defrag current extent */
>> +	if (test_bit(EXTENT_FLAG_PREALLOC, &next->flags))
>> +		goto out;
>> +
>> +	/*
>> +	 * Next extent are not only mergable but also adjacent in their
>
> are not -> is not
> mergable -> mergeable
> their -> its
>
>> +	 * logical address, normally an excellent candicate, but if they
>
> candicate -> candidate
>
>> +	 * are already large enough, then no need to defrag current extent.
>> +	 */
>
> It still sounds a bit odd to me, maybe:
>
> Next extent is mergeable and its logical address is contiguous with this
> extent, so normally an excellent candidate, but if this extent or the next
> one is already large enough, then we don't need to defrag. We use SZ_128K
> because in case of enabled compression, extents can never be larger than
> that.

In fact, I'm a little more concerned about the original condition now.

One thing is that the threshold here is hard-coded to 128K, while the
extent threshold for defrag can be specified by the ioctl caller.

Another thing is that the original condition checks block_start, and I'm
not sure we really need to check that.

As long as the next extent is not a hole or a preallocated one, we're
completely happy to defrag.

In fact, if the disk bytenr/num_bytes of @em is not adjacent to the next
extent, it's even better: we can merge them into one extent without an
extra seek.

So I tend to change the condition further, like this:

- Skip holes/preallocated extents
  That's already done in this patch

- Skip large extents, using the @threshold passed into this function
  No more hard-coded values; let the defrag caller have more control
  and get more consistent behavior.

- No more check on em::block_start
  There are some pros and cons to defragging file extents that are
  already physically adjacent:

  Pros:
  * Reduces the number of extents
    Which may be what defrag users want, e.g. to defrag extents caused
    by small but sequential direct IO. With a reduced number of
    extents, there is a slight chance of reducing mount time a little.

  Cons:
  * Extra IO with no saving in seek time

  So the existing check on em::block_start is already questionable, as
  it's not a clear win.

I know this sounds weird, especially after I have broken so much defrag
code, but I still want to remove these checks and replace them with more
reasonable ones, even if it means the behavior will change again.

Thanks,
Qu

>
> Adding this comment is unrelated to this fix about prealloc extents, but I'm
> fine with it.
>
> Other than that it looks fine.
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> Thanks.
>
>> +	if ((em->block_start + em->block_len == next->block_start) &&
>> +	    (em->block_len > SZ_128K && next->block_len > SZ_128K))
>> +		goto out;
>> +	ret = true;
>> +out:
>>  	free_extent_map(next);
>>  	return ret;
>>  }
>> --
>> 2.34.1
>>
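For illustration only, a rough sketch of the direction proposed above
might look like the following. This is hypothetical and was not part of
the posted patch; the extra @extent_thresh parameter, and plumbing it
from the defrag caller, are assumptions:

/*
 * Hypothetical sketch of the proposal: honor the caller's extent size
 * threshold instead of a hard-coded SZ_128K, skip holes and preallocated
 * extents, and drop the em->block_start adjacency check.
 */
static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
				     u32 extent_thresh, bool locked)
{
	struct extent_map *next;
	bool ret = false;

	/* This is the last extent */
	if (em->start + em->len >= i_size_read(inode))
		return false;

	next = defrag_lookup_extent(inode, em->start + em->len, locked);
	/* No next extent, or the next extent is a hole/inline extent */
	if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
		goto out;
	/* Preallocated extents are never defragged, so don't merge with one */
	if (test_bit(EXTENT_FLAG_PREALLOC, &next->flags))
		goto out;
	/*
	 * If both extents already meet the caller-specified threshold,
	 * defragging them buys nothing.
	 */
	if (em->len >= extent_thresh && next->len >= extent_thresh)
		goto out;
	ret = true;
out:
	free_extent_map(next);
	return ret;
}

Compared to the patch above, this keeps the hole/preallocated checks but
replaces the block_start + SZ_128K test with a pure size test against the
caller's threshold, matching the three points listed in the reply.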
Thread overview: 3+ messages
  2022-01-23  4:52 [PATCH] btrfs: defrag: don't try to merge regular extents with preallocated extents Qu Wenruo
  2022-01-24 12:19 ` Filipe Manana
  2022-01-24 12:36   ` Qu Wenruo