public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Filipe Manana <fdmanana@kernel.org>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 2/3] btrfs: defrag: use extent_thresh to replace the hardcoded size limit
Date: Wed, 26 Jan 2022 21:00:17 +0800
Message-ID: <dc25f2ec-1afc-8cb4-8a01-6416602d45a4@gmx.com>
In-Reply-To: <CAL3q7H4RJTxZ-SebRPhyQRPUUaSr0fHvaTm1o1qu18wTuKYOZg@mail.gmail.com>



On 2022/1/26 20:36, Filipe Manana wrote:
> On Wed, Jan 26, 2022 at 12:26 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2022/1/26 19:40, Filipe Manana wrote:
>>> On Wed, Jan 26, 2022 at 08:58:49AM +0800, Qu Wenruo wrote:
>>>> In defrag_check_next_extent() we use a hardcoded extent size threshold,
>>>> SZ_128K, rather than @extent_thresh from btrfs_defrag_file().
>>>>
>>>> This can lead to inconsistent behavior, especially since the default
>>>> extent size threshold is 256K.
>>>>
>>>> Fix this by passing @extent_thresh into defrag_check_next_extent() and
>>>> use that value.
>>>>
>>>> Also, since the extent_thresh check should be applied to all extents,
>>>> not only physically adjacent ones, move the threshold check into a
>>>> dedicated if ().
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>    fs/btrfs/ioctl.c | 12 +++++++-----
>>>>    1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>> index 0d8bfc716e6b..2911df12fc48 100644
>>>> --- a/fs/btrfs/ioctl.c
>>>> +++ b/fs/btrfs/ioctl.c
>>>> @@ -1050,7 +1050,7 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
>>>>    }
>>>>
>>>>    static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
>>>> -                                 bool locked)
>>>> +                                 u32 extent_thresh, bool locked)
>>>>    {
>>>>       struct extent_map *next;
>>>>       bool ret = false;
>>>> @@ -1066,9 +1066,11 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
>>>>       /* Preallocated */
>>>>       if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
>>>>               goto out;
>>>> -    /* Physically adjacent and large enough */
>>>> -    if ((em->block_start + em->block_len == next->block_start) &&
>>>> -        (em->block_len > SZ_128K && next->block_len > SZ_128K))
>>>> +    /* Extent is already large enough */
>>>> +    if (next->len >= extent_thresh)
>>>> +            goto out;
>>>
>>> So this will trigger unnecessary rewrites of compressed extents.
>>> The SZ_128K is there to deal with compressed extents, it has nothing to
>>> do with the threshold passed to the ioctl.
>>
>> Then there is still something wrong.
>>
>> The original check will only reject it when both conditions are met.
>>
>> So based on your script, I can still find a way to defrag the extents,
>> with or without this modification:
>
> Right, without the intermediary write to file "baz", this patchset
> brings a regression with regard to compressed extents when they are
> adjacent, which is typically the case when doing large writes, as
> they'll create multiple extents covering consecutive 128K ranges.
>
> With the write to file "baz", as I pasted it, it happens both before
> and after the patchset.
>
>>
>>          mkfs.btrfs -f $DEV
>>          mount -o compress $DEV $MNT
>>
>>          xfs_io -f -c "pwrite -S 0xab 0 128K" $MNT/file1
>>          sync
>>          xfs_io -f -c "pwrite -S 0xab 0 128K" $MNT/file2
>>          sync
>>          xfs_io -f -c "pwrite -S 0xab 128K 128K" $MNT/file1
>>          sync
>>
>>          echo "=== file1 before defrag ==="
>>          xfs_io -f -c "fiemap -v" $MNT/file1
>>          echo "=== file1 after defrag ==="
>>          btrfs fi defrag $MNT/file1
>>          sync
>>          xfs_io -f -c "fiemap -v" $MNT/file1
>>
>> The output looks like this:
>>
>> === before ===
>> /mnt/btrfs/file1:
>>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>      0: [0..255]:        26624..26879       256   0x8
>>      1: [256..511]:      26640..26895       256   0x9
>> === after ===
>> /mnt/btrfs/file1:
>>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>>      0: [0..255]:        26648..26903       256   0x8
>>      1: [256..511]:      26656..26911       256   0x9
>>
>> No matter if the patch is applied, the result is the same.
>
> Yes, explained above.
>
>>
>> So thank you very much for finding another case we're not handling well...
>>
>>
>> BTW, if the check is meant to reject adjacent non-compressed extents,
>> the original one is still incorrect: we can have extents smaller than
>> 128K that are still uncompressed.
>>
>> So what we really want is to reject physically adjacent, non-compressed
>> extents?
>
> We want to avoid doing work that does nothing.
> If 2 consecutive extents are compressed and at least one is already
> 128K, then it's a waste of time, IO and CPU.
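
To make the rule above concrete, here is a minimal userspace sketch of it. It assumes an extent can be reduced to its logical length plus a "compressed" flag; struct sketch_extent, SKETCH_MAX_COMPRESSED and compressed_pair_not_worth_defrag() are hypothetical names for illustration only, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128K is the largest a compressed extent can be. */
#define SKETCH_MAX_COMPRESSED (128ULL * 1024)

/* Hypothetical, simplified view of an extent (not struct extent_map). */
struct sketch_extent {
	uint64_t len;    /* logical (uncompressed) length in bytes */
	bool compressed; /* true if the extent is stored compressed */
};

/*
 * Return true when defragging the pair (cur, next) would be wasted work:
 * both extents are compressed and at least one is already 128K, so no
 * larger compressed extent could be produced by rewriting them.
 */
static bool compressed_pair_not_worth_defrag(const struct sketch_extent *cur,
					     const struct sketch_extent *next)
{
	assert(cur && next);
	if (!cur->compressed || !next->compressed)
		return false;
	return cur->len >= SKETCH_MAX_COMPRESSED ||
	       next->len >= SKETCH_MAX_COMPRESSED;
}
```

For the one megabyte write case (consecutive 128K compressed extents) the predicate says to skip each pair, while two 64K compressed extents would still be candidates.
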

So can we define the behavior like this?

  If an extent is already at its maximum size (128K for compressed
   extents, 128M for non-compressed ones), we don't defrag it.

This also means we need to do the same check in
defrag_collect_targets() to avoid defragging such extents.
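
A minimal userspace sketch of that rule, assuming an extent can be reduced to its logical length and a compressed flag. The names (struct sketch_extent, extent_at_max_capacity(), should_defrag_extent()) are hypothetical; the two limits are the ones stated above, which I believe correspond to BTRFS_MAX_COMPRESSED and BTRFS_MAX_EXTENT_SIZE in the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits: 128K for compressed extents, 128M for regular ones. */
#define SKETCH_MAX_COMPRESSED  (128ULL * 1024)
#define SKETCH_MAX_EXTENT_SIZE (128ULL * 1024 * 1024)

/* Hypothetical, simplified view of an extent (not struct extent_map). */
struct sketch_extent {
	uint64_t len;    /* logical length in bytes */
	bool compressed; /* true if the extent is stored compressed */
};

/* True if the extent cannot grow any further, so rewriting it gains nothing. */
static bool extent_at_max_capacity(const struct sketch_extent *em)
{
	assert(em);
	if (em->compressed)
		return em->len >= SKETCH_MAX_COMPRESSED;
	return em->len >= SKETCH_MAX_EXTENT_SIZE;
}

/*
 * Hypothetical per-extent filter in the spirit of defrag_collect_targets():
 * skip extents already at max capacity, otherwise apply extent_thresh.
 */
static bool should_defrag_extent(const struct sketch_extent *em,
				 uint32_t extent_thresh)
{
	if (extent_at_max_capacity(em))
		return false;
	return em->len < extent_thresh;
}
```

The interesting case is a 128K compressed extent with the default 256K extent_thresh: it is below the threshold, yet it is skipped because it is already as large as a compressed extent can be.
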

Thanks,
Qu


>
> And that's a fairly common scenario. For example, do a one megabyte
> write; after writeback we end up with several 128K compressed extents.
> In that case defrag should do nothing for the whole range.
>
>
>>
>> Thanks,
>> Qu
>>>
>>> After applying this patchset, if you run a trivial test like this:
>>>
>>>      #!/bin/bash
>>>
>>>      DEV=/dev/sdj
>>>      MNT=/mnt/sdj
>>>
>>>      mkfs.btrfs -f $DEV
>>>      mount -o compress $DEV $MNT
>>>
>>>      xfs_io -f -c "pwrite -S 0xab 0 128K" $MNT/foobar
>>>      sync
>>>      # Write to some other file so that the next extent for foobar
>>>      # is not contiguous with the first extent.
>>>      xfs_io -f -c "pwrite 0 128K" $MNT/baz
>>>      sync
>>>      xfs_io -f -c "pwrite -S 0xcd 128K 128K" $MNT/foobar
>>>      sync
>>>
>>>      echo -e "\n\nTree after creating file:\n\n"
>>>      btrfs inspect-internal dump-tree -t 5 $DEV
>>>
>>>      btrfs filesystem defragment $MNT/foobar
>>>      sync
>>>
>>>      echo -e "\n\nTree after defrag:\n\n"
>>>      btrfs inspect-internal dump-tree -t 5 $DEV
>>>
>>>      umount $MNT
>>>
>>> It will result in rewriting the two 128K compressed extents:
>>>
>>> (...)
>>> Tree after write and sync:
>>>
>>> btrfs-progs v5.12.1
>>> fs tree key (FS_TREE ROOT_ITEM 0)
>>> (...)
>>>        item 7 key (257 INODE_REF 256) itemoff 15797 itemsize 16
>>>                index 2 namelen 6 name: foobar
>>>        item 8 key (257 EXTENT_DATA 0) itemoff 15744 itemsize 53
>>>                generation 6 type 1 (regular)
>>>                extent data disk byte 13631488 nr 4096
>>>                extent data offset 0 nr 131072 ram 131072
>>>                extent compression 1 (zlib)
>>>        item 9 key (257 EXTENT_DATA 131072) itemoff 15691 itemsize 53
>>>                generation 8 type 1 (regular)
>>>                extent data disk byte 14163968 nr 4096
>>>                extent data offset 0 nr 131072 ram 131072
>>>                extent compression 1 (zlib)
>>> (...)
>>>
>>> Tree after defrag:
>>>
>>> btrfs-progs v5.12.1
>>> fs tree key (FS_TREE ROOT_ITEM 0)
>>> (...)
>>>        item 7 key (257 INODE_REF 256) itemoff 15797 itemsize 16
>>>                index 2 namelen 6 name: foobar
>>>        item 8 key (257 EXTENT_DATA 0) itemoff 15744 itemsize 53
>>>                generation 9 type 1 (regular)
>>>                extent data disk byte 14430208 nr 4096
>>>                extent data offset 0 nr 131072 ram 131072
>>>                extent compression 1 (zlib)
>>>        item 9 key (257 EXTENT_DATA 131072) itemoff 15691 itemsize 53
>>>                generation 9 type 1 (regular)
>>>                extent data disk byte 13635584 nr 4096
>>>                extent data offset 0 nr 131072 ram 131072
>>>                extent compression 1 (zlib)
>>>
>>> In other words, a waste of IO and CPU time.
>>>
>>> So it needs to check if we are dealing with compressed extents and,
>>> if so, skip them if either has a size of SZ_128K (and the changelog
>>> needs to be updated).
>>>
>>> Thanks.
>>>
>>>> +    /* Physically adjacent */
>>>> +    if ((em->block_start + em->block_len == next->block_start))
>>>>               goto out;
>>>>       ret = true;
>>>>    out:
>>>> @@ -1231,7 +1233,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>>>>                       goto next;
>>>>
>>>>               next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
>>>> -                                                      locked);
>>>> +                                                      extent_thresh, locked);
>>>>               if (!next_mergeable) {
>>>>                       struct defrag_target_range *last;
>>>>
>>>> --
>>>> 2.34.1
>>>>
>>>
>>


Thread overview: 16+ messages
2022-01-26  0:58 [PATCH v3 1/3] btrfs: defrag: don't try to merge regular extents with preallocated extents Qu Wenruo
2022-01-26  0:58 ` [PATCH 2/3] btrfs: defrag: use extent_thresh to replace the hardcoded size limit Qu Wenruo
2022-01-26 11:40   ` Filipe Manana
2022-01-26 12:26     ` Qu Wenruo
2022-01-26 12:36       ` Filipe Manana
2022-01-26 13:00         ` Qu Wenruo [this message]
2022-01-26 13:37           ` Filipe Manana
2022-01-26 23:57             ` Qu Wenruo
2022-01-27 10:58               ` Filipe Manana
2022-01-27 11:11                 ` Forza
2022-01-26  0:58 ` [PATCH 3/3] btrfs: defrag: remove the physical adjacent extents rejection in defrag_check_next_extent() Qu Wenruo
2022-01-26 11:44   ` Filipe Manana
2022-01-26 11:26 ` [PATCH v3 1/3] btrfs: defrag: don't try to merge regular extents with preallocated extents Filipe Manana
2022-01-26 11:33   ` Qu Wenruo
2022-01-26 11:47     ` Filipe Manana
2022-01-28  6:31 ` Qu Wenruo
