From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Filipe Manana <fdmanana@kernel.org>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org,
Christoph Anton Mitterer <calestyo@scientia.org>
Subject: Re: [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list
Date: Wed, 7 Feb 2024 07:11:08 +1030 [thread overview]
Message-ID: <54a1bb50-7fd2-440f-8563-a82c54bb2179@gmx.com> (raw)
In-Reply-To: <CAL3q7H72pQ=3wPZpN9zow8+G4xnhSP5UKH1ev808Y5GYYB2BQw@mail.gmail.com>
On 2024/2/7 02:53, Filipe Manana wrote:
> On Mon, Feb 5, 2024 at 11:46 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> The following script can lead to a very under utilized extent and we
>> have no way to use defrag to properly reclaim its wasted space:
>>
>> # mkfs.btrfs -f $dev
>> # mount $dev $mnt
>> # xfs_io -f -c "pwrite 0 128M" $mnt/foobar
>> # sync
>> # truncate -s 4k $mnt/foobar
>> # btrfs filesystem defrag $mnt/foobar
>> # sync
>>
>> After the above operations, the file "foobar" is still utilizing the
>> whole 128M:
>>
>> item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160
>> generation 7 transid 8 size 4096 nbytes 4096
>> block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
>> sequence 32770 flags 0x0(none)
>> item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14
>> index 2 namelen 4 name: file
>> item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>> generation 7 type 1 (regular)
>> extent data disk byte 298844160 nr 134217728 <<<
>> extent data offset 0 nr 4096 ram 134217728
>> extent compression 0 (none)
>>
>> This is the common btrfs bookend behavior, that 128M extent would only
>> be freed if the last referencer of the extent is freed.
>>
>> The problem is, normally we would go defrag to free that 128M extent,
>> but defrag would not touch the extent at all.
>>
>> [CAUSE]
>> The file extent has no adjacent extent at all, thus all existing defrag
>> code consider it a perfectly good file extent, even if it's only
>> utilizing a very tiny amount of space.
>>
>> [FIX]
>> For a file extent without any adjacent file extent, we should still
>> consider to defrag such under utilized extent, base on the following
>> conditions:
>>
>> - utilization ratio
>> If the extent is utilizing less than 1/16 of the on-disk extent size,
>> then it would be a defrag target.
>>
>> - wasted space
>> If we defrag the extent and can free at least 16MiB, then it would be
>> a defrag target.
>>
>> Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> As I said in my previous review, please include the lore link to the
> report, it's always useful.
>
>> ---
>> fs/btrfs/defrag.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 42 insertions(+)
>>
>> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
>> index 8fc8118c3225..85c6e45d0cd4 100644
>> --- a/fs/btrfs/defrag.c
>> +++ b/fs/btrfs/defrag.c
>> @@ -950,6 +950,38 @@ struct defrag_target_range {
>> u64 len;
>> };
>>
>> +/*
>> + * Special entry for extents that do not have any adjacent extents.
>> + *
>> + * This is for cases like the only and truncated extent of a file.
>> + * Normally they won't be defraged at all (as they won't be merged with
>> + * any adjacent ones), but we may still want to defrag them, to free up
>> + * some space if possible.
>> + */
>> +static bool should_defrag_under_utilized(struct extent_map *em)
>
> Can be made const.
>
>> +{
>> + /*
>> + * Ratio based check.
>> + *
>> + * If the current extent is only utilizing 1/16 of its on-disk size,
>> + * it's definitely under-utilized, and defragging it may free up
>> + * the whole extent.
>> + */
>> + if (em->len < em->orig_block_len / 16)
>> + return true;
>> +
>> + /*
>> + * Wasted space based check.
>> + *
>> + * If we can free up at least 16MiB, then it may be a good idea
>> + * to defrag.
>> + */
>> + if (em->len < em->orig_block_len &&
>> + em->orig_block_len - em->len > SZ_16M)
>
> The first check, len < orig_block_len, is redundant, isn't it?
> em->len can only be less than or equals to em->orig_block_len.
> The second condition is enough to have.
Not exactly, don't forget compressed file extents.
>
>> + return true;
>> + return false;
>> +}
>> +
>> /*
>> * Collect all valid target extents.
>> *
>> @@ -1070,6 +1102,16 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
>> if (!next_mergeable) {
>> struct defrag_target_range *last;
>>
>> + /*
>> + * This is a single extent without any chance to merge
>> + * with any adjacent extent.
>> + *
>> + * But if we may free up some space, it is still worth
>> + * defragging.
>> + */
>> + if (should_defrag_under_utilized(em))
>> + goto add;
>> +
>
> So this logic is making some cases worse actually, making us use more
> disk space.
> For example:
>
> DEV=/dev/sdi
> MNT=/mnt/sdi
>
> mkfs.btrfs -f $DEV
> mount $DEV $MNT
>
> xfs_io -f -c "pwrite 0 128M" $MNT/foobar
> sync
> xfs_io -c "truncate 40M" $MNT/foobar
> btrfs filesystem defrag $MNT/foobar
>
> After this patch, we get:
>
> item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
> generation 7 type 1 (regular)
> extent data disk byte 1104150528 nr 134217728
> extent data offset 0 nr 8650752 ram 134217728
> extent compression 0 (none)
> item 7 key (257 EXTENT_DATA 8650752) itemoff 15757 itemsize 53
> generation 8 type 1 (regular)
> extent data disk byte 1238368256 nr 33292288
> extent data offset 0 nr 33292288 ram 33292288
> extent compression 0 (none)
This behavior is unexpected, as we should redirty the whole 40M, but the
first 8.25M didn't got re-dirtied is a big problem to me.
Will look into the situation.
>
> So we're now using 128M + 32M of disk space where before defrag we used 128M.
> Before the defrag we had:
>
> item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
> generation 7 type 1 (regular)
> extent data disk byte 1104150528 nr 134217728
> extent data offset 0 nr 41943040 ram 134217728
> extent compression 0 (none)
>
> So something's not good in this logic.
>
> Also, there's something else worth considering here, which is extent sharedness.
This is a known problem for defrag, thus for snapshot/reflink it's
really on the end user to determine whether they really need defrag.
Thanks,
Qu
> If an extent is shared we don't want to always dirty the respective
> range and rewrite it, as that may result in using more disk space...
>
> Example:
>
> xfs_io -f -c "pwrite 0 128M" $MNT/foobar
> btrfs subvolume snapshot $MNT $MNT/snap
> xfs_io -c "truncate 16M" $MNT/foobar
> btrfs filesystem defrag $MNT/foobar
>
> We end up consuming an additional 16M of data without any benefits.
> Without this patch, this wouldn't happen.
>
> Thanks.
>
>
>> /* Empty target list, no way to merge with last entry */
>> if (list_empty(target_list))
>> goto next;
>> --
>> 2.43.0
>>
>>
>
next prev parent reply other threads:[~2024-02-06 20:41 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-05 23:45 [PATCH 0/2] btrfs: defrag: better lone extent handling Qu Wenruo
2024-02-05 23:45 ` [PATCH 1/2] btrfs: defrag: add under utilized extent to defrag target list Qu Wenruo
2024-02-06 16:23 ` Filipe Manana
2024-02-06 20:41 ` Qu Wenruo [this message]
2024-02-06 22:26 ` Qu Wenruo
2024-02-07 12:19 ` Filipe Manana
2024-02-05 23:45 ` [PATCH 2/2] btrfs: defrag: allow fine-tuning on lone extent defrag behavior Qu Wenruo
2024-02-06 17:03 ` Filipe Manana
2024-02-06 20:50 ` Qu Wenruo
2024-02-07 12:27 ` Filipe Manana
2024-02-07 17:40 ` Christoph Anton Mitterer
2024-02-06 20:27 ` [PATCH 0/2] btrfs: defrag: better lone extent handling Christoph Anton Mitterer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54a1bb50-7fd2-440f-8563-a82c54bb2179@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=calestyo@scientia.org \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox