From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges
Date: Tue, 15 Feb 2022 14:55:17 +0800 [thread overview]
Message-ID: <870cb1f0-4108-75d5-6b45-e6a26a2be3d2@suse.com> (raw)
In-Reply-To: <cover.1644737297.git.wqu@suse.com>
On 2022/2/13 15:42, Qu Wenruo wrote:
> When a small write reaches disk, btrfs will mark the inode for
> autodefrag, and record the transid of the inode for autodefrag.
>
> Then autodefrag uses the transid to only scan newer file extents.
>
> However this transid based scanning scheme has a hole, the small write
> size threshold triggering autodefrag is not the same extent size
> threshold for autodefrag.
>
> For the following write sequence on an non-compressed inode:
>
> pwrite 0 4k
> sync
> pwrite 4k 128k
> sync
> pwrite 132k 128k
> sync.
>
> The first 4K is indeed a small write (<64K), but the later two 128K ones
> are definite not (>64K).
>
> Hoever autodefrag will try to defrag all three writes, as the
> extent_threshold used for autodefrag is fixed 256K.
>
> This extra scanning on extents which didn't trigger autodefrag can cause
> extra IO.
>
> This patchset will try to address the problem by:
>
> - Remove the inode_defrag re-queue behavior
> Now we just scan one file til its end (while keep the
> max_sectors_to_defrag limit, and frequently check the root refs, and
> remount situation to exit).
>
> This also saves several bytes from inode_defrag structure.
>
> This is done in the 3rd patch.
>
> - Save @small_write value into inode_defrag and use it as autodefrag
> extent threshold
> Now there is no gap for autodefrag and small_write.
>
> This is done in the 4th patch.
>
> The remaining patches are:
>
> - Removing one dead parameter
>
> - Add extra trace events for autodefrag
> So end users will no longer need to re-compile kernel modules, and
> use trace events to provide debug info on the autodefrag/defrag ioctl.
>
> Unfortunately I don't have a good benchmark setup for the patchset yet,
> but unlike previous RFC version, this one brings very little extra
> resource usage, and is just changing the extent_threshold for
> autodefrag.
Got a small benchmark result for it.
Using the following fio job:
[torrent]
filename=torrent-test
rw=randwrite
ioengine=sync
size=4g
randseed=123456
allrandrepeat=1
fallocate=none
And the VM only has 2G ram.
This should really be the worst case scenario.
Then the full benchmark includes:
start_trace()
{
echo 0 > $tracedir/tracing_on
echo > $tracedir/trace
#echo > $tracedir/trace_options
echo > $tracedir/set_event
echo "btrfs:defrag_file_end" >> $tracedir/set_event
echo 1 > $tracedir/tracing_on
}
end_trace()
{
cp $tracedir/trace /home/adam
echo 0 > $tracedir/tracing_on
}
mkfs.btrfs -f $dev
start_trace
mount $dev $mnt -o autodefrag
cd $mnt
fio /home/adam/torrent.fio
cd
umount $mnt
end_trace
With all defragged sectors accounted, before the last two patches:
Total sectors defragged = 6846831
Total defrag_file() calls = 6701
After the last two patches:
Total sectors defragged = 3466851
Total defrag_file() calls = 3396
Which shows an obvious drop in the sectors marked for autodefrag.
Thanks,
Qu
>
> Changelog:
> RFC->v1:
> - Add ftrace events for defrag
>
> - Add a new patch to change how we run defrag inodes
> Instead of saving previous location and re-queue, just run it in one
> run.
> Previously btrfs_run_defrag_inodse() will always exhaust the existing
> inode_defrag anyway, the change should not bring much difference.
>
> - Change autodefrag extent_thresh to close the gap, other than using
> another extent io tree
> Now it uses less resource, keep the critical section small, while
> can almost reach the same objective.
>
> Qu Wenruo (4):
> btrfs: remove unused parameter for btrfs_add_inode_defrag()
> btrfs: add trace events for defrag
> btrfs: autodefrag: only scan one inode once
> btrfs: close the gap between inode_should_defrag() and autodefrag
> extent size threshold
>
> fs/btrfs/ctree.h | 3 +-
> fs/btrfs/file.c | 165 +++++++++++++++--------------------
> fs/btrfs/inode.c | 4 +-
> fs/btrfs/ioctl.c | 4 +
> include/trace/events/btrfs.h | 127 +++++++++++++++++++++++++++
> 5 files changed, 206 insertions(+), 97 deletions(-)
>
next prev parent reply other threads:[~2022-02-15 6:55 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-13 7:42 [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-13 7:42 ` [PATCH 1/4] btrfs: remove unused parameter for btrfs_add_inode_defrag() Qu Wenruo
2022-02-13 7:42 ` [PATCH 2/4] btrfs: add trace events for defrag Qu Wenruo
2022-02-13 7:42 ` [PATCH 3/4] btrfs: autodefrag: only scan one inode once Qu Wenruo
2022-02-22 17:32 ` David Sterba
2022-02-22 23:42 ` Qu Wenruo
2022-02-23 15:53 ` David Sterba
2022-02-24 6:59 ` Qu Wenruo
2022-02-24 9:45 ` Qu Wenruo
2022-02-24 12:18 ` Qu Wenruo
2022-02-24 19:44 ` David Sterba
2022-02-24 19:41 ` David Sterba
2022-02-13 7:42 ` [PATCH 4/4] btrfs: close the gap between inode_should_defrag() and autodefrag extent size threshold Qu Wenruo
2022-02-15 6:55 ` Qu Wenruo [this message]
2022-02-22 1:10 ` [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Su Yue
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=870cb1f0-4108-75d5-6b45-e6a26a2be3d2@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox