From: Su Yue <l@damenly.su>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Cc: jd.girard@sysnux.pf
Cc: ben.r.xiao@gmail.com
Subject: Re: [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges
Date: Tue, 22 Feb 2022 09:10:45 +0800 [thread overview]
Message-ID: <fsobsnaw.fsf@damenly.su> (raw)
In-Reply-To: <cover.1644737297.git.wqu@suse.com>
On Sun 13 Feb 2022 at 15:42, Qu Wenruo <wqu@suse.com> wrote:
> When a small write reaches disk, btrfs will mark the inode for
> autodefrag, and record the transid of the inode for autodefrag.
>
> Then autodefrag uses the transid to only scan newer file
> extents.
>
> However this transid based scanning scheme has a hole, the small
> write
> size threshold triggering autodefrag is not the same extent size
> threshold for autodefrag.
>
> For the following write sequence on an non-compressed inode:
>
> pwrite 0 4k
> sync
> pwrite 4k 128k
> sync
> pwrite 132k 128k
> sync.
>
> The first 4K is indeed a small write (<64K), but the later two
> 128K ones
> are definite not (>64K).
>
> Hoever autodefrag will try to defrag all three writes, as the
> extent_threshold used for autodefrag is fixed 256K.
>
> This extra scanning on extents which didn't trigger autodefrag
> can cause
> extra IO.
>
> This patchset will try to address the problem by:
>
> - Remove the inode_defrag re-queue behavior
> Now we just scan one file til its end (while keep the
> max_sectors_to_defrag limit, and frequently check the root
> refs, and
> remount situation to exit).
>
> This also saves several bytes from inode_defrag structure.
>
> This is done in the 3rd patch.
>
> - Save @small_write value into inode_defrag and use it as
> autodefrag
> extent threshold
> Now there is no gap for autodefrag and small_write.
>
> This is done in the 4th patch.
>
> The remaining patches are:
>
> - Removing one dead parameter
>
> - Add extra trace events for autodefrag
> So end users will no longer need to re-compile kernel modules,
> and
> use trace events to provide debug info on the
> autodefrag/defrag ioctl.
>
> Unfortunately I don't have a good benchmark setup for the
> patchset yet,
> but unlike previous RFC version, this one brings very little
> extra
> resource usage, and is just changing the extent_threshold for
> autodefrag.
>
> Changelog:
> RFC->v1:
> - Add ftrace events for defrag
>
> - Add a new patch to change how we run defrag inodes
> Instead of saving previous location and re-queue, just run it
> in one
> run.
> Previously btrfs_run_defrag_inodse() will always exhaust the
> existing
> inode_defrag anyway, the change should not bring much
> difference.
>
> - Change autodefrag extent_thresh to close the gap, other than
> using
> another extent io tree
> Now it uses less resource, keep the critical section small,
> while
> can almost reach the same objective.
>
> Qu Wenruo (4):
> btrfs: remove unused parameter for btrfs_add_inode_defrag()
> btrfs: add trace events for defrag
> btrfs: autodefrag: only scan one inode once
> btrfs: close the gap between inode_should_defrag() and
> autodefrag
> extent size threshold
>
Cc the reporters.
It was about ~20 days agao when I saw the report about autodefrag
causes
IO burning. Then I turned the autodefrag option of the raid1 btrfs
on
my NAS to do some experimental tests.
At first time, I tried to download files in ~20GB size but iotop
showed
everything was fine. And the autodefrag was left in /etc/fstab.
When I was back from my vacation, it surprised me that my 4TB size
disk
has been written about ~70TB and iotop showed btrfs-cleaner is
eating
the whole disk io bandwidth.
And after switched kernel 5.16.8-arch1-1 from to Qu's
autodefrag_fixes[1],
I'd say the btrfs on my NAS works well(no panic of course) and no
more
IO burning occurs.
https://github.com/adam900710/linux/tree/autodefrag_fixes
--
Su
> fs/btrfs/ctree.h | 3 +-
> fs/btrfs/file.c | 165
> +++++++++++++++--------------------
> fs/btrfs/inode.c | 4 +-
> fs/btrfs/ioctl.c | 4 +
> include/trace/events/btrfs.h | 127 +++++++++++++++++++++++++++
> 5 files changed, 206 insertions(+), 97 deletions(-)
prev parent reply other threads:[~2022-02-22 1:27 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-13 7:42 [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-13 7:42 ` [PATCH 1/4] btrfs: remove unused parameter for btrfs_add_inode_defrag() Qu Wenruo
2022-02-13 7:42 ` [PATCH 2/4] btrfs: add trace events for defrag Qu Wenruo
2022-02-13 7:42 ` [PATCH 3/4] btrfs: autodefrag: only scan one inode once Qu Wenruo
2022-02-22 17:32 ` David Sterba
2022-02-22 23:42 ` Qu Wenruo
2022-02-23 15:53 ` David Sterba
2022-02-24 6:59 ` Qu Wenruo
2022-02-24 9:45 ` Qu Wenruo
2022-02-24 12:18 ` Qu Wenruo
2022-02-24 19:44 ` David Sterba
2022-02-24 19:41 ` David Sterba
2022-02-13 7:42 ` [PATCH 4/4] btrfs: close the gap between inode_should_defrag() and autodefrag extent size threshold Qu Wenruo
2022-02-15 6:55 ` [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-22 1:10 ` Su Yue [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fsobsnaw.fsf@damenly.su \
--to=l@damenly.su \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox