public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Su Yue <l@damenly.su>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Cc: jd.girard@sysnux.pf
Cc: ben.r.xiao@gmail.com
Subject: Re: [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges
Date: Tue, 22 Feb 2022 09:10:45 +0800	[thread overview]
Message-ID: <fsobsnaw.fsf@damenly.su> (raw)
In-Reply-To: <cover.1644737297.git.wqu@suse.com>


On Sun 13 Feb 2022 at 15:42, Qu Wenruo <wqu@suse.com> wrote:

> When a small write reaches disk, btrfs will mark the inode for
> autodefrag, and record the transid of the inode for autodefrag.
>
> Then autodefrag uses the transid to only scan newer file 
> extents.
>
> However this transid based scanning scheme has a hole, the small 
> write
> size threshold triggering autodefrag is not the same extent size
> threshold for autodefrag.
>
> For the following write sequence on an non-compressed inode:
>
>  pwrite 0 4k
>  sync
>  pwrite 4k 128k
>  sync
>  pwrite 132k 128k
>  sync.
>
> The first 4K is indeed a small write (<64K), but the later two 
> 128K ones
> are definite not (>64K).
>
> Hoever autodefrag will try to defrag all three writes, as the
> extent_threshold used for autodefrag is fixed 256K.
>
> This extra scanning on extents which didn't trigger autodefrag 
> can cause
> extra IO.
>
> This patchset will try to address the problem by:
>
> - Remove the inode_defrag re-queue behavior
>   Now we just scan one file til its end (while keep the
>   max_sectors_to_defrag limit, and frequently check the root 
>   refs, and
>   remount situation to exit).
>
>   This also saves several bytes from inode_defrag structure.
>
>   This is done in the 3rd patch.
>
> - Save @small_write value into inode_defrag and use it as 
> autodefrag
>   extent threshold
>   Now there is no gap for autodefrag and small_write.
>
>   This is done in the 4th patch.
>
> The remaining patches are:
>
> - Removing one dead parameter
>
> - Add extra trace events for autodefrag
>   So end users will no longer need to re-compile kernel modules, 
>   and
>   use trace events to provide debug info on the 
>   autodefrag/defrag ioctl.
>
> Unfortunately I don't have a good benchmark setup for the 
> patchset yet,
> but unlike previous RFC version, this one brings very little 
> extra
> resource usage, and is just changing the extent_threshold for
> autodefrag.
>
> Changelog:
> RFC->v1:
> - Add ftrace events for defrag
>
> - Add a new patch to change how we run defrag inodes
>   Instead of saving previous location and re-queue, just run it 
>   in one
>   run.
>   Previously btrfs_run_defrag_inodse() will always exhaust the 
>   existing
>   inode_defrag anyway, the change should not bring much 
>   difference.
>
> - Change autodefrag extent_thresh to close the gap, other than 
> using
>   another extent io tree
>   Now it uses less resource, keep the critical section small, 
>   while
>   can almost reach the same objective.
>
> Qu Wenruo (4):
>   btrfs: remove unused parameter for btrfs_add_inode_defrag()
>   btrfs: add trace events for defrag
>   btrfs: autodefrag: only scan one inode once
>   btrfs: close the gap between inode_should_defrag() and 
>   autodefrag
>     extent size threshold
>

Cc the reporters.

It was about ~20 days agao when I saw the report about autodefrag 
causes
IO burning. Then I turned the autodefrag option of the raid1 btrfs 
on
my NAS to do some experimental tests.

At first time, I tried to download files in ~20GB size but iotop 
showed
everything was fine. And the autodefrag was left in /etc/fstab.
When I was back from my vacation, it surprised me that my 4TB size 
disk
has been written about ~70TB and iotop showed btrfs-cleaner is 
eating
the whole disk io bandwidth.

And after switched kernel 5.16.8-arch1-1 from to Qu's 
autodefrag_fixes[1],
I'd say the btrfs on my NAS works well(no panic of course) and no 
more
IO burning occurs.

https://github.com/adam900710/linux/tree/autodefrag_fixes
--
Su
>  fs/btrfs/ctree.h             |   3 +-
>  fs/btrfs/file.c              | 165 
>  +++++++++++++++--------------------
>  fs/btrfs/inode.c             |   4 +-
>  fs/btrfs/ioctl.c             |   4 +
>  include/trace/events/btrfs.h | 127 +++++++++++++++++++++++++++
>  5 files changed, 206 insertions(+), 97 deletions(-)

      parent reply	other threads:[~2022-02-22  1:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-13  7:42 [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-13  7:42 ` [PATCH 1/4] btrfs: remove unused parameter for btrfs_add_inode_defrag() Qu Wenruo
2022-02-13  7:42 ` [PATCH 2/4] btrfs: add trace events for defrag Qu Wenruo
2022-02-13  7:42 ` [PATCH 3/4] btrfs: autodefrag: only scan one inode once Qu Wenruo
2022-02-22 17:32   ` David Sterba
2022-02-22 23:42     ` Qu Wenruo
2022-02-23 15:53       ` David Sterba
2022-02-24  6:59         ` Qu Wenruo
2022-02-24  9:45           ` Qu Wenruo
2022-02-24 12:18             ` Qu Wenruo
2022-02-24 19:44               ` David Sterba
2022-02-24 19:41           ` David Sterba
2022-02-13  7:42 ` [PATCH 4/4] btrfs: close the gap between inode_should_defrag() and autodefrag extent size threshold Qu Wenruo
2022-02-15  6:55 ` [PATCH 0/4] btrfs: make autodefrag to defrag and only defrag small write ranges Qu Wenruo
2022-02-22  1:10 ` Su Yue [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fsobsnaw.fsf@damenly.su \
    --to=l@damenly.su \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox