Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit

public inbox for linux-btrfs@vger.kernel.org

From: Zygo Blaxell <zblaxell@furryterror.org>
To: Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Fri, 11 Mar 2022 21:43:18 -0500
Message-ID: <YiwIxnCMjsl8BPPA@hungrycats.org>
In-Reply-To: <CAODFU0r=9i2mOwNXVx74GcKUSt4Z6wGqshgD=5bktFhoXCWE4A@mail.gmail.com>

On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote:
> On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > As stated before, autodefrag is not really that useful for database.
> 
> Do you realize that you are claiming that btrfs autodefrag should not
> - by design - be effective in the case of high-fragmentation files? If
> it isn't supposed to be useful for high-fragmentation files then where
> is it supposed to be useful? Low-fragmentation files?

IMHO it's best to deprecate the in-kernel autodefrag option, and start
over with a better approach.  The kernel is the wrong place to solve
this problem, and the undesirable and unfixable things in autodefrag
are a consequence of that early design error.

As far as I can tell, in-kernel autodefrag's only purpose is to provide
exposure to new and exciting bugs on each kernel release, and a lot of
uncontrolled IO demands even when it's working perfectly.  Inevitably,
re-reading old fragments that are no longer in memory will consume RAM
and iops during writeback activity, when memory and IO bandwidth are least
available.  If we avoid expensive re-reading of extents, then we don't
get a useful rate of reduction of fragmentation, because we can't coalesce
small new extents with small existing ones.  If we tried to fix these issues
one at a time, the feature would inevitably grow a lot of complicated
and brittle configuration knobs to turn it off selectively, because it's
so awful without extensive filtering.
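A toy model of the coalescing problem, as a sketch only (it is not how the kernel's autodefrag decides anything): given one file's extents sorted by offset, each flagged new or old, it groups runs of contiguous small extents into ranges of at most a target size, keeps only ranges that merge two or more extents including at least one new one, and reports how many bytes of old extents each range would have to re-read.  The 'Extent' type and the 'small' and 'target' thresholds are hypothetical; page cache hits, compression and snapshot sharing are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    offset: int  # logical offset in the file, bytes
    length: int  # extent length, bytes
    new: bool    # written since the last defrag pass


def plan_coalesce(extents: List[Extent], small: int,
                  target: int) -> List[Tuple[int, int, int]]:
    """Return (start, length, old_bytes_to_reread) per proposed defrag range.

    Toy model: `extents` must be sorted by offset and non-overlapping.
    """
    plans: List[Tuple[int, int, int]] = []
    run: List[Extent] = []  # current run of contiguous small extents

    def flush() -> None:
        # Cut the run into chunks of at most `target` bytes.  A chunk is only
        # worth rewriting if it merges 2+ extents and at least one is new;
        # every old extent in it must be read back in before rewriting.
        chunk: List[Extent] = []
        for ext in [*run, None]:
            full = (ext is not None and bool(chunk)
                    and sum(c.length for c in chunk) + ext.length > target)
            if ext is None or full:
                if len(chunk) > 1 and any(c.new for c in chunk):
                    plans.append((chunk[0].offset,
                                  sum(c.length for c in chunk),
                                  sum(c.length for c in chunk if not c.new)))
                chunk = []
            if ext is not None:
                chunk.append(ext)
        run.clear()

    prev_end = None
    for ext in extents:
        contiguous = prev_end is not None and ext.offset == prev_end
        if ext.length >= small or not contiguous:
            flush()  # a large extent or a hole ends the current run
        if ext.length < small:
            run.append(ext)
        prev_end = ext.offset + ext.length
    flush()
    return plans
```

For example, one new 4 KiB write landing between two old 4 KiB extents produces a single 12 KiB range that must re-read 8 KiB of old data, so the rewrite costs three times the amount of new data that triggered it.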

All the above criticism applies to an abstract, ideal in-kernel autodefrag,
_before_ considering whether a concrete implementation might have
limitations or bugs which make it worse than the already-bad best case.
Kernel 5.16 happened to have a lot of examples of these, but fixing the
regressions can only restore autodefrag's relative harmlessness, not
add utility within the constraints the kernel is under.

The right place to do autodefrag is userspace.  Interfaces already
exist for userspace to 1) discover new extents and their neighbors,
quickly and safely, across the entire filesystem; 2) invoke defrag_range
on file extent ranges found in step 1; and 3) run a while (true)
loop that periodically performs steps 1 and 2.  Indeed, the existing
kernel autodefrag implementation already uses the same back-end
infrastructure for steps 1 and 2, so all that userspace would need
is to reimplement (and start improving upon) step 3.
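Step 3 really is just a loop.  Here is a minimal sketch, assuming two hypothetical callables: 'find_new(gen)' does step 1 and returns candidate (path, start, length) ranges written since generation 'gen' together with the generation to resume from, and 'defrag' does step 2 for one range.  A real tool would back them with the TREE_SEARCH and DEFRAG_RANGE ioctls; 'max_passes' and the injectable 'sleep' exist only so the loop can be tested.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Range = Tuple[str, int, int]  # (path, start, length), hypothetical shape


def autodefrag_loop(find_new: Callable[[int], Tuple[Iterable[Range], int]],
                    defrag: Callable[[Range], None],
                    interval: float = 60.0,
                    start_gen: int = 0,
                    max_passes: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Repeat step 1 (discover) and step 2 (defrag) every `interval` seconds.

    Runs forever unless `max_passes` is set; returns the resume generation.
    """
    gen = start_gen
    passes = 0
    while True:
        # find_new should report a resume point taken before it scanned,
        # so extents written during the scan are picked up next pass.
        ranges, next_gen = find_new(gen)
        for r in ranges:
            defrag(r)
        gen = next_gen
        passes += 1
        if max_passes is not None and passes >= max_passes:
            return gen
        sleep(interval)
```

Because the resume generation lives in userspace, a tool built this way can save it to disk and continue from it after a restart, and can choose its own interval, batch size, and throttling.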

A command-line utility or daemon can locate new extents immediately with
tree_search queries, either at filesystem-wide scales, or directed at
user-chosen file subsets.  Tools can quickly assess whether new extents
are good candidates for defrag, then coalesce them with their neighbors.
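For step 1, this is roughly how 'btrfs subvolume find-new' asks for recently written extents: a TREE_SEARCH over EXTENT_DATA items with 'min_transid' set.  The sketch below, assuming the x86/arm ioctl encoding and root privileges, builds the search key, runs one search, and parses the result buffer.  Two caveats are handled in the parser: 'min_transid' filters whole tree blocks, so older extents in a recently modified leaf still come back; and the key range is compared as a whole (objectid, type, offset) triple, so other item types come back too.  Continuing the search after the last returned key is left out, and the helper names are hypothetical.

```python
import fcntl
import struct
from typing import Iterator, NamedTuple, Tuple

BTRFS_EXTENT_DATA_KEY = 108
BTRFS_IOC_TREE_SEARCH = 0xD0009411  # _IOWR(0x94, 17, 4096 bytes) on x86/arm
ARGS_SIZE = 4096                    # struct btrfs_ioctl_search_args
U64_MAX = 2**64 - 1

# struct btrfs_ioctl_search_key: 7 x u64, 4 x u32, 4 x u64 = 104 bytes
SEARCH_KEY = struct.Struct("=7Q4I4Q")
# struct btrfs_ioctl_search_header: transid, objectid, offset, type, len
SEARCH_HEADER = struct.Struct("=3Q2I")


class NewExtent(NamedTuple):
    inode: int
    file_offset: int
    generation: int
    num_bytes: int  # 0 for inline extents


def new_extents_key(min_transid: int, nr_items: int = 4096) -> bytes:
    return SEARCH_KEY.pack(
        0,                     # tree_id 0: the subvolume the fd is in
        0, U64_MAX,            # min/max objectid (inode number)
        0, U64_MAX,            # min/max offset (file offset)
        min_transid, U64_MAX,  # skip tree blocks older than min_transid
        BTRFS_EXTENT_DATA_KEY, BTRFS_EXTENT_DATA_KEY,
        nr_items, 0,           # nr_items: limit going in, count coming out
        0, 0, 0, 0)


def search(fd: int, key: bytes) -> Tuple[bytes, int]:
    """Run one TREE_SEARCH (needs CAP_SYS_ADMIN); return (buffer, nr_items)."""
    args = bytearray(ARGS_SIZE)
    args[:SEARCH_KEY.size] = key
    fcntl.ioctl(fd, BTRFS_IOC_TREE_SEARCH, args, True)
    return bytes(args[SEARCH_KEY.size:]), SEARCH_KEY.unpack_from(args)[9]


def parse_results(buf: bytes, nr_items: int,
                  min_transid: int) -> Iterator[NewExtent]:
    pos = 0
    hsize = SEARCH_HEADER.size
    for _ in range(nr_items):
        _, objectid, offset, typ, length = SEARCH_HEADER.unpack_from(buf, pos)
        data = buf[pos + hsize:pos + hsize + length]
        pos += hsize + length
        if typ != BTRFS_EXTENT_DATA_KEY or length < 21:
            continue  # another item type that fell inside the key range
        # btrfs_file_extent_item is little-endian on disk: generation at 0,
        # type at byte 20, num_bytes at 45 for regular/prealloc extents.
        generation = struct.unpack_from("<Q", data, 0)[0]
        if generation < min_transid:
            continue  # old extent in a recently modified leaf
        num_bytes = 0
        if data[20] in (1, 2) and length >= 53:
            num_bytes = struct.unpack_from("<Q", data, 45)[0]
        yield NewExtent(objectid, offset, generation, num_bytes)
```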

The user can choose between different tools to decide basic policy
questions like: whether to run once in a batch job or continuously in
the background, what amounts of IO bandwidth and memory to consume,
whether to recompress data with a more aggressive algorithm/level, which
reference to a snapshot-shared extent should be preferred for defrag,
which file-type-specific layout optimizations to apply, or any custom or
experimental selection, scheduling, or optimization logic desired.
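As an example of policy that can already live in userspace, recompression (choice of algorithm, not level) is a per-call option of the DEFRAG_RANGE ioctl: a flag plus a 'compress_type'.  A minimal sketch, assuming the x86/arm ioctl encoding; the helper names are hypothetical and the constants are meant to mirror linux/btrfs.h, so check them against your headers.

```python
import fcntl
import os
import struct
from typing import Optional

BTRFS_IOCTL_MAGIC = 0x94
BTRFS_DEFRAG_RANGE_COMPRESS = 1   # recompress using compress_type
BTRFS_DEFRAG_RANGE_START_IO = 2   # start writeback when done
COMPRESS_TYPES = {"zlib": 1, "lzo": 2, "zstd": 3}

# struct btrfs_ioctl_defrag_range_args: start, len, flags (u64),
# extent_thresh, compress_type (u32), unused[4] (u32) = 48 bytes
DEFRAG_ARGS = struct.Struct("=3Q2I4I")


def _iow(magic: int, nr: int, size: int) -> int:
    # Linux _IOW() layout on x86 and arm; powerpc, mips, sparc differ.
    return (1 << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_DEFRAG_RANGE = _iow(BTRFS_IOCTL_MAGIC, 16, DEFRAG_ARGS.size)


def defrag_args(start: int, length: int, extent_thresh: int = 0,
                compress: Optional[str] = None,
                start_io: bool = False) -> bytes:
    flags = BTRFS_DEFRAG_RANGE_START_IO if start_io else 0
    ctype = 0
    if compress is not None:
        flags |= BTRFS_DEFRAG_RANGE_COMPRESS
        ctype = COMPRESS_TYPES[compress]
    # extent_thresh 0 lets the kernel use its default target extent size
    return DEFRAG_ARGS.pack(start, length, flags, extent_thresh, ctype,
                            0, 0, 0, 0)


def defrag_range(path: str, start: int, length: int, **policy) -> None:
    # Current kernels check write permission on the inode rather than the
    # fd's open mode, so a read-only open is enough for a writable file.
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BTRFS_IOC_DEFRAG_RANGE,
                    defrag_args(start, length, **policy))
    finally:
        os.close(fd)
```

For instance, 'defrag_range(path, 0, 2**64 - 1, compress="zstd")' would rewrite a whole file with zstd; an all-ones length means "to end of file", which is what 'btrfs fi defrag' passes by default.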

Implementations can be kept simple because it's not necessary for
userspace tools to pile every possible option into a single implementation,
and support every released option forever (as required for the kernel).
A specialist implementation can discard existing code with impunity or
start from scratch with an experimental algorithm, and spend its life
in a fork of the main userspace autodefrag project with niche users
who never have to cope with generic users' use cases and vice versa.
This efficiently distributes development and maintenance costs.

Userspace autodefrag can be implemented today in any programming language
with btrfs ioctl support, and run on any kernel released in the last
6 years.  Alas, I don't know of anybody who's released a userspace
autodefrag tool yet, and it hasn't been important enough to me to build
one myself (other than a few proof-of-concept prototypes).

For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most
severely fragmented files (top N list of files with the highest extent
counts on the filesystem), and ignore fragmentation everywhere else.
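That ad-hoc approach is easy to script.  A minimal sketch, assuming FIEMAP (the interface 'filefrag' uses) is an acceptable extent counter; on btrfs, compressed files report one extent per 128 KiB chunk, so their counts overstate fragmentation.  The helper names are hypothetical; the resulting paths would then be fed to 'btrfs fi defrag'.

```python
import fcntl
import heapq
import os
import struct
from typing import Callable, Iterable, Iterator, List, Tuple

# struct fiemap header: fm_start, fm_length (u64), fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (u32) = 32 bytes
FIEMAP_HDR = struct.Struct("=2Q4I")
FS_IOC_FIEMAP = 0xC020660B  # _IOWR('f', 11, struct fiemap) on x86/arm
FIEMAP_FLAG_SYNC = 1        # flush dirty data before mapping


def extent_count(path: str) -> int:
    # fm_extent_count = 0 asks the kernel only to count the extents.
    buf = bytearray(FIEMAP_HDR.pack(0, 2**64 - 1, FIEMAP_FLAG_SYNC, 0, 0, 0))
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf, True)
    finally:
        os.close(fd)
    return FIEMAP_HDR.unpack(buf)[3]  # fm_mapped_extents


def regular_files(root: str) -> Iterator[str]:
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def top_fragmented(paths: Iterable[str], n: int,
                   count: Callable[[str], int] = extent_count
                   ) -> List[Tuple[int, str]]:
    scored = []
    for path in paths:
        try:
            scored.append((count(path), path))
        except OSError:
            continue  # vanished, unreadable, or FIEMAP unsupported
    return heapq.nlargest(n, scored)
```

Usage would be something like 'for n, path in top_fragmented(regular_files("/srv"), 20): print(n, path)', where "/srv" stands in for whatever mount point is being examined.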

> -Jan
