public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <zblaxell@furryterror.org>,
	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Sat, 12 Mar 2022 11:24:18 +0800	[thread overview]
Message-ID: <59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com> (raw)
In-Reply-To: <YiwIxnCMjsl8BPPA@hungrycats.org>



On 2022/3/12 10:43, Zygo Blaxell wrote:
> On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote:
>> On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> As stated before, autodefrag is not really that useful for database.
>>
>> Do you realize that you are claiming that btrfs autodefrag should not
>> - by design - be effective in the case of high-fragmentation files? If
>> it isn't supposed to be useful for high-fragmentation files then where
>> is it supposed to be useful? Low-fragmentation files?
>
> IMHO it's best to deprecate the in-kernel autodefrag option, and start
> over with a better approach.  The kernel is the wrong place to solve
> this problem, and the undesirable and unfixable things in autodefrag
> are a consequence of that early design error.

I'm having the same feeling exactly.

Especially the current autodefrag is putting its own policy (transid
filter) without providing a mechanism to utilize from user space.

Exactly the opposite what we should do, provide a mechanism not a policy.

Not to mention there are quite some limitations of the current policy.


But unfortunately, even we deprecate it right now, it will takes a long
time to really remove it from kernel.

While on the other hand, we also need to introduce new parameters like
@newer_than, and @max_to_defrag to the ioctl interface.

Which may already eat up the unused bytes (only 16 bytes, while
newer_than needs u64, max_to_defrag may also need to be u64).

And user space tool lacks one of the critical info, where the small
writes are.

So even I can't be more happier to deprecate the autodefrag, we still
need to hang on it for a pretty lone time, before a user space tool
which can do everything the same as autodefrag.

Thanks,
Qu

>
> As far as I can tell, in-kernel autodefrag's only purpose is to provide
> exposure to new and exciting bugs on each kernel release, and a lot of
> uncontrolled IO demands even when it's working perfectly.  Inevitably,
> re-reading old fragments that are no longer in memory will consume RAM
> and iops during writeback activity, when memory and IO bandwidth is least
> available.  If we avoid expensive re-reading of extents, then we don't
> get a useful rate of reduction of fragmentation, because we can't coalesce
> small new exists with small existing ones.  If we try to fix these issues
> one at a time, the feature would inevitably grow a lot of complicated
> and brittle configuration knobs to turn it off selectively, because it's
> so awful without extensive filtering.
>
> All the above criticism applies to abstract ideal in-kernel autodefrag,
> _before_ considering whether a concrete implementation might have
> limitations or bugs which make it worse than the already-bad best case.
> 5.16 happened to have a lot of examples of these, but fixing the
> regressions can only restore autodefrag's relative harmlessness, not
> add utility within the constraints the kernel is under.
>
> The right place to do autodefrag is userspace.  Interfaces already
> exist for userspace to 1) discover new extents and their neighbors,
> quickly and safely, across the entire filesystem; 2) invoke defrag_range
> on file extent ranges found in step 1; and 3) run a while (true)
> loop that periodically performs steps 1 and 2.  Indeed, the existing
> kernel autodefrag implementation is already using the same back-end
> infrastructure for parts 1 and 2, so all that would be required for
> userspace is to reimplement (and start improving upon) part 3.
>
> A command-line utility or daemon can locate new extents immediately with
> tree_search queries, either at filesystem-wide scales, or directed at
> user-chosen file subsets.  Tools can quickly assess whether new extents
> are good candidates for defrag, then coalesce them with their neighbors.
>
> The user can choose between different tools to decide basic policy
> questions like: whether to run once in a batch job or continuously in
> the background, what amounts of IO bandwidth and memory to consume,
> whether to recompress data with a more aggressive algorithm/level, which
> reference to a snapshot-shared extent should be preferred for defrag,
> file-type-specific layout optimizations to apply, or any custom or
> experimental selection, scheduling, or optimization logic desired.
>
> Implementations can be kept simple because it's not necessary for
> userspace tools to pile every possible option into a single implementation,
> and support every released option forever (as required for the kernel).
> A specialist implementation can discard existing code with impunity or
> start from scratch with an experimental algorithm, and spend its life
> in a fork of the main userspace autodefrag project with niche users
> who never have to cope with generic users' use cases and vice versa.
> This efficiently distributes development and maintenance costs.
>
> Userspace autodefrag can be implemented today in any programming language
> with btrfs ioctl support, and run on any kernel released in the last
> 6 years.  Alas, I don't know of anybody who's released a userspace
> autodefrag tool yet, and it hasn't been important enough to me to build
> one myself (other than a few proof-of-concept prototypes).
>
> For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most
> severely fragmented files (top N list of files with the highest extent
> counts on the filesystem), and ignore fragmentation everywhere else.
>
>
>> -Jan

  reply	other threads:[~2022-03-12  3:24 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-06 15:59 Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit Jan Ziak
2022-03-07  0:48 ` Qu Wenruo
2022-03-07  2:23   ` Jan Ziak
2022-03-07  2:39     ` Qu Wenruo
2022-03-07  7:31       ` Qu Wenruo
2022-03-10  1:10         ` Jan Ziak
2022-03-10  1:26           ` Qu Wenruo
2022-03-10  4:33             ` Jan Ziak
2022-03-10  6:42               ` Qu Wenruo
2022-03-10 21:31                 ` Jan Ziak
2022-03-10 23:27                   ` Qu Wenruo
2022-03-11  2:42                     ` Jan Ziak
2022-03-11  2:59                       ` Qu Wenruo
2022-03-11  5:04                         ` Jan Ziak
2022-03-11 16:31                           ` Jan Ziak
2022-03-11 20:02                             ` Jan Ziak
2022-03-11 23:04                             ` Qu Wenruo
2022-03-11 23:28                               ` Jan Ziak
2022-03-11 23:39                                 ` Qu Wenruo
2022-03-12  0:01                                   ` Jan Ziak
2022-03-12  0:15                                     ` Qu Wenruo
2022-03-12  3:16                                     ` Zygo Blaxell
2022-03-12  2:43                                 ` Zygo Blaxell
2022-03-12  3:24                                   ` Qu Wenruo [this message]
2022-03-12  3:48                                     ` Zygo Blaxell
2022-03-14 20:09                         ` Phillip Susi
2022-03-14 22:59                           ` Zygo Blaxell
2022-03-15 18:28                             ` Phillip Susi
2022-03-15 19:28                               ` Jan Ziak
2022-03-15 21:06                               ` Zygo Blaxell
2022-03-15 22:20                                 ` Jan Ziak
2022-03-16 17:02                                   ` Zygo Blaxell
2022-03-16 17:48                                     ` Jan Ziak
2022-03-17  2:11                                       ` Zygo Blaxell
2022-03-16 18:46                                 ` Phillip Susi
2022-03-16 19:59                                   ` Zygo Blaxell
2022-03-20 17:50                             ` Forza
2022-03-20 21:15                               ` Zygo Blaxell
2022-03-08 21:57       ` Jan Ziak
2022-03-08 23:40         ` Qu Wenruo
2022-03-09 22:22           ` Jan Ziak
2022-03-09 22:44             ` Qu Wenruo
2022-03-09 22:55               ` Jan Ziak
2022-03-09 23:00                 ` Jan Ziak
2022-03-09  4:48         ` Zygo Blaxell
2022-03-07 14:30 ` Phillip Susi
2022-03-08 21:43   ` Jan Ziak
2022-03-09 18:46     ` Phillip Susi
2022-03-09 21:35       ` Jan Ziak
2022-03-14 20:02         ` Phillip Susi
2022-03-14 21:53           ` Jan Ziak
2022-03-14 22:24             ` Remi Gauvin
2022-03-14 22:51               ` Zygo Blaxell
2022-03-14 23:07                 ` Remi Gauvin
2022-03-14 23:39                   ` Zygo Blaxell
2022-03-15 14:14                     ` Remi Gauvin
2022-03-15 18:51                       ` Zygo Blaxell
2022-03-15 19:22                         ` Remi Gauvin
2022-03-15 21:08                           ` Zygo Blaxell
2022-03-15 18:15             ` Phillip Susi
2022-03-16 16:52           ` Andrei Borzenkov
2022-03-16 18:28             ` Jan Ziak
2022-03-16 18:31             ` Phillip Susi
2022-03-16 18:43               ` Andrei Borzenkov
2022-03-16 18:46               ` Jan Ziak
2022-03-16 19:04               ` Zygo Blaxell
2022-03-17 20:34                 ` Phillip Susi
2022-03-17 22:06                   ` Zygo Blaxell
2022-03-16 12:47 ` Kai Krakow
2022-03-16 18:18   ` Jan Ziak
  -- strict thread matches above, loose matches on Subject: below --
2022-06-17  0:20 Jan Ziak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=0xe2.0x9a.0x9b@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zblaxell@furryterror.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox