From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zygo Blaxell <zblaxell@furryterror.org>,
Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit
Date: Sat, 12 Mar 2022 11:24:18 +0800 [thread overview]
Message-ID: <59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com> (raw)
In-Reply-To: <YiwIxnCMjsl8BPPA@hungrycats.org>
On 2022/3/12 10:43, Zygo Blaxell wrote:
> On Sat, Mar 12, 2022 at 12:28:10AM +0100, Jan Ziak wrote:
>> On Sat, Mar 12, 2022 at 12:04 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> As stated before, autodefrag is not really that useful for database.
>>
>> Do you realize that you are claiming that btrfs autodefrag should not
>> - by design - be effective in the case of high-fragmentation files? If
>> it isn't supposed to be useful for high-fragmentation files then where
>> is it supposed to be useful? Low-fragmentation files?
>
> IMHO it's best to deprecate the in-kernel autodefrag option, and start
> over with a better approach. The kernel is the wrong place to solve
> this problem, and the undesirable and unfixable things in autodefrag
> are a consequence of that early design error.
I'm having the same feeling exactly.
Especially the current autodefrag is putting its own policy (transid
filter) without providing a mechanism to utilize from user space.
Exactly the opposite what we should do, provide a mechanism not a policy.
Not to mention there are quite some limitations of the current policy.
But unfortunately, even we deprecate it right now, it will takes a long
time to really remove it from kernel.
While on the other hand, we also need to introduce new parameters like
@newer_than, and @max_to_defrag to the ioctl interface.
Which may already eat up the unused bytes (only 16 bytes, while
newer_than needs u64, max_to_defrag may also need to be u64).
And user space tool lacks one of the critical info, where the small
writes are.
So even I can't be more happier to deprecate the autodefrag, we still
need to hang on it for a pretty lone time, before a user space tool
which can do everything the same as autodefrag.
Thanks,
Qu
>
> As far as I can tell, in-kernel autodefrag's only purpose is to provide
> exposure to new and exciting bugs on each kernel release, and a lot of
> uncontrolled IO demands even when it's working perfectly. Inevitably,
> re-reading old fragments that are no longer in memory will consume RAM
> and iops during writeback activity, when memory and IO bandwidth is least
> available. If we avoid expensive re-reading of extents, then we don't
> get a useful rate of reduction of fragmentation, because we can't coalesce
> small new exists with small existing ones. If we try to fix these issues
> one at a time, the feature would inevitably grow a lot of complicated
> and brittle configuration knobs to turn it off selectively, because it's
> so awful without extensive filtering.
>
> All the above criticism applies to abstract ideal in-kernel autodefrag,
> _before_ considering whether a concrete implementation might have
> limitations or bugs which make it worse than the already-bad best case.
> 5.16 happened to have a lot of examples of these, but fixing the
> regressions can only restore autodefrag's relative harmlessness, not
> add utility within the constraints the kernel is under.
>
> The right place to do autodefrag is userspace. Interfaces already
> exist for userspace to 1) discover new extents and their neighbors,
> quickly and safely, across the entire filesystem; 2) invoke defrag_range
> on file extent ranges found in step 1; and 3) run a while (true)
> loop that periodically performs steps 1 and 2. Indeed, the existing
> kernel autodefrag implementation is already using the same back-end
> infrastructure for parts 1 and 2, so all that would be required for
> userspace is to reimplement (and start improving upon) part 3.
>
> A command-line utility or daemon can locate new extents immediately with
> tree_search queries, either at filesystem-wide scales, or directed at
> user-chosen file subsets. Tools can quickly assess whether new extents
> are good candidates for defrag, then coalesce them with their neighbors.
>
> The user can choose between different tools to decide basic policy
> questions like: whether to run once in a batch job or continuously in
> the background, what amounts of IO bandwidth and memory to consume,
> whether to recompress data with a more aggressive algorithm/level, which
> reference to a snapshot-shared extent should be preferred for defrag,
> file-type-specific layout optimizations to apply, or any custom or
> experimental selection, scheduling, or optimization logic desired.
>
> Implementations can be kept simple because it's not necessary for
> userspace tools to pile every possible option into a single implementation,
> and support every released option forever (as required for the kernel).
> A specialist implementation can discard existing code with impunity or
> start from scratch with an experimental algorithm, and spend its life
> in a fork of the main userspace autodefrag project with niche users
> who never have to cope with generic users' use cases and vice versa.
> This efficiently distributes development and maintenance costs.
>
> Userspace autodefrag can be implemented today in any programming language
> with btrfs ioctl support, and run on any kernel released in the last
> 6 years. Alas, I don't know of anybody who's released a userspace
> autodefrag tool yet, and it hasn't been important enough to me to build
> one myself (other than a few proof-of-concept prototypes).
>
> For now, I do defrag mostly ad-hoc with 'btrfs fi defrag' on the most
> severely fragmented files (top N list of files with the highest extent
> counts on the filesystem), and ignore fragmentation everywhere else.
>
>
>> -Jan
next prev parent reply other threads:[~2022-03-12 3:24 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-06 15:59 Btrfs autodefrag wrote 5TB in one day to a 0.5TB SSD without a measurable benefit Jan Ziak
2022-03-07 0:48 ` Qu Wenruo
2022-03-07 2:23 ` Jan Ziak
2022-03-07 2:39 ` Qu Wenruo
2022-03-07 7:31 ` Qu Wenruo
2022-03-10 1:10 ` Jan Ziak
2022-03-10 1:26 ` Qu Wenruo
2022-03-10 4:33 ` Jan Ziak
2022-03-10 6:42 ` Qu Wenruo
2022-03-10 21:31 ` Jan Ziak
2022-03-10 23:27 ` Qu Wenruo
2022-03-11 2:42 ` Jan Ziak
2022-03-11 2:59 ` Qu Wenruo
2022-03-11 5:04 ` Jan Ziak
2022-03-11 16:31 ` Jan Ziak
2022-03-11 20:02 ` Jan Ziak
2022-03-11 23:04 ` Qu Wenruo
2022-03-11 23:28 ` Jan Ziak
2022-03-11 23:39 ` Qu Wenruo
2022-03-12 0:01 ` Jan Ziak
2022-03-12 0:15 ` Qu Wenruo
2022-03-12 3:16 ` Zygo Blaxell
2022-03-12 2:43 ` Zygo Blaxell
2022-03-12 3:24 ` Qu Wenruo [this message]
2022-03-12 3:48 ` Zygo Blaxell
2022-03-14 20:09 ` Phillip Susi
2022-03-14 22:59 ` Zygo Blaxell
2022-03-15 18:28 ` Phillip Susi
2022-03-15 19:28 ` Jan Ziak
2022-03-15 21:06 ` Zygo Blaxell
2022-03-15 22:20 ` Jan Ziak
2022-03-16 17:02 ` Zygo Blaxell
2022-03-16 17:48 ` Jan Ziak
2022-03-17 2:11 ` Zygo Blaxell
2022-03-16 18:46 ` Phillip Susi
2022-03-16 19:59 ` Zygo Blaxell
2022-03-20 17:50 ` Forza
2022-03-20 21:15 ` Zygo Blaxell
2022-03-08 21:57 ` Jan Ziak
2022-03-08 23:40 ` Qu Wenruo
2022-03-09 22:22 ` Jan Ziak
2022-03-09 22:44 ` Qu Wenruo
2022-03-09 22:55 ` Jan Ziak
2022-03-09 23:00 ` Jan Ziak
2022-03-09 4:48 ` Zygo Blaxell
2022-03-07 14:30 ` Phillip Susi
2022-03-08 21:43 ` Jan Ziak
2022-03-09 18:46 ` Phillip Susi
2022-03-09 21:35 ` Jan Ziak
2022-03-14 20:02 ` Phillip Susi
2022-03-14 21:53 ` Jan Ziak
2022-03-14 22:24 ` Remi Gauvin
2022-03-14 22:51 ` Zygo Blaxell
2022-03-14 23:07 ` Remi Gauvin
2022-03-14 23:39 ` Zygo Blaxell
2022-03-15 14:14 ` Remi Gauvin
2022-03-15 18:51 ` Zygo Blaxell
2022-03-15 19:22 ` Remi Gauvin
2022-03-15 21:08 ` Zygo Blaxell
2022-03-15 18:15 ` Phillip Susi
2022-03-16 16:52 ` Andrei Borzenkov
2022-03-16 18:28 ` Jan Ziak
2022-03-16 18:31 ` Phillip Susi
2022-03-16 18:43 ` Andrei Borzenkov
2022-03-16 18:46 ` Jan Ziak
2022-03-16 19:04 ` Zygo Blaxell
2022-03-17 20:34 ` Phillip Susi
2022-03-17 22:06 ` Zygo Blaxell
2022-03-16 12:47 ` Kai Krakow
2022-03-16 18:18 ` Jan Ziak
-- strict thread matches above, loose matches on Subject: below --
2022-06-17 0:20 Jan Ziak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=59c57200-9c77-3b8a-ab9d-11aef96da852@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=0xe2.0x9a.0x9b@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=zblaxell@furryterror.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox