From: David Sterba <dsterba@suse.cz>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: DanglingPointer <danglingpointerexception@gmail.com>,
linux-btrfs@vger.kernel.org
Subject: Re: btrfs-dedupe broken and unsupported but in official wiki
Date: Fri, 19 Jun 2020 15:11:17 +0200
Message-ID: <20200619131117.GD27795@twin.jikos.cz>
In-Reply-To: <20200619050402.GN10769@hungrycats.org>
On Fri, Jun 19, 2020 at 01:04:03AM -0400, Zygo Blaxell wrote:
> It might be nice to keep btrfs-dedupe and bedup _somewhere_ on the wiki,
> clearly marked as not supported and only of historical interest to new
> developers. I learned a lot about what is possible on btrfs from bedup
> in particular (bees was initially a project to combine the features of
> bedup and duperemove), and python is accessible to more developers than
> C or C++. btrfs-dedupe was the first btrfs dedupe agent to combine
> defrag and dedupe operations into a single program.
It's there now.
> > So I do agree with waxhead. It would be preferable if there were an
> > official btrfs deduplication command from btrfs-progs instead of relying on
> > 3rd parties. Joe Bloggs example above can read a web-page instructions
> > saying "run this command... and then this command..."; but he will not have
> > the knowledge, nor comprehension nor time to go through code.
>
> Which of the available candidates for "official btrfs dedupe" would you
> put in btrfs-progs? I see a lot of runners in the race, but no clear
> winner yet.
>
> duperemove is the closest to Waxhead's proposed "-r /somewhere" syntax.
> It's the obvious choice: written in the same language as btrfs-progs, and
> also the oldest btrfs deduper, and it has years of patient, data-driven
> optimization built in.
That there's not even a simple deduper (e.g. a file-based one) available
in btrfs-progs is kind of bad. Duperemove is indeed the closest to that.
> If there wasn't some insurmountable reason
> why duperemove can't be merged with btrfs-progs, then it would have
> happened already, so there must be a reason why this can't ever happen
> (which might be as simple as neither maintainer wants to merge).
I'm not against adding the functionality to btrfs-progs, but merging the
whole duperemove feature set might not happen due to the additional
dependencies. This would need to be evaluated, but I'm not aware of any
other technical reasons.
I don't remember exactly why duperemove started as a separate project
instead of a subcommand of progs, but we can revisit that.
> Maybe we put duperemove at the top of the Wiki page, as it has the
> simplest command-line for Joe Blogger's use case, and it's relatively
> easy to build for the few people who use distros where it's not packaged.
That's a good idea: a 'quick start' section, with a description of the
most common use cases using duperemove.
> The stub support for in-kernel dedupe (arguably the only "official"
> btrfs dedupe so far) has been removed due to lack of interest in its
> development. That _was_ available in branches of btrfs-progs
> as 'btrfs dedupe'. It's gone now.
The more I think about in-band dedupe (and how it would complicate
everything), the more I lean towards a user-space solution with support
from the kernel (ioctls, keeping hashes of recently modified blocks but
not doing the actual deduplication, reading hashes from the csum tree, etc).
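
As a rough illustration of the user-space side of such a split, here is a
minimal sketch of a file-based deduper in Python. It is not how duperemove
or any btrfs-progs code works. It hashes whole files in user space to find
candidates and hands the extent sharing to the kernel through the existing
FIDEDUPERANGE ioctl from <linux/fs.h>. The helper names
(find_duplicate_files, pack_dedupe_request, dedupe_range) are made up for
this example, and the kernel-side hash tracking and csum-tree access
mentioned above are proposals that the sketch does not use.

```python
import fcntl
import hashlib
import os
import struct
from collections import defaultdict

# _IOWR(0x94, 54, struct file_dedupe_range), as defined in <linux/fs.h>.
FIDEDUPERANGE = 0xC0189436
# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def find_duplicate_files(paths):
    """Group paths by (size, SHA-256) and return groups with 2+ members."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        groups[(os.path.getsize(path), digest.hexdigest())].append(path)
    return [group for group in groups.values() if len(group) > 1]


def pack_dedupe_request(src_offset, length, dest_fd, dest_offset):
    """Build the ioctl argument for one source range and one destination."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return header + info


def dedupe_range(src_fd, dest_fd, length, src_offset=0, dest_offset=0):
    """Ask the kernel to share one range; returns (bytes_deduped, status).

    The kernel compares the two ranges itself before sharing extents, so a
    stale or colliding hash cannot corrupt data. status is 0 when the ranges
    matched and 1 when they differ; a negative value is an errno.
    """
    buf = bytearray(pack_dedupe_request(src_offset, length, dest_fd, dest_offset))
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # results are written back into buf
    _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

To use it, open the first file of each group read-only as the source and
each remaining file read-write as a destination, then call dedupe_range()
in a loop, since the kernel may share fewer bytes per call than requested.
The filesystem has to support the ioctl (btrfs and XFS with reflink do).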