public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: DanglingPointer <danglingpointerexception@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-dedupe broken and unsupported but in official wiki
Date: Thu, 18 Jun 2020 16:43:17 -0400	[thread overview]
Message-ID: <20200618204317.GM10769@hungrycats.org> (raw)
In-Reply-To: <16bc2efa-8e88-319f-e90e-cf8536460860@gmail.com>

On Thu, Jun 18, 2020 at 12:28:41PM +1000, DanglingPointer wrote:
> btrfs-dedupe is currently broken and no longer actively supported.
> 
> It no longer builds with current rustc v1.44.0 with cargo
> 
> It is in the official btrfs Deduplication wiki:
> 
>     https://btrfs.wiki.kernel.org/index.php/Deduplication
> 
> There's no real active community and proper QA, reviewing and vetting.
> 
> A poster in the issues area of the projects Github location stated that even
> if fixed, it may not function correctly due to BTRFS having evolved since
> the tool was designed created.
> 
> There's just too many unknowns with this BTRFS specific dedupe tool.
> 
> People using your official wiki and trying to use that deduplication program
> could inadvertently destroy their data through nativity or accident. 
> Especially if they start trying to fix the code.

The point about lack of maintenance with changing Rust dependencies is
fair, but "data loss" is a strong and unsupported statement.  Can you
explain how data loss could occur in even a badly (assume not maliciously)
broken version of btrfs-dedupe?

As far as I can tell, the btrfs-dedupe code uses only non-data-mutating
btrfs kernel interfaces for manipulating extents (fiemap, defrag,
and file_extent_same/deduperange).  None of these should cause data
loss (excluding kernel bugs).

btrfs-dedupe can be trivially tricked into opening files that it did
not intend to (it has no protection against symlink injection and other
TOCCTOU attacks), but it doesn't seem to be able to alter the content
of files once it opens them.

File descriptors pointing to user files are opened O_RDWR, but they are
kept in the scope of the dedupe function and their life-cycle is properly
managed in Rust, so btrfs-dedupe won't mutate files by writing to the
wrong fd (e.g. accidentally close stderr and reopen it to a user file)
unless someone adds some seriously buggy code (see "assume not malicious"
above).

The unsafe C ioctl interfaces are unlikely to change in data-losing ways,
or they'll break all existing userspace tools that use them.  They are
also well encapsulated in the rust-btrfs module.

The errors reported on github seem to be problems with incompatible
changes in the runtime libraries btrfs-dedupe depends on, and also some
reports of what look like pre-existing bugs in the fiemap code that are
blamed on new kernel versions without evidence.  Data-losing breaking
changes in any of the ioctls btrfs-dedupe uses are extremely unlikely.
Those issues may cause btrfs-dedupe to do useless unnecessary work,
or fail to do useful necessary work, but could not cause data loss by
any mechanism I can find.

Contrast with bedup:  bedup uses data-mutating kernel interfaces
(clone_range) for dedupe that have no effective protection against
concurrent data modification.  There is ineffective protection implemented
in bedup (looking in /proc/*/fd for concurrent users of the files) which
may or may not be broken in kernel 5.0, but it's ineffective either way.
The case for data loss in bedup is trivial.  The branch with a patch to
fix it is now 7 years old, so it's fair to say bedup is unmaintained too
(github forks notwithstanding, they didn't fix these issues).

> I recommend you remove it from your website or at least put large warnings
> there that it is broken (which looks ugly, I would rather only stuff that
> works were there since it isn't your project anyway but some 3rd party).

  parent reply	other threads:[~2020-06-18 20:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18  2:28 btrfs-dedupe broken and unsupported but in official wiki DanglingPointer
2020-06-18 10:31 ` David Sterba
2020-06-18 20:43 ` Zygo Blaxell [this message]
2020-06-18 22:05   ` DanglingPointer
2020-06-19  5:04     ` Zygo Blaxell
2020-06-19 13:11       ` David Sterba
2020-06-22 19:49         ` Goffredo Baroncelli
2020-06-22 22:45           ` Zygo Blaxell
2020-07-02  8:27             ` Lakshmipathi.G
2020-07-03  3:16               ` Zygo Blaxell
2020-07-06 10:46                 ` Lakshmipathi.G
2020-07-25  7:24                   ` Lakshmipathi.G
2020-06-18 20:59 ` waxhead
2020-06-19 13:19   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200618204317.GM10769@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=danglingpointerexception@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox