public inbox for linux-btrfs@vger.kernel.org
From: DanglingPointer <danglingpointerexception@gmail.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-dedupe broken and unsupported but in official wiki
Date: Fri, 19 Jun 2020 08:05:44 +1000
Message-ID: <65eeb90a-e983-2ae8-14ad-79bcd2960851@gmail.com>
In-Reply-To: <20200618204317.GM10769@hungrycats.org>

A large portion of desktop users are not developers; they are illiterate
in Rust and in programming generally.  They would not know whether this
tool, that tool, or any tool is safe or unsafe or has race conditions,
nor would they know what "immutable" or "mutex" means.

Think of this scenario: an average user, Joe Bloggs, buys a new computer
without MS Windows.  With the money saved on software, Joe purchases more
disks.  He then chooses openSUSE Leap for his first foray into Linux.
All he cares about is that his music files, photos, and videos are safe.
Joe runs a cafe down the street and plays the music, photos, and videos
on various screens there for atmosphere.
Times are tough and he is running out of space, so he doesn't want the
media files he has accumulated to be duplicated all over the place,
wasting storage.

If the official wikis list broken 3rd-party tools, the whole adoption
process becomes less easy, less friendly, more cryptic, and more chaotic,
and it gives the impression that btrfs (and Linux as a whole) is a mess
and not ready.  He would not know how, or have the time, to go through
the code of each deduplication tool to figure out which one is better,
the way Zygo Blaxell, who can read code, did.  Even if he wanted to, he
has neither the knowledge nor the time to do it.  He says goodbye to
openSUSE and buys Windows.

So I do agree with waxhead.  It would be preferable if there were an
official btrfs deduplication command in btrfs-progs instead of relying
on 3rd parties.  The Joe Bloggs in the example above can read web-page
instructions saying "run this command... and then this command..."; but
he will not have the knowledge, comprehension, or time to go through
code.

Thanks David Sterba for removing the items and updating the wiki!

On 19/6/20 6:43 am, Zygo Blaxell wrote:
> The point about lack of maintenance with changing Rust dependencies is
> fair, but "data loss" is a strong and unsupported statement.  Can you
> explain how data loss could occur in even a badly (assume not maliciously)
> broken version of btrfs-dedupe?
>
> As far as I can tell, the btrfs-dedupe code uses only non-data-mutating
> btrfs kernel interfaces for manipulating extents (fiemap, defrag,
> and file_extent_same/deduperange).  None of these should cause data
> loss (excluding kernel bugs).
>
> btrfs-dedupe can be trivially tricked into opening files that it did
> not intend to (it has no protection against symlink injection and other
> TOCTOU attacks), but it doesn't seem to be able to alter the content
> of files once it opens them.
>
> File descriptors pointing to user files are opened O_RDWR, but they are
> kept in the scope of the dedupe function and their life-cycle is properly
> managed in Rust, so btrfs-dedupe won't mutate files by writing to the
> wrong fd (e.g. accidentally close stderr and reopen it to a user file)
> unless someone adds some seriously buggy code (see "assume not malicious"
> above).
>
> The unsafe C ioctl interfaces are unlikely to change in data-losing ways,
> or they'll break all existing userspace tools that use them.  They are
> also well encapsulated in the rust-btrfs module.
>
> The errors reported on github seem to be problems with incompatible
> changes in the runtime libraries btrfs-dedupe depends on, and also some
> reports of what look like pre-existing bugs in the fiemap code that are
> blamed on new kernel versions without evidence.  Data-losing breaking
> changes in any of the ioctls btrfs-dedupe uses are extremely unlikely.
> Those issues may cause btrfs-dedupe to do useless unnecessary work,
> or fail to do useful necessary work, but could not cause data loss by
> any mechanism I can find.
>
> Contrast with bedup:  bedup uses data-mutating kernel interfaces
> (clone_range) for dedupe that have no effective protection against
> concurrent data modification.  There is ineffective protection implemented
> in bedup (looking in /proc/*/fd for concurrent users of the files) which
> may or may not be broken in kernel 5.0, but it's ineffective either way.
> The case for data loss in bedup is trivial.  The branch with a patch to
> fix it is now 7 years old, so it's fair to say bedup is unmaintained too
> (github forks notwithstanding, they didn't fix these issues).
>

Thread overview: 14+ messages
2020-06-18  2:28 btrfs-dedupe broken and unsupported but in official wiki DanglingPointer
2020-06-18 10:31 ` David Sterba
2020-06-18 20:43 ` Zygo Blaxell
2020-06-18 22:05   ` DanglingPointer [this message]
2020-06-19  5:04     ` Zygo Blaxell
2020-06-19 13:11       ` David Sterba
2020-06-22 19:49         ` Goffredo Baroncelli
2020-06-22 22:45           ` Zygo Blaxell
2020-07-02  8:27             ` Lakshmipathi.G
2020-07-03  3:16               ` Zygo Blaxell
2020-07-06 10:46                 ` Lakshmipathi.G
2020-07-25  7:24                   ` Lakshmipathi.G
2020-06-18 20:59 ` waxhead
2020-06-19 13:19   ` David Sterba
