From: DanglingPointer <danglingpointerexception@gmail.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-dedupe broken and unsupported but in official wiki
Date: Fri, 19 Jun 2020 08:05:44 +1000
Message-ID: <65eeb90a-e983-2ae8-14ad-79bcd2960851@gmail.com>
In-Reply-To: <20200618204317.GM10769@hungrycats.org>
A large portion of desktop users are not developers; they are illiterate in
Rust and in programming generally. They would not know whether this tool,
that tool, or any tool is safe or unsafe or has race conditions, nor would
they know the meaning of immutable or mutex.
Think of this scenario: an average user, Joe Bloggs, buys a new computer
without MS Windows. With the software savings, Joe purchases more
disks. He then chooses openSUSE Leap for his first foray into Linux.
All he cares about is his music files, photos, and videos being safe.
Joe runs a cafe down the street and uses the music, photos, and videos
on various screens at his cafe for the atmosphere.
Times are tough and he's running out of space, so he doesn't want the
accumulated media files duplicated all over the place, wasting storage.
If the official wikis list broken 3rd-party tools, then the whole adoption
process becomes less easy, less friendly, more cryptic, and more chaotic,
and it gives the impression that btrfs (and Linux as a whole) is a mess and
not ready. He would not know how, or have the time, to go through the
code of each deduplication tool to figure out whether one is better than
another, the way Zygo Blaxell, who can read code, did. Even if he wanted
to, he neither knows how nor has the time. He says good-bye to openSUSE
and buys Windows.
So I do agree with waxhead. It would be preferable if there were an
official btrfs deduplication command in btrfs-progs instead of relying
on 3rd parties. The Joe Bloggs of the example above can follow web-page
instructions saying "run this command... and then this command...", but
he will not have the knowledge, comprehension, or time to go through
code.
Thanks David Sterba for removing the items and updating the wiki!
On 19/6/20 6:43 am, Zygo Blaxell wrote:
> The point about lack of maintenance with changing Rust dependencies is
> fair, but "data loss" is a strong and unsupported statement. Can you
> explain how data loss could occur in even a badly (assume not maliciously)
> broken version of btrfs-dedupe?
>
> As far as I can tell, the btrfs-dedupe code uses only non-data-mutating
> btrfs kernel interfaces for manipulating extents (fiemap, defrag,
> and file_extent_same/deduperange). None of these should cause data
> loss (excluding kernel bugs).
>
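To illustrate the deduperange interface mentioned above, here is a minimal Python sketch of a single FIDEDUPERANGE call. It is not taken from btrfs-dedupe; the helper names (build_request, parse_result, dedupe_range) are hypothetical, and the ioctl number assumes the common Linux ioctl encoding used on x86-64. The kernel compares the two ranges itself and shares extents only when the bytes are identical, which is why this interface should not change file contents.

```python
import fcntl
import struct

# _IOWR(0x94, 54, struct file_dedupe_range); assumes the common Linux
# ioctl encoding (x86-64, arm64). Some architectures encode it differently.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")

FILE_DEDUPE_RANGE_SAME = 0     # ranges matched and now share extents
FILE_DEDUPE_RANGE_DIFFERS = 1  # ranges differed; nothing was changed


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request with exactly one destination range (hypothetical helper)."""
    return bytearray(
        HEADER.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def parse_result(buf):
    """Return (bytes_deduped, status) for the single destination entry."""
    _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to dedupe one range; negative status is -errno."""
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf, True)  # kernel fills in the result
    return parse_result(buf)
```

A caller would treat a status of FILE_DEDUPE_RANGE_DIFFERS as "file changed since it was scanned" and simply skip that range.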
> btrfs-dedupe can be trivially tricked into opening files that it did
> not intend to (it has no protection against symlink injection and other
> TOCTOU attacks), but it doesn't seem to be able to alter the content
> of files once it opens them.
>
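As a rough illustration of what protection against symlink injection could look like, the Python sketch below refuses to follow a symlink in the final path component and then checks the file type on the descriptor that was actually opened. This is an assumption about one possible mitigation, not something btrfs-dedupe does; the function name open_regular_nofollow is hypothetical. O_NOFOLLOW only covers the last path component, so symlinks in parent directories would need separate handling.

```python
import os
import stat


def open_regular_nofollow(path):
    """Open path read-only, rejecting symlinks and non-regular files.

    Hypothetical helper: the check is made with fstat() on the opened
    descriptor, so it cannot be raced by swapping the path afterwards.
    """
    # O_NOFOLLOW makes open() fail if the final component is a symlink.
    # O_NONBLOCK avoids blocking if the path turns out to be a FIFO.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(f"{path!r} is not a regular file")
        return fd
    except BaseException:
        os.close(fd)
        raise
```

The caller owns the returned descriptor and must close it with os.close().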
> File descriptors pointing to user files are opened O_RDWR, but they are
> kept in the scope of the dedupe function and their life-cycle is properly
> managed in Rust, so btrfs-dedupe won't mutate files by writing to the
> wrong fd (e.g. accidentally close stderr and reopen it to a user file)
> unless someone adds some seriously buggy code (see "assume not malicious"
> above).
>
> The unsafe C ioctl interfaces are unlikely to change in data-losing ways,
> or they'll break all existing userspace tools that use them. They are
> also well encapsulated in the rust-btrfs module.
>
> The errors reported on github seem to be problems with incompatible
> changes in the runtime libraries btrfs-dedupe depends on, and also some
> reports of what look like pre-existing bugs in the fiemap code that are
> blamed on new kernel versions without evidence. Data-losing breaking
> changes in any of the ioctls btrfs-dedupe uses are extremely unlikely.
> Those issues may cause btrfs-dedupe to do useless unnecessary work,
> or fail to do useful necessary work, but could not cause data loss by
> any mechanism I can find.
>
> Contrast with bedup: bedup uses data-mutating kernel interfaces
> (clone_range) for dedupe that have no effective protection against
> concurrent data modification. There is ineffective protection implemented
> in bedup (looking in /proc/*/fd for concurrent users of the files) which
> may or may not be broken in kernel 5.0, but it's ineffective either way.
> The case for data loss in bedup is trivial. The branch with a patch to
> fix it is now 7 years old, so it's fair to say bedup is unmaintained too
> (github forks notwithstanding, they didn't fix these issues).
>