From: Jeff Mahoney <jeffm@suse.com>
To: Christoph Anton Mitterer <calestyo@scientia.net>,
linux-btrfs@vger.kernel.org
Subject: Re: out-of-band dedup status?
Date: Thu, 8 Dec 2016 15:15:38 -0500
Message-ID: <539d7c1c-5041-fb99-0ec5-81291f9f6609@suse.com>
In-Reply-To: <1481222198.6563.3.camel@scientia.net>
On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> Hey.
>
> I just wondered whether out-of-band/"offline" dedup is safe for general
> use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> (it tells about unspecified performance issues), but this seems again
> already outdated (kernel 4.7)...
> :-(
SUSE supports it in SLE12 using our 3.12- and 4.4-based kernels. There
haven't been a lot of changes to the kernel component of it. It's
pretty simple: check to see if the ranges are identical between two
files and then reflink between them.
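
For illustration, here is a minimal sketch of how a userspace tool might ask
the kernel for that compare-then-reflink step. It assumes the generic
FIDEDUPERANGE ioctl and the struct layout documented in
ioctl_fideduperange(2), and it assumes the usual _IOWR encoding used on
x86 and ARM (some architectures encode the direction bits differently).
The helper name dedupe_range is hypothetical; this is not how duperemove
itself is written.

```python
import fcntl
import struct

# struct file_dedupe_range header: src_offset, src_length, dest_count,
# reserved1, reserved2 (layout assumed from <linux/fs.h>).
HDR = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved.
INFO = struct.Struct("=qQQiI")


def _iowr(magic, nr, size):
    # Assumption: asm-generic ioctl encoding (read|write direction == 3).
    return (3 << 30) | (size << 16) | (magic << 8) | nr


FIDEDUPERANGE = _iowr(0x94, 54, HDR.size)


def dedupe_range(src_fd, src_off, length, dst_fd, dst_off):
    """Ask the kernel to share one range; returns (status, bytes_deduped).

    Hypothetical helper: the kernel compares the two ranges itself and
    only reflinks them if they are identical.
    """
    buf = bytearray(HDR.pack(src_off, length, 1, 0, 0)
                    + INFO.pack(dst_fd, dst_off, 0, 0, 0))
    # With a mutable buffer, the kernel's results are copied back into buf.
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(buf, HDR.size)
    return status, bytes_deduped
```

The caller would open the source for reading and the destination for
writing, then pass both descriptors with the offsets and length it believes
are duplicates. Nothing is shared unless the kernel's own comparison agrees.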
> My intention was to use it with duperemove, but AFAIU, the kernel
> itself will anyway do a byte-by-byte comparison before any
> deduplication, so in principle it should be totally safe regardless of
> the stability of the userland tool, right?
> Especially I wouldn't want that "identity" is only assumed because of
> some checksum identity (or collision ;) ).
Yep. It does a full check in the kernel for precisely that reason.
It's not even enough to do it in userspace because we don't want dedupe
to be race prone. It's either atomically identical or it's not, and we
don't dedupe if it's not. If it changes immediately after the ioctl
returns, that's fine -- the cloned range will be CoW'd properly.
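
As a rough sketch of what that means for a caller: the per-destination
status reported by the dedupe ioctl says either that the ranges matched and
were shared, that they differed and nothing was done, or that an error
occurred. The constant values below are taken from <linux/fs.h> as I
remember them, and describe_dedupe_status is a hypothetical helper, so
treat this as an assumption-laden illustration.

```python
import os

# Status values assumed from <linux/fs.h>.
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1


def describe_dedupe_status(status, bytes_deduped):
    """Turn a per-destination status into a human-readable message."""
    if status == FILE_DEDUPE_RANGE_SAME:
        return f"deduped {bytes_deduped} bytes"
    if status == FILE_DEDUPE_RANGE_DIFFERS:
        # The kernel found a mismatch; the destination is left untouched.
        return "ranges differ; nothing shared"
    # Negative values carry a negated errno.
    return f"error: {os.strerror(-status)}"
```

A "differs" result is not a failure of the tool, just the kernel declining
to share data that was not byte-for-byte identical at the time of the call.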
> Also, is there anything to take note of when this is used with
> compression and snapshots?
I don't believe so. IIRC dedupe maps the file to see if it's already
cloned, so it's safe for snapshots (or could relink extents in a
snapshot that diverged and then were restored to their original
contents). Dedupe works with the uncompressed data, so compression
shouldn't matter here. I haven't tested it, though.
> What when I use it with incremental send/receive... i.e. I dedupe the
> "master" and then send/receive this to another btrfs... will it work
> (that is will the copy be also deduplicated, with no longer needed
> extents properly being freed)... or at least not cause any corruptions?
It should. IIRC send also maps the file (using a different mechanism)
and receive will clone those ranges on the other end.
> Any other things in terms of possible issues, data corruption, etc.
> that one should know when using deduplication?
There shouldn't be. We haven't had any bug reports at SUSE.
-Jeff
--
Jeff Mahoney
SUSE Labs