Re: status of inline deduplication in btrfs

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Adam Borowski <kilobyte@angband.pl>,
	shally verma <shallyvermacavium@gmail.com>
Cc: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: status of inline deduplication in btrfs
Date: Mon, 28 Aug 2017 07:30:40 -0400
Message-ID: <d9689a8d-2e14-e3de-f5d3-656136b42ad7@gmail.com>
In-Reply-To: <20170828103222.bvdsjpzloo4yubzb@angband.pl>

On 2017-08-28 06:32, Adam Borowski wrote:
> On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
>> I am a bit confused here: is your description based on offline dedupe,
>> or does it apply to inline deduplication?
> 
> It doesn't matter _how_ you get to excessive reflinking, the resulting
> slowdown is the same.
> 
> By the way, you can try "bees", it does nearline-dedupe which is for
> practical purposes as good as fully online, and unlike the latter, has no
> way to damage your data in case of bugs (mistaken userland dedupe can at
> most make the kernel pointlessly read and compare data).
> 
> I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME
> asynchronously right after a write gets put into the page cache, which in
> most cases is quick enough to avoid writeout.
I would also recommend looking at 'bees'.  If you absolutely _must_ have 
online or near-online deduplication, then this is your best option 
currently from a data safety perspective.
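
To make the safety argument concrete, here is a minimal sketch of what a
userland dedupe request looks like from Python.  It is not taken from bees.
It assumes the generic FIDEDUPERANGE ioctl (the VFS-level name for btrfs's
FILE_EXTENT_SAME) and the struct layout from <linux/fs.h>; the helper names
pack_request, unpack_result and dedupe_range are made up for illustration.
The kernel compares both ranges itself before sharing anything, so a
mistaken request ends with a "differs" status rather than changed data.

```python
import fcntl
import struct

# _IOWR(0x94, 54, struct file_dedupe_range), as defined in <linux/fs.h>.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, 2 reserved fields
_HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
_INFO = struct.Struct("=qQQiI")

FILE_DEDUPE_RANGE_SAME = 0      # ranges matched and now share extents
FILE_DEDUPE_RANGE_DIFFERS = 1   # ranges differ, nothing was changed


def pack_request(src_offset, length, dest_fd, dest_offset):
    """Build a request with exactly one destination range."""
    header = _HEADER.pack(src_offset, length, 1, 0, 0)
    info = _INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return header + info


def unpack_result(buf):
    """Return (bytes_deduped, status) for the single destination."""
    _, _, bytes_deduped, status, _ = _INFO.unpack_from(buf, _HEADER.size)
    return bytes_deduped, status


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share one range between two open files.

    Hypothetical helper: the kernel reads and compares both ranges and
    only shares extents if they are byte-for-byte identical.
    """
    buf = bytearray(pack_request(src_offset, length, dest_fd, dest_offset))
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # the kernel fills in buf
    return unpack_result(buf)
```

Calling dedupe_range() only makes sense on a filesystem that implements the
ioctl (btrfs does); elsewhere it raises OSError.  A negative status is an
errno value for that destination.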

That said, it's worth pointing out that in-line deduplication is not
always the best answer.  In fact, it's quite often a sub-optimal answer
compared to a combination of compression, sparse files, and batch
deduplication.  Most of the time, compression plus sparse files will get
you about the same space savings as in-line deduplication (I've tested
this on ZFS on FreeBSD using native in-line deduplication, and with
BTRFS on Linux using bees) while using much less memory and about the
same amount of processor time.  If you need better space savings than
that, you're better off using batch deduplication, because it gives you
better control over when the extra system resources get used, and it
will often get better overall results than in-line deduplication.
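
If you want a rough feel for how these three approaches compare on your
own data, something like the sketch below can help.  It is only an
estimate under loud assumptions: fixed 4 KiB blocks, zlib standing in for
whatever compressor the filesystem uses, and hypothetical function names.
It does not model how btrfs, ZFS, or bees actually store data.  The `seen`
set is the part that grows with the amount of unique data, which is the
memory cost that deduplication carries and the other two do not.

```python
import hashlib
import zlib

BLOCK = 4096  # assumed block size, for illustration only


def _blocks(data, block=BLOCK):
    """Yield consecutive fixed-size chunks of data."""
    for i in range(0, len(data), block):
        yield data[i:i + block]


def dedupe_savings(data, block=BLOCK):
    """Bytes saved if every repeated block were stored only once."""
    seen = set()  # one digest per unique block: this is the dedupe index
    unique = 0
    for chunk in _blocks(data, block):
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            unique += len(chunk)
    return len(data) - unique


def sparse_savings(data, block=BLOCK):
    """Bytes saved if every all-zero full block became a hole."""
    zero = bytes(block)
    return sum(len(c) for c in _blocks(data, block) if c == zero)


def compression_savings(data, block=BLOCK):
    """Bytes saved if each block were compressed on its own with zlib."""
    saved = 0
    for chunk in _blocks(data, block):
        saved += max(0, len(chunk) - len(zlib.compress(chunk)))
    return saved
```

Reading a sample file into memory and calling the three functions on it
gives three numbers to compare.  Treat them as a hint about which approach
is worth testing properly, not as a prediction of on-disk results.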