From: Peter A <loony@loonybin.org>
To: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Wed, 5 Jan 2011 22:58:36 -0500
Message-ID: <201101052258.36457.loony@loonybin.org>
In-Reply-To: <4D251888.7060508@shiftmail.org>

On Wednesday, January 05, 2011 08:19:04 pm Spelic wrote:
> > I'd just make it always use the fs block size. No point in making it 
> > variable.
> 
> Agreed. What is the reason for variable block size?

First post on this list. So far I have mostly just been reading to learn more 
about fs design, but this is one topic I (unfortunately) have experience with... 

You wouldn't believe the difference variable block size dedupe makes. For a 
pure fileserver, it's OK to dedupe at the block level, but for most other uses 
variable is king. One big example is backups. NetBackup and most others 
produce one stream with all the data, even when backing up to disk. Imagine you 
move a whole lot of data from one dir to another. Think of a directory with 
huge video files. On the filesystem itself it would be deduped nicely. The 
backup stream, however, may or may not have matching fs blocks. If the 
directory name before and after has the same length and such, then yeah, 
dedupe works. Directory name is a byte shorter? Everything after it in the 
stream will be offset by one byte, and no dedupe will occur at all on the 
whole dataset. In the real world, just compare the dedupe performance of an 
Oracle 7000 (ZFS and therefore fs block based) to a DataDomain (variable 
length) in this usage scenario. Among our customers we see something like a 
3x to 17x dedupe ratio on the DD, and 1.02x to 1.05x on the 7000.
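
To make the one-byte-offset effect concrete, here is a rough Python sketch. It 
is only an illustration under stated assumptions, not how ZFS, DataDomain or 
the proposed btrfs ioctl actually work: the "backup stream" is just a directory 
name followed by random bytes standing in for video data, the 4096-byte block 
size is assumed, and the variable chunker is a toy content-defined splitter 
that hashes a 16-byte window with CRC32. All names in it are made up for the 
example. Real products typically use a proper rolling hash plus minimum and 
maximum chunk sizes.

```python
import hashlib
import random
import zlib

BLOCK = 4096    # assumed fs block size
WINDOW = 16     # bytes of context that decide a chunk boundary
MASK = 0x3FF    # cut when the low 10 hash bits are zero (~1 KiB average chunk)

def fixed_chunks(data, size=BLOCK):
    """Cut at fixed offsets, like fs-block-based dedupe."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=WINDOW, mask=MASK):
    """Cut wherever a hash of the preceding `window` bytes matches a pattern.

    Boundaries depend only on nearby content, not on absolute offsets.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks

def shared_fraction(old_chunks, new_chunks):
    """Fraction of bytes in the new stream whose chunks already exist in the old one."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    dup = sum(len(c) for c in new_chunks if hashlib.sha256(c).digest() in seen)
    return dup / sum(len(c) for c in new_chunks)

# Hypothetical backup streams: a directory name followed by the same payload.
n = 128 * 1024
payload = random.Random(0).getrandbits(8 * n).to_bytes(n, "big")
before = b"/data/videos_old/" + payload
same_len = b"/data/videos_new/" + payload   # renamed, same name length
shorter = b"/data/videos_ne/" + payload     # renamed, one byte shorter

for label, after in (("same length", same_len), ("one byte shorter", shorter)):
    f = shared_fraction(fixed_chunks(before), fixed_chunks(after))
    v = shared_fraction(variable_chunks(before), variable_chunks(after))
    print(f"{label}: fixed {f:.1%}, variable {v:.1%}")
```

With the same-length name only the first fixed block differs. With the shorter 
name every fixed block boundary lands one byte off, so nothing matches, while 
the content-defined chunker lines up again after its first cut inside the 
payload.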

There are many other examples, and in the end it comes down to whether you 
want general-purpose dedupe (e.g. something useful when you serve iSCSI LUNs, 
serve as a backup target, ...) or only care about a pure file store, in which 
case you're probably going to be OK with fixed block lengths... 

Hope that helps, 

Peter.

-- 
Censorship: noun, circa 1591. a: Relief of the burden of independent thinking.
