linux-btrfs.vger.kernel.org archive mirror
From: Josef Bacik <josef@redhat.com>
To: Lars Wirzenius <liw@liw.fi>
Cc: Josef Bacik <josef@redhat.com>, Gordan Bobic <gordan@bobich.net>,
	BTRFS MAILING LIST <linux-btrfs@vger.kernel.org>
Subject: Re: Offline Deduplication for Btrfs
Date: Wed, 5 Jan 2011 15:15:19 -0500
Message-ID: <20110105201518.GD2562@localhost.localdomain>
In-Reply-To: <1294257493.2953.33.camel@havelock.lan>

On Wed, Jan 05, 2011 at 07:58:13PM +0000, Lars Wirzenius wrote:
> On Wed, 2011-01-05 at 14:46 -0500, Josef Bacik wrote:
> > Blah blah blah, I'm not having an argument about which is better because I
> > simply do not care.  I think dedup is silly to begin with, and online dedup even
> > sillier.  The only reason I did offline dedup was because I was just toying
> > around with a simple userspace app to see exactly how much I would save if I did
> > dedup on my normal system, and with 107 gigabytes in use, I'd save 300
> > megabytes.  I'll say that again, with 107 gigabytes in use, I'd save 300
> > megabytes.  So in the normal user case dedup would have been wholly useless to
> > me.
> 
> I have been thinking a lot about de-duplication for a backup application
> I am writing. I wrote a little script to figure out how much it would
> save me. For my laptop home directory, about 100 GiB of data, it was a
> couple of percent, depending a bit on the size of the chunks. With 4 KiB
> chunks, I would save about two gigabytes. (That's assuming no MD5 hash
> collisions.) I don't have VM images, but I do have a fair bit of saved
> e-mail. So, for backups, I concluded it was worth it to provide an
> option to do this. I have no opinion on whether it is worthwhile to do
> in btrfs.
> 
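
The estimate described above can be sketched roughly as follows. This is not
Lars's script, only a minimal illustration assuming fixed-size 4 KiB chunks and
MD5 digests as mentioned in the quoted text; the function names
(estimate_savings, walk_files) are hypothetical, and like the original estimate
it assumes no hash collisions.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # 4 KiB chunks, as in the quoted estimate


def walk_files(root):
    """Yield regular files under root, skipping symlinks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                yield full


def estimate_savings(paths, chunk_size=CHUNK_SIZE):
    """Return (total_bytes, duplicate_bytes) over fixed-size chunks.

    A chunk counts as duplicate if a chunk with the same MD5 digest
    was already seen (assumes no collisions).
    """
    seen = set()
    total = 0
    duplicate = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
                    digest = hashlib.md5(chunk).digest()
                    if digest in seen:
                        duplicate += len(chunk)
                    else:
                        seen.add(digest)
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
    return total, duplicate
```

Calling estimate_savings(walk_files(some_directory)) returns the bytes scanned
and the bytes that chunk-level dedup could in principle reclaim. Every digest is
kept in memory, so a large tree needs a correspondingly large set.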

Yeah, for things where you are sending data over the network or something like
that, every little bit helps.  I think deduplication is far more interesting and
useful at the application level than at the filesystem level.  For example, with
a mail server there is a good chance that the files will be smaller than the
block size and so cannot be deduped by the filesystem, but if the application
storing them recognized that it had the same messages and just linked them
together in its own store, that would be cool.  Thanks,

Josef
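
As an illustration of the application-level approach described above, here is a
minimal sketch of a mail store that keeps one copy of each distinct message and
hard-links it into every mailbox that receives it. The layout, the function name
store_message, and the use of SHA-256 and hard links are all assumptions made
for the example; no particular mail server is known to work this way from this
message.

```python
import hashlib
import os


def store_message(store_dir, mailbox_dir, name, data):
    """Store data once under store_dir and hard-link it into mailbox_dir.

    The blob is keyed by its SHA-256 digest, so identical messages share
    one file regardless of how small they are. Returns the mailbox path.
    Raises FileExistsError if mailbox_dir/name already exists.
    """
    os.makedirs(store_dir, exist_ok=True)
    os.makedirs(mailbox_dir, exist_ok=True)

    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):
        # Write to a temporary name first so a partial blob is never visible.
        tmp = blob + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, blob)

    target = os.path.join(mailbox_dir, name)
    os.link(blob, target)  # hard link: no second copy of the data
    return target
```

Because the link is made at the whole-file level, this works even when each
message is smaller than a filesystem block, which is the case block-level dedup
cannot help with. It does require the store and the mailboxes to live on the
same filesystem, and it is not safe against concurrent writers as written.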
