From: Gordan Bobic <gordan@bobich.net>
To: BTRFS MAILING LIST <linux-btrfs@vger.kernel.org>
Subject: Re: Offline Deduplication for Btrfs
Date: Wed, 05 Jan 2011 20:46:52 +0000
Message-ID: <4D24D8BC.90808@bobich.net>
In-Reply-To: <20110105194645.GC2562@localhost.localdomain>
On 01/05/2011 07:46 PM, Josef Bacik wrote:
> Blah blah blah, I'm not having an argument about which is better because I
> simply do not care. I think dedup is silly to begin with, and online dedup even
> sillier.
Offline dedup is more expensive - so why are you of the opinion that it
is less silly? And comparison by silliness quotient still sounds like an
argument over which is better.
> The only reason I did offline dedup was because I was just toying
> around with a simple userspace app to see exactly how much I would save if I did
> dedup on my normal system, and with 107 gigabytes in use, I'd save 300
> megabytes. I'll say that again, with 107 gigabytes in use, I'd save 300
> megabytes. So in the normal user case dedup would have been wholly useless to
> me.
Dedup isn't for an average desktop user. Dedup is for backup storage and
virtual images. I don't remember anyone ever saying it is for the
average desktop user. I am amazed you got even that much saving - I
wouldn't expect there to be any duplicate files on a normal system.
Compression is a feature that desktop users would benefit from, not
deduplication.
> Dedup is only useful if you _know_ you are going to have duplicate information,
> so the two major usecases that come to mind are
>
> 1) Mail server. You have small files, probably less than 4k (blocksize) that
> you are storing hundreds to thousands of. Using dedup would be good for this
> case, and you'd have to have a small dedup blocksize for it to be useful.
Explain to me why you think this would yield duplicate blocks. If your
server is Maildir, headers will be in the mail files, and because all
emails went to different users, they'd have different headers, and thus
not be dedupable.
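A minimal sketch of this point, assuming a 4KB dedup block size and made-up header fields and addresses (none of it is taken from the patch). Two Maildir-style files carry the same body but different per-recipient headers, and each fits in a single block, so the block hashes differ:

```python
import hashlib

BLOCK = 4096  # assumed filesystem/dedup block size


def block_hashes(data: bytes, block: int = BLOCK) -> list:
    """SHA-256 of each fixed-size block; the last block is zero-padded."""
    out = []
    for off in range(0, len(data), block):
        chunk = data[off:off + block].ljust(block, b"\0")
        out.append(hashlib.sha256(chunk).hexdigest())
    return out


# Identical message body delivered to two users (about 1.1KB, under one block).
body = b"Same body text for everyone.\n" * 40


def maildir_file(recipient: str) -> bytes:
    """Hypothetical Maildir message: per-recipient headers, then the body."""
    headers = f"Delivered-To: {recipient}\nTo: {recipient}\n\n".encode()
    return headers + body


alice = block_hashes(maildir_file("alice@example.com"))
bob = block_hashes(maildir_file("bob@example.com"))
shared = set(alice) & set(bob)
print(f"alice blocks: {len(alice)}, bob blocks: {len(bob)}, shared: {len(shared)}")
```

Because the headers sit in the same block as the body, a single differing byte is enough to make the whole block unique.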
> 2) Virtualized guests. If you have 5 different RHEL5 virt guests, chances are
> you are going to share data between them, but unlike with the mail server
> example, you are likely to find much larger chunks that are the same, so you'd
> want a larger dedup blocksize, say 64k. You want this because if you did just
> 4k you'd end up with a ridiculous amount of fragmentation and performance would
> go down the toilet, so you need a larger dedup blocksize to make for better
> performance.
Fragmentation will cause you problems anyway. The argument in the UNIX
world since year dot has been that defragging doesn't make a damn bit of
difference when you have a hundred users hammering away on a machine
that has to skip between all their collective files.
If you have VM image files a la vmware/xen/kvm, then using dedup blocks
of the same size as the guest's fs blocks is the only way that you are
going to get sane deduplication results. Otherwise the blocks won't line
up. If the dedupe block size is 4KB and guest fs block size is 4KB,
that's a reasonably clean case.
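The alignment argument can be sketched as follows. This is an illustration under assumed numbers, not a model of any real image format: two hypothetical guest images hold the same 256KB payload, but the second guest placed it one 4KB guest block later. Hashing at the guest block size finds every payload block; hashing at 64KB finds none, because the boundaries no longer line up.

```python
import hashlib
import os

GUEST_BLOCK = 4096  # assumed guest fs block size


def block_hashes(data: bytes, block: int) -> set:
    """Digests of every complete fixed-size block in data."""
    return {
        hashlib.sha256(data[off:off + block]).digest()
        for off in range(0, len(data) - block + 1, block)
    }


# 64 guest blocks of identical "file data" present in both guests.
payload = os.urandom(64 * GUEST_BLOCK)

# Same payload, shifted by one guest block in the second image.
image_a = payload + bytes(GUEST_BLOCK)
image_b = os.urandom(GUEST_BLOCK) + payload

results = {}
for dedup_block in (4096, 65536):
    shared = block_hashes(image_a, dedup_block) & block_hashes(image_b, dedup_block)
    results[dedup_block] = len(shared)
    print(f"dedup block {dedup_block:>6}: {len(shared)} shared blocks")
```

With this layout the 4KB pass matches all 64 payload blocks and the 64KB pass matches none, even though the two images share 256KB of data.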
The biggest win by far, however, would be when using chroot type guests,
as I mentioned.
> So you'd want an online implementation to give you a choice of dedup blocksize,
> which seems to me to be overly complicated.
I'd just make it always use the fs block size. No point in making it
variable.
> And then let's bring up the fact that you _have_ to manually compare any data you
> are going to dedup. I don't care if you think you have the greatest hashing
> algorithm known to man, you are still going to have collisions somewhere at some
> point, so in order to make sure you don't lose data, you have to manually memcmp
> the data. So if you are doing this online, that means reading back the copy you
> want to dedup in the write path so you can do the memcmp before you write. That
> is going to make your write performance _suck_.
IIRC, this is configurable in ZFS so that you can switch off the
physical block comparison. If you use SHA256, you would expect the first
collision (unless SHA is broken, in which case we have much bigger
problems) only after roughly 2^128 blocks, which is the birthday bound
for a 256-bit hash. Times 4KB blocks, that is one collision in about
10^24 Exabytes. That's one trillion trillion Exabytes. That is
considerably more storage space than there is likely to be available on
the planet for some time. And just for good measure, you could always up
the hash to SHA512 or use two different hashes (e.g. a combination of
SHA256 and MD5).
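The arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch that assumes the intended figure is the birthday bound of about 2^128 blocks for a 256-bit hash, 4KB (4096-byte) blocks, and decimal exabytes (10^18 bytes):

```python
HASH_BITS = 256       # SHA256 output size
BLOCK_SIZE = 4096     # bytes per block, the 4KB case from the text
EXABYTE = 10 ** 18    # decimal exabyte, an assumption about the unit

# Birthday bound: a collision becomes likely after about 2^(bits/2) hashes.
blocks_at_birthday_bound = 2 ** (HASH_BITS // 2)

bytes_stored = blocks_at_birthday_bound * BLOCK_SIZE
exabytes = bytes_stored / EXABYTE
print(f"{exabytes:.2e} EB")
```

This comes out at a little over 10^24 exabytes, in line with the trillion trillion figure.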
> Do I think offline dedup is awesome? Hell no, but I got distracted doing it as
> a side project so I figured I'd finish it, and I did it in under 1400 lines. I
> dare you to do the same with an online implementation. Offline is simpler to
> implement and simpler to debug if something goes wrong, and has an overall
> easier to control impact on the system.
If you're not going to do it properly, it is also better done outside
the FS, using FL-COW or FUSE-based lessfs.
Gordan