From: Simon Farnsworth <simon@farnz.org.uk>
To: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 06 Jan 2011 12:18:34 +0000
Message-ID: <ig4bur$mns$1@dough.gmane.org>
In-Reply-To: 4D24AD92.4070107@bobich.net

Gordan Bobic wrote:
> Josef Bacik wrote:
>
>> Basically I think online dedup is a huge waste of time and completely
>> useless.
>
> I couldn't disagree more. First, let's consider what is the
> general-purpose use-case of data deduplication. What are the resource
> requirements to perform it? How do these resource requirements differ
> between online and offline?
<snip>
> As an aside, zfs and lessfs both do online deduping, presumably for a
> good reason.
>
> Then again, for a lot of use-cases there are perhaps better ways to
> achieve the targeted goal than deduping on FS level, e.g. snapshotting or
> something like fl-cow:
> http://www.xmailserver.org/flcow.html
>

Just a small point; Josef's work provides a building block for a userspace
notify-based online dedupe daemon.

The basic idea is to use fanotify/inotify (whichever of the notification
systems works for this) to track which inodes have been written to. It can
then mmap() the changed data (before it's been dropped from RAM) and do the
same process as an offline dedupe (hash, check for matches, call dedupe
extent ioctl). If you've got enough CPU (maybe running with realtime privs),
you should be able to do this before writes actually hit the disk.
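
To make the "hash, check for matches" step concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the 4096-byte block size, SHA-256, and the name find_duplicate_blocks are my choices, not anything from Josef's patch. The notification step and the dedupe extent ioctl call itself are left out, since the ioctl's interface is not described in this message.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; not taken from the patch


def find_duplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Group block offsets by content hash.

    Returns a list of offset lists, one per hash that occurs more than
    once. These are only dedupe *candidates*; a real daemon would hand
    them to the dedupe extent ioctl (not shown here).
    """
    offsets_by_digest = defaultdict(list)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        offsets_by_digest[digest].append(offset)
    return [offs for offs in offsets_by_digest.values() if len(offs) > 1]
```

A daemon would run something like this over the freshly written ranges of each changed inode and queue the resulting offset groups for the ioctl phase.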
Further, a userspace daemon can do more sophisticated online dedupe than is
reasonable in the kernel - e.g. queue the dedupe extent ioctl phase for idle
time, only dedupe inodes that have been left unwritten for x minutes,
different policies for different bits of the filesystem (dedupe crontabs
immediately on write, dedupe outgoing mail spool only when the mail sticks
around for a while, dedupe all incoming mail immediately, dedupe logfiles
after rotation only, whatever is appropriate).
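
As a rough sketch of what such per-path policies might look like in a userspace daemon, consider the following Python fragment. Everything in it is hypothetical: the Policy type, the ready_for_dedupe helper, the glob patterns, and the idle thresholds are made up for illustration and do not come from any existing tool.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Policy:
    pattern: str             # glob matched against the full file path
    min_idle_seconds: float  # how long the file must have gone unwritten


# Hypothetical policy table; paths and thresholds are examples only.
POLICIES = [
    Policy("/var/spool/cron/*", 0),        # dedupe immediately on write
    Policy("/var/spool/mail-out/*", 600),  # only if the mail sticks around
    Policy("/var/log/*.1", 0),             # rotated logfiles only
]


def ready_for_dedupe(path: str, idle_seconds: float, policies=POLICIES) -> bool:
    """Return True if the first matching policy allows deduping now.

    Paths that match no policy are never deduped.
    """
    for policy in policies:
        if fnmatch.fnmatch(path, policy.pattern):
            return idle_seconds >= policy.min_idle_seconds
    return False
```

The point of keeping this in userspace is that the table can be changed or extended without touching the kernel.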
It can also do more intelligent trickery than is reasonable in-kernel - e.g.
if you know that you're deduping e-mail (line-based), you can search line-
by-line for dedupe blocks, rather than byte-by-byte.
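
A line-based search could be sketched as below, again in Python and again only as an illustration. The function names are hypothetical, and the sketch ignores the question of how line-sized matches would map onto the extents the filesystem can actually share; it only shows that candidate boundaries come from line starts instead of every byte offset.

```python
import hashlib


def line_spans(data: bytes):
    """Yield (offset, length) for each line, line terminator included."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, len(line)
        offset += len(line)


def shared_lines(a: bytes, b: bytes):
    """Return (offset_in_a, offset_in_b, length) for lines of b found in a.

    Only line starts are considered as candidate positions, so the
    search space is far smaller than a byte-by-byte scan.
    """
    index = {}
    for off, length in line_spans(a):
        key = hashlib.sha256(a[off:off + length]).digest()
        index.setdefault(key, off)  # remember the first occurrence in a
    matches = []
    for off, length in line_spans(b):
        key = hashlib.sha256(b[off:off + length]).digest()
        if key in index:
            matches.append((index[key], off, length))
    return matches
```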
Having said all that, you may well find that having implemented a userspace
online dedupe daemon, there are things the kernel can do to help; you may
even find that you do need to move it entirely into the kernel. Just don't
think that this ioctl rules out online dedupe - in fact, it enables it.
--
Simon Farnsworth