From: Simon Farnsworth <simon@farnz.org.uk>
To: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 06 Jan 2011 13:30:30 +0000
Message-ID: <ig4g5m$67u$1@dough.gmane.org>
In-Reply-To: 4D25B58E.2080208@bobich.net

Gordan Bobic wrote:

> Simon Farnsworth wrote:
> 
>> The basic idea is to use fanotify/inotify (whichever of the notification
>> systems works for this) to track which inodes have been written to. It
>> can then mmap() the changed data (before it's been dropped from RAM) and
>> do the same process as an offline dedupe (hash, check for matches, call
>> dedupe extent ioctl). If you've got enough CPU (maybe running with
>> realtime privs), you should be able to do this before writes actually hit
>> the disk.
> 
> I'm not convinced that racing against the disk write is the way forward
> here.
> 
The point is that anyone who cares can implement a userspace online dedupe
daemon that races against the disk write as soon as Josef's patch is in
place. If it becomes clear that the daemon only does something simple enough
to put in the kernel (e.g. fixed block size dedupe), and that extra
complexity doesn't gain enough to be worthwhile, the code can be ported into
the kernel before it gets posted here.

Similarly, if you're convinced that it has to be in kernel (I'm not a dedupe 
or filesystems expert, so there may be good reasons I'm unaware of), you can 
reuse parts of Josef's code to write your patch that creates a kernel thread 
to do the work.

If it turns out that complex algorithms for online dedupe are worth the
effort (like line-by-line e-mail dedupe), then you've got a starting point
for writing something more complex, and for determining just what the kernel
needs to provide to make things nice. Maybe it'll be clear that you need an
interface that lets you hold up a write while you do the simple end of the
dedupe work; maybe some other kernel interface, more generic than "dedupe
fixed size blocks", will be needed for efficient work.
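
To make the "line-by-line e-mail dedupe" idea concrete, here is a rough
userspace sketch of variable-size chunking on newline boundaries. It only
finds candidate matches between two buffers; it does not call any dedupe
ioctl. The names line_chunks and shared_lines are made up for illustration,
and SHA-256 is an arbitrary choice of hash. Chunks found this way will not
generally line up with filesystem blocks, which is part of why a more generic
kernel interface might be needed.

```python
import hashlib


def line_chunks(data: bytes):
    """Yield (offset, chunk) pairs, each chunk ending at a line break or EOF."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line
        offset += len(line)


def shared_lines(a: bytes, b: bytes):
    """Return (offset_in_a, offset_in_b, length) for lines of b also found in a."""
    index = {}
    for off, chunk in line_chunks(a):
        # Remember the first offset at which each distinct line appears in a.
        index.setdefault(hashlib.sha256(chunk).digest(), off)

    matches = []
    for off, chunk in line_chunks(b):
        hit = index.get(hashlib.sha256(chunk).digest())
        # Compare the bytes as well, so a hash collision cannot cause a match.
        if hit is not None and a[hit:hit + len(chunk)] == chunk:
            matches.append((hit, off, len(chunk)))
    return matches
```

For example, shared_lines(original_mail, forwarded_mail) would list the byte
ranges of the forwarded message that repeat lines of the original.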

Either way, Josef's work is a good starting point for online dedupe; you can 
experiment *now* without going into kernel code (heck, maybe even not C - 
Python or Perl would be OK for algorithm exploration), and use the offline 
dedupe support to simplify a patch for online dedupe.
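
As a minimal sketch of that kind of exploration, the following Python hashes
a buffer in fixed-size blocks and groups the offsets whose contents hash the
same. It covers only the "hash, check for matches" steps; the call into the
dedupe ioctl is deliberately left out, since its interface is defined by
Josef's patch and not reproduced here. BLOCK_SIZE, find_duplicate_blocks and
the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; not taken from the patch


def find_duplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return lists of offsets whose full blocks have identical hashes."""
    by_digest = defaultdict(list)
    # Only full blocks are considered; a short tail is ignored.
    for offset in range(0, len(data) - block_size + 1, block_size):
        digest = hashlib.sha256(data[offset:offset + block_size]).digest()
        by_digest[digest].append(offset)
    # Keep only groups with more than one offset: these are dedupe candidates.
    return [offsets for offsets in by_digest.values() if len(offsets) > 1]
```

Each returned group is a set of candidate blocks; a real tool would still
compare the bytes before asking the kernel to share the extents.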

-- 
Simon Farnsworth


