From: Heinz-Josef Claes <hjclaes@web.de>
To: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Chris Mason <chris.mason@oracle.com>,
	Edward Shishkin <edward.shishkin@gmail.com>,
	Tomasz Chmielewski <mangoo@wpkg.org>,
	linux-btrfs@vger.kernel.org
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Tue, 28 Apr 2009 22:36:07 +0200
Message-ID: <200904282236.07428.hjclaes@web.de>
In-Reply-To: <20090428201619.GK7217@cip.informatik.uni-erlangen.de>

On Tuesday, 28 April 2009 22:16:19, Thomas Glanzmann wrote:
> Hello Heinz,
>
> > It's not only CPU time, it's also memory. You need 32 bytes for each 4k
> > block.  It needs to be in RAM for performance reasons.
>
> exactly and that is not going to scale.
>
>         Thomas


Hi Thomas,

I wrote a backup tool which uses dedup, so I know a little bit about the
problem and about the performance impact when the checksums are not kept
in memory (keeping them out of memory is an option in that tool).
http://savannah.gnu.org/projects/storebackup

Dedup really helps a lot - I think more than I could have imagined before I
got involved in this kind of backup. To give a simple example: you will not
believe how many identical files there are in a filesystem.

EMC has very big boxes for this with lots of RAM in them.
I think the first problem which has to be solved is the memory problem.
Perhaps something asynchronous that finds identical blocks and stores the
checksums on disk?
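
To put rough numbers on the memory problem, here is a small sketch in
Python. It is only an illustration, not code from storeBackup or btrfs.
The 32 bytes per 4k block are the figures from this thread; using SHA-256
as the checksum (its digest happens to be 32 bytes) and the function names
index_bytes and find_duplicate_blocks are my assumptions. The second
function shows the idea of a separate pass that looks for identical blocks
after the data has been written.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096   # 4k blocks, as discussed in the thread
DIGEST_SIZE = 32    # 32 bytes per block; SHA-256 is an assumption here


def index_bytes(fs_bytes, block_size=BLOCK_SIZE, digest_size=DIGEST_SIZE):
    """Bytes needed to hold one checksum for every block of the filesystem."""
    blocks = -(-fs_bytes // block_size)  # ceiling division
    return blocks * digest_size


def find_duplicate_blocks(data, block_size=BLOCK_SIZE):
    """Separate pass over already written data (hypothetical helper).

    Returns lists of block numbers whose checksums are equal. These are
    only candidates; the blocks should still be compared byte by byte
    before they are merged.
    """
    seen = defaultdict(list)
    for offset in range(0, len(data), block_size):
        digest = hashlib.sha256(data[offset:offset + block_size]).digest()
        seen[digest].append(offset // block_size)
    return [blocks for blocks in seen.values() if len(blocks) > 1]


if __name__ == "__main__":
    tib = 1 << 40
    print(index_bytes(tib) / (1 << 30), "GiB of checksums per TiB of data")
    sample = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
    print(find_duplicate_blocks(sample))
```

With these figures, one TiB of data needs 8 GiB just for the checksums,
which is why keeping the whole index in RAM does not scale. The dictionary
in the sketch still lives in memory; in a real asynchronous scan it would
have to be replaced by an on-disk index.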

Heinz
