From: Heinz-Josef Claes <hjclaes@web.de>
To: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Chris Mason <chris.mason@oracle.com>,
Edward Shishkin <edward.shishkin@gmail.com>,
Tomasz Chmielewski <mangoo@wpkg.org>,
linux-btrfs@vger.kernel.org
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Tue, 28 Apr 2009 22:36:07 +0200
Message-ID: <200904282236.07428.hjclaes@web.de>
In-Reply-To: <20090428201619.GK7217@cip.informatik.uni-erlangen.de>
On Tuesday, 28 April 2009 22:16:19, Thomas Glanzmann wrote:
> Hello Heinz,
>
> > It's not only cpu time, it's also memory. You need 32 byte for each 4k
> > block. It needs to be in RAM for performance reason.
>
> exactly and that is not going to scale.
>
> Thomas
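To put a number on the "32 bytes for each 4k block" estimate quoted above: that is 1/128 of the storage size, so 1 TiB of data needs about 8 GiB of checksums in RAM. A minimal sketch of the arithmetic, assuming one 32-byte digest (e.g. SHA-256) per 4 KiB block and ignoring any hash-table or index overhead, which would only make it worse; `checksum_ram_bytes` is a hypothetical helper name:

```python
BLOCK_SIZE = 4096   # 4 KiB filesystem block, as in the quoted estimate
DIGEST_SIZE = 32    # 32 bytes per block, e.g. one SHA-256 digest

def checksum_ram_bytes(capacity_bytes: int) -> int:
    """RAM needed to keep one digest per block in memory (no index overhead)."""
    blocks = -(-capacity_bytes // BLOCK_SIZE)  # ceiling division: a partial block still needs a digest
    return blocks * DIGEST_SIZE

for tib in (1, 10, 100):
    need = checksum_ram_bytes(tib * 2**40)
    print(f"{tib:>3} TiB -> {need / 2**30:.0f} GiB of checksums")
```

Under these assumptions it prints 8, 80 and 800 GiB respectively, which is the scaling problem Thomas is pointing at.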
Hi Thomas,
I wrote a backup tool which uses dedup, so I know a little about the
problem and about the performance impact when the checksums are not kept in
memory (keeping them in memory is optional in that tool).
http://savannah.gnu.org/projects/storebackup
Dedup really helps a lot, more than I could have imagined before I got
involved in this kind of backup. To give a simple example: you would not
believe how many identical files there are in a typical filesystem.
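A rough illustration of why identical files are cheap to find: group files by size first, then hash only the files that share a size. This is a minimal sketch assuming SHA-256 as the content hash; `find_identical_files` and `file_digest` are hypothetical names, and this is not how storeBackup itself is implemented.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_identical_files(root):
    """Return sorted lists of regular files under root with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so no hashing needed
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The size pre-filter means most files are never read at all; only files that collide on size pay the hashing cost.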
EMC sells very big boxes for this, with lots of RAM in them.
I think the first problem that has to be solved is memory. Perhaps an
asynchronous process that finds identical blocks and stores the checksums
on disk?
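One way to read that suggestion is an offline pass that writes (checksum, block number) records to disk in sorted runs and then merges the runs, so only a bounded batch of records is ever in RAM. A minimal sketch, assuming a plain image file read in 4 KiB blocks, SHA-256 digests and a textbook external merge sort; `find_duplicate_blocks` and its parameters are hypothetical, and this is not btrfs code:

```python
import hashlib
import heapq
import os
import struct
import tempfile

BLOCK_SIZE = 4096
RECORD = struct.Struct(">32sQ")  # 32-byte SHA-256 digest + 8-byte block number

def _write_run(records, run_dir):
    """Sort one in-memory batch and spill it to a run file on disk."""
    records.sort()
    fd, path = tempfile.mkstemp(dir=run_dir)
    with os.fdopen(fd, "wb") as f:
        for digest, blockno in records:
            f.write(RECORD.pack(digest, blockno))
    return path

def _read_run(path):
    """Stream (digest, block number) records back from a run file."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(RECORD.size)
            if len(buf) < RECORD.size:
                return
            yield RECORD.unpack(buf)

def find_duplicate_blocks(image_path, run_dir, max_records_in_ram=1 << 20):
    """Yield lists of block numbers whose contents share a SHA-256 digest.

    At most max_records_in_ram records are held in memory at once; the rest
    live in sorted run files in run_dir (owned and cleaned up by the caller).
    """
    runs, batch = [], []
    with open(image_path, "rb") as img:
        blockno = 0
        while True:
            block = img.read(BLOCK_SIZE)  # a short final block is hashed as-is
            if not block:
                break
            batch.append((hashlib.sha256(block).digest(), blockno))
            blockno += 1
            if len(batch) >= max_records_in_ram:
                runs.append(_write_run(batch, run_dir))
                batch = []
    if batch:
        runs.append(_write_run(batch, run_dir))

    # Merging the sorted runs brings equal digests next to each other.
    current_digest, group = None, []
    for digest, blockno in heapq.merge(*(_read_run(p) for p in runs)):
        if digest != current_digest:
            if len(group) > 1:
                yield group
            current_digest, group = digest, []
        group.append(blockno)
    if len(group) > 1:
        yield group
```

The design trades RAM for sequential disk I/O: checksums are written once and read back once during the merge, which suits a background job better than an inline lookup on every write.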
Heinz
Thread overview: 67+ messages
2009-04-27 3:33 Data Deduplication with the help of an online filesystem check Thomas Glanzmann
2009-04-27 13:37 ` Chris Mason
2009-04-28 5:22 ` Thomas Glanzmann
2009-04-28 10:02 ` Chris Mason
2009-04-28 13:49 ` Andrey Kuzmin
2009-04-28 13:58 ` Chris Mason
2009-04-28 14:04 ` Thomas Glanzmann
2009-04-28 17:21 ` Chris Mason
2009-04-28 20:10 ` Thomas Glanzmann
2009-04-28 20:29 ` Thomas Glanzmann
2009-04-28 13:58 ` jim owens
2009-04-28 16:10 ` Anthony Roberts
2009-04-28 15:59 ` Thomas Glanzmann
2009-04-28 16:04 ` Tomasz Chmielewski
2009-04-28 17:29 ` Edward Shishkin
2009-04-28 17:34 ` Thomas Glanzmann
2009-04-28 17:38 ` Chris Mason
2009-04-28 17:43 ` Thomas Glanzmann
2009-04-28 17:45 ` Heinz-Josef Claes
2009-04-28 20:16 ` Thomas Glanzmann
2009-04-28 20:36 ` Heinz-Josef Claes [this message]
2009-04-28 20:52 ` Thomas Glanzmann
2009-04-28 20:58 ` Chris Mason
2009-04-28 21:12 ` Thomas Glanzmann
2009-04-28 21:26 ` Chris Mason
2009-04-28 22:14 ` Thomas Glanzmann
2009-04-28 23:18 ` Chris Mason
2009-04-29 12:03 ` Thomas Glanzmann
2009-04-29 13:11 ` Michael Tharp
2009-04-29 13:14 ` Chris Mason
2009-04-29 13:58 ` Thomas Glanzmann
2009-04-29 14:31 ` Chris Mason
2009-04-29 15:26 ` Thomas Glanzmann
2009-04-29 15:45 ` Chris Mason
2009-06-04 8:49 ` Thomas Glanzmann
2009-06-04 11:43 ` Chris Mason
2009-06-04 12:03 ` Thomas Glanzmann
2009-06-04 12:43 ` Chris Mason
2009-06-05 12:20 ` Tomasz Chmielewski
2009-06-05 12:50 ` Chris Mason
2009-06-05 15:35 ` Tomasz Chmielewski
2009-04-29 0:06 ` Bron Gondwana
2009-05-06 15:16 ` Sander
2009-04-28 17:32 ` Thomas Glanzmann
2009-04-28 17:41 ` Michael Tharp
2009-04-28 20:14 ` Thomas Glanzmann
2009-05-04 14:29 ` Ric Wheeler
2009-05-04 14:39 ` Tomasz Chmielewski
2009-05-04 14:45 ` Ric Wheeler
2009-05-04 15:15 ` Thomas Glanzmann
2009-05-04 16:03 ` Ric Wheeler
2009-05-04 16:16 ` Andrey Kuzmin
2009-05-04 16:24 ` Thomas Glanzmann
2009-05-04 18:06 ` Jan-Frode Myklebust
2009-05-04 19:16 ` Andrey Kuzmin
2009-05-05 8:02 ` Thomas Glanzmann
2009-05-04 16:26 ` Thomas Glanzmann
2009-05-04 19:11 ` Heinz-Josef Claes
2009-05-04 21:29 ` Dmitri Nikulin
2009-05-05 7:18 ` Heinz-Josef Claes
2009-05-24 7:27 ` Thomas Glanzmann
2009-04-28 17:23 ` Chris Mason
2009-04-28 17:37 ` Thomas Glanzmann
2009-04-28 17:43 ` Chris Mason
2009-04-28 20:15 ` Thomas Glanzmann
2009-04-28 21:19 ` Dmitri Nikulin
2009-04-28 20:24 ` Thomas Glanzmann