public inbox for linux-btrfs@vger.kernel.org
From: Ric Wheeler <rwheeler@redhat.com>
To: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Tomasz Chmielewski <mangoo@wpkg.org>,
	Michael Tharp <gxti@partiallystapled.com>,
	Chris Mason <chris.mason@oracle.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Mon, 04 May 2009 12:03:58 -0400
Message-ID: <49FF11EE.2060404@redhat.com>
In-Reply-To: <20090504151518.GA13777@cip.informatik.uni-erlangen.de>

Thomas Glanzmann wrote:
> Hello Ric,
> 
>> (1) Block level or file level dedup?
> 
> what is the difference between the two?
> 
>> (2) Inband dedup (during a write) or background dedup?
> 
> I think inband dedup is way too intensive on resources (memory) and also
> would kill every performance benchmark. So I think the offline dedup is
> the right way to go.

I would not categorize it as offline, just as not inband (i.e., you can run 
a low priority background process to handle dedup).  Offline windows are 
extremely rare in production sites these days, and block-level dedup over a 
large file system could take a very long time :-)

> 
>> (3) How reliably can you protect the pool of blocks? How reliably can
>> you protect the database that maps hashes to blocks?
> 
> You have to lock down the i/o requests for the blocks in question and
> compare them byte by byte anyway, just to make sure.

Yes, one advantage we had in Centera was that the objects were read-only 
whole files, so this was not an issue for us.
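
The hash-then-verify step discussed above can be sketched roughly as follows. This is a minimal in-memory illustration, not how btrfs or Centera does it; the 4 KiB block size, the choice of SHA-256, and the names `dedup_blocks` and `BLOCK_SIZE` are all assumptions made for the example. Locking of in-flight I/O is left out entirely.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedup_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (unique_blocks, block_map) for fixed-size blocks of data.

    block_map[i] is the index into unique_blocks that holds block i.
    """
    unique_blocks = []  # the pool of stored blocks
    by_hash = {}        # digest -> indices into unique_blocks
    block_map = []

    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).digest()

        target = None
        for idx in by_hash.get(digest, []):
            # A matching hash is only a hint: compare byte by byte.
            if unique_blocks[idx] == block:
                target = idx
                break

        if target is None:
            # First time this content is seen: store it in the pool.
            target = len(unique_blocks)
            unique_blocks.append(block)
            by_hash.setdefault(digest, []).append(target)

        block_map.append(target)

    return unique_blocks, block_map
```

Three blocks where the first and third are identical would end up as two stored blocks, and the original data can be rebuilt by walking `block_map`.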

> 
>> (4) Can you give users who are somewhat jaded confidence in your
>> solution (this is where stats come in very handy!)
> 
> For virtual machines you can reduce the used data by 1/3. Of course it
> can blow up in your face when you don't watch your physical resources
> closely.
> 
>         Thomas

1/3 is not sufficient for dedup in my opinion - you can get that with normal 
compression at the block level.

Put another way, if the baseline is bzip2 levels of compression for the block 
device, when is the complexity of dedup going to pay off?
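
One rough way to get that baseline number for a given image is to compress each block independently and add up what would be stored. The sketch below uses Python's standard `bz2` module; the 4 KiB block size and the name `block_compression_saving` are assumptions for illustration, and a real block-level compressor would behave differently in detail.

```python
import bz2


def block_compression_saving(data: bytes, block_size: int = 4096) -> float:
    """Fraction of space saved by compressing each block on its own."""
    if not data:
        return 0.0
    stored = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        compressed = bz2.compress(block)
        # Keep the raw block when compression does not help.
        stored += min(len(block), len(compressed))
    return 1.0 - stored / len(data)
```

If this figure is already near 1/3 for the data in question, dedup has to beat it by a clear margin to justify the extra machinery.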

I think that Chris already mentioned that you can (for virt OS images) also 
imagine using copy-on-write snapshots (of something that is mostly read-only, 
like the OS system partitions).  Make 50 copy-on-write snapshots of "/" 
(excluding /home) and you have an effective compression of 98%, at least 
until someone rudely starts writing :-)
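
The 98% figure is just 1 - 1/50: fifty images stored in the space of one. A small sketch of that arithmetic, extended with a hypothetical `changed_fraction` parameter for what happens once writes start. The model is an assumption: every snapshot is taken to rewrite the same share of the image with its own private data, and the unchanged remainder stays shared once.

```python
def snapshot_saving(copies: int, changed_fraction: float = 0.0) -> float:
    """Space saved by `copies` copy-on-write snapshots versus full copies.

    changed_fraction is the assumed share of the image that each
    snapshot has rewritten and therefore no longer shares.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")

    # Space used, in units of one image: the unchanged part is stored
    # once, the changed part is stored once per snapshot.
    stored = (1.0 - changed_fraction) + copies * changed_fraction
    return 1.0 - stored / copies
```

Under this model `snapshot_saving(50)` comes out at about 0.98, and with 10% of each image rewritten it drops to roughly 0.88.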

ric


