Re: Content based storage - Leszek Ciesielski

linux-btrfs.vger.kernel.org archive mirror

From: Leszek Ciesielski <skolima@gmail.com>
To: Hubert Kario <hka@qbs.com.pl>, linux-btrfs@vger.kernel.org
Subject: Re: Content based storage
Date: Wed, 17 Mar 2010 16:33:41 +0100
Message-ID: <23a15591003170833t3ec4dc3fq9630558aa190afc@mail.gmail.com>
In-Reply-To: <201003171625.50257.hka@qbs.com.pl>

On Wed, Mar 17, 2010 at 4:25 PM, Hubert Kario <hka@qbs.com.pl> wrote:
> On Wednesday 17 March 2010 09:48:18 Heinz-Josef Claes wrote:
>> Hi,
>>
>> just want to add one correction to your thoughts:
>>
>> Storage is not cheap if you think about enterprise storage on a SAN,
>> replicated to another data centre. Using dedup on the storage boxes leads
>> to performance issues and other problems - only NetApp is offering this at
>> the moment and it's not heavily used (because of the issues).
>
> there are at least two other suppliers with inline dedup products and
> there is an OSS solution: lessfs
>
>> So I think it would be a big advantage for professional use to have dedup
>> built into the filesystem - processors are faster and faster today and not
>> the cost drivers any more. I do not think it's a problem to "spend" one
>> core of a 2 socket box with 12 cores for this purpose.
>> Storage is cost intensive:
>> - SAN boxes are expensive
>> - RAID5 in two locations is expensive
>> - FC lines between locations are expensive (depending very much on where
>> you are).
>
> In-line dedup is expensive in two ways: first you have to cache the data
> going to disk and generate a checksum for it, then you have to look up
> whether such a block is already stored -- if the database doesn't fit into
> RAM (for a VM host it's more than likely) it requires at least a few disk
> seeks, if not a few dozen for really big databases. Then you should read
> the block/extent back and compare them bit for bit. And only then write
> the data to the disk. That reduces your IOPS by at least an order of
> magnitude, if not more.
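
The write path described in the quoted paragraph above (checksum, index
lookup, optional read-back and compare, then write) can be sketched roughly
as follows. This is only a toy illustration in Python, not how btrfs, ZFS or
lessfs implement it: the DedupStore class, its fields and the verify flag are
hypothetical names, and an in-memory dict stands in for the on-disk hash
database whose seeks are the expensive part.

```python
import hashlib


class DedupStore:
    """Toy in-memory block store (hypothetical, for illustration only)."""

    def __init__(self, verify=False):
        self.verify = verify   # read back and compare before sharing a block
        self.index = {}        # digest -> physical block number
        self.blocks = []       # physical blocks, in write order
        self.refcount = []     # number of logical references per block

    def write(self, data: bytes) -> int:
        # Step 1: checksum the incoming block.
        digest = hashlib.sha256(data).digest()
        # Step 2: look the checksum up in the index.
        pbn = self.index.get(digest)
        # Step 3: optionally compare the stored block byte for byte.
        if pbn is not None and (not self.verify or self.blocks[pbn] == data):
            self.refcount[pbn] += 1
            return pbn
        # Step 4: no usable match, so store the block as new data.
        self.blocks.append(data)
        self.refcount.append(1)
        new_pbn = len(self.blocks) - 1
        # Keep the first block seen for a digest as the dedup target.
        self.index.setdefault(digest, new_pbn)
        return new_pbn


store = DedupStore(verify=True)
first = store.write(b"x" * 4096)
other = store.write(b"y" * 4096)
again = store.write(b"x" * 4096)   # duplicate of the first block
print(first, other, again, len(store.blocks))
```

With verify=False the compare in step 3 is skipped and the store trusts the
hash alone; with verify=True every hit costs one extra read of the stored
block.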

Sun decided that with SHA256 (which ZFS already offers as a block
checksum) collisions are unlikely enough to skip the read/compare step:
http://blogs.sun.com/bonwick/entry/zfs_dedup . That's not the case, of
course, with the CRC32C that btrfs uses, but a switch to a stronger hash
would be recommended to reduce collisions anyway. And yes, for the truly
paranoid, a forced verification (after the hashes match) is always an
option.
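
To put rough numbers on "unlikely enough", here is a back-of-the-envelope
estimate using the standard birthday approximation
p ~= 1 - exp(-n(n-1) / 2^(b+1)) for n blocks and a b-bit hash. It assumes
hash values are uniformly distributed, which is an idealization (a CRC is
not a cryptographic hash), and the function name and the one-million-block
example are my own choices, not figures from the thread.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a hash value, assuming uniformly distributed hash_bits-bit hashes."""
    pairs = num_blocks * (num_blocks - 1) // 2
    expected_collisions = pairs / 2 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_collisions)


# One million 4 KiB blocks is only about 4 GB of unique data.
for bits in (32, 256):
    print(bits, collision_probability(10**6, bits))
```

Under these assumptions a 32-bit checksum is all but certain to collide
somewhere in a few gigabytes of data, so it can only ever be a hint that
must be followed by a compare, while a 256-bit hash leaves a probability
far below any hardware error rate.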

>
> For post-process dedup you can go as fast as your HDDs will allow you. And
> then, when your machine is mostly idle you can go and churn through the
> data.
>
> IMHO in-line dedup is a good thing only as storage for backups -- when you
> have a high probability that the stored data is duplicated (and with a 1:10
> dedup ratio that probability is 90%).
>
> So the CPU cost is only one factor. HDDs are a major bottleneck too.
>
> All things considered, it would be best to have both post-process and
> in-line data deduplication, but I think that in-line dedup will see much
> less use.
>
>>
>> Naturally, you would not use this feature for all kinds of use cases (e.g.
>> a heavily used database), but I think there is enough need.
>>
>> my 2 cents,
>> Heinz-Josef Claes
> --
> Hubert Kario
> QBS - Quality Business Software
> 02-656 Warszawa, ul. Ksawerów 30/85
> tel. +48 (22) 646-61-51, 646-74-24
> www.qbs.com.pl
>
> Quality Management System
> compliant with the ISO 9001:2000 standard
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


Thread overview: 16+ messages
2010-03-16  9:21 Content based storage David Brown
2010-03-16 22:45 ` Fabio
2010-03-17  8:21   ` David Brown
2010-03-17  0:45 ` Hubert Kario
2010-03-17  8:27   ` David Brown
2010-03-17  8:48     ` Heinz-Josef Claes
2010-03-17 15:25       ` Hubert Kario
2010-03-17 15:33         ` Leszek Ciesielski [this message]
2010-03-17 19:43           ` Hubert Kario
2010-03-20  2:46             ` Boyd Waters
2010-03-20 13:05               ` Ric Wheeler
2010-03-20 21:24                 ` Boyd Waters
2010-03-20 22:16                   ` Ric Wheeler
2010-03-20 22:44                     ` Ric Wheeler
2010-03-21  6:55                       ` Boyd Waters
2010-03-18 23:33   ` create debian package of btrfs kernel from git tree rk
