From: Ric Wheeler <rwheeler@redhat.com>
To: Boyd Waters <waters.boyd@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Content based storage
Date: Sat, 20 Mar 2010 09:05:21 -0400
Message-ID: <4BA4C811.4060702@redhat.com>
In-Reply-To: <2b0225fb1003191946k1cf92c63q18e40d41274ce3e8@mail.gmail.com>

On 03/19/2010 10:46 PM, Boyd Waters wrote:
> 2010/3/17 Hubert Kario <hka@qbs.com.pl>:
>    
>> Read further, Sun did provide a way to enable the compare step by using
>> "verify" instead of "on":
>> zfs set dedup=verify <pool>
>>      
> I have tested ZFS deduplication on the same data set that I'm using to
> test btrfs. I used a 5-element raidz, dedup=on, which uses SHA256 for
> ZFS checksumming and duplication detection on Build 133 of OpenSolaris
> for x86_64.
>
> Subjectively, I felt that the array writes were slower than without
> dedup. For a while, the option for "dedup=fletcher4,verify" was in the
> system, which permitted the (faster, more prone to collisions)
> fletcher4 hash for ZFS checksum, and full comparison in the
> (relatively rare) case of collision. Darren Moffat worked to unify the
> ZFS SHA256 code with the OpenSolaris crypto-api implementation, which
> improved performance [1]. But I was not able to test that
> implementation.
>
> My dataset reported a dedup factor of 1.28 for about 4TB, meaning that
> almost a third of the dataset was duplicated. This seemed plausible,
> as the dataset includes multiple backups of a 400GB data set, as well
> as numerous VMware virtual machines.
>    
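
For anyone following along, the hash-then-compare idea quoted above can be
sketched in a few lines. This is only a toy in-memory model, not how ZFS
implements it: the DedupStore class, its hash_fn parameter and the probe
counter are all made up for illustration. With verify off, a block whose
hash matches an existing block is assumed identical; with verify on, the
bytes are compared first and a colliding block is stored separately.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self, verify=False, hash_fn=None):
        self.verify = verify
        # Default to SHA-256; a weaker hash can be passed in to force collisions.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.blocks = {}       # (digest, probe) -> stored block bytes
        self.refcount = {}     # (digest, probe) -> number of logical references
        self.logical_bytes = 0

    def write(self, block: bytes):
        """Store a block and return the key it is referenced by."""
        self.logical_bytes += len(block)
        digest = self.hash_fn(block)
        probe = 0
        while True:
            key = (digest, probe)
            existing = self.blocks.get(key)
            if existing is None:
                # First time this digest (or this collision slot) is seen.
                self.blocks[key] = block
                self.refcount[key] = 1
                return key
            if not self.verify or existing == block:
                # Without verify, a hash match is trusted blindly.
                self.refcount[key] += 1
                return key
            probe += 1  # same hash, different content: try the next slot

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

    def dedup_ratio(self) -> float:
        """Logical bytes written divided by bytes actually stored."""
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 1.0


if __name__ == "__main__":
    store = DedupStore(verify=True)
    for blk in (b"x" * 4096, b"x" * 4096, b"y" * 4096):
        store.write(blk)
    print(f"unique blocks: {len(store.blocks)}, ratio: {store.dedup_ratio():.2f}")
```

The extra compare costs a read of the existing block on every hash match,
which is the price paid for being able to use a cheaper hash.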

It is always interesting to compare this to the ratio you would get with
old-fashioned compression to see how effective it is. A dedup factor of
1.28 does not seem that aggressive, if I understand your results correctly.

Any idea of how compressible your data set was?
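
To make the comparison concrete, here is a rough sketch of how one might
put both numbers on the same scale. It assumes the reported dedup factor is
logical size divided by physical size, and it uses zlib on independent
128 KiB blocks as a stand-in for whatever compression the filesystem would
really apply. The function names and the block size are my own choices for
illustration, not anything taken from ZFS or btrfs.

```python
import zlib


def space_saved_from_ratio(ratio: float) -> float:
    """Fraction of logical data that did not have to be stored,
    assuming ratio = logical size / physical size."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1.0")
    return 1.0 - 1.0 / ratio


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size for one buffer."""
    if not data:
        raise ValueError("empty input")
    return len(data) / len(zlib.compress(data, level))


def file_compression_ratio(path, block_size=128 * 1024, max_blocks=64):
    """Compress up to max_blocks blocks from the start of a file, each on
    its own, and return total original size / total compressed size."""
    original = 0
    compressed = 0
    with open(path, "rb") as f:
        for _ in range(max_blocks):
            block = f.read(block_size)
            if not block:
                break
            original += len(block)
            compressed += len(zlib.compress(block, 6))
    if original == 0:
        raise ValueError("file is empty")
    return original / compressed


if __name__ == "__main__":
    # Under the assumption above, a factor of 1.28 is roughly 22% saved.
    print(f"{space_saved_from_ratio(1.28):.1%}")
```

Running file_compression_ratio over a sample of the backups and VM images
and feeding the result to space_saved_from_ratio would give a figure that
can be compared directly with the dedup number.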

Regards,

Ric


> Despite the performance hit, I'd be pleased to see work on this
> continue. Darren Moffat's performance improvements were encouraging,
> and the data set integrity was rock-solid. I had a disk failure during
> this test, which almost certainly had far more impact on performance
> than the deduplication: failed writes to the disk were blocking I/O,
> and it got pretty bad before I was able to replace the disk. I never
> lost any data, and array management was dead simple.
>
> So anyway FWIW the ZFS dedup implementation is a good one, and had
> headroom for improvement.
>
> Finally, ZFS also lets you set a minimum number of duplicates that you
> would like applied to the dataset; it only starts pointing to existing
> blocks after the "duplication minimum" is reached. (dedupditto
> property) [2]
>
>
> [1] http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via
> [2] http://opensolaris.org/jive/thread.jspa?messageID=426661
>
>    

