linux-btrfs.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: Boyd Waters <waters.boyd@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Content based storage
Date: Sat, 20 Mar 2010 09:05:21 -0400
Message-ID: <4BA4C811.4060702@redhat.com>
In-Reply-To: <2b0225fb1003191946k1cf92c63q18e40d41274ce3e8@mail.gmail.com>

On 03/19/2010 10:46 PM, Boyd Waters wrote:
> 2010/3/17 Hubert Kario <hka@qbs.com.pl>:
>
>> Read further, Sun did provide a way to enable the compare step by using
>> "verify" instead of "on":
>> zfs set dedup=verify <pool>
>>
> I have tested ZFS deduplication on the same data set that I'm using to
> test btrfs. I used a 5-element raidz, dedup=on, which uses SHA256 for
> ZFS checksumming and duplication detection on Build 133 of OpenSolaris
> for x86_64.
>
> Subjectively, I felt that the array writes were slower than without
> dedup. For a while, the option for "dedup=fletcher4,verify" was in the
> system, which permitted the (faster, more prone to collisions)
> fletcher4 hash for ZFS checksum, and full comparison in the
> (relatively rare) case of collision. Darren Moffat worked to unify the
> ZFS SHA256 code with the OpenSolaris crypto-api implementation, which
> improved performance [1]. But I was not able to test that
> implementation.
>
> My dataset reported a dedup factor of 1.28 for about 4TB, meaning that
> almost a third of the dataset was duplicated. This seemed plausible,
> as the dataset includes multiple backups of a 400GB data set, as well
> as numerous VMware virtual machines.
>

It is always interesting to compare this to the rate you would get with
old-fashioned compression to see how effective this is. The dedup does
not seem that aggressive, if I understand your results correctly.
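
To put a number on that, here is a minimal sketch of the arithmetic,
assuming the reported dedup factor is logical size divided by physical
size (the function name is only for illustration):

```python
def saving_from_ratio(ratio: float) -> float:
    """Fraction of the logical data that did not have to be stored.

    Assumes ratio = logical_bytes / physical_bytes, so it must be >= 1.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1.0")
    return 1.0 - 1.0 / ratio


# Dedup factor quoted above for the ~4TB data set.
dedup_saving = saving_from_ratio(1.28)
print(f"space saved: {dedup_saving:.1%}")
```

Under that assumption a factor of 1.28 works out to roughly 22% of the
logical data saved, which is closer to a fifth than a third. The same
function can be applied to a compression ratio for a like-for-like
comparison.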

Any idea of how compressible your data set was?
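
One rough way to get a feel for it, sketched in Python under some loud
assumptions: zlib stands in for whatever compressor the filesystem would
really use, chunks are compressed independently at an arbitrary 128 KiB,
and the function name and parameters are hypothetical. It is an estimate,
not a measurement of what ZFS or btrfs would achieve.

```python
import zlib


def compression_ratio(path, chunk_size=128 * 1024, level=6):
    """Return raw_bytes / compressed_bytes for one file.

    Each chunk is compressed on its own, loosely mimicking per-block
    compression; chunk_size and level are illustrative defaults.
    """
    raw = 0
    packed = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            raw += len(chunk)
            packed += len(zlib.compress(chunk, level))
    # An empty file has nothing to compress; report a neutral ratio.
    return raw / packed if packed else 1.0
```

Running this over a representative sample of files gives a ratio that can
be set next to the dedup factor.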

Regards,

Ric


> Despite the performance hit, I'd be pleased to see work on this
> continue. Darren Moffat's performance improvements were encouraging,
> and the data set integrity was rock-solid. I had a disk failure during
> this test, which almost certainly had far more impact on performance
> than the deduplication: failed writes to the disk were blocking I/O,
> and it got pretty bad before I was able to replace the disk. I never
> lost any data, and array management was dead simple.
>
> So anyway FWIW the ZFS dedup implementation is a good one, and had
> headroom for improvement.
>
> Finally, ZFS also lets you set a minimum number of duplicates that you
> would like applied to the dataset; it only starts pointing to existing
> blocks after the "duplication minimum" is reached. (dedupditto
> property) [2]
>
>
> [1] http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via
> [2] http://opensolaris.org/jive/thread.jspa?messageID=426661
>
>    



