From: Ric Wheeler <rwheeler@redhat.com>
To: Boyd Waters <waters.boyd@gmail.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Content based storage
Date: Sat, 20 Mar 2010 18:44:24 -0400
Message-ID: <4BA54FC8.60806@redhat.com>
In-Reply-To: <4BA5492C.5030709@redhat.com>

On 03/20/2010 06:16 PM, Ric Wheeler wrote:
> On 03/20/2010 05:24 PM, Boyd Waters wrote:
>> On Mar 20, 2010, at 9:05 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>>>>
>>>> My dataset reported a dedup factor of 1.28 for about 4TB, meaning
>>>> that almost a third of the dataset was duplicated.
>>
>>> It is always interesting to compare this to the rate you would get
>>> with old fashioned compression to see how effective this is. Seems
>>> to be not that aggressive if I understand your results correctly.
>>>
>>> Any idea of how compressible your data set was?
>>
>> Well, of course if I used zip on the whole 4 TB that would deal with
>> my duplication issues, and give me a useless, static blob with no
>> checksumming. I haven't tried.
>
> gzip/bzip2 of the block device was meant to give a best case
> estimate of what traditional compression can do. Many block devices
> (including some single spindle disks) can do compression internally.

I meant to say that it was not meant to provide useful compression, just
to measure how well block-level compression could do.
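
A rough sketch of that kind of measurement, in Python. It streams data
through zlib (the DEFLATE algorithm that gzip uses) and bz2 and reports
raw size divided by compressed size. The function name, the 1 MiB chunk
size, and the idea of reading the device as one stream are assumptions
made for illustration, not something anyone in this thread ran.

```python
import bz2
import zlib
from typing import BinaryIO, Dict

CHUNK_SIZE = 1 << 20  # assumed read size (1 MiB); tune as needed


def compression_ratios(stream: BinaryIO,
                       chunk_size: int = CHUNK_SIZE) -> Dict[str, float]:
    """Return raw_bytes / compressed_bytes for zlib and bz2 over a stream."""
    compressors = {"zlib": zlib.compressobj(), "bz2": bz2.BZ2Compressor()}
    raw = 0
    packed = {name: 0 for name in compressors}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        raw += len(chunk)
        # Feed the same chunk to every compressor and count its output.
        for name, comp in compressors.items():
            packed[name] += len(comp.compress(chunk))
    # Flush whatever each compressor is still buffering.
    for name, comp in compressors.items():
        packed[name] += len(comp.flush())
    if raw == 0:
        return {name: 1.0 for name in compressors}
    return {name: raw / packed[name] for name in compressors}
```

Pointing it at a device would look like
compression_ratios(open("/dev/sdX", "rb")), where /dev/sdX is a
placeholder. The resulting ratio can be compared directly with a dedup
factor such as the 1.28 quoted above.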

ric

>
>>
>> One thing that I did do, seven (!) years ago, was to detect duplicate
>> files (not blocks) and use hard links. I was able to squeeze out all
>> of the air in a series of backups, and was able to see all of them. I
>> used a Perl script for all this. It was nuts, but now I understand why
>> Apple implemented hard links to directories in HFS in order to get
>> their Time Machine product.  I didn't have copy-on-write, so btrfs
>> snapshots completely spank a manual system like this, but I did get
>> 7-to-1 compression. These days you can use rsync with "--link-dest" to
>> make hard-linked duplicates of large directory trees. Tar, cpio, and
>> friends tend to break when transferring hundreds of gigabytes with
>> thousands of hard links. Or they ignore the hard links.
>>
>> Good times. I'm not sure how this is germane to btrfs, except to point
>> out pathological file-system usage that I've actually attempted in
>> real life. I actually use a lot of the ZFS feature set, and I look
>> forward to btrfs stability. I think btrfs can get there.
>
> File level dedup is something we did in a group I worked with before 
> and can certainly be quite effective. Even better, it is much easier 
> to map into normal user expectations :-)
>
> ric
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
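
The file-level approach quoted above (detect duplicate files, then
replace them with hard links) can be sketched roughly as follows. This is
not the Perl script mentioned in the thread. It is a minimal Python
illustration that assumes SHA-256 is an acceptable way to compare file
contents and that ownership, mode, and timestamps may be ignored. The
function names and the ".dedup-tmp" suffix are made up for the example.

```python
import hashlib
import os
import stat
from typing import Dict, Tuple


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root: str) -> int:
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were relinked.
    """
    # (device, size, digest) -> first path seen with that content.
    # Hard links cannot cross devices, so the device is part of the key.
    seen: Dict[Tuple[int, int, str], str] = {}
    relinked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes, ...
            key = (st.st_dev, st.st_size, file_digest(path))
            original = seen.setdefault(key, path)
            if original == path:
                continue  # first file with this content
            if os.path.samestat(os.stat(original), st):
                continue  # already a hard link to the same inode
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears if the process is interrupted.
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            relinked += 1
    return relinked
```

Unlike copy-on-write snapshots, files linked this way share one inode,
so writing to any one name changes all of them. That makes it suitable
only for read-only data such as backups.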



Thread overview: 16+ messages
2010-03-16  9:21 Content based storage David Brown
2010-03-16 22:45 ` Fabio
2010-03-17  8:21   ` David Brown
2010-03-17  0:45 ` Hubert Kario
2010-03-17  8:27   ` David Brown
2010-03-17  8:48     ` Heinz-Josef Claes
2010-03-17 15:25       ` Hubert Kario
2010-03-17 15:33         ` Leszek Ciesielski
2010-03-17 19:43           ` Hubert Kario
2010-03-20  2:46             ` Boyd Waters
2010-03-20 13:05               ` Ric Wheeler
2010-03-20 21:24                 ` Boyd Waters
2010-03-20 22:16                   ` Ric Wheeler
2010-03-20 22:44                     ` Ric Wheeler [this message]
2010-03-21  6:55                       ` Boyd Waters
2010-03-18 23:33   ` create debian package of btrfs kernel from git tree rk
