From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boyd Waters
Subject: Re: Content based storage
Date: Sat, 20 Mar 2010 17:24:39 -0400
Message-ID: <-1949627963487010042@unknownmsgid>
References: <201003171625.50257.hka@qbs.com.pl> <23a15591003170833t3ec4dc3fq9630558aa190afc@mail.gmail.com> <201003172043.17314.hka@qbs.com.pl> <2b0225fb1003191946k1cf92c63q18e40d41274ce3e8@mail.gmail.com> <4BA4C811.4060702@redhat.com>
Mime-Version: 1.0 (iPhone Mail 7C144)
Content-Type: text/plain; charset=ISO-8859-1
Cc: "linux-btrfs@vger.kernel.org"
To: Ric Wheeler
Return-path:
In-Reply-To: <4BA4C811.4060702@redhat.com>
List-ID:

On Mar 20, 2010, at 9:05 AM, Ric Wheeler wrote:

>> My dataset reported a dedup factor of 1.28 for about 4TB, meaning
>> that almost a third of the dataset was duplicated.
>
> It is always interesting to compare this to the rate you would get
> with old fashioned compression to see how effective this is. Seems
> to be not that aggressive if I understand your results correctly.
>
> Any idea of how compressible your data set was?

Well, of course if I used zip on the whole 4 TB that would deal with my
duplication issues, and give me a useless, static blob with no
checksumming. I haven't tried it.

One thing that I did do, seven (!) years ago, was to detect duplicate
files (not blocks) and replace them with hard links. I was able to
squeeze all of the air out of a series of backups, and was still able
to browse every one of them. I used a Perl script for all this. It was
nuts, but now I understand why Apple implemented hard links to
directories in HFS+ in order to get their Time Machine product. I
didn't have copy-on-write, so btrfs snapshots completely spank a manual
system like this, but I did get 7-to-1 compression.

These days you can use rsync with "--link-dest" to make hard-linked
duplicates of large directory trees. Tar, cpio, and friends tend to
break when transferring hundreds of gigabytes with thousands of hard
links. Or they silently ignore the hard links. Good times.
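For anyone curious, the core of that Perl script fits in a few lines of
Python. This is only a rough sketch of the same idea, not my original
script: hash every regular file, and when two paths have identical
content (but different inodes), unlink the duplicate and hard-link it
to the first copy. The function names here are made up for illustration.

```python
import hashlib
import os

def file_hash(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def dedup(root):
    """Replace duplicate regular files under root with hard links."""
    seen = {}  # content digest -> first path seen with that content
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # skip symlinks and anything that is not a regular file
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = file_hash(path)
            first = seen.get(digest)
            if first is None:
                seen[digest] = path
            elif os.stat(first).st_ino != os.stat(path).st_ino:
                # same content, different inode: collapse into one
                os.unlink(path)
                os.link(first, path)
```

Run it against a tree of backup generations and identical files collapse
to one inode each; a paranoid version would also compare file contents
byte-for-byte before unlinking, rather than trusting the hash alone.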
I'm not sure how this is germane to btrfs, except to point out pathological file-system usage that I've actually attempted in real life. I actually use a lot of the ZFS feature set, and I look forward to btrfs stability. I think btrfs can get there.