From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gordan Bobic
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 06 Jan 2011 10:39:31 +0000
Message-ID: <4D259BE3.5060705@bobich.net>
References: <1294245410-4739-1-git-send-email-josef@redhat.com> <4D24AD92.4070107@bobich.net> <20110105194645.GC2562@localhost.localdomain> <4D24D8BC.90808@bobich.net> <4D250B3C.6010708@shiftmail.org> <4D2514DC.6060306@bobich.net> <4D25213D.1080504@shiftmail.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
To: linux-btrfs@vger.kernel.org
Return-path:
In-Reply-To: <4D25213D.1080504@shiftmail.org>
List-ID:

Spelic wrote:
> On 01/06/2011 02:03 AM, Gordan Bobic wrote:
>>
>> That's just alarmist. AES is being cryptanalyzed because everything
>> uses it. And the news of its insecurity is somewhat exaggerated (for
>> now at least).
>
> Who cares... the fact of not being much used is a benefit for RIPEMD /
> Blowfish/Twofish then.
> Nobody makes viruses for Linux because they target Windows. Same thing...
> RIPEMD still has an advantage over SHA imho, and Blowfish over AES.

Just because nobody has attacked it yet doesn't justify complacency.

>>> If there is a full block compare, a simpler/faster algorithm could be
>>> chosen, like MD5. Or even an MD-64bits, which I don't think exists, but
>>> you can take MD4 and then XOR the first 8 bytes with the second 8 bytes
>>> to reduce it to 8 bytes only. This is just because it saves 60% of
>>> the RAM occupied during dedup, which is expected to be large, and
>>> collisions are still insignificant at 64 bits. Clearly you need to do
>>> a full block compare after that.
>>
>> I really don't think the cost in terms of a few bytes per file for the
>> hashes is that significant.
>
> 20 to 8 = 12 bytes per *filesystem block* saved, I think.
> Aren't we talking about block-level deduplication?
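For illustration, the 64-bit truncation trick described above (XOR the two 8-byte halves of a 128-bit digest, then confirm any candidate match with a full block compare) might be sketched like this. MD5 stands in for MD4 here because hashlib's MD4 support depends on the OpenSSL build; all the names and the dict-based index are illustrative, not anything btrfs implements:

```python
import hashlib

BLOCK_SIZE = 4096  # assuming 4K filesystem blocks

def hash64(block: bytes) -> bytes:
    # Fold a 16-byte digest down to 8 bytes by XOR-ing its halves.
    # (MD5 substitutes for MD4; hashlib's MD4 availability varies.)
    d = hashlib.md5(block).digest()
    return bytes(a ^ b for a, b in zip(d[:8], d[8:]))

def find_duplicates(blocks):
    # Map truncated hash -> list of block indices. On a hash hit, do a
    # full byte-for-byte compare, so a 64-bit collision can never cause
    # two different blocks to be merged.
    index = {}
    dupes = []
    for i, blk in enumerate(blocks):
        h = hash64(blk)
        for j in index.get(h, []):
            if blocks[j] == blk:       # full block compare
                dupes.append((i, j))   # block i duplicates block j
                break
        else:
            index.setdefault(h, []).append(i)
    return dupes
```

At 8 bytes of hash per 4K block this gives the roughly-2GB-per-TB figure quoted below, and the full compare is what makes the shortened hash safe.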
> For every TB of filesystem you occupy 2GB of RAM with hashes instead of
> 5.3GB (I am assuming 4K blocks; I don't remember how big btrfs blocks are).
> For a 24 * 2TB storage array you occupy 96GB instead of 254GB of RAM. It
> might be the edge between feasible and not feasible.
> Actually it might not be feasible anyway... an option to store the hashes
> on an SSD should be provided then...

You wouldn't necessarily have to keep the whole index in RAM, but if you
don't, you'd incur O(log n) extra disk seeks per lookup.

Gordan
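Purely as a sketch of what an out-of-RAM index could look like: a flat file of fixed-size (hash, block number) records sorted by hash, binary-searched with seeks — which is exactly where the O(log n) extra seeks per lookup come from. The 8+8 record layout and the names are assumptions for illustration, not anything btrfs actually does:

```python
import os
import struct

RECORD = 8 + 8  # 8-byte truncated hash + 8-byte block number (assumed layout)

def lookup(index_path, target_hash: bytes):
    # Binary-search a sorted on-disk index of fixed-size records.
    # Each iteration costs one seek+read, hence O(log n) seeks total.
    size = os.path.getsize(index_path)
    lo, hi = 0, size // RECORD
    with open(index_path, "rb") as f:
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD)
            rec = f.read(RECORD)
            h = rec[:8]
            if h == target_hash:
                return struct.unpack("<Q", rec[8:])[0]  # matching block number
            if h < target_hash:
                lo = mid + 1
            else:
                hi = mid
    return None  # hash not in the index
```

On an SSD, as suggested in the quoted text, those extra seeks would be far cheaper than on rotating disks.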