From: Gordan Bobic <gordan@bobich.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 06 Jan 2011 10:39:31 +0000
Message-ID: <4D259BE3.5060705@bobich.net>
In-Reply-To: <4D25213D.1080504@shiftmail.org>
Spelic wrote:
> On 01/06/2011 02:03 AM, Gordan Bobic wrote:
>>
>> That's just alarmist. AES is being cryptanalyzed because everything
>> uses it. And the news of its insecurity is somewhat exaggerated (for
>> now at least).
>
> Who cares... not being widely used is then a benefit for RIPEMD /
> blowfish-twofish.
> Nobody writes viruses for Linux because they target Windows. Same thing...
> RIPEMD still has an advantage over SHA imho, and blowfish over AES.
Just because nobody has attacked it yet doesn't justify complacency.
>>> If there is a full block compare, a simpler/faster algorithm could be
>>> chosen, like md5. Or even a 64-bit MD, which I don't think exists, but
>>> you can take MD4 and then xor the first 8 bytes with the second 8 bytes
>>> to reduce it to 8 bytes only. This is just because it saves 60% of
>>> the RAM used during dedup, which is expected to be large, and the
>>> collisions are still insignificant at 64 bits. Clearly you need to do
>>> a full block compare after that.
>>
>> I really don't think the cost in terms of a few bytes per file for the
>> hashes is that significant.
>
> 20 to 8 = 12 bytes saved per *filesystem block*, I think.
> Aren't we talking about block-level deduplication?
> For every TB of filesystem you occupy 2GB of RAM with hashes instead of
> 5.3GB (I am assuming 4K blocks, I don't remember how big btrfs blocks are).
> For a 24 * 2TB storage you occupy 96GB instead of 254GB of RAM. It might
> be the edge between feasible and not feasible.
> Actually it might not be feasible anyway... an option to store hashes
> on an SSD should be provided then...
You wouldn't necessarily have to keep the whole index in RAM, but if you
don't, you'd take a hit of an extra O(log(n)) disk seeks per lookup.
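For reference, the numbers quoted above can be checked with a rough
sketch like the one below. It is only an illustration, not anything from
the proposed patches: the function names are made up, 4 KiB blocks are
assumed as in the quoted estimate, and MD5 stands in for MD4 (both give
16-byte digests) because MD4 is not available in every Python build.
With binary units throughout, the 20-byte case comes out at 5 GiB per
TiB and 240 GiB for 48 TiB, slightly below the quoted 5.3GB and 254GB,
while the 8-byte figures of 2 and 96 match.

```python
import hashlib

BLOCK_SIZE = 4096   # assumption: 4 KiB filesystem blocks, as in the quoted estimate
GIB = 1 << 30
TIB = 1 << 40


def fold_to_64_bits(digest16: bytes) -> bytes:
    """XOR the first 8 bytes of a 16-byte digest with the second 8 bytes."""
    if len(digest16) != 16:
        raise ValueError("expected a 16-byte digest")
    return bytes(a ^ b for a, b in zip(digest16[:8], digest16[8:]))


def block_key(block: bytes) -> bytes:
    """8-byte dedup key for one block (hypothetical helper).

    MD5 is used here in place of MD4. A key match is only a candidate:
    a full block compare is still needed before deduplicating.
    """
    return fold_to_64_bits(hashlib.md5(block).digest())


def index_ram_gib(fs_bytes: int, hash_bytes: int,
                  block_size: int = BLOCK_SIZE) -> float:
    """RAM in GiB needed to hold one hash per filesystem block."""
    blocks = fs_bytes // block_size
    return blocks * hash_bytes / GIB


if __name__ == "__main__":
    print("key for an all-zero block:", block_key(b"\0" * BLOCK_SIZE).hex())
    for tib in (1, 24 * 2):
        small = index_ram_gib(tib * TIB, 8)
        large = index_ram_gib(tib * TIB, 20)
        print(f"{tib} TiB: {small:.0f} GiB with 8-byte keys, "
              f"{large:.0f} GiB with 20-byte hashes")
```

This counts only the raw hash bytes; whatever structure holds the index
(block pointers, tree or hash-table overhead) would add to it.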
Gordan