From: Bart Samwel <bart@samwel.tk>
To: Matthias Schniedermeyer <ms@citd.de>
Cc: Timothy Miller <miller@techsource.com>,
Valdis.Kletnieks@vt.edu,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [OT] Redundancy eliminating file systems, breaking MD5, donating money to OSDL
Date: Wed, 21 Jan 2004 12:46:50 +0100 [thread overview]
Message-ID: <400E66AA.1020403@samwel.tk> (raw)
In-Reply-To: <20040120192114.GA30755@citd.de>
Matthias Schniedermeyer wrote:
> On Sat, Jan 17, 2004 at 02:15:31PM +0100, Bart Samwel wrote:
[...]
>>Let's take a look at the chances. 30 terabytes is, in a best-case scenario
>>(with 512-byte blocks) about 6e10 blocks. That would be roughly
>>6e10*6e10*(2**(-128)), or about 1e-17. With a hundred million machines, the
>>chances of a collision would be about 1e-9, disregarding the fact that all
>>these machines have a large chance of containing similar blocks -- their data
>>isn't truly random, so some blocks have a larger chance of occurring than
>>others. The data sets on the machines are probably reasonably static, so if
>>the collision isn't found *at once* the chances of it occurring later are
>>much smaller. So, even under the most positive assumptions, with a hundred
>>million machines with 30 terabytes of storage each, it's extremely probable
>>that you won't find a collision. (A 96-bit hash could have been broken with
>>this setup however. :) )
>
> There is one fundemental braino in the discussion.
>
> Only HALF the bits are for preventing "accidental" collisions. (The
> "birthday" thing). The rest is for preventing to "brute force" an input
> that produces the same MD5.(*)
From RFC-1321:
"It is conjectured that the difficulty of coming up with two messages
having the same message digest is on the order of 2^64 operations,
and that the difficulty of coming up with any message having a given
message digest is on the order of 2^128 operations."
So, they say it's on the order of 2^64 operations, whatever their
definition of "operation" is. It probably means that they think you need
to take the MD5 of something in the order of 2^64 random messages in
order to have a reasonable chance of finding a duplicate. The RFC says
nothing about half of the bits being for one purpose and the other for
another; in the algorithm all bits seem to be processed in a similar way.
Let's see where they got their 2^64. If you've got 2^64 messages, you've
got (with the same logic as above) approximately 2^64 * 2^64 * (1 /
2^128) = 100% chance of a birthday collision. This seems to support the
idea that the kind of analysis that I just did corresponds to the way
they look at it.
> *: AFAIR i read this in the specs of SHA1 (160 bits). So i guess this
> is also true for MD5.
It might be that you took the "2^64 operations" as meaning "64 bits" (or
2^80 as 80 bits, in the SHA1 case), while it now seems to be the result
of a birthday-collision calculation. Not a surprising misinterpretation
BTW, because the RFC doesn't specify at all where they got this number.
> Btw. I already had (a/the) MD5 collision(*2) in my life.
> *2: I had a direcory of about 1,5 Million images and "md5sum"med them to
> eliminate doubles. The Log-file, at one point, had the same md5sum as
> one of the pictures.
Wow! It might be that MD5 is not such a good hash function after all
then. The security is of course purely based on the hashes of messages
being very randomly distributed. If they really were, the chances of
this happening to you would have been extremely slim (less than 1e-20).
I think the more probable explanation is that MD5 hashes aren't truly
randomly distributed after all. :)
-- Bart
next prev parent reply other threads:[~2004-01-21 11:47 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-01-16 20:22 [OT] Redundancy eliminating file systems, breaking MD5, donating money to OSDL Timothy Miller
2004-01-16 20:37 ` Valdis.Kletnieks
2004-01-16 20:59 ` Timothy Miller
2004-01-17 13:15 ` Bart Samwel
2004-01-20 19:21 ` Matthias Schniedermeyer
2004-01-21 11:46 ` Bart Samwel [this message]
2004-01-22 0:12 ` Pavel Machek
2004-01-22 8:29 ` Matthias Schniedermeyer
2004-01-22 2:36 ` Jamie Lokier
2004-01-22 8:51 ` Matthias Schniedermeyer
-- strict thread matches above, loose matches on Subject: below --
2004-01-20 18:06 Clayton Weaver
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=400E66AA.1020403@samwel.tk \
--to=bart@samwel.tk \
--cc=Valdis.Kletnieks@vt.edu \
--cc=linux-kernel@vger.kernel.org \
--cc=miller@techsource.com \
--cc=ms@citd.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox