From: Andy Isaacson <adi@hexapodia.org>
To: omb@bluewin.ch
Cc: git@vger.kernel.org
Subject: Re: SHA1 hash safety
Date: Mon, 18 Apr 2005 00:43:23 -0700 [thread overview]
Message-ID: <20050418074323.GA29765@hexapodia.org> (raw)
In-Reply-To: <4261852B.6090507@khandalf.com>
[trimmed cc list, nobody wants to read this noise]
On Sat, Apr 16, 2005 at 11:35:39PM +0200, Brian O'Mahoney wrote:
> >> (1) I _have_ seen real-life collisions with MD5, in the context of
> >> Document management systems containing ~10^6 ms-WORD documents.
> >
> > Dude! You could have been *famous*! Why the
> > aitch-ee-double-hockey-sticks didn't you publish this when you found it?
> > Seriously, man.
>
> The MD5 has was fine, or at least the code (a) produced the correct
> results on the published test cases, and, (b) was properly applied to
> all bytes of the file(s). I was surprised when it happened, which is why
> I bothered to post to this list at this time, so I make two more points
OK, I guess it's time for some remedial math.
There are 2^128 = 340282366920938463463374607431768211456 different MD5
hashes.
You are suggesting that you found a collision using ~1e6 = ~1,000,000
plaintexts.
Let's suppose there were actually 100,000,000 = 1e8 plaintexts, just in
case you underestimated the number.
Applying the birthday paradox, we have a 50% probability that you'd find
one collision if there were ~7,213,475,309,916,173 possible hash values.
If you extend the birthday argument ("what is the probability of a
collision if you take N samples from a set of size M?") you get the
following results, with N = 1e8:
50% (1 in 2) probability of collision in 7,213,475,309,916,173.
1% (1 in 100) probability of collision in 497,496,027,172,833,194.
.05% (1 in 1845) probability of collision in 9,223,372,036,854,775,806.
That's where my quick-and-dirty solver craps out, but we're still a
really long ways from
340,282,366,920,938,463,463,374,607,431,768,211,456.
A simple linear extrapolation (certainly wrong, but not by more than a
few dozen orders of magnitude) says that the odds would be
1 in 68,056,473,384,187,692,692 for the full MD5 hash (I'm not even
going to dignify that with a percentage).
I'm not going to do the sums, but I would hazard a guess that it's more
likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE
FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that
you actually discovered a collision with a measly million (or even
hundred million) plaintexts.
(Of course, I don't know how many tests of the hash you actually did.
But the point stands.)
Hell, if you're *that* lucky, what are you doing in IT? You could be
making a killing at the roulette table.
Or, even more likely, there was some other factor in the system (most
likely that it was only using a few bits, probably 32, of the hash
when looking for collisions) that resulted in a false alarm.
If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%
But in the absence of actual evidence, we have to assume (just based on
the probabilities) that there was some error in your testing.
-andy
next prev parent reply other threads:[~2005-04-18 7:39 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-04-16 12:24 SHA1 hash safety David Lang
2005-04-16 12:31 ` Ingo Molnar
2005-04-16 12:48 ` David Lang
2005-04-16 13:29 ` Brian O'Mahoney
2005-04-16 14:58 ` C. Scott Ananian
2005-04-16 15:11 ` Petr Baudis
2005-04-16 15:36 ` C. Scott Ananian
2005-04-16 22:56 ` David Lang
2005-04-16 23:11 ` Paul Jackson
2005-04-16 23:18 ` Martin Mares
2005-04-17 4:38 ` David A. Wheeler
2005-04-18 0:09 ` Theodore Ts'o
2005-04-16 15:49 ` ross
2005-04-17 6:35 ` Horst von Brand
2005-04-18 2:07 ` Brian O'Mahoney
2005-04-18 16:50 ` C. Scott Ananian
2005-04-16 19:16 ` Paul Jackson
2005-04-16 21:35 ` Brian O'Mahoney
2005-04-18 7:43 ` Andy Isaacson [this message]
2005-04-18 17:04 ` C. Scott Ananian
2005-04-19 22:30 ` David Meybohm
2005-04-19 22:48 ` C. Scott Ananian
2005-04-20 18:56 ` David Meybohm
2005-04-16 22:46 ` David Lang
2005-04-16 23:14 ` Paul Jackson
2005-04-16 22:33 ` David Lang
2005-04-17 3:23 ` Tkil
2005-04-17 4:09 ` Paul Jackson
2005-04-17 4:43 ` Tkil
2005-04-17 5:09 ` Paul Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050418074323.GA29765@hexapodia.org \
--to=adi@hexapodia.org \
--cc=git@vger.kernel.org \
--cc=omb@bluewin.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.