git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Isaacson <adi@hexapodia.org>
To: omb@bluewin.ch
Cc: git@vger.kernel.org
Subject: Re: SHA1 hash safety
Date: Mon, 18 Apr 2005 00:43:23 -0700	[thread overview]
Message-ID: <20050418074323.GA29765@hexapodia.org> (raw)
In-Reply-To: <4261852B.6090507@khandalf.com>

[trimmed cc list, nobody wants to read this noise]

On Sat, Apr 16, 2005 at 11:35:39PM +0200, Brian O'Mahoney wrote:
> >> (1) I _have_ seen real-life collisions with MD5, in the context of
> >>    Document management systems containing ~10^6 ms-WORD documents.
> > 
> > Dude!  You could have been *famous*!  Why the
> > aitch-ee-double-hockey-sticks didn't you publish this when you found it?
> > Seriously, man.
> 
> The MD5 has was fine, or at least the code (a) produced the correct
> results on the published test cases, and, (b) was properly applied to
> all bytes of the file(s). I was surprised when it happened, which is why
> I bothered to post to this list at this time, so I make two more points

OK, I guess it's time for some remedial math.

There are 2^128 = 340282366920938463463374607431768211456 different MD5
hashes.

You are suggesting that you found a collision using ~1e6 = ~1,000,000
plaintexts.

Let's suppose there were actually 100,000,000 = 1e8 plaintexts, just in
case you underestimated the number.

Applying the birthday paradox, we have a 50% probability that you'd find
one collision if there were ~7,213,475,309,916,173 possible hash values.
If you extend the birthday argument ("what is the probability of a
collision if you take N samples from a set of size M?") you get the
following results, with N = 1e8:

50% (1 in 2) probability of collision in           7,213,475,309,916,173.
1% (1 in 100) probability of collision in        497,496,027,172,833,194.
.05% (1 in 1845) probability of collision in   9,223,372,036,854,775,806.

That's where my quick-and-dirty solver craps out, but we're still a
really long ways from

                     340,282,366,920,938,463,463,374,607,431,768,211,456.

A simple linear extrapolation (certainly wrong, but not by more than a
few dozen orders of magnitude) says that the odds would be
1 in 68,056,473,384,187,692,692 for the full MD5 hash (I'm not even
going to dignify that with a percentage).

I'm not going to do the sums, but I would hazard a guess that it's more
likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE
FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that
you actually discovered a collision with a measly million (or even
hundred million) plaintexts.

(Of course, I don't know how many tests of the hash you actually did.
But the point stands.)

Hell, if you're *that* lucky, what are you doing in IT?  You could be
making a killing at the roulette table.

Or, even more likely, there was some other factor in the system (most
likely that it was only using a few bits, probably 32, of the hash
when looking for collisions) that resulted in a false alarm.

If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%

But in the absence of actual evidence, we have to assume (just based on
the probabilities) that there was some error in your testing.

-andy

  reply	other threads:[~2005-04-18  7:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-16 12:24 SHA1 hash safety David Lang
2005-04-16 12:31 ` Ingo Molnar
2005-04-16 12:48   ` David Lang
2005-04-16 13:29     ` Brian O'Mahoney
2005-04-16 14:58       ` C. Scott Ananian
2005-04-16 15:11         ` Petr Baudis
2005-04-16 15:36           ` C. Scott Ananian
2005-04-16 22:56             ` David Lang
2005-04-16 23:11               ` Paul Jackson
2005-04-16 23:18                 ` Martin Mares
2005-04-17  4:38                 ` David A. Wheeler
2005-04-18  0:09                   ` Theodore Ts'o
2005-04-16 15:49         ` ross
2005-04-17  6:35           ` Horst von Brand
2005-04-18  2:07             ` Brian O'Mahoney
2005-04-18 16:50             ` C. Scott Ananian
2005-04-16 19:16         ` Paul Jackson
2005-04-16 21:35         ` Brian O'Mahoney
2005-04-18  7:43           ` Andy Isaacson [this message]
2005-04-18 17:04             ` C. Scott Ananian
2005-04-19 22:30             ` David Meybohm
2005-04-19 22:48               ` C. Scott Ananian
2005-04-20 18:56                 ` David Meybohm
2005-04-16 22:46         ` David Lang
2005-04-16 23:14           ` Paul Jackson
2005-04-16 22:33       ` David Lang
2005-04-17  3:23       ` Tkil
2005-04-17  4:09         ` Paul Jackson
2005-04-17  4:43           ` Tkil
2005-04-17  5:09             ` Paul Jackson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050418074323.GA29765@hexapodia.org \
    --to=adi@hexapodia.org \
    --cc=git@vger.kernel.org \
    --cc=omb@bluewin.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).