From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: Paul Jackson <pj@sgi.com>
Cc: David Lang <dlang@digitalinsight.com>,
	cscott@cscott.net, pasky@ucw.cz, omb@bluewin.ch, mingo@elte.hu,
	git@vger.kernel.org
Subject: Re: SHA1 hash safety
Date: Sun, 17 Apr 2005 00:38:37 -0400
Message-ID: <4261E84D.6040208@dwheeler.com>
In-Reply-To: <20050416161153.534b47d5.pj@sgi.com>

Paul Jackson wrote:
>>what I'm talking about is the chance that somewhere, sometime there will 
>>be two different documents that end up with the same hash
> 
> I have vastly greater chance of a file colliding due to hardware or
> software glitch than a random message digest collision of two legitimate
> documents.

The probability of an accidental SHA-1 collision between two
different files is absurdly remote; it's just not worth worrying about.
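
To put a rough number on "absurdly remote", here is a back-of-the-envelope
sketch. It assumes SHA-1 output behaves like a uniformly random 160-bit
value and uses the simple pair-counting (union) bound, so it is an upper
estimate rather than an exact figure. The function name and the object
count in the usage note are purely illustrative.

```python
from fractions import Fraction


def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Upper bound on the chance that any two of n_objects random hashes match."""
    # Number of distinct pairs of objects.
    pairs = n_objects * (n_objects - 1) // 2
    # Each pair collides with probability 2**-hash_bits (uniformity assumed).
    return float(Fraction(pairs, 2 ** hash_bits))
```

Calling collision_probability(10**9), i.e. a billion stored objects, gives
a value well below 1e-30 under these assumptions.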

However, the possibility of an INTENTIONAL collision is a completely
different matter.  I think the hash algorithm should change in the
future; I have a proposal below.

Someone has ALREADY broken into a server to modify the Linux kernel
code, so the idea of an attack on kernel code
is not an idle fantasy. MD5 is dead, and SHA-1's work factor has
been reduced enough that people have already been told to
"walk to the exits" (i.e., DO NOT USE SHA-1 for new programs like git).

The fact that blobs are compressed first, with a length header
in front, _may_ make it harder to attack.  But maybe not.
I haven't checked for this case, but most compression formats
I know of have a "don't change" mode that essentially just copies the
data that follows it.  If the one used in git has such a mode
(I bet it does!), an attacker could use that mode to
make it MUCH easier to create an attack vector than it would
appear at first.  Now the attacker just needs to create a collision
(hmmm, where was that paper?).  Remember, you don't need to
run a hash algorithm over an entire file; you can precompute
to near the end, and then try your iterations from there.
A little hardware (incl. FPGAs) would speed up the attack.
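
A minimal sketch of both points, assuming the compressor in question is
zlib (which I have not confirmed from the git source here). Level 0 makes
zlib emit "stored" blocks, so the input shows up verbatim in the output.
The second half shows the precompute trick using the copy() method of
Python's hashlib objects. The payload, prefix, and helper name are made up
for illustration.

```python
import hashlib
import zlib

payload = b"attacker-chosen bytes " * 4

# Level 0 asks zlib for "stored" (uncompressed) deflate blocks, so the
# payload appears byte-for-byte inside the compressed stream.
stored = zlib.compress(payload, 0)
assert payload in stored
assert zlib.decompress(stored) == payload

# Hash the fixed prefix once, then clone that state for every candidate
# suffix instead of rehashing the whole prefix each time.
prefix = b"x" * 1_000_000
base = hashlib.sha1(prefix)


def digest_with_suffix(suffix: bytes) -> str:
    h = base.copy()      # reuse the precomputed state
    h.update(suffix)     # only the suffix is hashed per attempt
    return h.hexdigest()
```

Each call to digest_with_suffix() costs only the work of hashing the
suffix, which is why a long unchanged prefix buys an attacker's target no
extra safety.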

Of course, that assumes you actually
check everything to make sure that an attacker can't slip
in something different. After each rsync, are all new files'
hash values checked?  Do they uncompress to the right length?
Do they have excess data after the decompression?
I'm hoping that sort of input-checking (since the data
might be from an attacker, if indirectly!) is already going on,
though I haven't reviewed the git source code.
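
For concreteness, the three checks above might look roughly like the
sketch below. This is not taken from the git source. It assumes the hash
is computed over the stored (compressed) bytes as described earlier, that
the compressor is zlib, and that the caller already knows the expected
hash and uncompressed length. check_object and its parameters are
hypothetical names.

```python
import hashlib
import zlib


def check_object(raw: bytes, expected_sha1: str, expected_len: int) -> bytes:
    """Return the decompressed payload, or raise ValueError if a check fails."""
    # 1. Does the stored data hash to the name we were given?
    if hashlib.sha1(raw).hexdigest() != expected_sha1:
        raise ValueError("hash mismatch")

    d = zlib.decompressobj()
    try:
        data = d.decompress(raw) + d.flush()
    except zlib.error as exc:
        raise ValueError("corrupt compressed stream") from exc
    if not d.eof:
        raise ValueError("truncated compressed stream")

    # 2. Is there excess data after the end of the compressed stream?
    if d.unused_data:
        raise ValueError("excess data after the compressed stream")

    # 3. Does it uncompress to the right length?
    if len(data) != expected_len:
        raise ValueError("wrong uncompressed length")
    return data
```

A real implementation would probably also cap the decompressed size, since
the data may come from an attacker.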

While the jury's still out, the current belief by most folks
I talk to is that SHA-family hashes with more bits, such as SHA-256,
are the way to go now.  The SHA-1 attack simply reduces
the work factor (it's not a COMPLETE break), so adding
more bits is believed to increase the work factor
enough to counter it.

Adding more information to the hash can make attacking even harder.
Here's one idea: whenever that hash algorithm
switch occurs, create a new "hash" value like this:
   SHA-256 "+" uncompressed-length
Where SHA-256 is computed just like SHA-1 is now, e.g.,
SHA-256(file) where file = typecode + length + compressed data.
Leave the internal format as-is (with the length embedded as well).
This means that an attacker has to come up with an attack
that creates the same length uncompressed, yet has the same hash
of the compressed result. That's harder to do.
Length is also really, really cheap to compute :-).
That also might help convince the "what happens if there's
an accidental collision" crowd: now, if the file lengths
are different, you're GUARANTEED that the hash values are different,
though that's not the best reason to do that.
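
A rough sketch of the proposed identifier, to make the idea concrete. The
exact byte layout of the stored file (typecode, a space, the decimal
length, a NUL, then the zlib data) is an assumption for illustration, as
are the function name and the use of zlib. Only the overall shape,
SHA-256 "+" uncompressed-length, comes from the proposal above.

```python
import hashlib
import zlib


def object_id(typecode: bytes, content: bytes) -> str:
    """Build 'SHA-256 + uncompressed-length' for one object (illustrative)."""
    length = str(len(content)).encode("ascii")
    # Assumed stored layout: typecode + length + compressed data.
    stored = typecode + b" " + length + b"\0" + zlib.compress(content)
    # Hash the stored file, then append the uncompressed length in the clear.
    return hashlib.sha256(stored).hexdigest() + "+" + length.decode("ascii")
```

With this layout, object_id(b"blob", b"abc") ends in "+3" and
object_id(b"blob", b"abcd") ends in "+4", so two objects of different
lengths can never share an identifier, whatever the hash part does.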

One reason to think about switching sooner rather than later
is that it'd be really nice if the object store also included
signatures, so that in one fell swoop you could check who signed what
(and thus you could later on CONFIRM with much more certainty who
REALLY submitted a given change... say if it was clearly malicious).
If you switch hash algorithms, the signatures might not work,
depending on how you do it.

--- David A. Wheeler
