From: "Theodore Ts'o" <tytso@mit.edu>
To: linux-kernel@vger.kernel.org
Subject: Re: Call to atention about using hash functions as content indexers (SCM saga)
Date: Tue, 12 Apr 2005 11:29:55 -0400
Message-ID: <20050412152955.GD9684@thunk.org>
In-Reply-To: <20050411224021.GA25106@larroy.com>

On Tue, Apr 12, 2005 at 12:40:21AM +0200, Pedro Larroy wrote:
> 
> I had a quick look at the source of GIT tonight, I'd like to warn you
> about the use of hash functions as content indexers.
> 
> As probably you are aware, hash functions such as SHA-1 are surjective not
> bijective (1-to-1 map), so they have collisions. Here one can argue
> about the low probability of such a collision, I won't get into
> subjective judgments of what probability of collision is tolerable for
> me and what is not. 
> 
> In my humble opinion, deliberately choosing, as a design decision, a
> method such as this one, that in some cases could corrupt data in a
> silent and very hard to detect way, is not very good. 

Actually, it will very likely be very, very easy to detect.  What
happens if there is a hash collision?  Well, in the case of a
non-malicious collision, instead of retrieving the correct version of
a file, we will get some random version of another file.  The moment
you try to compile with that incorrect file, you will, with
extremely high probability, get a failed compile, which will be
blatantly obvious.

In the case of a malicious attacker trying to introduce a collision, it
should be noted first of all that the recent SHA-1 breakage was a
collision attack, not a pre-image attack.  So it's not useful for
trying to find another message that hashes to the same value as one
already in use by git.  So the work factor is still 2**80.  Secondly,
even if an attacker could generate another file which has the same
hash as a pre-existing file, it still has to look like a valid git
object, and it still has to be a valid C file or it will again be
blatantly obvious when you try to compile the sucker.
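
To make "has to look like a valid git object" concrete, here is a
minimal sketch of how a blob's ID is derived in git as it exists today:
the SHA-1 covers a short header (object type and content length)
followed by the content itself. This is an illustration only; the
object format in the April 2005 code may have differed, and the helper
name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID that present-day git assigns to a blob.

    The hashed bytes are "blob <length>\\0" followed by the raw content,
    so a colliding file must also collide with its own length prefixed.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match the output of: echo hello | git hash-object --stdin
    print(git_blob_id(b"hello\n"))
```

Because the length is part of the hashed data, a forged file has to
produce the same digest with its own size in the header, on top of
still compiling as C.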

> One can also argue
> that the probability of data getting corrupted in the disk, or whatever
> could be higher than that of the collision, again this is not a valid
> comparison

That's a matter of some religious dispute.  You can always reduce the
probability of a collision down to an arbitrarily small value simply by
using a larger hash.  And switching hashes in git is quite simple since
it would just be a matter of running a program to calculate the hash
using a different algorithm, and renaming the files.  You can even use
hardlinks if you want to support two different hash algorithms
simultaneously during some transition period.
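
The transition described above can be sketched roughly as follows.
This assumes a hypothetical flat object store, that is, a directory in
which every file is named by the hex digest of its own bytes; git's
real on-disk layout is more involved, and the function name
link_under_new_hash is invented for this example. Each object is
hashed with the new algorithm and hard-linked under the new name, so
both names refer to the same data.

```python
import hashlib
import os


def link_under_new_hash(store_dir: str, old_algo: str = "sha1",
                        new_algo: str = "sha256") -> dict:
    """Hard-link each old-hash-named object under its new-hash name.

    Returns a mapping of old name -> new name. Files whose names do not
    have the old digest's hex length are skipped, so running the
    function a second time is harmless.
    """
    old_hex_len = hashlib.new(old_algo).digest_size * 2
    mapping = {}
    for old_name in sorted(os.listdir(store_dir)):
        old_path = os.path.join(store_dir, old_name)
        if len(old_name) != old_hex_len or not os.path.isfile(old_path):
            continue
        with open(old_path, "rb") as f:
            new_name = hashlib.new(new_algo, f.read()).hexdigest()
        new_path = os.path.join(store_dir, new_name)
        if not os.path.exists(new_path):
            # Same inode, second name: no data is copied.
            os.link(old_path, new_path)
        mapping[old_name] = new_name
    return mapping
```

Once nothing refers to the old names any more, they can simply be
unlinked, which ends the transition period.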

So past a certain point, there is a probability that all of the
molecules of oxygen in the room will suddenly migrate outdoors, and
you could suffocate.  Is it rational to spend time worrying about that
possibility?  I'll leave that for you to decide.
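
For a rough sense of scale, the chance of an accidental collision can
be estimated with the standard birthday approximation,
p ~= 1 - exp(-n(n-1) / 2^(b+1)), for n objects and a b-bit hash. The
sketch below assumes hash outputs behave as uniformly random values
and says nothing about deliberate attacks; the function name and the
object counts are illustrative choices, not figures from this thread.

```python
import math


def collision_probability(num_objects: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Assumes hash outputs are uniformly distributed and independent.
    """
    # Number of object pairs divided by the size of the hash space.
    exponent = num_objects * (num_objects - 1) / (2 * 2**hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (160, 256):
        p = collision_probability(10**9, bits)
        print(f"{bits}-bit hash, 10**9 objects: p ~= {p:.3g}")
```

Under these assumptions, a billion objects with a 160-bit hash give a
probability on the order of 1e-31, and a 256-bit hash pushes it many
orders of magnitude lower still.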

						- Ted
