From: Andreas Ericsson <ae@op5.se>
To: Marco Costalba <mcostalba@gmail.com>
Cc: Jon Smirl <jonsmirl@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Nicolas Pitre <nico@cam.org>
Subject: Re: Why SHA are 40 bytes? (aka looking for flames)
Date: Tue, 24 Apr 2007 16:48:40 +0200
Message-ID: <462E18C8.4070001@op5.se>
In-Reply-To: <e5bfff550704211128i12035947i7597e920a0eca163@mail.gmail.com>
Marco Costalba wrote:
>
> Someone more versed than me in SHA1 could tell the probability to find
> a corrupted object calculating its hash and checking against its
> stored 160-bit known good signature and *FAIL* to find as corrupt *the
> same object* calculating its hash and checking against a sha truncated
> to, say, 20 bits.
>
A 20-bit hash has only 2^20 = 1048576 possible values, so the probability
that two given objects collide is 1 in 1048576. In other words, repositories
with more objects than that would already be exhibiting collisions with only
20 bits of hash, even with a perfect dispersion.
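
To put rough numbers on this, here is a minimal sketch using the standard
birthday-bound approximation. It assumes hashes are uniformly distributed;
the function name collision_probability is made up for illustration and is
not anything in git.

```python
import math

def collision_probability(num_objects: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_objects uniformly
    random hashes of hash_bits bits are equal (birthday bound)."""
    space = 2 ** hash_bits
    if num_objects > space:
        # Pigeonhole: more objects than hash values guarantees a collision.
        return 1.0
    # P ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny values.
    return -math.expm1(-num_objects * (num_objects - 1) / (2 * space))

# Hypothetical object counts, chosen only to show the trend.
for n in (100, 1_000, 10_000, 1_000_000):
    print(n, collision_probability(n, 20), collision_probability(n, 160))
```

Under these assumptions a 20-bit hash is more likely than not to collide
once a repository holds somewhat over a thousand objects, long before the
2^20 limit is reached, while the 160-bit figure stays negligible.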
> I would say this probability is very, very low in the random case (not a
> malicious attack of course, but I think that is not the concern for a git
> repository as it was for the SHA1 designers).
>
I believe the KDE repo is the biggest one in git today, with its several
hundred thousand revisions (and thus most likely several million objects).
The reason it's not much use to cut down the hash size is that it already
accounts for a very small percentage of the total size of the repo, and since
the full hash gives git
1461501637330902918203684832716283019655932542976 distinct object names
(that's 2 to the power of 160; it can't really hold that many objects without
encountering conflicts, since collisions become likely long before the space
is full), this works as a nice size for a hash to be.
If the hash is reduced to 80 bits, the maximum number of unique hashes shrinks
to 1208925819614629174706176 (that's 2 to the power of 80, a fraction of about
8e-25 of the 160-bit space) while only saving 10 bytes of storage per stored
hash. Using a more efficient compression algorithm for
the objects themselves (bzip2, anyone?) will most likely reduce storage size
an order of magnitude more than reducing the size of the hash, although at the
expense of CPU-efficiency.
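
The arithmetic above can be checked with a few lines of Python. This is only
a sketch: the object count is a hypothetical stand-in for "several million
objects", and it counts one stored hash per object, which understates how
often hashes are actually referenced.

```python
FULL_BITS = 160   # full SHA-1 length
SHORT_BITS = 80   # the truncated length discussed above

full_space = 2 ** FULL_BITS
short_space = 2 ** SHORT_BITS

# Fraction of the 160-bit space that survives truncation to 80 bits.
ratio = short_space / full_space

# Bytes saved per stored hash when kept in binary form.
bytes_saved = (FULL_BITS - SHORT_BITS) // 8

# Hypothetical repository size, for illustration only.
num_objects = 5_000_000
total_saved_mb = bytes_saved * num_objects / 1_000_000

print(full_space)
print(short_space)
print(ratio)
print(bytes_saved, "bytes per hash,", total_saved_mb, "MB in total")
```

With these assumed numbers the saving is on the order of tens of megabytes,
which is the trade being weighed against giving up nearly all of the hash
space.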
One must also factor in the code-changes necessary to support abbreviated hashes
and ask oneself "is it worth it?". Since using a smaller portion of the hash
doesn't only have upsides, I'd say "no, definitely not".
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231