git.vger.kernel.org archive mirror
* Git commit hash clash prevention
@ 2008-10-02  8:53 martin f krafft
  2008-10-02  9:18 ` Thomas Rast
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: martin f krafft @ 2008-10-02  8:53 UTC (permalink / raw)
  To: git discussion list

Hi folks,

the other day during a workshop on Git, one of the attendants asked
about the scenario when two developers, Jane and David, both working
on the same project, both create a commit and the two just so happen
to have the same SHA-1. I realise that the likelihood of this
happening is about as high as the chance of <insert witty joke
here>, but it *is* possible, isn't it? Even though this is thus
somewhat academic, I am still very curious about it.

What happens when David now pulls from Jane? How does Git deal with
this?

I imagine it'll be able to distinguish the two commits based on
metadata, but won't the DAG get corrupted?

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"and no one sings me lullabies,
 and no one makes me close my eyes,
 and so i throw the windows wide,
 and call to you across the sky"
                                                   -- pink floyd, 1971
 
spamtraps: madduck.bogus@madduck.net

* Re: Git commit hash clash prevention
  2008-10-02  8:53 Git commit hash clash prevention martin f krafft
@ 2008-10-02  9:18 ` Thomas Rast
  2008-10-02 11:08   ` Jean-Luc Herren
  2008-10-02 10:07 ` Johannes Schindelin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Thomas Rast @ 2008-10-02  9:18 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

martin f krafft wrote:
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1. I realise that the likelihood of this
> happening is about as high as the chance of <insert witty joke
> here>, but it *is* possible, isn't it? Even though this is thus
> somewhat academic, I am still very curious about it.
> 
> What happens when David now pulls from Jane? How does Git deal with
> this?

There are two cases:

* The commits are exactly identical.  This won't happen in your
  scenario, but is still theoretically possible if you commit the same
  tree with the same author info, timestamps, etc. on two different
  machines.  Then there is no problem, because they really are the
  same.

* They're not identical, but there is a hash collision.  Git will
  become very confused because it only ever saves one of them.  (I
  suppose it'd "only" corrupt the DAG if the two are commits, but in
  the general case a commit could collide with a tree etc.)

  However, the expected number of objects needed to get a collision is
  on the order of 2**80 (http://en.wikipedia.org/wiki/Birthday_attack),
  and since there are (very roughly) 2**25 seconds in a year and 2**34
  years in the age of the universe, that still leaves you with 2**21
  ages of the universe to go.

(I hope I did the counting right...)
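
The counting above can be checked with a few lines of Python. This is
only a sketch: it assumes, as the estimate appears to, that one new
object is created per second, and it reuses the same rounded powers of
two. The variable names are made up for illustration.

```python
# Rough figures from the estimate above, all rounded to powers of two.
OBJECTS_FOR_COLLISION = 2**80   # birthday bound for a 160-bit hash
SECONDS_PER_YEAR = 2**25        # about 3.4e7; a real year is about 3.2e7
UNIVERSE_AGE_YEARS = 2**34      # about 1.7e10 years

# Assumption (not stated in the message): one new object per second.
OBJECTS_PER_SECOND = 1

seconds_needed = OBJECTS_FOR_COLLISION // OBJECTS_PER_SECOND
universe_ages = seconds_needed // (SECONDS_PER_YEAR * UNIVERSE_AGE_YEARS)

# The exponents work out as 80 - 25 - 34 = 21.
print(universe_ages == 2**21)
print(f"{universe_ages:,} ages of the universe")
```

A higher object creation rate shrinks the result proportionally; at
2**21 objects per second it would still take one full age of the
universe.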

> I imagine it'll be able to distinguish the two commits based on
> metadata, but won't the DAG get corrupted?

No, it does not distinguish between objects in any way but the SHA1.
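
To illustrate why nothing but the SHA-1 matters: an object's name is
derived only from its type, its size, and its content. A minimal sketch
in Python, assuming the "<type> <size>\0" header that Git prepends
before hashing; the helper name git_object_id is made up for this
example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    # Hash "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# Identical type and content always give the identical name,
# no matter which machine computes it.
first = git_object_id("blob", b"hello\n")
second = git_object_id("blob", b"hello\n")
print(first == second)

# The same bytes stored as a different type get a different name.
print(git_object_id("blob", b"") != git_object_id("tree", b""))
```

Because the name is all there is, two different objects that hashed to
the same value would be indistinguishable to the object store.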

- Thomas

-- 
Thomas Rast
trast@student.ethz.ch



* Re: Git commit hash clash prevention
  2008-10-02  8:53 Git commit hash clash prevention martin f krafft
  2008-10-02  9:18 ` Thomas Rast
@ 2008-10-02 10:07 ` Johannes Schindelin
  2008-10-02 14:00 ` Jakub Narebski
  2008-10-02 16:04 ` Stephan Beyer
  3 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2008-10-02 10:07 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

Hi,

On Thu, 2 Oct 2008, martin f krafft wrote:

> the other day during a workshop on Git, one of the attendants asked 
> about the scenario when two developers, Jane and David, both working on 
> the same project, both create a commit and the two just so happen to 
> have the same SHA-1. I realise that the likelihood of this happening is 
> about as high as the chance of <insert witty joke here>, but it *is* 
> possible, isn't it? Even though this is thus somewhat academic, I am 
> still very curious about it.

It _is_ academic.  Did you already discuss the chance that your wife gives 
birth to a mouse?  I haven't done the maths yet, but I am pretty certain 
that this would be more likely than an unintended SHA-1 collision.

> What happens when David now pulls from Jane? How does Git deal with 
> this?

Basically, the commit that David has will not be overwritten.  So every 
commit referring to Jane's commit would point to David's in his 
repository.

But the more likely case (well, as likely goes) would be that either 
Jane's or David's object is actually a blob.  And Git would complain about 
a type mismatch then.
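
Both cases can be played through with a toy model. The sketch below is
NOT Git's real code: the ToyObjectStore class and the find_collision
helper are hypothetical, and object names are cut down to two hex
digits so that collisions are easy to find by brute force. It only
mimics the two behaviours described above: an existing object is never
overwritten, and a lookup with the wrong type is reported.

```python
import hashlib


class ToyObjectStore:
    """Toy content-addressed store with deliberately short object names."""

    def __init__(self, id_hex_digits=2):
        self.id_hex_digits = id_hex_digits
        self.objects = {}  # object id -> (type, payload)

    def object_id(self, obj_type, payload):
        header = f"{obj_type} {len(payload)}\0".encode("ascii")
        digest = hashlib.sha1(header + payload).hexdigest()
        return digest[: self.id_hex_digits]

    def write(self, obj_type, payload):
        oid = self.object_id(obj_type, payload)
        # An object that is already present is kept; the newcomer is dropped.
        self.objects.setdefault(oid, (obj_type, payload))
        return oid

    def read(self, oid, expected_type):
        obj_type, payload = self.objects[oid]
        if obj_type != expected_type:
            raise TypeError(f"object {oid} is a {obj_type}, not a {expected_type}")
        return payload


def find_collision(store, type_a, type_b):
    """Brute-force two different payloads whose truncated names are equal."""
    seen = {}
    for i in range(10_000):
        payload = f"david {i}".encode()
        seen.setdefault(store.object_id(type_a, payload), payload)
    for j in range(10_000):
        payload = f"jane {j}".encode()
        if store.object_id(type_b, payload) in seen:
            return seen[store.object_id(type_b, payload)], payload
    raise RuntimeError("no collision found")


# Case 1: two colliding commits. David's repository keeps David's commit.
store = ToyObjectStore()
david_commit, jane_commit = find_collision(store, "commit", "commit")
oid = store.write("commit", david_commit)
print(store.write("commit", jane_commit) == oid)   # same name, different data
print(store.read(oid, "commit") == david_commit)   # Jane's commit is not stored

# Case 2: Jane's colliding object is a blob. Reading it as a blob fails.
store = ToyObjectStore()
david_commit, jane_blob = find_collision(store, "commit", "blob")
oid = store.write("commit", david_commit)
store.write("blob", jane_blob)
try:
    store.read(oid, "blob")
except TypeError as err:
    print(err)
```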

Ciao,
Dscho

* Re: Git commit hash clash prevention
  2008-10-02  9:18 ` Thomas Rast
@ 2008-10-02 11:08   ` Jean-Luc Herren
  0 siblings, 0 replies; 7+ messages in thread
From: Jean-Luc Herren @ 2008-10-02 11:08 UTC (permalink / raw)
  To: Thomas Rast, martin f krafft, git discussion list

Hello list!

Thomas Rast wrote:
>   However, the expected number of objects needed to get a collision is
>   on the order of 2**80 (http://en.wikipedia.org/wiki/Birthday_attack),
>   and since there are (very roughly) 2**25 seconds in a year and 2**34
>   years in the age of the universe, that still leaves you with 2**21
>   ages of the universe to go.

In case it's interesting to someone, I once calculated (and wrote
down) the math for the following scenario:

  - 10 billion humans are programming
  - They *each* produce 5000 git objects every day
  - They all push to the same huge repository
  - They keep this up for 50 years

With those highly exaggerated assumptions, the probability of
getting a hash collision in that huge git object database is
6e-13.  Provided I got the math right.
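
The scenario can be recomputed with the usual birthday approximation,
p ~= 1 - exp(-n(n-1) / (2N)), where n is the number of objects and
N = 2**160 the number of possible SHA-1 values. This is a sketch under
stated assumptions: 365-day years, and a formula that may differ from
the one used for the figure above, which is not given. It yields a few
times 1e-13, the same order of magnitude.

```python
import math

PEOPLE = 10_000_000_000      # 10 billion programmers
OBJECTS_PER_DAY = 5000       # per person
DAYS_PER_YEAR = 365          # assumption: leap days ignored
YEARS = 50
HASH_BITS = 160              # SHA-1 output size

n = PEOPLE * OBJECTS_PER_DAY * DAYS_PER_YEAR * YEARS
space = 2**HASH_BITS

# Birthday approximation; expm1 keeps precision for a tiny exponent.
p = -math.expm1(-n * (n - 1) / (2 * space))

print(f"objects in the repository: {n:.3e}")
print(f"collision probability:     {p:.1e}")
```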

So, mathematically speaking you have to say "yes, it *is*
possible".  But math aside it's perfectly correct to say "no, it
won't happen, ever".  (Speaking about the *accidental* case.)

jlh

* Re: Git commit hash clash prevention
  2008-10-02  8:53 Git commit hash clash prevention martin f krafft
  2008-10-02  9:18 ` Thomas Rast
  2008-10-02 10:07 ` Johannes Schindelin
@ 2008-10-02 14:00 ` Jakub Narebski
  2008-10-02 15:39   ` Johannes Schindelin
  2008-10-02 16:04 ` Stephan Beyer
  3 siblings, 1 reply; 7+ messages in thread
From: Jakub Narebski @ 2008-10-02 14:00 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

martin f krafft <madduck@madduck.net> writes:

> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1. I realise that the likelihood of this
> happening is about as high as the chance of <insert witty joke
> here>, but it *is* possible, isn't it? Even though this is thus
> somewhat academic, I am still very curious about it.
> 
> What happens when David now pulls from Jane? How does Git deal with
> this?

Cannot happen in practice.

But just in case: Git trusts the object it already has in the repository
over an object that has just been fetched (or pushed).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: Git commit hash clash prevention
  2008-10-02 14:00 ` Jakub Narebski
@ 2008-10-02 15:39   ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2008-10-02 15:39 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: martin f krafft, git discussion list

Hi,

On Thu, 2 Oct 2008, Jakub Narebski wrote:

> martin f krafft <madduck@madduck.net> writes:
> 
> > the other day during a workshop on Git, one of the attendants asked
> > about the scenario when two developers, Jane and David, both working
> > on the same project, both create a commit and the two just so happen
> > to have the same SHA-1. I realise that the likelihood of this
> > happening is about as high as the chance of <insert witty joke
> > here>, but it *is* possible, isn't it? Even though this is thus
> > somewhat academic, I am still very curious about it.
> > 
> > What happens when David now pulls from Jane? How does Git deal with
> > this?
> 
> Cannot happen in practice.
> 
> But just in case: Git trusts the object it already has in the repository
> over an object that has just been fetched (or pushed).

Oh, maybe the most important part: both David and Jane would have to 
rewrite their respective histories, changing the affected commits in a 
simple way (such as adding a space to the first line of the commit message 
or some such).  Then Git would be changed to reject that particular SHA-1 
(we'd introduce a blacklist).

All in all, it would be like a borked commit; not really easy to fix, but 
the world would not stop turning because of it.

Ciao,
Dscho

* Re: Git commit hash clash prevention
  2008-10-02  8:53 Git commit hash clash prevention martin f krafft
                   ` (2 preceding siblings ...)
  2008-10-02 14:00 ` Jakub Narebski
@ 2008-10-02 16:04 ` Stephan Beyer
  3 siblings, 0 replies; 7+ messages in thread
From: Stephan Beyer @ 2008-10-02 16:04 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

Hi,

martin f krafft wrote:
> Hi folks,
> 
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1.

Changing the committer time is the easiest way to solve this problem,
if it ever happens.
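
Why this works can be sketched in Python: the committer line, including
its timestamp, is part of the bytes that get hashed, so moving it by one
second yields a completely different commit name. The sketch assumes the
plain layout of a root commit (tree, author, committer, blank line,
message); the commit_id helper, the identity, and the timestamps are
made up for illustration.

```python
import hashlib


def commit_id(tree, author, committer, message):
    # Serialize a parentless commit and hash it with its "commit <size>\0" header.
    body = (
        f"tree {tree}\n"
        f"author {author}\n"
        f"committer {committer}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # name of the empty tree
WHO = "Jane Example <jane@example.org>"                  # hypothetical identity
AUTHORED = f"{WHO} 1222937580 +0000"                     # arbitrary timestamp

before = commit_id(EMPTY_TREE, AUTHORED, f"{WHO} 1222937580 +0000", "Fix typo")
after = commit_id(EMPTY_TREE, AUTHORED, f"{WHO} 1222937581 +0000", "Fix typo")

print(before)
print(after)
print(before != after)  # one second later gives a different name
```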

I have wondered how Git would behave if there are two files that are
not equal but have the same SHA-1. But I haven't found any such example
files to test this scenario and have not had the time to write or
look for a tool that generates them. (MD5 collisions can be generated
within 2 hours on usual home hardware and even Wikipedia links to
collided files. An intelligent search for SHA-1 collisions takes
2^63 evaluations and not 2^80 (simple birthday attack) as expected.
So it should be possible to find some random collisions and test the
behavior...)

But even if git is terribly useless in such situations, it
does not make any sense to guard against them, because in practice
they just do not happen. (And I think such guards will just slow git
down in the usual case.)

Regards,
  Stephan

-- 
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
