* Re: Git commit hash clash prevention
2008-10-02 8:53 Git commit hash clash prevention martin f krafft
@ 2008-10-02 9:18 ` Thomas Rast
2008-10-02 11:08 ` Jean-Luc Herren
2008-10-02 10:07 ` Johannes Schindelin
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Thomas Rast @ 2008-10-02 9:18 UTC (permalink / raw)
To: martin f krafft; +Cc: git discussion list
martin f krafft wrote:
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1. I realise that the likelihood of this
> happening is about as high as the chance of <insert witty joke
> here>, but it *is* possible, isn't it? Even though this is thus
> somewhat academic, I am still very curious about it.
>
> What happens when David now pulls from Jane? How does Git deal with
> this?
There are two cases:
* The commits are exactly identical. This won't happen in your
scenario, but is still theoretically possible if you commit the same
tree with the same author info, timestamps, etc. on two different
machines. Then there is no problem, because they really are the
same.
* They're not identical, but there is a hash collision. Git will
become very confused because it only ever saves one of them. (I
suppose it'd "only" corrupt the DAG if the two are commits, but in
the general case a commit could collide with a tree etc.)
However, the expected number of objects needed to get a collision is
on the order of 2**80 (http://en.wikipedia.org/wiki/Birthday_attack),
and since there are (very roughly) 2**25 seconds in a year and 2**34
years in the age of the universe, that still leaves you with 2**21
ages of the universe to go.
(I hope I did the counting right...)
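The count above can be checked with a few lines of Python. This is only a sketch: the rate of one new object per second is an assumption the estimate leaves implicit, and the age of the universe is taken as roughly 1.38e10 years.

```python
import math

# Assumptions (not stated in the estimate above): one new object per second,
# and an age of the universe of about 1.38e10 years.
OBJECTS_PER_SECOND = 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10

# Birthday bound for a 160-bit hash: about 2**80 objects for a likely collision.
objects_needed = 2 ** 80

years_needed = objects_needed / (OBJECTS_PER_SECOND * SECONDS_PER_YEAR)
ages_needed = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"seconds per year    ~ 2**{math.log2(SECONDS_PER_YEAR):.1f}")
print(f"age of the universe ~ 2**{math.log2(AGE_OF_UNIVERSE_YEARS):.1f} years")
print(f"ages of the universe needed ~ 2**{math.log2(ages_needed):.1f}")
```

Under these assumptions the exponent lands between 21 and 22, which agrees with the rough 2**21 figure.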
> I imagine it'll be able to distinguish the two commits based on
> metadata, but won't the DAG get corrupted?
No, it does not distinguish between objects in any way but the SHA1.
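To make that concrete: an object's name is the SHA-1 of a short header (type, size, NUL byte) followed by the content, and nothing else enters into its identity. A minimal Python sketch, where `git_object_id` is a hypothetical helper name and not anything from Git itself:

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 name Git gives an object of this type and content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The well-known ID of the empty blob.
print(git_object_id("blob", b""))

# The type is part of the hashed header, so the same (empty) content stored
# as a tree gets a different name.
print(git_object_id("tree", b""))
```

Because the object store is keyed only by this value, two different objects with the same SHA-1 cannot coexist in one repository.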
- Thomas
--
Thomas Rast
trast@student.ethz.ch
* Re: Git commit hash clash prevention
2008-10-02 9:18 ` Thomas Rast
@ 2008-10-02 11:08 ` Jean-Luc Herren
0 siblings, 0 replies; 7+ messages in thread
From: Jean-Luc Herren @ 2008-10-02 11:08 UTC (permalink / raw)
To: Thomas Rast, martin f krafft, git discussion list
Hello list!
Thomas Rast wrote:
> However, the expected number of objects needed to get a collision is
> on the order of 2**80 (http://en.wikipedia.org/wiki/Birthday_attack),
> and since there are (very roughly) 2**25 seconds in a year and 2**34
> years in the age of the universe, that still leaves you with 2**21
> ages of the universe to go.
In case it's interesting to someone, I once calculated (and wrote
down) the math for the following scenario:
- 10 billion humans are programming
- They *each* produce 5000 git objects every day
- They all push to the same huge repository
- They keep this up for 50 years
With those highly exaggerated assumptions, the probability of
getting a hash collision in that huge git object database is
6e-13. Provided I got the math right.
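For anyone who wants to redo the numbers, here is a rough Python sketch using the usual birthday approximation p ~ 1 - exp(-n(n-1) / (2 * 2**160)). The 365-day year is an assumption. With this formula the result comes out near 3e-13, the same order of magnitude as the figure above; the exact value depends on which approximation is used.

```python
import math

HASH_BITS = 160                      # SHA-1 output size
people = 10 * 10**9                  # 10 billion programmers
objects_per_person_per_day = 5000
days = 365 * 50                      # 50 years, assuming 365-day years

n = people * objects_per_person_per_day * days   # total objects
space = 2 ** HASH_BITS                            # possible hash values

# Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)).
# expm1 keeps precision when the exponent is tiny.
p_collision = -math.expm1(-(n * (n - 1)) / (2 * space))

print(f"objects: {n:.3e}")
print(f"collision probability: {p_collision:.2e}")
```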
So, mathematically speaking you have to say "yes, it *is*
possible". But math aside it's perfectly correct to say "no, it
won't happen, ever". (Speaking about the *accidental* case.)
jlh
* Re: Git commit hash clash prevention
2008-10-02 8:53 Git commit hash clash prevention martin f krafft
2008-10-02 9:18 ` Thomas Rast
@ 2008-10-02 10:07 ` Johannes Schindelin
2008-10-02 14:00 ` Jakub Narebski
2008-10-02 16:04 ` Stephan Beyer
3 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2008-10-02 10:07 UTC (permalink / raw)
To: martin f krafft; +Cc: git discussion list
Hi,
On Thu, 2 Oct 2008, martin f krafft wrote:
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working on
> the same project, both create a commit and the two just so happen to
> have the same SHA-1. I realise that the likelihood of this happening is
> about as high as the chance of <insert witty joke here>, but it *is*
> possible, isn't it? Even though this is thus somewhat academic, I am
> still very curious about it.
It _is_ academic. Did you already discuss the chance that your wife gives
birth to a mouse? I haven't done the maths yet, but I am pretty certain
that this would be more likely than an unintended SHA-1 collision.
> What happens when David now pulls from Jane? How does Git deal with
> this?
Basically, the commit that David has will not be overwritten. So every
commit referring to Jane's commit would point to David's in his
repository.
But the more likely case (well, as likely goes) would be that either
Jane's or David's object is actually a blob. And Git would complain about
a type mismatch then.
Ciao,
Dscho
* Re: Git commit hash clash prevention
2008-10-02 8:53 Git commit hash clash prevention martin f krafft
2008-10-02 9:18 ` Thomas Rast
2008-10-02 10:07 ` Johannes Schindelin
@ 2008-10-02 14:00 ` Jakub Narebski
2008-10-02 15:39 ` Johannes Schindelin
2008-10-02 16:04 ` Stephan Beyer
3 siblings, 1 reply; 7+ messages in thread
From: Jakub Narebski @ 2008-10-02 14:00 UTC (permalink / raw)
To: martin f krafft; +Cc: git discussion list
martin f krafft <madduck@madduck.net> writes:
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1. I realise that the likelihood of this
> happening is about as high as the chance of <insert witty joke
> here>, but it *is* possible, isn't it? Even though this is thus
> somewhat academic, I am still very curious about it.
>
> What happens when David now pulls from Jane? How does Git deal with
> this?
Cannot happen in practice.
But just in case: Git trusts the object it already has in the repository
over an object that was just fetched (or pushed).
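The "keep what you already have" behaviour can be illustrated with a toy content-addressed store. This is not Git's code: `ToyStore` and its deliberately weak one-byte hash are assumptions made purely so that a collision is easy to produce.

```python
import hashlib

def weak_id(content: bytes) -> str:
    """Toy hash: only the first byte of SHA-1, so collisions are easy to find."""
    return hashlib.sha1(content).hexdigest()[:2]

class ToyStore:
    """Toy object store that never overwrites an object it already has."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = weak_id(content)
        # First writer wins: an incoming object with a known ID is dropped.
        self.objects.setdefault(oid, content)
        return oid

# David's repository already holds his commit.
david = ToyStore()
davids_commit = b"commit by David"
oid = david.add(davids_commit)

# Search for different content with the same toy ID (Jane's colliding commit).
janes_commit = next(
    c for c in (f"commit by Jane #{i}".encode() for i in range(100_000))
    if weak_id(c) == oid
)

# "Pulling" Jane's object leaves David's object in place under that ID.
david.add(janes_commit)
print(david.objects[oid] == davids_commit)
```

In this model anything that refers to Jane's object by ID silently resolves to David's, which is the confusion described earlier in the thread.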
--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: Git commit hash clash prevention
2008-10-02 14:00 ` Jakub Narebski
@ 2008-10-02 15:39 ` Johannes Schindelin
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2008-10-02 15:39 UTC (permalink / raw)
To: Jakub Narebski; +Cc: martin f krafft, git discussion list
Hi,
On Thu, 2 Oct 2008, Jakub Narebski wrote:
> martin f krafft <madduck@madduck.net> writes:
>
> > the other day during a workshop on Git, one of the attendants asked
> > about the scenario when two developers, Jane and David, both working
> > on the same project, both create a commit and the two just so happen
> > to have the same SHA-1. I realise that the likelihood of this
> > happening is about as high as the chance of <insert witty joke
> > here>, but it *is* possible, isn't it? Even though this is thus
> > somewhat academic, I am still very curious about it.
> >
> > What happens when David now pulls from Jane? How does Git deal with
> > this?
>
> Cannot happen in practice.
>
> But just in case: Git trusts the object it already has in the repository
> over an object that was just fetched (or pushed).
Oh, maybe the most important part: both David and Jane would have to
rewrite their respective history, changing the respective commits in a
simple way (such as adding a space to the first line of the commit message
or some such). Then, Git is changed to not accept that particular SHA-1
(we'd introduce a "blacklist").
All in all, it would be like a borked commit; not really easy to fix, but
the world would not stop turning because of it.
Ciao,
Dscho
* Re: Git commit hash clash prevention
2008-10-02 8:53 Git commit hash clash prevention martin f krafft
` (2 preceding siblings ...)
2008-10-02 14:00 ` Jakub Narebski
@ 2008-10-02 16:04 ` Stephan Beyer
3 siblings, 0 replies; 7+ messages in thread
From: Stephan Beyer @ 2008-10-02 16:04 UTC (permalink / raw)
To: martin f krafft; +Cc: git discussion list
Hi,
martin f krafft wrote:
> Hi folks,
>
> the other day during a workshop on Git, one of the attendants asked
> about the scenario when two developers, Jane and David, both working
> on the same project, both create a commit and the two just so happen
> to have the same SHA-1.
Changing the committer time is the easiest way to solve this problem,
if it ever happens.
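Why that works: the committer line, timestamp included, is part of the bytes that get hashed, so moving it by one second yields an unrelated SHA-1. A minimal Python sketch, in which the helper `commit_id`, the identities and the timestamps are all made up for illustration (the tree is the well-known empty tree):

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(committer_time: int) -> str:
    """Hash a minimal root commit object whose committer time is given."""
    body = (
        f"tree {EMPTY_TREE}\n"
        "author Jane Example <jane@example.com> 1222937580 +0000\n"
        f"committer Jane Example <jane@example.com> {committer_time} +0000\n"
        "\n"
        "Initial commit\n"
    ).encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

before = commit_id(1222937580)
after = commit_id(1222937581)   # committer time moved by one second
print(before)
print(after)
```

Every commit built on top of the changed one gets a new SHA-1 as well, since each child records its parent's ID.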
I have wondered how Git would behave if there are two files that are
not equal but have the same SHA-1. But I haven't found any such example
files to test this scenario and have not had the time to write or
look for a tool that generates them. (MD5 collisions can be generated
within 2 hours on usual home hardware and even Wikipedia links to
collided files. An intelligent search for SHA-1 collisions takes about
2^63 evaluations, not the 2^80 of a simple birthday attack.
So it should be possible to find some random collisions and test the
behavior...)
But even if git turns out to be terribly useless in such situations, it
does not make any sense to guard against them, because in practice
they just do not happen. (And I think such guards will just slow git
down in the usual case.)
Regards,
Stephan
--
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F