* About git and the use of SHA-1
From: Henrik Austad @ 2008-04-28 16:29 UTC
To: git
Hi list!
As far as I have gathered, the SHA-1 sum is used as an identifier for commits,
and that is the primary reason for using SHA-1. However, several places
(including the Google tech talk featuring Linus himself) state that the IDs
are cryptographically secure.
As discussed in [1], SHA-1 is not as secure as it once was (and this was in
2005), and I'm wondering: are there any plans for migrating to another
hash algorithm, e.g. SHA-2 or Whirlpool?
[1] http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
--
mvh Henrik Austad
* Re: About git and the use of SHA-1
From: Daniel Barkalow @ 2008-04-28 19:34 UTC
To: Henrik Austad; +Cc: git
On Mon, 28 Apr 2008, Henrik Austad wrote:
> Hi list!
>
> As far as I have gathered, the SHA-1 sum is used as an identifier for commits,
> and that is the primary reason for using SHA-1. However, several places
> (including the Google tech talk featuring Linus himself) state that the IDs
> are cryptographically secure.
>
> As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> 2005), and I'm wondering: are there any plans for migrating to another
> hash algorithm, e.g. SHA-2 or Whirlpool?
No. The cryptographic security we care about is that it's impractical to
come up with another set of content that hashes to the same value as a
given set of content. The known attacks on SHA-1 (and more broken earlier
hashes in the same general class) only allow the attacker to produce two
files that will collide. Now, it's true that this would allow somebody to
produce a commit where some people see the "good" blob and some people see
the "evil" blob, but (a) the "good" blob contains some large chunk of
random data, which is a major red flag by itself, and (b) all of these
people have to be taking data from the attacker.
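To make the collision versus second-preimage distinction concrete, here is a toy sketch (not git code; `truncated_sha1` and `find_collision` are hypothetical helpers). It weakens SHA-1 by keeping only its top few bits, then runs a birthday search in which the attacker picks *both* inputs. A collision typically appears after roughly 2^(bits/2) hashes, far fewer than matching one given file would take.

```python
import hashlib
import itertools


def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data): a deliberately weakened toy hash."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Birthday search: the attacker chooses *both* inputs.

    Returns two different inputs with the same truncated hash and the number
    of hashes computed (on the order of 2 ** (bits / 2)).
    """
    seen: dict[int, bytes] = {}
    for n in itertools.count():
        candidate = b"attacker-chosen %d" % n
        h = truncated_sha1(candidate, bits)
        if h in seen:
            return seen[h], candidate, n + 1
        seen[h] = candidate


good, evil, tries = find_collision(bits=32)
print(f"32-bit collision after {tries} hashes: {good!r} / {evil!r}")
```

Because the attacker controls both inputs, this is exactly the "good blob / evil blob" situation described above; it does nothing to match a file somebody else already published.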
If somebody gives you some source, and it's got some large random chunk in
it, and the behavior of the object depends on the content of this chunk,
and it's unspecified where this chunk comes from, you should be aware
that they might be able to swap this chunk for a different chunk. But such
a file is pretty blatantly malicious anyway.
-Daniel
*This .sig left intentionally blank*
* Re: About git and the use of SHA-1
From: Henrik Austad @ 2008-04-28 21:29 UTC
To: Daniel Barkalow; +Cc: git
On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
> No. The cryptographic security we care about is that it's impractical to
> come up with another set of content that hashes to the same value as a
> given set of content. The known attacks on SHA-1 (and more broken earlier
> hashes in the same general class) only allow the attacker to produce two
> files that will collide. Now, it's true that this would allow somebody to
> produce a commit where some people see the "good" blob and some people see
> the "evil" blob, but (a) the "good" blob contains some large chunk of
> random data, which is a major red flag by itself, and (b) all of these
> people have to be taking data from the attacker.
Yes, I see that point, but I was thinking more along the lines of:
1) clone the repo
2) add malicious code
3) add a huge comment block, #ifdef block, etc. somewhere obscure in the code
and keep adding random data until the hash matches a well-known release.
4) publish the repo or, even worse, change the central repo
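Step 3 is a second-preimage search: the target hash is fixed by someone else, so the birthday trick does not help. A toy sketch on SHA-1 truncated to 16 bits (hypothetical helpers, not git code) that keeps appending a numbered comment until the hashes match. Expected work is about 2^bits tries, so every extra bit doubles it, and the full 160-bit SHA-1 would need on the order of 2^160.

```python
import hashlib
import itertools


def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data): a deliberately weakened toy hash."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)


def pad_until_match(target: bytes, malicious: bytes, bits: int) -> tuple[bytes, int]:
    """Append a numbered comment to `malicious` until its truncated hash equals
    that of `target`; returns the forged file and the number of attempts."""
    goal = truncated_sha1(target, bits)
    for n in itertools.count():
        candidate = malicious + b"/* %d */\n" % n
        if truncated_sha1(candidate, bits) == goal:
            return candidate, n + 1


release = b"int main(void) { return 0; }\n"
forged, tries = pad_until_match(release, b"int main(void) { return 1; }\n", bits=16)
print(f"16-bit match after {tries} tries")
```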
Most users, and probably a lot of developers, never browse through the *entire*
archive looking for this, and as long as the hash checks out, why would you?
Yes, it would probably be discovered soon enough, but take the Linux kernel
as an example: if you get, say, 100 infected machines due to this, what would
this do to the reputation of the kernel?
> If somebody gives you some source, and it's got some large random chunk in
> it, and the behavior of the object depends on the content of this chunk,
> and it's unspecified where this chunk comes from, you should be aware
> that they might be able to swap this chunk for a different chunk. But such
> a file is pretty blatantly malicious anyway.
True, but this actually means you have to verify *everything*, even though the
hash checks out.
But yes, I see your point: it would most likely be infeasible to
generate a collision using this approach, and changing to another
hash function would probably not add much. Basically I was just curious and
toyed with the idea.
Thanks for the answer though :)
--
mvh Henrik Austad
* Re: About git and the use of SHA-1
From: Daniel Barkalow @ 2008-04-28 22:15 UTC
To: Henrik Austad; +Cc: git
On Mon, 28 Apr 2008, Henrik Austad wrote:
> On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
> > No. The cryptographic security we care about is that it's impractical to
> > come up with another set of content that hashes to the same value as a
> > given set of content. The known attacks on SHA-1 (and more broken earlier
> > hashes in the same general class) only allow the attacker to produce two
> > files that will collide. Now, it's true that this would allow somebody to
> > produce a commit where some people see the "good" blob and some people see
> > the "evil" blob, but (a) the "good" blob contains some large chunk of
> > random data, which is a major red flag by itself, and (b) all of these
> > people have to be taking data from the attacker.
>
> Yes, I see that point, but I was thinking more along the lines of:
>
> 1) clone the repo
> 2) add malicious code
> 3) add a huge comment block, #ifdef block, etc. somewhere obscure in the code
> and keep adding random data until the hash matches a well-known release.
> 4) publish the repo or, even worse, change the central repo
All known methods for step 3, even on hashes considered long broken, will
take until the heat death of the universe. The latest I can find is that,
if you use MD4 (which is weak enough that you can find collisions as
quickly as you can do two hashes), there's a 1-in-a-quadrillion chance
that your message is weak and somebody could find a replacement with the
same hash using known techniques. (With a plausible amount of work, an
attacker could take a file and modify it only slightly, and find a
replacement for that, but this again requires the attacker to have some
non-trivial input to what gets put in the official tree, which leaves
the attacker as the responsible party for that object.)
SHA-1 is enough stronger that the latest attacks still cannot do in years,
with currently available computing power, what can be done to MD4 in
milliseconds. So it's highly unlikely that somebody will break SHA-1 more
thoroughly than MD4 is broken any time soon.
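A back-of-the-envelope sketch of those work factors, assuming a hypothetical rate of 10^9 SHA-1 evaluations per second (real hardware varies widely). The 2^69 figure is the 2005 collision estimate discussed in [1]; 2^80 and 2^160 are the generic brute-force costs of a collision and of a second preimage for a 160-bit hash.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_for(log2_work: float, hashes_per_second: float) -> float:
    """Wall-clock years needed to evaluate 2**log2_work hashes at a fixed rate."""
    return 2.0 ** log2_work / hashes_per_second / SECONDS_PER_YEAR


RATE = 1e9  # hypothetical: one billion SHA-1 evaluations per second

for label, log2_work in [
    ("2005 collision attack, as quoted in [1]", 69),
    ("generic collision (birthday bound)", 80),
    ("generic second preimage", 160),
]:
    print(f"{label:42s} 2^{log2_work}: {years_for(log2_work, RATE):.3g} years")
```

Henrik's step 3 sits in the last row, which is why even a much faster machine does not change the picture.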
> > If somebody gives you some source, and it's got some large random chunk in
> > it, and the behavior of the object depends on the content of this chunk,
> > and it's unspecified where this chunk comes from, you should be aware
> > that they might be able to swap this chunk for a different chunk. But such
> > a file is pretty blatantly malicious anyway.
>
> True, but this actually means you have to verify *everything*, even though the
> hash checks out.
If you don't verify *everything* when the hash checks out, the attacker
will just send you a properly-constructed commit with a back door in the
code. While you're looking for directly-inserted security holes in the
code, you can probably notice if there's some big hunk of line noise in a
comment that might make the file vulnerable to replacement.
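Spotting such a hunk can be partly automated. A hedged sketch (the helper names and the 6-bits-per-byte threshold are assumptions, not anything git provides): it slides a window over a file and flags windows whose byte entropy is higher than ordinary text reaches. Collision blocks are bytes the attacker cannot really choose, so they tend to look like random data, while source code draws on a small alphabet.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def suspicious_windows(data: bytes, window: int = 256, step: int = 128,
                       threshold: float = 6.0) -> list[int]:
    """Start offsets of windows whose entropy exceeds `threshold`.

    A window built from at most 64 distinct byte values cannot exceed 6 bits
    per byte, so ordinary source rarely trips this; random bytes score ~7.
    """
    last_start = max(len(data) - window, 0)
    return [start for start in range(0, last_start + 1, step)
            if shannon_entropy(data[start:start + window]) > threshold]


source = b"int main(void) { return 0; }\n" * 40
noise = random.Random(0).randbytes(512)  # stand-in for an opaque collision block
print(suspicious_windows(source))                   # []
print(suspicious_windows(source + noise + source))  # offsets covering the noise
```

This only points a reviewer at where to look; it does not replace reading the code.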
-Daniel
* Re: About git and the use of SHA-1
From: Andreas Ericsson @ 2008-04-29 6:38 UTC
To: Henrik Austad; +Cc: Daniel Barkalow, git
Henrik Austad wrote:
> On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
>> No. The cryptographic security we care about is that it's impractical to
>> come up with another set of content that hashes to the same value as a
>> given set of content. The known attacks on SHA-1 (and more broken earlier
>> hashes in the same general class) only allow the attacker to produce two
>> files that will collide. Now, it's true that this would allow somebody to
>> produce a commit where some people see the "good" blob and some people see
>> the "evil" blob, but (a) the "good" blob contains some large chunk of
>> random data, which is a major red flag by itself, and (b) all of these
>> people have to be taking data from the attacker.
>
> Yes, I see that point, but I was thinking more along the lines of:
>
> 1) clone the repo
> 2) add malicious code
> 3) add a huge comment block, #ifdef block, etc. somewhere obscure in the code
> and keep adding random data until the hash matches a well-known release.
> 4) publish the repo or, even worse, change the central repo
>
This depends greatly on git accepting objects with a colliding object name,
which it doesn't. Once you have an object with a particular SHA-1, it will
never get overwritten, ever, as git will believe it's about to do unnecessary
work. As such, you'd still have to create a new object, hashing to a new SHA-1,
and get that new object added to the kernel.
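The first-writer-wins rule described above can be modelled in a few lines. This is a toy, not git's implementation: it hashes raw bytes (git hashes a type-and-length header plus the content) and keeps objects in a dict. `ObjectStore` is a hypothetical name, and the "collision" is faked with a deliberately bad hash function.

```python
import hashlib
from typing import Callable, Optional


class ObjectStore:
    """Toy content-addressed store: names are hashes of content, and an
    existing name is never rewritten (the first object stored wins)."""

    def __init__(self, hash_fn: Optional[Callable[[bytes], str]] = None):
        self._hash = hash_fn or (lambda data: hashlib.sha1(data).hexdigest())
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        name = self._hash(data)
        # If the name is already present, assume the content is the same and
        # skip the write, so a later colliding object is silently dropped.
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]


# Simulate a collision with a deliberately terrible hash: the first byte only.
weak = ObjectStore(hash_fn=lambda data: data[:1].hex())
first = weak.put(b"good: return 0;")
second = weak.put(b"gone: return 1;")  # also starts with b"g": same name
print(first == second, weak.get(first))  # True b'good: return 0;'
```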
I think perhaps Andrew Morton and a few other "high brass" among the kernel
hackers can get away with pushing crud like that to Linus' public tree
(which is the de facto master copy of published kernel sources), but random
John Does such as you and me wouldn't stand a chance, as our patches would
get reviewed by someone who, at the end of the day, makes a living coding
Linux.
> Most users, and probably a lot of developers, never browse through the *entire*
> archive looking for this, and as long as the hash checks out, why would you?
> Yes, it would probably be discovered soon enough, but take the Linux kernel
> as an example: if you get, say, 100 infected machines due to this, what would
> this do to the reputation of the kernel?
>
That depends. If the source of it was Linus' public tree, that would not be
very good at all. If the source was a random tarball off a random web page
or FTP site (which is equivalent to fetching an unchecked git repository
and using it unverified), I doubt it would matter much.
>
>> If somebody gives you some source, and it's got some large random chunk in
>> it, and the behavior of the object depends on the content of this chunk,
>> and it's unspecified where this chunk comes from, you should be aware
>> that they might be able to swap this chunk for a different chunk. But such
>> a file is pretty blatantly malicious anyway.
>
> True, but this actually means you have to verify *everything*, even though the
> hash checks out.
>
Not really. What you need to verify is that
a) you cloned from somewhere you trust (e.g. kernel.org), and
b) the SHA-1 of the commit you want to build from matches the SHA-1 of the same
commit in the repository you originally cloned from.
Colliding objects can never enter a repository. Git is lazy and will reuse the
already existing colliding object with the same name instead.
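Point b) is enough because names are chained: a commit's name covers its tree's name, which covers every file's content hash, so checking one commit SHA-1 checks the whole snapshot and its history, provided the hash itself holds. A simplified sketch with a flat directory and a made-up text encoding (git's real tree and commit formats differ; the helper names are hypothetical):

```python
import hashlib
from typing import Optional


def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def tree_id(files: dict[str, bytes]) -> str:
    """Name a flat directory by hashing a listing of (file name, content hash)."""
    listing = "".join(f"{name} {h(content)}\n" for name, content in sorted(files.items()))
    return h(listing.encode())


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Name a commit by hashing its tree name, its parent's name and its message."""
    header = f"tree {tree}\n" + (f"parent {parent}\n" if parent else "")
    return h((header + "\n" + message).encode())


files = {"main.c": b"int main(void) { return 0; }\n", "README": b"hello\n"}
trusted = commit_id(tree_id(files), None, "release 1.0")

# Change one byte in one file: the tree name changes, so the commit name does too.
tampered = {**files, "main.c": b"int main(void) { return 1; }\n"}
print(commit_id(tree_id(tampered), None, "release 1.0") == trusted)  # False
```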
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: About git and the use of SHA-1
From: Russ Dill @ 2008-04-29 7:09 UTC
To: Andreas Ericsson; +Cc: Henrik Austad, Daniel Barkalow, git
> Colliding objects can never enter a repository. Git is lazy and will reuse the
> already existing colliding object with the same name instead.
>
I think you are missing the point. One of the pluses behind originally
using SHA-1 and the signed tags is that the system as a whole is
cryptographically secure. You can verify from the public key of
whoever made the tag that yes, this really is the source and history
they tagged. Not only can DNS attacks fool users into thinking that they
are really connecting to kernel.org (or whatever other server they expect
to be connecting to), but the server itself may also be hacked and
objects replaced.
I'm just not sure how much time it would take to find a collision.
* Re: About git and the use of SHA-1
From: Andreas Ericsson @ 2008-04-29 7:21 UTC
To: Russ Dill; +Cc: Henrik Austad, Daniel Barkalow, git
Russ Dill wrote:
>> Colliding objects can never enter a repository. Git is lazy and will reuse the
>> already existing colliding object with the same name instead.
>>
>
> I think you are missing the point. One of the pluses behind originally
> using SHA-1 and the signed tags is that the system as a whole is
> cryptographically secure. You can verify from the public key of
> whoever made the tag that yes, this really is the source and history
> they tagged. Not only can DNS attacks fool users into thinking that they
> are really connecting to kernel.org (or whatever other server they expect
> to be connecting to), but the server itself may also be hacked and
> objects replaced.
>
If the server is hacked and objects are replaced, they will either
no longer match their cryptographic signature, meaning they'll be
new objects or git will determine that they are corrupt, or they
*will* match an existing object, but then that object won't be
propagated to other repositories, since git refuses to overwrite
already existing objects. Either way, git's refusal to overwrite
objects it already has plays a part in making malicious actions
futile, since malicious code is only worth something if it's
propagated and actually used.
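The first case, a replaced object that no longer matches its name, is something any client can check by rehashing. A minimal sketch assuming objects are kept as a name-to-bytes mapping and hashed directly (git also hashes a type-and-length header, and `git fsck` performs a check along these lines); `find_corrupt` is a hypothetical helper.

```python
import hashlib


def find_corrupt(objects: dict[str, bytes]) -> list[str]:
    """Names whose stored bytes no longer hash to the name they are filed under."""
    return [name for name, data in objects.items()
            if hashlib.sha1(data).hexdigest() != name]


good = b"int main(void) { return 0; }\n"
name = hashlib.sha1(good).hexdigest()
objects = {name: good}

# An intruder overwrites the content on the server but keeps the old name,
# because a new name would not be referenced by any existing tree or commit.
objects[name] = b"int main(void) { return 1; }\n"
print(find_corrupt(objects) == [name])  # True
```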
> I'm just not sure how much time it would take to find a collision.
Even crypto-experts are arguing about that, so I'm not surprised.
* Re: About git and the use of SHA-1
From: Sverre Rabbelier @ 2008-04-29 11:05 UTC
To: Andreas Ericsson; +Cc: Russ Dill, Henrik Austad, Daniel Barkalow, git
On Tue, Apr 29, 2008 at 9:21 AM, Andreas Ericsson <ae@op5.se> wrote:
> Russ Dill wrote:
> If the server is hacked and objects are replaced, they will either
> no longer match their cryptographic signature, meaning they'll be
> new objects or git will determine that they are corrupt, or they
We were assuming here that once SHA-1 is broken, really determined
hackers will be able to come up with objects that -do- match the
SHA-1, so the above is not relevant.
> *will* match an existing object, but then that object won't be
> propagated to other repositories since git refuses to overwrite
> already existing objects. [...]
What about new users cloning the repo? They're just out of luck? I
don't think this argument holds: if we want to 'advertise' that git is
cryptographically secure, we can do so only as long as our hashing
algorithm is. (As such, should SHA-1 ever be fully broken, we'd need to
either switch to another algorithm or stop advertising git as
cryptographically secure.)
> [...] Either way, git's refusal to overwrite
> objects it already has plays a part in making malicious actions
> futile, since malicious code is only worth something if it's
> propagated and actually used.
Of course this is true: it makes it a lot harder to do damage, but it
doesn't eliminate the problem; it's just free 'extra protection'.
Yes, malicious code is only worth something if it's propagated and
actually used; no, it is not impossible to do so in git if/when SHA-1
turns out to have collisions every other file.
--
Cheers,
Sverre Rabbelier
* Re: About git and the use of SHA-1
From: Andreas Ericsson @ 2008-04-29 12:27 UTC
To: sverre; +Cc: Russ Dill, Henrik Austad, Daniel Barkalow, git
Sverre Rabbelier wrote:
> On Tue, Apr 29, 2008 at 9:21 AM, Andreas Ericsson <ae@op5.se> wrote:
>> Russ Dill wrote:
>> If the server is hacked and objects are replaced, they will either
>> no longer match their cryptographic signature, meaning they'll be
>> new objects or git will determine that they are corrupt, or they
>
> We were assuming here that once SHA-1 is broken, really determined
> hackers will be able to come up with objects that -do- match the
> SHA-1, so the above is not relevant.
>
>> *will* match an existing object, but then that object won't be
>> propagated to other repositories since git refuses to overwrite
>> already existing objects. [...]
>
> What about new users cloning the repo? They're just out of luck?
Only until someone who's already cloned the repository fetches
from it, at which point the collision will be detected.
> I
> don't think this argument holds: if we want to 'advertise' that git is
> cryptographically secure, we can do so only as long as our hashing
> algorithm is. (As such, should SHA-1 ever be fully broken, we'd need to
> either switch to another algorithm or stop advertising git as
> cryptographically secure.)
>
True. So far, though, the only successful attacks require that the
attacker be allowed to create both of the colliding data sets, and none
has been found that would let the attacker follow any kind of syntactic
rules whatsoever, so from a practical point of view SHA-1 is 100% secure
*for source code*.
From a theoretical point of view, no hash is 100% secure, so changing
algorithm buys us nothing.
Besides, "cryptographically secure" is not the same as "will never ever
be broken", because all hashes are obviously susceptible to brute-force
attacks. "Cryptographically secure" means, insofar as I've understood it,
that given a source file and its hash, it would take such an extremely
long time to find a different data set that hashes to the same value that
the result is unusable because the original source is obsolete.
That is why legal documents are always signed with the "most secure"
(or rather, "least insecure") of all available hashes. For our
purposes, SHA-1 suffices until someone comes up with a relatively
trivial way of creating a collision within the parameters above.
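For reference, the textbook security notions for an ideal n-bit hash H and their generic brute-force costs are summarized below (SHA-1 has n = 160). The definition given above corresponds to second-preimage resistance; the 2005 attack in [1] lowers only the collision cost, to about 2^69.

```latex
\begin{array}{lll}
\text{preimage:} & \text{given } y, \text{ find } x \text{ with } H(x) = y & \approx 2^{n} = 2^{160} \\
\text{second preimage:} & \text{given } x, \text{ find } x' \neq x \text{ with } H(x') = H(x) & \approx 2^{n} = 2^{160} \\
\text{collision:} & \text{find any } x \neq x' \text{ with } H(x) = H(x') & \approx 2^{n/2} = 2^{80}
\end{array}
```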
>> [...] Either way, git's refusal to overwrite
>> objects it already has plays a part in making malicious actions
>> futile, since malicious code is only worth something if it's
>> propagated and actually used.
>
> Of course this is true: it makes it a lot harder to do damage, but it
> doesn't eliminate the problem; it's just free 'extra protection'.
> Yes, malicious code is only worth something if it's propagated and
> actually used; no, it is not impossible to do so in git if/when SHA-1
> turns out to have collisions every other file.
>
Points of fact so far:
* It is possible to create objects with colliding names (SHA-1 hash keys).
This holds true whichever algorithm we use, although it will be more
difficult with a stronger algorithm.
* It is impossible to distribute the colliding content to already cloned
repositories. This also holds true for all hash algorithms.
I've been arguing that the value of the first point is so greatly
diminished by the second that, even if SHA-1 turns out to be horribly
broken, projects using git will still have decent protection against
malicious code entering the repository without the knowledge of one of
the authors.
You've been arguing that SHA-1 is not theoretically secure, which is
obviously true since no hash is theoretically secure.
I can think of one way to make git a lot more resilient to hash
collisions, regardless of which hash is used, namely: Add the length
of the hashed object to the hash.
In order for an evil-minded hacker to succeed in doing any real harm,
he/she now has to create a conflicting file which is valid for its
type (be it C, PHP, JPEG, AVI, PDF or whatever) and is also the same
length as the original source, without being allowed to create the
original object.
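For what it's worth, git's object format already mixes the length in: a blob's name is the SHA-1 of a short header (the word `blob`, a space, the size in decimal, a NUL byte) followed by the content. A minimal sketch of that computation (the function name is ours):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>', a NUL byte, then the content itself."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty blob
```

The result can be compared with `git hash-object --stdin`. Since the known collision attacks produce equal-length blocks, the length in the header does not by itself rule them out.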
* Re: About git and the use of SHA-1
From: Dmitry Potapov @ 2008-04-29 12:41 UTC
To: Henrik Austad; +Cc: git
On Mon, Apr 28, 2008 at 06:29:07PM +0200, Henrik Austad wrote:
>
> As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> 2005), and I'm wondering - are there any plans for migrating to another
> hash-algorithm? I.e. SHA-2, whirlpool..
SHA-1 is broken in the sense that finding a collision requires less
computation than brute force (2^80). It is still very costly, and
AFAIK no one has found a single SHA-1 collision yet, but even if
such a collision is found, the question is how it can be exploited.
This collision cannot be used to replace any existing code in Git. The
only way to exploit it is to submit to the maintainer a patch based on one
sequence, which must look legitimate enough to be accepted, and then
create another blob with malicious code based on the other sequence, so
that the second blob has the same SHA-1; then anyone who pulls from you
will get the malicious code.
However, it is tricky to create these two blobs: one which should pass
inspection and look like a real improvement, and another one that does
what you want. All you have is two short byte sequences with the same
SHA-1, and you have no control over them. For some binary files, it is
possible by including both good and bad contents in the submitted blob
and using one sequence in the right place to hide the bad part and make
only the good one active/visible. Then the other blob will be almost the
same but contain the other sequence, which is used to activate the bad
part. This can work if the maintainer cannot see everything but only the
"visible" part. However, I don't think you can do anything like that with
_source_ code, which is inspected. And if submitted code is not reviewed,
there is nothing that can protect you from malicious code getting into
the repository (and even worse, it will get directly into the official
repository!).
So, I don't think we have to worry much about the possibility of a
collision attack, only about preimage attacks, and a preimage attack on
SHA-1 is far from reality.
Dmitry
* Re: About git and the use of SHA-1
From: Jurko Gospodnetić @ 2008-04-29 12:46 UTC
To: git; +Cc: Andreas Ericsson, Henrik Austad, Daniel Barkalow, git
> I think you are missing the point. One of the pluses behind originally
> using SHA-1 and the signed tags is that the system as a whole is
> cryptographically secure. You can verify from the public key of
> whoever made the tag that yes, this really is the source and history
> they tagged.
I am not really sure I follow this... how can you 'verify from the
public key of whoever made the tag' that the SHA-1 hash is correct?
SHA-1 does not have anything to do with any externally provided keys,
or have I managed to get something confused here?
Best regards,
Jurko Gospodnetić
* Re: About git and the use of SHA-1
From: Paolo Bonzini @ 2008-04-29 13:05 UTC
To: git; +Cc: sverre, Russ Dill, Henrik Austad, Daniel Barkalow, git
> I can think of one way to make git a lot more resilient to hash
> collisions, regardless of which hash is used, namely: Add the length
> of the hashed object to the hash.
Not really, because most attacks are about collisions, not second
preimages. They produce two 64-byte blocks (hence, same length) with
the same hash value.
As such, they allow one to change a blob that *the attacker* injected in the
repository. The way the more "spectacular" attacks are devised requires
a "language" with conditional expressions; for documents, for example,
PostScript is used. If you prepare a PostScript file whose code is
    if (AAAA == BBBB)
        typeset document 1
    else
        typeset document 2
where AAAA and BBBB are collisions, and you change it to "if (BBBB ==
BBBB)", the hash will be the same, but the outcome will be document 1
instead of document 2.
The fact that this requires having the two "behaviors" in the blob is
not a big deal: for source code, going down the wrong branch of an "if"
can be an attack. On the other hand, it makes adding the length useless for
collision attacks. True, it wouldn't be useless for second-preimage
attacks, but SHA-1 is still secure with respect to those.
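The construction above can be reproduced on a toy hash. SHA-1 is a Merkle-Damgård design: it feeds the message block by block through a compression function, so two blocks that collide from the same internal state keep colliding whatever follows them. The sketch below is a toy with a 16-bit state and no length padding (real padding would be identical for two equal-length files); all names are hypothetical. It finds such a pair A, B right after the prefix `if (` and builds the two programs from the example.

```python
import hashlib
import itertools

BLOCK = 8        # bytes per block in this toy
STATE_BYTES = 2  # 16-bit chaining value: deliberately tiny so collisions are cheap
IV = b"\x00" * STATE_BYTES


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function built from SHA-1 and truncated to 16 bits."""
    return hashlib.sha1(state + block).digest()[:STATE_BYTES]


def toy_hash(data: bytes) -> bytes:
    """Iterate `compress` over block-aligned input (Merkle-Damgard, no padding)."""
    assert len(data) % BLOCK == 0
    state = IV
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def collide_from(state: bytes) -> tuple[bytes, bytes]:
    """Birthday search for two different blocks giving the same next state."""
    seen: dict[bytes, bytes] = {}
    for n in itertools.count():
        block = n.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block


def run(doc: bytes) -> str:
    """The 'program': compare slot 1 (bytes 8-15) with slot 2 (bytes 24-31)."""
    return "document 1" if doc[8:16] == doc[24:32] else "document 2"


prefix = b"if (    "                         # block 0
A, B = collide_from(compress(IV, prefix))    # two blocks colliding at block 1
middle = b" ==     "                         # block 2
suffix = b") ...   "                         # block 4
innocent = prefix + A + middle + B + suffix  # "if (AAAA == BBBB)"
evil = prefix + B + middle + B + suffix      # "if (BBBB == BBBB)"

print(toy_hash(innocent) == toy_hash(evil))  # True
print(run(innocent), "->", run(evil))        # document 2 -> document 1
```

Note that the attacker had to author `innocent` in the first place, which is the limitation raised in the replies.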
Paolo
* Re: About git and the use of SHA-1
From: Andreas Ericsson @ 2008-04-29 14:37 UTC
To: Paolo Bonzini; +Cc: sverre, Russ Dill, Henrik Austad, Daniel Barkalow, git
Paolo Bonzini wrote:
>
>> I can think of one way to make git a lot more resilient to hash
>> collisions, regardless of which hash is used, namely: Add the length
>> of the hashed object to the hash.
>
> Not really, because most attacks are about collisions, not second
> preimages. They produce two 64-byte blocks (hence, same length) with
> the same hash value.
>
> As such, they allow one to change a blob that *the attacker* injected in the
> repository. The way the more "spectacular" attacks are devised requires
> a "language" with conditional expressions; for documents, for example,
> PostScript is used. If you prepare a PostScript file whose code is
>
>     if (AAAA == BBBB)
>         typeset document 1
>     else
>         typeset document 2
>
> where AAAA and BBBB are collisions, and you change it to "if (BBBB ==
> BBBB)", the hash will be the same, but the outcome will be document 1
> instead of document 2.
>
> The fact that this requires having the two "behaviors" in the blob is
> not a big deal: for source code, going down the wrong branch of an "if"
> can be an attack. On the other hand, it makes adding the length useless for
> collision attacks. True, it wouldn't be useless for second-preimage
> attacks, but SHA-1 is still secure with respect to those.
>
So what you're saying is that if someone owns a repository and adds a
file to it, he can then replace his entire repository with an identical
one where the good file is replaced with a bad one, and this will affect
people who clone *after* the file gets replaced.
Gee, that's one fiendishly large attack vector, quite apart from the
fact that said author first has to come up with a program that gets
widespread enough that a lot of people all of a sudden want to use
it, but not so widespread that anyone would want to review it before
using it.
I remain unconvinced as to whether or not SHA1 is, for all practical
purposes, cryptographically secure for git's uses. Sure, evil programmers
can screw you over if you use their software without reviewing it, but
that's hardly due to git using a particular cryptographic algorithm.
On the other hand, I'm not familiar enough with the nomenclature to say with 100%
certainty what's cryptographically secure and what isn't. I just know
that there are no collision-less hashes, so whatever "cryptographically
secure" really means with respect to hashes, "100% collision-free" isn't it.
* Re: About git and the use of SHA-1
From: Andreas Ericsson @ 2008-04-29 14:41 UTC
To: Dmitry Potapov; +Cc: Henrik Austad, git
Dmitry Potapov wrote:
> On Mon, Apr 28, 2008 at 06:29:07PM +0200, Henrik Austad wrote:
>> As discussed in [1], SHA-1 is not as secure as it once was (and this was in
>> 2005), and I'm wondering - are there any plans for migrating to another
>> hash-algorithm? I.e. SHA-2, whirlpool..
>
> SHA-1 is broken in the sense that finding a collision requires less
> computation than brute force (2^80). It is still very costly, and
> AFAIK no one has found a single collision for SHA-1 yet, but even if
> such a collision is found, the question is how it could be exploited.
>
> This collision cannot be used to replace any existing code in Git. The
> only way to exploit it is to submit a patch built around one sequence
> to the maintainer, legitimate-looking enough to be accepted, and then
> create another blob with malicious code built around the other
> sequence. Since the second blob has the same SHA-1, anyone who pulls
> from you will get the malicious code.
>
But they won't, because it's impossible to add two objects with the same
SHA1 hash key to a git repository, since it will lazily re-use the
existing one. In practice, this means that in the case of an "innocent"
hash-collision, git will actually break by refusing to store the new
content.
> However, it is tricky to create these two blobs: one which should pass
> inspection and look like a real improvement, and another that does
> what you want. All you have is two sequences of 20 bytes with the
> same SHA-1, and you have no control over them. For some binary
> files, it is possible by including both good and bad contents in the
> submitted blob and using one sequence in the right place to hide the bad
> part and make only the good one active/visible. Then the other blob will
> be almost the same but contain the other sequence, which is used to
> activate the bad part. This can work if the maintainer cannot see
> everything but only the "visible" part. However, I don't think you can
> do anything like that with _source_ code, which is inspected. And if
> submitted code is not reviewed, there is nothing that can protect you
> from malicious code getting into the repository (and, even worse, it will
> get directly into the official repository!).
>
> So, I don't think we have to worry much about the possibility of a
> collision attack, but only about preimage attacks; and a preimage attack
> on SHA-1 is far away from reality.
>
Right.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
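The "lazy re-use" behaviour Andreas describes can be modelled with a toy content-addressed store: writing an object whose id already exists keeps the existing bytes and silently drops the new ones. The sketch below is hypothetical (ToyObjectStore and its pluggable hash_fn are not Git code), and it uses a deliberately broken "hash" that only looks at the length so that a collision can actually be demonstrated.

```python
import hashlib
from typing import Callable, Dict


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class ToyObjectStore:
    """Hypothetical content-addressed store: the first object stored under an id wins."""

    def __init__(self, hash_fn: Callable[[bytes], str] = sha1_hex) -> None:
        self.hash_fn = hash_fn
        self.objects: Dict[str, bytes] = {}

    def write(self, data: bytes) -> str:
        oid = self.hash_fn(data)
        # setdefault stores `data` only if `oid` is new; an existing object is kept.
        self.objects.setdefault(oid, data)
        return oid

    def read(self, oid: str) -> bytes:
        return self.objects[oid]


# A deliberately broken "hash" (the length) so that two different contents collide.
store = ToyObjectStore(hash_fn=lambda data: str(len(data)))
first = store.write(b"good")
second = store.write(b"evil")
print(first == second, store.read(second))  # True b'good'
```

Whichever copy reaches a given repository first is the one that stays there, which is why the attack scenarios in this thread need the victim to fetch from a repository that already holds the attacker's copy.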
* Re: About git and the use of SHA-1
2008-04-29 14:37 ` Andreas Ericsson
@ 2008-04-29 14:52 ` Paolo Bonzini
2008-04-29 16:24 ` Russ Dill
1 sibling, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2008-04-29 14:52 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: sverre, Russ Dill, Henrik Austad, Daniel Barkalow, git
> So what you're saying is that if someone owns a repository and adds a
> file to it, he can then replace his entire repository with an identical
> one where the good file is replaced with a bad one, and this will affect
> people who clone *after* the file gets replaced.
>
> Gee, that's one fiendishly large attack vector
I agree (with the irony).
Paolo
* Re: About git and the use of SHA-1
2008-04-28 16:29 About git and the use of SHA-1 Henrik Austad
2008-04-28 19:34 ` Daniel Barkalow
2008-04-29 12:41 ` Dmitry Potapov
@ 2008-04-29 15:02 ` Tom Widmer
2008-04-29 17:08 ` Tom Widmer
3 siblings, 0 replies; 38+ messages in thread
From: Tom Widmer @ 2008-04-29 15:02 UTC (permalink / raw)
To: git
Henrik Austad wrote:
> Hi list!
>
> As far as I have gathered, the SHA-1-sum is used as a identifier for commits,
> and that is the primary reason for using sha1. However, several places
> (including the google tech-talk featuring Linus himself) states that the id's
> are cryptographically secure.
>
> As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> 2005), and I'm wondering - are there any plans for migrating to another
> hash-algorithm? I.e. SHA-2, whirlpool..
>
> [1] http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
Why not wait until the results of:
are available. That will surely be soon enough.
Tom
* Re: About git and the use of SHA-1
2008-04-28 19:34 ` Daniel Barkalow
2008-04-28 21:29 ` Henrik Austad
@ 2008-04-29 15:34 ` Geoffrey Irving
2008-04-29 16:27 ` Daniel Barkalow
1 sibling, 1 reply; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 15:34 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Henrik Austad, git
On Mon, Apr 28, 2008 at 12:34 PM, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Mon, 28 Apr 2008, Henrik Austad wrote:
>
> > Hi list!
> >
> > As far as I have gathered, the SHA-1-sum is used as a identifier for commits,
> > and that is the primary reason for using sha1. However, several places
> > (including the google tech-talk featuring Linus himself) states that the id's
> > are cryptographically secure.
> >
> > As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> > 2005), and I'm wondering - are there any plans for migrating to another
> > hash-algorithm? I.e. SHA-2, whirlpool..
>
> No. The cryptographic security we care about is that it's impractical to
> come up with another set of content that hashes to the same value as a
> given set of content. The known attacks on SHA-1 (and more broken earlier
> hashes in the same general class) only allow the attacker to produce two
> files that will collide. Now, it's true that this would allow somebody to
> produce a commit where some people see the "good" blob and some people see
> the "evil" blob, but (a) the "good" blob contains some large chunk of
> random data, which is a major red flag by itself, and (b) all of these
> people have to be taking data from the attacker.
>
> If somebody gives you some source, and it's got some large random chunk in
> it, and the behavior of the object depends on the content of this chunk,
> and it's unspecified where this chunk comes from, you should be aware
> that they might be able to swap this chunk for a different chunk. But such
> a file is pretty blatantly malicious anyway.
This argument is invalid, since the use of git is not limited to
source code. People can and do store unreadable binary data in git, and
unless you are completely sure that no one would ever care about the
security of that data in a way that can be attacked with a single
collision, git should be secure about those as well.
For example, I just converted a 20 GB repository to git which, among
other things, contains pdf files of my tax returns. I have looked them
over, but I have not opened them in a hex editor and looked them over at
the binary level, and I don't think git should expect me to.
Incidentally, git was the only version control system I tried except
for subversion that didn't choke on that repository. Mercurial looked at
my file renames and expanded the size past 45 GB before I killed it; I
had to fix several bugs in the bazaar conversion scripts before I
realized it was just too slow; and svk turns out to be even more like
the Antichrist than subversion itself is (mirroring N repository copies
requires an N-fold increase in size).
Geoffrey
* Re: About git and the use of SHA-1
2008-04-29 14:41 ` Andreas Ericsson
@ 2008-04-29 15:42 ` Nicolas Pitre
2008-04-29 15:59 ` Geoffrey Irving
0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Pitre @ 2008-04-29 15:42 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Dmitry Potapov, Henrik Austad, git
On Tue, 29 Apr 2008, Andreas Ericsson wrote:
> But they won't, because it's impossible to add two objects with the same
> SHA1 hash key to a git repository, since it will lazily re-use the
> existing one. In practice, this means that in the case of an "innocent"
> hash-collision, git will actually break by refusing to store the new
> content.
I'd also like to point out that Git usually receives "untrusted" new
objects via the Git protocol through 'git index-pack'. If you look at
sha1_object() in index-pack.c, you'll see that active verification
against hash collision is performed, and the fetch will abruptly be
aborted if ever that happens.
Yes, writing a test case for this was tricky. :-)
Nicolas
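Nicolas's point is that the receiving side does not just re-use an existing id; it compares the bytes. A minimal sketch of that check, assuming a plain dict as the object database, is below. It is modelled on the behaviour he describes for sha1_object() in index-pack.c, not copied from it; receive_object, CollisionError and weak_hash are hypothetical names, and the length-based hash exists only to force a collision.

```python
import hashlib
from typing import Callable, Dict


class CollisionError(Exception):
    """Incoming bytes hash to an id already stored with different content."""


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def receive_object(store: Dict[str, bytes], data: bytes,
                   hash_fn: Callable[[bytes], str] = sha1_hex) -> str:
    """Add an incoming object, aborting if it collides with a different stored one."""
    oid = hash_fn(data)
    existing = store.get(oid)
    if existing is not None and existing != data:
        # Same id, different bytes: refuse rather than silently keep either copy.
        raise CollisionError(f"hash collision on {oid}; aborting fetch")
    store[oid] = data
    return oid


def weak_hash(data: bytes) -> str:
    """Deliberately broken 'hash' (just the length) to force a collision."""
    return str(len(data))


db: Dict[str, bytes] = {}
receive_object(db, b"good", weak_hash)
receive_object(db, b"good", weak_hash)  # identical content is accepted
try:
    receive_object(db, b"evil", weak_hash)
except CollisionError as err:
    print("fetch aborted:", err)
```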
* Re: About git and the use of SHA-1
2008-04-29 15:42 ` Nicolas Pitre
@ 2008-04-29 15:59 ` Geoffrey Irving
2008-04-29 16:39 ` Nicolas Pitre
2008-04-29 18:17 ` Matthieu Moy
0 siblings, 2 replies; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 15:59 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andreas Ericsson, Dmitry Potapov, Henrik Austad, git
On Tue, Apr 29, 2008 at 8:42 AM, Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 29 Apr 2008, Andreas Ericsson wrote:
>
> > But they won't, because it's impossible to add two objects with the same
> > SHA1 hash key to a git repository, since it will lazily re-use the
> > existing one. In practice, this means that in the case of an "innocent"
> > hash-collision, git will actually break by refusing to store the new
> > content.
>
> I'd also like to point out that Git usually receive "untrusted" new
> objects via the Git protocol through 'git index-pack'. If you look at
> sha1_object() in index-pack.c, you'll see that active verification
> against hash collision is performed, and the fetch will abruptly be
> aborted if ever that happens.
>
> Yes, writing a test case for this was tricky. :-)
Here's the standard scenario for a hash collision attack, with
parties A, B, and C:
1. C, the malicious one, computes the standard two pdfs with matching
sha1 hashes.
2. C sends the valid pdf to B through a git commit, and B signs it with a tag.
3. C grabs the signature, and then forwards the "signed" commit to A,
but substitutes the invalid pdf with the same hash.
The fact that git will check for hash collisions within one repository
is nice, but it doesn't significantly increase the security of git
against hash collision attacks.
Geoffrey
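Geoffrey's three steps can be replayed end to end if SHA-1 is swapped for something weak enough to break on a laptop. In the sketch below, weak_id keeps only the first 16 bits of SHA-1 (a hypothetical stand-in), so a birthday search finds two colliding "documents" after a few hundred tries; the same generic bound, roughly 2^(n/2) work for an n-bit hash, is where the 2^80 figure for SHA-1 comes from. B's tag signature is stood in for by an HMAC over the object id, an assumption made only to keep the example self-contained; real tags are GPG signatures, but they likewise sign content that names objects by hash.

```python
import hashlib
import hmac


def weak_id(data: bytes) -> str:
    """Hypothetical weak object id: the first 16 bits (4 hex digits) of SHA-1."""
    return hashlib.sha1(data).hexdigest()[:4]


def find_collision() -> tuple:
    """Step 1 (attacker C): birthday search for a good/evil pair with equal ids."""
    good_by_id = {}
    for i in range(4096):
        good = b"good document, variant %d" % i
        good_by_id.setdefault(weak_id(good), good)
    for j in range(1_000_000):
        evil = b"evil document, variant %d" % j
        good = good_by_id.get(weak_id(evil))
        if good is not None:
            return good, evil
    raise RuntimeError("no collision found")


B_KEY = b"hypothetical signing key held by B"


def sign_id(oid: str) -> str:
    """Stand-in for B's tag signature: it covers only the object id."""
    return hmac.new(B_KEY, oid.encode("ascii"), hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """What A checks: does the received content's id carry B's signature?"""
    return hmac.compare_digest(sign_id(weak_id(data)), signature)


good, evil = find_collision()
tag = sign_id(weak_id(good))  # Step 2: B reviews the good file and signs its id.
print(verify(evil, tag))      # Step 3: A receives the evil file; prints True.
```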
* Re: About git and the use of SHA-1
2008-04-29 12:46 ` Jurko Gospodnetić
@ 2008-04-29 16:21 ` Russ Dill
0 siblings, 0 replies; 38+ messages in thread
From: Russ Dill @ 2008-04-29 16:21 UTC (permalink / raw)
To: Jurko Gospodnetić
Cc: Andreas Ericsson, Henrik Austad, Daniel Barkalow, git
On Tue, Apr 29, 2008 at 5:46 AM, Jurko Gospodnetić
<jurko.gospodnetic@docte.hr> wrote:
>
> > I think you are missing the point. One of the pluses behind originally
> > using SHA-1 and the signed tags is that the system as a whole is
> > cryptographically secure. You can verify from the public key of
> > whoever made the tag that yes, this really is the source and history
> > they tagged.
> >
>
> I am not really sure I follow this.... how can you 'verify from the public
> key of whoever made the tag' that the SHA-1 hash is correct!? SHA-1 does not
> have anything to do with any externally provided keys or have I managed to get
> something confused here?
>
Sorry for the confusion; it's about using the signed tag and the SHA-1s
of the parent commits, along with their associated trees and blobs, to
verify the source and history. If you can't trust the signed tag, or
all of the SHA-1s, you can't trust the source and history.
However, as many have said, I don't think there is any reason not to trust
SHA-1 in the context of source control.
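The chain Russ describes (a signed tag naming a commit, whose id covers its tree and parent, whose tree id covers the blobs) can be sketched as below. The tree and commit encodings are simplified and hypothetical: real Git trees are a binary list of mode, name and raw 20-byte ids, and real commits also carry author and committer lines. Only the "<type> <length>\0" header follows Git. What it shows is that changing any blob, even in an ancestor, changes the id that a tag signs.

```python
import hashlib
from typing import Dict, Optional


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over a Git-style "<kind> <length>\\0" header plus the payload."""
    header = f"{kind} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


def tree_id(files: Dict[str, bytes]) -> str:
    """Simplified, hypothetical tree: one "<name> <blob id>" line per file, sorted by name."""
    lines = "".join(f"{name} {object_id('blob', data)}\n"
                    for name, data in sorted(files.items()))
    return object_id("tree", lines.encode("utf-8"))


def commit_id(files: Dict[str, bytes], parent: Optional[str], message: str) -> str:
    """Simplified, hypothetical commit: names its tree, its parent (if any) and a message."""
    body = f"tree {tree_id(files)}\n"
    if parent is not None:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    return object_id("commit", body.encode("utf-8"))


root = commit_id({"README": b"hello\n"}, None, "initial")
head = commit_id({"README": b"hello\n", "a.c": b"int x;\n"}, root, "add a.c")

# Tamper with the *old* README: the root id changes, so the head id changes too.
forged_root = commit_id({"README": b"hell0\n"}, None, "initial")
forged_head = commit_id({"README": b"hello\n", "a.c": b"int x;\n"}, forged_root, "add a.c")
print(head != forged_head)  # True: a tag signing `head` no longer matches
```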
* Re: About git and the use of SHA-1
2008-04-29 14:37 ` Andreas Ericsson
2008-04-29 14:52 ` Paolo Bonzini
@ 2008-04-29 16:24 ` Russ Dill
1 sibling, 0 replies; 38+ messages in thread
From: Russ Dill @ 2008-04-29 16:24 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Paolo Bonzini, sverre, Henrik Austad, Daniel Barkalow, git
On Tue, Apr 29, 2008 at 7:37 AM, Andreas Ericsson <ae@op5.se> wrote:
>
> Paolo Bonzini wrote:
>
> >
> >
> > > I can think of one way to make git a lot more resilient to hash
> > > collisions, regardless of which hash is used, namely: Add the length
> > > of the hashed object to the hash.
> > >
> >
> > Not really, because most attacks are about collisions, not second
> > preimages. They produce two 64-byte blocks (hence, same length) with
> > the same hash value.
> >
> > As such, they allow to change a blob that *the attacker* injected in
> > the repository. The way the more "spectacular" attacks are devised
> > requires a "language" with conditional expressions -- for documents,
> > for example, Postscript is used. If you prepare a postscript file
> > whose code is
> >
> >     if (AAAA == BBBB)
> >         typeset document 1
> >     else
> >         typeset document 2
> >
> > where AAAA and BBBB are collisions, and you change it to "if (BBBB ==
> > BBBB)" the hash will be the same, but the outcome will be document 1
> > instead of document 2.
> >
> > The fact that this requires having the two "behaviors" in the blob is
> > not a big deal for source code, going in the wrong branch of an "if"
> > can be an attack. On the other hand, it makes adding the length
> > useless for collision attacks. True, it wouldn't be useless for second
> > preimage attacks, but SHA-1 is still secure with respect to those.
> >
> >
>
> So what you're saying is that if someone owns a repository and adds a
> file to it, he can then replace his entire repository with an identical
> one where the good file is replaced with a bad one, and this will affect
> people who clone *after* the file gets replaced.
>
No: if someone 0wnz a repository, not owns it (or, really, malicious
mirror owners could be in on it). Either that or some form of
redirection attack. When you download a tarball, you can check the
signed checksum that is downloadable along with it. When you clone a
repo, you depend on signed tags.
* Re: About git and the use of SHA-1
2008-04-29 15:34 ` Geoffrey Irving
@ 2008-04-29 16:27 ` Daniel Barkalow
0 siblings, 0 replies; 38+ messages in thread
From: Daniel Barkalow @ 2008-04-29 16:27 UTC (permalink / raw)
To: Geoffrey Irving; +Cc: Henrik Austad, git
On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> On Mon, Apr 28, 2008 at 12:34 PM, Daniel Barkalow <barkalow@iabervon.org> wrote:
> > On Mon, 28 Apr 2008, Henrik Austad wrote:
> >
> > > Hi list!
> > >
> > > As far as I have gathered, the SHA-1-sum is used as a identifier for commits,
> > > and that is the primary reason for using sha1. However, several places
> > > (including the google tech-talk featuring Linus himself) states that the id's
> > > are cryptographically secure.
> > >
> > > As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> > > 2005), and I'm wondering - are there any plans for migrating to another
> > > hash-algorithm? I.e. SHA-2, whirlpool..
> >
> > No. The cryptographic security we care about is that it's impractical to
> > come up with another set of content that hashes to the same value as a
> > given set of content. The known attacks on SHA-1 (and more broken earlier
> > hashes in the same general class) only allow the attacker to produce two
> > files that will collide. Now, it's true that this would allow somebody to
> > produce a commit where some people see the "good" blob and some people see
> > the "evil" blob, but (a) the "good" blob contains some large chunk of
> > random data, which is a major red flag by itself, and (b) all of these
> > people have to be taking data from the attacker.
> >
> > If somebody gives you some source, and it's got some large random chunk in
> > it, and the behavior of the object depends on the content of this chunk,
> > and it's unspecified where this chunk comes from, you should be aware
> > that they might be able to swap this chunk for a different chunk. But such
> > a file is pretty blatantly malicious anyway.
>
> This argument is invalid, since the use of git is not limited to
> source code. People can and do store unreadable binary data in git, and
> unless you are completely sure that no one would ever care about the
> security of that data in a way that can be attacked with a single
> collision, git should be secure about those as well.
>
> For example, I just converted a 20 GB repository to git which, among
> other things, contains pdf files of my tax returns. I have looked them
> over, but I have not opened them in a hex editor and looked them over at
> the binary level, and I don't think git should expect me to.
If you haven't looked over your PDFs with a hex editor, you're depending
on the security of the software generating the PDFs and on what you did in
generating them. (Looking at the resulting image alone may be unwise if,
for example, you redacted anything.) In any case, on the basis of your
actions, you make this commit. Now, anyone receiving the repository can,
due to the lack of second preimage attacks, be sure that (a) the document
is as you committed it; or (b) the document is different from what you
committed, but you made the substitution; or (c) the document is different
from what you committed, and you were tricked into committing a document
carefully designed by somebody else to be weak. Additionally, it's
infeasible to create a document such that forensics after the fact can't
turn up both the content as originally shown and the content as swapped
from either document.
I'm also not confident that PDFs are, in general, not vulnerable to an
attack where they rasterize entirely differently depending on
environmental factors (e.g., the document you're signing says something
entirely different when printed on A4 paper than what it says printed on
Letter); if so, it doesn't matter much that the document could be
replaced, since an attacker could just control the environment and get the
same effect.
In any case, an attacker can't come along later and make a replacement of
a file that originated in your commit. Also, you know that any sets of
interchangeable documents had already been created when you get a commit
that contains one of them.
-Daniel
*This .sig left intentionally blank*
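Daniel's argument leans on the gap between the two attacks: for an n-bit hash, a generic collision search needs about 2^(n/2) tries, while finding a second input that matches one given input needs about 2^n. The sketch below measures both against a hypothetical 16-bit hash (SHA-1 truncated), so the numbers stay small. The exact counts depend on the inputs, but the collision search typically finishes in a few hundred tries and the second-preimage search in tens of thousands.

```python
import hashlib
from itertools import count

BITS = 16  # hypothetical, deliberately tiny hash size


def tiny_hash(data: bytes) -> int:
    """The top BITS bits of SHA-1, standing in for a weak hash."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - BITS)


def collision_tries() -> int:
    """Collision: any two distinct inputs with equal hashes will do (birthday search)."""
    seen = set()
    for i in count(1):
        value = tiny_hash(b"message %d" % i)
        if value in seen:
            return i
        seen.add(value)


def second_preimage_tries(original: bytes) -> int:
    """Second preimage: must match the hash of one *given* input, e.g. a committed file."""
    target = tiny_hash(original)
    for i in count(1):
        if tiny_hash(b"forgery %d" % i) == target:
            return i


print("collision after", collision_tries(), "tries")
print("second preimage after", second_preimage_tries(b"my tax return.pdf"), "tries")
```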
* Re: About git and the use of SHA-1
2008-04-29 15:59 ` Geoffrey Irving
@ 2008-04-29 16:39 ` Nicolas Pitre
2008-04-29 17:48 ` Geoffrey Irving
2008-04-29 18:17 ` Matthieu Moy
1 sibling, 1 reply; 38+ messages in thread
From: Nicolas Pitre @ 2008-04-29 16:39 UTC (permalink / raw)
To: Geoffrey Irving; +Cc: Andreas Ericsson, Dmitry Potapov, Henrik Austad, git
On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> On Tue, Apr 29, 2008 at 8:42 AM, Nicolas Pitre <nico@cam.org> wrote:
> > On Tue, 29 Apr 2008, Andreas Ericsson wrote:
> >
> > > But they won't, because it's impossible to add two objects with the same
> > > SHA1 hash key to a git repository, since it will lazily re-use the
> > > existing one. In practice, this means that in the case of an "innocent"
> > > hash-collision, git will actually break by refusing to store the new
> > > content.
> >
> > I'd also like to point out that Git usually receive "untrusted" new
> > objects via the Git protocol through 'git index-pack'. If you look at
> > sha1_object() in index-pack.c, you'll see that active verification
> > against hash collision is performed, and the fetch will abruptly be
> > aborted if ever that happens.
> >
> > Yes, writing a test case for this was tricky. :-)
>
> Here's the standard scenario for a hash collision attack, with
> parties, A, B, and C:
>
> 1. C, the malicious one, computes the standard two pdfs with matching
> sha1 hashes.
> 2. C sends the valid pdf to B through a git commit, and B signs it with a tag.
> 3. C grabs the signature, and then forwards the "signed" commit to A,
> but substitutes the invalid pdf with the same hash.
>
> The fact that git will check for hash collisions within one repository
> is nice, but it doesn't significantly increase the security of git
> against hash collision attacks.
Sure. But this is all complete handwaving until a practical collision
can be demonstrated. So far the demonstration hasn't happened,
practical or not.
Nicolas
* Re: About git and the use of SHA-1
2008-04-28 16:29 About git and the use of SHA-1 Henrik Austad
` (2 preceding siblings ...)
2008-04-29 15:02 ` Tom Widmer
@ 2008-04-29 17:08 ` Tom Widmer
3 siblings, 0 replies; 38+ messages in thread
From: Tom Widmer @ 2008-04-29 17:08 UTC (permalink / raw)
To: git
Henrik Austad wrote:
> Hi list!
>
> As far as I have gathered, the SHA-1-sum is used as a identifier for commits,
> and that is the primary reason for using sha1. However, several places
> (including the google tech-talk featuring Linus himself) states that the id's
> are cryptographically secure.
>
> As discussed in [1], SHA-1 is not as secure as it once was (and this was in
> 2005), and I'm wondering - are there any plans for migrating to another
> hash-algorithm? I.e. SHA-2, whirlpool..
>
> [1] http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
Why not wait until the results of:
http://www.csrc.nist.gov/groups/ST/hash/index.html
are available. That will surely be soon enough (I think 2012 is the
expected finish date), and should prevent having to switch again in the
future.
The necessity or otherwise of improving the hashing will be clearer by
then too.
Tom
* Re: About git and the use of SHA-1
2008-04-29 16:39 ` Nicolas Pitre
@ 2008-04-29 17:48 ` Geoffrey Irving
2008-04-29 17:55 ` Nicolas Pitre
0 siblings, 1 reply; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 17:48 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andreas Ericsson, Dmitry Potapov, Henrik Austad, git
On Tue, Apr 29, 2008 at 9:39 AM, Nicolas Pitre <nico@cam.org> wrote:
>
> On Tue, 29 Apr 2008, Geoffrey Irving wrote:
>
> > On Tue, Apr 29, 2008 at 8:42 AM, Nicolas Pitre <nico@cam.org> wrote:
> > > On Tue, 29 Apr 2008, Andreas Ericsson wrote:
> > >
> > > > But they won't, because it's impossible to add two objects with the same
> > > > SHA1 hash key to a git repository, since it will lazily re-use the
> > > > existing one. In practice, this means that in the case of an "innocent"
> > > > hash-collision, git will actually break by refusing to store the new
> > > > content.
> > >
> > > I'd also like to point out that Git usually receive "untrusted" new
> > > objects via the Git protocol through 'git index-pack'. If you look at
> > > sha1_object() in index-pack.c, you'll see that active verification
> > > against hash collision is performed, and the fetch will abruptly be
> > > aborted if ever that happens.
> > >
> > > Yes, writing a test case for this was tricky. :-)
> >
> > Here's the standard scenario for a hash collision attack, with
> > parties, A, B, and C:
> >
> > 1. C, the malicious one, computes the standard two pdfs with matching
> > sha1 hashes.
> > 2. C sends the valid pdf to B through a git commit, and B signs it with a tag.
> > 3. C grabs the signature, and then forwards the "signed" commit to A,
> > but substitutes the invalid pdf with the same hash.
> >
> > The fact that git will check for hash collisions within one repository
> > is nice, but it doesn't significantly increase the security of git
> > against hash collision attacks.
>
> Sure. But this is all complete handwaving until a practical collision
> can be demonstrated. So far the demonstration hasn't happened,
> practical or not.
Sorry for the confusion: it would be handwaving if I were saying git was insecure,
but I'm not. I'm saying that if or when SHA1 becomes vulnerable to collision
attacks, git will be insecure.
Geoffrey
* Re: About git and the use of SHA-1
2008-04-29 17:48 ` Geoffrey Irving
@ 2008-04-29 17:55 ` Nicolas Pitre
2008-04-29 18:02 ` Geoffrey Irving
0 siblings, 1 reply; 38+ messages in thread
From: Nicolas Pitre @ 2008-04-29 17:55 UTC (permalink / raw)
To: Geoffrey Irving; +Cc: Andreas Ericsson, Dmitry Potapov, Henrik Austad, git
On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> Sorry for the confusion: it would handwaving if I was saying git was insecure,
> but I'm not. I'm saying that if or when SHA1 becomes vulnerable to collision
> attacks, git will be insecure.
Right. And if or when that happens then we'll make Git secure again
with a different hash. In the meantime there is low return for the
effort involved.
Nicolas
* Re: About git and the use of SHA-1
2008-04-29 17:55 ` Nicolas Pitre
@ 2008-04-29 18:02 ` Geoffrey Irving
2008-04-29 18:41 ` Daniel Barkalow
0 siblings, 1 reply; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 18:02 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andreas Ericsson, Dmitry Potapov, Henrik Austad, git
On Tue, Apr 29, 2008 at 10:55 AM, Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 29 Apr 2008, Geoffrey Irving wrote:
>
>
> > Sorry for the confusion: it would handwaving if I was saying git was insecure,
> > but I'm not. I'm saying that if or when SHA1 becomes vulnerable to collision
> > attacks, git will be insecure.
>
> Right. And if or when that happens then we'll make Git secure again
> with a different hash. In the mean time there is low return for the
> effort involved.
Yes. I wasn't trying to advocate switching, just making sure people
know that the "collisions don't matter" argument is bogus.
One important thing: when SHA1 becomes vulnerable to collision
attacks, it will still be secure to trust the repositories and tags
that exist *at that moment.* I.e., the transition period from SHA1 to
the next hash will also be secure, assuming that preimage attacks
don't become possible simultaneously. So everything is good.
Geoffrey
* Re: About git and the use of SHA-1
2008-04-29 15:59 ` Geoffrey Irving
2008-04-29 16:39 ` Nicolas Pitre
@ 2008-04-29 18:17 ` Matthieu Moy
2008-04-29 18:23 ` Fredrik Skolmli
1 sibling, 1 reply; 38+ messages in thread
From: Matthieu Moy @ 2008-04-29 18:17 UTC (permalink / raw)
To: Geoffrey Irving
Cc: Nicolas Pitre, Andreas Ericsson, Dmitry Potapov, Henrik Austad,
git
"Geoffrey Irving" <irving@naml.us> writes:
> Here's the standard scenario for a hash collision attack, with
> parties, A, B, and C:
>
> 1. C, the malicious one, computes the standard two pdfs with matching
> sha1 hashes.
> 2. C sends the valid pdf to B through a git commit, and B signs it with a tag.
> 3. C grabs the signature, and then forwards the "signed" commit to A,
> but substitutes the invalid pdf with the same hash.
Just to add my 2 cents, examples of this are available on the web,
like:
http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/
Same size, same hash. But that's with md5, not sha1.
--
Matthieu
* Re: About git and the use of SHA-1
2008-04-29 18:17 ` Matthieu Moy
@ 2008-04-29 18:23 ` Fredrik Skolmli
0 siblings, 0 replies; 38+ messages in thread
From: Fredrik Skolmli @ 2008-04-29 18:23 UTC (permalink / raw)
To: Matthieu Moy
Cc: Geoffrey Irving, Nicolas Pitre, Andreas Ericsson, Dmitry Potapov,
Henrik Austad, git
On Tue, Apr 29, 2008 at 08:17:51PM +0200, Matthieu Moy wrote:
> > Here's the standard scenario for a hash collision attack, with
> > parties, A, B, and C:
> >
> > 1. C, the malicious one, computes the standard two pdfs with matching
> > sha1 hashes.
> > 2. C sends the valid pdf to B through a git commit, and B signs it with a tag.
> > 3. C grabs the signature, and then forwards the "signed" commit to A,
> > but substitutes the invalid pdf with the same hash.
>
> Just to add my 2 cents, examples of this are available on the web,
> like:
>
> http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/
>
> Same size, same hash. But that's with md5, not sha1.
Well yes, but that's still using the methods already mentioned in this
thread. So you do have to get your "good" code approved before replacing it
with something nasty.
- Fredrik
--
Regards,
Fredrik Skolmli
* Re: About git and the use of SHA-1
2008-04-29 18:02 ` Geoffrey Irving
@ 2008-04-29 18:41 ` Daniel Barkalow
2008-04-29 20:31 ` Geoffrey Irving
0 siblings, 1 reply; 38+ messages in thread
From: Daniel Barkalow @ 2008-04-29 18:41 UTC (permalink / raw)
To: Geoffrey Irving
Cc: Nicolas Pitre, Andreas Ericsson, Dmitry Potapov, Henrik Austad,
git
On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> On Tue, Apr 29, 2008 at 10:55 AM, Nicolas Pitre <nico@cam.org> wrote:
> > On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> >
> >
> > > Sorry for the confusion: it would handwaving if I was saying git was insecure,
> > > but I'm not. I'm saying that if or when SHA1 becomes vulnerable to collision
> > > attacks, git will be insecure.
> >
> > Right. And if or when that happens then we'll make Git secure again
> > with a different hash. In the mean time there is low return for the
> > effort involved.
>
> Yes. I wasn't trying to advocate switching, just making sure people
> know that the "collisions don't matter" argument is bogus.
It's bogus to say they completely don't matter, but I still claim that
they don't matter for the things people actually care about. If people can
generate collisions, they can commit a "weak" blob with a conditional that
can be switched by replacing the blob. But it's almost always true that
people could commit a blob with a conditional that can be switched by
something else under the attacker's more direct control. Using a better
hash function won't save you from a document like:
    if (getdate() < 2009)
        render_good_text
    else
        render_evil_text
even if it does help with:
    if (AA == AA)
        render_good_text
    else
        render_evil_text
If you're not checking your files for the former, you shouldn't worry
about the latter, because the former is much easier and more subtle.
(Now, an arbitrary preimage attack would actually be significant, still,
because the attacker could replace an honestly-created "restrictive
security policy" file with garbage that will be ignored, leaving stuff
unprotected)
-Daniel
*This .sig left intentionally blank*
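Daniel's first example can be made concrete: the document below is one fixed byte string, so its hash never changes, yet what it renders depends on the date it is rendered on. The format and the render function are hypothetical toys, not PostScript; they only illustrate why a stronger hash cannot help against content whose behaviour is switched by something other than its bytes.

```python
import datetime
import hashlib

# Hypothetical document format: a switch year plus two alternative texts.
DOCUMENT = b"switch-year: 2009\ngood: render_good_text\nevil: render_evil_text\n"


def render(document: bytes, today: datetime.date) -> str:
    """Toy renderer: show the good text before the switch year, the evil text after."""
    fields = dict(line.split(": ", 1) for line in document.decode("utf-8").splitlines())
    if today.year < int(fields["switch-year"]):
        return fields["good"]
    return fields["evil"]


print(hashlib.sha1(DOCUMENT).hexdigest())            # the same id on every day
print(render(DOCUMENT, datetime.date(2008, 4, 29)))  # render_good_text
print(render(DOCUMENT, datetime.date(2009, 1, 1)))   # render_evil_text
```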
* Re: About git and the use of SHA-1
2008-04-29 18:41 ` Daniel Barkalow
@ 2008-04-29 20:31 ` Geoffrey Irving
2008-04-29 20:50 ` Fredrik Skolmli
2008-04-30 2:58 ` Martin Langhoff
0 siblings, 2 replies; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 20:31 UTC (permalink / raw)
To: Daniel Barkalow
Cc: Nicolas Pitre, Andreas Ericsson, Dmitry Potapov, Henrik Austad,
git
On Tue, Apr 29, 2008 at 11:41 AM, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Tue, 29 Apr 2008, Geoffrey Irving wrote:
>
> > On Tue, Apr 29, 2008 at 10:55 AM, Nicolas Pitre <nico@cam.org> wrote:
> > > On Tue, 29 Apr 2008, Geoffrey Irving wrote:
> > >
> > >
> > > > Sorry for the confusion: it would handwaving if I was saying git was insecure,
> > > > but I'm not. I'm saying that if or when SHA1 becomes vulnerable to collision
> > > > attacks, git will be insecure.
> > >
> > > Right. And if or when that happens then we'll make Git secure again
> > > with a different hash. In the mean time there is low return for the
> > > effort involved.
> >
> > Yes. I wasn't trying to advocate switching, just making sure people
> > know that the "collisions don't matter" argument is bogus.
>
> It's bogus to say they completely don't matter, but I still claim that
> they don't matter for the things people actually care about. If people can
> generate collisions, they can commit a "weak" blob with a conditional that
> can be switched by replacing the blob. But it's almost always true that
> people could commit a blob with a conditional that can be switched by
> something else under the attacker's more direct control. Using a better
> hash function won't save you from a document like:
>
> if (getdate() < 2009)
> render_good_text
> else
> render_evil_text
>
> even if it does help with:
>
> if (AA == AA)
> render_good_text
> else
> render_evil_text
>
> If you're not checking your files for the former, you shouldn't worry
> about the latter, because the former is much easier and more subtle.
I sincerely hope that pdf/postscript don't allow the internal
rendering code to branch based on the current date. That would be an
absurd security hole, and would indeed make you entirely correct. If
you actually know that it is possible to write that in postscript, I
would very much want to see an example.
In any case, in a binary document format that isn't insane (examples
of these at least include black and white .png images of documents), a
visual check of the content is sufficient to ensure that the next
person who looks at it will see roughly the same visual content. Git
should be (and currently is) a secure method of transferring sane
binary documents.
Geoffrey
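Daniel's two pseudocode documents quoted in this message can be modeled in Python to show why only the second one depends on the hash. This is an illustrative sketch, not code from the thread. The `today` parameter stands in for his `getdate()`, comparing the year is an assumed reading of `getdate() < 2009`, and `block`/`reference` stand in for the two `AA` blocks.

```python
from datetime import date

GOOD = "render_good_text"
EVIL = "render_evil_text"


def time_bomb_document(today: date) -> str:
    """First example: branch on the viewer's clock.

    The bytes, and therefore the SHA-1, never change; the output flips
    purely because the date does. A stronger hash does not help here.
    """
    return GOOD if today.year < 2009 else EVIL


def collision_switch_document(block: bytes, reference: bytes) -> str:
    """Second example ("if (AA == AA)"): compare an embedded block
    with a fixed reference copy.

    To flip the output an attacker must swap `block` for different bytes;
    git only fails to notice that swap if the modified file has the same
    SHA-1 as the original, i.e. only with a collision.
    """
    return GOOD if block == reference else EVIL


if __name__ == "__main__":
    print(time_bomb_document(date(2008, 4, 29)))    # render_good_text
    print(time_bomb_document(date(2009, 1, 1)))     # render_evil_text
    print(collision_switch_document(b"AA", b"AA"))  # render_good_text
    print(collision_switch_document(b"AB", b"AA"))  # render_evil_text
```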
* Re: About git and the use of SHA-1
2008-04-29 20:31 ` Geoffrey Irving
@ 2008-04-29 20:50 ` Fredrik Skolmli
2008-04-29 21:39 ` Geoffrey Irving
2008-04-30 2:58 ` Martin Langhoff
1 sibling, 1 reply; 38+ messages in thread
From: Fredrik Skolmli @ 2008-04-29 20:50 UTC
To: Geoffrey Irving
Cc: Daniel Barkalow, Nicolas Pitre, Andreas Ericsson, Dmitry Potapov,
Henrik Austad, git
On Tue, Apr 29, 2008 at 01:31:51PM -0700, Geoffrey Irving wrote:
> I sincerely hope that pdf/postscript don't allow the internal
> rendering code to branch based on the current date. That would be an
> absurd security hole, and would indeed make you entirely correct. If
> you actually know that it is possible to write that in postscript, I
> would very much want to see an example.
Have a look at
* http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/letter_of_rec.ps
vs
* http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/order.ps
both found on a website[1] already mentioned[2] in this thread. :-)
[1]: http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/
[2]: http://marc.info/?l=git&m=120949349923584&w=2
- F
--
Regards,
Fredrik Skolmli
* Re: About git and the use of SHA-1
2008-04-29 20:50 ` Fredrik Skolmli
@ 2008-04-29 21:39 ` Geoffrey Irving
2008-04-29 21:52 ` Fredrik Skolmli
0 siblings, 1 reply; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-29 21:39 UTC
To: Fredrik Skolmli
Cc: Daniel Barkalow, Nicolas Pitre, Andreas Ericsson, Dmitry Potapov,
Henrik Austad, git
On Tue, Apr 29, 2008 at 1:50 PM, Fredrik Skolmli <fredrik@frsk.net> wrote:
> On Tue, Apr 29, 2008 at 01:31:51PM -0700, Geoffrey Irving wrote:
>
> > I sincerely hope that pdf/postscript don't allow the internal
> > rendering code to branch based on the current date. That would be an
> > absurd security hole, and would indeed make you entirely correct. If
> > you actually know that it is possible to write that in postscript, I
> > would very much want to see an example.
>
> Have a look at
>
> * http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/letter_of_rec.ps
> vs
> * http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/order.ps
>
> both found on a website[1] already mentioned[2] in this thread. :-)
>
> [1]: http://th.informatik.uni-mannheim.de/People/Lucks/HashCollisions/
> [2]: http://marc.info/?l=git&m=120949349923584&w=2
This is an example of a hash collision, not conditional rendering
based on the current date. I.e., you didn't actually read my email or
the email I was replying to. :)
Geoffrey
* Re: About git and the use of SHA-1
2008-04-29 21:39 ` Geoffrey Irving
@ 2008-04-29 21:52 ` Fredrik Skolmli
0 siblings, 0 replies; 38+ messages in thread
From: Fredrik Skolmli @ 2008-04-29 21:52 UTC
To: Geoffrey Irving; +Cc: git
On Tue, Apr 29, 2008 at 02:39:46PM -0700, Geoffrey Irving wrote:
> This is an example of a hash collision, not conditional rendering
> based on the current date. I.e., you didn't actually read my email or
> the email I was replying to. :)
Ah, you're right. Didn't notice the part about dates. Sorry ;-)
--
Regards,
Fredrik Skolmli
* Re: About git and the use of SHA-1
2008-04-29 20:31 ` Geoffrey Irving
2008-04-29 20:50 ` Fredrik Skolmli
@ 2008-04-30 2:58 ` Martin Langhoff
2008-04-30 5:18 ` Geoffrey Irving
1 sibling, 1 reply; 38+ messages in thread
From: Martin Langhoff @ 2008-04-30 2:58 UTC
To: Geoffrey Irving
Cc: Daniel Barkalow, Nicolas Pitre, Andreas Ericsson, Dmitry Potapov,
Henrik Austad, git
On Wed, Apr 30, 2008 at 8:31 AM, Geoffrey Irving <irving@naml.us> wrote:
> I sincerely hope that pdf/postscript don't allow the internal
> rendering code to branch based on the current date. That would be an
> absurd security hole, and would indeed make you entirely correct. If
PS is Turing complete, and does know about dates. So yes, you can make
such conditionals.
That original md5 paper with the 2 PDF files is mainly a good example
that you shouldn't trust binary blobs, that's all. The md5 trick is a
nice demo, but misses the point entirely.
I can't find it now, but someone had written a PDF file that printed
Pi, computing it inside the PS VM. The tiny file would keep the printer
churning out paper until it ran out of memory. :-)
cheers,
m
--
martin.langhoff@gmail.com
martin@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
* Re: About git and the use of SHA-1
2008-04-30 2:58 ` Martin Langhoff
@ 2008-04-30 5:18 ` Geoffrey Irving
2008-04-30 5:47 ` David Brown
0 siblings, 1 reply; 38+ messages in thread
From: Geoffrey Irving @ 2008-04-30 5:18 UTC
To: Martin Langhoff
Cc: Daniel Barkalow, Nicolas Pitre, Andreas Ericsson, Dmitry Potapov,
Henrik Austad, git
On Tue, Apr 29, 2008 at 7:58 PM, Martin Langhoff
<martin.langhoff@gmail.com> wrote:
> On Wed, Apr 30, 2008 at 8:31 AM, Geoffrey Irving <irving@naml.us> wrote:
> > I sincerely hope that pdf/postscript don't allow the internal
> > rendering code to branch based on the current date. That would be an
> > absurd security hole, and would indeed make you entirely correct. If
>
> PS is Turing complete, and does know about dates. So yes, you can make
> such conditionals.
I knew postscript was Turing complete, but had (naively) assumed it
executed sandboxed and deterministically and would therefore display
uniformly barring interpreter bugs. Looking over the spec, I can't
find where it's possible to read the current date, but the
usertime/realtime variables are sufficient as long as the attacker
knows how fast the relevant machines are.
> That original md5 paper with the 2 PDF files is mainly a good example
> that you shouldn't trust binary blobs, that's all. The md5 trick is a
> nice demo, but misses the point entirely.
>
> I can't find it now, but someone had written a PDF file that printed
> Pi, computing it inside the PS VM. The tiny file would keep the printer
> churning out paper until it ran out of memory. :-)
According to wikipedia, PDF doesn't have conditionals or loops of any
kind, so you probably mean a postscript file.
Geoffrey
* Re: About git and the use of SHA-1
2008-04-30 5:18 ` Geoffrey Irving
@ 2008-04-30 5:47 ` David Brown
2008-04-30 5:56 ` Martin Langhoff
0 siblings, 1 reply; 38+ messages in thread
From: David Brown @ 2008-04-30 5:47 UTC
To: Geoffrey Irving
Cc: Martin Langhoff, Daniel Barkalow, Nicolas Pitre, Andreas Ericsson,
Dmitry Potapov, Henrik Austad, git
On Tue, Apr 29, 2008 at 10:18:55PM -0700, Geoffrey Irving wrote:
>> PS is Turing complete, and does know about dates. So yes, you can make
>> such conditionals.
>
>I knew postscript was Turing complete, but had (naively) assumed it
>executed sandboxed and deterministically and would therefore display
>uniformly barring interpreter bugs. Looking over the spec, I can't
>find where it's possible to read the current date, but the
>usertime/realtime variables are sufficient as long as the attacker
>knows how fast the relevant machines are.
usertime and realtime are from the start of the invocation of the
postscript interpreter, not based on the outside world. So, the
interpreter could wait arbitrarily long, but has no way of knowing any
external reference to time.
I could imagine trickery with PDF signatures and their expiration times,
but you shouldn't be able to do anything with the information, so it would
be an exploit, and would probably be fixed.
David
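The clock behaviour described here can be sketched in Python, whose `time.monotonic()` has the same property: its reference point is undefined, so only differences between readings mean anything. The `InterpreterClock` class and `elapsed_time_document` function below are hypothetical illustrations, not PostScript operators. They show that a document can branch on how long it has been running (the machine-speed gamble mentioned above), but it cannot learn the calendar date from such a clock.

```python
import time

GOOD = "render_good_text"
EVIL = "render_evil_text"


class InterpreterClock:
    """Milliseconds elapsed since this interpreter instance started.

    Modeled loosely on PostScript's `realtime` as described above: the
    origin is arbitrary (here, object creation), so the value says
    nothing about calendar time.
    """

    def __init__(self) -> None:
        self._start = time.monotonic()

    def realtime_ms(self) -> int:
        return int((time.monotonic() - self._start) * 1000)


def elapsed_time_document(elapsed_ms: int, threshold_ms: int) -> str:
    """A document can only branch on time spent rendering, not on the date."""
    return GOOD if elapsed_ms < threshold_ms else EVIL


if __name__ == "__main__":
    clock = InterpreterClock()
    # A fresh interpreter reads close to zero regardless of today's date.
    print(elapsed_time_document(clock.realtime_ms(), threshold_ms=5000))
```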
* Re: About git and the use of SHA-1
2008-04-30 5:47 ` David Brown
@ 2008-04-30 5:56 ` Martin Langhoff
0 siblings, 0 replies; 38+ messages in thread
From: Martin Langhoff @ 2008-04-30 5:56 UTC
To: Geoffrey Irving, Martin Langhoff, Daniel Barkalow, Nicolas Pitre,
Andreas Ericsson
On Wed, Apr 30, 2008 at 5:47 PM, David Brown <git@davidb.org> wrote:
> usertime and realtime are from the start of the invocation of the
> postscript interpreter, not based on the outside world. So, the
You guys are right - I misremembered the spec wrt dates. I had the
distinct impression that there was a way to get the epoch.
Sorry about the noise.
martin
Thread overview: 38+ messages
2008-04-28 16:29 About git and the use of SHA-1 Henrik Austad
2008-04-28 19:34 ` Daniel Barkalow
2008-04-28 21:29 ` Henrik Austad
2008-04-28 22:15 ` Daniel Barkalow
2008-04-29 6:38 ` Andreas Ericsson
2008-04-29 7:09 ` Russ Dill
2008-04-29 7:21 ` Andreas Ericsson
2008-04-29 11:05 ` Sverre Rabbelier
2008-04-29 12:27 ` Andreas Ericsson
2008-04-29 13:05 ` Paolo Bonzini
2008-04-29 14:37 ` Andreas Ericsson
2008-04-29 14:52 ` Paolo Bonzini
2008-04-29 16:24 ` Russ Dill
2008-04-29 12:46 ` Jurko Gospodnetić
2008-04-29 16:21 ` Russ Dill
2008-04-29 15:34 ` Geoffrey Irving
2008-04-29 16:27 ` Daniel Barkalow
2008-04-29 12:41 ` Dmitry Potapov
2008-04-29 14:41 ` Andreas Ericsson
2008-04-29 15:42 ` Nicolas Pitre
2008-04-29 15:59 ` Geoffrey Irving
2008-04-29 16:39 ` Nicolas Pitre
2008-04-29 17:48 ` Geoffrey Irving
2008-04-29 17:55 ` Nicolas Pitre
2008-04-29 18:02 ` Geoffrey Irving
2008-04-29 18:41 ` Daniel Barkalow
2008-04-29 20:31 ` Geoffrey Irving
2008-04-29 20:50 ` Fredrik Skolmli
2008-04-29 21:39 ` Geoffrey Irving
2008-04-29 21:52 ` Fredrik Skolmli
2008-04-30 2:58 ` Martin Langhoff
2008-04-30 5:18 ` Geoffrey Irving
2008-04-30 5:47 ` David Brown
2008-04-30 5:56 ` Martin Langhoff
2008-04-29 18:17 ` Matthieu Moy
2008-04-29 18:23 ` Fredrik Skolmli
2008-04-29 15:02 ` Tom Widmer
2008-04-29 17:08 ` Tom Widmer