From: Andreas Ericsson
Subject: Re: About git and the use of SHA-1
Date: Tue, 29 Apr 2008 08:38:37 +0200
Message-ID: <4816C26D.9010304@op5.se>
References: <200804281829.11866.henrikau@orakel.ntnu.no> <200804282329.21336.henrikau@orakel.ntnu.no>
In-Reply-To: <200804282329.21336.henrikau@orakel.ntnu.no>
To: Henrik Austad
Cc: Daniel Barkalow, git@vger.kernel.org
User-Agent: Thunderbird 2.0.0.12 (X11/20080226)
X-Mailing-List: git@vger.kernel.org

Henrik Austad wrote:
> On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
>> On Mon, 28 Apr 2008, Henrik Austad wrote:
>>> Hi list!
>>>
>>> As far as I have gathered, the SHA-1-sum is used as a identifier for
>>> commits, and that is the primary reason for using sha1. However, several
>>> places (including the google tech-talk featuring Linus himself) states
>>> that the id's are cryptographically secure.
>>>
>>> As discussed in [1], SHA-1 is not as secure as it once was (and this was
>>> in 2005), and I'm wondering - are there any plans for migrating to
>>> another hash-algorithm? I.e. SHA-2, whirlpool..
>> No. The cryptographic security we care about is that it's impractical to
>> come up with another set of content that hashes to the same value as a
>> given set of content. The known attacks on SHA-1 (and more broken earlier
>> hashes in the same general class) only allow the attacker to produce two
>> files that will collide. Now, it's true that this would allow somebody to
>> produce a commit where some people see the "good" blob and some people see
>> the "evil" blob, but (a) the "good" blob contains some large chunk of
>> random data, which is a major red flag by itself, and (b) all of these
>> people have to be taking data from the attacker.
>
> yes, I can see that point, but I was thinking more along the line of:
>
> 1) clone repo
> 2) add malicious code
> 3) add a huge block of comment, ifdef-block etc somewhere obscure in the code
>    and keep adding random data untill hash matches a well-known release.
> 4) publish repo, or even worse, change central repo
>

This depends greatly on git accepting objects with a colliding
object-name, which it doesn't. Once you have an object with a
particular SHA1, it will never get overwritten, ever, as git will
believe it's about to do unnecessary work.
As such, you'd still have to create a new object, hashing to a new SHA1,
and get that new object added to the kernel. I think perhaps Andrew
Morton and a few other "high brass" among the kernel hackers can get
away with pushing crud like that to Linus' public tree (which is the
de facto master copy of published kernel sources), but random John Does
such as you and me wouldn't stand a chance, as our patches would get
reviewed by someone who, at the end of the day, makes a living coding
Linux.

> Most users, and probably a lot of developers never browse through the *entire*
> archive looking for this, and as long as the hash checks out - why would you?
> Yes, it would probably be discovered soon enough, but take the linux kernel
> as an example - if you get, say 100 infected machines due to this, what would
> this do to the reputation of the kernel?
>

That depends. If the source of it was Linus' public tree, that would
not be very good at all. If the source was a random tarball off a
random webpage or ftp site (which would be the same as fetching from an
unchecked git repository and using it unverified), I doubt it would
matter much.

>
>> If somebody gives you some source, and it's got some large random chunk in
>> it, and the behavior of the object depends on the content of this chunk,
>> and it's unspecified where this chunk comes from, you should be aware
>> that they might be able to swap this chunk for a different chunk. But such
>> a file is pretty blatantly malicious anyway.
>
> True, but this actually means you have to verify *everything*, even though the
> hash checks out.
>

Not really. What you need to verify is that
a) You cloned from somewhere you trust (kernel.org, for example)
b) The SHA1 of the commit you want to build from matches the SHA1
   of the same commit in the repository you originally cloned from.

A colliding object can never replace one that is already in a
repository. Git is lazy and will reuse the existing object with the
same name instead.
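
To make the "git is lazy" point concrete, here is a minimal Python
sketch. It is a toy, not git's actual code: the ToyObjectStore class
and its write()/read() methods are made up for illustration. The only
part taken from git itself is how a blob's name is computed (SHA-1 over
a "blob <size>" header, a NUL byte, and then the content). Since nobody
can produce a real collision on demand, the colliding write is simulated
by passing the same name with different content.

```python
import hashlib


def git_blob_name(data: bytes) -> str:
    """Return the object name git gives a blob with this content."""
    # git hashes a short header plus the content, not the content alone.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ToyObjectStore:
    """Hypothetical stand-in for an object database (illustration only)."""

    def __init__(self) -> None:
        self._objects = {}  # maps object name -> content

    def write(self, name: str, data: bytes) -> bool:
        """Store data under name unless that name is already taken.

        Returns True if the object was written, False if it was skipped.
        """
        if name in self._objects:
            # Name already present: assume the work is unnecessary and
            # keep the existing content untouched.
            return False
        self._objects[name] = data
        return True

    def read(self, name: str) -> bytes:
        return self._objects[name]


if __name__ == "__main__":
    store = ToyObjectStore()
    good = b"int main(void) { return 0; }\n"
    name = git_blob_name(good)

    store.write(name, good)
    # Simulated collision: different content arriving under the same name.
    written = store.write(name, b"evil content\n")

    print("second write stored:", written)
    print("content kept is the original:", store.read(name) == good)
```

Whoever got their object into the store first wins, so the attack in
the quoted steps 1) to 4) only works against people who fetch the
colliding object before they have the genuine one, which is why point
a) above matters.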

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231