From mboxrd@z Thu Jan 1 00:00:00 1970 From: Richard Peterson Subject: Re: [Tagging Commits] feedback / discussion request Date: Thu, 5 May 2011 11:39:41 -0400 Message-ID: References: <20110504084212.GB8512@sigill.intra.peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: git@vger.kernel.org To: Jeff King X-From: git-owner@vger.kernel.org Thu May 05 17:39:53 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QI0ed-0002J3-Gy for gcvg-git-2@lo.gmane.org; Thu, 05 May 2011 17:39:52 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754887Ab1EEPjp convert rfc822-to-quoted-printable (ORCPT ); Thu, 5 May 2011 11:39:45 -0400 Received: from edgy.cirtexhosting.com ([75.126.140.58]:38860 "EHLO edgy.cirtexhosting.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753363Ab1EEPjo convert rfc822-to-8bit (ORCPT ); Thu, 5 May 2011 11:39:44 -0400 Received: from mail-gy0-f174.google.com ([209.85.160.174]:41446) by edgy.cirtexhosting.com with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.69) (envelope-from ) id 1QI0eg-0005pL-F8 for git@vger.kernel.org; Thu, 05 May 2011 11:39:54 -0400 Received: by gyd10 with SMTP id 10so792316gyd.19 for ; Thu, 05 May 2011 08:39:41 -0700 (PDT) Received: by 10.236.194.73 with SMTP id l49mr3158412yhn.132.1304609981571; Thu, 05 May 2011 08:39:41 -0700 (PDT) Received: by 10.147.98.11 with HTTP; Thu, 5 May 2011 08:39:41 -0700 (PDT) In-Reply-To: <20110504084212.GB8512@sigill.intra.peff.net> X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - edgy.cirtexhosting.com X-AntiAbuse: Original Domain - vger.kernel.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - rcpeterson.com X-Source: X-Source-Args: X-Source-Dir: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: =46irst off, thanks for the awesome response, Peff, and Sverre and Michael as well. Great stuff, and plenty that I had not thought of. On Wed, May 4, 2011 at 4:42 AM, Jeff King wrote: > On Tue, May 03, 2011 at 07:36:51PM -0400, Richard Peterson wrote: > >> Here are some possible semantics you could assign to signing a commi= t hash: >> >> * Making a verifiable claim of authorship of a commit >> * Making a verifiable claim to have reviewed a commit or set of comm= its >> * Making a verifiable claim to have approved a commit or set of comm= its for >> some purpose >> * Making some other verifiable claim about a commit TBD by your work= flow >> * Making a verifiable claim to have reviewed or approved the entire = tree >> under the commit > > Yeah, all of those make sense in certain workflows. But with the > exception of authorship verification, they are not things you would w= ant > to do at _commit_ time, Even authorship could be claimed after commit time too, for that matter= =2E > but rather something you say later about a > commit. So I think fundamentally you are not interested in adding > signatures to git commits themselves, but rather about making stateme= nts > about commits that happen to be signed. Which is good, because your > problem is much easier. :) > > The nice thing is that git gives you a stable, cryptographically > verifiable identifier for the commit. So all you have to do is mentio= n > it along with some metadata, sign it, and then store it somewhere. > > The first two parts can be as simple as something like: > > =A0(git rev-parse --verify HEAD > =A0 echo "I reviewed this and it meets some standard X." > =A0) | gpg --sign > > where probably you would want to define some kind of parsable metadat= a > format for your particular workflow. > > For storage, you basically have three options: > > =A01. Somewhere completely outside of git. There's no reason this nee= ds > =A0 =A0 to be stored in git at all, depending on your workflow. It ma= y be > =A0 =A0 simpler to keep it in some database related to your review sy= stem > =A0 =A0 (in fact, you may not doing anything cryptographic at all, bu= t > =A0 =A0 merely have a separate review system with a central database = that > =A0 =A0 mentions commits by sha1). I see this as a useful option considering some poor souls in my organization use Subversion, and we could factor out the audit / review workflow to not depend on a single version control system. On the other hand, it makes sense to keep data actually within git if i= t uses a git internal identifier as a key, and its useful to operate on i= t with the git tool set. > > =A02. In git tags. You can already do this with: > > =A0 =A0 =A0 git tag -s -m "I reviewed this" HEAD > > =A0 =A0 But tags aren't a good fit for a workflow that signs every co= mmit > =A0 =A0 (some of them perhaps even multiple times!). You end up with = lots > =A0 =A0 of tag refs. Right - one of the reasons I don't like tags for this. Tags really just= don't fit the bill, unfortunately. > > =A03. In git notes. You can do something like: > > =A0 =A0 =A0 (git rev-parse --verify HEAD > =A0 =A0 =A0 =A0echo "I reviewed this" > =A0 =A0 =A0 ) | gpg --sign -a | > =A0 =A0 =A0 git notes add -F - HEAD > > =A0 =A0 though you'd probably want to be a little more complex, and h= andle > =A0 =A0 lists of signed notes for each commit. And you may want to st= ore > =A0 =A0 these in a separate notes ref from the default one. I had looked at this option, but had failed to see the usefulness of us= ing a different ref. I was worried about cluttering things up, overloading = the intended purpose of notes, and so forth. I wasn't really sure if notes = were intended to be general purpose storage for systematic, structured data. My inclination was to do this outside notes, or even in a parallel implementation to notes, factoring out the common parts. I suppose that looking at notes as somewhat of a free-for-all obviates this need. Is t= his really what notes are for? > > =A0 =A0 The advantage of notes are that they are designed for lots of > =A0 =A0 per-commit storage, and can be accessed fairly efficiently. That was my other concern about notes - performance. Not sure how notes are stored, but I certainly trust you that they're efficient. > > So now you have your review storage system (or authorship, or whateve= r > metadata you want to stick in there). You can peek at it manually, of > course, when you suspect something is not right. But you probably als= o > want to do automatic things, like making sure nothing goes into some > branch "foo" that isn't signed with an authorship note. > > Assuming you are storing with git notes (if you are using some extern= al > system, replace the call to git-notes below with whatever database > lookup you would want), you could use a pre-receive hook that did > something like: > > =A0git rev-list $old..$new | > =A0while read commit; do > =A0 =A0git notes show $commit >tmp > =A0 =A0gpg --verify tmp >data 2>siginfo || die "$commit: signature is= bad" > =A0 =A0# ugh, is there really no better way to get this info from gpg= ? See? We need functions for this stuff! I'll share whatever I come up wi= th, and maybe it will be useful in general. > =A0 =A0perl -lne 'print $1 if /Good signature from "(.*)"/ siginfo >s= igner > =A0 =A0git show --format=3D"%an <%ae>" $commit >author > =A0 =A0cmp author signer || die "$commit: signer and committer don't = match" Yes this needs to be handled robustly. The signer would need to be told at sign-time if his signature didn't match. > =A0 =A0test "`head -n 1 data`" =3D $commit || > =A0 =A0 =A0die "$commit: signed commit does not match" > =A0done > > And obviously that is hacked together and you would want something mo= re > robust, Thank you - this is all solid stuff to get me started. > and you'd need to handle the web of trust for the signing keys > somehow (though I think that is external to this script, and is about > setting up the desired keyring). But I hope it gives a sense of what = you > can do. You could also replace gpg completely with something like > openssl using x.509 certs, if that makes more sense to your > organization. You read my mind. Everybody in my organization has a set of x.509 certs on smartcards. That's phase 2 of my project. > > Developers would have to make a note and push their notes tree first, You mean for hook / verification purposes? Or is there some underlying reason to push notes first? > and then push their actual commits into a branch (and you might want = to > do some verification on the notes they push, like checking that entri= es > for commit $X actually contains signatures for $X, or that the signer > identity matches some ssh credential, or that the pusher isn't deleti= ng > any signatures or erasing note history). > > I suspect you already thought through some of this already. But I wan= ted > to start with first principles, because I really don't think this is = a > _git_ problem as much as it is a _workflow_ problem. So it's importan= t > to first define the workflow you want, and then think about how git c= an > help. Stable commit identifiers already provide much of the basis. I > think notes provide a nice storage format that is efficient and > push-able to other repos (though in a centralized shop, some other > database might make sense, too). What really remains to be done is: > > =A01. Define the metadata format that encapsulates what you want to s= ay > =A0 =A0 about commits. > > =A02. Write scripts to help developers and reviewers make these notes= , > =A0 =A0 and verify them. =A0Write hooks to implement policy on lettin= g > =A0 =A0 commits into certain branches, as above. > > And both of those happen outside of git (though if you write them in = a > generic enough form, I'm sure people on the list would be very happy = to > see them shared). I'll be sure to share. > >> There are 200 developers working on a financial trading system, and = each of >> them has the opportunity to slip malicious code into the project. Wh= en the >> final release is prepared, the project lead signs the tip commit, th= us >> signing the whole tree. Now it is discovered that someone did slip s= ome >> malicious code in. =A0How do you audit the system? Could higher leve= ls of >> individual accountability have discouraged this scenario? > > I like this example. It shows that signing a commit is not really > meaningful by itself; you have to understand the semantics of that > signature (and maybe they're included as comments in the tag object, = or > maybe it is assumed by your organization's workflow). > > In the case of the kernel, Linus signing a commit with a tag implicit= ly > means "I think what is in this tree and everything before it is good,= so > you should feel comfortable using it" (or at least insofar as you tru= st > Linus). > > But it doesn't have to be that way. Your project lead signing may mea= n > "this is good and we should ship it". But developers signing commits = may > simply mean "I promise that I wrote the changes between this commit's > tree and its parent". Those are all signatures of commits, but they m= ean > very different things; the key is adding metadata to know which is > which. > >> I've seen it argued that a proper SSH setup and user management are = the key. >> These are good for security and access control, but not for some dur= able >> form of accountability. > > Right. You are trusting the server's records, not cryptography. The m= ain > advantage is that it's efficient and easy to set up. :) The main reason this doesn't work for me is that codebases are passed a= round my organization like hand-me-down clothes. It is not unheard of to get = the entire repository for a critical application delivered from one shop to= another on CD. We need to be able to verify the integrity of a repository entir= ely independently of any outside information. The only centralized source = of trust in our organization is the certificate authority. Now my big question to ponder: what do do when the CA expires a cert? H= mm... > >> It seems that creating a signed tag is the same as signing a commit.= =A0There >> are a few problems, though. =A0Tags don't provide a secure means of = asserting >> the type of signature being applied to the commit hash. That is - is= the >> hash signed because someone is claiming authorship? Because they are >> asserting the integrity of the entire tree? Because they have review= ed the >> code? Because they reviewed a certain subset of the tree? Of course = there's >> also the issue that tags live in a cluttered namespace. Signing a co= mmit is >> essentially a different thing from providing a name for a commit. Us= ing tags >> just to sign commits requires a glut of tag names. > > Again, metadata. Say what you mean in the free-form content of the ta= g. > For the kernel, there is nothing to be said. Linus signing tags has a > well-known meaning in the community. But in an organization signing f= or > a lot of different reasons, you would want the signed data to say why= it > was signed. > >> I propose expanding the concept of tags, or alternately creating a n= ew >> concept which subsumes the existing tag concept. I'll call this new = concept >> a "sig" for the purposes of this discussion. The concept of a sig cr= oss-cuts >> the concept of a tag. >> >> A tag signs the commit hash. A sig signs a SHA1-based absolute commi= t >> reference with a (possibly null) string concatenated to it. For inst= ance, a >> sig might sign the following string: > > A tag can already include arbitrary data. > > In fact, tags basically do what you want already; it's just that stor= ing > one tag ref per commit is going to be ugly. It might make sense to > replace the ad-hoc gpg signatures I used in my examples above with ta= g > objects, and then store the tag object in the notes tree. > >> "0b9deecf625677cf44058a42c2abd7add5167e81^0 author" >> which would mean that the signor is claiming authorship of that indi= vidual >> commit. (Suggestions for notating a single commit are welcome. "^0" = seemed >> natural.) > > See? You're defining metadata now. :) > >> * What on earth does it mean to tag a range of commits? With commit = ranges >> being siggable, and tags containing sigs, what does it mean to tag a= range >> of 10 commits, for instance? Is that desirable? Does it make any sen= se >> whatsoever? Does it hurt anything if it happens? > > It's slightly more efficient. If I wrote 10 commits, I can either sig= n > each individually saying "I wrote this", or I can make a single > signature showing them all. The tradeoff is that parsing and verifyin= g > metadata becomes a lot more complex. But crytographically speaking, a > range is not ambiguous; > >> * Performance? I think it would be extremely quick to verify a bunch= of >> sigs, but I don't know. Maybe I'm not thinking clearly about it. >> Fortunately, sigs can be ignored entirely and need not affect things= =2E > > Compared to usual git operations, no, it's not quick. But you don't h= ave > to verify all the time. You can verify commits when they enter your > repo, or when you're interested in some aspect of them, or when you > suspect something fishy is going on. You don't have to do it on every > rev-list. Good point. I had thought it would be something to see every time I run git-log, but I suppose it makes perfect sense to do this thing in the n= ightlies or some other rarer occasion. Thanks, Richard Peterson