[Tagging Commits] feedback / discussion request

* [Tagging Commits] feedback / discussion request
@ 2011-05-03 23:36 Richard Peterson
  2011-05-03 23:49 ` Sverre Rabbelier
  2011-05-04  8:42 ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Peterson @ 2011-05-03 23:36 UTC
  To: git

This is a different idea from that discussed in
http://marc.info/?t=123879411100002&r=1&w=2.

I'm going to present some use cases for signing commits (instead of just
tags). Then I'll present an idea for implementation. I hope to implement
this as described, along with Eric Ritz. We would appreciate any insight you
may have - we want to help, not waste our time or anybody else's.

First, Linus has argued against signing commits in this thread:
http://marc.info/?t=123879411100002&r=1&w=2.  He claims it is pointless to
sign individual commits, as opposed to signing just the tip of the tree
(tags). For many use cases, that's true. But read on.

Here are some possible semantics you could assign to signing a commit hash:

* Making a verifiable claim of authorship of a commit
* Making a verifiable claim to have reviewed a commit or set of commits
* Making a verifiable claim to have approved a commit or set of commits for
some purpose
* Making some other verifiable claim about a commit TBD by your workflow
* Making a verifiable claim to have reviewed or approved the entire tree
under the commit

Claiming to have reviewed or approved the entire tree is useful in many
cases. It's great for something like the Linux kernel.  If you've got a tip
signed by Linus, you've got the kernel. You don't need to care what's merged
in under that, as long as it's signed at the top. It's like an MD5 checksum
on a download. You don't care what mirror you download an ISO from, as long
as the computed hash matches the authoritative hash.

Semantically, someone who signs a tree takes responsibility for everything
included in that tree, to whatever degree that applies in their project.

Now *technically* signing the tree may be equivalent to signing an
individual commit, but don't get wrapped up in that. Stick to the semantics
with me.

Imagine the following scenario to help justify the other use cases above.

There are 200 developers working on a financial trading system, and each of
them has the opportunity to slip malicious code into the project. When the
final release is prepared, the project lead signs the tip commit, thus
signing the whole tree. Now it is discovered that someone did slip some
malicious code in.  How do you audit the system? Could higher levels of
individual accountability have discouraged this scenario?

I've seen it argued that a proper SSH setup and user management are the key.
These are good for security and access control, but not for some durable
form of accountability.

If each commit is individually signed, the authorship claims have teeth. In
our scenario above, a single signature at the tip of the tree did no good in
terms of accountability. You can blame the guy on top, but is it really
reasonable that he review every line? However, if each commit were signed,
tracing the malicious code would be simple. If a reviewer had been required
to sign every commit, or maybe every range of commits (signing
186fa861..8645b061, for instance), then there could be a double layer of
accountability. This kind of "hard" accountability can be valuable in
sensitive projects. I work on such projects.

Some people might not see the use of this kind of auditability. I'll tell
you though - I work in a large organization that uses SVN because of some
kind of perceived auditability. They shy away from Git because it's
"distributed" and therefore not auditable.  Of course that's a
phony-baloney, ignorant argument, but the point is that the need for
auditability is there.

So how do these semantics line up with Git?

It seems that creating a signed tag is the same as signing a commit.  There
are a few problems, though.  Tags don't provide a secure means of asserting
the type of signature being applied to the commit hash. That is - is the
hash signed because someone is claiming authorship? Because they are
asserting the integrity of the entire tree? Because they have reviewed the
code? Because they reviewed a certain subset of the tree? Of course there's
also the issue that tags live in a cluttered namespace. Signing a commit is
essentially a different thing from providing a name for a commit. Using tags
just to sign commits requires a glut of tag names.

I propose expanding the concept of tags, or alternately creating a new
concept which subsumes the existing tag concept. I'll call this new concept
a "sig" for the purposes of this discussion. The concept of a sig cross-cuts
the concept of a tag.

A tag signs the commit hash. A sig signs a SHA1-based absolute commit
reference with a (possibly null) string concatenated to it. For instance, a
sig might sign the following string:

"0b9deecf625677cf44058a42c2abd7add5167e81^0 author"
which would mean that the signor is claiming authorship of that individual
commit. (Suggestions for notating a single commit are welcome. "^0" seemed
natural.)

or

"5ae6f5ca2f70bd7d5ca88c20f2be62bf3844af73..0b9deecf625677cf44058a42c2abd7add5167e81
reviewer"
which means the signor has reviewed that particular chain of commits.

Signing the string
"0b9deecf625677cf44058a42c2abd7add5167e81"
would be the same as signing the entire tree for that hash, which is what
happens in a tag.
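
As a rough illustration of the three payload shapes above, here is a small
shell sketch that classifies a payload string. The function name
classify_sig and the scope labels are made up for this example, and it
assumes the spec and the role are separated by a single space.

```shell
#!/bin/sh
# Classify a hypothetical "sig" payload of the form "<commit-spec> [<role>]".
# Prints "<scope> <role>", where scope is commit, range, or tree.
classify_sig () {
    spec=${1%% *}                # text before the first space
    role=
    case $1 in
    *" "*) role=${1#* } ;;       # text after the first space, if any
    esac
    case $spec in
    *..*) scope=range ;;         # <from>..<to>: a chain of commits
    *^0)  scope=commit ;;        # a single commit, as proposed above
    *)    scope=tree ;;          # bare hash: the whole tree, like a tag
    esac
    echo "$scope ${role:-none}"
}

# prints "commit author"
classify_sig "0b9deecf625677cf44058a42c2abd7add5167e81^0 author"
```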

A signed tag would essentially be a name associated with a sig - not too far
off from how it works now. A lightweight tag would be a name not associated
with a sig - again, about how it is now.

This strategy has several benefits:

* It is separable from the commit tree.  Linus argued against including
commit signatures within commits. This solution doesn't do that. If someone
wants to ditch all signatures and publish the tree, they can.

* It is extensible. Standard strings like "author" and "reviewer" might get
built-in support, but there is nothing preventing adding custom signature
types to meet your workflow needs.  Someone could even add something like a
datestamp, if the need arose. In addition, there would be no limit to the
number of "author" or "reviewer" sigs that could point to a single commit.
Great for pair programming, group code reviews, or other workflows.

* It can probably be implemented cleanly without breaking the existing tags.

I see a few potential issues:

* What on earth does it mean to tag a range of commits? With commit ranges
being siggable, and tags containing sigs, what does it mean to tag a range
of 10 commits, for instance? Is that desirable? Does it make any sense
whatsoever? Does it hurt anything if it happens?

* Performance? I think it would be extremely quick to verify a bunch of
sigs, but I don't know. Maybe I'm not thinking clearly about it.
Fortunately, sigs can be ignored entirely and need not affect things.

* Any other issues?

Thank you, and please give feedback. I'm no git pro - just a guy with an
idea. Based on your feedback, Eric and I will steer our implementation.

Richard Peterson

* Re: [Tagging Commits] feedback / discussion request
  2011-05-03 23:36 [Tagging Commits] feedback / discussion request Richard Peterson
@ 2011-05-03 23:49 ` Sverre Rabbelier
  2011-05-04  9:21   ` Michael J Gruber
  2011-05-04  8:42 ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Sverre Rabbelier @ 2011-05-03 23:49 UTC
  To: Richard Peterson; +Cc: git

Heya,

On Wed, May 4, 2011 at 01:36, Richard Peterson <richard@rcpeterson.com> wrote:
> Thank you, and please give feedback. I'm no git pro - just a guy with an
> idea. Based on your feedback, Eric and I will steer our implementation.

Have you looked at git notes? They seem relevant. You could use them
to sign commits after the fact, and by multiple people, etc.

-- 
Cheers,

Sverre Rabbelier

* Re: [Tagging Commits] feedback / discussion request
  2011-05-03 23:36 [Tagging Commits] feedback / discussion request Richard Peterson
  2011-05-03 23:49 ` Sverre Rabbelier
@ 2011-05-04  8:42 ` Jeff King
  2011-05-05 15:39   ` Richard Peterson
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2011-05-04  8:42 UTC
  To: Richard Peterson; +Cc: git

On Tue, May 03, 2011 at 07:36:51PM -0400, Richard Peterson wrote:

> Here are some possible semantics you could assign to signing a commit hash:
> 
> * Making a verifiable claim of authorship of a commit
> * Making a verifiable claim to have reviewed a commit or set of commits
> * Making a verifiable claim to have approved a commit or set of commits for
> some purpose
> * Making some other verifiable claim about a commit TBD by your workflow
> * Making a verifiable claim to have reviewed or approved the entire tree
> under the commit

Yeah, all of those make sense in certain workflows. But with the
exception of authorship verification, they are not things you would want
to do at _commit_ time, but rather something you say later about a
commit. So I think fundamentally you are not interested in adding
signatures to git commits themselves, but rather about making statements
about commits that happen to be signed. Which is good, because your
problem is much easier. :)

The nice thing is that git gives you a stable, cryptographically
verifiable identifier for the commit. So all you have to do is mention
it along with some metadata, sign it, and then store it somewhere.

The first two parts can be as simple as something like:

  (git rev-parse --verify HEAD
   echo "I reviewed this and it meets some standard X."
  ) | gpg --sign

where probably you would want to define some kind of parsable metadata
format for your particular workflow.
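
One possible shape for such a format, purely as a sketch: the commit ID on
the first line, then "key: value" headers, a blank line, and a free-form
comment. The "role" header and both function names are invented here; they
are not anything git or gpg defines.

```shell
#!/bin/sh
# Build a statement to sign: commit ID on line 1, a "role:" header,
# a blank line, then a free-form comment. The header name is made up.
make_statement () {
    commit=$1 role=$2 comment=$3
    printf '%s\n' "$commit" "role: $role" "" "$comment"
}

# Read the role header back from a statement on stdin.
# Only lines between line 2 and the first blank line are considered.
statement_role () {
    sed -n '2,/^$/s/^role: //p'
}

# Typical use (needs git and gpg, so it is left as a comment):
#   make_statement "$(git rev-parse --verify HEAD)" reviewer \
#       "meets standard X" | gpg --sign -a
```

Keeping the commit ID alone on the first line means a verifier can still
compare it against the commit with a plain "head -n 1".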

For storage, you basically have three options:

  1. Somewhere completely outside of git. There's no reason this needs
     to be stored in git at all, depending on your workflow. It may be
     simpler to keep it in some database related to your review system
     (in fact, you may not be doing anything cryptographic at all, but
     merely have a separate review system with a central database that
     mentions commits by sha1).

  2. In git tags. You can already do this with:

       git tag -s -m "I reviewed this" HEAD

     But tags aren't a good fit for a workflow that signs every commit
     (some of them perhaps even multiple times!). You end up with lots
     of tag refs.

  3. In git notes. You can do something like:

       (git rev-parse --verify HEAD
        echo "I reviewed this"
       ) | gpg --sign -a |
       git notes add -F - HEAD

     though you'd probably want to be a little more complex, and handle
     lists of signed notes for each commit. And you may want to store
     these in a separate notes ref from the default one.

     The advantage of notes is that they are designed for lots of
     per-commit storage, and can be accessed fairly efficiently.

So now you have your review storage system (or authorship, or whatever
metadata you want to stick in there). You can peek at it manually, of
course, when you suspect something is not right. But you probably also
want to do automatic things, like making sure nothing goes into some
branch "foo" that isn't signed with an authorship note.

Assuming you are storing with git notes (if you are using some external
system, replace the call to git-notes below with whatever database
lookup you would want), you could use a pre-receive hook that did
something like:

  die () { echo >&2 "$*"; exit 1; }

  # $old and $new are the old and new values of the ref being updated
  git rev-list $old..$new |
  while read commit; do
    git notes show $commit >tmp
    # --decrypt writes the signed payload to stdout while verifying it
    gpg --decrypt tmp >data 2>siginfo || die "$commit: signature is bad"
    # ugh, is there really no better way to get this info from gpg?
    perl -lne 'print $1 if /Good signature from "(.*)"/' siginfo >signer
    git show -s --format="%an <%ae>" $commit >author
    cmp author signer || die "$commit: signer and author don't match"
    test "`head -n 1 data`" = $commit ||
      die "$commit: signed commit does not match"
  done

And obviously that is hacked together and you would want something more
robust, and you'd need to handle the web of trust for the signing keys
somehow (though I think that is external to this script, and is about
setting up the desired keyring). But I hope it gives a sense of what you
can do. You could also replace gpg completely with something like
openssl using x.509 certs, if that makes more sense to your
organization.
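
On the question in the hook's comment about a better way to get the signer
out of gpg: gpg can emit machine-readable status lines with --status-fd,
and a good signature is reported as a GOODSIG line carrying the key ID and
the user ID. A minimal sketch of parsing that line follows; the function
name goodsig_user is made up, and note that gpg percent-escapes some
characters in the user ID, which this sketch does not undo.

```shell
#!/bin/sh
# Extract the user ID from gpg status output (--status-fd) on stdin.
# A good signature is reported as: [GNUPG:] GOODSIG <keyid> <user id>
goodsig_user () {
    sed -n 's/^\[GNUPG:\] GOODSIG [0-9A-Fa-f]* //p'
}

# Typical use (needs gpg and a signed file, so it is left as a comment):
#   gpg --status-fd 1 --verify tmp 2>/dev/null | goodsig_user >signer
```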

Developers would have to make a note and push their notes tree first,
and then push their actual commits into a branch (and you might want to
do some verification on the notes they push, like checking that entries
for commit $X actually contain signatures for $X, or that the signer
identity matches some ssh credential, or that the pusher isn't deleting
any signatures or erasing note history).

I suspect you have already thought through some of this. But I wanted
to start with first principles, because I really don't think this is a
_git_ problem as much as it is a _workflow_ problem. So it's important
to first define the workflow you want, and then think about how git can
help. Stable commit identifiers already provide much of the basis. I
think notes provide a nice storage format that is efficient and
push-able to other repos (though in a centralized shop, some other
database might make sense, too). What really remains to be done is:

  1. Define the metadata format that encapsulates what you want to say
     about commits.

  2. Write scripts to help developers and reviewers make these notes,
     and verify them.  Write hooks to implement policy on letting
     commits into certain branches, as above.

And both of those happen outside of git (though if you write them in a
generic enough form, I'm sure people on the list would be very happy to
see them shared).
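
For the policy hook, one detail the earlier sketch glosses over is where
$old and $new come from: a pre-receive hook reads one "<old> <new> <ref>"
line per updated ref on stdin, with an all-zero ID standing for a ref that
is being created or deleted. The helper below is only a sketch; the name
commits_to_check is made up, and the branch "foo" is taken from the example
in the text.

```shell
#!/bin/sh
# An all-zero object ID marks a ref that does not exist on that side.
zero=0000000000000000000000000000000000000000

# Print the rev-list arguments naming the commits an update introduces,
# or nothing if the update needs no check.
commits_to_check () {
    old=$1 new=$2 ref=$3
    [ "$ref" = "refs/heads/foo" ] || return 0   # only police branch "foo"
    [ "$new" != "$zero" ] || return 0           # branch deletion
    if [ "$old" = "$zero" ]; then
        echo "$new --not --all"                 # new branch: unseen commits
    else
        echo "$old..$new"
    fi
}

# In the hook itself (needs a repository, so it is left as a comment):
#   while read old new ref; do
#       args=$(commits_to_check "$old" "$new" "$ref")
#       [ -z "$args" ] || git rev-list $args | while read commit; do
#           : verify "$commit" here
#       done
#   done
```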

> There are 200 developers working on a financial trading system, and each of
> them has the opportunity to slip malicious code into the project. When the
> final release is prepared, the project lead signs the tip commit, thus
> signing the whole tree. Now it is discovered that someone did slip some
> malicious code in.  How do you audit the system? Could higher levels of
> individual accountability have discouraged this scenario?

I like this example. It shows that signing a commit is not really
meaningful by itself; you have to understand the semantics of that
signature (and maybe they're included as comments in the tag object, or
maybe it is assumed by your organization's workflow).

In the case of the kernel, Linus signing a commit with a tag implicitly
means "I think what is in this tree and everything before it is good, so
you should feel comfortable using it" (or at least insofar as you trust
Linus).

But it doesn't have to be that way. Your project lead signing may mean
"this is good and we should ship it". But developers signing commits may
simply mean "I promise that I wrote the changes between this commit's
tree and its parent". Those are all signatures of commits, but they mean
very different things; the key is adding metadata to know which is
which.

> I've seen it argued that a proper SSH setup and user management are the key.
> These are good for security and access control, but not for some durable
> form of accountability.

Right. You are trusting the server's records, not cryptography. The main
advantage is that it's efficient and easy to set up. :)

> It seems that creating a signed tag is the same as signing a commit.  There
> are a few problems, though.  Tags don't provide a secure means of asserting
> the type of signature being applied to the commit hash. That is - is the
> hash signed because someone is claiming authorship? Because they are
> asserting the integrity of the entire tree? Because they have reviewed the
> code? Because they reviewed a certain subset of the tree? Of course there's
> also the issue that tags live in a cluttered namespace. Signing a commit is
> essentially a different thing from providing a name for a commit. Using tags
> just to sign commits requires a glut of tag names.

Again, metadata. Say what you mean in the free-form content of the tag.
For the kernel, there is nothing to be said. Linus signing tags has a
well-known meaning in the community. But in an organization signing for
a lot of different reasons, you would want the signed data to say why it
was signed.

> I propose expanding the concept of tags, or alternately creating a new
> concept which subsumes the existing tag concept. I'll call this new concept
> a "sig" for the purposes of this discussion. The concept of a sig cross-cuts
> the concept of a tag.
> 
> A tag signs the commit hash. A sig signs a SHA1-based absolute commit
> reference with a (possibly null) string concatenated to it. For instance, a
> sig might sign the following string:

A tag can already include arbitrary data.

In fact, tags basically do what you want already; it's just that storing
one tag ref per commit is going to be ugly. It might make sense to
replace the ad-hoc gpg signatures I used in my examples above with tag
objects, and then store the tag object in the notes tree.

> "0b9deecf625677cf44058a42c2abd7add5167e81^0 author"
> which would mean that the signor is claiming authorship of that individual
> commit. (Suggestions for notating a single commit are welcome. "^0" seemed
> natural.)

See? You're defining metadata now. :)

> * What on earth does it mean to tag a range of commits? With commit ranges
> being siggable, and tags containing sigs, what does it mean to tag a range
> of 10 commits, for instance? Is that desirable? Does it make any sense
> whatsoever? Does it hurt anything if it happens?

It's slightly more efficient. If I wrote 10 commits, I can either sign
each individually saying "I wrote this", or I can make a single
signature showing them all. The tradeoff is that parsing and verifying
metadata becomes a lot more complex. But cryptographically speaking, a
range is not ambiguous.

> * Performance? I think it would be extremely quick to verify a bunch of
> sigs, but I don't know. Maybe I'm not thinking clearly about it.
> Fortunately, sigs can be ignored entirely and need not affect things.

Compared to usual git operations, no, it's not quick. But you don't have
to verify all the time. You can verify commits when they enter your
repo, or when you're interested in some aspect of them, or when you
suspect something fishy is going on. You don't have to do it on every
rev-list.
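
One way to avoid re-verifying the same commits is to remember which ones
have already passed. The sketch below filters a list of commit IDs against
a plain-text cache file; the function name and the cache location are
assumptions for illustration, not anything git provides.

```shell
#!/bin/sh
# Print the commit IDs from stdin that are not yet listed in the cache file.
unverified () {
    cache=$1
    [ -f "$cache" ] || : >"$cache"     # start with an empty cache
    # -x whole line, -F fixed strings, -v invert, -f patterns from file
    grep -vxFf "$cache" || :           # "no lines left" is not an error
}

# Typical use (needs a repository, so it is left as a comment):
#   git rev-list $old..$new | unverified .git/verified-commits
# and append each commit ID to the cache once its signature checks out.
```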

-Peff

* Re: [Tagging Commits] feedback / discussion request
  2011-05-03 23:49 ` Sverre Rabbelier
@ 2011-05-04  9:21   ` Michael J Gruber
  0 siblings, 0 replies; 7+ messages in thread
From: Michael J Gruber @ 2011-05-04  9:21 UTC
  To: Sverre Rabbelier; +Cc: Richard Peterson, git

Sverre Rabbelier venit, vidit, dixit 04.05.2011 01:49:
> Heya,
> 
> On Wed, May 4, 2011 at 01:36, Richard Peterson <richard@rcpeterson.com> wrote:
>> Thank you, and please give feedback. I'm no git pro - just a guy with an
>> idea. Based on your feedback, Eric and I will steer our implementation.
> 
> Have you looked at git notes? They seem relevant. You could use them
> to sign commits after the fact, and by multiple people, etc.
> 

Exactly. Sign and store sig in refs/notes/sigs:

git rev-parse <commit> | gpg -sa | git notes --ref=sigs append -F- <commit>

Verify:

git notes --ref=sigs show <commit> | gpg

You can sign any object (blob, tree...) that way, of course.
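
Because "append" lets several people add signatures to the same note, a
verifier may want to check each armored block separately. The following is
a sketch only: it assumes each signature was made with "gpg -sa", so every
block is delimited by BEGIN/END PGP MESSAGE lines, and the function name
and output file names are made up.

```shell
#!/bin/sh
# Split note text on stdin into one file per armored block under $1.
# Prints the number of blocks found.
split_sig_blocks () {
    awk -v dir="$1" '
        /^-----BEGIN PGP MESSAGE-----$/ { n++; out = dir "/sig-" n ".asc" }
        out != "" { print > out }
        /^-----END PGP MESSAGE-----$/ { close(out); out = "" }
        END { print n + 0 }
    '
}

# Typical use (needs git and gpg, so it is left as a comment):
#   d=$(mktemp -d)
#   git notes --ref=sigs show <commit> | split_sig_blocks "$d"
#   for f in "$d"/sig-*.asc; do gpg --verify "$f"; done
```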

Everything else (meaning of this sig, just like the meaning of a signed
tag or a s-o-b line) is a matter of project policy.

Michael

* Re: [Tagging Commits] feedback / discussion request
  2011-05-04  8:42 ` Jeff King
@ 2011-05-05 15:39   ` Richard Peterson
  2011-05-05 18:49     ` Sverre Rabbelier
  2011-05-05 20:30     ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Peterson @ 2011-05-05 15:39 UTC
  To: Jeff King; +Cc: git

First off, thanks for the awesome response, Peff, and Sverre and
Michael as well. Great stuff, and plenty that I had not thought of.

On Wed, May 4, 2011 at 4:42 AM, Jeff King <peff@peff.net> wrote:
> On Tue, May 03, 2011 at 07:36:51PM -0400, Richard Peterson wrote:
>
>> Here are some possible semantics you could assign to signing a commit hash:
>>
>> * Making a verifiable claim of authorship of a commit
>> * Making a verifiable claim to have reviewed a commit or set of commits
>> * Making a verifiable claim to have approved a commit or set of commits for
>> some purpose
>> * Making some other verifiable claim about a commit TBD by your workflow
>> * Making a verifiable claim to have reviewed or approved the entire tree
>> under the commit
>
> Yeah, all of those make sense in certain workflows. But with the
> exception of authorship verification, they are not things you would want
> to do at _commit_ time,

Even authorship could be claimed after commit time too, for that matter.

> but rather something you say later about a
> commit. So I think fundamentally you are not interested in adding
> signatures to git commits themselves, but rather about making statements
> about commits that happen to be signed. Which is good, because your
> problem is much easier. :)
>
> The nice thing is that git gives you a stable, cryptographically
> verifiable identifier for the commit. So all you have to do is mention
> it along with some metadata, sign it, and then store it somewhere.
>
> The first two parts can be as simple as something like:
>
>  (git rev-parse --verify HEAD
>   echo "I reviewed this and it meets some standard X."
>  ) | gpg --sign
>
> where probably you would want to define some kind of parsable metadata
> format for your particular workflow.
>
> For storage, you basically have three options:
>
>  1. Somewhere completely outside of git. There's no reason this needs
>     to be stored in git at all, depending on your workflow. It may be
>     simpler to keep it in some database related to your review system
>     (in fact, you may not be doing anything cryptographic at all, but
>     merely have a separate review system with a central database that
>     mentions commits by sha1).

I see this as a useful option considering some poor souls in my
organization use Subversion, and we could factor out the audit / review
workflow to not depend on a single version control system.

On the other hand, it makes sense to keep data actually within git if it
uses a git internal identifier as a key, and it's useful to operate on it with
the git tool set.

>
>  2. In git tags. You can already do this with:
>
>       git tag -s -m "I reviewed this" HEAD
>
>     But tags aren't a good fit for a workflow that signs every commit
>     (some of them perhaps even multiple times!). You end up with lots
>     of tag refs.

Right - one of the reasons I don't like tags for this. Tags really just don't
fit the bill, unfortunately.

>
>  3. In git notes. You can do something like:
>
>       (git rev-parse --verify HEAD
>        echo "I reviewed this"
>       ) | gpg --sign -a |
>       git notes add -F - HEAD
>
>     though you'd probably want to be a little more complex, and handle
>     lists of signed notes for each commit. And you may want to store
>     these in a separate notes ref from the default one.

I had looked at this option, but had failed to see the usefulness of using
a different ref. I was worried about cluttering things up, overloading the
intended purpose of notes, and so forth. I wasn't really sure if notes were
intended to be general purpose storage for systematic, structured data.

My inclination was to do this outside notes, or even in a parallel
implementation to notes, factoring out the common parts. I suppose that
looking at notes as somewhat of a free-for-all obviates this need. Is this
really what notes are for?

>
>     The advantage of notes is that they are designed for lots of
>     per-commit storage, and can be accessed fairly efficiently.

That was my other concern about notes - performance. Not sure how
notes are stored, but I certainly trust you that they're efficient.

>
> So now you have your review storage system (or authorship, or whatever
> metadata you want to stick in there). You can peek at it manually, of
> course, when you suspect something is not right. But you probably also
> want to do automatic things, like making sure nothing goes into some
> branch "foo" that isn't signed with an authorship note.
>
> Assuming you are storing with git notes (if you are using some external
> system, replace the call to git-notes below with whatever database
> lookup you would want), you could use a pre-receive hook that did
> something like:
>
>  git rev-list $old..$new |
>  while read commit; do
>    git notes show $commit >tmp
>    gpg --decrypt tmp >data 2>siginfo || die "$commit: signature is bad"
>    # ugh, is there really no better way to get this info from gpg?

See? We need functions for this stuff! I'll share whatever I come up with,
and maybe it will be useful in general.

>    perl -lne 'print $1 if /Good signature from "(.*)"/' siginfo >signer
>    git show -s --format="%an <%ae>" $commit >author
>    cmp author signer || die "$commit: signer and author don't match"

Yes this needs to be handled robustly. The signer would need to be told
at sign-time if his signature didn't match.

>    test "`head -n 1 data`" = $commit ||
>      die "$commit: signed commit does not match"
>  done
>
> And obviously that is hacked together and you would want something more
> robust,

Thank you - this is all solid stuff to get me started.

> and you'd need to handle the web of trust for the signing keys
> somehow (though I think that is external to this script, and is about
> setting up the desired keyring). But I hope it gives a sense of what you
> can do. You could also replace gpg completely with something like
> openssl using x.509 certs, if that makes more sense to your
> organization.

You read my mind. Everybody in my organization has a set of x.509 certs
on smartcards. That's phase 2 of my project.

>
> Developers would have to make a note and push their notes tree first,

You mean for hook / verification purposes? Or is there some underlying
reason to push notes first?

> and then push their actual commits into a branch (and you might want to
> do some verification on the notes they push, like checking that entries
> for commit $X actually contain signatures for $X, or that the signer
> identity matches some ssh credential, or that the pusher isn't deleting
> any signatures or erasing note history).
>
> I suspect you have already thought through some of this. But I wanted
> to start with first principles, because I really don't think this is a
> _git_ problem as much as it is a _workflow_ problem. So it's important
> to first define the workflow you want, and then think about how git can
> help. Stable commit identifiers already provide much of the basis. I
> think notes provide a nice storage format that is efficient and
> push-able to other repos (though in a centralized shop, some other
> database might make sense, too). What really remains to be done is:
>
>  1. Define the metadata format that encapsulates what you want to say
>     about commits.
>
>  2. Write scripts to help developers and reviewers make these notes,
>     and verify them.  Write hooks to implement policy on letting
>     commits into certain branches, as above.
>
> And both of those happen outside of git (though if you write them in a
> generic enough form, I'm sure people on the list would be very happy to
> see them shared).

I'll be sure to share.


>
>> There are 200 developers working on a financial trading system, and each of
>> them has the opportunity to slip malicious code into the project. When the
>> final release is prepared, the project lead signs the tip commit, thus
>> signing the whole tree. Now it is discovered that someone did slip some
>> malicious code in.  How do you audit the system? Could higher levels of
>> individual accountability have discouraged this scenario?
>
> I like this example. It shows that signing a commit is not really
> meaningful by itself; you have to understand the semantics of that
> signature (and maybe they're included as comments in the tag object, or
> maybe it is assumed by your organization's workflow).
>
> In the case of the kernel, Linus signing a commit with a tag implicitly
> means "I think what is in this tree and everything before it is good, so
> you should feel comfortable using it" (or at least insofar as you trust
> Linus).
>
> But it doesn't have to be that way. Your project lead signing may mean
> "this is good and we should ship it". But developers signing commits may
> simply mean "I promise that I wrote the changes between this commit's
> tree and its parent". Those are all signatures of commits, but they mean
> very different things; the key is adding metadata to know which is
> which.
>
>> I've seen it argued that a proper SSH setup and user management are the key.
>> These are good for security and access control, but not for some durable
>> form of accountability.
>
> Right. You are trusting the server's records, not cryptography. The main
> advantage is that it's efficient and easy to set up. :)

The main reason this doesn't work for me is that codebases are passed around
my organization like hand-me-down clothes. It is not unheard of to get the
entire repository for a critical application delivered from one shop to another
on CD. We need to be able to verify the integrity of a repository entirely
independently of any outside information.  The only centralized source of trust
in our organization is the certificate authority.

Now my big question to ponder: what to do when the CA expires a cert? Hmm...


>
>> It seems that creating a signed tag is the same as signing a commit.  There
>> are a few problems, though.  Tags don't provide a secure means of asserting
>> the type of signature being applied to the commit hash. That is - is the
>> hash signed because someone is claiming authorship? Because they are
>> asserting the integrity of the entire tree? Because they have reviewed the
>> code? Because they reviewed a certain subset of the tree? Of course there's
>> also the issue that tags live in a cluttered namespace. Signing a commit is
>> essentially a different thing from providing a name for a commit. Using tags
>> just to sign commits requires a glut of tag names.
>
> Again, metadata. Say what you mean in the free-form content of the tag.
> For the kernel, there is nothing to be said. Linus signing tags has a
> well-known meaning in the community. But in an organization signing for
> a lot of different reasons, you would want the signed data to say why it
> was signed.
>
>> I propose expanding the concept of tags, or alternately creating a new
>> concept which subsumes the existing tag concept. I'll call this new concept
>> a "sig" for the purposes of this discussion. The concept of a sig cross-cuts
>> the concept of a tag.
>>
>> A tag signs the commit hash. A sig signs a SHA1-based absolute commit
>> reference with a (possibly null) string concatenated to it. For instance, a
>> sig might sign the following string:
>
> A tag can already include arbitrary data.
>
> In fact, tags basically do what you want already; it's just that storing
> one tag ref per commit is going to be ugly. It might make sense to
> replace the ad-hoc gpg signatures I used in my examples above with tag
> objects, and then store the tag object in the notes tree.
>
>> "0b9deecf625677cf44058a42c2abd7add5167e81^0 author"
>> which would mean that the signor is claiming authorship of that individual
>> commit. (Suggestions for notating a single commit are welcome. "^0" seemed
>> natural.)
>
> See? You're defining metadata now. :)
>
>> * What on earth does it mean to tag a range of commits? With commit ranges
>> being siggable, and tags containing sigs, what does it mean to tag a range
>> of 10 commits, for instance? Is that desirable? Does it make any sense
>> whatsoever? Does it hurt anything if it happens?
>
> It's slightly more efficient. If I wrote 10 commits, I can either sign
> each individually saying "I wrote this", or I can make a single
> signature showing them all. The tradeoff is that parsing and verifying
> metadata becomes a lot more complex. But cryptographically speaking, a
> range is not ambiguous;
>
>> * Performance? I think it would be extremely quick to verify a bunch of
>> sigs, but I don't know. Maybe I'm not thinking clearly about it.
>> Fortunately, sigs can be ignored entirely and need not affect things.
>
> Compared to usual git operations, no, it's not quick. But you don't have
> to verify all the time. You can verify commits when they enter your
> repo, or when you're interested in some aspect of them, or when you
> suspect something fishy is going on. You don't have to do it on every
> rev-list.

Good point. I had thought it would be something to see every time I run
git-log, but I suppose it makes perfect sense to do this in the nightlies
or on some other, rarer occasion.

Thanks,

Richard Peterson


* Re: [Tagging Commits] feedback / discussion request
From: Sverre Rabbelier @ 2011-05-05 18:49 UTC
  To: Richard Peterson; +Cc: Jeff King, git

Heya,

On Thu, May 5, 2011 at 17:39, Richard Peterson <richard@rcpeterson.com> wrote:
> Now my big question to ponder: what to do when the CA expires a cert? Hmm...

You could re-sign the commits with the new cert; notes are mutable,
and they keep history too. So you could just create a commit on the
notes history ref, "re-sign commits for expired cert", optionally
removing the old signature. The hook verifying that no one is
tampering with the notes might get complex if you do that kind of
thing, though (it might be easier to just append the new signature and
keep the old one in place).
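
A minimal sketch of the append variant, run in a throwaway repository.
The notes ref name "signatures" and the signature strings are
placeholders chosen for illustration; a real setup would store gpg
output there instead:

```shell
#!/bin/sh
# Sketch: append a replacement signature to a commit's note and keep
# the old one. The ref name "signatures" and the signature text are
# hypothetical placeholders, not real gpg output.
set -eu

# Identity for the throwaway repo, so commits and notes can be created.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "initial commit"

# Original signature, made with the certificate that later expires.
git notes --ref=signatures add -m "signed-with: old-cert (placeholder)" HEAD

# Re-sign: append the new signature instead of replacing the old one.
git notes --ref=signatures append -m "signed-with: new-cert (placeholder)" HEAD

# The note now carries both signatures ...
note=$(git notes --ref=signatures show HEAD)
# ... and the notes ref has one commit per change, i.e. its own history.
history=$(git rev-list --count refs/notes/signatures)

printf '%s\n' "$note"
echo "notes history length: $history"
```

Because nothing is ever removed here, a verifying hook only has to
check that each update to the notes ref adds text.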

-- 
Cheers,

Sverre Rabbelier


* Re: [Tagging Commits] feedback / discussion request
From: Jeff King @ 2011-05-05 20:30 UTC
  To: Richard Peterson; +Cc: git

On Thu, May 05, 2011 at 11:39:41AM -0400, Richard Peterson wrote:

> >  3. In git notes. You can do something like:
> >
> >       (git rev-parse --verify HEAD
> >        echo "I reviewed this"
> >       ) | gpg --sign -a |
> >       git notes add -F - HEAD
> >
> >     though you'd probably want to be a little more complex, and handle
> >     lists of signed notes for each commit. And you may want to store
> >     these in a separate notes ref from the default one.
> 
> I had looked at this option, but had failed to see the usefulness of using
> a different ref. I was worried about cluttering things up, overloading the
> intended purpose of notes, and so forth. I wasn't really sure if notes were
> intended to be general purpose storage for systematic, structured data.

They are definitely intended to be general purpose storage. For one such
(ab)use, see the textconv-caching subsystem. It maps binary blobs into
their converted text counterparts. So we are keying on blobs (not
commits!), and storing arbitrarily gigantic data in the notes values.
And the nice thing is that because notes use git objects for storage, we
get all the usual delta compression benefits on the result.
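
To illustrate keying on a blob rather than a commit, here is a small
sketch. It is not the textconv cache itself; the ref name "blobdata",
the file name and the note text are made up for the example:

```shell
#!/bin/sh
# Sketch: attach a note to a blob object instead of a commit.
# The ref name "blobdata", the file and the note text are hypothetical.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "pretend this is binary" > data.bin
git add data.bin
git commit -q -m "add data.bin"

# Resolve the blob (not the commit) that holds the file's content.
blob=$(git rev-parse HEAD:data.bin)

# Attach derived data to the blob itself.
git notes --ref=blobdata add -m "text rendering of data.bin" "$blob"

kind=$(git cat-file -t "$blob")
cached=$(git notes --ref=blobdata show "$blob")
echo "$kind: $cached"
```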

> My inclination was to do this outside notes, or even in a parallel
> implementation to notes, factoring out the common parts. I suppose that
> looking at notes as somewhat of a free-for-all obviates this need. Is this
> really what notes are for?

Yep. Definitely use notes if you are going to do the storage in git.

> >     The advantage of notes is that they are designed for lots of
> >     per-commit storage, and can be accessed fairly efficiently.
> 
> That was my other concern about notes - performance. Not sure how
> notes are stored, but I certainly trust you that they're efficient.

Each notes tree is stored as a git tree full of entries representing the
commit (or other object) hashes. And each entry points to a blob which
is the note's value. And then as the notes change over time, we version
them with commit objects. So you can make notes and your coworker can
make notes, and you can merge them together.

For fun, you can do:

  # make a repo to play around in
  git clone /path/to/some/repo notes-test
  cd notes-test

  # make some notes. You could also use "notes add -F"
  # to add notes with arbitrary binary content.
  git notes --ref=foo add -m "this is note 1" HEAD
  git notes --ref=foo add -m "this is note 2" HEAD^

  # check them out in context
  git log --show-notes=foo -2

  # and then see how they're stored
  git checkout refs/notes/foo
  grep . *
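
The merging mentioned above can be tried the same way. This is a
sketch only: the refs "mine" and "theirs" are invented names standing
in for your notes and a coworker's notes fetched into a separate ref:

```shell
#!/bin/sh
# Sketch: merge two independently created notes refs.
# "mine" and "theirs" are hypothetical ref names.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Two notes refs with no shared history, annotating different commits.
git notes --ref=mine add -m "reviewed by me" HEAD
git notes --ref=theirs add -m "reviewed by coworker" HEAD^

# Merge the coworker's notes into yours.
git notes --ref=mine merge refs/notes/theirs

# Count the annotated objects now reachable through refs/notes/mine.
merged=$(git notes --ref=mine list | wc -l | tr -d ' ')
echo "notes after merge: $merged"
```

If both sides had annotated the same commit differently, the merge
would stop with a conflict unless a strategy is given with -s.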

> > Developers would have to make a note and push their notes tree first,
> 
> You mean for hook / verification purposes? Or is there some underlying
> reason to push notes first?

Yeah, for a pre-receive hook. You need to first tell the server "here
are some signatures" by pushing the notes, and then it can verify those
signatures when trying to put commits on actual branches.
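
A rough sketch of such a check, under stated assumptions: the notes ref
name "signatures" is hypothetical, and the function only tests that a
note exists for each pushed commit. A real hook would also pass each
note through gpg --verify. The second half is a demo in a throwaway
repository:

```shell
#!/bin/sh
# Sketch of a pre-receive style check. Assumptions: signatures live in
# refs/notes/signatures (hypothetical name), and "signed" here only
# means "a note exists"; real verification would run gpg on the note.
set -eu

# True when the hash is all zeros (ref creation or deletion).
is_zero() {
    case "$1" in
        *[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

check_push() {
    # stdin carries one "<old> <new> <refname>" line per updated ref.
    while read -r old new ref; do
        case "$ref" in
            refs/notes/*) continue ;;    # let the notes themselves through
        esac
        if is_zero "$new"; then
            continue                     # ref deletion, nothing to check
        fi
        if is_zero "$old"; then
            range="$new"                 # new ref: walk all of its history
        else
            range="$old..$new"
        fi
        for commit in $(git rev-list "$range"); do
            if ! git notes --ref=signatures show "$commit" >/dev/null 2>&1; then
                echo "rejected: $commit has no signature note" >&2
                return 1
            fi
        done
    done
    return 0
}

# Demo: the same push is rejected before the note exists, accepted after.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "unsigned work"
head=$(git rev-parse HEAD)
line="0000000000000000000000000000000000000000 $head refs/heads/topic"

if printf '%s\n' "$line" | check_push 2>/dev/null; then
    before=accepted
else
    before=rejected
fi

git notes --ref=signatures add -m "placeholder signature" HEAD

if printf '%s\n' "$line" | check_push 2>/dev/null; then
    after=accepted
else
    after=rejected
fi

echo "before note: $before, after note: $after"
```

Installed as an actual hook, the demo part would be dropped and the
script would end with "check_push || exit 1".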

> I'll be sure to share.

Great. I look forward to seeing the result.

-Peff
