git.vger.kernel.org archive mirror
* Verifiable git archives?
@ 2014-01-09  3:10 Andy Lutomirski
  2014-01-09 19:26 ` Stefan Beller
  2014-01-09 20:11 ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-01-09  3:10 UTC (permalink / raw)
  To: git

It's possible, in principle, to shove enough metadata into the output
of 'git archive' to allow anyone to verify (without cloning the repo)
that the archive is a correct copy of a given commit.  Would
this be considered a useful feature?

Presumably there would be a 'git untar' command that would report
failure if it fails to verify the archive contents.

This could be as simple as including copies of the commit object and
all relevant tree objects and checking all of the hashes when
untarring.
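Checking "all of the hashes" depends on how git names objects: an object's id is the SHA-1 of a header "<type> <size>" plus a NUL byte, followed by the raw contents. The C sketch below shows only that rule. sha1_buf() and git_object_id() are hypothetical helpers, and the inline SHA-1 is there just to keep the example self-contained, not as a suggested implementation.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Left-rotate a 32-bit word. */
static uint32_t rol32(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Plain SHA-1 of one in-memory buffer (illustration only, not hardened). */
static void sha1_buf(const unsigned char *msg, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	size_t total = ((len + 8) / 64 + 1) * 64; /* room for 0x80 and the 64-bit length */
	unsigned char *buf = calloc(total, 1);
	uint64_t bits = (uint64_t)len * 8;

	if (!buf)
		abort();
	memcpy(buf, msg, len);
	buf[len] = 0x80;
	for (int i = 0; i < 8; i++) /* big-endian bit length in the last 8 bytes */
		buf[total - 1 - i] = (unsigned char)(bits >> (8 * i));

	for (size_t off = 0; off < total; off += 64) {
		uint32_t w[80], a, b, c, d, e;

		for (int i = 0; i < 16; i++)
			w[i] = (uint32_t)buf[off + 4 * i] << 24 |
			       (uint32_t)buf[off + 4 * i + 1] << 16 |
			       (uint32_t)buf[off + 4 * i + 2] << 8 |
			       (uint32_t)buf[off + 4 * i + 3];
		for (int i = 16; i < 80; i++)
			w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

		a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
		for (int i = 0; i < 80; i++) {
			uint32_t f, k, t;

			if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
			else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
			else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
			else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
			t = rol32(a, 5) + f + e + k + w[i];
			e = d; d = c; c = rol32(b, 30); b = a; a = t;
		}
		h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
	}
	free(buf);
	for (int i = 0; i < 5; i++) {
		out[4 * i]     = (unsigned char)(h[i] >> 24);
		out[4 * i + 1] = (unsigned char)(h[i] >> 16);
		out[4 * i + 2] = (unsigned char)(h[i] >> 8);
		out[4 * i + 3] = (unsigned char)h[i];
	}
}

/* Object id as git computes it: SHA-1 of "<type> <size>\0" + payload, in hex. */
static void git_object_id(const char *type, const void *data, size_t size, char hex[41])
{
	char hdr[64];
	size_t hdrlen = (size_t)snprintf(hdr, sizeof(hdr), "%s %zu", type, size) + 1; /* keep the NUL */
	unsigned char *obj = malloc(hdrlen + size);
	unsigned char raw[20];

	if (!obj)
		abort();
	memcpy(obj, hdr, hdrlen);
	memcpy(obj + hdrlen, data, size);
	sha1_buf(obj, hdrlen + size, raw);
	free(obj);
	for (int i = 0; i < 20; i++)
		sprintf(hex + 2 * i, "%02x", raw[i]);
}
```

Under these assumptions, git_object_id("blob", "", 0, hex) gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the id git assigns to an empty file, so a verifier could compare each recomputed id with the one it expects.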

(Even better: allow subsets of the repository to be archived and
verified as well.)

--Andy

* Re: Verifiable git archives?
  2014-01-09  3:10 Verifiable git archives? Andy Lutomirski
@ 2014-01-09 19:26 ` Stefan Beller
  2014-01-09 20:11 ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2014-01-09 19:26 UTC (permalink / raw)
  To: Andy Lutomirski, git

On 09.01.2014 04:10, Andy Lutomirski wrote:
> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
> 

Do you know git bundles?


> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
> 
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.
> 

I thought the purpose of git archive was rather to create plain
archives, not polluted with any gitish stuff.

> (Even better: allow subsets of the repository to be archived and
> verified as well.)

Stefan

* Re: Verifiable git archives?
  2014-01-09  3:10 Verifiable git archives? Andy Lutomirski
  2014-01-09 19:26 ` Stefan Beller
@ 2014-01-09 20:11 ` Junio C Hamano
  2014-01-09 20:51   ` Andy Lutomirski
  2014-01-19  0:35   ` Michael Haggerty
  1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2014-01-09 20:11 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: git

Andy Lutomirski <luto@amacapital.net> writes:

> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
>
> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
>
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.

You only need the object name of the top-level tree.  After you "untar"
the archive into an empty directory, make it a new repository and
"git add . && git write-tree"---the result should match the
top-level tree the archive was supposed to contain.

Of course, you can write "git verify-archive" that does the same
computation all in-core, without actually extracting the archive
into an empty directory.
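Doing the write-tree computation in-core means rebuilding each tree object from the extracted entries. A tree's payload is a sequence of "<octal mode> <name>" records, each followed by a NUL and the entry's 20-byte binary object id. Entries are sorted bytewise by name, and a directory name is compared as if it ended in '/'. The C sketch below serializes one such payload. struct tree_entry and tree_payload() are hypothetical names. Recursing into subdirectories and hashing the result as a "tree" object are left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry of an in-core tree rebuilt from extracted files (hypothetical). */
struct tree_entry {
	unsigned mode;          /* e.g. 0100644, 0100755, 0120000, 040000 */
	const char *name;       /* a single path component */
	unsigned char sha1[20]; /* binary id of the blob or subtree */
};

static int is_dir(unsigned mode)
{
	return (mode & 0170000) == 040000;
}

/* Bytewise name order, with a directory compared as if its name ended in '/'. */
static int entry_cmp(const void *pa, const void *pb)
{
	const struct tree_entry *a = pa, *b = pb;
	size_t la = strlen(a->name), lb = strlen(b->name);
	size_t n = la < lb ? la : lb;
	int c = memcmp(a->name, b->name, n);
	unsigned char ca, cb;

	if (c)
		return c;
	ca = la > n ? (unsigned char)a->name[n] : (is_dir(a->mode) ? '/' : '\0');
	cb = lb > n ? (unsigned char)b->name[n] : (is_dir(b->mode) ? '/' : '\0');
	return (int)ca - (int)cb;
}

/*
 * Sort ents in place, then serialize them as "<octal mode> <name>\0" followed
 * by the 20-byte id. Returns a malloc'd buffer (caller frees), length in *out_len.
 */
static unsigned char *tree_payload(struct tree_entry *ents, size_t n, size_t *out_len)
{
	size_t cap = 1, len = 0;
	unsigned char *buf;

	qsort(ents, n, sizeof(*ents), entry_cmp);
	for (size_t i = 0; i < n; i++)
		cap += 7 + 1 + strlen(ents[i].name) + 1 + 20; /* mode needs <= 6 digits */
	buf = malloc(cap);
	if (!buf)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		int m = sprintf((char *)buf + len, "%o %s", ents[i].mode, ents[i].name);

		len += (size_t)m + 1; /* step past the NUL sprintf wrote */
		memcpy(buf + len, ents[i].sha1, 20);
		len += 20;
	}
	*out_len = len;
	return buf;
}
```

The ordering rule is the subtle part. A file "foo.c" sorts before a directory "foo", because the directory is compared as "foo/" and '.' (0x2e) is less than '/' (0x2f).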

* Re: Verifiable git archives?
  2014-01-09 20:11 ` Junio C Hamano
@ 2014-01-09 20:51   ` Andy Lutomirski
  2014-01-09 22:46     ` Junio C Hamano
  2014-01-19  0:35   ` Michael Haggerty
  1 sibling, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2014-01-09 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jan 9, 2014 at 12:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
>
> You only need the object name of the top-level tree.  After you "untar"
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.

Hmm.  I didn't realize that there was enough metadata in the 'git
archive' output to reproduce the final tree.  If I can make it work,
would you accept a patch to add another extended pax header containing
the commit object and the top-level tree hash to the 'git archive'
tarball output?

>
> Of course, you can write "git verify-archive" that does the same
> computation all in-core, without actually extracting the archive
> into an empty directory.

Hmm.  I'll play with this.

--Andy

* Re: Verifiable git archives?
  2014-01-09 20:51   ` Andy Lutomirski
@ 2014-01-09 22:46     ` Junio C Hamano
  2014-01-09 22:50       ` Andy Lutomirski
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2014-01-09 22:46 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: git

Andy Lutomirski <luto@amacapital.net> writes:

>> You only need the object name of the top-level tree.  After you "untar"
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>
> Hmm.  I didn't realize that there was enough metadata in the 'git
> archive' output to reproduce the final tree.

We do record the commit object name in the extended header when
writing a tar archive already, but you have to grab the commit
object from somewhere in order to read the top-level tree object
name, which we do not record.
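The recorded commit id can be read back from a tar archive with `git get-tar-commit-id`. Getting from there to the tree, however, requires the commit object itself, whose header begins with a "tree <hex>" line. A tag object similarly names its target on an "object <hex>" line. The C sketch below extracts such a header field from a raw object body. object_header_field() is a hypothetical helper, and it does not validate the value it returns.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find "<key> <value>\n" among the header lines of a commit or tag object
 * (everything before the first blank line) and copy the value into out.
 * Returns 0 on success, -1 if the key is absent or out is too small.
 */
static int object_header_field(const char *obj, size_t len, const char *key,
			       char *out, size_t outsz)
{
	size_t klen = strlen(key);
	const char *p = obj, *end = obj + len;

	while (p < end && *p != '\n') { /* a blank line ends the header */
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t linelen = eol ? (size_t)(eol - p) : (size_t)(end - p);

		if (linelen > klen && !memcmp(p, key, klen) && p[klen] == ' ') {
			size_t vlen = linelen - klen - 1;

			if (vlen + 1 > outsz)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		p = eol + 1;
	}
	return -1;
}
```

Because the search stops at the first blank line, a line in the commit message that happens to start with "tree " is never mistaken for the header.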

Also, if you used keyword substitution and such when creating an
archive, then the filesystem entities resulting from expanding it
would not match the original.

* Re: Verifiable git archives?
  2014-01-09 22:46     ` Junio C Hamano
@ 2014-01-09 22:50       ` Andy Lutomirski
  0 siblings, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-01-09 22:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jan 9, 2014 at 2:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>>> You only need the object name of the top-level tree.  After you "untar"
>>> the archive into an empty directory, make it a new repository and
>>> "git add . && git write-tree"---the result should match the
>>> top-level tree the archive was supposed to contain.
>>
>> Hmm.  I didn't realize that there was enough metadata in the 'git
>> archive' output to reproduce the final tree.
>
> We do record the commit object name in the extended header when
> writing a tar archive already, but you have to grab the commit
> object from somewhere in order to read the top-level tree object
> name, which we do not record.

This could be changed :)

>
> Also, if you used keyword substitution and such when creating an
> archive, then the filesystem entities resulting from expanding it
> would not match the original.
>

In the simple case, you'd need to have an archive with no prefix or
funny business (or at least a known prefix).  In the fancy case, you
could at least verify that all the file contents really came from git,
but then you'd really need the tree objects.

The use case I have in mind is for projects to distribute archives but
only need to sign the tagged git commit id.  I think this should be
doable without too much pain.  (This assumes that the release doesn't
contain autogen output and such.)
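One way to picture the fancy case is as follows. Start from the single signed id and accept an embedded object only if an already-accepted object names it. Reject the archive if anything is left unaccepted. The C sketch below models only that propagation step. struct embedded_obj and propagate_trust() are hypothetical. The sketch assumes each object's id has already been recomputed from its own bytes and its referenced ids already parsed out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * An object embedded in a (hypothetical) verifiable archive. id is assumed to
 * have been recomputed from the object's own bytes, and refs to hold the ids
 * it names (a tag's object, a commit's tree, a tree's entries).
 */
struct embedded_obj {
	const char *id;
	const char **refs;
	size_t nrefs;
	bool trusted;
};

/*
 * Trust the object matching signed_id, then anything a trusted object names,
 * until nothing changes. Returns true only if every object ended up trusted.
 */
static bool propagate_trust(struct embedded_obj *objs, size_t n, const char *signed_id)
{
	bool changed = true;

	for (size_t i = 0; i < n; i++)
		objs[i].trusted = strcmp(objs[i].id, signed_id) == 0;

	while (changed) { /* repeat until a pass adds nothing new */
		changed = false;
		for (size_t i = 0; i < n; i++) {
			if (!objs[i].trusted)
				continue;
			for (size_t r = 0; r < objs[i].nrefs; r++) {
				for (size_t j = 0; j < n; j++) {
					if (!objs[j].trusted && strcmp(objs[j].id, objs[i].refs[r]) == 0) {
						objs[j].trusted = true;
						changed = true;
					}
				}
			}
		}
	}
	for (size_t i = 0; i < n; i++)
		if (!objs[i].trusted)
			return false;
	return true;
}
```

A real verifier would also have to check the tar members themselves against the accepted blob ids. The sketch only shows why one signed id is enough to anchor everything else.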

--Andy

* Re: Verifiable git archives?
  2014-01-09 20:11 ` Junio C Hamano
  2014-01-09 20:51   ` Andy Lutomirski
@ 2014-01-19  0:35   ` Michael Haggerty
  2014-01-21 19:38     ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Haggerty @ 2014-01-19  0:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andy Lutomirski, git

On 01/09/2014 09:11 PM, Junio C Hamano wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> 
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
> 
> You only need the object name of the top-level tree.  After you "untar"
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.
> [...]

This wouldn't work if any files were excluded from the archive using
gitattribute "export-ignore" (or "export-subst", which you already
mentioned in a follow-up email).

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

* Re: Verifiable git archives?
  2014-01-19  0:35   ` Michael Haggerty
@ 2014-01-21 19:38     ` Junio C Hamano
  2014-01-25 21:56       ` Andy Lutomirski
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2014-01-21 19:38 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Andy Lutomirski, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 01/09/2014 09:11 PM, Junio C Hamano wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>> 
>>> It's possible, in principle, to shove enough metadata into the output
>>> of 'git archive' to allow anyone to verify (without cloning the repo)
>>> that the archive is a correct copy of a given commit.  Would
>>> this be considered a useful feature?
>>>
>>> Presumably there would be a 'git untar' command that would report
>>> failure if it fails to verify the archive contents.
>>>
>>> This could be as simple as including copies of the commit object and
>>> all relevant tree objects and checking all of the hashes when
>>> untarring.
>> 
>> You only need the object name of the top-level tree.  After you "untar"
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>> [...]
>
> This wouldn't work if any files were excluded from the archive using
> gitattribute "export-ignore" (or "export-subst", which you already
> mentioned in a follow-up email).

Correct.  By "and such" below, I meant any and all futzing that
makes the resulting working tree different from the tree object
being archived ;-)  That includes the line-ending configuration
and other things as well.

    Also, if you used keyword substitution and such when creating an
    archive, then the filesystem entities resulting from expanding
    it would not match the original.

* Re: Verifiable git archives?
  2014-01-21 19:38     ` Junio C Hamano
@ 2014-01-25 21:56       ` Andy Lutomirski
  0 siblings, 0 replies; 9+ messages in thread
From: Andy Lutomirski @ 2014-01-25 21:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, git

[-- Attachment #1: Type: text/plain, Size: 642 bytes --]

Here's a rather hackish implementation of the write side.  Any
thoughts on the format?  (Obviously the implementation needs work.
For example, it needs to be optional.)

Thoughts so far:
 - I want to put the value of "prefix" into an extended header.
 - Should blobs have their sha1 hashes in an extended header?  Pros:
it makes figuring out substitutions easier.  Cons: it adds 512 bytes
per file.
 - I want to support tags as roots.
 - I (or someone) need to write a verifier / verified unpacker.  Does
git accept Python code?

This thing is tested in the sense that GNU tar unpacks its output
without any warnings or other fanfare.
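The GIT-SCM.* fields are ordinary pax extended-header records. Each has the form "<length> <keyword>=<value>\n", where the leading decimal length counts the whole record, including its own digits. The C sketch below computes that self-referential length and the space a separate extended header takes in the stream: one 512-byte header block, plus the records rounded up to whole 512-byte blocks. The helper names are hypothetical and are not git's own strbuf_append_ext_header().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Number of decimal digits in n. */
static size_t dec_digits(size_t n)
{
	size_t d = 1;

	while (n >= 10) {
		n /= 10;
		d++;
	}
	return d;
}

/* Length of "<len> <key>=<value>\n", where <len> includes its own digits. */
static size_t pax_record_len(size_t keylen, size_t valuelen)
{
	size_t rest = 1 + keylen + 1 + valuelen + 1; /* ' ', key, '=', value, '\n' */
	size_t len = rest + 1;                      /* start by assuming one digit */

	while (rest + dec_digits(len) != len)       /* grow until the digits fit */
		len = rest + dec_digits(len);
	return len;
}

/* Stream bytes for one extended header: a header block plus padded records. */
static size_t pax_header_cost(size_t records_len)
{
	return 512 + (records_len + 511) / 512 * 512;
}

/*
 * Append one record to a malloc'd buffer; value may contain NUL bytes.
 * On allocation failure returns NULL and leaves buf untouched.
 */
static char *pax_append(char *buf, size_t *buflen, const char *key,
			const void *value, size_t valuelen)
{
	size_t reclen = pax_record_len(strlen(key), valuelen);
	char *nb = realloc(buf, *buflen + reclen + 1); /* +1 for sprintf's NUL */
	int n;

	if (!nb)
		return NULL;
	n = sprintf(nb + *buflen, "%zu %s=", reclen, key);
	memcpy(nb + *buflen + n, value, valuelen);
	nb[*buflen + reclen - 1] = '\n';
	*buflen += reclen;
	return nb;
}
```

Under these assumptions, a record holding a 40-character hex id under a 7-byte key is 52 bytes. Written as its own extended header, it still occupies pax_header_cost(52) = 1024 bytes. Folding per-file records into an extended header that an entry already needs could cost less.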

--Andy

[-- Attachment #2: verifiable_archive.patch --]
[-- Type: text/x-patch, Size: 3683 bytes --]

diff --git a/archive-tar.c b/archive-tar.c
index 719b629..c6bf7e4 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -2,6 +2,8 @@
  * Copyright (c) 2005, 2006 Rene Scharfe
  */
 #include "cache.h"
+#include "tree.h"
+#include "object.h"
 #include "tar.h"
 #include "archive.h"
 #include "streaming.h"
@@ -200,6 +202,74 @@ static int write_extended_header(struct archiver_args *args,
 	return 0;
 }
 
+/*
+ * A GIT-SCM object header is a global extended header that embeds a single
+ * git object.  This object serves a purpose described by the "purpose"
+ * field.  Valid purposes include:
+ *
+ *  - "root" -- an object that, by itself, in conjunction with other roots,
+ *    or in conjunction with external data, identifies a root to use to
+ *    verify this archive.
+ *  - "vrfy" -- an object that can be used to prove that the contents
+ *    of this archive are as described.
+ *
+ * There's one basic rule to observe: every "vrfy" object must hash to
+ * a SHA-1 that matches something described in a "root", another "vrfy" object,
+ * or something typed in by a user decoding the archive.
+ *
+ * (Of course, if you want the archive to be usefully verifiable, all of the
+ *  non-GIT-SCM contents should also be attributable to an appropriate
+ *  "vrfy" object.)
+ *
+ * The fields are:
+ *  GIT-SCM.obj.purpose: the purpose of the embedded object
+ *  GIT-SCM.obj.sha1: the sha1 of the embedded object
+ *  GIT-SCM.obj.type: the type of the embedded object
+ *  GIT-SCM.obj.data: the data in the embedded object
+ *
+ * The block header is intentionally unspecified, except that it must
+ * have typeflag 'g'.  (This is to allow some flexibility in trying to
+ * preserve compatibility with old tar implementations.)
+ */
+static int write_gitscm_obj_header(struct archiver_args *args,
+				   const char *purpose,
+				   const unsigned char *sha1)
+{
+	struct strbuf ext_header = STRBUF_INIT;
+	struct ustar_header header;
+	unsigned int mode;
+	enum object_type type;
+	unsigned long size;
+	void *buffer;
+	const char *typestr;
+	int err = 0;
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.purpose",
+				 purpose, strlen(purpose));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.sha1",
+				 sha1_to_hex(sha1), 40);
+
+	buffer = read_sha1_file(sha1, &type, &size);
+	typestr = typename(type);
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.type",
+				 typestr, strlen(typestr));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.data",
+				 buffer, size);
+	free(buffer);
+	buffer = NULL;
+
+	memset(&header, 0, sizeof(header));
+	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
+	mode = 0100666;
+	strcpy(header.name, "pax_global_header");
+	prepare_header(args, &header, mode, ext_header.len);
+	write_blocked(&header, sizeof(header));
+	write_blocked(ext_header.buf, ext_header.len);
+	strbuf_release(&ext_header);
+	return err;
+}
+
 static int write_tar_entry(struct archiver_args *args,
 			   const unsigned char *sha1,
 			   const char *path, size_t pathlen,
@@ -212,6 +282,10 @@ static int write_tar_entry(struct archiver_args *args,
 	void *buffer;
 	int err = 0;
 
+	if (S_ISDIR(mode)) {
+		write_gitscm_obj_header(args, "vrfy", sha1);
+	}
+
 	memset(&header, 0, sizeof(header));
 
 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
@@ -384,8 +458,12 @@ static int write_tar_archive(const struct archiver *ar,
 
 	if (args->commit_sha1)
 		err = write_global_extended_header(args);
-	if (!err)
+	if (!err) {
+		if (args->commit_sha1)
+			write_gitscm_obj_header(args, "root", args->commit_sha1);
+		write_gitscm_obj_header(args, "vrfy", args->tree->object.sha1);
 		err = write_archive_entries(args, write_tar_entry);
+	}
 	if (!err)
 		write_trailer();
 	return err;
