* Verifiable git archives?
From: Andy Lutomirski @ 2014-01-09 3:10 UTC
To: git

It's possible, in principle, to shove enough metadata into the output
of 'git archive' to allow anyone to verify (without cloning the repo)
that the archive is a correct copy of a given commit.  Would
this be considered a useful feature?

Presumably there would be a 'git untar' command that would report
failure if it fails to verify the archive contents.

This could be as simple as including copies of the commit object and
all relevant tree objects and checking all of the hashes when
untarring.

(Even better: allow subsets of the repository to be archived and
verified as well.)

--Andy

* Re: Verifiable git archives?
From: Stefan Beller @ 2014-01-09 19:26 UTC
To: Andy Lutomirski, git

On 09.01.2014 04:10, Andy Lutomirski wrote:
> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
>

Do you know about git bundles?

> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
>
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.
>

I thought the purpose of git archive was rather to create plain
archives not polluted with any gitish stuff.

> (Even better: allow subsets of the repository to be archived and
> verified as well.)

Stefan

* Re: Verifiable git archives?
From: Junio C Hamano @ 2014-01-09 20:11 UTC
To: Andy Lutomirski; +Cc: git

Andy Lutomirski <luto@amacapital.net> writes:

> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
>
> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
>
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.

You only need the object name of the top-level tree.  After "untar"
the archive into an empty directory, make it a new repository and
"git add . && git write-tree"---the result should match the
top-level tree the archive was supposed to contain.

Of course, you can write "git verify-archive" that does the same
computation all in-core, without actually extracting the archive
into an empty directory.

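The in-core computation described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function names (`git_object_id`, `tree_id`, `tree_id_of_tar`) and the `prefix` parameter are made up here, the repository is assumed to use SHA-1 object names, and the archive is assumed to contain only regular files and symlinks, with no submodules and no export-ignore/export-subst or line-ending conversion in effect.

```python
import hashlib
import tarfile


def git_object_id(obj_type, payload):
    """Raw 20-byte SHA-1 of a git object: '<type> <size>' NUL payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def tree_id(entries):
    """Hash a nested dict: name -> (mode, blob_id) for files, name -> dict for dirs."""
    rows = []
    for name, value in entries.items():
        raw = name.encode()
        if isinstance(value, dict):
            # git sorts directories as if their name ended in '/'
            rows.append((raw + b"/", b"40000", raw, tree_id(value)))
        else:
            mode, oid = value
            rows.append((raw, mode, raw, oid))
    rows.sort(key=lambda row: row[0])
    payload = b"".join(m + b" " + n + b"\0" + oid for _, m, n, oid in rows)
    return git_object_id("tree", payload)


def tree_id_of_tar(fileobj, prefix=""):
    """Recompute the top-level tree id from a tar stream without unpacking it."""
    root = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.isdir():
                continue  # trees are implied by the paths of their files
            if member.isfile():
                data = tar.extractfile(member).read()
                mode = b"100755" if member.mode & 0o100 else b"100644"
            elif member.issym():
                data = member.linkname.encode()
                mode = b"120000"
            else:
                raise ValueError(f"unsupported entry: {member.name}")
            if not member.name.startswith(prefix):
                raise ValueError(f"entry outside prefix: {member.name}")
            *dirs, leaf = member.name[len(prefix):].split("/")
            node = root
            for part in dirs:
                node = node.setdefault(part, {})
            node[leaf] = (mode, git_object_id("blob", data))
    return tree_id(root).hex()
```

The returned hex string would then be compared against the tree named by the commit that the archive claims to represent; how that tree id is obtained is outside this sketch.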
* Re: Verifiable git archives?
From: Andy Lutomirski @ 2014-01-09 20:51 UTC
To: Junio C Hamano; +Cc: git

On Thu, Jan 9, 2014 at 12:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
>
> You only need the object name of the top-level tree.  After "untar"
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.

Hmm.  I didn't realize that there was enough metadata in the 'git
archive' output to reproduce the final tree.

If I can make it work, would you accept a patch to add another
extended pax header containing the commit object and the top-level
tree hash to the 'git archive' tarball output?

>
> Of course, you can write "git verify-archive" that does the same
> computation all in-core, without actually extracting the archive
> into an empty directory.

Hmm.  I'll play with this.

--Andy

* Re: Verifiable git archives?
From: Junio C Hamano @ 2014-01-09 22:46 UTC
To: Andy Lutomirski; +Cc: git

Andy Lutomirski <luto@amacapital.net> writes:

>> You only need the object name of the top-level tree.  After "untar"
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>
> Hmm.  I didn't realize that there was enough metadata in the 'git
> archive' output to reproduce the final tree.

We do record the commit object name in the extended header when
writing a tar archive already, but you have to grab the commit
object from somewhere in order to read the top-level tree object
name, which we do not record.

Also, if you used keyword substitution and such when creating an
archive, then the filesystem entities resulting from expanding it
would not match the original.

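For reference, the commit object name mentioned above is stored as the `comment` field of the pax global header, which is also what `git get-tar-commit-id` reads back. A minimal Python sketch of pulling it out of a tarball follows; the function name `archive_commit_id` is hypothetical, and the sketch assumes the archive contains at least one entry after the global header.

```python
import tarfile


def archive_commit_id(fileobj):
    """Return the commit id recorded in the pax global header, or None."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        # tarfile collects global pax header fields in TarFile.pax_headers
        return tar.pax_headers.get("comment")
```

As the message points out, this yields only the commit id; the top-level tree id still has to come from the commit object itself.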
* Re: Verifiable git archives?
From: Andy Lutomirski @ 2014-01-09 22:50 UTC
To: Junio C Hamano; +Cc: git

On Thu, Jan 9, 2014 at 2:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>>> You only need the object name of the top-level tree.  After "untar"
>>> the archive into an empty directory, make it a new repository and
>>> "git add . && git write-tree"---the result should match the
>>> top-level tree the archive was supposed to contain.
>>
>> Hmm.  I didn't realize that there was enough metadata in the 'git
>> archive' output to reproduce the final tree.
>
> We do record the commit object name in the extended header when
> writing a tar archive already, but you have to grab the commit
> object from somewhere in order to read the top-level tree object
> name, which we do not record.

This could be changed :)

>
> Also, if you used keyword substitution and such when creating an
> archive, then the filesystem entities resulting from expanding it
> would not match the original.
>

In the simple case, you'd need to have an archive with no prefix or
funny business (or at least a known prefix).  In the fancy case, you
could at least verify that all the file contents really came from git,
but then you'd really need the tree objects.

The use case I have in mind is for projects to distribute archives but
only need to sign the tagged git commit id.  I think this should be
doable without too much pain.  (This assumes that the release doesn't
contain autogen output and such.)

--Andy

* Re: Verifiable git archives?
From: Michael Haggerty @ 2014-01-19 0:35 UTC
To: Junio C Hamano; +Cc: Andy Lutomirski, git

On 01/09/2014 09:11 PM, Junio C Hamano wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
>
> You only need the object name of the top-level tree.  After "untar"
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.
> [...]

This wouldn't work if any files were excluded from the archive using
the gitattribute "export-ignore" (or "export-subst", which you already
mentioned in a follow-up email).

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

* Re: Verifiable git archives?
From: Junio C Hamano @ 2014-01-21 19:38 UTC
To: Michael Haggerty; +Cc: Andy Lutomirski, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 01/09/2014 09:11 PM, Junio C Hamano wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> It's possible, in principle, to shove enough metadata into the output
>>> of 'git archive' to allow anyone to verify (without cloning the repo)
>>> that the archive is a correct copy of a given commit.  Would
>>> this be considered a useful feature?
>>>
>>> Presumably there would be a 'git untar' command that would report
>>> failure if it fails to verify the archive contents.
>>>
>>> This could be as simple as including copies of the commit object and
>>> all relevant tree objects and checking all of the hashes when
>>> untarring.
>>
>> You only need the object name of the top-level tree.  After "untar"
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>> [...]
>
> This wouldn't work if any files were excluded from the archive using
> gitattribute "export-ignore" (or "export-subst", which you already
> mentioned in a follow-up email).

Correct.  By "and such" below, I meant any and all futzing that
makes the resulting working tree different from the tree object
being archived ;-)  That includes the line-ending configuration and
other things as well.

    Also, if you used keyword substitution and such when creating an
    archive, then the filesystem entities resulting from expanding it
    would not match the original.

* Re: Verifiable git archives?
From: Andy Lutomirski @ 2014-01-25 21:56 UTC
To: Junio C Hamano; +Cc: Michael Haggerty, git

[-- Attachment #1: Type: text/plain, Size: 642 bytes --]

Here's a rather hackish implementation of the write side.  Any
thoughts on the format?  (Obviously the implementation needs work.
For example, it needs to be optional.)

Thoughts so far:

 - I want to put the value of "prefix" into an extended header.

 - Should blobs have their sha1 hashes in an extended header?  Pros:
   it makes figuring out substitutions easier.  Cons: it adds 512 bytes
   per file.

 - I want to support tags as roots.

 - I (or someone) need to write a verifier / verified unpacker.  Does
   git accept Python code?

This thing is tested in the sense that GNU tar unpacks its output
without any warnings or other fanfare.

--Andy

[-- Attachment #2: verifiable_archive.patch --]
[-- Type: text/x-patch, Size: 3683 bytes --]

diff --git a/archive-tar.c b/archive-tar.c
index 719b629..c6bf7e4 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -2,6 +2,8 @@
  * Copyright (c) 2005, 2006 Rene Scharfe
  */
 #include "cache.h"
+#include "tree.h"
+#include "object.h"
 #include "tar.h"
 #include "archive.h"
 #include "streaming.h"
@@ -200,6 +202,74 @@ static int write_extended_header(struct archiver_args *args,
 	return 0;
 }
 
+/*
+ * A GIT-SCM object header is a global extended header that embeds a single
+ * git object.  This object serves a purpose described by the "purpose"
+ * field.  Valid purposes include:
+ *
+ *  - "root" -- an object that, by itself, in conjunction with other roots,
+ *    or in conjunction with external data, identifies a root to use to
+ *    verify this archive.
+ *  - "vrfy" -- an object that can be used to prove that the contents
+ *    of this archive are as described.
+ *
+ * There's one basic rule to observe: every "vrfy" object must hash to
+ * a SHA-1 that matches something described in a "root", another "vrfy" object,
+ * or something typed in by a user decoding the archive.
+ *
+ * (Of course, if you want the archive to be usefully verifiable, all of the
+ * non-GIT-SCM contents should also be attributable to an appropriate
+ * "vrfy" object.)
+ *
+ * The fields are:
+ *   GIT-SCM.obj.purpose: the purpose of the embedded object
+ *   GIT-SCM.obj.sha1: the sha1 of the embedded object
+ *   GIT-SCM.obj.type: the type of the embedded object
+ *   GIT-SCM.obj.data: the data in the embedded object
+ *
+ * The block header is intentionally unspecified, except that it must
+ * have typeflag 'g'.  (This is to allow some flexibility in trying to
+ * preserve compatibility with old tar implementations.)
+ */
+static int write_gitscm_obj_header(struct archiver_args *args,
+				   const char *purpose,
+				   const unsigned char *sha1)
+{
+	struct strbuf ext_header = STRBUF_INIT;
+	struct ustar_header header;
+	unsigned int mode;
+	enum object_type type;
+	unsigned long size;
+	void *buffer;
+	const char *typestr;
+	int err = 0;
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.purpose",
+				 purpose, strlen(purpose));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.sha1",
+				 sha1_to_hex(sha1), 40);
+
+	buffer = read_sha1_file(sha1, &type, &size);
+	typestr = typename(type);
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.type",
+				 typestr, strlen(typestr));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.data",
+				 buffer, size);
+	free(buffer);
+	buffer = NULL;
+
+	memset(&header, 0, sizeof(header));
+	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
+	mode = 0100666;
+	strcpy(header.name, "pax_global_header");
+	prepare_header(args, &header, mode, ext_header.len);
+	write_blocked(&header, sizeof(header));
+	write_blocked(ext_header.buf, ext_header.len);
+	strbuf_release(&ext_header);
+	return err;
+}
+
 static int write_tar_entry(struct archiver_args *args,
 			   const unsigned char *sha1,
 			   const char *path, size_t pathlen,
@@ -212,6 +282,10 @@ static int write_tar_entry(struct archiver_args *args,
 	void *buffer;
 	int err = 0;
 
+	if (S_ISDIR(mode)) {
+		write_gitscm_obj_header(args, "vrfy", sha1);
+	}
+
 	memset(&header, 0, sizeof(header));
 
 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
@@ -384,8 +458,11 @@ static int write_tar_archive(const struct archiver *ar,
 
 	if (args->commit_sha1)
 		err = write_global_extended_header(args);
-	if (!err)
+	if (!err) {
+		write_gitscm_obj_header(args, "root", args->commit_sha1);
+		write_gitscm_obj_header(args, "vrfy", args->tree->object.sha1);
 		err = write_archive_entries(args, write_tar_entry);
+	}
 	if (!err)
 		write_trailer();
 	return err;
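
On the open question of a verifier: the "one basic rule" in the patch comment could be checked along these lines in Python. This is a rough sketch, not the author's implementation. It assumes the GIT-SCM.obj.* fields have already been extracted from the tar headers into dicts with the hypothetical keys `purpose`, `sha1`, `type` and `data` (raw bytes), in archive order, and that the user supplies the commit id they trust. The helper names are invented for illustration, and checking the file contents against the verified blob ids is left out.

```python
import hashlib


def object_id(obj_type, data):
    """Hex SHA-1 of a git object: '<type> <size>' NUL data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def referenced_ids(obj_type, data):
    """Object ids that a verified commit or tree vouches for."""
    ids = set()
    if obj_type == "commit":
        first_line = data.split(b"\n", 1)[0]
        if first_line.startswith(b"tree "):
            ids.add(first_line[5:].decode())
    elif obj_type == "tree":
        pos = 0
        while pos < len(data):
            # each entry is '<mode> <name>' NUL followed by a 20-byte id
            nul = data.index(b"\0", pos)
            ids.add(data[nul + 1:nul + 21].hex())
            pos = nul + 21
    return ids


def verify_objects(objects, trusted_ids):
    """Check each embedded object; return every id vouched for at the end."""
    trusted = set(trusted_ids)
    for obj in objects:
        actual = object_id(obj["type"], obj["data"])
        if actual != obj["sha1"]:
            raise ValueError(f"object claims {obj['sha1']} but hashes to {actual}")
        if actual not in trusted:
            raise ValueError(f"{obj['purpose']} object {actual} is not vouched for")
        trusted |= referenced_ids(obj["type"], obj["data"])
    return trusted
```

Processing objects in archive order fits the write side above, which emits the commit first, then the top-level tree, then each subtree just before its directory entry.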