From: Simon Josefsson <simon@josefsson.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Making bit-by-bit reproducible Git Bundles?
Date: Thu, 13 Mar 2025 21:16:34 +0100
Message-ID: <87msdo1yal.fsf@josefsson.org>
In-Reply-To: <20250313051538.GA94015@coredump.intra.peff.net> (Jeff King's message of "Thu, 13 Mar 2025 01:15:38 -0400")
Jeff King <peff@peff.net> writes:
> [now without threading]
> $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
> c897caf9c68d2c37d997d3973196886af3b0b46e -
>
> [and we can do it again. yay!]
> $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
> c897caf9c68d2c37d997d3973196886af3b0b46e -
Those are the commands I use, but they don't produce the same hash in two
different 'git clone's. I tried running 'git clone' with the same '-c
pack.threads=1' but it made no difference.
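A quick way to test the single-threaded recipe is to bundle the same repository twice and compare the digests. This is a minimal sketch: `bundle_hash` and `check_stable` are hypothetical helper names, and it assumes `git` (recent enough to accept `-` as the bundle file) and GNU `sha256sum` are on PATH. As reported above, a stable result within one clone does not by itself mean two clones agree.

```shell
# Hypothetical helper: print the SHA-256 of a bundle of HEAD from repo $1.
# pack.threads=1 turns off the multi-threaded delta search, which is one
# source of run-to-run differences in the generated pack.
bundle_hash() {
    git -C "$1" -c pack.threads=1 bundle create --no-progress - HEAD |
        sha256sum | cut -d' ' -f1
}

# Hypothetical helper: bundle repo $1 twice and report whether the two
# byte streams have the same digest.
check_stable() {
    first=$(bundle_hash "$1")
    second=$(bundle_hash "$1")
    if [ "$first" = "$second" ]; then
        echo "stable $first"
    else
        echo "differs $first $second"
    fi
}
```

Run `check_stable .` in each clone; comparing the printed digests between clones is the cross-clone test that failed here.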
> 2. There is no way to pass pack-objects options down through
> git-bundle. So you'd have to either assemble the bundle yourself,
> or perhaps generate a stable on-disk pack state, and then generate
> the bundle. Perhaps something like:
>
> # make one single pack, with no reuse, using the default options
> git -c pack.threads=1 repack -adf
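The first option, assembling the bundle yourself, could look roughly like the sketch below: write the v2 bundle header (the `# v2 git bundle` signature line, one `<object-id> <refname>` line per ref, then a blank line) and append a pack from `git pack-objects --stdout`, whose options are then under your direct control. This is a hypothetical helper (`manual_bundle`), not how `git bundle create` is implemented; it assumes a SHA-1 repository, covers only `refs/heads/`, and writes no prerequisite lines, so it only describes complete histories.

```shell
# Hypothetical helper: hand-assemble a v2 bundle of every branch in repo $1
# and write it to file $2, choosing the pack-objects options explicitly.
# Assumes a SHA-1 repository and complete history (no "-<oid>" lines).
manual_bundle() {
    {
        # Signature line for bundle format version 2.
        printf '# v2 git bundle\n'
        # One "<object-id> <refname>" line per branch, sorted by refname.
        git -C "$1" for-each-ref --format='%(objectname) %(refname)' refs/heads/
        # A blank line ends the header; the packfile follows.
        printf '\n'
        # Feed the branch tips to pack-objects; --revs makes it walk all
        # history reachable from them. One thread, and no reuse of existing
        # deltas or objects, similar in spirit to "repack -f".
        git -C "$1" for-each-ref --format='%(objectname)' refs/heads/ |
            git -C "$1" pack-objects --revs --stdout --threads=1 \
                --no-reuse-delta --no-reuse-object -q
    } >"$2"
}
```

Running it twice and comparing the files with `cmp` is a quick determinism check; `git bundle unbundle` in a fresh repository confirms the result is a usable bundle.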
Yay! You may have solved this for me. I have to verify this a bit
more, but this looks promising (these are two different git clones):
jas@kaka:~/t/gnulib-1$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-1$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-1$ sha256sum gnulib.bundle
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890 gnulib.bundle
jas@kaka:~/t/gnulib-1$ cd ../gnulib-2
jas@kaka:~/t/gnulib-2$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-2$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-2$ sha256sum gnulib.bundle
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890 gnulib.bundle
jas@kaka:~/t/gnulib-2$
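The two-clone check above can be scripted. `repro_bundle` is a hypothetical wrapper around the same two commands (a single-threaded `git repack -adf`, then a single-threaded `git bundle create --all`). It assumes both clones hold the same refs and the same set of objects; extra local objects, for example ones reachable only from reflogs, also land in the repacked pack and could plausibly change which deltas the bundle reuses.

```shell
# Hypothetical wrapper around the two commands used above.
# $1: path to a clone, $2: output bundle path.
repro_bundle() {
    # Rewrite everything into one pack, recomputing deltas (-f) with a
    # single delta-search thread so the result does not depend on timing.
    git -C "$1" -c pack.threads=1 repack -adf -q &&
    # Bundle all refs; the bundle's pack normally reuses deltas from the
    # single pack written above.
    git -C "$1" -c pack.threads=1 bundle create --no-progress "$2" --all
}
```

For example, from ~/t: `repro_bundle gnulib-1 /tmp/1.bundle && repro_bundle gnulib-2 /tmp/2.bundle && cmp /tmp/1.bundle /tmp/2.bundle`.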
> So I think it's possible, but I doubt it's very ergonomic. You're
> probably better off using some checksum over Git's logical model, rather
> than the stored bytes. The obvious one is that a single Git commit hash
> unambiguously represents the whole tree and all of history leading up to
> it, because of the chains of hashes.
>
> But that implies you trust Git's object hash algorithm.
Right, I think anything other than bit-by-bit identical files is going to
be too complex to verify.
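The point about the commit hash is easy to check directly: two clones of the same history report the same commit ID even when their packs are laid out differently on disk. `commit_id` and `tree_id` are hypothetical helpers; relying on them means trusting the repository's object hash (SHA-1 by default).

```shell
# Hypothetical helper: print the object ID of HEAD in repo $1. Each commit
# names its tree and parents by hash, so this one ID transitively covers
# the full tree and all history behind it.
commit_id() {
    git -C "$1" rev-parse --verify HEAD
}

# Hypothetical helper: print the top-level tree ID of HEAD in repo $1,
# which covers the snapshot alone, without history.
tree_id() {
    git -C "$1" rev-parse --verify 'HEAD^{tree}'
}
```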
> # print all commits in topological order, with ties broken by
> # committer date, which should be stable. And then follow up with the
> # trees and blobs for each.
> git rev-list --topo-order --objects HEAD >objects
>
> # now print the contents of each object (preceded by its name, type,
> # and length, so there's no chance of weird prepending or appending
> # attacks). We cut off the path information from rev-list here, since
> # the ordered set of objects is all we care about.
> cut -d' ' -f1 objects |
> git cat-file --batch >content
>
> # and then take a hash over that content; this will be unambiguous.
> sha256sum <content
How should this output be read? Could it be made compatible with git
bundle? If the repack approach above solves the problem, though, this part
isn't necessary.
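The logical-model pipeline can be packaged as a function. On reading its intermediate output: git-cat-file(1) documents `--batch` as emitting, for each object, a header line `<oid> SP <type> SP <size> LF`, then the raw contents, then `LF`; `--batch-check` prints only the header lines, which gives a readable summary of what gets hashed. `logical_checksum` and `object_headers` are hypothetical names, and GNU `sha256sum` is assumed.

```shell
# Hypothetical helper: SHA-256 over Git's logical model of HEAD in repo $1.
# rev-list lists commits in topological order, then the trees and blobs
# they reach; cut keeps only the object ID (dropping the path); cat-file
# then emits "<oid> <type> <size>\n<contents>\n" for each object.
logical_checksum() {
    git -C "$1" rev-list --topo-order --objects HEAD |
        cut -d' ' -f1 |
        git -C "$1" cat-file --batch |
        sha256sum | cut -d' ' -f1
}

# Hypothetical helper: print just the "<oid> <type> <size>" header line
# for each object that logical_checksum hashes.
object_headers() {
    git -C "$1" rev-list --topo-order --objects HEAD |
        cut -d' ' -f1 |
        git -C "$1" cat-file --batch-check
}
```

Unlike the bundle digest, this value should not change when a clone is repacked, because it depends only on the objects and their order.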
/Simon