git.vger.kernel.org archive mirror
From: Simon Josefsson <simon@josefsson.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Making bit-by-bit reproducible Git Bundles?
Date: Thu, 13 Mar 2025 21:16:34 +0100
Message-ID: <87msdo1yal.fsf@josefsson.org>
In-Reply-To: <20250313051538.GA94015@coredump.intra.peff.net> (Jeff King's message of "Thu, 13 Mar 2025 01:15:38 -0400")

Jeff King <peff@peff.net> writes:

>   [now without threading]
>   $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
>   c897caf9c68d2c37d997d3973196886af3b0b46e  -
>
>   [and we can do it again. yay!]
>   $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
>   c897caf9c68d2c37d997d3973196886af3b0b46e  -

Those are the commands I use, but they don't produce the same hash in
two different 'git clone's.  I tried running 'git clone' with the same
'-c pack.threads=1', but it made no difference.

>   2. There is no way to pass pack-objects options down through
>      git-bundle. So you'd have to either assemble the bundle yourself,
>      or perhaps generate a stable on-disk pack state, and then generate
>      the bundle. Perhaps something like:
>
>        # make one single pack, with no reuse, using the default options
>        git -c pack.threads=1 repack -adf

Yay!  You may have solved this for me.  I have to verify this a bit
more, but this looks promising (these are two different git clones):

jas@kaka:~/t/gnulib-1$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-1$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-1$ sha256sum gnulib.bundle 
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890  gnulib.bundle
jas@kaka:~/t/gnulib-1$ cd ../gnulib-2
jas@kaka:~/t/gnulib-2$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-2$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-2$ sha256sum gnulib.bundle 
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890  gnulib.bundle
jas@kaka:~/t/gnulib-2$ 
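
The same check can be scripted. What follows is only a sketch under stated
assumptions: it builds a small throwaway repository instead of using gnulib,
and the directory names, the bundle_hash helper, and the example identity are
hypothetical. It clones twice, repacks each clone single-threaded with no
delta reuse, bundles all refs, and compares the SHA-256 of the two bundles.

```shell
#!/bin/sh
# Sketch: do two independent clones yield byte-identical bundles after a
# single-threaded full repack? All paths and names here are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway source repository with two commits.
git init -q "$work/src"
for n in 1 2; do
    echo "revision $n" >"$work/src/file.txt"
    git -C "$work/src" add file.txt
    git -C "$work/src" -c user.name=Example \
        -c user.email=example@example.invalid commit -q -m "commit $n"
done

# Clone into $1, repack into one pack with no reuse, bundle, print the hash.
bundle_hash() {
    git clone -q --no-local "$work/src" "$1"
    git -C "$1" -c pack.threads=1 repack -q -adf
    git -C "$1" -c pack.threads=1 bundle create "$1/repo.bundle" --all \
        2>/dev/null
    sha256sum <"$1/repo.bundle" | cut -d' ' -f1
}

h1=$(bundle_hash "$work/clone-1")
h2=$(bundle_hash "$work/clone-2")

echo "clone-1: $h1"
echo "clone-2: $h2"
if [ "$h1" = "$h2" ]; then
    echo "bundles are identical"
else
    echo "bundles differ"
fi
```

A match on a small test repository does not prove the same holds for every
repository, Git version, or zlib build, so it is worth repeating the check on
the real data.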

> So I think it's possible, but I doubt it's very ergonomic. You're
> probably better off using some checksum over Git's logical model, rather
> than the stored bytes. The obvious one is that a single Git commit hash
> unambiguously represents the whole tree and all of history leading up to
> it, because of the chains of hashes.
>
> But that implies you trust Git's object hash algorithm.

Right -- I think anything but bit-by-bit identical files is going to be
too complex to verify.

>   # print all commits in topological order, with ties broken by
>   # committer date, which should be stable. And then follow up with the
>   # trees and blobs for each.
>   git rev-list --topo-order --objects HEAD >objects
>
>   # now print the contents of each object (preceded by its name, type,
>   # and length, so there's no chance of weird prepending or appending
>   # attacks). We cut off the path information from rev-list here, since
>   # the ordered set of objects is all we care about.
>   cut -d' ' -f1 objects |
>   git cat-file --batch >content
>
>   # and then take a hash over that content; this will be unambiguous.
>   sha256sum <content

How should I read this output?  Could it be made compatible with git
bundle?

But if the above solves it, this part isn't necessary.
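
On reading that output: 'git cat-file --batch' writes one record per object,
consisting of a header line "<oid> <type> <size>", the raw object contents,
and a trailing newline. The sketch below prints the first header and checks
that the hash over the whole stream is the same in a repository and in a
clone of it. It assumes a throwaway repository; the paths, the dump_objects
helper, and the example identity are hypothetical.

```shell
#!/bin/sh
# Sketch: hash Git's logical content rather than the packed bytes.
# The throwaway repository and all names below are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/src"
echo hello >"$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=Example \
    -c user.email=example@example.invalid commit -q -m 'initial commit'
git clone -q --no-local "$work/src" "$work/clone"

# Write every object reachable from HEAD in repository $1 as a sequence of
# "<oid> <type> <size>" header lines, each followed by the object contents.
dump_objects() {
    git -C "$1" rev-list --topo-order --objects HEAD |
    cut -d' ' -f1 |
    git -C "$1" cat-file --batch
}

first_record=$(dump_objects "$work/src" | head -n 1)
hash_src=$(dump_objects "$work/src" | sha256sum | cut -d' ' -f1)
hash_clone=$(dump_objects "$work/clone" | sha256sum | cut -d' ' -f1)

echo "first header: $first_record"
echo "source: $hash_src"
echo "clone:  $hash_clone"
```

The stream is a flat dump for hashing, not a pack, so it cannot be fed to
'git bundle' as-is.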

/Simon

