From: Simon Josefsson <simon@josefsson.org>
To: Kyle Lippincott <spectral@google.com>
Cc: git@vger.kernel.org
Subject: Re: Making bit-by-bit reproducible Git Bundles?
Date: Thu, 13 Mar 2025 08:59:36 +0100
Message-ID: <87tt7xicnr.fsf@josefsson.org>
In-Reply-To: <CAO_smViryqTa1LfQSsPbBYcSvijs-UkYkHaot3CK1j=uiuEppQ@mail.gmail.com> (Kyle Lippincott's message of "Wed, 12 Mar 2025 20:09:03 -0700")
Kyle Lippincott <spectral@google.com> writes:
>> Can anyone explain what is causing the irreproducibility? Running
>> diffoscope is not helpful, since the bundle is compressed and diffoscope
>> doesn't seem to know how to untangle it.
>
> I spent some time on this. When I followed the instructions, the
> diffs were in the pack file portion of the bundle file: different
> "tree" objects were produced at different points in the pack file.
> But it produces identical bundles if I run `git bundle create`
> multiple times in the same clone. My guess is that the
> non-determinism comes from the order in which things are created in
> the filesystem during the clone, presumably due to multithreading in
> the clone process, or maybe during gc? The contents of
> .git/objects/pack have different hashes across my two clones, and I
> haven't investigated why.
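To see where two bundles differ without relying on diffoscope, one can split each bundle into its plain-text header and the packfile that follows it. The Python sketch below assumes the v2/v3 bundle layout described in gitformat-bundle(5): a signature line, optional capability, prerequisite and reference lines, an empty line, then the pack. The function names are hypothetical. The sketch only reads the pack's 12-byte header and does not parse the objects.

```python
import struct
from typing import List, Tuple

# Signature lines of the v2 and v3 bundle formats (see gitformat-bundle(5)).
BUNDLE_SIGNATURES = (b"# v2 git bundle\n", b"# v3 git bundle\n")


def split_bundle(data: bytes) -> Tuple[List[str], bytes]:
    """Split a bundle into its text header lines and the raw packfile bytes."""
    if not data.startswith(BUNDLE_SIGNATURES):
        raise ValueError("not a v2 or v3 Git bundle")
    # Header lines are never empty, so the first blank line ends the header.
    end = data.find(b"\n\n")
    if end < 0:
        raise ValueError("unterminated bundle header")
    header = data[:end].decode("utf-8", "replace").split("\n")
    return header, data[end + 2:]


def pack_header(pack: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the 12-byte packfile header."""
    if len(pack) < 12 or pack[:4] != b"PACK":
        raise ValueError("missing PACK signature")
    # Version and object count are 4-byte big-endian (network order) integers.
    version, count = struct.unpack(">II", pack[4:12])
    return version, count


def extract_pack(bundle_path: str, pack_path: str) -> Tuple[List[str], int, int]:
    """Write the pack part of a bundle file to pack_path; return header info."""
    with open(bundle_path, "rb") as f:
        header, pack = split_bundle(f.read())
    with open(pack_path, "wb") as f:
        f.write(pack)
    version, count = pack_header(pack)
    return header, version, count
```

For a bundle without prerequisites, running something like `extract_pack("gnulib.bundle", "gnulib.pack")` on the bundle from each clone, and then `git index-pack gnulib.pack` and `git verify-pack -v gnulib.idx` inside a Git repository, should give object listings that can be diffed directly. Those listings are likely more informative than running diffoscope on the compressed bundle.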
Yes, my perception is also that the reproducibility problem arises
during 'git clone'. Within the same clone, it is no problem to create
a bit-by-bit reproducible git bundle. But if I work in two different
clones, I haven't been able to find any set of commands that leads to
identical results.
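One quick way to confirm this observation is to create a bundle in each clone, for example with `git bundle create gnulib.bundle --all`, and compare the two files. The Python helpers below are hypothetical and are not taken from the thread. They report both SHA-256 digests and the first byte offset where the files diverge. An offset beyond the bundle header means the difference lies in the pack data.

```python
import hashlib
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the first byte offset where a and b differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other: they differ where the shorter ends.
    return None if len(a) == len(b) else min(len(a), len(b))


def compare_bundles(a: bytes, b: bytes) -> Tuple[str, str, Optional[int]]:
    """Return both SHA-256 digests and the first differing offset (if any)."""
    return (hashlib.sha256(a).hexdigest(),
            hashlib.sha256(b).hexdigest(),
            first_difference(a, b))


def compare_bundle_files(path_a: str, path_b: str) -> Tuple[str, str, Optional[int]]:
    """Read two bundle files (e.g. one built in each clone) and compare them."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return compare_bundles(fa.read(), fb.read())
```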
FWIW, here are some other ways of cloning that I tried but could not
get to work (of course I may have made some mistake in my attempts):
# dumb protocol doesn't repack the objects
GIT_SMART_HTTP=0 git clone https://git.savannah.gnu.org/git/gnulib.git
# rsync fetches a .git identical to upstream's
rsync -av git.savannah.gnu.org::git/gnulib.git/ gnulib
>> If this is not possible today, what do you think about changes to make
>> this work?
>
> What is your end goal with being able to reproduce the bundles?
Good question - I should have made that clear.
The end goal is for someone other than me, the uploader of the gnulib
git bundle, to be able to re-create it bit-by-bit identically. This
pursuit is in the name of improved software supply-chain security.
Compare the efforts to make gzip and tarball files reproducible by
others:
https://www.gnu.org/software/tar/manual/html_node/Reproducibility.html
https://www.gnu.org/software/gzip/manual/html_node/Environment.html
> Producing an identical bit-for-bit bundle might be doable by doing
> some form of sorting of the objects in the pack file, but this would
> only get us closer to bit-for-bit reproducibility *on the same machine
> and versions of everything*. There could be some changes to git, zlib,
> machine architecture, etc. that cause deterministic but different
> values to be produced. As an example, maybe future versions of zlib
> compress better, producing an equal result when decompressed, but a
> different compressed result.
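The two points above, object ordering versus compressor output, can be illustrated with a toy model in Python. This is explicitly not Git's pack format: real packs have per-object type and size headers, deltas, and a trailing checksum. Here, a hypothetical `toy_pack` sorts blobs by their Git-style blob ID and concatenates their zlib streams. Compression levels 1 and 9 stand in for a change in zlib version or settings.

```python
import hashlib
import zlib
from typing import Iterable, List


def blob_id(content: bytes) -> str:
    """Git-style SHA-1 blob ID: sha1 of b"blob <len>", a NUL byte, and content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def toy_pack(blobs: Iterable[bytes], level: int = 6) -> bytes:
    """Concatenate zlib-compressed blobs, sorted by blob ID.

    Sorting makes the output independent of the order the objects arrived in,
    but the bytes still depend on the compressor and its settings.
    """
    out = bytearray()
    for content in sorted(blobs, key=blob_id):
        out += zlib.compress(content, level)
    return bytes(out)


def toy_unpack(pack: bytes) -> List[bytes]:
    """Decompress the concatenated zlib streams back into blob contents."""
    blobs = []
    while pack:
        d = zlib.decompressobj()
        blobs.append(d.decompress(pack))
        # Whatever follows the end of this stream is the next blob's stream.
        pack = d.unused_data
    return blobs
```

With the sort in place, reversing the input order yields the same bytes. Switching the compression level changes the bytes (at minimum the zlib header), yet `toy_unpack` still recovers the same blobs. This is the kind of deterministic-but-different output described above.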
That is an improvement over today's situation, where nobody can
reproduce the git bundle at all. Being able to reproduce it using the
same environment (toolchain) is better. The same holds for
reproducible builds of binaries: typically you need to reproduce a
similar environment to get reproducible results.
/Simon