From: Simon Richter <Simon.Richter@hogyros.de>
To: git <git@vger.kernel.org>
Cc: Ian Jackson <ijackson@chiark.greenend.org.uk>
Subject: Git generated tarballs and Debian
Date: Tue, 28 Apr 2026 17:40:05 +0900
Message-ID: <9030b26d-02ed-4452-b212-a69a4ff21e2d@hogyros.de>
Hi,
in Debian, we're shipping "original" tarballs for each software package,
and the Debian-specific changes in a separate file.
Historically, this meant users could do a bitwise comparison of the
original tarball and the one in Debian to verify that they were unchanged.
With git, some authors have stopped releasing official tarballs, so
we're using git-archive a lot -- but this is reproducible only by
accident. GitHub also prepares some release tarballs that may or may not
be bitwise identical to what git archive produces.
I've written a small tool that generates the tree checksum for a given
tarball (running inside a SECCOMP environment, not writing anything to
disk), which already goes a long way toward making tarballs verifiable: one can
check whether that ID is the same as the one mentioned in a commit (and
the comment inside a git-archive generated tarball is helpful in finding
which commit).
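To illustrate the idea, here is a minimal Python sketch of computing a git
tree checksum from a tar stream without unpacking it to disk. It is not the
tool described above: the function names and the strip_components parameter
are hypothetical, SHA-1 object IDs are assumed, and no sandboxing is done.
Directory members are skipped, so a submodule (which git archive emits as an
empty directory) is simply missing from the computed tree.

```python
import hashlib
import io
import tarfile


def git_object_id(kind: bytes, body: bytes) -> bytes:
    """Raw SHA-1 of a git object, hashed as '<kind> <size>\\0<body>'."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(node: dict) -> bytes:
    """Hash a nested dict {name: (mode, blob_id) or dict} as a git tree."""
    entries = []
    for name, value in node.items():
        if isinstance(value, dict):
            mode, oid = b"40000", tree_id(value)
            sort_key = name + b"/"  # git sorts directories as if named "name/"
        else:
            mode, oid = value
            sort_key = name
        entries.append((sort_key, mode + b" " + name + b"\0" + oid))
    body = b"".join(entry for _, entry in sorted(entries))
    return git_object_id(b"tree", body)


def tree_of_tar(stream, strip_components=0):
    """Return (pax global 'comment' or None, tree id in hex) for a tar stream."""
    root = {}
    with tarfile.open(fileobj=stream, mode="r|*", encoding="utf-8") as tar:
        for member in tar:
            if member.isreg():
                mode = b"100755" if member.mode & 0o100 else b"100644"
                body = tar.extractfile(member).read()
            elif member.issym():
                mode = b"120000"
                body = member.linkname.encode("utf-8", "surrogateescape")
            else:
                continue  # directories are implied by the paths below them
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            parts = parts[strip_components:]
            if not parts:
                continue
            names = [p.encode("utf-8", "surrogateescape") for p in parts]
            node = root
            for name in names[:-1]:
                node = node.setdefault(name, {})
            node[names[-1]] = (mode, git_object_id(b"blob", body))
        # git archive stores the commit ID here when given a commit or tag
        hint = tar.pax_headers.get("comment")
    return hint, tree_id(root).hex()
```

With a tarball created using --prefix=foo-1.0/, strip_components=1 would be
needed so that the prefix directory is not hashed as part of the tree.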
The downsides of that are:
1. that you still need a copy of the commit to verify it, as it's not
included in the tarball.
We could add an ancillary file that contains the commit object (its
checksum being reproducible, and containing the tree checksum) and
possibly a signed tag object as well, so that is solvable inside Debian.
Another option would be to extend the git-archive format to include them
as a (longer) comment in the global pax header.
2. that it doesn't work for submodules
What we do currently is generate multiple archives with different
prefixes, and concatenate them using tar. That loses all the pax global
headers though, so commit information is lost. In addition, the tarball
has the submodule's actual contents in a subdirectory where the parent
tree has only a commit reference, so a tree object generated from the
tarball contents does not have a matching checksum.
What we could do is generate multiple archives, and keep them separate,
but the Debian toolchain can only unpack additional archives into a
direct subdirectory of the main archive (e.g. "orig.tar.gz" gets
unpacked to "foo-1.0", then "orig-addon.tar.gz" gets unpacked into
"foo-1.0/addon"). We can fix _that_ with symlinks, but it gets more and
more hacky.
One thing we could do inside git here is add a method to create archives
that include submodules (that gets rid of the concatenation), but in
order for this to be easily verifiable, I still need to know where
submodules are and what their commit objects are (so I know the commit
checksum and can verify the tree checksum).
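For reference, the parent tree records each submodule as an entry with mode
160000 whose object ID is the submodule commit. The following Python sketch
lists those entries from a raw tree object (for example the output of
"git cat-file tree <id>"). The function name is hypothetical, 20-byte SHA-1
IDs are assumed, and only one tree level is inspected; nested directories
would need the same walk applied to each subtree.

```python
def find_gitlinks(raw_tree: bytes) -> list:
    """Return [(name, commit_id_hex)] for submodule entries in a raw tree."""
    links = []
    pos = 0
    while pos < len(raw_tree):
        # Each entry is "<octal mode> <name>\0" followed by 20 raw ID bytes.
        space = raw_tree.index(b" ", pos)
        nul = raw_tree.index(b"\0", space)
        mode = raw_tree[pos:space]
        name = raw_tree[space + 1:nul]
        oid = raw_tree[nul + 1:nul + 21]
        pos = nul + 21
        if mode == b"160000":  # gitlink: the ID names a commit, not a tree
            links.append((name.decode("utf-8", "surrogateescape"), oid.hex()))
    return links
```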
The goal is to extend what I can already do inside the Linux kernel:
$ git rev-parse HEAD
94dfcc4a99b0cece77e73dc3011284050f95da89
$ git rev-parse HEAD^{tree}
2d14d43ce9f062160262f4e4f162f5ff0ed91a5e
$ git archive --format=tar HEAD | git-treeof
Commit-Hint: 94dfcc4a99b0cece77e73dc3011284050f95da89
Tree-SHA1: 2d14d43ce9f062160262f4e4f162f5ff0ed91a5e
so the "Commit-Hint" can become a stronger statement "I have seen a
commit object with this checksum that actually refers to the correct
tree", and to allow this to work for repositories with submodules.
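A minimal Python sketch of that stronger check, assuming the raw commit
object is available (for example from an ancillary file, or from
"git cat-file commit <id>") and that SHA-1 object IDs are in use. The
function names are hypothetical; this only illustrates the check, not how
the object would be shipped.

```python
import hashlib


def commit_id_and_tree(raw_commit: bytes):
    """Return (commit id, tree id) as hex strings for a raw commit object."""
    header = b"commit " + str(len(raw_commit)).encode() + b"\0"
    commit_id = hashlib.sha1(header + raw_commit).hexdigest()
    # The first line of a commit object is "tree <id>".
    first_line = raw_commit.split(b"\n", 1)[0]
    keyword, _, tree = first_line.partition(b" ")
    if keyword != b"tree":
        raise ValueError("not a commit object: missing tree line")
    return commit_id, tree.decode("ascii")


def hint_is_confirmed(raw_commit: bytes, commit_hint: str, tarball_tree: str) -> bool:
    """True if the commit hashes to the hint and names the tarball's tree."""
    commit_id, tree = commit_id_and_tree(raw_commit)
    return commit_id == commit_hint and tree == tarball_tree
```

Here commit_hint would be the Commit-Hint value and tarball_tree the
Tree-SHA1 value from the output above.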
Does it make sense to extend git here to allow this, or should I try to
solve this entirely within Debian?
Simon