Git development
* Git generated tarballs and Debian
From: Simon Richter @ 2026-04-28  8:40 UTC (permalink / raw)
  To: git; +Cc: Ian Jackson

Hi,

in Debian, we're shipping "original" tarballs for each software package, 
and the Debian specific changes in a separate file.

Historically, this meant users could do a bitwise comparison of the
original tarball and the one in Debian to verify that nothing had been
changed.

With git, some authors have stopped releasing official tarballs, so 
we're using git-archive a lot -- but this is reproducible only by 
accident. GitHub also prepares some release tarballs that may or may not
be bitwise identical to what git archive produces.

I've written a small tool that generates the tree checksum for a given
tarball (running inside a SECCOMP environment, not writing anything to
disk). That already goes a long way towards making tarballs verifiable:
one can check whether that ID is the same as the one mentioned in a
commit (and the comment inside a git-archive generated tarball is
helpful in finding which commit).
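
A minimal sketch of how such a tree checksum can be computed from a
tarball held in memory. This is not the tool mentioned above; the
function names are made up for illustration, and it assumes SHA-1
object IDs, an archive created without --prefix, only regular files and
symlinks, and that the owner-execute bit in the tar header reflects
git's executable mode.

```python
import hashlib
import io
import tarfile


def git_object_id(kind: bytes, body: bytes) -> bytes:
    """Raw SHA-1 of a git object, i.e. of "<kind> <size>\\0<body>"."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).digest()


def tree_id(node: dict) -> bytes:
    """Hash a nested dict {name: (mode, raw_id) or sub-dict} as a git tree."""
    entries = []
    for name, value in node.items():
        if isinstance(value, dict):
            mode, oid = b"40000", tree_id(value)
            sort_key = name + b"/"  # git sorts subtrees as if named "name/"
        else:
            mode, oid = value
            sort_key = name
        entries.append((sort_key, mode + b" " + name + b"\x00" + oid))
    body = b"".join(entry for _, entry in sorted(entries))
    return git_object_id(b"tree", body)


def tree_of_tarball(data: bytes):
    """Return (commit_hint, tree_hex) for a tarball held in memory."""
    root = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar:
            if member.isdir():
                continue  # git has no empty directories; trees come from paths
            if member.isreg():
                blob = tar.extractfile(member).read()
                mode = b"100755" if member.mode & 0o100 else b"100644"
            elif member.issym():
                blob = member.linkname.encode("utf-8", "surrogateescape")
                mode = b"120000"
            else:
                raise ValueError("unsupported member type: %r" % member.name)
            name = member.name.encode("utf-8", "surrogateescape")
            parts = name.strip(b"/").split(b"/")
            node = root
            for part in parts[:-1]:
                node = node.setdefault(part, {})
            node[parts[-1]] = (mode, git_object_id(b"blob", blob))
        # git archive stores the commit ID as the "comment" record of the
        # pax global header; it is None if the archive has no such header.
        hint = tar.pax_headers.get("comment")
    return hint, tree_id(root).hex()
```

Directory entries are skipped, so a submodule that shows up in the
archive only as an empty directory is dropped from the computed tree,
and the result cannot match the commit's tree in that case.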

The downsides of that are:

1. that you still need a copy of the commit to verify it, as it's not 
included in the tarball.

We could add an ancillary file that contains the commit object (its 
checksum being reproducible, and containing the tree checksum) and 
possibly a signed tag object as well, so that is solvable inside Debian.

Another option would be to extend the git-archive format to include them 
as a (longer) comment in the global pax header.

2. that it doesn't work for submodules

What we do currently is generate multiple archives with different
prefixes, and concatenate them using tar. That loses all the pax global
headers though, so commit information is lost. In addition, the
submodule's actual contents end up in a subdirectory where the parent
tree only has a commit reference, so the checksum of a tree object
generated from the tarball contents does not match.

What we could do is generate multiple archives, and keep them separate, 
but the Debian toolchain can only unpack additional archives into a 
direct subdirectory of the main archive (e.g. "orig.tar.gz" gets 
unpacked to "foo-1.0", then "orig-addon.tar.gz" gets unpacked into 
"foo-1.0/addon"). We can fix _that_ with symlinks, but it gets more and 
more hacky.

One thing we could do inside git here is add a method to create archives 
that include submodules (that gets rid of the concatenation), but in 
order for this to be easily verifiable, I still need to know where 
submodules are and what their commit objects are (so I know the commit 
checksum and can verify the tree checksum).
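
To illustrate why the submodule locations and commit IDs are needed, here
is a rough sketch of the check for a combined archive. It assumes SHA-1
object IDs and that the file list has already been turned into a nested
dict, as a tool like the one above would do internally. The function
names and the shape of the `submodules` mapping are hypothetical, not an
existing git interface.

```python
import hashlib

GITLINK_MODE = b"160000"  # tree entry mode git uses for a submodule commit


def git_object_id(kind: bytes, body: bytes) -> bytes:
    """Raw SHA-1 of a git object, i.e. of "<kind> <size>\\0<body>"."""
    return hashlib.sha1(kind + b" %d\x00" % len(body) + body).digest()


def tree_id_with_submodules(node: dict, submodules: dict, path: bytes = b"") -> bytes:
    """Hash a nested dict as a git tree, folding submodule directories
    back into commit references.

    node:       {name: (mode, raw_blob_id) or sub-dict}
    submodules: {path: (raw_commit_id, raw_tree_id_of_that_commit)}
    """
    entries = []
    for name, value in node.items():
        child_path = path + name
        if isinstance(value, dict):
            oid = tree_id_with_submodules(value, submodules, child_path + b"/")
            if child_path in submodules:
                commit_id, expected_tree = submodules[child_path]
                if oid != expected_tree:
                    raise ValueError("submodule contents do not match its commit")
                # The superproject tree records the commit, not the contents.
                # Commit references sort like plain files (no trailing "/").
                mode, oid, sort_key = GITLINK_MODE, commit_id, name
            else:
                mode, sort_key = b"40000", name + b"/"
        else:
            mode, oid = value
            sort_key = name
        entries.append((sort_key, mode + b" " + name + b"\x00" + oid))
    body = b"".join(entry for _, entry in sorted(entries))
    return git_object_id(b"tree", body)
```

The tree ID taken from each submodule's commit object is what ties the
unpacked subdirectory to the commit reference; without the commit object
only the contents can be hashed, and the superproject tree cannot be
reproduced.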

The goal is to extend what I can already do inside a Linux kernel
checkout:

$ git rev-parse HEAD
94dfcc4a99b0cece77e73dc3011284050f95da89
$ git rev-parse HEAD^{tree}
2d14d43ce9f062160262f4e4f162f5ff0ed91a5e
$ git archive --format=tar HEAD | git-treeof
Commit-Hint: 94dfcc4a99b0cece77e73dc3011284050f95da89
Tree-SHA1: 2d14d43ce9f062160262f4e4f162f5ff0ed91a5e

so the "Commit-Hint" can become a stronger statement "I have seen a 
commit object with this checksum that actually refers to the correct 
tree", and to allow this to work for repositories with submodules.
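
The stronger statement could be checked roughly as follows, given the
raw commit object shipped alongside the tarball. This is a sketch under
assumptions: SHA-1 object IDs, the commit supplied as the uncompressed
object body (what `git cat-file commit <id>` prints), and a function
name that is purely illustrative.

```python
import hashlib


def commit_refers_to_tree(commit_body: bytes, commit_hint: str, tree_hex: str) -> bool:
    """Check that commit_body hashes to commit_hint and names tree_hex.

    commit_body: raw commit object contents, without the "commit <size>\\0"
                 header that git prepends before hashing.
    commit_hint: hex commit ID, e.g. from the pax global header comment.
    tree_hex:    hex tree ID computed from the tarball contents.
    """
    header = b"commit %d\x00" % len(commit_body)
    if hashlib.sha1(header + commit_body).hexdigest() != commit_hint:
        return False  # this is not the commit the hint refers to
    # The first line of a commit object is "tree <hex id>".
    first_line = commit_body.split(b"\n", 1)[0]
    return first_line == b"tree " + tree_hex.encode("ascii")
```

A signed tag object could be checked the same way, with "tag" as the
object type and the commit ID taken from its "object" line.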

Does it make sense to extend git here to allow this, or should I try to 
solve this entirely within Debian?

    Simon
