Git development
From: Jeff King <peff@peff.net>
To: Theodore Tso <tytso@mit.edu>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Simon Richter <Simon.Richter@hogyros.de>,
	git <git@vger.kernel.org>,
	Ian Jackson <ijackson@chiark.greenend.org.uk>
Subject: Re: Git generated tarballs and Debian
Date: Wed, 29 Apr 2026 03:30:02 -0400
Message-ID: <20260429073002.GA717507@coredump.intra.peff.net>
In-Reply-To: <20260428115017.GA71700@macsyma-wired.lan>

On Tue, Apr 28, 2026 at 07:50:17AM -0400, Theodore Tso wrote:

> I know that in the past, using --format=tgz has broken based on
> different compression parameters used by git (and whether it used an
> external or internal compressor).  I also know that if $commit is a
> tree-id, this can result in the timestamps being not reproducible.  I
> also don't use export-subst.
> 
> There is also the difference in the prefix used by github and gitlab,
> but that's arguably not git's fault.
> 
> What other gotchas are there?  How is this likely to be inconsistent
> in the future?  How much work is there to provide that guarantee in
> the future?

The biggest unexpected change I recall was caused by a bug/compatibility
fix. 22f0dcd963 (archive-tar: split long paths more carefully,
2013-01-05) changed how some long paths were represented to be more
compatible between GNU tar and NetBSD. Lots of Homebrew recipes, etc,
were broken when GitHub deployed a version of Git with that commit.

I think there was a more recent one in 2023-ish caused by some
gzip-related changes (but it was after my time and I don't know the
details).

I feel like there was one in the middle, too, but I'm having trouble
digging it up (I think GitHub reverted 22f0dcd963 at the time and
finally reinstated it in 2017 after a warning period, so that might be
what I'm thinking of).

But I'm not sure how often we'd do fixes like that. Not a lot, as the
tar code is pretty stable. But is 82a46af13e (archive-tar: fix pax
extended header length calculation, 2019-08-17), for example, likely to
have changed hashes for some repos? Probably.

So I think if you really want byte-for-byte compatibility of git-archive
you have to cement the behavior, bugs and all, behind some kind of
version flag, and every possible behavior change has to be analyzed for
a potential version bump.
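
To make the version-flag idea concrete, here is a minimal sketch in
Python. It is not git's code, and the names (pax_record, format_version,
WRITERS) are made up for illustration. It uses the pax extended header
record layout, "<length> <key>=<value>\n", where <length> counts the
whole record including its own digits. The "v1" writer contains a
hypothetical length bug that is only a stand-in; it is not claimed to be
the bug fixed in 82a46af13e. The point is that v1 stays frozen, bug and
all, so old checksums keep matching, while the fix ships as v2.

```python
import hashlib


def pax_record_v1(key: str, value: str) -> bytes:
    """Frozen legacy writer (hypothetical bug, kept on purpose).

    It counts the digits of the payload size rather than of the final
    record size, so it is wrong when adding the digits crosses a power
    of ten.
    """
    payload = f" {key}={value}\n".encode()
    length = len(payload) + len(str(len(payload)))
    return str(length).encode() + payload


def pax_record_v2(key: str, value: str) -> bytes:
    """Fixed writer: the length field must count its own digits."""
    payload = f" {key}={value}\n".encode()
    length = len(payload)
    # Iterate until the declared length equals digits + payload size.
    while len(str(length)) + len(payload) != length:
        length = len(str(length)) + len(payload)
    return str(length).encode() + payload


# Each version number maps to one frozen behavior (illustrative only).
WRITERS = {1: pax_record_v1, 2: pax_record_v2}


def pax_record(key: str, value: str, format_version: int = 2) -> bytes:
    """Serialize one record using the behavior pinned by format_version."""
    return WRITERS[format_version](key, value)


if __name__ == "__main__":
    short_path = "abc"
    long_path = "a" * 91  # payload is 98 bytes, so the record is 101 bytes
    for name, path in (("short", short_path), ("long", long_path)):
        for version in sorted(WRITERS):
            record = pax_record("path", path, version)
            digest = hashlib.sha256(record).hexdigest()[:12]
            print(name, "v%d" % version, len(record), digest)
```

For most inputs the two versions produce identical bytes; they diverge
only in the edge case, which is exactly the kind of change that would
need a version bump under this scheme.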

Though breaking some obscure cases once every 5-10 years is maybe not
_so_ bad, and we can live with it. ;)

-Peff
