git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Eli Schwartz" <eschwartz93@gmail.com>,
	"René Scharfe" <l.s.r@web.de>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	"Konstantin Ryabitsev" <konstantin@linuxfoundation.org>,
	"Michal Suchánek" <msuchanek@suse.de>,
	"Raymond E . Pasco" <ray@ameretat.dev>,
	demerphq <demerphq@gmail.com>, "Theodore Ts'o" <tytso@mit.edu>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH 6/9] archive: use "gzip -cn" for stability, not "git archive gzip"
Date: Thu,  2 Feb 2023 10:32:26 +0100	[thread overview]
Message-ID: <patch-6.9-34c7ce73099-20230202T093212Z-avarab@gmail.com> (raw)
In-Reply-To: <cover-0.9-00000000000-20230202T093212Z-avarab@gmail.com>

This reverts and amends [1] so that we don't use "git archive gzip" by
default, but only fall back on it when we cannot invoke "gzip".

As noted in the discussion at [2] that commit first released with
v2.38.0 caused widespread breakage in the wild: Hosting sites like
GitHub tend to offer a feature to download tagged releases as
archives, which are generated by some variant of "git archive
--format=tgz".

Downstream distributors then tend to (re-)download those archives
as-is, hardcoding their known hash their packaging systems. See [3],
[4] etc. for reports of those systems breaking in conjunction with
[1].

The reason for "why" is entirely missing from the commit message for
[1], but as seen in the question about that in [5] and reply at [6] at
the time it was to "avoid a run[time] dependency; the build/test
dependency remains.".

It's not immediately apparent what the second part of that is
referring to, as [1] also removed the "GZIP" prerequisite from some
tests. The answer is that we still have other tests that need "GZIP",
but those are invoking "gzip(1)" explicitly.

In any case, whatever promises we make in the future about the
stability and non-stability of "git archive" output (or the derived
compressed artifact), this amount of fallout isn't worth it to get to
the stated goal in [1].

Let's instead default to "gzip -cn" again, but if we can't find it
fall back on "git archive gzip". Note that we'll only fallback if that
"gzip -cn" is ours, not if it comes from the user's own
"tar.<format>.command" configuration.

If we do need the fallback we'll warn about it. No such warning will
be emitted if the user has explicitly asked for "git archive gzip".

1. 4f4be00d302 (archive-tar: use internal gzip by default, 2022-06-15)
2. https://lore.kernel.org/git/a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com/
3. https://github.com/Homebrew/homebrew-core/issues/121877
4. https://github.com/bazel-contrib/SIG-rules-authors/issues/11
5. https://lore.kernel.org/git/220615.86wndhwt9a.gmgdl@evledraar.gmail.com/
6. https://lore.kernel.org/git/3ed80afd-34b3-afd8-5ffb-0187a4475ee1@web.de/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/tar.txt |  8 ++++++--
 archive-tar.c                | 20 +++++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/tar.txt b/Documentation/config/tar.txt
index 5456fc617a2..37f24baa73a 100644
--- a/Documentation/config/tar.txt
+++ b/Documentation/config/tar.txt
@@ -16,8 +16,12 @@ tar.<format>.command::
 	to the command (e.g., `-9`).
 +
 The `tar.gz` and `tgz` formats are defined automatically and use the
-magic command `git archive gzip` by default, which invokes an internal
-implementation of gzip.
+command `gzip -cn` by default. An internal gzip implementation can be
+used by specifying the value `git archive gzip`.
++
+If 'gzip -cn' cannot be executed we'll fall back on `git archive gzip`
+with a warning, if you don't have a gzip(1) and would like to use the
+internal `git archive gzip` without warning, configure it explicitly.
 +
 The automatically defined commands do not invoke the shell, avoiding
 the minor overhead of an extra sh(1) process.
diff --git a/archive-tar.c b/archive-tar.c
index dfc133deac7..26efb911ebc 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -464,6 +464,7 @@ static void tgz_write_block(const void *data)
 }
 
 static const char internal_gzip_command[] = "git archive gzip";
+static const char gzip_cn_command[] = "gzip -cn";
 
 static int gzip_internally(const struct archiver *ar,
 			   struct archiver_args *args)
@@ -494,12 +495,15 @@ static int write_tar_filter_archive(const struct archiver *ar,
 {
 	struct strbuf cmd = STRBUF_INIT;
 	struct child_process filter = CHILD_PROCESS_INIT;
+	int filter_is_gzip_cn = 0;
 	int r;
 
 	if (!ar->filter_command)
 		BUG("tar-filter archiver called with no filter defined");
 
-	if (!strcmp(ar->filter_command, internal_gzip_command))
+	if (!strcmp(ar->filter_command, gzip_cn_command))
+		filter_is_gzip_cn = 1;
+	else if (!strcmp(ar->filter_command, internal_gzip_command))
 		return gzip_internally(ar, args);
 
 	strbuf_addstr(&cmd, ar->filter_command);
@@ -515,8 +519,14 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	filter.in = -1;
 	filter.silent_exec_failure = 1;
 
-	if (start_command(&filter) < 0)
-		die_errno(_("unable to start '%s' filter"), cmd.buf);
+	if (start_command(&filter) < 0) {
+		if (!filter_is_gzip_cn)
+			die_errno(_("unable to start '%s' filter"), cmd.buf);
+
+		warning_errno(_("unable to start '%s' filter, falling back to '%s'"),
+			      cmd.buf, internal_gzip_command);
+		return gzip_internally(ar, args);
+	}
 	close(1);
 	if (dup2(filter.in, 1) < 0)
 		die_errno(_("unable to redirect descriptor"));
@@ -544,9 +554,9 @@ void init_tar_archiver(void)
 	int configured = 1;
 	register_archiver(&tar_archiver);
 
-	tar_filter_config("tar.tgz.command", internal_gzip_command, NULL);
+	tar_filter_config("tar.tgz.command", gzip_cn_command, NULL);
 	tar_filter_config("tar.tgz.remote", "true", NULL);
-	tar_filter_config("tar.tar.gz.command", internal_gzip_command, NULL);
+	tar_filter_config("tar.tar.gz.command", gzip_cn_command, NULL);
 	tar_filter_config("tar.tar.gz.remote", "true", NULL);
 	git_config(git_tar_config, &configured);
 	for (i = 0; i < nr_tar_filters; i++) {
-- 
2.39.1.1392.g63e6d408230


  parent reply	other threads:[~2023-02-02  9:32 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31  0:06 Stability of git-archive, breaking (?) the Github universe, and a possible solution Eli Schwartz
2023-01-31  7:49 ` Ævar Arnfjörð Bjarmason
2023-01-31  9:11   ` Eli Schwartz
2023-02-02  9:32   ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 1/9] archive & tar config docs: de-duplicate configuration section Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 2/9] git config docs: document "tar.<format>.{command,remote}" Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 3/9] archiver API: make the "flags" in "struct archiver" an enum Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 4/9] archive: omit the shell for built-in "command" filters Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 5/9] archive-tar.c: move internal gzip implementation to a function Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` Ævar Arnfjörð Bjarmason [this message]
2023-02-02  9:32     ` [PATCH 7/9] test-lib.sh: add a lazy GZIP prerequisite Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 8/9] archive tests: test for "gzip -cn" and "git archive gzip" stability Ævar Arnfjörð Bjarmason
2023-02-02  9:32     ` [PATCH 9/9] git archive docs: document output non-stability Ævar Arnfjörð Bjarmason
2023-02-02 10:25       ` brian m. carlson
2023-02-02 10:30         ` Ævar Arnfjörð Bjarmason
2023-02-02 16:34         ` Junio C Hamano
2023-02-04 17:46           ` brian m. carlson
2023-02-02 16:17     ` [PATCH 0/9] git archive: use gzip again by default, document output stabilty Phillip Wood
2023-02-02 16:40       ` Junio C Hamano
2023-02-03 13:49       ` Ævar Arnfjörð Bjarmason
2023-02-06 14:46         ` Phillip Wood
2023-02-03 15:47       ` Theodore Ts'o
2023-02-02 16:25     ` Junio C Hamano
2023-02-04 18:08       ` René Scharfe
2023-02-05 21:30         ` Ævar Arnfjörð Bjarmason
2023-02-12 17:41           ` René Scharfe
2023-02-02 19:23     ` Raymond E. Pasco
2023-02-03  8:06       ` [PATCH] archive: document output stability concerns Raymond E. Pasco
2023-01-31  9:54 ` Stability of git-archive, breaking (?) the Github universe, and a possible solution brian m. carlson
2023-01-31 11:31   ` Ævar Arnfjörð Bjarmason
2023-01-31 15:05   ` Konstantin Ryabitsev
2023-01-31 22:32     ` brian m. carlson
2023-02-01  9:40       ` Ævar Arnfjörð Bjarmason
2023-02-01 11:34         ` demerphq
2023-02-01 12:21           ` Michal Suchánek
2023-02-01 12:48             ` demerphq
2023-02-01 13:43               ` Ævar Arnfjörð Bjarmason
2023-02-01 15:21                 ` demerphq
2023-02-01 18:56                   ` Theodore Ts'o
2023-02-02 21:19                     ` Joey Hess
2023-02-03  4:02                       ` Theodore Ts'o
2023-02-03 13:32                         ` Ævar Arnfjörð Bjarmason
2023-02-01 23:16         ` brian m. carlson
2023-02-01 23:37           ` Junio C Hamano
2023-02-02 23:01             ` brian m. carlson
2023-02-02 23:47               ` rsbecker
2023-02-03 13:18                 ` Ævar Arnfjörð Bjarmason
2023-02-02  0:42           ` Ævar Arnfjörð Bjarmason
2023-02-01 12:17       ` Raymond E. Pasco
2023-01-31 15:56   ` Eli Schwartz
2023-01-31 16:20     ` Konstantin Ryabitsev
2023-01-31 16:34       ` Eli Schwartz
2023-01-31 20:34         ` Konstantin Ryabitsev
2023-01-31 20:45         ` Michal Suchánek
2023-02-01  1:33     ` brian m. carlson
2023-02-01 12:42   ` Ævar Arnfjörð Bjarmason
2023-02-01 23:18     ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=patch-6.9-34c7ce73099-20230202T093212Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=demerphq@gmail.com \
    --cc=eschwartz93@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=konstantin@linuxfoundation.org \
    --cc=l.s.r@web.de \
    --cc=msuchanek@suse.de \
    --cc=ray@ameretat.dev \
    --cc=sandals@crustytoothpaste.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).