git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH 0/7] clone: fix init of refdb with wrong object format
Date: Mon, 11 Dec 2023 12:34:38 +0100	[thread overview]
Message-ID: <ZXbzzlyWC3HTUyDA@tanuki> (raw)
In-Reply-To: <xmqq7clmn3w1.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3953 bytes --]

On Sat, Dec 09, 2023 at 07:16:30PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > While at it I noticed that this actually fixes a bug with bundle URIs
> > when the object formats diverge in this way.
> > ...
> > This patch series is actually the last incompatibility for the reftable
> > backend that I have found. All tests except for the files-backend
> > specific ones pass now with the current state I have at [1], which is
> > currently at e6f2f592b7 (t: skip tests which are incompatible with
> > reftable, 2023-11-24)
> 
> An existing test
> 
>     $ make && cd t && GIT_TEST_DEFAULT_HASH=sha256 sh t5550-http-fetch-dumb.sh
> 
> passes with vanilla Git 2.43, but with these patches applied, it
> fails the "7 - empty dumb HTTP" step.

Indeed -- now that the GitLab CI definitions have landed in your master
branch I can also see this failure as part of our CI job [1]. And with
the NixOS httpd refactorings I'm actually able to run these tests in the
first place. Really happy to see that things come together like this, as
it means that we'll detect such issues before we send the series to the
mailing list from now on.

Anyway, regarding the test itself. I think the refactorings in this
patch series uncover a preexisting and already-known issue with empty
repositories when using the dumb HTTP protocol: there is no proper way
to know about the hash function that the remote repository uses [2].

Before my refactorings we used to fall back to the local default hash
format with which we've already initialized the repository, which is
wrong. Now we use the hash format we detected via the remote, which we
cannot detect because the remote is empty and does not advertise the
hash function, so we fall back to SHA1 and thus also do the wrong thing.
The only correct thing here would be to use the actual hash function
that the remote repository uses, but we have no to do so. So we're kind
of stuck here and can only choose between two different wrong ways to
pick the hash function.

We can restore the previous wrong behaviour by honoring GIT_DEFAULT_HASH
in git-remote-curl(1) in the case where we do not have a repository set
up yet. So something similar in spirit to the following:

```
diff --git a/remote-curl.c b/remote-curl.c
index fc29757b65..7e97c9c2e9 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -27,6 +27,7 @@
 static struct remote *remote;
 /* always ends with a trailing slash */
 static struct strbuf url = STRBUF_INIT;
+static int nongit;
 
 struct options {
 	int verbosity;
@@ -275,8 +276,30 @@ static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 {
 	const char *p = memchr(heads->buf, '\t', heads->len);
 	int algo;
-	if (!p)
-		return the_hash_algo;
+
+	if (!p) {
+		const char *env;
+
+		if (!nongit)
+			return the_hash_algo;
+
+		/*
+		 * In case the remote neither advertises the hash format nor
+		 * any references we have no way to detect the correct hash
+		 * format. We can thus only guess what the remote is using,
+		 * where the best guess is to fall back to the default hash.
+		 */
+		env = getenv("GIT_DEFAULT_HASH");
+		if (env) {
+			algo = hash_algo_by_name(env);
+			if (algo == GIT_HASH_UNKNOWN)
+				die(_("unknown hash algorithm '%s'"), env);
+		} else {
+			algo = GIT_HASH_SHA1;
+		}
+
+		return &hash_algos[algo];
+	}
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
@@ -1521,7 +1544,6 @@ static int stateless_connect(const char *service_name)
 int cmd_main(int argc, const char **argv)
 {
 	struct strbuf buf = STRBUF_INIT;
-	int nongit;
 	int ret = 1;
 
 	setup_git_directory_gently(&nongit);
```

+Cc brian, as he's the author of [2].

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/jobs/5723052108
[2]: ac093d0790 (remote-curl: detect algorithm for dumb HTTP by size, 2020-06-19)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-12-11 11:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-06 12:39 [PATCH 0/7] clone: fix init of refdb with wrong object format Patrick Steinhardt
2023-12-06 12:39 ` [PATCH 1/7] setup: extract function to create the refdb Patrick Steinhardt
2023-12-06 21:10   ` Karthik Nayak
2023-12-07  7:22     ` Patrick Steinhardt
2023-12-08 22:54       ` Junio C Hamano
2023-12-11 11:34         ` Patrick Steinhardt
2023-12-11 15:50       ` Karthik Nayak
2023-12-06 12:39 ` [PATCH 2/7] setup: allow skipping creation of " Patrick Steinhardt
2023-12-06 12:39 ` [PATCH 3/7] remote-curl: rediscover repository when fetching refs Patrick Steinhardt
2023-12-08 23:09   ` Junio C Hamano
2023-12-11 11:35     ` Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 4/7] builtin/clone: fix bundle URIs with mismatching object formats Patrick Steinhardt
2023-12-06 21:13   ` Karthik Nayak
2023-12-07  7:22     ` Patrick Steinhardt
2023-12-08 23:11   ` Junio C Hamano
2023-12-06 12:40 ` [PATCH 5/7] builtin/clone: set up sparse checkout later Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 6/7] builtin/clone: skip reading HEAD when retrieving remote Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 7/7] builtin/clone: create the refdb with the correct object format Patrick Steinhardt
2023-12-06 13:09 ` [PATCH 0/7] clone: fix init of refdb with wrong " Patrick Steinhardt
2023-12-10  3:16 ` Junio C Hamano
2023-12-11 11:34   ` Patrick Steinhardt [this message]
2023-12-11 14:57     ` Junio C Hamano
2023-12-11 15:32       ` Patrick Steinhardt
2023-12-11 22:17         ` brian m. carlson
2023-12-12  7:00 ` [PATCH v2 " Patrick Steinhardt
2023-12-12  7:00   ` [PATCH v2 1/7] setup: extract function to create the refdb Patrick Steinhardt
2023-12-12  7:00   ` [PATCH v2 2/7] setup: allow skipping creation of " Patrick Steinhardt
2023-12-12  7:00   ` [PATCH v2 3/7] remote-curl: rediscover repository when fetching refs Patrick Steinhardt
2023-12-12  7:00   ` [PATCH v2 4/7] builtin/clone: fix bundle URIs with mismatching object formats Patrick Steinhardt
2023-12-12  7:00   ` [PATCH v2 5/7] builtin/clone: set up sparse checkout later Patrick Steinhardt
2023-12-12  7:01   ` [PATCH v2 6/7] builtin/clone: skip reading HEAD when retrieving remote Patrick Steinhardt
2023-12-12  7:01   ` [PATCH v2 7/7] builtin/clone: create the refdb with the correct object format Patrick Steinhardt
2023-12-12 19:20   ` [PATCH v2 0/7] clone: fix init of refdb with wrong " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZXbzzlyWC3HTUyDA@tanuki \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).