From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH 0/7] clone: fix init of refdb with wrong object format
Date: Mon, 11 Dec 2023 16:32:52 +0100 [thread overview]
Message-ID: <ZXcrpGQhH121AQ7y@tanuki> (raw)
In-Reply-To: <xmqqmsugvlbe.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 4580 bytes --]
On Mon, Dec 11, 2023 at 06:57:25AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >> An existing test
> >>
> >> $ make && cd t && GIT_TEST_DEFAULT_HASH=sha256 sh t5550-http-fetch-dumb.sh
> >>
> >> passes with vanilla Git 2.43, but with these patches applied, it
> >> fails the "7 - empty dumb HTTP" step.
> > ...
> > Before my refactorings we used to fall back to the local default hash
> > format with which we've already initialized the repository, which is
> > wrong. Now we use the hash format we detected via the remote, which we
> > cannot detect because the remote is empty and does not advertise the
> > hash function, so we fall back to SHA1 and thus also do the wrong thing.
>
> Yeah, that is why I did *not* say "the series *breaks* existing
> tests". It triggers a failure, and in this case, a test failure
> does not mean the behaviour is broken because there is no correct
> answer. ... oh, wait. There isn't?
>
> I wonder if the right thing to do is to advertise the hash function
> even from an absolutely empty repository. There are no objects in
> such a repository, but it should already know what hash function to
> use when it creates its first object (determined at the repository
> creation time), so that _could_ be advertised.
For the smart HTTP and SSH protocols definitely, and we already do. But
it's a different story for dumb HTTP, unfortunately, where there is no
CGI-like thing sitting between the client and the repository's data.
> > The only correct thing here would be to use the actual hash function
> > that the remote repository uses, but we have no to do so.
>
> We have "no way to do so"? We have "not done so"?
We have not done so until now, and we have no easy way to change this on
the server-side as the server is not controlled by us in the first
place. That leaves two options I can think of:
- Try harder on the client-side, e.g. by trying to download the
gitconfig as you propose further down. I wonder whether admins would
typically block access to the config, but doubt they do.
- Change the format of `info/refs` to include the hash format, as this
file _is_ controlled by us on the server-side. Doesn't help though
in an empty repository, where the file is likely to never have been
generated in the first place.
So it seems like downloading the gitconfig is the only viable option
that I can think of right now.
It does increase the potential attack surface though because we would
start to unconditionally parse a config file from an untrusted source,
and we did hit issues in our config parser in the past already. You
could argue that we already parse untrusted configs via `.gitmodules`,
but these require opt-in to actually be used by anything if I'm not
mistaken.
So... I dunno.
> It is possible for the client side to download the $GIT_DIR/config
> file from the remote to learn what value extensions.objectFormat is
> in use over there instead, I think, but at the same time, I highly
> suspect that dumb HTTP outlived its usefulness to warrant such an
> additional investment of engineering resource.
Fair enough. All of this feels like an edge case (admin that uses dumb
HTTP) in an edge case (the cloned repository uses SHA256) in an edge
case (the remote repository is empty). Sure, SHA256 is likely to gain in
popularity eventually. But at the same time I'd expect that dumb HTTP
will become increasingly rare.
Taken together, chances for this to happen should be fairly low.
> The simplest "fix" might be to leave what happens in this narrow
> case (i.e. cloning over dumb HTTP from an empty repository)
> undefined by not testing (or not insisting on one particular
> outcome), but ...
I would be fine with that outcome, as well. It's not like the current
behaviour is correct in all cases either. The only benefit of that
behaviour is that a user can in fact work around the broken cases by
setting `GIT_HASH_DEFAULT` to the correct hash, and that benefit would
be retained by the diff I sent that made remote-curl.c aware of this
environment variable.
One additional solution would be to print a user-visible warning a la
"warning: failed to detect hash function of empty remote repository" and
then call it a day, potentially pointing out that a user can correct it
by re-cloning with `GIT_HASH_DEFAULT`. But the warning may not be
actionable by the user, because they may not know what hash function the
remote uses, either.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-12-11 15:32 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-06 12:39 [PATCH 0/7] clone: fix init of refdb with wrong object format Patrick Steinhardt
2023-12-06 12:39 ` [PATCH 1/7] setup: extract function to create the refdb Patrick Steinhardt
2023-12-06 21:10 ` Karthik Nayak
2023-12-07 7:22 ` Patrick Steinhardt
2023-12-08 22:54 ` Junio C Hamano
2023-12-11 11:34 ` Patrick Steinhardt
2023-12-11 15:50 ` Karthik Nayak
2023-12-06 12:39 ` [PATCH 2/7] setup: allow skipping creation of " Patrick Steinhardt
2023-12-06 12:39 ` [PATCH 3/7] remote-curl: rediscover repository when fetching refs Patrick Steinhardt
2023-12-08 23:09 ` Junio C Hamano
2023-12-11 11:35 ` Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 4/7] builtin/clone: fix bundle URIs with mismatching object formats Patrick Steinhardt
2023-12-06 21:13 ` Karthik Nayak
2023-12-07 7:22 ` Patrick Steinhardt
2023-12-08 23:11 ` Junio C Hamano
2023-12-06 12:40 ` [PATCH 5/7] builtin/clone: set up sparse checkout later Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 6/7] builtin/clone: skip reading HEAD when retrieving remote Patrick Steinhardt
2023-12-06 12:40 ` [PATCH 7/7] builtin/clone: create the refdb with the correct object format Patrick Steinhardt
2023-12-06 13:09 ` [PATCH 0/7] clone: fix init of refdb with wrong " Patrick Steinhardt
2023-12-10 3:16 ` Junio C Hamano
2023-12-11 11:34 ` Patrick Steinhardt
2023-12-11 14:57 ` Junio C Hamano
2023-12-11 15:32 ` Patrick Steinhardt [this message]
2023-12-11 22:17 ` brian m. carlson
2023-12-12 7:00 ` [PATCH v2 " Patrick Steinhardt
2023-12-12 7:00 ` [PATCH v2 1/7] setup: extract function to create the refdb Patrick Steinhardt
2023-12-12 7:00 ` [PATCH v2 2/7] setup: allow skipping creation of " Patrick Steinhardt
2023-12-12 7:00 ` [PATCH v2 3/7] remote-curl: rediscover repository when fetching refs Patrick Steinhardt
2023-12-12 7:00 ` [PATCH v2 4/7] builtin/clone: fix bundle URIs with mismatching object formats Patrick Steinhardt
2023-12-12 7:00 ` [PATCH v2 5/7] builtin/clone: set up sparse checkout later Patrick Steinhardt
2023-12-12 7:01 ` [PATCH v2 6/7] builtin/clone: skip reading HEAD when retrieving remote Patrick Steinhardt
2023-12-12 7:01 ` [PATCH v2 7/7] builtin/clone: create the refdb with the correct object format Patrick Steinhardt
2023-12-12 19:20 ` [PATCH v2 0/7] clone: fix init of refdb with wrong " Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXcrpGQhH121AQ7y@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox