From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Mike Hommey <mh@glandium.org>
Subject: Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
Date: Mon, 4 Mar 2024 07:44:32 +0100 [thread overview]
Message-ID: <ZeVt0CAfpYFZqT2i@tanuki> (raw)
In-Reply-To: <xmqqle75ppa3.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 6140 bytes --]
On Tue, Feb 27, 2024 at 01:31:48PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Instead, fix this issue by partially initializing the refdb up to the
> > point where it becomes discoverable by Git commands.
>
> As you mentioned earlier, this indeed is ugly, but I agree that
> there is no other reasonable way out. I am also unsure if this is a
> viable workaround in the longer term, but it should do for now.
>
> > + /*
> > + * We have a chicken-and-egg situation between initializing the refdb
> > + * and spawning transport helpers:
> > + *
> > + * - Initializing the refdb requires us to know about the object
> > + * format. We thus have to spawn the transport helper to learn
> > + * about it.
> > + * - The transport helper may want to access the Git repository. But
> > + * because the refdb has not been initialized, we don't have "HEAD"
> > + * or "refs/". Thus, the helper cannot find the Git repository.
> > + *
> > + * Ideally, we would have structured the helper protocol such that it's
> > + * mandatory for the helper to first announce its capabilities without
> > + * yet assuming a fully initialized repository. Like that, we could
> > + * have added a "lazy-refdb-init" capability that announces whether the
> > + * helper is ready to handle not-yet-initialized refdbs. If any helper
> > + * didn't support them, we would have fully initialized the refdb with
> > + * the SHA1 object format, but later on bailed out if we found out that
> > + * the remote repository used a different object format.
> > + * But we didn't, and thus we use the following workaround to partially
> > + * initialize the repository's refdb such that it can be discovered by
> > + * Git commands. To do so, we:
> > + *
> > + * - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> > + *
> > + * - Create the "refs/" directory.
> > + *
> > + * - Set up the ref storage format and repository version as
> > + * required.
>
> "as required" by whom and in what way?
>
> "The code to recognize a repository requires them to be set already,
> but they do not have to be the real value---we just assign random
> valid values for now, let remote helper do its work and then set the
> real values after they are done" would be a plausible interpretation
> of the above. Is that what is going on?
Partially. While we cannot yet determine the object format, we do know
the ref storage format as it is specified by the caller when invoking
git-clone(1) with the `--ref-format=` switch, not by the remote repo.
In that case, we'd have to set up the ref format accordingly and also
set the repository version to "1".
Patrick
> > + * This is sufficient for Git commands to discover the Git directory.
> > + */
> > + initialize_repository_version(GIT_HASH_UNKNOWN,
> > + the_repository->ref_storage_format, 1);
>
> OK, so we start with "we do not know it yet" here.
>
> > + strbuf_addf(&buf, "%s/HEAD", git_dir);
> > + write_file(buf.buf, "ref: refs/heads/.invalid");
> > +
> > + strbuf_reset(&buf);
> > + strbuf_addf(&buf, "%s/refs", git_dir);
> > + safe_create_dir(buf.buf, 1);
> > +
> > /*
> > * additional config can be injected with -c, make sure it's included
> > * after init_db, which clears the entire config environment.
> > @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> > free(remote_name);
> > strbuf_release(&reflog_msg);
> > strbuf_release(&branch_top);
> > + strbuf_release(&buf);
> > strbuf_release(&key);
> > free_refs(mapped_refs);
> > free_refs(remote_head_points_at);
> > diff --git a/setup.c b/setup.c
> > index b69b1cbc2a..e3b76e84b5 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
> > char repo_version_string[10];
> > int repo_version = GIT_REPO_VERSION;
> >
> > + /*
> > + * Note that we initialize the repository version to 1 when the ref
> > + * storage format is unknown. This is on purpose so that we can add the
> > + * correct object format to the config during git-clone(1). The format
> > + * version will get adjusted by git-clone(1) once it has learned about
> > + * the remote repository's format.
> > + */
> > if (hash_algo != GIT_HASH_SHA1 ||
> > ref_storage_format != REF_STORAGE_FORMAT_FILES)
> > repo_version = GIT_REPO_VERSION_READ;
> > @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
> > "%d", repo_version);
> > git_config_set("core.repositoryformatversion", repo_version_string);
> >
> > - if (hash_algo != GIT_HASH_SHA1)
> > + if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
> > git_config_set("extensions.objectformat",
> > hash_algos[hash_algo].name);
> > else if (reinit)
>
> Here we refrain from writing extensions.objectformat in the
> repository when we do not know what hash function is being used.
> Without this change, we would instead remove extensions.objectformat,
> which does not sound too bad, as "reinit" is passed by the new call
> we saw above and this "else if (reinit)" will do exactly that anyway.
>
> In any case, after we finish talking with the other side over the
> transport, we make another call to initialize_repository_version()
> with the real objectformat and everything will be fine.
>
> The ealier description of the implementation idea made it sound
> really yucky, but I do not think the solution presented here is too
> bad.
>
> > diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> > index 1544d6dc6b..bcfb358c51 100755
> > --- a/t/t5801/git-remote-testgit
> > +++ b/t/t5801/git-remote-testgit
> > @@ -12,6 +12,11 @@ url=$2
> >
> > dir="$GIT_DIR/testgit/$alias"
> >
> > +if ! git rev-parse --is-inside-git-dir
> > +then
> > + exit 1
> > +fi
> > +
> > h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
> > t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-03-04 6:44 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
2024-02-28 11:32 ` Karthik Nayak
2024-03-04 6:44 ` Patrick Steinhardt
2024-03-04 16:28 ` Junio C Hamano
2024-03-05 11:43 ` Patrick Steinhardt
2024-03-05 15:59 ` Junio C Hamano
2024-03-06 11:17 ` [PATCH] t0610: remove unused variable assignment Patrick Steinhardt
2024-03-01 1:20 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
2024-03-04 6:44 ` Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 21:31 ` Junio C Hamano
2024-03-04 6:44 ` Patrick Steinhardt [this message]
2024-03-04 16:17 ` Junio C Hamano
2024-05-03 2:04 ` Mike Hommey
2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
2024-02-27 21:33 ` Junio C Hamano
2024-03-04 6:44 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZeVt0CAfpYFZqT2i@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mh@glandium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).