From: Benson Muite <benson_muite@emailplus.org>
To: Jeff King <peff@peff.net>, Simon Richter <Simon.Richter@hogyros.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Mirror repositories for submodules
Date: Fri, 05 Jun 2026 07:54:50 +0300	[thread overview]
Message-ID: <87mrx9r3hh.fsf@emailplus.org> (raw)
In-Reply-To: <20260604061605.GA3194609@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:
>
>> Cloning from our server will use, depending on what upstream uses, either a
>> relative URL (which will go to our server, but we have little control over
>> what the name part of the repository base URL is going to be), or an
>> absolute URL that instructs clients to pull from another place, which
>> conflicts with our goal to have a self-contained archive.
>> 
>> The idea posited earlier, to have a "repository identity" that remains the
>> same across forks and clones, is somewhat appealing, but the best idea I can
>> come up with is generating some kind of repository UUID, and adding a
>> symlink -- not a great design because it pollutes outside the repo:
>> 
>>     $ mkdir myproject
>>     $ cd myproject
>>     $ git init
>>     $ ls -l ..
>>     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>>     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
>> 
>> On the other hand, this can be used to construct a stable relative submodule
>> URL.
>
> Here's a thought experiment. What if you put the UUID into a URL, like:
>
>   repoid://123456789.git
>
> Then your in-repo .gitmodules would point to that repo id and be
> consistent. Of course you need some way to tell Git how to retrieve
> repoid:// URLs. You could do so with a custom remote helper
> (git-remote-repoid), but presumably that helper is eventually going to
> end up going over one of the normal Git protocols.
>
> So we just need to tell Git how to resolve repo id URLs into concrete
> URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
> for example, you can add a submodule but convert it into a uuid like
> this:
>
>   $ git submodule add https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url
>   https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url repoid://123456789.git
>   $ git commit -am 'add submodule with magic repoid'
>
> Now if somebody else comes along and clones it naively, the repo uuid is
> not useful to git by itself:
>
>   $ git clone --recurse-submodules repo
>   Submodule 'git' (repoid://123456789.git) registered for path 'git'
>   Cloning into '/home/peff/tmp/repo/git'...
>   fatal: transport 'repoid' not allowed
>   fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed
>
> But imagine that "somehow" they have learned that 123456789.git can be
> found at some URL. You can do this:
>
>   git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
>       clone --recurse-submodules repo.git
>
> which would clone from the original URL. Or you could even imagine that
> they have a cache of repositories named by uuid, and then:
>
>   git -c url.https://my/cache/.insteadOf=repoid:// ...
>
> would rewrite all repoid://'s automatically.
>
> The use of "-c" here is mostly for illustration. It is a per-command
> config, so when you later try to update the submodule, you'd run into
> the same problem. Probably you'd want to stuff your mapping into on-disk
> config (either ~/.gitconfig, or if you have a lot of them, perhaps some
> file included from there).
>
> It would be nice if you could use "git clone -c" (note "-c" as an option
> to "clone", not to "git") to set a permanent per-repo config variable.
> But sadly the URL rewriting happens in the submodule repository, not the
> parent. So it has to be a per-user setting.
>
>
> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config. And I
> think this is a pretty standard way of using submodules. E.g., you might
> rewrite https:// into ssh:// if you prefer that protocol. Or point to a
> local server if it's faster for you.
>
> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.
>

The problem is that one might have multiple repositories, and submodules
may themselves have submodules.  Typically a primary development
organization will have its own host, but may also have mirrors on other
services which may be more convenient for others to use.  A recursive
clone could give up to 20 repositories, not all of which are maintained
by the same organization.  Rewriting the URL for each of them can be
inefficient, especially when the upstream maintains the mirror
repositories and can indicate that in the source repositories.
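
To make the cost concrete, here is a rough sketch of what each
downstream user would have to maintain today with url.*.insteadOf.
The host names and the file name ~/.gitconfig-mirrors are placeholders
made up for illustration, not real services, and only two of the
possible twenty or so entries are shown:

```ini
# ~/.gitconfig-mirrors (hypothetical file name)
# One stanza per upstream host; every user has to collect these by hand.
[url "https://mirror-a.example.org/"]
	insteadOf = https://upstream-a.example.org/
[url "https://mirror-b.example.net/"]
	insteadOf = https://upstream-b.example.com/

# In ~/.gitconfig, pull the mapping file in:
[include]
	path = ~/.gitconfig-mirrors
```

Each organization in the submodule tree adds another stanza, and the
list has to be kept in sync by every user whenever an upstream adds or
drops a mirror.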


> -Peff
