Re: [PATCH v3] submodule: fetch missing objects from default remote

public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: Nasser Grainawi <nasser.grainawi@oss.qualcomm.com>
Cc: git@vger.kernel.org,  "D. Ben Knoble" <ben.knoble@gmail.com>,
	 Patrick Steinhardt <ps@pks.im>,
	 Jacob Keller <jacob.keller@gmail.com>
Subject: Re: [PATCH v3] submodule: fetch missing objects from default remote
Date: Thu, 22 Jan 2026 10:49:05 -0800	[thread overview]
Message-ID: <xmqq5x8to53y.fsf@gitster.g> (raw)
In-Reply-To: <20260122152722.866341-1-nasser.grainawi@oss.qualcomm.com> (Nasser Grainawi's message of "Thu, 22 Jan 2026 07:27:22 -0800")

Nasser Grainawi <nasser.grainawi@oss.qualcomm.com> writes:

> When be76c21282 (fetch: ensure submodule objects fetched, 2018-12-06)
> added support for fetching a missing submodule object by id, it
> hardcoded the remote name as "origin" and deferred anything more
> complicated for a later patch. Implement the NEEDSWORK item to remove
> the hardcoded assumption by adding and using a submodule helper subcmd
> 'get-default-remote'. Fixing this lets 'git fetch --recurse-submodules'
> succeed when the fetched commit(s) in the superproject trigger a
> submodule fetch, and that submodule's default remote name is not
> "origin".
>
> Add non-"origin" remote tests to t5526-fetch-submodules.sh and
> t5572-pull-submodule.sh demonstrating this works as expected and add
> dedicated tests for get-default-remote.
>
> Signed-off-by: Nasser Grainawi <nasser.grainawi@oss.qualcomm.com>
> Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
> ---

Thanks.  Jacob, this v3 is not exactly the same as v1 that you
reviewed (and range-diff relative to v2 does not capture what got
changed between the version you saw and this version), but I just
checked that they are "essentially identical" except for the
proposed log message.  Are you happy with having your Reviewed-by on
this version?

>  builtin/submodule--helper.c             |  38 +++++
>  submodule.c                             |  17 ++-
>  t/meson.build                           |   1 +
>  t/t5526-fetch-submodules.sh             |  52 +++++++
>  t/t5572-pull-submodule.sh               |  21 ++-
>  t/t7425-submodule-get-default-remote.sh | 186 ++++++++++++++++++++++++
>  6 files changed, 312 insertions(+), 3 deletions(-)
>  create mode 100755 t/t7425-submodule-get-default-remote.sh
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d537ab087a..b180a24091 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -112,6 +112,43 @@ static int get_default_remote_submodule(const char *module_path, char **default_
>  	return 0;
>  }
>  
> +static int module_get_default_remote(int argc, const char **argv, const char *prefix,
> +				     struct repository *repo UNUSED)
> +{
> +	const char *path;
> +	char *resolved_path = NULL;
> +	char *default_remote = NULL;
> +	int code;
> +	struct option options[] = {
> +		OPT_END()
> +	};
> +	const char *const usage[] = {
> +		N_("git submodule--helper get-default-remote <path>"),
> +		NULL
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options, usage, 0);
> +	if (argc != 1)
> +		usage_with_options(usage, options);

Hmph, I am not sure what is going on.  What are we getting out of
parse_options() here?  Would it be the same to see if we got
anything remaining on the command line by checking argc and call
usage_with_options() without calling parse_options(), or am I
missing something?

> +	path = argv[0];
> +	if (prefix && *prefix && !is_absolute_path(path)) {
> +		resolved_path = xstrfmt("%s%s", prefix, path);
> +		path = resolved_path;
> +	}
> +
> +	code = get_default_remote_submodule(path, &default_remote);
> +	if (code) {
> +		free(resolved_path);
> +		return code;
> +	}
> +
> +	printf("%s\n", default_remote);

Do we know that the value of default_remote has no funny bytes in
it, like newline?  In the end the name has to become part of
refs/remotes/<name>/HEAD that has to be a valid refname, so not
giving any facility to quote funny bytes and allowing the caller of
this helper to assume a LF terminated single line should be fine, so
I am guessing that the answer is yes, but I offhand do not know how
we know that we do not have to worry about such a situation in the
code path that begins with get_default_remote_submodule().

> diff --git a/submodule.c b/submodule.c
> index 40a5c6fb9d..6599657f34 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1706,6 +1706,8 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>  	if (spf->oid_fetch_tasks_nr) {
>  		struct fetch_task *task =
>  			spf->oid_fetch_tasks[spf->oid_fetch_tasks_nr - 1];
> +		struct child_process cp_remote = CHILD_PROCESS_INIT;
> +		struct strbuf remote_name = STRBUF_INIT;
>  		spf->oid_fetch_tasks_nr--;
>  
>  		child_process_init(cp);
> @@ -1719,8 +1721,19 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>  		strvec_pushf(&cp->args, "--submodule-prefix=%s%s/",
>  			     spf->prefix, task->sub->path);
>  
> -		/* NEEDSWORK: have get_default_remote from submodule--helper */
> -		strvec_push(&cp->args, "origin");
> +		cp_remote.git_cmd = 1;
> +		strvec_pushl(&cp_remote.args, "submodule--helper",
> +			     "get-default-remote", task->sub->path, NULL);
> +
> +		if (!capture_command(&cp_remote, &remote_name, 0)) {
> +			strbuf_trim_trailing_newline(&remote_name);
> +			strvec_push(&cp->args, remote_name.buf);
> +		} else {
> +			/* Fallback to "origin" if the helper fails */
> +			strvec_push(&cp->args, "origin");
> +		}
> +		strbuf_release(&remote_name);
> +
>  		oid_array_for_each_unique(task->commits,
>  					  append_oid_to_argv, &cp->args);

OK, this part is quite straight-forward.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 5e566205ba..a5a273b392 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -929,6 +929,58 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
>  	)
>  '
>  
> +test_expect_success 'fetch --recurse-submodules works with custom remote names' '
> +	# depends on the previous test for setup
> +
> +	# Rename the remote in sub1 from "origin" to "custom_remote"
> +	git -C downstream/sub1 remote rename origin custom_remote &&
> +
> +	# Create new commits in the original submodules
> +	C=$(git -C submodule commit-tree -m "change outside refs/heads for custom remote" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/custom1 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	test_tick &&
> +
> +	D=$(git -C sub1 commit-tree -m "change outside refs/heads for custom remote" HEAD^{tree}) &&
> +	git -C sub1 update-ref refs/changes/custom2 $D &&
> +	git update-index --cacheinfo 160000 $D sub1 &&
> +
> +	git commit -m "updated submodules outside of refs/heads for custom remote" &&
> +	E=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/custom3 $E &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin refs/changes/custom3:refs/heads/my_other_branch &&
> +		git -C submodule cat-file -t $C &&
> +		git -C sub1 cat-file -t $D &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

Hmph, these overly long lines are eyesore.  I wonder if we can do
something about them?

I also wonder if we want to make sure we are getting from the remote
that is given the custom name in a more direct way (instead of "we
see that our fetch succeeds, and because there is no other remote,
it must have gotten what is needed from the renamed one"), or is it
too much paranoia?

> +test_expect_success 'fetch new submodule commit on-demand in FETCH_HEAD from custom remote' '
> +	# depends on the previous test for setup
> +
> +	C=$(git -C submodule commit-tree -m "another change outside refs/heads for custom remote" HEAD^{tree}) &&
> +	git -C submodule update-ref refs/changes/custom4 $C &&
> +	git update-index --cacheinfo 160000 $C submodule &&
> +	test_tick &&
> +
> +	D=$(git -C sub1 commit-tree -m "another change outside refs/heads for custom remote" HEAD^{tree}) &&
> +	git -C sub1 update-ref refs/changes/custom5 $D &&
> +	git update-index --cacheinfo 160000 $D sub1 &&
> +
> +	git commit -m "updated submodules outside of refs/heads" &&
> +	E=$(git rev-parse HEAD) &&
> +	git update-ref refs/changes/custom6 $E &&
> +	(
> +		cd downstream &&
> +		git fetch --recurse-submodules origin refs/changes/custom6 &&
> +		git -C submodule cat-file -t $C &&
> +		git -C sub1 cat-file -t $D &&
> +		git checkout --recurse-submodules FETCH_HEAD
> +	)
> +'

Are we testing anything new in this test, compared to the previous
one?  Both update the submodule sub1 by adding a new ref under
refs/changes/ hierachy and have "git fetch --recurse-submodules"
follow the changes.

> diff --git a/t/t7425-submodule-get-default-remote.sh b/t/t7425-submodule-get-default-remote.sh
> new file mode 100755
> index 0000000000..b842af9a2d
> --- /dev/null
> +++ b/t/t7425-submodule-get-default-remote.sh
> ...
> +test_expect_success 'get-default-remote returns origin for initialized submodule' '
> +	(
> +		cd super &&
> +		git submodule update --init &&
> +		echo "origin" >expect &&
> +		git submodule--helper get-default-remote subpath >actual &&
> +		test_cmp expect actual
> +	)
> +'

OK, we have dedicated tests like these to ensure that when we say
"get default remote" we get what we expect, which is very good.

The "too much paranoia" question above is primarily about "are we
really saying get-default-remote properly at the place where it
matters?"  IOW, we guarantee that the lower layer works as expected
with t7425, but the way we validate that we call the lower layer
correctly felt a bit indirect.

Other than these, looking very good.

Thanks.

next prev parent reply	other threads:[~2026-01-22 18:49 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-12 21:36 [PATCH] Fetch missing submodule objects from default remote Nasser Grainawi
2026-01-13  2:17 ` Jacob Keller
2026-01-13 21:51 ` Ben Knoble
2026-01-13 22:41   ` Nasser Grainawi
2026-01-14  2:19     ` D. Ben Knoble
2026-01-14 14:10       ` Junio C Hamano
2026-01-14 21:22         ` Ben Knoble
2026-01-14 14:05   ` Junio C Hamano
2026-01-14 19:23     ` Nasser Grainawi
2026-01-14 19:48 ` [PATCH v2] submodule: fetch missing " Nasser Grainawi
2026-01-21  0:48   ` Nasser Grainawi
2026-01-22 15:27   ` [PATCH v3] " Nasser Grainawi
2026-01-22 18:49     ` Junio C Hamano [this message]
2026-01-22 20:16       ` Jacob Keller
2026-01-22 20:38         ` Junio C Hamano
2026-02-25 21:55         ` Junio C Hamano
2026-03-02 22:06           ` Jacob Keller
2026-02-27 18:29       ` Nasser Grainawi
2026-01-22 21:21     ` Junio C Hamano
2026-01-23 23:26     ` Junio C Hamano
2026-01-24  2:18       ` Junio C Hamano
2026-02-20 23:12         ` Junio C Hamano
2026-02-27 17:20           ` Nasser Grainawi
2026-03-01  2:53     ` [PATCH v4] " Nasser Grainawi
2026-03-02 22:09       ` Jacob Keller
2026-03-02 22:11         ` Junio C Hamano
2026-03-03  2:11       ` Junio C Hamano
2026-03-03 19:00         ` Nasser Grainawi
2026-03-03 19:26           ` Nasser Grainawi
2026-03-03 20:02             ` Junio C Hamano
2026-03-03 20:09       ` [PATCH v5] " Nasser Grainawi
2026-03-03 20:47         ` Ramsay Jones
2026-03-03 21:26           ` Junio C Hamano
2026-03-03 23:29             ` Nasser Grainawi
2026-03-03 23:40         ` [PATCH v6] " Nasser Grainawi
2026-03-09 23:40           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq5x8to53y.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=ben.knoble@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=nasser.grainawi@oss.qualcomm.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox